In [1]:
import pandas as pd

In [2]:
students = ['Alice', 'Jack', 'Molly']
pd.Series(students)

0    Alice
1     Jack
2    Molly
dtype: object

The result is a Series object, which is nicely rendered to the screen. We see here that pandas has automatically identified the type of data in this Series as "object" and set the dtype parameter as appropriate. We also see that the values are indexed with integers, starting at zero.
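As a small sketch of the points above, the dtype and the integer index can be read directly from the Series. The variable name `names` is only for illustration, and the dtype that pandas reports for strings may differ between pandas versions, so no output is shown for it.

```python
import pandas as pd

# Build the same Series of student names as above
names = pd.Series(['Alice', 'Jack', 'Molly'])

# The dtype pandas inferred for the values
print(names.dtype)

# The automatically generated integer index, starting at zero
print(names.index)

# Positional access to the first value
print(names.iloc[0])
```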

In [3]:
# We don't have to use strings. If we passed in a list of whole numbers, for instance,
# we could see that pandas sets the type to int64. Underneath, pandas stores series values in a
# typed array using the NumPy library. This offers a significant speedup when processing data
# versus traditional Python lists.

# Let's create a little list of numbers
numbers = [1, 2, 3]
# And turn that into a series
pd.Series(numbers)

0    1
1    2
2    3
dtype: int64

And we see that, on my architecture, the resulting dtype is int64.
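To make the typed-array point concrete, here is a minimal sketch that pulls the underlying NumPy array out of a Series with `to_numpy()`. The names `int_series` and `values` are only for illustration, and the exact integer width may depend on your platform.

```python
import numpy as np
import pandas as pd

# A Series of whole numbers, as in the cell above
int_series = pd.Series([1, 2, 3])

# to_numpy() exposes the values as a NumPy array
values = int_series.to_numpy()
print(type(values))
print(values.dtype)  # an integer dtype; int64 on the machine used above

# Operations on the Series work on the whole typed array at once
print(int_series.sum())
```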

There are some other typing details that exist for performance and are important to know. The most important is how NumPy, and thus pandas, handles missing data.

In Python, we have the None type to indicate a lack of data. But what do we do if we want to have a typed list, like we do in the Series object?

In [4]:
students = ['Alice', 'Jack', None]
pd.Series(students)

0    Alice
1     Jack
2     None
dtype: object

In [5]:
# However, if we create a list of numbers, integers or floats, and put in the None type,
# pandas automatically converts this to a special floating point value designated as NaN, 
# which stands for "Not a Number".

# So let's create a list with a None value in it
numbers = [1, 2, None]
# And turn that into a series
pd.Series(numbers)

0    1.0
1    2.0
2    NaN
dtype: float64

A couple things can be seen here.

First, NaN is a different value from None.

Second, pandas set the dtype of this series to floating point numbers instead of object or ints. That's maybe a bit of a surprise. Why not just leave this as an integer?

Because, underneath, pandas represents NaN as a floating point number, and because integers can be typecast to floats, pandas converted our integers to floats.
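A quick way to check this is to look at the missing entry itself. In the sketch below (the name `with_none` is only for illustration), the value stored in the last position is a floating point number, not None.

```python
import numpy as np
import pandas as pd

# A numeric list containing None, as in the cell above
with_none = pd.Series([1, 2, None])

# The whole Series is stored as floats
print(with_none.dtype)

# The missing entry is a float NaN rather than None
missing = with_none.iloc[2]
print(type(missing))
print(missing is None)
```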

In [6]:
# NaN is *NOT* equivalent to None, and when we try the equality test, the result is False.
import numpy as np
np.nan == None

False

In [8]:
# It turns out that you actually can't do an equality test of NaN to itself. When you do,
# the answer is always False.

np.nan == np.nan

False

Instead, you need to use special functions to test for the presence of NaN, such as NumPy's isnan().

In [9]:
np.isnan(np.nan)

True
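When the data lives in a Series, pandas offers its own check, `pd.isna()`, along with the matching `Series.isna()` method. Unlike `np.isnan()`, which expects numeric input, these treat both None and NaN as missing. The sketch below reuses the two lists from earlier; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# pd.isna() treats both None and NaN as missing
print(pd.isna(None))
print(pd.isna(np.nan))

# Series.isna() returns one boolean per element
student_names = pd.Series(['Alice', 'Jack', None])
number_values = pd.Series([1, 2, None])

names_missing = student_names.isna()
numbers_missing = number_values.isna()
print(names_missing)
print(numbers_missing)
```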

In [10]:
# Here's an example using some data of students and their classes.

students_scores = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English'}
s = pd.Series(students_scores)
s

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object

We see that, since it was string data, pandas set the data type of the series to "object".
We also see that the index, shown in the first column, is made up of strings as well.

In [11]:
s.index
# The index attribute is used to get the index object.

Index(['Alice', 'Jack', 'Molly'], dtype='object')

Now, this is kind of interesting. The dtype of object is not just for strings, but for arbitrary objects. Let's create a more complex type of data, say, a list of tuples.

In [12]:
students = [("Alice","Brown"), ("Jack", "White"), ("Molly", "Green")]
pd.Series(students)

0    (Alice, Brown)
1     (Jack, White)
2    (Molly, Green)
dtype: object

We see that each of the tuples is stored in the series object, and the type is object.
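As a brief sketch of what "arbitrary objects" means in practice, each element of such a Series is still an ordinary Python tuple and can be used like one. The names `full_names` and `first_entry` are only for illustration.

```python
import pandas as pd

# The same list of (first name, last name) tuples as above
full_names = pd.Series([("Alice", "Brown"), ("Jack", "White"), ("Molly", "Green")])

# Each element keeps its original Python type
first_entry = full_names.iloc[0]
print(type(first_entry))

# So normal tuple indexing still works on it
print(first_entry[1])
```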

In [13]:
# You can also separate your index creation from the data by passing in the index as a 
# list explicitly to the series.

s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
s

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object

So what happens if the values in your index list do not match the keys in the dictionary you use to create the series?

Pandas resolves the mismatch in favor of the index: it ignores every dictionary key that is not in your index, and it adds a missing value (None or NaN) for every index label that is not among the dictionary keys.


In [14]:
# Here's an example. I'll pass in a dictionary of three items, in this case students and
# their courses
students_scores = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English'}
# When I create the series object though I'll only ask for an index with three students, and
# I'll exclude Jack
s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])
s

Alice    Physics
Molly    English
Sam          NaN
dtype: object
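One way to think about this behavior, offered here as a sketch rather than a description of the internals, is that it matches building the Series from the full dictionary and then calling `reindex()` with the labels you want. The names `courses`, `wanted`, `from_index`, and `from_reindex` are only for illustration.

```python
import pandas as pd

courses = {'Alice': 'Physics',
           'Jack': 'Chemistry',
           'Molly': 'English'}
wanted = ['Alice', 'Molly', 'Sam']

# Passing the index at construction time, as in the cell above
from_index = pd.Series(courses, index=wanted)

# Building from the whole dictionary first, then reindexing
from_reindex = pd.Series(courses).reindex(wanted)

# Both drop Jack and mark Sam as missing
print(from_index.isna())
print(from_reindex.isna())
print('Jack' in from_index.index)
```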