# Series

In [3]:
import pandas as pd

In [2]:
students = ['Alan', 'Molly', 'Jack']
pd.Series(students)

0     Alan
1    Molly
2     Jack
dtype: object

In [3]:
numbers = [1, 2, 3]
pd.Series(numbers)

0    1
1    2
2    3
dtype: int64

#### Missing Data
There are some other typing details that exist for performance reasons and are important to know.
The most important is how NumPy, and thus pandas, handles missing data.

In Python, we have the None type to indicate a lack of data. But what do we do if we want
to have a typed list, like we do in the Series object?

Underneath, pandas does some type conversion. If we create a list of strings in which one
element is None, pandas inserts it as None and uses the type object for the
underlying array.

In [4]:
students = ['Alan', 'Molly', None]
pd.Series(students)

0     Alan
1    Molly
2     None
dtype: object

However, if we create a list of numbers (integers or floats) and put in a None, pandas automatically converts it to a special floating point value designated as NaN, which stands for "Not a Number".

In [5]:
numbers = [1, 2, None]
pd.Series(numbers)

0    1.0
1    2.0
2    NaN
dtype: float64

You'll notice a couple of things. First, NaN is a different value from None. Second, pandas
set the dtype of this series to floating point numbers instead of object or ints. That's
maybe a bit of a surprise - why not just leave this as an integer? Underneath, pandas
represents NaN as a floating point number, and because integers can be typecast to
floats, pandas went and converted our integers to floats. So when you're wondering why the
list of integers you put into a Series has turned into floats, it's probably because there is some
missing data.
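
A quick way to check whether missing data is behind an unexpected float dtype is to compare dtypes and count the missing entries. The following is only a minimal sketch: the names `complete`, `with_gap`, and `missing_count` are illustrative and do not come from the cells above.

```python
import pandas as pd

# Illustrative names (assumptions, not part of the original notebook)
complete = pd.Series([1, 2, 3])      # no missing data
with_gap = pd.Series([1, 2, None])   # one missing entry

# The complete list keeps an integer dtype,
# while the list containing None is stored as floats
print(complete.dtype)
print(with_gap.dtype)

# isnull() flags each missing entry; summing the flags counts them
missing_count = int(with_gap.isnull().sum())
print(missing_count)
```

If the count is greater than zero, the missing entries are the likely reason the integers were converted.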

For those who might not have done scientific computing in Python before, it is important to stress that a data scientist might use None and NaN in the same way, to denote missing data, but that underneath pandas does not represent them in the same way.  
NaN is *NOT* equivalent to None, and when we try the equality test, the result is False.

In [1]:
# Let's bring in numpy, which allows us to generate a NaN value
import numpy as np
# And let's compare it to None
np.nan == None

False

In [7]:
# It turns out that an equality test of NaN against itself is not useful either.
# When you do it, the answer is always False.

np.nan == np.nan

False

In [6]:
# Instead, we can use np.isnan()
np.isnan(np.nan)

True

In [14]:
# np.isnan(None)  #TypeError
# np.isnan(NULL)  #NameError - name 'NULL' is not defined
# np.isnan(NaN)   #NameError - name 'NaN' is not defined
pd.isnull(np.nan)

True

In [10]:
pd.isnull(None)

True

In [16]:
# pd.isnull(NULL)  #NameError - name 'NULL' is not defined
# pd.isnull(NaN)   #NameError - name 'NaN' is not defined

So keep in mind that when you see NaN, its meaning is similar to None, but it's a numeric value and is treated differently for efficiency reasons.
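
To make the "numeric value" point concrete, the sketch below checks the Python type of each marker and then applies `pd.isnull` to a whole Series. The names `nan_is_float`, `none_is_float`, `mixed`, and `flags` are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# NaN is an ordinary Python float, whereas None has its own type
nan_is_float = isinstance(np.nan, float)
none_is_float = isinstance(None, float)
print(nan_is_float, none_is_float)

# pd.isnull also works element-wise on a Series and
# treats both None and NaN as missing
mixed = pd.Series(['Alan', None, np.nan])  # hypothetical example data
flags = pd.isnull(mixed).tolist()
print(flags)
```

Because `pd.isnull` handles both markers, it is usually a safer check than `np.isnan`, which raises a TypeError when given None.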

In [12]:
students_subjects = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_subjects)
s

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object

In [13]:
s.index

Index(['Alice', 'Jack', 'Molly'], dtype='object')

In [14]:
students = [('Alice', 'Brown'),('Molly', 'Green'),('Jack', 'Blue')]
pd.Series(students)

0    (Alice, Brown)
1    (Molly, Green)
2      (Jack, Blue)
dtype: object

In [15]:
s = pd.Series(['Physics', 'English', 'Maths'], 
              index = ['Alan', 'Molly', 'Jack'])
s

Alan     Physics
Molly    English
Jack       Maths
dtype: object

In [16]:
students_subjects

{'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}

In [17]:
s = pd.Series(students_subjects, index=['Alice', 'Molly', 'Sam'])
s

Alice    Physics
Molly    English
Sam          NaN
dtype: object