In [3]:
import numpy as np
import pandas as pd
import matplotlib as mpl

## Introduction to pandas and Series

In [10]:
liststart = np.arange(0.25,1.1,0.25)
data      = pd.Series( liststart )
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [17]:
print('values: ',data.values,'\t\t\t (type: ', type(data.values),')',
      '\nindices: ',data.index,'\t (type: ', type(data.index),')')

values:  [0.25 0.5  0.75 1.  ] 			 (type:  <class 'numpy.ndarray'> ) 
indices:  RangeIndex(start=0, stop=4, step=1) 	 (type:  <class 'pandas.core.indexes.range.RangeIndex'> )


As with a NumPy array, data can be accessed by the associated index using the familiar Python square-bracket notation:

In [21]:
print(data[1],' \t\t type: ',type(data[1]))

0.5  		 type:  <class 'numpy.float64'>


In [22]:
print(data[1:3],' \t\t type: ',type(data[1:3]))

1    0.50
2    0.75
dtype: float64  		 type:  <class 'pandas.core.series.Series'>


### Series as generalized NumPy array

**The essential difference is the presence of the index**: while the NumPy array has an implicitly defined integer index used to access the values, the Pandas Series has an explicitly defined index associated with the values.

In [27]:
data = pd.Series(np.arange(0.25,1.1,0.25),
                index=['a','b','c',99]
                )
data

a     0.25
b     0.50
c     0.75
99    1.00
dtype: float64

Note how [] access differs between the explicit index (the labels 'b' and 99) and the implicit positional index (the slice :3):

In [34]:
print(data['b'],'\n\n',data[:3],'\n\n',data[99])

0.5 

 a    0.25
b    0.50
c    0.75
dtype: float64 

 1.0
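Because plain [] mixes label lookup and positional slicing, it can help to spell out which one is meant. The sketch below rebuilds the same Series and uses the `.loc` (label-based) and `.iloc` (position-based) accessors; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same Series as above: three string labels and one integer label.
data = pd.Series(np.arange(0.25, 1.1, 0.25), index=['a', 'b', 'c', 99])

by_label = data.loc['b']         # explicit index: look up the label 'b'
by_int_label = data.loc[99]      # 99 is a label here, not a position
by_position = data.iloc[3]       # implicit index: the fourth element

label_slice = data.loc['a':'c']  # label slices include the end label
position_slice = data.iloc[:3]   # positional slices exclude the stop

print(by_label, by_int_label, by_position)
print(label_slice.equals(position_slice))
```

Here `data.loc[99]` and `data.iloc[3]` refer to the same element, reached through the explicit and the implicit index respectively.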


A dictionary maps arbitrary keys to arbitrary values, whereas a Series maps typed keys to typed values. This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas Series makes it much more efficient than Python dictionaries for certain operations.
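As a rough illustration of what the typing buys, the sketch below doubles every value in a small dictionary and in the equivalent Series. The data (`counts_dict`) is made up for this example, and no timings are claimed; the point is only that the Series version is a single vectorized expression over a typed array.

```python
import pandas as pd

# Hypothetical data, used only for this illustration.
counts_dict = {'x': 10, 'y': 20, 'z': 30}
counts = pd.Series(counts_dict)

# With a dict, every value is handled one at a time in a Python loop.
doubled_dict = {key: value * 2 for key, value in counts_dict.items()}

# With a Series, the values share one integer dtype, so arithmetic and
# reductions are applied to the whole array at once.
doubled = counts * 2
total = counts.sum()

print(counts.dtype)
print(doubled.to_dict() == doubled_dict)
print(total)
```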

In [35]:
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
population = pd.Series(population_dict)
population

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64

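By default, a Series will be created where the index is drawn from the dictionary keys, in insertion order, as the output above shows (older versions of pandas sorted the keys instead). From here, typical dictionary-style item access can be performed: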

In [36]:
population['California']

38332521
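Item lookup is not the only dictionary-style operation available. The sketch below, using a shortened copy of the population data, shows membership tests, key and item iteration, and assignment to a new label; it is meant as a quick tour rather than a complete list.

```python
import pandas as pd

population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127})

has_texas = 'Texas' in population   # membership is tested against the index
labels = list(population.keys())    # keys() returns the index
pairs = list(population.items())    # (label, value) pairs, in index order

# Assigning to a label that does not exist yet extends the Series,
# much like adding a key to a dictionary.
population['Florida'] = 19552860

print(has_texas, labels)
print(population)
```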

Unlike a dictionary, though, the Series also supports array-style operations such as slicing:

In [37]:
population['California':'Illinois']

California    38332521
Texas         26448193
New York      19651127
Florida       19552860
Illinois      12882135
dtype: int64
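One detail worth noticing is that a label slice includes its end label, whereas a positional slice excludes its stop, as in ordinary Python. A minimal sketch on the same data, using `.iloc` for the positional case to keep the two kinds of slicing apart:

```python
import pandas as pd

population = pd.Series({'California': 38332521,
                        'Texas': 26448193,
                        'New York': 19651127,
                        'Florida': 19552860,
                        'Illinois': 12882135})

# Label slice: both endpoints are included.
by_label = population['Texas':'Florida']

# Positional slice: the stop position (3) is excluded.
by_position = population.iloc[1:3]

print(list(by_label.index))     # ['Texas', 'New York', 'Florida']
print(list(by_position.index))  # ['Texas', 'New York']
```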

### Constructing Series objects
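We've already seen a few ways of constructing a Pandas Series from scratch; all of them are some version of pd.Series(data, index=index), where index is optional and data can be one of several kinds of object. Whatever the input, the result is the same type: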

In [38]:
type(data)

pandas.core.series.Series

For example, data can be a list or NumPy array, in which case index defaults to an integer sequence:

In [39]:
pd.Series([2,4,6])

0    2
1    4
2    6
dtype: int64

Alternatively, data can be a scalar, which is repeated to fill the specified index:

In [42]:
pd.Series( 5, np.arange(100,301,100,dtype=int ) )

100    5
200    5
300    5
dtype: int64

data can be a dictionary, in which case index defaults to the dictionary keys, in insertion order:

In [43]:
pd.Series({2:'a', 1:'b', 3:'c'})

2    a
1    b
3    c
dtype: object

In each case, the index can be set explicitly if a different result is preferred. Notice that in this case the Series is populated only with the explicitly identified keys:

In [44]:
pd.Series( {2:'a', 1:'b', 3:'c'}, index=[3,2] )

3    c
2    a
dtype: object