## Pandas (Python Data Analysis Library) -- Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

In [1]:
import pandas as pd

In [2]:
cars = ['Nissan', 'Ford', 'Toyota']
pd.Series(cars)

0    Nissan
1      Ford
2    Toyota
dtype: object
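
As a small sketch of the "labeled array" idea described above, the labels and the data can be inspected separately. When no index is passed, pandas assigns integer labels starting at 0. The variable names `index_labels` and `values` are only illustrative.

```python
import pandas as pd

cars = ['Nissan', 'Ford', 'Toyota']
s = pd.Series(cars)

# No index was passed, so the labels are the integers 0..n-1.
index_labels = list(s.index)

# The data itself, converted back to a plain Python list.
values = s.tolist()

print(index_labels)
print(values)
```

The same two attributes are available whatever the index looks like, so this is a quick way to check how a Series was labeled.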

###### From a dictionary

In [3]:
models = {'Focus': 'Ford', '1.18i': 'Bmw', 'Golf': 'Volkswagen', 'Corolla': 'Toyota'}
s = pd.Series(models)
s

1.18i             Bmw
Corolla        Toyota
Focus            Ford
Golf       Volkswagen
dtype: object

###### Passing data with indexes

In [4]:
s = pd.Series(['Golf', 'Focus', 'Corolla'], index=['Volkswagen', 'Ford', 'Toyota'])
s

Volkswagen       Golf
Ford            Focus
Toyota        Corolla
dtype: object

In [5]:
models = {'Focus': 'Ford', '1.18i': 'Bmw', 'Golf': 'Volkswagen', 'Corolla': 'Toyota'}
s = pd.Series(models, index=['1.18i', 'Corolla', 'Focus'])
s

1.18i         Bmw
Corolla    Toyota
Focus        Ford
dtype: object

In [6]:
s = pd.Series(models, index=['1.18i', 'Corolla', 'Focus', 'Golf'])
s

1.18i             Bmw
Corolla        Toyota
Focus            Ford
Golf       Volkswagen
dtype: object
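
The cells above only request labels that exist as dictionary keys. A minimal sketch of the other case follows: when a requested label is missing from the dictionary, pandas keeps the label and fills its value with a missing-value marker. The label `'Civic'` is a hypothetical one added for illustration and does not appear in the original data.

```python
import pandas as pd

models = {'Focus': 'Ford', '1.18i': 'Bmw', 'Golf': 'Volkswagen', 'Corolla': 'Toyota'}

# 'Civic' is a hypothetical label that is not a key in `models`.
s = pd.Series(models, index=['Focus', 'Civic'])

# isna() marks the entries that could not be filled from the dictionary.
missing = s.isna().tolist()

print(s)
print(missing)
```

Keys that are in the dictionary but not in `index` (here `'1.18i'`, `'Golf'` and `'Corolla'`) are simply left out of the result.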

### Querying a Series

In [7]:
models = {'Focus': 'Ford', '1.18i': 'Bmw', 'Golf': 'Volkswagen', 'Corolla': 'Toyota'}
s = pd.Series(models)
s

1.18i             Bmw
Corolla        Toyota
Focus            Ford
Golf       Volkswagen
dtype: object

In [8]:
# Select a row by integer position (0-based)
s.iloc[2]

'Ford'

In [9]:
# Select a row by its index label
s.loc['Golf']

'Volkswagen'
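
Both accessors also accept slices, and they treat the end of the slice differently. The sketch below builds the Series from a list with an explicit index, so the row order does not depend on how a given pandas version orders dictionary keys. The names `by_position` and `by_label` are illustrative.

```python
import pandas as pd

s = pd.Series(['Golf', 'Focus', 'Corolla'], index=['Volkswagen', 'Ford', 'Toyota'])

# Position-based slice: the end position is excluded, as with Python lists.
by_position = s.iloc[0:1]

# Label-based slice: the end label is included.
by_label = s.loc['Volkswagen':'Ford']

print(by_position.tolist())
print(by_label.tolist())
```

Because `iloc` depends on row order while `loc` depends only on labels, `loc` is usually the safer choice when the order of the rows is not guaranteed.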

#### Performance Tricks

In [10]:
import numpy as np

# Create a Series of 10,000 random integers in the range [0, 1000)
s = pd.Series(np.random.randint(0,1000,10000))
s.head()

0    374
1    620
2    659
3    739
4    111
dtype: int32

In [11]:
len(s)

10000

In [12]:
%%timeit -n 100
summary = 0
for item in s:
    summary+=item

100 loops, best of 3: 1.37 ms per loop


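The loop above visits the elements one at a time in Python. Most functions in the NumPy library, including `np.sum`, are vectorized: they operate on the whole array at once in compiled code.
The same sum after vectorization: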

In [13]:
%%timeit -n 100
summary = np.sum(s)

100 loops, best of 3: 121 µs per loop
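
The timings only show that the vectorized version is faster. As a quick sanity check, the sketch below confirms that the loop, `np.sum` and the Series method `sum()` all produce the same total. It uses a seeded generator (`np.random.default_rng(0)`, seed chosen arbitrarily) instead of `np.random.randint` so that the run is repeatable; that substitution is an assumption made for this example only.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 1000, 10000))

# Element-by-element sum in a Python loop.
loop_total = 0
for item in s:
    loop_total += item

# Vectorized alternatives: the NumPy function and the Series method.
vectorized_total = np.sum(s)
method_total = s.sum()

print(loop_total == vectorized_total == method_total)
```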


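Another example: adding 2 to every element, first with an explicit loop over the labels and then with a single vectorized operation.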

In [14]:
%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
for label, value in s.items():  # iteritems() was removed in pandas 2.0
    s.loc[label] = value + 2

10 loops, best of 3: 931 ms per loop


In [15]:
%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
s+=2

10 loops, best of 3: 315 µs per loop
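
To check that the two approaches really are equivalent, the following sketch applies both to copies of the same data and compares the results. It uses a smaller Series (1,000 elements) and a seeded generator so that it runs quickly and repeatably; both choices, and the names `original`, `looped` and `vectorized`, are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Smaller, seeded data set; the size and seed are arbitrary choices.
rng = np.random.default_rng(0)
original = pd.Series(rng.integers(0, 1000, 1000))

# Loop version: read each value from the original, write into a copy.
looped = original.copy()
for label, value in original.items():
    looped.loc[label] = value + 2

# Vectorized version: one operation over the whole Series.
vectorized = original + 2

same = looped.equals(vectorized)
print(same)
```

Note that `original + 2` returns a new Series and leaves `original` untouched, whereas `s += 2` in the cell above rebinds `s` to the updated data.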
