In [2]:
import pandas as pd
import numpy as np

### Pandas Series

A Series is an ordered, one-dimensional sequence of elements. It is backed by a NumPy array, which makes computations on it fast, and it also has an index, so we can retrieve values by referring to their index labels.
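
A minimal sketch of those two properties, using a small Series with made-up values and labels (`s`, `"a"`, `"b"`, `"c"` are illustrative only): the underlying data is a NumPy array, arithmetic is applied element-wise, and values can be looked up by label.

```python
import numpy as np
import pandas as pd

# Illustrative values and labels, not the G7 data
s = pd.Series([1.5, 2.5, 4.0], index=["a", "b", "c"])

# The values are stored in a NumPy array
values = s.to_numpy()
print(type(values))  # <class 'numpy.ndarray'>

# Arithmetic is vectorized and the labels are kept
doubled = s * 2
print(doubled["b"])  # 5.0

# Look up a value by its index label, much like a dict key
print(s["c"])  # 4.0
```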

We'll start by analyzing "The Group of Seven" (G7), a political forum formed by Canada, France, Germany, Italy, Japan, the UK, and the USA. We'll begin with population (in millions), and for that we'll use a ```pandas.Series``` object.

In [19]:
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

In [20]:
g7_pop.name = 'G7 Population in millions'
g7_pop

0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64

In [21]:
g7_pop.index

RangeIndex(start=0, stop=7, step=1)

In [22]:
g7_pop.values

array([ 35.467,  63.951,  80.94 ,  60.665, 127.061,  64.511, 318.523])

In [23]:
type(g7_pop.values)

numpy.ndarray

In [24]:
g7_pop[0]

35.467

In [25]:
g7_pop[1]

63.951

In [26]:
# replace the default integer index with country labels
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'UK',
    'US',
]
g7_pop

Canada      35.467
France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
UK          64.511
US         318.523
Name: G7 Population in millions, dtype: float64
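
Assigning to `.index` replaces every label at once. To relabel only a few entries, `Series.rename` accepts a dict that maps old labels to new ones and returns a new Series. A minimal sketch on a shortened G7 Series (the label `"FR"` is just an illustration):

```python
import pandas as pd

g7 = pd.Series(
    [35.467, 63.951, 80.940],
    index=["Canada", "France", "Germany"],
    name="G7 Population in millions",
)

# A dict maps old index labels to new ones; other labels stay unchanged.
# rename() returns a new Series and leaves g7 as it was.
renamed = g7.rename({"France": "FR"})
print(renamed.index.tolist())  # ['Canada', 'FR', 'Germany']
print(g7.index.tolist())       # ['Canada', 'France', 'Germany']
```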

In [28]:
# create a Series from a dict: the keys become the index, and the name is set up front
pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.940,
    'Italy': 60.665,
    'Japan': 127.061,
    'UK': 64.511,
    'US': 318.523
}, name='G7 Population in millions')

Canada      35.467
France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
UK          64.511
US         318.523
Name: G7 Population in millions, dtype: float64

In [31]:
# another way of creating pandas series
pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan', 'UK', 'US'],
    name='G7 Population in millions'
)

Canada      35.467
France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
UK          64.511
US         318.523
Name: G7 Population in millions, dtype: float64
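
Both constructors should give the same Series. One way to check this, sketched here on the first three countries, is `Series.equals` (which compares the values and index labels) together with `pd.testing.assert_series_equal` (which also checks dtype and name, and raises an `AssertionError` on any mismatch):

```python
import pandas as pd

labels = ["Canada", "France", "Germany"]
values = [35.467, 63.951, 80.940]
name = "G7 Population in millions"

# Dict constructor: keys become the index, insertion order is kept
from_dict = pd.Series(dict(zip(labels, values)), name=name)

# List constructor with an explicit index
from_list = pd.Series(values, index=labels, name=name)

print(from_dict.equals(from_list))  # True

# Raises AssertionError if values, index, dtype or name differ
pd.testing.assert_series_equal(from_dict, from_list)
```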

In [32]:
# create a Series from another Series, selecting labels; labels missing from the original (Spain) become NaN
pd.Series(g7_pop, index=['France', 'Germany', 'Italy', 'Spain'])

France     63.951
Germany    80.940
Italy      60.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
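
The same selection can be written with `Series.reindex`, which also takes a `fill_value` for labels that are not present. A minimal sketch on the first four G7 countries (using `0.0` as the fill value is only an example):

```python
import pandas as pd

g7 = pd.Series(
    [35.467, 63.951, 80.940, 60.665],
    index=["Canada", "France", "Germany", "Italy"],
    name="G7 Population in millions",
)

# reindex() selects and reorders by label; unknown labels become NaN
subset = g7.reindex(["France", "Germany", "Italy", "Spain"])
print(subset["Spain"])  # nan

# fill_value is used instead of NaN for labels that are missing
filled = g7.reindex(["France", "Spain"], fill_value=0.0)
print(filled.tolist())  # [63.951, 0.0]
```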

### Indexing

Indexing works much like it does for lists and dictionaries: you use the **index** of the element you're looking for:

In [34]:
g7_pop['Canada']

35.467

In [35]:
g7_pop['Japan']

127.061
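
Label lookups can also be written with `.loc`, which makes the label-based intent explicit. A missing label raises `KeyError`, while `Series.get` returns a default instead, like `dict.get`. A minimal sketch with two countries (`"Spain"` stands in for any missing label):

```python
import pandas as pd

g7 = pd.Series(
    [35.467, 127.061],
    index=["Canada", "Japan"],
    name="G7 Population in millions",
)

# Explicit label-based access
print(g7.loc["Japan"])  # 127.061

# A missing label raises KeyError with [] (and with .loc)
try:
    g7["Spain"]
except KeyError:
    print("no label 'Spain'")

# get() returns a default instead of raising
print(g7.get("Spain", 0.0))  # 0.0
```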

You can also select by numeric position with the ```iloc``` attribute:

In [37]:
g7_pop.iloc[0]

35.467

In [39]:
g7_pop.iloc[-1]

318.523
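
The difference between labels and positions matters most when the index itself contains integers that do not match the positions. A minimal sketch with illustrative data, where `.loc` looks up the label and `.iloc` the position:

```python
import pandas as pd

# Integer labels in reverse order (illustrative data)
s = pd.Series(["a", "b", "c"], index=[2, 1, 0])

print(s.loc[0])   # c  (the element whose label is 0)
print(s.iloc[0])  # a  (the element at position 0)
```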

In [40]:
# selecting multiple elements at once
g7_pop[['Italy', 'France']]

Italy     60.665
France    63.951
Name: G7 Population in millions, dtype: float64

In [41]:
g7_pop.iloc[[0, 1]]

Canada    35.467
France    63.951
Name: G7 Population in millions, dtype: float64

In [43]:
# label-based slicing: unlike Python list slicing, the upper limit (the end label) is included
g7_pop['Canada':'Italy']

Canada     35.467
France     63.951
Germany    80.940
Italy      60.665
Name: G7 Population in millions, dtype: float64
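
Positional slicing with `iloc` follows the usual Python rule instead, so the end position is excluded. A short comparison on the first five G7 countries:

```python
import pandas as pd

g7 = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061],
    index=["Canada", "France", "Germany", "Italy", "Japan"],
)

by_label = g7.loc["Canada":"Italy"]  # end label included: 4 items
by_position = g7.iloc[0:3]           # end position excluded: 3 items

print(len(by_label), len(by_position))  # 4 3
print(by_position.index.tolist())       # ['Canada', 'France', 'Germany']
```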

### Conditional Selection (boolean arrays)

The same boolean-array techniques that apply to NumPy arrays can be used with pandas Series:

In [45]:
g7_pop > 7

Canada     True
France     True
Germany    True
Italy      True
Japan      True
UK         True
US         True
Name: G7 Population in millions, dtype: bool

In [44]:
g7_pop[g7_pop > 7]

Canada      35.467
France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
UK          64.511
US         318.523
Name: G7 Population in millions, dtype: float64

In [46]:
g7_pop.mean()

107.30257142857144

In [48]:
g7_pop[g7_pop > g7_pop.mean()]

Japan    127.061
US       318.523
Name: G7 Population in millions, dtype: float64

In [50]:
g7_pop[(g7_pop > 80) | (g7_pop < 40)]

Canada      35.467
Germany     80.940
Japan      127.061
US         318.523
Name: G7 Population in millions, dtype: float64
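
Besides `|` (element-wise OR), `&` gives element-wise AND and `~` negates a condition. The parentheses are required because `&` and `|` bind more tightly than `>` and `<`, and Python's `and`/`or` do not work element-wise on a Series. `Series.between` is a shorthand for a range check (inclusive on both ends by default). A sketch on the original G7 figures:

```python
import pandas as pd

g7 = pd.Series(
    [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=["Canada", "France", "Germany", "Italy", "Japan", "UK", "US"],
)

# Element-wise AND: each comparison needs its own parentheses
mid = g7[(g7 > 60) & (g7 < 70)]
print(mid.index.tolist())  # ['France', 'Italy', 'UK']

# ~ inverts a boolean Series
not_mid = g7[~((g7 > 60) & (g7 < 70))]
print(not_mid.index.tolist())  # ['Canada', 'Germany', 'Japan', 'US']

# between() checks 60 <= value <= 70; summing the booleans counts matches
print(g7.between(60, 70).sum())  # 3
```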

### Modifying Series

In [53]:
g7_pop['Canada'] = 40.5
g7_pop

Canada      40.500
France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
UK          64.511
US         318.523
Name: G7 Population in millions, dtype: float64

In [55]:
g7_pop.iloc[-1] = 500
g7_pop

Canada      40.500
France      63.951
Germany     80.940
Italy       60.665
Japan      127.061
UK          64.511
US         500.000
Name: G7 Population in millions, dtype: float64

In [56]:
g7_pop[g7_pop < 70] = 99.99
g7_pop

Canada      99.990
France      99.990
Germany     80.940
Italy       99.990
Japan      127.061
UK          99.990
US         500.000
Name: G7 Population in millions, dtype: float64
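
The assignments above modify `g7_pop` in place. When the original values should be kept, `Series.mask` (replace where the condition is True) and `Series.where` (keep where the condition is True) return a new Series instead, and `copy()` gives an independent Series to edit. A minimal sketch on three countries, reusing the 99.99 replacement value from the cell above:

```python
import pandas as pd

g7 = pd.Series(
    [40.5, 63.951, 80.940],
    index=["Canada", "France", "Germany"],
)

# mask(): replace values where the condition is True, return a new Series
capped = g7.mask(g7 < 70, 99.99)
print(capped.tolist())  # [99.99, 99.99, 80.94]
print(g7.tolist())      # [40.5, 63.951, 80.94]  (unchanged)

# where(): keep values where the condition is True, replace the rest
kept = g7.where(g7 < 70, 0.0)
print(kept.tolist())    # [40.5, 63.951, 0.0]

# Edit a copy so the original Series is not affected
edited = g7.copy()
edited["Canada"] = 0.0
print(g7["Canada"])     # 40.5
```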