# Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.).
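A quick way to see what "any data type" means in practice: pandas picks a dtype from the values it is given. A minimal sketch (the variable names are illustrative only): homogeneous integers and floats get numeric dtypes, while a mix of Python objects falls back to the generic object dtype.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])               # homogeneous integers -> an integer dtype
floats = pd.Series([1.5, np.nan, 3.0])    # floats (NaN is a float) -> float64
mixed = pd.Series([1, 'two', 3.0, None])  # mixed Python objects -> object

print(ints.dtype, floats.dtype, mixed.dtype)
```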

In [5]:
import numpy as np
import pandas as pd

## Creation

### From numpy ndarray

In [61]:
pd.Series(np.ones(4))

0    1.0
1    1.0
2    1.0
3    1.0
dtype: float64

In [62]:
pd.Series(np.random.randn(4))

0    1.268570
1    0.149908
2    0.400745
3   -0.454851
dtype: float64

### From dict

In [73]:
s = pd.Series({'a' : 0., 'b' : 1., 'c' : 2.}); s

a    0.0
b    1.0
c    2.0
dtype: float64

In [20]:
s.dtype

dtype('float64')

In [23]:
type(s.values)

numpy.ndarray
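pandas 0.24 and later also provide Series.to_numpy(), which the pandas documentation recommends over .values for getting at the underlying array. A short check, rebuilding the same float Series:

```python
import numpy as np
import pandas as pd

s = pd.Series({'a': 0., 'b': 1., 'c': 2.})
arr = s.to_numpy()            # plain NumPy array; the index labels are dropped
print(type(arr), arr.dtype)
```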

## Indexing Series

The axis labels are collectively referred to as the index. Labels do not have to be unique: the same label may appear more than once.

In [81]:
s = pd.Series([1,3,5,np.nan], index=['a', 'b', 'c', 'a']) ;s

a    1.0
b    3.0
c    5.0
a    NaN
dtype: float64

In [76]:
s.index

Index(['a', 'b', 'c', 'a'], dtype='object')

In [78]:
pd.Series([1,2,3,4,5,6], index=pd.date_range('20170102', periods=6)).shift(2)

2017-01-02    NaN
2017-01-03    NaN
2017-01-04    1.0
2017-01-05    2.0
2017-01-06    3.0
2017-01-07    4.0
Freq: D, dtype: float64

In [83]:
print(s.a)
print(s['a'])

a    1.0
a    NaN
dtype: float64
a    1.0
a    NaN
dtype: float64
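Because 'a' occurs twice in this index, the kind of object returned depends on the label. A small sketch, rebuilding the same s: a duplicated label gives back a Series, while a unique label gives back a scalar.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan], index=['a', 'b', 'c', 'a'])

print(s.index.is_unique)   # False: 'a' appears twice
dup = s.loc['a']           # duplicated label -> a Series holding both rows
single = s.loc['b']        # unique label -> a scalar
print(type(dup).__name__, type(single).__name__)
```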


In [85]:
s + s

a     2.0
b     6.0
c    10.0
a     NaN
dtype: float64

In [88]:
s ** 2

a     1.0
b     9.0
c    25.0
a     NaN
dtype: float64
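s + s pairs the values element by element because both operands carry the identical index. When the indexes differ, pandas aligns on labels instead of positions and fills labels missing from either side with NaN. A sketch with two made-up Series; Series.add with fill_value is one way to treat a missing label as 0:

```python
import pandas as pd

left = pd.Series([1, 2], index=['x', 'y'])
right = pd.Series([10, 20], index=['y', 'z'])

total = left + right                      # union of labels x, y, z; only 'y' is on both sides
print(total)
filled = left.add(right, fill_value=0)    # a label missing on one side counts as 0 instead of NaN
print(filled)
```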

# DataFrame

## Basic operations

In [98]:
df = pd.DataFrame({'val': s, 
                   'squared_val': s ** 2}) ;df

   squared_val  val
a          1.0  1.0
b          9.0  3.0
c         25.0  5.0
a          NaN  NaN


In [161]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]},
                 index=['a', 'b', 'c', 'd'])
df

   AAA  BBB  CCC
a    4   10  100
b    5   20   50
c    6   30  -30
d    7   40  -50


## Indexing

### .loc 

.loc is primarily label based, but may also be used with a boolean array. .loc will raise KeyError when the items are not found. Allowed inputs are:

- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, not as an integer position along the index)

- A list or array of labels ['a', 'b', 'c']

- A slice object with labels 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included)

- A boolean array

- A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above)
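The allowed inputs listed above can be tried out on the AAA/BBB/CCC frame used in this section. A minimal sketch, one line per input kind (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]},
                  index=['a', 'b', 'c', 'd'])

row = df.loc['a']                        # single label -> Series
rows = df.loc[['a', 'c']]                # list of labels -> DataFrame
span = df.loc['b':'d']                   # label slice: the stop 'd' IS included
big = df.loc[df['AAA'] > 5]              # boolean array -> rows 'c' and 'd'
picked = df.loc[lambda d: d['CCC'] < 0]  # callable receives df and returns a boolean mask
cell = df.loc['c', 'BBB']                # row label + column label -> scalar

try:
    df.loc['z']
except KeyError:
    print("a missing label raises KeyError")
```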

In [102]:
df.loc['a']

AAA      4
BBB     10
CCC    100
Name: a, dtype: int64

In [104]:
type(df.loc['a'])

pandas.core.series.Series

#### TODO: Try the other indexing methods

### .iloc

.iloc is primarily integer-position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out of bounds, except for slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics). Allowed inputs are:

- An integer e.g. 5

- A list or array of integers [4, 3, 0]

- A slice object with ints 1:7

- A boolean array

- A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above)
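The same exercise for .iloc, assuming the same lettered frame. Note the differences from .loc: slice stops are excluded, an out-of-range slice is silently clipped, and the boolean mask is passed as a NumPy array, since .iloc does not accept a boolean Series.

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]},
                  index=['a', 'b', 'c', 'd'])

first = df.iloc[0]                            # integer position -> Series (row 'a')
some = df.iloc[[3, 0]]                        # list of positions, returned in that order
part = df.iloc[1:3]                           # stop is EXCLUDED: rows 'b' and 'c'
masked = df.iloc[(df['AAA'] > 5).to_numpy()]  # boolean NumPy array, not a boolean Series
tail = df.iloc[2:100]                         # out-of-bounds slice is clipped, no error

try:
    df.iloc[10]
except IndexError:
    print("an out-of-bounds integer raises IndexError")
```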

In [107]:
df.iloc[0]

AAA      4
BBB     10
CCC    100
Name: a, dtype: int64

#### TODO: Try the other indexing methods

### .ix

.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type. .ix is the most general and will support any of the inputs in .loc and .iloc. .ix also supports floating point label schemes. .ix is exceptionally useful when dealing with mixed positional and label based hierarchical indexes.

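However, when an axis is integer based, ONLY label-based access and not positional access is supported. Thus, in such cases, it's usually better to be explicit and use .iloc or .loc. Note also that .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so in current versions .loc and .iloc are the only options.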
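Since .ix is no longer available, mixed position/label access has to be made explicit. A minimal sketch of two possible translations of a hypothetical df.ix[1:3, ['AAA', 'CCC']] on the lettered frame: convert the row positions to labels and use .loc, or convert the column labels to positions with Index.get_indexer and use .iloc.

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]},
                  index=['a', 'b', 'c', 'd'])

# Option 1: turn row positions 1..2 into labels ('b', 'c') and use .loc
by_label = df.loc[df.index[1:3], ['AAA', 'CCC']]

# Option 2: turn the column labels into positions ([0, 2]) and use .iloc
col_pos = df.columns.get_indexer(['AAA', 'CCC'])
by_position = df.iloc[1:3, col_pos]

print(by_label.equals(by_position))
```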

In [169]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df

   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50


In [173]:
df_mask = pd.DataFrame({'AAA' : [True] * 4, 'BBB' : [False] * 4,'CCC' : [True,False] * 2}) ;df_mask

    AAA    BBB    CCC
0  True  False   True
1  True  False  False
2  True  False   True
3  True  False  False


In [174]:
df.where(df_mask, 1000)

   AAA   BBB   CCC
0    4  1000   100
1    5  1000  1000
2    6  1000   -30
3    7  1000  1000


In [175]:
df['logic'] = np.where(df['AAA'] > 5,'high','low'); df

   AAA  BBB  CCC  logic
0    4   10  100    low
1    5   20   50    low
2    6   30  -30   high
3    7   40  -50   high


In [177]:
# .ix was removed in pandas 1.0; .loc takes the same boolean row mask and column labels here
df.loc[df.AAA > 5, ['BBB', 'CCC']] = 555; df

   AAA  BBB  CCC  logic
0    4   10  100    low
1    5   20   50    low
2    6  555  555   high
3    7  555  555   high


In [141]:
df

   AAA  BBB  CCC  logic
0    4   10  100    low
1    5   20   50    low
2    6  555  555   high
3    7  555  555   high


## Multiindexing

In [218]:
df = pd.DataFrame({'One_X' : [1, 3, 2],
                   'One_Y' : [1, 2, 2],
                   'Two_X' : [1.11,1.12,1.11],
                   'Two_Y' : [1.22,1.22,1.22]}); df

   One_X  One_Y  Two_X  Two_Y
0      1      1   1.11   1.22
1      3      2   1.12   1.22
2      2      2   1.11   1.22


In [219]:
multiindex_columns = [tuple(c.split('_')) for c in df.columns] ;multiindex_columns

[('One', 'X'), ('One', 'Y'), ('Two', 'X'), ('Two', 'Y')]

In [220]:
df.columns = pd.MultiIndex.from_tuples(multiindex_columns);df

   One      Two
     X  Y     X     Y
0    1  1  1.11  1.22
1    3  2  1.12  1.22
2    2  2  1.11  1.22


In [221]:
df.One.X

0    1
1    3
2    2
Name: X, dtype: int64

In [222]:
[c for c in df.One.columns]

['X', 'Y']
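df.One selects one top-level group of columns. To cut across groups at the second level instead, DataFrame.xs accepts a level argument. A short sketch, rebuilding the same frame:

```python
import pandas as pd

df = pd.DataFrame({'One_X': [1, 3, 2],
                   'One_Y': [1, 2, 2],
                   'Two_X': [1.11, 1.12, 1.11],
                   'Two_Y': [1.22, 1.22, 1.22]})
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])

one_x = df[('One', 'X')]             # a single column via its full tuple key
all_x = df.xs('X', axis=1, level=1)  # every 'X' column; the remaining level is One, Two
print(all_x)
```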

## Sorting

In [223]:
df.One.sort_values(['Y', 'X'])

   X  Y
0  1  1
2  2  2
1  3  2


In [224]:
df.sort_values([('One', 'Y'), ('Two', 'X')])

   One      Two
     X  Y     X     Y
0    1  1  1.11  1.22
2    2  2  1.11  1.22
1    3  2  1.12  1.22
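sort_values also takes one ascending flag per key, so the keys can be sorted in different directions. A sketch on the same MultiIndex frame, sorting ('One', 'Y') descending and breaking ties with ('Two', 'X') ascending:

```python
import pandas as pd

df = pd.DataFrame({'One_X': [1, 3, 2],
                   'One_Y': [1, 2, 2],
                   'Two_X': [1.11, 1.12, 1.11],
                   'Two_Y': [1.22, 1.22, 1.22]})
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])

# ('One', 'Y') descending first, then ('Two', 'X') ascending among equal Y values
mixed = df.sort_values([('One', 'Y'), ('Two', 'X')], ascending=[False, True])
print(mixed.index.tolist())
```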


## Grouping 

In [225]:
df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult' : [False] * 5 + [True] * 2}); df

   adult  animal  size  weight
0  False     cat     S       8
1  False     dog     S      10
2  False     cat     M      11
3  False    fish     M       1
4  False     dog     M      20
5   True     cat     L      12
6   True     cat     L      12


In [231]:
df.groupby('animal').get_group('cat')

   adult  animal  size  weight
0  False     cat     S       8
2  False     cat     M      11
5   True     cat     L      12
6   True     cat     L      12


In [232]:
df.groupby('animal').apply(lambda sub_df: sub_df['size'][sub_df['weight'].idxmax()])

animal
cat     L
dog     M
fish    M
dtype: object
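The same "size of the heaviest animal in each group" can also be obtained without apply: groupby(...)['weight'].idxmax() returns the row label of each group's maximum (the first one on ties), which .loc then looks up. A sketch, rebuilding the frame without the adult column since it is not needed here:

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12]})

# Row label of the heaviest row per animal (first occurrence when weights tie)
heaviest_rows = df.groupby('animal')['weight'].idxmax()
result = df.loc[heaviest_rows, ['animal', 'size']].set_index('animal')['size']
print(result)
```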

In [252]:
df['total_weight'] = df.groupby('animal')['weight'].transform('sum'); df

   adult  animal  size  weight  total_weight
0  False     cat     S       8            43
1  False     dog     S      10            30
2  False     cat     M      11            43
3  False    fish     M       1             1
4  False     dog     M      20            30
5   True     cat     L      12            43
6   True     cat     L      12            43
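What makes transform suitable for adding a column is the shape of its result: an ordinary aggregation such as sum() returns one value per group, while transform('sum') broadcasts each group's result back to every original row. A comparison on a trimmed copy of the frame:

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'weight': [8, 10, 11, 1, 20, 12, 12]})

per_group = df.groupby('animal')['weight'].sum()           # one row per animal
per_row = df.groupby('animal')['weight'].transform('sum')  # one value per original row

print(per_group)
print(per_row.tolist())
```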


### TODO: add mean weight column

In [253]:
df['mean_weight'] = ...

   adult  animal  size  weight  total_weight  mean_weight
0  False     cat     S       8            43        10.75
1  False     dog     S      10            30        15.00
2  False     cat     M      11            43        10.75
3  False    fish     M       1             1         1.00
4  False     dog     M      20            30        15.00
5   True     cat     L      12            43        10.75
6   True     cat     L      12            43        10.75


## Merge

In [265]:
df = pd.DataFrame(data={'country_code' : ['FRA'] * 5 + ['GER'] * 2,
                        'Bins' : [110] * 2 + [160] * 3 + [40] * 2,
                        'Test_0' : [0, 1, 0, 1, 2, 0, 1],
                        'Data' : np.random.randn(7)});df

   Bins       Data  Test_0  country_code
0   110  -0.275874       0           FRA
1   110   0.094987       1           FRA
2   160  -0.736642       0           FRA
3   160  -1.988000       1           FRA
4   160   0.492370       2           FRA
5    40  -1.129882       0           GER
6    40   0.647245       1           GER


In [266]:
df['Test_1'] = df['Test_0'] - 1 ;df

   Bins       Data  Test_0  country_code  Test_1
0   110  -0.275874       0           FRA      -1
1   110   0.094987       1           FRA       0
2   160  -0.736642       0           FRA      -1
3   160  -1.988000       1           FRA       0
4   160   0.492370       2           FRA       1
5    40  -1.129882       0           GER      -1
6    40   0.647245       1           GER       0


### TODO: Create country table

In [271]:
country_table = ...

In [272]:
df.merge(country_table, on='country_code')

   Bins       Data  Test_0  country_code  Test_1  country_name
0   110  -0.275874       0           FRA      -1        France
1   110   0.094987       1           FRA       0        France
2   160  -0.736642       0           FRA      -1        France
3   160  -1.988000       1           FRA       0        France
4   160   0.492370       2           FRA       1        France
5    40  -1.129882       0           GER      -1       Germany
6    40   0.647245       1           GER       0       Germany
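merge performs an inner join by default, so rows whose key has no match on the other side are dropped. A sketch with small hypothetical tables (the codes, values, and column names are invented for illustration) showing how='left' and indicator=True, which adds a _merge column recording whether each row matched:

```python
import pandas as pd

# Hypothetical data: 'ITA' has no entry in the lookup table
left = pd.DataFrame({'code': ['FRA', 'ITA', 'GER'], 'value': [1, 2, 3]})
lookup = pd.DataFrame({'code': ['FRA', 'GER'], 'name': ['France', 'Germany']})

inner = left.merge(lookup, on='code')                             # default inner join: 'ITA' dropped
kept = left.merge(lookup, on='code', how='left', indicator=True)  # 'ITA' kept with a NaN name

print(inner)
print(kept)
```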
