# Operations
## Stats
Operations in general *exclude* missing data.

Build a small frame with missing values in column E, then compute a descriptive statistic:

In [8]:
import pandas as pd
import numpy as np
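dates = pd.date_range('20190101', periods=10)
df = pd.DataFrame(np.random.randn(10, 4),
                  index=dates,
                  columns=list('ABCD'))

# Keep the first four dates and add a new column E, which starts out all NaN.
df = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
# Fill E for the first two dates only; the last two rows stay missing.
df.loc[dates[0]:dates[1], 'E'] = 1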

In [9]:
df.head()

                   A         B         C         D    E
2019-01-01 -0.914745 -1.169413  0.445567 -1.612202  1.0
2019-01-02  1.382350  1.083761 -1.132513 -0.282442  1.0
2019-01-03 -1.022387  2.451375  1.505668  0.344774  NaN
2019-01-04  0.907028 -0.532535 -0.730059 -0.608755  NaN


In [10]:
df.mean()

A    0.088062
B    0.458297
C    0.022166
D   -0.539656
E    1.000000
dtype: float64

The same operation along the other axis (one mean per row, again skipping missing values):

In [12]:
df.mean(axis=1)

2019-01-01   -0.450158
2019-01-02    0.410231
2019-01-03    0.819858
2019-01-04   -0.241081
Freq: D, dtype: float64
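
To make the "exclude missing data" behaviour concrete, the sketch below compares the default `mean()` with `mean(skipna=False)`. The frame and its values are made up for illustration and are not the `df` used above.

```python
import numpy as np
import pandas as pd

# Illustrative frame (hypothetical values): column E has one missing entry.
frame = pd.DataFrame({"A": [1.0, 2.0, 3.0], "E": [1.0, 1.0, np.nan]})

# Default: NaN is skipped, so E is averaged over its two present values.
default_mean = frame.mean()

# skipna=False: any NaN in a column makes that column's mean NaN.
strict_mean = frame.mean(skipna=False)

print(default_mean)
print(strict_mean)
```

This is why column E reports a mean of exactly 1.0 in the output above even though half of its entries are missing.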

## Apply
Applying functions to the data:

In [13]:
df.apply(np.cumsum)

                   A         B         C         D    E
2019-01-01 -0.914745 -1.169413  0.445567 -1.612202  1.0
2019-01-02  0.467606 -0.085651 -0.686946 -1.894644  2.0
2019-01-03 -0.554781  2.365724  0.818722 -1.549870  NaN
2019-01-04  0.352247  1.833188  0.088663 -2.158625  NaN


In [14]:
df.apply(lambda x: x.max() - x.min())

A    2.404737
B    3.620788
C    2.638181
D    1.956975
E    0.000000
dtype: float64
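
By default `apply` hands the function one column at a time, which is why the result above has one entry per column. A minimal sketch of the column-wise and row-wise variants, using a small made-up frame rather than the `df` above:

```python
import pandas as pd

# Illustrative frame (hypothetical values and row labels).
frame = pd.DataFrame({"A": [1.0, 4.0], "B": [3.0, 2.0]}, index=["r1", "r2"])

# Default axis=0: the function receives each column as a Series.
col_range = frame.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series instead.
row_range = frame.apply(lambda row: row.max() - row.min(), axis=1)

print(col_range)
print(row_range)
```

The first result is indexed by column name, the second by row label.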

## Histogramming
Counting how often each value occurs in a Series:

In [15]:
s = pd.Series(np.random.randint(0, 7, size=10))

In [16]:
s

0    4
1    0
2    5
3    5
4    3
5    2
6    2
7    5
8    4
9    2
dtype: int64

In [22]:
s.value_counts()

5    3
2    3
4    2
3    1
0    1
dtype: int64
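
`value_counts` orders its result by frequency. The sketch below reuses the ten values printed above (typed in by hand so the example is reproducible) and shows two common variations: ordering by value, and relative frequencies via `normalize=True`.

```python
import pandas as pd

# The same ten values as the random Series shown above, entered literally.
s = pd.Series([4, 0, 5, 5, 3, 2, 2, 5, 4, 2])

# Counts per distinct value, most frequent first.
counts = s.value_counts()

# The same counts ordered by value, which reads more like a histogram.
by_value = counts.sort_index()

# Fractions of the total instead of raw counts.
shares = s.value_counts(normalize=True)

print(by_value)
print(shares)
```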

## String Methods
A Series exposes a set of string-processing methods through its `str` attribute, which apply an operation to each element, as in the snippet below. Missing values are passed through as `NaN`. Note that pattern matching in `str` generally uses regular expressions by default.

In [25]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [26]:
s

0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object

In [28]:
s.str.title()


0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    Caba
7     Dog
8     Cat
dtype: object
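
As a sketch of the regular-expression default mentioned above, the example below filters the same Series with `str.contains`. The pattern and the variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# The pattern is treated as a regex: strings starting with "A" or "C"
# (case-sensitive). The missing element stays missing in the result.
starts_a_or_c = s.str.contains(r'^[AC]')

# na=False treats missing values as non-matches, giving a clean boolean mask.
mask = s.str.contains(r'^[AC]', na=False)
matches = s[mask]

# regex=False searches for the literal character "^" instead.
literal = s.str.contains('^', regex=False, na=False)

print(matches)
```

Passing `na=False` matters when the result is used for indexing, since a mask containing missing values cannot be used to select rows.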