In [37]:
### Indexing rows in a DataFrame
import pandas as pd
import numpy as np
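# 4x3 frame of standard-normal random numbers with row labels a-d
# (the values change on every run unless a random seed is set)
df = pd.DataFrame(np.random.randn(4, 3), index=['a', 'b', 'c', 'd'])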

In [38]:
df

          0         1         2
a  0.197091 -0.311928 -0.194110
b  0.298253 -1.615327 -0.664313
c  2.675111 -0.903066  0.199791
d  1.225794  2.756029 -1.466165


In [39]:
# Select the row labelled 'b' (.ix was removed in pandas 1.0; use .loc for labels)
df.loc['b']

0    0.298253
1   -1.615327
2   -0.664313
Name: b, dtype: float64
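Row selection in current pandas goes through .loc (by index label) or .iloc (by integer position); .ix, which mixed the two, was deprecated and then removed in pandas 1.0. A minimal sketch on a small deterministic frame; the name `demo` and its values are made up for illustration so the cell does not overwrite `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame with predictable values 0..11
demo = pd.DataFrame(np.arange(12).reshape(4, 3), index=['a', 'b', 'c', 'd'])

row_by_label = demo.loc['b']         # row whose index label is 'b'
row_by_position = demo.iloc[1]       # second row, counting from 0
several_rows = demo.loc[['a', 'c']]  # a list of labels returns a DataFrame

print(row_by_label.tolist())         # [3, 4, 5]
print(several_rows)
```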

Pandas objects are equipped with a set of common mathematical and statistical methods. Most of these fall into the category of reductions or summary statistics: methods that extract a single value (such as the sum or mean) from a Series, or a Series of values from the rows or columns of a DataFrame. They have built-in handling for missing data: by default, NaN values are skipped.
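A quick check of the missing-data behaviour, using a hypothetical Series `s` with one NaN: reductions skip NaN by default, and passing skipna=False turns that off.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value
s = pd.Series([1.0, np.nan, 3.0])

print(s.sum())              # 4.0  (NaN is skipped by default)
print(s.mean())             # 2.0  (mean of the two non-missing values)
print(s.sum(skipna=False))  # nan  (skipping switched off)
print(s.count())            # 2    (number of non-missing values)
```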

### Get the sum
By default, sum works along axis 0, i.e. down the rows, so it returns one total per column.

In [40]:
df.sum()

0    4.396248
1   -0.074292
2   -2.124797
dtype: float64

In [41]:
# To add across the columns instead (one total per row), pass axis=1
df.sum(axis=1)

a   -0.308947
b   -1.981386
c    1.971835
d    2.515657
dtype: float64
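To make the axis convention concrete, here is a sketch on a hypothetical 2x3 frame whose totals are easy to check by hand. axis=0 collapses the rows (one result per column); axis=1 collapses the columns (one result per row).

```python
import pandas as pd

# Hypothetical 2x3 frame with easy-to-check values
demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=['x', 'y'])

col_totals = demo.sum()        # axis=0: collapse the rows, one total per column
row_totals = demo.sum(axis=1)  # axis=1: collapse the columns, one total per row

print(col_totals.tolist())     # [5, 7, 9]
print(row_totals.tolist())     # [6, 15]
```

pandas also accepts the string aliases axis='index' and axis='columns' for 0 and 1, which some readers find easier to remember.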

In [42]:
df

          0         1         2
a  0.197091 -0.311928 -0.194110
b  0.298253 -1.615327 -0.664313
c  2.675111 -0.903066  0.199791
d  1.225794  2.756029 -1.466165


In [43]:
# Where is the minimum? idxmin() returns the index label
# of the minimum value in each column
df.idxmin()

0    a
1    b
2    d
dtype: object

In [45]:
# idxmax() returns the index label of the maximum value in each column
df.idxmax()

0    c
1    d
2    c
dtype: object
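idxmin and idxmax return labels, while min and max return the values themselves. A sketch on a hypothetical frame `demo` showing that looking each label back up with .loc recovers the minimum:

```python
import pandas as pd

# Hypothetical frame: idxmin/idxmax give labels, min/max give values
demo = pd.DataFrame({'p': [3, 1, 2], 'q': [7, 9, 8]}, index=['a', 'b', 'c'])

where_min = demo.idxmin()  # label of the smallest value in each column
smallest = demo.min()      # the smallest values themselves

# Looking the labels back up recovers the minimum values
for col in demo.columns:
    assert demo.loc[where_min[col], col] == smallest[col]

print(where_min.to_dict())      # {'p': 'b', 'q': 'a'}
print(demo.idxmax().to_dict())  # {'p': 'a', 'q': 'b'}
```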

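idxmin and idxmax return indirect statistics: the index label where the minimum or maximum occurs, rather than the value itself. Other methods perform accumulation, returning running results instead of a single value: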

In [46]:
df.cumsum()

          0         1         2
a  0.197091 -0.311928 -0.194110
b  0.495344 -1.927255 -0.858423
c  3.170455 -2.830321 -0.658632
d  4.396248 -0.074292 -2.124797


In [23]:
### As you can see, the cumulative sum is a running total down each column: s0 = x0, s1 = s0 + x1, s2 = s1 + x2, s3 = s2 + x3
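A cumulative sum follows the recurrence s_0 = x_0, s_k = s_{k-1} + x_k, so each entry is the sum of everything up to and including that position. A sketch that writes the recurrence out by hand and compares it with pandas; `running_total` is a hypothetical helper for illustration, not a pandas function:

```python
import pandas as pd

def running_total(values):
    """Hypothetical helper: cumulative sum via s_k = s_{k-1} + x_k."""
    totals = []
    current = 0
    for x in values:
        current += x  # add the next element to the previous total
        totals.append(current)
    return totals

values = [2, -1, 4, 3]
print(running_total(values))                # [2, 1, 5, 8]
print(pd.Series(values).cumsum().tolist())  # [2, 1, 5, 8]
```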

Methods that return a single value (per column or row) are called reductions. cumsum is an example of an accumulation: it returns running totals with the same shape as the input. Some methods are neither reductions nor accumulations; describe, which produces several summary statistics in one call, is one such example:

In [48]:
df.describe()

              0         1         2
count  4.000000  4.000000  4.000000
mean   1.099062 -0.018573 -0.531199
std    1.148163  1.924961  0.716439
min    0.197091 -1.615327 -1.466165
25%    0.272963 -1.081131 -0.864776
50%    0.762023 -0.607497 -0.429211
75%    1.588123  0.455061 -0.095635
max    2.675111  2.756029  0.199791
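describe also takes a percentiles argument to choose which quantiles it reports. A sketch on a hypothetical Series `nums` whose statistics can be checked by hand (quantiles are computed with linear interpolation by default):

```python
import pandas as pd

# Hypothetical numeric series
nums = pd.Series([1, 2, 3, 4])

summary = nums.describe()
print(summary['mean'], summary['50%'])  # 2.5 2.5

# Ask for the 10th and 90th percentiles instead of the default quartiles
custom = nums.describe(percentiles=[0.1, 0.9])
print(custom.loc[['10%', '90%']])
```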


In [49]:
### On non-numeric data, describe produces different summary statistics (count, unique, top, freq)

s1 = pd.Series(['a','a','b','c'] * 4)

In [50]:
s1

0     a
1     a
2     b
3     c
4     a
5     a
6     b
7     c
8     a
9     a
10    b
11    c
12    a
13    a
14    b
15    c
dtype: object

In [51]:
s1.describe()

count     16
unique     3
top        a
freq       8
dtype: object
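For object (string) data, top and freq are the most common value and how often it occurs; value_counts shows the full frequency table behind them. A sketch that rebuilds the same s1 and checks the correspondence:

```python
import pandas as pd

s1 = pd.Series(['a', 'a', 'b', 'c'] * 4)  # same data as above

counts = s1.value_counts()  # frequency of each distinct value, most common first
print(counts['a'], counts['b'], counts['c'])  # 8 4 4

summary = s1.describe()
# 'unique', 'top' and 'freq' can be read off the frequency table
assert summary['unique'] == s1.nunique() == 3
assert summary['top'] == counts.index[0] == 'a'
assert summary['freq'] == counts.iloc[0] == 8
```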