# Pandas

# Pandas Introduction
    pandas takes its name from "panel data" (an econometrics term) and "Python data analysis"; it is a popular open-source Python project.
    pandas will be a major tool of interest throughout much of the rest of the book. It contains data structures and data manipulation tools designed to make data cleaning and analysis fast and easy in Python. pandas is often used in tandem with numerical computing tools like NumPy and SciPy, analytical libraries like statsmodels and scikit-learn, and data visualization libraries like matplotlib.
    

# Introduction to Pandas Data Structures
     pandas has two main data structures: Series and DataFrame. They provide a solid, easy-to-use basis for most applications.
     * Series
     * DataFrame

In [2]:
#import pandas and its data structures
import pandas as pd
from pandas import Series, DataFrame

## Series
    A Series is a one-dimensional array-like object containing a sequence of values (of similar types to NumPy types) and an associated array of data labels, called its index. The simplest Series is formed from only an array of data:

In [9]:
ser = pd.Series([4,1,2,4,-3])
ser

0    4
1    1
2    2
3    4
4   -3
dtype: int64

In [3]:
ser = pd.Series([1,2,3,4,4,1,2,4,-3])
ser

0    1
1    2
2    3
3    4
4    4
5    1
6    2
7    4
8   -3
dtype: int64

    The left-hand column shows the index and the right-hand column shows the values.
    We haven't specified an index yet, so pandas uses the default one: the integers 0 to N-1, where N is the length of the data.

In [10]:
ser.values

array([ 4,  1,  2,  4, -3], dtype=int64)

In [11]:
ser.index

RangeIndex(start=0, stop=5, step=1)
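The default index described above can be checked directly. This is a minimal sketch that reuses the five values from the first `ser` example; nothing else is assumed.

```python
import pandas as pd

ser = pd.Series([4, 1, 2, 4, -3])

# With no explicit index, the labels are the integers 0 .. N-1.
print(list(ser.index))

# The default index is equivalent to a RangeIndex of the same length.
print(ser.index.equals(pd.RangeIndex(len(ser))))

# to_numpy() returns the same underlying data as the .values attribute.
print(ser.to_numpy())
```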

In [16]:
# create a Series with custom index labels
obj = pd.Series([1,2,3,4,5], index = ['a','b','c','d','e'])
print(obj)
obj.index

a    1
b    2
c    3
d    4
e    5
dtype: int64


Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')

In [19]:
obj['e'] # select the value stored under the label 'e'

5

In [22]:
obj['r'] = 8 # assigning to a new label adds an element
obj

a    1
b    2
c    3
d    4
e    5
r    8
dtype: int64

In [24]:
obj['d'] = 10 # assigning to an existing label changes its value
obj

a     1
b     2
c     3
d    10
e     5
r     8
dtype: int64

In [27]:
obj['a':'d'] # slice by label; the end label 'd' is included

a     1
b     2
c     3
d    10
dtype: int64

In [32]:
obj[['r','b','e']] # particular indices

r    8
b    2
e    5
dtype: int64

In [34]:
obj[obj>1] # select from obj greater than 1

b     2
c     3
d    10
e     5
r     8
dtype: int64

In [36]:
obj * 2 # multiply each element by 2

a     2
b     4
c     6
d    20
e    10
r    16
dtype: int64

In [6]:
pd.Series([11,21,32,41,52], index = ['a','b','c','d','e']) * 10 # multiply each element by 10

a    110
b    210
c    320
d    410
e    520
dtype: int64

In [42]:
import numpy as np 
np.exp(obj) # e ^ obj.values

a        2.718282
b        7.389056
c       20.085537
d    22026.465795
e      148.413159
r     2980.957987
dtype: float64

In [50]:

# Series is a fixed-length ordered dict, as it is mapping of index values to
# data values it can be used in contexts where you might use a dict:
print(obj)
print('b' in obj)  


a     1
b     2
c     3
d    10
e     5
r     8
dtype: int64
True
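The dict analogy goes a little further than the `in` test. The sketch below uses a small made-up Series (not the `obj` from the cells above) to show that membership checks look at the labels, not the values, and that a Series converts back to a plain dict.

```python
import pandas as pd

obj = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Membership tests look at the index labels, like dict keys.
print('b' in obj)
print(2 in obj)        # 2 is a value, not a label

# get() returns a default instead of raising KeyError for a missing label.
print(obj.get('z', 0))

# to_dict() maps each label to its value.
print(obj.to_dict())
```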


In [55]:
# if your data is already in a Python dict, you can create a Series by passing the dict:
sdata = {'a':100 , 'b':200 , 'c':300 , 'd':400 , 'e':500}
sdata

{'a': 100, 'b': 200, 'c': 300, 'd': 400, 'e': 500}

In [56]:
obj3 = pd.Series(sdata)
obj3

a    100
b    200
c    300
d    400
e    500
dtype: int64

In [59]:
indexes = ['q','a','e','d','t']

In [60]:
obj4 = pd.Series(sdata,index=indexes)
obj4

q      NaN
a    100.0
e    500.0
d    400.0
t      NaN
dtype: float64

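     Only three of the requested labels ('a', 'e' and 'd') exist in sdata, so their values are placed at the matching positions. No value was found for 'q' or 't', so those entries appear as NaN (not a number), which pandas uses to mark missing or NA values. The keys 'b' and 'c' are not in the requested index, so they are left out of the result. The dtype changes to float64 because NaN is a floating-point value.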

In [63]:
pd.isnull(obj4) # True where the value is missing (NA)

q     True
a    False
e    False
d    False
t     True
dtype: bool

In [1]:
pd.notnull(obj4) # True where the value is present (not NA)

q    False
a     True
e     True
d     True
t    False
dtype: bool

In [2]:
obj4.notnull() # Series also has these as instance methods

q    False
a     True
e     True
d     True
t    False
dtype: bool

In [70]:
obj

a     1
b     2
c     3
d    10
e     5
r     8
dtype: int64

In [71]:
obj3

a    100
b    200
c    300
d    400
e    500
dtype: int64

In [74]:
obj + obj3

a    101.0
b    202.0
c    303.0
d    410.0
e    505.0
r      NaN
dtype: float64

In [76]:
states = ['California', 'Ohio', 'Oregon', 'Texas']

In [77]:
pop = [35000, 16000, 20000, 10000]

In [79]:
obj1 = pd.Series(pop, index=states) # values first, then the index labels
obj1

California    35000
Ohio          16000
Oregon        20000
Texas         10000
dtype: int64

In [80]:
obj1.name = 'population'

In [82]:
obj1.index.name = 'state'

In [84]:
obj1

state
California    35000
Ohio          16000
Oregon        20000
Texas         10000
Name: population, dtype: int64

## DataFrame
    A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row and column index; it can be thought of as a dict of Series all sharing the same index. Under the hood, the data is stored as one or more two-dimensional blocks rather than a list, dict, or some other collection of one-dimensional arrays. 
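The "dict of Series sharing one index" view can be made concrete. The sketch below builds a frame from two Series whose values are made up for illustration; the column names `pop` and `area` are likewise assumptions.

```python
import pandas as pd

# Two Series with overlapping but not identical indexes (illustrative values).
pop = pd.Series({'Ohio': 1.5, 'Nevada': 2.4, 'Texas': 3.1})
area = pd.Series({'Ohio': 44.8, 'Nevada': 110.6})

# Each Series becomes a column; the row index is the union of both indexes.
frame = pd.DataFrame({'pop': pop, 'area': area})
print(frame)

# Every column is a Series that shares the frame's row index.
print(frame['area'].index.equals(frame.index))
```

'Texas' has no entry in `area`, so that cell is filled with NaN rather than the row being dropped.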

In [95]:
# A DataFrame can be constructed in many ways; one of the most common is
# from a dict of equal-length lists or NumPy arrays.
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
# data is a dict with 3 keys, each mapped to a list of 6 values
frame = pd.DataFrame(data)  # convert the dict into a DataFrame
frame

Unnamed: 0,pop,state,year
0,1.5,Ohio,2000
1,1.7,Ohio,2001
2,3.6,Ohio,2002
3,2.4,Nevada,2001
4,2.9,Nevada,2002
5,3.2,Nevada,2003


In [97]:
frame.head()
# for large data sets, head() displays only the first 5 rows by default

Unnamed: 0,pop,state,year
0,1.5,Ohio,2000
1,1.7,Ohio,2001
2,3.6,Ohio,2002
3,2.4,Nevada,2001
4,2.9,Nevada,2002


In [98]:
pd.DataFrame(data, columns = ['year','pop','state']) 
# the columns appear in the order given in the columns argument

Unnamed: 0,year,pop,state
0,2000,1.5,Ohio
1,2001,1.7,Ohio
2,2002,3.6,Ohio
3,2001,2.4,Nevada
4,2002,2.9,Nevada
5,2003,3.2,Nevada


In [102]:
frame2 = pd.DataFrame(data , columns = ['year','state','pop','debt'])
# a column name that is not a key of the dict appears filled with missing values
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,
1,2001,Ohio,1.7,
2,2002,Ohio,3.6,
3,2001,Nevada,2.4,
4,2002,Nevada,2.9,
5,2003,Nevada,3.2,


In [103]:
frame2.columns

Index([u'year', u'state', u'pop', u'debt'], dtype='object')

In [105]:
frame2['pop'] # retrieve a specific column as a Series

0    1.5
1    1.7
2    3.6
3    2.4
4    2.9
5    3.2
Name: pop, dtype: float64

In [106]:
frame2.year

0    2000
1    2001
2    2002
3    2001
4    2002
5    2003
Name: year, dtype: int64

In [108]:
# retrieve a row by its index label
frame2.loc[0]

year     2000
state    Ohio
pop       1.5
debt      NaN
Name: 0, dtype: object

In [112]:
frame2['debt']=10.4
# assign values to a column; the value can be a scalar or an array
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,10.4
1,2001,Ohio,1.7,10.4
2,2002,Ohio,3.6,10.4
3,2001,Nevada,2.4,10.4
4,2002,Nevada,2.9,10.4
5,2003,Nevada,3.2,10.4


In [122]:
frame2['debt']=np.arange(0.,12.,2)
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,0.0
1,2001,Ohio,1.7,2.0
2,2002,Ohio,3.6,4.0
3,2001,Nevada,2.4,6.0
4,2002,Nevada,2.9,8.0
5,2003,Nevada,3.2,10.0


In [131]:
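# assigning a Series aligns its labels with the DataFrame's index;
# rows with no matching label (0, 1 and 3 here) get NaN
val = pd.Series([-1.2, -1.5, -1.7], index=[2, 4, 5])
frame2['debt'] = val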
frame2 

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,
1,2001,Ohio,1.7,
2,2002,Ohio,3.6,-1.2
3,2001,Nevada,2.4,
4,2002,Nevada,2.9,-1.5
5,2003,Nevada,3.2,-1.7


In [136]:
frame2['eastern']=frame2.state == 'Ohio'
# add a boolean column that is True where the state column equals 'Ohio'
frame2

Unnamed: 0,year,state,pop,debt,eastern
0,2000,Ohio,1.5,,True
1,2001,Ohio,1.7,,True
2,2002,Ohio,3.6,-1.2,True
3,2001,Nevada,2.4,,False
4,2002,Nevada,2.9,-1.5,False
5,2003,Nevada,3.2,-1.7,False


In [137]:
# delete a column with the del keyword
del frame2['eastern']
frame2

Unnamed: 0,year,state,pop,debt
0,2000,Ohio,1.5,
1,2001,Ohio,1.7,
2,2002,Ohio,3.6,-1.2
3,2001,Nevada,2.4,
4,2002,Nevada,2.9,-1.5
5,2003,Nevada,3.2,-1.7


In [141]:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
# build a frame from a nested dict: the outer keys become the columns
# and the inner keys are interpreted as row indices
frame3 = pd.DataFrame(pop)
frame3

Unnamed: 0,Nevada,Ohio
2000,,1.5
2001,2.4,1.7
2002,2.9,3.6


In [142]:
#Transpose of DataFrame
frame3.T

Unnamed: 0,2000,2001,2002
Nevada,,2.4,2.9
Ohio,1.5,1.7,3.6


In [144]:
pd.DataFrame(pop,index=[2001,2002,2003])
# with an explicit index, only those row labels are used; a label missing
# from the inner dicts (2003 here) yields a row of NaN

Unnamed: 0,Nevada,Ohio
2001,2.4,1.7
2002,2.9,3.6
2003,,


In [145]:
frame3.index.name = 'year'; frame3.columns.name = 'states'
frame3
# set the names of the row index and the column index

states,Nevada,Ohio
year,Unnamed: 1_level_1,Unnamed: 2_level_1
2000,,1.5
2001,2.4,1.7
2002,2.9,3.6


In [148]:
frame3.values # the data as a two-dimensional NumPy array

array([[nan, 1.5],
       [2.4, 1.7],
       [2.9, 3.6]])

In [149]:
frame2.values

array([[2000L, 'Ohio', 1.5, nan],
       [2001L, 'Ohio', 1.7, nan],
       [2002L, 'Ohio', 3.6, -1.2],
       [2001L, 'Nevada', 2.4, nan],
       [2002L, 'Nevada', 2.9, -1.5],
       [2003L, 'Nevada', 3.2, -1.7]], dtype=object)

## Index Objects
    pandas’s Index objects hold the axis labels and other metadata (like the axis name or names). Any array or other sequence of labels you use when constructing a Series or DataFrame is internally converted to an Index.
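As a rough illustration of that conversion, the sketch below passes a list, a tuple and a NumPy array as labels and checks that each ends up as an Index. The variable names are made up for the example. It also shows that an Index supports set-like operations such as `union` and `intersection`.

```python
import numpy as np
import pandas as pd

# The same labels supplied as three different sequence types.
from_list = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
from_tuple = pd.Series([1, 2, 3], index=('a', 'b', 'c'))
from_array = pd.Series([1, 2, 3], index=np.array(['a', 'b', 'c']))

for s in (from_list, from_tuple, from_array):
    # In every case the labels are stored as a pandas Index.
    print(isinstance(s.index, pd.Index), list(s.index))

# Index objects support set-like operations.
left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['c', 'd'])
print(left.union(right))
print(left.intersection(right))
```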

In [155]:
obj = pd.Series(range(3),index=['a','b','c'])
obj

a    0
b    1
c    2
dtype: int64

In [157]:
index = obj.index
# keep a reference to obj's index
index

Index([u'a', u'b', u'c'], dtype='object')

In [158]:
index[1:] #slicing

Index([u'b', u'c'], dtype='object')

In [159]:
index[1] = 'd' # Index objects are immutable, so this raises a TypeError

TypeError: Index does not support mutable operations

In [162]:
labels = pd.Index(np.arange(3))
labels

Int64Index([0, 1, 2], dtype='int64')

In [163]:
obj2 = pd.Series([1.5,-2.6,0],index = labels)
obj2

0    1.5
1   -2.6
2    0.0
dtype: float64

In [164]:
obj2.index is labels

True

In [165]:
frame3

states,Nevada,Ohio
year,Unnamed: 1_level_1,Unnamed: 2_level_1
2000,,1.5
2001,2.4,1.7
2002,2.9,3.6


In [167]:
frame3.columns

Index([u'Nevada', u'Ohio'], dtype='object', name=u'states')

In [175]:
'Ohio' in frame3.columns

True

In [176]:
2002 in frame3.index

True

In [177]:
dup_labels = pd.Index(['foo','foo','bar','bar'])
# unlike a Python set, a pandas Index can contain duplicate labels
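The effect of duplicate labels shows up at selection time. The following sketch attaches the same `dup_labels` to a small Series with made-up values: selecting a duplicated label returns every matching entry rather than a single scalar.

```python
import pandas as pd

dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])

# is_unique reports whether every label occurs only once.
print(dup_labels.is_unique)

s = pd.Series([1, 2, 3, 4], index=dup_labels)

# A duplicated label selects all matching entries, so the result is a Series.
print(s['foo'])

# unique() lists each distinct label once, in order of first appearance.
print(dup_labels.unique())
```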

### Reindexing
    The reindex method creates a new object with the data conformed to a new index.

In [179]:
obj = pd.Series([1.3,7.2,-2.5,3.6],index = ['d','b','a','c'])
obj

d    1.3
b    7.2
a   -2.5
c    3.6
dtype: float64

In [182]:
obj2 = obj.reindex(['a','b','c','d','e'])
obj2
# calling reindex on a Series rearranges the data according to the new index;
# labels with no existing value ('e' here) get NaN

a   -2.5
b    7.2
c    3.6
d    1.3
e    NaN
dtype: float64
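For ordered data it is often preferable to fill the gaps that reindexing introduces instead of leaving NaN. A minimal sketch, assuming a made-up Series named `colors`, uses the `method='ffill'` option of `reindex` to carry the last known value forward.

```python
import pandas as pd

colors = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Labels 1, 3 and 5 are new; ffill copies the previous value into each gap.
filled = colors.reindex(range(6), method='ffill')
print(filled)
```

Forward filling needs a monotonically ordered index, which is why integer labels are used here.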

### Dropping
    The drop method returns a new object with the indicated value or values deleted from an axis:

In [185]:
obj = pd.Series(np.arange(5.), index = ['a','b','c','d','e'])
obj

a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [187]:
new_obj = obj.drop('c')
new_obj

a    0.0
b    1.0
d    3.0
e    4.0
dtype: float64

In [189]:
new_obj = obj.drop(['d','c'])
new_obj

a    0.0
b    1.0
e    4.0
dtype: float64

In [191]:
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [192]:
data.drop(['Ohio','Utah']) # delete multiple rows

Unnamed: 0,one,two,three,four
Colorado,4,5,6,7
New York,12,13,14,15


In [197]:
data.drop('two', axis=1) # drop a column; axis='columns' is equivalent

Unnamed: 0,one,three,four
Ohio,0,2,3
Colorado,4,6,7
Utah,8,10,11
New York,12,14,15
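Because `drop` returns a new object, the original is left untouched unless the result is assigned back. The sketch below checks this on the same kind of Series as above and also shows the `errors='ignore'` option, which skips labels that do not exist instead of raising a KeyError.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

trimmed = obj.drop('c')
print('c' in trimmed)   # the returned copy no longer has 'c'
print('c' in obj)       # the original Series still does

# Dropping a label that is absent raises KeyError by default;
# errors='ignore' returns the data unchanged instead.
same = obj.drop('z', errors='ignore')
print(len(same))
```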


### Indexing, Selection, and Filtering

In [200]:
obj


a    0.0
b    1.0
c    2.0
d    3.0
e    4.0
dtype: float64

In [202]:
obj['b']

1.0

In [203]:
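obj.iloc[1] # select by position; plain obj[1] on a label-indexed Series is deprecated in recent pandas versions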

1.0

In [204]:
obj[2:4]

c    2.0
d    3.0
dtype: float64

In [205]:
obj[obj<2]

a    0.0
b    1.0
dtype: float64

In [207]:
obj['b':'c']=5
obj

a    0.0
b    5.0
c    5.0
d    3.0
e    4.0
dtype: float64

In [208]:
data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [209]:
data.loc['Colorado',['two','three']] # select by row and column labels

two      5
three    6
Name: Colorado, dtype: int32

In [210]:
data.iloc[2,[3,0,1]] # select by integer position

four    11
one      8
two      9
Name: Utah, dtype: int32

In [211]:
data.iloc[2]

one       8
two       9
three    10
four     11
Name: Utah, dtype: int32
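The selections above mix two slicing conventions that are easy to confuse. As a small sketch on the same kind of Series: label-based slices include the end label, while position-based slices exclude the end position, as in ordinary Python.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

# Label-based slicing: both endpoints are included.
by_label = obj.loc['b':'d']
print(list(by_label.index))

# Position-based slicing: the end position is excluded.
by_position = obj.iloc[1:3]
print(list(by_position.index))
```

Spelling out `loc` and `iloc` makes it explicit which convention a given selection follows.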

### Arithmetic and Data Alignment 
    An important pandas feature for some applications is the behavior of arithmetic between objects with different indexes. When you add objects together and their indexes are not the same, the index of the result is the union of the two indexes. Labels that appear in only one of the objects get NaN in the result.

In [215]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s1

a    7.3
c   -2.5
d    3.4
e    1.5
dtype: float64

In [217]:
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1],
               index=['a', 'c', 'e', 'f', 'g'])
s2

a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64

In [218]:
# add s1 and s2
s1 + s2
# labels present in only one Series produce NaN in the result

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
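The union rule can be verified on the two Series above. The sketch also uses the `add` method with `fill_value=0`, so that a label missing from one side is treated as 0 instead of producing NaN.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

# The result's index is the union of the two input indexes.
total = s1 + s2
print(total.index.equals(s1.index.union(s2.index)))

# With fill_value=0, a label found in only one Series keeps its own value.
filled = s1.add(s2, fill_value=0)
print(filled)
```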

In [220]:
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df1

Unnamed: 0,b,c,d
Ohio,0.0,1.0,2.0
Texas,3.0,4.0,5.0
Colorado,6.0,7.0,8.0


In [222]:
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])
df2

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


In [223]:
# make two dataframes and add them
df1 + df2

Unnamed: 0,b,c,d,e
Colorado,,,,
Ohio,3.0,,6.0,
Oregon,,,,
Texas,9.0,,12.0,
Utah,,,,


In [224]:
df1.add(df2 , fill_value = 0)
# to avoid NaN for labels found in only one frame, use add with fill_value=0;
# cells missing from both frames still come out as NaN

Unnamed: 0,b,c,d,e
Colorado,6.0,7.0,8.0,
Ohio,3.0,1.0,6.0,5.0
Oregon,9.0,,10.0,11.0
Texas,9.0,4.0,12.0,8.0
Utah,0.0,,1.0,2.0


In [226]:
df2.add(df1 , fill_value = 0)

Unnamed: 0,b,c,d,e
Colorado,6.0,7.0,8.0,
Ohio,3.0,1.0,6.0,5.0
Oregon,9.0,,10.0,11.0
Texas,9.0,4.0,12.0,8.0
Utah,0.0,,1.0,2.0


In [228]:
1 / df1 # divide 1 by each element (element-wise reciprocal)

Unnamed: 0,b,c,d
Ohio,inf,1.0,0.5
Texas,0.333333,0.25,0.2
Colorado,0.166667,0.142857,0.125


In [230]:
df1.rdiv(1) # reversed division: equivalent to 1 / df1

Unnamed: 0,b,c,d
Ohio,inf,1.0,0.5
Texas,0.333333,0.25,0.2
Colorado,0.166667,0.142857,0.125


In [231]:
df1.reindex(columns = df2.columns,fill_value = 0)

Unnamed: 0,b,d,e
Ohio,0.0,2.0,0
Texas,3.0,5.0,0
Colorado,6.0,8.0,0


In [236]:
frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


In [237]:
series = frame.iloc[0]
series

b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64

In [238]:
frame - series

Unnamed: 0,b,d,e
Utah,0.0,0.0,0.0
Ohio,3.0,3.0,3.0
Texas,6.0,6.0,6.0
Oregon,9.0,9.0,9.0


In [239]:
series2 = pd.Series(range(3), index=['b', 'e', 'f'])
series2

b    0
e    1
f    2
dtype: int64

In [240]:
frame + series2

Unnamed: 0,b,d,e,f
Utah,0.0,,3.0,
Ohio,3.0,,6.0,
Texas,6.0,,9.0,
Oregon,9.0,,12.0,


In [242]:
series3 = frame['d']
series3

Utah       1.0
Ohio       4.0
Texas      7.0
Oregon    10.0
Name: d, dtype: float64

In [243]:
frame.sub(series3, axis = 'index')

Unnamed: 0,b,d,e
Utah,-1.0,0.0,1.0
Ohio,-1.0,0.0,1.0
Texas,-1.0,0.0,1.0
Oregon,-1.0,0.0,1.0


In [244]:
frame

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


### Function Application and Mapping 

In [245]:
frame = pd.DataFrame(np.random.randn(4, 3),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame

Unnamed: 0,b,d,e
Utah,0.735952,2.30895,-0.410424
Ohio,-0.296374,-0.048523,0.089371
Texas,1.317074,-0.018494,0.895776
Oregon,0.265081,0.897885,-0.400032


In [246]:
np.abs(frame)

Unnamed: 0,b,d,e
Utah,0.735952,2.30895,0.410424
Ohio,0.296374,0.048523,0.089371
Texas,1.317074,0.018494,0.895776
Oregon,0.265081,0.897885,0.400032


In [247]:
f = lambda x: x.max() - x.min() # lambda that computes the range (max minus min); apply calls it once per column
frame.apply(f)

b    1.613448
d    2.357473
e    1.306200
dtype: float64

In [250]:
frame.apply(f , axis = 'columns')

Utah      2.719374
Ohio      0.385745
Texas     1.335568
Oregon    1.297916
dtype: float64

In [253]:
def f(x):
    return pd.Series([x.min(), x.max()], index=['min', 'max'])
frame.apply(f)

Unnamed: 0,b,d,e
min,-0.296374,-0.048523,-0.410424
max,1.317074,2.30895,0.895776
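`apply` works on whole rows or columns. For a function that should run on each individual element, `Series.map` is one option. The sketch below formats the values of one column as strings; the seeded random generator is an assumption made for reproducibility, so the numbers differ from the frame shown above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
frame = pd.DataFrame(rng.standard_normal((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# map calls the function once per element of the Series.
formatted = frame['e'].map(lambda x: f'{x:.2f}')
print(formatted)
```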


### Sorting and Ranking
    Sorting a dataset by some criterion is another important built-in operation. To sort lexicographically by row or column index, use the sort_index method, which returns a new, sorted object:


In [255]:
obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
obj

d    0
a    1
b    2
c    3
dtype: int64

In [256]:
obj.sort_index()

a    1
b    2
c    3
d    0
dtype: int64

In [257]:
frame

Unnamed: 0,b,d,e
Utah,0.735952,2.30895,-0.410424
Ohio,-0.296374,-0.048523,0.089371
Texas,1.317074,-0.018494,0.895776
Oregon,0.265081,0.897885,-0.400032


In [258]:
frame.sort_index()

Unnamed: 0,b,d,e
Ohio,-0.296374,-0.048523,0.089371
Oregon,0.265081,0.897885,-0.400032
Texas,1.317074,-0.018494,0.895776
Utah,0.735952,2.30895,-0.410424


In [259]:
frame.sort_index(axis=1) # sort by column labels (already in order here)

Unnamed: 0,b,d,e
Utah,0.735952,2.30895,-0.410424
Ohio,-0.296374,-0.048523,0.089371
Texas,1.317074,-0.018494,0.895776
Oregon,0.265081,0.897885,-0.400032


In [261]:
frame.sort_values(by='b')

Unnamed: 0,b,d,e
Ohio,-0.296374,-0.048523,0.089371
Oregon,0.265081,0.897885,-0.400032
Utah,0.735952,2.30895,-0.410424
Texas,1.317074,-0.018494,0.895776
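Ranking is closely related to sorting: instead of reordering the data, `rank` assigns each value its position in the sorted order. A minimal sketch with made-up values follows. By default ties share the mean of their ranks, while `method='first'` breaks ties by order of appearance.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Default: tied values receive the average of the ranks they span.
avg_rank = obj.rank()
print(avg_rank.tolist())    # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]

# method='first': ties are ranked in the order they appear in the data.
first_rank = obj.rank(method='first')
print(first_rank.tolist())  # [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]
```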


###  Summarizing and Computing Descriptive Statistics
    pandas objects are equipped with a set of common mathematical and statistical methods. Most of these fall into the category of reductions or summary statistics, methods that extract a single value (like the sum or mean) from a Series or a Series of values from the rows or columns of a DataFrame. Compared with the similar methods found on NumPy arrays, they have built-in handling for missing data.

In [266]:
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])
df

Unnamed: 0,one,two
a,1.4,
b,7.1,-4.5
c,,
d,0.75,-1.3


In [267]:
df.sum()
# returns the sum of each column; NaN values are skipped

one    9.25
two   -5.80
dtype: float64

In [270]:
df.sum(axis='columns') # sum across the columns, giving one value per row

a    1.40
b    2.60
c    0.00
d   -0.55
dtype: float64

In [272]:
df.mean(axis='columns', skipna=False) # skipna=False stops NaN from being skipped, so any row containing NaN yields NaN

a      NaN
b    1.300
c      NaN
d   -0.275
dtype: float64
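The difference from NumPy's handling of missing data can be seen side by side. In this sketch the values are taken from column `one` above, with one NaN among them: the NumPy mean propagates NaN, while the pandas mean skips it unless `skipna=False` is passed.

```python
import numpy as np
import pandas as pd

values = [1.4, np.nan, 0.75]
arr = np.array(values)
s = pd.Series(values)

# NumPy: a single NaN makes the whole mean NaN.
print(arr.mean())

# pandas: NaN is skipped by default, so the mean uses the two valid values.
print(s.mean())

# skipna=False restores the NumPy-like behaviour.
print(s.mean(skipna=False))
```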

In [275]:
df

Unnamed: 0,one,two
a,1.4,
b,7.1,-4.5
c,,
d,0.75,-1.3


In [274]:
df.idxmax() # index label of the maximum value in each column

one    b
two    d
dtype: object

In [276]:
df.idxmin()

one    d
two    b
dtype: object

In [277]:
df.cumsum() # cumulative sum down each column; NaN entries are skipped and stay NaN

Unnamed: 0,one,two
a,1.4,
b,8.5,-4.5
c,,
d,9.25,-5.8


In [278]:
df.describe() # summary statistics for each numeric column

Unnamed: 0,one,two
count,3.0,2.0
mean,3.083333,-2.9
std,3.493685,2.262742
min,0.75,-4.5
25%,1.075,-3.7
50%,1.4,-2.9
75%,4.25,-2.1
max,7.1,-1.3


In [279]:
obj = pd.Series(['a','a','b','c']*4)
obj

0     a
1     a
2     b
3     c
4     a
5     a
6     b
7     c
8     a
9     a
10    b
11    c
12    a
13    a
14    b
15    c
dtype: object

In [280]:
obj.describe()

count     16
unique     3
top        a
freq       8
dtype: object
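The `unique`, `top` and `freq` entries reported by `describe` for non-numeric data can be cross-checked with `unique` and `value_counts`. A short sketch on the same Series:

```python
import pandas as pd

obj = pd.Series(['a', 'a', 'b', 'c'] * 4)

# Distinct values in order of first appearance (3 of them, matching 'unique').
print(obj.unique())

# Frequency of each value; 'a' occurs 8 times, matching 'top' and 'freq'.
counts = obj.value_counts()
print(counts)
```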