# 10 Minutes to pandas

## Object Creation

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [4]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
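The output above has dtype float64 even though most inputs are integers. A minimal sketch of why, assuming only that `np.nan` is a floating-point value (the names `ints` and `with_nan` are illustrative, not from the original):

```python
import numpy as np
import pandas as pd

# A list of plain integers yields an integer dtype.
ints = pd.Series([1, 3, 5])

# np.nan is a float, so mixing it in upcasts the whole Series to float64.
with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])

print(ints.dtype.kind)          # 'i' (integer; exact width depends on platform)
print(with_nan.dtype)           # float64
print(list(with_nan.index))     # the default integer index
print(with_nan.isna().sum())    # count of missing values
```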

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

In [11]:
dates = pd.date_range('20130101', periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [10]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df

                   A         B         C         D
2013-01-01  1.778996 -1.302090 -0.076834 -2.948765
2013-01-02 -1.152555  1.149984 -0.350048 -1.291639
2013-01-03 -1.118857  0.827117 -0.659799  0.495834
2013-01-04 -0.081111  1.103899 -0.805387 -0.653863
2013-01-05 -0.244077  0.308557  0.672543  0.248797
2013-01-06  0.108173  0.369332  0.778794  0.098405
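Because `np.random.randn` draws new values on every run, the numbers shown here will differ from run to run. One way to get repeatable values is a seeded generator. This is a sketch that assumes NumPy's `default_rng` API; the seed `0` and the name `df_seeded` are arbitrary choices, not part of the original:

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed gives the same draws on each run.
rng = np.random.default_rng(0)

dates = pd.date_range("20130101", periods=6)
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)),  # 6 rows x 4 columns of standard-normal draws
    index=dates,
    columns=list("ABCD"),
)

print(df_seeded.shape)
print(df_seeded.index[0])
```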


Creating a DataFrame by passing a dict of objects that can each be converted to a Series-like structure:

In [18]:
df2 = pd.DataFrame({'A' : 1.,
                    'B' : pd.Timestamp('20130102'),
                    'C' : pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D' : np.array([3] * 4, dtype='int32'),
                    'E' : pd.Categorical(['test', 'train', 'test', 'train']),
                    'F' : 'foo'})
df2

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo


The columns of the resulting DataFrame have different dtypes, which you can inspect with dtypes:

In [19]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
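In the dict-based construction, scalar values are repeated for every row, while the array-like values fix the number of rows. The following sketch illustrates this on a smaller frame and then filters columns by dtype with `select_dtypes`; the names `frame` and `numeric` are illustrative:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "A": 1.0,                                             # scalar, repeated on every row
    "C": pd.Series(1, index=range(4), dtype="float32"),   # sets the 4-row index
    "D": np.array([3] * 4, dtype="int32"),
    "F": "foo",                                           # scalar string, repeated too
})

print(len(frame))                 # number of rows, taken from C and D
print(frame.dtypes)

# Keep only the numeric columns.
numeric = frame.select_dtypes(include="number")
print(list(numeric.columns))
```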

If you’re using IPython, tab completion for column names (as well as public attributes) is automatically enabled.

## Viewing Data 

View the top and bottom rows of the frame:

In [23]:
df.head()

                   A         B         C         D
2013-01-01  1.778996 -1.302090 -0.076834 -2.948765
2013-01-02 -1.152555  1.149984 -0.350048 -1.291639
2013-01-03 -1.118857  0.827117 -0.659799  0.495834
2013-01-04 -0.081111  1.103899 -0.805387 -0.653863
2013-01-05 -0.244077  0.308557  0.672543  0.248797


In [24]:
df.tail()

                   A         B         C         D
2013-01-02 -1.152555  1.149984 -0.350048 -1.291639
2013-01-03 -1.118857  0.827117 -0.659799  0.495834
2013-01-04 -0.081111  1.103899 -0.805387 -0.653863
2013-01-05 -0.244077  0.308557  0.672543  0.248797
2013-01-06  0.108173  0.369332  0.778794  0.098405
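Both `head()` and `tail()` return five rows by default, which is why each output above shows five of the six rows. They also accept an explicit row count. A small sketch on a stand-in frame (built with `np.arange` for predictable values, not the random data above):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

first_two = df.head(2)    # first 2 rows
last_three = df.tail(3)   # last 3 rows

print(first_two)
print(last_three)
print(len(df.head()))     # default row count
```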


Display the index, the columns, and the underlying NumPy data:

In [25]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [26]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [28]:
df.values

array([[ 1.77899556, -1.30208994, -0.07683388, -2.94876541],
       [-1.15255494,  1.14998403, -0.35004848, -1.29163854],
       [-1.11885661,  0.82711677, -0.65979928,  0.49583419],
       [-0.08111117,  1.10389865, -0.80538695, -0.65386275],
       [-0.24407703,  0.30855745,  0.67254253,  0.2487969 ],
       [ 0.10817293,  0.36933191,  0.7787936 ,  0.09840524]])
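Newer pandas versions also provide `DataFrame.to_numpy()`, which returns the same kind of array as `values`. The sketch below uses small stand-in frames and shows that a frame with mixed column types falls back to a single `object` array, since a NumPy array has one dtype for all elements:

```python
import numpy as np
import pandas as pd

# All-float frame: the array keeps the float64 dtype.
floats = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
arr = floats.to_numpy()
print(type(arr), arr.shape, arr.dtype)

# Mixed float and string columns: the array dtype becomes object.
mixed = pd.DataFrame({"A": [1.0, 2.0], "B": ["x", "y"]})
mixed_arr = mixed.to_numpy()
print(mixed_arr.dtype)
```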

describe() shows a quick statistical summary of your data:

In [29]:
df.describe()

              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.118239  0.409466 -0.073455 -0.675205
std    1.071356  0.910632  0.669161  1.294072
min   -1.152555 -1.302090 -0.805387 -2.948765
25%   -0.900162  0.323751 -0.582362 -1.132195
50%   -0.162594  0.598224 -0.213441 -0.277729
75%    0.060852  1.034703  0.485198  0.211199
max    1.778996  1.149984  0.778794  0.495834
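Each row of the summary corresponds to an ordinary reduction that can also be called directly. A minimal sketch on a four-element stand-in Series (the values are chosen for easy checking and are not from the data above):

```python
import pandas as pd

col = pd.Series([1.0, 2.0, 3.0, 4.0], name="A")
summary = col.describe()

# Each entry matches the corresponding method call.
print(summary["mean"], col.mean())
print(summary["std"], col.std())            # sample standard deviation (ddof=1)
print(summary["25%"], col.quantile(0.25))   # quartiles use linear interpolation by default
print(summary["50%"], col.median())
```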


Transposing your data:

In [31]:
df.T

   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A    1.778996   -1.152555   -1.118857   -0.081111   -0.244077    0.108173
B   -1.302090    1.149984    0.827117    1.103899    0.308557    0.369332
C   -0.076834   -0.350048   -0.659799   -0.805387    0.672543    0.778794
D   -2.948765   -1.291639    0.495834   -0.653863    0.248797    0.098405
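Transposing swaps the row index with the column labels, so the value at row r, column c moves to row c, column r. A small sketch on a two-by-two stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])
t = df.T

print(list(t.index))      # former column labels
print(list(t.columns))    # former row labels
print(t.loc["B", "x"])    # same value as df.loc["x", "B"]
```

Transposing twice returns a frame equal to the original when all columns share one dtype, as they do here.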


Sorting by an axis (here the column labels, in descending order):

In [33]:
df.sort_index(axis=1, ascending=False)

                   D         C         B         A
2013-01-01 -2.948765 -0.076834 -1.302090  1.778996
2013-01-02 -1.291639 -0.350048  1.149984 -1.152555
2013-01-03  0.495834 -0.659799  0.827117 -1.118857
2013-01-04 -0.653863 -0.805387  1.103899 -0.081111
2013-01-05  0.248797  0.672543  0.308557 -0.244077
2013-01-06  0.098405  0.778794  0.369332  0.108173
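`sort_index` orders by labels, not by the data: `axis=0` (the default) sorts the row labels and `axis=1` sorts the column labels. A sketch on a deliberately unordered stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2], "A": [3, 4]}, index=["y", "x"])

by_rows = df.sort_index()                              # row labels ascending
by_cols = df.sort_index(axis=1)                        # column labels ascending
by_cols_desc = df.sort_index(axis=1, ascending=False)  # column labels descending

print(by_rows)
print(by_cols)
print(by_cols_desc)
```

Each call returns a new frame; `df` itself keeps its original order.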


Sorting by the values of a column:

In [34]:
df.sort_values(by='B')

                   A         B         C         D
2013-01-01  1.778996 -1.302090 -0.076834 -2.948765
2013-01-05 -0.244077  0.308557  0.672543  0.248797
2013-01-06  0.108173  0.369332  0.778794  0.098405
2013-01-03 -1.118857  0.827117 -0.659799  0.495834
2013-01-04 -0.081111  1.103899 -0.805387 -0.653863
2013-01-02 -1.152555  1.149984 -0.350048 -1.291639
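`sort_values` also accepts `ascending=False` and a list of columns, in which case later columns break ties in earlier ones. A sketch on a three-row stand-in frame, with the resulting row order noted in the comments:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 2], "B": [0.5, -1.0, 0.1]})

# Largest B first.
desc = df.sort_values(by="B", ascending=False)
print(list(desc.index))    # [0, 2, 1]

# Sort by A, then by B within equal values of A.
multi = df.sort_values(by=["A", "B"])
print(list(multi.index))   # [1, 2, 0]
```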
