# 10 Minutes to pandas
Source: [http://pandas.pydata.org/pandas-docs/version/0.15.2/10min.html](http://pandas.pydata.org/pandas-docs/version/0.15.2/10min.html)

This is a short introduction to pandas, geared mainly toward new users. More complex recipes can be found in the Cookbook.

Customarily, we import as follows:

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Object Creation

See the [Data Structure Intro section](http://pandas.pydata.org/pandas-docs/version/0.15.2/dsintro.html#dsintro)

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [8]:
s = pd.Series([1,3,5,np.nan,6,8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
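The `float64` dtype in the output above comes from the `np.nan` entry, which is a floating-point value. The following is a minimal sketch of that behavior and of supplying an explicit index in place of the default one; the variable names and the labels `a`, `b`, `c` are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# np.nan is a float, so a list containing it is stored as float64.
with_nan = pd.Series([1, 3, 5, np.nan, 6, 8])

# Without missing values, a list of Python ints keeps an integer dtype.
without_nan = pd.Series([1, 3, 5, 6, 8])

# An explicit index replaces the default 0..n-1 integer index.
labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])

print(with_nan.dtype)
print(without_nan.dtype)
print(labeled.index.tolist())
```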

Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns.

In [9]:
dates = pd.date_range('20130101', periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [10]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-01 | 0.732062 | -0.872924 | -0.029987 | -0.428751 |
| 2013-01-02 | 0.223329 | 1.104988 | -1.143353 | 1.367178 |
| 2013-01-03 | 0.430158 | 0.754639 | -0.506686 | -1.618767 |
| 2013-01-04 | 0.489252 | 2.327406 | 1.459776 | 1.582406 |
| 2013-01-05 | -0.97274 | -0.174383 | 1.023746 | 1.307901 |
| 2013-01-06 | 0.642879 | -1.502393 | 1.080234 | -1.233327 |
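Because `np.random.randn` draws new values on every run, the numbers shown above will differ from what you see locally. One possible way to get repeatable values is to use a seeded NumPy generator, as sketched below; the seed `0` and the names `df_seeded` and `df_again` are arbitrary choices made for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# A seeded generator yields the same "random" numbers on every run.
rng = np.random.default_rng(seed=0)  # seed value chosen arbitrarily
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)), index=dates, columns=list("ABCD")
)

# Re-creating the generator with the same seed reproduces the frame.
df_again = pd.DataFrame(
    np.random.default_rng(seed=0).standard_normal((6, 4)),
    index=dates,
    columns=list("ABCD"),
)

print(df_seeded.equals(df_again))
```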


Creating a DataFrame by passing a dict of objects that can be converted to series-like.

In [22]:
df2 = pd.DataFrame({
        'A': 1.,
        'B': pd.Timestamp('20130102'),
        'C': pd.Series(1, index=list(range(4)), dtype='float32'),
        'D': np.array([3] * 4, dtype='int32'),
        'E': pd.Categorical(["test", "train", "test", "train"]),
        'F': 'foo'
    })
df2

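| | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |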


The columns of the resulting DataFrame have specific [dtypes](http://pandas.pydata.org/pandas-docs/version/0.15.2/basics.html#basics-dtypes):

In [23]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

In [24]:
df2.A

0    1.0
1    1.0
2    1.0
3    1.0
Name: A, dtype: float64
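Per-column dtypes can also be used to pick columns or be changed explicitly. The sketch below rebuilds the same frame and uses `select_dtypes` and `astype`; the variable names `numeric` and `d_as_float` are assumptions introduced for this example.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,
    "B": pd.Timestamp("20130102"),
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})

# Keep only the numeric columns (floats and ints here).
numeric = df2.select_dtypes(include="number")
print(numeric.columns.tolist())

# astype returns a converted copy; the original column is unchanged.
d_as_float = df2["D"].astype("float64")
print(d_as_float.dtype, df2["D"].dtype)
```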

## Viewing Data

See the [Basics section](http://pandas.pydata.org/pandas-docs/version/0.15.2/basics.html#basics)

View the top and bottom rows of the frame:

In [29]:
df.head(3)

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-01 | 0.732062 | -0.872924 | -0.029987 | -0.428751 |
| 2013-01-02 | 0.223329 | 1.104988 | -1.143353 | 1.367178 |
| 2013-01-03 | 0.430158 | 0.754639 | -0.506686 | -1.618767 |


In [27]:
df.tail(3)

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-04 | 0.489252 | 2.327406 | 1.459776 | 1.582406 |
| 2013-01-05 | -0.97274 | -0.174383 | 1.023746 | 1.307901 |
| 2013-01-06 | 0.642879 | -1.502393 | 1.080234 | -1.233327 |
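Both `head` and `tail` take the number of rows as an optional argument and fall back to five rows when it is omitted. A small sketch on a hypothetical ten-row frame (the column name `n` is an assumption for this example):

```python
import pandas as pd

# A small frame with ten rows, values 0 through 9.
frame = pd.DataFrame({"n": range(10)})

first_rows = frame.head()    # no argument: the first 5 rows
last_rows = frame.tail(2)    # explicit argument: the last 2 rows

print(len(first_rows))
print(last_rows["n"].tolist())
```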


Display the index, columns, and the underlying NumPy data:

In [30]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [31]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [32]:
df.values

array([[ 0.73206226, -0.87292427, -0.02998722, -0.42875054],
       [ 0.22332874,  1.10498803, -1.14335343,  1.36717802],
       [ 0.43015796,  0.75463924, -0.50668561, -1.61876721],
       [ 0.48925244,  2.32740612,  1.45977631,  1.58240583],
       [-0.97274017, -0.17438326,  1.0237459 ,  1.30790104],
       [ 0.64287938, -1.5023928 ,  1.08023424, -1.23332652]])
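Later pandas releases also provide `DataFrame.to_numpy()` for the same purpose. The sketch below, which uses two small hypothetical frames, shows that an all-float frame converts to a `float64` array, while a frame mixing floats and strings falls back to a generic `object` array.

```python
import numpy as np
import pandas as pd

# All columns share one dtype, so the array keeps that dtype.
all_float = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
float_array = all_float.to_numpy()

# Mixed column types have no common numeric dtype, so the result is object.
mixed = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"]})
mixed_array = mixed.to_numpy()

print(float_array.dtype, float_array.shape)
print(mixed_array.dtype, mixed_array.shape)
```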

`describe` shows a quick statistical summary of your data:

In [33]:
df.describe()

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| count | 6.0 | 6.0 | 6.0 | 6.0 |
| mean | 0.25749 | 0.272889 | 0.313955 | 0.162773 |
| std | 0.627998 | 1.40072 | 1.031444 | 1.431792 |
| min | -0.97274 | -1.502393 | -1.143353 | -1.618767 |
| 25% | 0.275036 | -0.698289 | -0.387511 | -1.032183 |
| 50% | 0.459705 | 0.290128 | 0.496879 | 0.439575 |
| 75% | 0.604473 | 1.017401 | 1.066112 | 1.352359 |
| max | 0.732062 | 2.327406 | 1.459776 | 1.582406 |
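The percentile rows in the summary can be adjusted through the `percentiles` argument of `describe`. This sketch uses a hypothetical single-column frame and requests the 10th and 90th percentiles; it also checks the `mean` row against a direct call to `mean()`.

```python
import pandas as pd

values = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Request the 10th and 90th percentiles in the summary.
summary = values.describe(percentiles=[0.1, 0.9])
print(summary)

# Each row of the summary matches the corresponding direct computation.
print(summary.loc["mean", "x"] == values["x"].mean())
```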


Transposing your data

In [34]:
df.T

| | 2013-01-01 00:00:00 | 2013-01-02 00:00:00 | 2013-01-03 00:00:00 | 2013-01-04 00:00:00 | 2013-01-05 00:00:00 | 2013-01-06 00:00:00 |
| --- | --- | --- | --- | --- | --- | --- |
| A | 0.732062 | 0.223329 | 0.430158 | 0.489252 | -0.97274 | 0.642879 |
| B | -0.872924 | 1.104988 | 0.754639 | 2.327406 | -0.174383 | -1.502393 |
| C | -0.029987 | -1.143353 | -0.506686 | 1.459776 | 1.023746 | 1.080234 |
| D | -0.428751 | 1.367178 | -1.618767 | 1.582406 | 1.307901 | -1.233327 |


Sorting by an axis

In [45]:
df.sort_index(axis=1, ascending=False)

| | D | C | B | A |
| --- | --- | --- | --- | --- |
| 2013-01-01 | -0.428751 | -0.029987 | -0.872924 | 0.732062 |
| 2013-01-02 | 1.367178 | -1.143353 | 1.104988 | 0.223329 |
| 2013-01-03 | -1.618767 | -0.506686 | 0.754639 | 0.430158 |
| 2013-01-04 | 1.582406 | 1.459776 | 2.327406 | 0.489252 |
| 2013-01-05 | 1.307901 | 1.023746 | -0.174383 | -0.97274 |
| 2013-01-06 | -1.233327 | 1.080234 | -1.502393 | 0.642879 |
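The `axis` argument decides which labels are sorted: `axis=1` reorders the columns by name, as above, while `axis=0` (the default) reorders the rows by index label. A minimal sketch on a hypothetical three-row frame, also showing that `sort_index` returns a new object rather than modifying the original:

```python
import pandas as pd

dates = pd.date_range("20130101", periods=3)
frame = pd.DataFrame({"B": [1, 2, 3], "A": [4, 5, 6]}, index=dates)

# axis=1 sorts the column labels: B, A becomes A, B.
by_columns = frame.sort_index(axis=1)

# axis=0 sorts the row labels; descending puts the latest date first.
by_rows_desc = frame.sort_index(axis=0, ascending=False)

print(by_columns.columns.tolist())
print(by_rows_desc.index[0])
print(frame.columns.tolist())  # the original frame is left untouched
```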


Sorting by values

In [50]:
df.sort_values(by='B')

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-06 | 0.642879 | -1.502393 | 1.080234 | -1.233327 |
| 2013-01-01 | 0.732062 | -0.872924 | -0.029987 | -0.428751 |
| 2013-01-05 | -0.97274 | -0.174383 | 1.023746 | 1.307901 |
| 2013-01-03 | 0.430158 | 0.754639 | -0.506686 | -1.618767 |
| 2013-01-02 | 0.223329 | 1.104988 | -1.143353 | 1.367178 |
| 2013-01-04 | 0.489252 | 2.327406 | 1.459776 | 1.582406 |
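`sort_values` also accepts a list of columns, with a matching list of sort directions. The following sketch uses a hypothetical frame with `group` and `score` columns, sorting groups in ascending order and scores in descending order within each group.

```python
import pandas as pd

scores = pd.DataFrame({
    "group": ["b", "a", "b", "a"],
    "score": [1, 3, 2, 4],
})

# Sort by group ascending, then by score descending within each group.
ordered = scores.sort_values(by=["group", "score"], ascending=[True, False])

print(ordered)
```

The original row labels travel with their rows, so the index of `ordered` records where each row came from.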


## Selection

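**Note:** While standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, .iloc and .ix. Note that .ix belongs to the 0.15.2 API documented here; it was deprecated in later pandas releases and removed in pandas 1.0.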

See the indexing documentation [Indexing and Selecting Data](http://pandas.pydata.org/pandas-docs/version/0.15.2/indexing.html#indexing) and [MultiIndex / Advanced Indexing](http://pandas.pydata.org/pandas-docs/version/0.15.2/advanced.html#advanced)
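As a rough orientation to the access methods named in the note, `.loc` and `.at` work with labels, while `.iloc` and `.iat` work with integer positions; `.at` and `.iat` return a single scalar. The sketch below uses a small hypothetical frame with known values so the results are easy to follow.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=3)
# Values are [[0, 1], [2, 3], [4, 5]] so each cell is easy to identify.
frame = pd.DataFrame(np.arange(6).reshape(3, 2), index=dates, columns=["A", "B"])

row_by_label = frame.loc[dates[0]]    # label-based row selection
row_by_position = frame.iloc[0]       # position-based row selection

cell_by_label = frame.at[dates[1], "B"]   # scalar lookup by label
cell_by_position = frame.iat[1, 1]        # scalar lookup by position

print(row_by_label.tolist(), row_by_position.tolist())
print(cell_by_label, cell_by_position)
```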

### Getting

Selecting a single column, which yields a Series, equivalent to df.A

In [51]:
df['A']

2013-01-01    0.732062
2013-01-02    0.223329
2013-01-03    0.430158
2013-01-04    0.489252
2013-01-05   -0.972740
2013-01-06    0.642879
Freq: D, Name: A, dtype: float64
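Attribute access such as `df.A` is a convenience that only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. A minimal sketch, assuming a hypothetical frame with one column named `count`, which collides with the `DataFrame.count` method:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "count": [3, 4]})

# For an ordinary column name, both spellings return the same Series.
same = frame["A"].equals(frame.A)

# "count" is also a DataFrame method, so attribute access returns the method.
attribute_is_method = callable(frame.count)

# Bracket access always returns the column.
count_column = frame["count"]

print(same, attribute_is_method, count_column.tolist())
```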

Selecting via [], which slices the rows:

In [52]:
df[0:3]

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-01 | 0.732062 | -0.872924 | -0.029987 | -0.428751 |
| 2013-01-02 | 0.223329 | 1.104988 | -1.143353 | 1.367178 |
| 2013-01-03 | 0.430158 | 0.754639 | -0.506686 | -1.618767 |


In [54]:
df['20130103':'20130104']

| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 2013-01-03 | 0.430158 | 0.754639 | -0.506686 | -1.618767 |
| 2013-01-04 | 0.489252 | 2.327406 | 1.459776 | 1.582406 |
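The two slices above follow different endpoint rules: an integer slice excludes its stop position, as in ordinary Python, while a label slice on the index includes both endpoints. The sketch below makes this visible on a hypothetical frame whose single column `A` simply numbers the rows.

```python
import pandas as pd

dates = pd.date_range("20130101", periods=6)
frame = pd.DataFrame({"A": range(6)}, index=dates)

# Positional slice: rows 0, 1 and 2; the stop position 3 is excluded.
by_position = frame[0:3]

# Label slice: both 2013-01-03 and 2013-01-04 are included.
by_label = frame["20130103":"20130104"]

print(by_position["A"].tolist())
print(by_label["A"].tolist())
```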
