#### This is a short introduction to pandas for new users, adapted from "10 Minutes to Pandas": http://pandas.pydata.org/pandas-docs/stable/10min.html

### 0. Import pandas and related packages

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### 1. Object creation

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [3]:
s = pd.Series([1,3,5,np.nan,6,8])
print(s)

0     1
1     3
2     5
3   NaN
4     6
5     8
dtype: float64


Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns:

In [None]:
dates = pd.date_range('20130101', periods=6)
dates

In [None]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df

Creating a DataFrame by passing a dict of objects that can be converted to a series-like structure:

In [None]:
df2 = pd.DataFrame({ 'A' : 1.,
    'B' : pd.Timestamp('20130102'),
    'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
    'D' : np.array([3] * 4,dtype='int32'),
    'E' : pd.Categorical(["test","train","test","train"]),
    'F' : 'foo' })
df2

The columns of the resulting DataFrame have different dtypes:

In [None]:
df2.dtypes

### 2. Viewing data

Let's create a toy dataset (df reuses the dates index defined in section 1):

In [None]:
index = pd.date_range('1/1/2000', periods=8)
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

To view a small sample of a Series or DataFrame object, use the head() and tail() methods. The default number of elements to display is five, but you may pass a custom number.

In [None]:
long_series = pd.Series(np.random.randn(1000))
long_series.head()

In [None]:
long_series.tail(3)

Display the index, columns, and the underlying numpy data

In [None]:
df.index

In [None]:
df.columns

In [None]:
df.values

describe() shows a quick statistical summary of your data

In [None]:
df.describe()

Transposing your data

In [None]:
df.T

Sorting by an axis

In [None]:
df.sort_index(axis=1, ascending=False)

Sorting by values

In [None]:
df.sort_values(by='B')

### 3. Selection

While standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc and .iloc. (The older .ix indexer has been removed from current pandas releases.)

#### Selecting a single column, which yields a Series, equivalent to df.A

In [None]:
df['A']

Selecting via [], which slices the rows.

In [None]:
df[0:3]

In [None]:
df['20130102':'20130104']

#### Selection by Label

For getting a cross section using a label

In [None]:
df.loc[dates[0]]

Selecting on a multi-axis by label

In [None]:
df.loc[:,['A','B']]

Showing label slicing, both endpoints are included

In [None]:
df.loc['20130102':'20130104',['A','B']]
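The difference between label slicing and positional slicing is easy to miss, so here is a minimal sketch of it. The frame below is a hypothetical stand-in for df, filled with np.arange instead of random numbers so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (deterministic values instead of random ones).
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# Label-based slice: both '20130102' and '20130104' are included (3 rows).
by_label = df.loc['20130102':'20130104', ['A', 'B']]

# Position-based slice: the stop position 4 is excluded (rows 1, 2 and 3).
by_position = df.iloc[1:4, 0:2]

print(len(by_label), len(by_position))
print(by_label.equals(by_position))
```

Both selections return the same three rows, even though the label slice names its last row explicitly while the positional slice stops one past it.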

Reduction in the dimensions of the returned object

In [None]:
df.loc['20130102',['A','B']]

For getting a scalar value

In [None]:
df.loc[dates[0],'A']

For getting fast access to a scalar (equivalent to the prior method)

In [None]:
df.at[dates[0],'A']

#### Selection by Position

Select via the position of the passed integers

In [None]:
df.iloc[2]

By integer slices, acting similarly to NumPy/Python

In [None]:
df.iloc[3:5,0:2]

By lists of integer positions, similar to the NumPy/Python style

In [None]:
df.iloc[[1,2,4],[0,2]]

For slicing rows explicitly

In [None]:
df.iloc[1:3,:]

For slicing columns explicitly

In [None]:
df.iloc[:,1:3]

For getting a value explicitly

In [None]:
df.iloc[1,1]

For getting fast access to a scalar (equivalent to the prior method)

In [None]:
df.iat[1,1]
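As a quick check that the four scalar accessors agree, the sketch below reads the same cell by label and by position. The frame is again a hypothetical deterministic stand-in for df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with predictable values: cell (row i, col j) holds 4*i + j.
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

by_loc = df.loc[dates[1], 'B']   # label-based, general purpose
by_at = df.at[dates[1], 'B']     # label-based, scalar only
by_iloc = df.iloc[1, 1]          # position-based, general purpose
by_iat = df.iat[1, 1]            # position-based, scalar only

print(by_loc, by_at, by_iloc, by_iat)
```

.at and .iat only accept a single row and a single column, which is why they can skip the extra work that .loc and .iloc do to handle slices and lists.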

#### Boolean Indexing

Using a single column’s values to select data.

In [None]:
df[df.A > 0]

A where operation for getting.

In [None]:
df[df > 0]
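To make the "where" behaviour concrete, here is a small sketch on a hypothetical three-row frame. Indexing with a boolean DataFrame keeps the original shape and replaces every cell that fails the condition with NaN, which matches what the DataFrame.where method returns.

```python
import pandas as pd

# Hypothetical frame with a known mix of positive and negative values.
df = pd.DataFrame({'A': [1.0, -2.0, 3.0], 'B': [-4.0, 5.0, -6.0]})

masked = df[df > 0]        # boolean-frame indexing
same = df.where(df > 0)    # explicit where() call

print(masked)
print(masked.equals(same))
```

Contrast this with df[df.A > 0], where the mask is a Series and whole rows are dropped instead of individual cells being blanked out.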

Using the isin() method for filtering:

In [None]:
df2 = df.copy()
df2['E'] = ['one', 'one','two','three','four','three']
df2

In [None]:
df2[df2['E'].isin(['two','four'])]

#### Setting

Setting a new column automatically aligns the data by the indexes

In [None]:
s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
s1

In [None]:
df['F'] = s1
df
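The alignment is worth seeing in isolation. In the sketch below (a hypothetical two-column frame standing in for df), the Series starts one day later than the frame, so the first row has no match and gets NaN, while the Series value for 2013-01-07 has no row to land in and is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical frame indexed 2013-01-01 .. 2013-01-06.
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.zeros((6, 2)), index=dates, columns=list('AB'))

# Series indexed 2013-01-02 .. 2013-01-07 (shifted by one day).
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))

# Assignment aligns on index labels, not on position.
df['F'] = s1
print(df['F'])
```

Because of the NaN, the new column is stored as float even though s1 holds integers.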

Setting values by label

In [None]:
df.at[dates[0],'A'] = 0
df

Setting values by position

In [None]:
df.iat[0,1] = 0
df

Setting by assigning with a numpy array

In [None]:
df.loc[:,'D'] = np.array([5] * len(df))
df

A where operation with setting.

In [None]:
df2 = df.copy()
df2[df2 > 0] = -df2
df2
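A minimal sketch of what this assignment does, using a hypothetical frame with known values: only the cells selected by the mask are overwritten, each with the value at the same position in -df2, so every positive entry is flipped to negative and everything else is left alone.

```python
import pandas as pd

# Hypothetical frame containing positive, negative and zero entries.
df2 = pd.DataFrame({'A': [1.0, -2.0, 0.0], 'B': [-4.0, 5.0, 6.0]})

# Overwrite only the positive cells with their negated values.
df2[df2 > 0] = -df2
print(df2)
```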

#### Now that you are familiar with the basics of pandas in IPython Notebook, finish the rest of the tutorial at: http://pandas.pydata.org/pandas-docs/stable/10min.html

#### Make sure to read the following sections:

- Missing data: drop rows/columns containing missing data; fill missing data; find missing data

- Operations: descriptive statistics, applying functions row-, column-, or cell-wise, string manipulation

- Merge: concat, join, append

- Grouping: split-apply-combine, very powerful!

- Reshaping: stack, unstack, pivot tables

- Getting data in/out: pd.read_csv(), df.to_csv()
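As a small preview of three of the sections listed above (missing data, grouping, and getting data in/out), here is a hedged sketch. The frame and its column names key and value are made up for illustration, and an in-memory io.StringIO buffer stands in for a real CSV file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({'key': ['a', 'b', 'a', 'b'],
                   'value': [1.0, np.nan, 3.0, 4.0]})

# Missing data: find it, drop it, or fill it.
missing = df['value'].isna()
dropped = df.dropna()
filled = df.fillna(0.0)

# Grouping: split by key, then sum each group.
totals = filled.groupby('key')['value'].sum()

# Getting data in/out: write CSV text to a buffer and read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(totals)
print(restored)
```

Passing index=False to to_csv keeps the default integer index out of the file, so the round trip does not add an extra unnamed column.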

#### More complex recipes here: http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook