http://pandas.pydata.org/pandas-docs/stable/10min.html

# 10 Minutes to pandas

This is a short introduction to pandas, geared mainly toward new users. You can see more complex recipes in the [Cookbook](http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook).

Customarily, we import as follows:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
%matplotlib inline

## Object Creation

See the [Data Structure Intro section](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dsintro) 

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [None]:
s = pd.Series([1,3,5,np.nan,6,8])

In [None]:
# TODO: Print s here:


Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns:

In [None]:
dates = pd.date_range('20130101', periods=6)

In [None]:
# TODO: Print dates here: 


In [None]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

In [None]:
# TODO: Print df here:


Creating a DataFrame by passing a dict of objects that can be converted to a series-like structure:

In [None]:
df2 = pd.DataFrame({'A':1.,
                   'B':pd.Timestamp('20130102'),
                   'C':pd.Series(1,index=list(range(4)),dtype='float32'),
                   'D':np.array([3]*4,dtype='int32'),
                   'E':pd.Categorical(["test","train","test","train"]),
                   'F':'foo'})

In [None]:
df2

The columns of the resulting DataFrame have different [dtypes](http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dtypes):

In [None]:
# TODO: Show the types of df2's columns using dtypes:


## Viewing Data

See the [Basics section](http://pandas.pydata.org/pandas-docs/stable/basics.html#basics) 

See the top & bottom rows of the frame

In [None]:
# TODO: Print the head of the data:


In [None]:
# TODO: Print the tail of the data:
df.tail(3)

Display the index, columns, and the underlying numpy data

In [None]:
# TODO: Print the index of df


In [None]:
# TODO: Print the columns of df


In [None]:
# TODO: Print the underlying numpy data of df


describe() shows a quick statistical summary of your data

In [None]:
# TODO: Describe df


Transposing your data

In [None]:
# TODO: Transpose and print df, do not override the object


Sorting by value

In [None]:
# TODO: Sort the rows of df by the values of column B, descending


## Selection

**Note:** While standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc and .iloc. (The older .ix indexer is deprecated and was removed in pandas 1.0.)

See the indexing documentation [Indexing and Selecting Data](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing) and [MultiIndex / Advanced Indexing](http://pandas.pydata.org/pandas-docs/stable/advanced.html#advanced)
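As a rough illustration of how the four accessors relate, the sketch below reads the same cell four ways. The frame `demo` and its labels are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical demo frame; names and values are for illustration only.
demo = pd.DataFrame({'A': [10, 20, 30], 'B': [1.5, 2.5, 3.5]},
                    index=['x', 'y', 'z'])

by_label = demo.loc['y', 'A']        # label-based selection
by_position = demo.iloc[1, 0]        # position-based selection
scalar_label = demo.at['y', 'A']     # fast label-based scalar access
scalar_position = demo.iat[1, 0]     # fast position-based scalar access

# All four refer to the same cell.
print(by_label, by_position, scalar_label, scalar_position)
```

.loc and .iloc also accept slices and lists, while .at and .iat are restricted to a single scalar.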

### Getting

Selecting a single column, which yields a Series, equivalent to df.A

In [None]:
df['A']

Selecting via [], which slices the rows.

In [None]:
df[0:3]

In [None]:
df['20130102':'20130104']

### Selection by Label

See more in [Selection by Label](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-label)

For getting a cross section using a label

In [None]:
df.loc[dates[0]]

Selecting on a multi-axis by label

In [None]:
df.loc[:,['A','B']]

Showing label slicing, both endpoints are included

In [None]:
df.loc['20130102':'20130104',['A','B']]
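To make the endpoint behaviour concrete, here is a small sketch that contrasts a label slice with a positional slice. The names `demo_dates` and `demo` are hypothetical and kept separate from the notebook's own `dates` and `df`.

```python
import pandas as pd

# Hypothetical demo data with a daily DatetimeIndex.
demo_dates = pd.date_range('20130101', periods=6)
demo = pd.DataFrame({'A': list(range(6))}, index=demo_dates)

# Label slicing with .loc includes both endpoints.
label_slice = demo.loc['20130102':'20130104']

# Positional slicing with .iloc excludes the stop position, as in Python lists.
position_slice = demo.iloc[1:3]

print(len(label_slice))     # 3 rows
print(len(position_slice))  # 2 rows
```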

Reduction in the dimensions of the returned object

In [None]:
df.loc['20130102',['A','B']]

For getting a scalar value

In [None]:
df.loc[dates[0],'A']

### Selection by Position

See more in [Selection by Position](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer)

Select via the position of the passed integers

In [None]:
df.iloc[3]

By integer slices, acting similar to numpy/python

In [None]:
df.iloc[3:5,0:2]

By lists of integer position locations, similar to the numpy/python style

In [None]:
df.iloc[[1,2,4],[0,2]]

For slicing rows explicitly

In [None]:
df.iloc[1:3,:]

For slicing columns explicitly

In [None]:
df.iloc[:,1:3]

For getting a value explicitly

In [None]:
df.iloc[1,1]

For getting fast access to a scalar (equivalent to the prior method)

In [None]:
df.iat[1,1]

## Boolean Indexing

Using a single column’s values to select data.

In [None]:
df[df.A > 0]

In [None]:
# TODO: print df where the values of B are greater than 0.1


Selecting values from a DataFrame where a boolean condition is met.

In [None]:
df[df > 0]
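Indexing with a boolean DataFrame keeps the original shape and fills the positions that fail the condition with NaN. A minimal sketch on a hypothetical two-by-two frame, alongside the explicit `DataFrame.where` spelling:

```python
import pandas as pd

# Hypothetical demo frame with known signs.
demo = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})

masked = demo[demo > 0]             # same shape; NaN where the condition is False
equivalent = demo.where(demo > 0)   # the same operation written explicitly

print(masked)
```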

Using the isin() method for filtering:

In [None]:
df2 = df.copy()

In [None]:
df2['E'] = ['one','one', 'two','three','four','three']

In [None]:
df2

In [None]:
# TODO: print df2 where df2[E]'s values are either 'two' or 'four'
df2[df2['E'].isin(['two','four'])]

### Stats

Operations in general exclude missing data.
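As a small sketch of what excluding missing data means in practice, the hypothetical Series below is averaged with the default behaviour and then with `skipna=False`:

```python
import numpy as np
import pandas as pd

# Hypothetical values containing one missing entry.
values = pd.Series([1.0, np.nan, 3.0])

mean_skipping = values.mean()                  # NaN is ignored: (1 + 3) / 2
mean_propagating = values.mean(skipna=False)   # NaN propagates to the result

print(mean_skipping, mean_propagating)
```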

Performing a descriptive statistic

In [None]:
df.mean()

Same operation on the other axis

In [None]:
df.mean(1)

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.

In [None]:
s = pd.Series([1,3,5,np.nan,6,8], index=dates).shift(2)

In [None]:
s

In [None]:
df.sub(s, axis='index')
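Because the frame above holds random numbers, the alignment is easier to follow on fixed values. The sketch below uses a hypothetical frame `demo` and Series `offsets`. The Series is matched to the rows by index label and subtracted from every column, and rows without a usable value come out as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data with fixed values.
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]},
                    index=['x', 'y', 'z'])
offsets = pd.Series([1.0, np.nan], index=['x', 'y'])  # no entry for 'z'

# Align on the row index, then broadcast the Series across the columns.
result = demo.sub(offsets, axis='index')

print(result)
```

Row 'x' has 1.0 subtracted from both columns. Row 'y' is NaN because its offset is NaN, and row 'z' is NaN because `offsets` has no matching label.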

### Apply

Applying functions to the data

In [None]:
df.apply(np.cumsum)

In [None]:
df.apply(lambda x: x.max() - x.min())

In [None]:
# TODO: apply np.square only on column A of the dataframe: (use axis=1, lambda row: row['A'])


### Histogramming

See more at [Histogramming and Discretization](http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-discretization)

In [None]:
s = pd.Series(np.random.randint(0, 7, size=10))

In [None]:
s

In [None]:
s.value_counts()

### String Methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses [regular expressions](https://docs.python.org/2/library/re.html) by default (and in some cases always uses them). See more at [Vectorized String Methods](http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods).

In [None]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [None]:
s.str.lower()
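The regular-expression default matters when a pattern contains special characters. As a minimal sketch on a hypothetical Series `words`, `str.contains` treats `.` as "any character" unless `regex=False` is passed:

```python
import pandas as pd

# Hypothetical demo strings.
words = pd.Series(['a.b', 'axb', 'abc'])

# Default: the pattern is a regular expression, so '.' matches any character.
as_regex = words.str.contains('a.b')

# With regex=False the pattern is matched as a literal substring.
as_literal = words.str.contains('a.b', regex=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```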

## Grouping

By “group by” we are referring to a process involving one or more of the following steps:

* **Splitting** the data into groups based on some criteria
* **Applying** a function to each group independently
* **Combining** the results into a data structure

See the [Grouping section](http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby)
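The three steps can be sketched by hand and compared with the built-in aggregation. The frame `demo` and its column names are hypothetical. The manual version is only meant to show the steps and is not how pandas implements groupby internally.

```python
import pandas as pd

# Hypothetical demo data.
demo = pd.DataFrame({'key': ['foo', 'bar', 'foo', 'bar'],
                     'value': [1, 2, 3, 4]})

# Split: iterating over a groupby yields (group name, sub-frame) pairs.
# Apply: sum the 'value' column within each sub-frame.
# Combine: collect the per-group results into a Series keyed by group name.
manual = pd.Series({name: group['value'].sum()
                    for name, group in demo.groupby('key')})

# The same result using the built-in aggregation.
built_in = demo.groupby('key')['value'].sum()

print(manual)
print(built_in)
```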

In [None]:
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

In [None]:
df

Grouping and then applying the sum function to the resulting groups.

In [None]:
df.groupby('A').sum()

In [None]:
df.groupby(['A','B']).sum()