### Rapid Overview
- build intuition about pandas
- details come later

documentation: http://pandas.pydata.org/pandas-docs/stable/10min.html

In [1]:
import pandas as pd
import numpy as np

##### Basic series; default integer index
documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

In [2]:
my_series = pd.Series([1,3,5,np.nan,6,8])
my_series

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

##### datetime index
documentation: http://pandas.pydata.org/pandas-docs/stable/timeseries.html

In [3]:
my_dates_index = pd.date_range('20160101', periods=6)
my_dates_index

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', freq='D')

##### sample NumPy data


In [4]:
sample_numpy_data = np.arange(24).reshape((6, 4))
sample_numpy_data

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

##### sample data frame, with column headers; uses our my_dates_index
documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

In [7]:
sample_df = pd.DataFrame(sample_numpy_data, index=my_dates_index, columns=list('ABCD'))
sample_df

             A   B   C   D
2016-01-01   0   1   2   3
2016-01-02   4   5   6   7
2016-01-03   8   9  10  11
2016-01-04  12  13  14  15
2016-01-05  16  17  18  19
2016-01-06  20  21  22  23


##### data frame from a Python dictionary

In [8]:
df_from_dictionary = pd.DataFrame({ 
                         'float' : 1.,
                         'time' : pd.Timestamp('20160825'),
                         'series' : pd.Series(1,index=list(range(4)),dtype='float32'),
                         'array' : np.array([3] * 4,dtype='int32'),
                         'categories' : pd.Categorical(["test","train","taxes","tools"]),
                         'dull' : 'boring data' 
                      })
df_from_dictionary

   array  categories         dull  float  series       time
0      3        test  boring data    1.0     1.0 2016-08-25
1      3       train  boring data    1.0     1.0 2016-08-25
2      3       taxes  boring data    1.0     1.0 2016-08-25
3      3       tools  boring data    1.0     1.0 2016-08-25


##### pandas retains data type for each column

In [9]:
df_from_dictionary.dtypes

array                  int32
categories          category
dull                  object
float                float64
series               float32
time          datetime64[ns]
dtype: object

##### head and tail; default is 5 rows

In [10]:
sample_df.head()

             A   B   C   D
2016-01-01   0   1   2   3
2016-01-02   4   5   6   7
2016-01-03   8   9  10  11
2016-01-04  12  13  14  15
2016-01-05  16  17  18  19


In [11]:
sample_df.tail(2)

             A   B   C   D
2016-01-05  16  17  18  19
2016-01-06  20  21  22  23


##### underlying data: values, index and columns

In [12]:
sample_df.values

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [13]:
sample_df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')
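
The heading above also mentions the index, so here is a minimal sketch of inspecting it. It rebuilds `sample_df` so that it runs on its own; the name `row_index` is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame used throughout this overview.
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# .index holds the row labels; here it is the DatetimeIndex we passed in.
row_index = sample_df.index
print(row_index)
```

The result should be the same `DatetimeIndex` shown earlier for `my_dates_index`.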

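##### describe(): a quick statistical summary
- notice: integer data summarized with floating point numbers

In [14]:
sample_df.describe()

               A          B          C          D
count   6.000000   6.000000   6.000000   6.000000
mean   10.000000  11.000000  12.000000  13.000000
std     7.483315   7.483315   7.483315   7.483315
min     0.000000   1.000000   2.000000   3.000000
25%     5.000000   6.000000   7.000000   8.000000
50%    10.000000  11.000000  12.000000  13.000000
75%    15.000000  16.000000  17.000000  18.000000
max    20.000000  21.000000  22.000000  23.000000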

##### control precision of floating point numbers
for options and settings, please see: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.set_option.html
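
As a rough sketch of how display precision can be controlled, the example below sets the `display.precision` option before printing a summary and then restores the default. It rebuilds `sample_df` so that it is self-contained; the names `summary` and `summary_text` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame used throughout this overview.
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Show two digits after the decimal point. This affects display only;
# the stored values keep their full precision.
pd.set_option('display.precision', 2)
summary = sample_df.describe()
summary_text = summary.to_string()
print(summary_text)

# Restore the default so later output is unaffected.
pd.reset_option('display.precision')
```

If the change should apply to a single block only, `pd.option_context('display.precision', 2)` used in a `with` statement is an alternative that restores the previous setting automatically.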

##### transpose rows and columns
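
A minimal sketch of transposing, assuming the same `sample_df` as above (rebuilt here so the block runs on its own). The name `transposed` is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame used throughout this overview.
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# .T swaps rows and columns: the column labels A..D become the index,
# and the dates become the column labels.
transposed = sample_df.T
print(transposed)
```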

##### sort by axis
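
One way to sort by axis labels is `sort_index`, sketched below on a rebuilt copy of `sample_df`. The names `columns_descending` and `rows_descending` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame used throughout this overview.
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# axis=1 sorts by column labels; descending order gives D, C, B, A.
columns_descending = sample_df.sort_index(axis=1, ascending=False)
print(columns_descending)

# axis=0 sorts by row labels; descending order puts the latest date first.
rows_descending = sample_df.sort_index(axis=0, ascending=False)
print(rows_descending)
```

Both calls return a new data frame and leave `sample_df` unchanged.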

##### sort by data within a column (our data was already sorted)
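
A short sketch using `sort_values`, again on a rebuilt copy of `sample_df`. Because the data is already in ascending order, the example sorts column `B` in descending order so the effect is visible. The name `by_b_descending` is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame used throughout this overview.
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Sort rows by the values in column B, largest first.
by_b_descending = sample_df.sort_values(by='B', ascending=False)
print(by_b_descending)
```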