# 10 Minutes to pandas

This is a short introduction to pandas, geared mainly toward new users. More complex recipes are in the Cookbook.

Customarily, we import as follows:

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Object Creation

See the Data Structure Intro section.

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [5]:
s = pd.Series([1,3,5,np.nan,6,8])
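The presence of `np.nan` affects the resulting dtype, which a short sketch can make visible. The names `s_int` and `s_labeled` are illustrative only and are not part of the original walkthrough:

```python
import numpy as np
import pandas as pd

# np.nan is a float, so the integers in the list are upcast to float64.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.dtype)
print(list(s.index))    # default integer index: 0 through 5

# Without missing values the same constructor keeps an integer dtype.
s_int = pd.Series([1, 3, 5])
print(s_int.dtype)

# A custom index can be supplied instead of the default integer one.
s_labeled = pd.Series([1, 3, 5], index=['a', 'b', 'c'])
print(s_labeled['b'])
```

The default index is positional, while an explicit index lets values be looked up by label.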

In [9]:
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns:

In [11]:
dates = pd.date_range('20170701', periods=6)

In [12]:
dates

DatetimeIndex(['2017-07-01', '2017-07-02', '2017-07-03', '2017-07-04',
               '2017-07-05', '2017-07-06'],
              dtype='datetime64[ns]', freq='D')
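As a rough illustration of how `pd.date_range` builds the index above, the sketch below constructs the same six days in two ways and then varies the frequency. The names `same_dates` and `every_other_day` are hypothetical and exist only for this example:

```python
import pandas as pd

# Six consecutive days starting on 2017-07-01 (daily frequency is the default).
dates = pd.date_range('20170701', periods=6)
print(dates[0], dates[-1])

# The same range expressed with explicit start and end dates.
same_dates = pd.date_range(start='2017-07-01', end='2017-07-06')
print(dates.equals(same_dates))

# A different frequency: three dates spaced two days apart.
every_other_day = pd.date_range('20170701', periods=3, freq='2D')
print(list(every_other_day.strftime('%Y-%m-%d')))
```

Either `periods` or `end` fixes the length of the range; `freq` controls the spacing between entries.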

In [15]:
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list('ABCD'))

In [16]:
df

                   A         B         C         D
2017-07-01  0.098242  0.580357  0.519971  0.631100
2017-07-02  0.123892  0.155063  0.374889  0.800185
2017-07-03  0.575473  0.612724  0.300137  0.968750
2017-07-04  0.255296  0.787767  0.928930  0.525105
2017-07-05  0.829087  0.489498  0.288770  0.978314
2017-07-06  0.441466  0.647599  0.843196  0.554706
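Because the array comes from a random number generator, the values in this table differ on every run. The following is a minimal sketch of one way to make such a table reproducible, assuming NumPy 1.17 or later for `np.random.default_rng`. The seed `0` is an arbitrary choice for illustration and is not part of the original example:

```python
import numpy as np
import pandas as pd

# Seeding the generator makes the "random" table identical between runs.
# The seed value 0 is an assumption made for this illustration.
rng = np.random.default_rng(0)

dates = pd.date_range('20170701', periods=6)

# rng.random draws uniform samples from [0, 1), as np.random.rand does.
df = pd.DataFrame(rng.random((6, 4)), index=dates, columns=list('ABCD'))

print(df.shape)
print(list(df.columns))
print(df.index.equals(dates))
```

The array shape must match the index length (rows) and the number of column labels (columns), otherwise the constructor raises an error.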


Creating a DataFrame by passing a dict of objects that can be converted to a series-like structure:

In [24]:
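df2 = pd.DataFrame({
    'A': 1,                                                    # scalar, repeated on every row
    'B': pd.Timestamp('20130102'),                             # scalar timestamp
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),  # sets the row index 0..3
    'D': np.array([3] * 4, dtype='int32'),                     # array of length 4
    'E': pd.Categorical(["test", "train", "test", "train"]),   # categorical column
    'F': 'foo',                                                # scalar string
})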

In [26]:
df2

   A          B    C  D      E    F
0  1 2013-01-02  1.0  3   test  foo
1  1 2013-01-02  1.0  3  train  foo
2  1 2013-01-02  1.0  3   test  foo
3  1 2013-01-02  1.0  3  train  foo
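The scalar entries in the dict are repeated to match the length of the array-like entries. The sketch below illustrates that behavior, and the case where every value is a scalar. The names `small`, `raised`, and `scalars_only` are hypothetical:

```python
import numpy as np
import pandas as pd

# The scalars for 'A' and 'F' are repeated to match the length of 'D'.
small = pd.DataFrame({
    'A': 1,
    'D': np.array([3] * 4, dtype='int32'),
    'F': 'foo',
})
print(small.shape)

# With only scalars there is no length to broadcast to, so pandas raises
# a ValueError unless an index is passed explicitly.
try:
    pd.DataFrame({'A': 1, 'F': 'foo'})
    raised = False
except ValueError:
    raised = True
print(raised)

# Passing an index tells pandas how many rows to create.
scalars_only = pd.DataFrame({'A': 1, 'F': 'foo'}, index=range(2))
print(scalars_only.shape)
```

At least one array-like value, or an explicit `index`, is needed so that pandas knows how many rows to produce.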


In [29]:
pd.DataFrame({
    'C': pd.Series(1, index=list(range(4)), dtype='float32')
})

     C
0  1.0
1  1.0
2  1.0
3  1.0


In [30]:
df2.dtypes

A             int64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
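Per-column dtypes can also be used to select or convert columns. This is a small sketch on a reduced version of `df2` (columns `B` and `F` are left out for brevity). The names `numeric` and `d_as_float` are illustrative only:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': 1,
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(['test', 'train', 'test', 'train']),
})

# Keep only the numeric columns; the categorical column 'E' is excluded.
numeric = df2.select_dtypes(include='number')
print(list(numeric.columns))

# A categorical column stores its distinct labels once.
print(list(df2['E'].cat.categories))

# astype returns a converted copy; the column in df2 keeps its dtype.
d_as_float = df2['D'].astype('float64')
print(d_as_float.dtype, df2['D'].dtype)
```

Each column carries its own dtype, which is why `df2.dtypes` reports a mix of types within a single DataFrame.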

If you’re using IPython, tab completion for column names (as well as public attributes) is automatically enabled. Here’s a subset of the attributes that will be completed: