# Importing Pandas

In [1]:
import pandas as pd
pd.__version__

'0.21.0'

# Pandas Series class
A Pandas Series object is like a generalized array whose index is explicit: the labels don't have to be contiguous or increasing, and don't even have to be integers. It also behaves like a specialized dictionary whose keys are kept in order.

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [3]:
data.index # a pd.Index object

RangeIndex(start=0, stop=4, step=1)

In [4]:
data.values # a NumPy array

array([ 0.25,  0.5 ,  0.75,  1.  ])

In [5]:
data[1] # similar to numpy

0.5

In [6]:
data[1:3] # similar to numpy

1    0.50
2    0.75
dtype: float64

### Non-contiguous indexes
The difference between a Pandas Series and a NumPy array is that the Series has an **explicit** index. Index labels need not be contiguous or even monotonic.

In [7]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2,5,3,7])
data

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [8]:
data[5]

0.5

In [9]:
# data[1]  # raises KeyError: 1 is not one of the index labels
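
The lookup fails because `[]` on a Series with an integer index looks up labels, not positions. A minimal sketch of the difference, using the explicit `.loc` (label) and `.iloc` (position) accessors; the variable names are illustrative only:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

# Label-based lookup: 1 is not one of the labels, so this raises KeyError.
try:
    data.loc[1]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

by_label = data.loc[2]      # the value whose label is 2 (the first element)
by_position = data.iloc[2]  # the third element, whatever its label is
```

Here the same number, 2, selects different elements depending on whether it is read as a label or as a position, which is why the explicit accessors are often preferred when the index holds integers.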

### Non-integer indexes

Any hashable data type can be used as an index.

In [10]:
data = pd.Series([0.25, 0.5, 0.75, 1.], index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [11]:
data['b']

0.5
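
Labels are not limited to strings. A small sketch with float and timestamp labels, using `.loc` for label lookup; the values are made up purely for illustration:

```python
import pandas as pd

# Float labels
by_float = pd.Series(['low', 'high'], index=[0.5, 1.5])

# Timestamp labels
days = pd.to_datetime(['2020-01-01', '2020-01-02'])
by_date = pd.Series([10, 20], index=days)

float_lookup = by_float.loc[1.5]
date_lookup = by_date.loc[pd.Timestamp('2020-01-02')]
```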

## Series as a dictionary
We can construct a Series object directly from a Python dictionary.

In [12]:
pop_dict = {'China': 13888170000,
            'India': 1325460000,
            'USA': 326309000,
            'Indonesia': 261890900}
pop = pd.Series(pop_dict)
pop

China        13888170000
India         1325460000
Indonesia      261890900
USA            326309000
dtype: int64

In [13]:
pop['China']

13888170000

However, unlike a dictionary, a Series supports slicing:

In [14]:
pop['China':'India'] # the last index is INCLUDED

China    13888170000
India     1325460000
dtype: int64
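
Label slices include their end point, while positional slices follow the usual Python convention and exclude it. A short sketch contrasting the two on a made-up Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

label_slice = s.loc['a':'c']    # by label: 'c' is included
position_slice = s.iloc[0:2]    # by position: element 2 is excluded
```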

## Constructing Series

In [15]:
# From a list
pd.Series([1,2,3,4])

0    1
1    2
2    3
3    4
dtype: int64

In [16]:
# From a scalar, repeated to fill the specified index
pd.Series(10, index=[100, 200, 300])

100    10
200    10
300    10
dtype: int64

In [17]:
# From a dictionary; in the pandas version used here (0.21.0) the index is the sorted dictionary keys (newer versions keep insertion order instead)
pd.Series({2:'a', 1:'b', 3:'c'})

1    b
2    a
3    c
dtype: object

In [18]:
# Index can be explicitly set with a dictionary
pd.Series({2:'a', 1:'b', 3:'c'}, index=[3,2]) # Note: indexes in order as specified, not sorted

3    c
2    a
dtype: object
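
When an explicit index is passed along with a dictionary, it also works as a lookup: labels missing from the dictionary are kept and filled with `NaN`. A minimal sketch, where the extra label `4` is an assumption added for illustration:

```python
import pandas as pd

letters = {2: 'a', 1: 'b', 3: 'c'}

# 3 and 2 are taken from the dictionary; 4 has no entry and becomes NaN.
# Key 1 is dropped because it is not in the requested index.
subset = pd.Series(letters, index=[3, 2, 4])
```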

# DataFrames

A Pandas DataFrame is like a generalized two-dimensional matrix, where the row indices are flexible like those of a Series and the columns have names. It can be thought of as a collection of Series objects that all share the same index, one Series per named column.

In [19]:
area_dict = {'China': 9596960,
             'India': 3287590,
             'Indonesia': 1905000,
             'USA': 9631418}
area = pd.Series(area_dict)
area

China        9596960
India        3287590
Indonesia    1905000
USA          9631418
dtype: int64

In [20]:
countries = pd.DataFrame({'pop': pop, 'area': area})
countries

              area          pop
China      9596960  13888170000
India      3287590   1325460000
Indonesia  1905000    261890900
USA        9631418    326309000


In [21]:
countries.index

Index(['China', 'India', 'Indonesia', 'USA'], dtype='object')

In [22]:
countries.columns

Index(['area', 'pop'], dtype='object')
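
The Series combined into a DataFrame do not need identical indexes to begin with. As a hedged sketch (the `height` and `width` values are hypothetical), pandas aligns them on the union of their labels and fills the gaps with `NaN`:

```python
import pandas as pd

# Hypothetical values, chosen only to show the alignment
height = pd.Series({'x': 1.0, 'y': 2.0})
width = pd.Series({'y': 5.0, 'z': 6.0})

# Rows are the union of both indexes; missing entries become NaN
frame = pd.DataFrame({'height': height, 'width': width})
```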

### DataFrame as a specialized dictionary
A DataFrame also behaves like a dictionary whose keys are *column names*. This differs from a two-dimensional NumPy array, where `data[0]` returns the first row. In a DataFrame, `[]` indexes by column: `data['colname']`.

In [23]:
countries['area']

China        9596960
India        3287590
Indonesia    1905000
USA          9631418
Name: area, dtype: int64

In [24]:
countries['pop']

China        13888170000
India         1325460000
Indonesia      261890900
USA            326309000
Name: pop, dtype: int64
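
The dictionary analogy extends to membership tests and assignment. A minimal sketch on a two-row frame built from figures used earlier; the `density` column is an illustrative addition, not part of the original data:

```python
import pandas as pd

countries = pd.DataFrame(
    {'area': [3287590, 1905000], 'pop': [1325460000, 261890900]},
    index=['India', 'Indonesia'])

has_area = 'area' in countries         # membership tests the column names
has_india = 'India' in countries       # row labels are not keys
column_names = list(countries.keys())  # keys() returns the columns

# Assigning to a new key adds a column, as with a dictionary
countries['density'] = countries['pop'] / countries['area']

# Rows are reached through .loc (by label) or .iloc (by position)
india_row = countries.loc['India']
```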

## Constructing DataFrames

In [25]:
# From a list of dicts
lst_dicts = [ {'a': i, 'b': i+1} for i in range(3)]
lst_dicts

[{'a': 0, 'b': 1}, {'a': 1, 'b': 2}, {'a': 2, 'b': 3}]

In [26]:
pd.DataFrame(lst_dicts)

   a  b
0  0  1
1  1  2
2  2  3


In [27]:
# From a NumPy 2-dimensional array
import numpy as np
np_array = np.random.rand(3,2)
np_array

array([[ 0.05096764,  0.32868137],
       [ 0.77827398,  0.45462375],
       [ 0.89895685,  0.2005755 ]])

In [28]:
pd.DataFrame(np_array, columns=['foo', 'bar'], index=['a', 'b', 'c'])

        foo       bar
a  0.050968  0.328681
b  0.778274  0.454624
c  0.898957  0.200575


In [29]:
# From a single Series object
pd.DataFrame(area, columns=['area'])

              area
China      9596960
India      3287590
Indonesia  1905000
USA        9631418


In [30]:
# From a dictionary of Series objects
pd.DataFrame({'pop': pop, 'area': area})

              area          pop
China      9596960  13888170000
India      3287590   1325460000
Indonesia  1905000    261890900
USA        9631418    326309000


## Indexes

In [31]:
idx = pd.Index([2, 3, 5, 7, 11])
idx

Int64Index([2, 3, 5, 7, 11], dtype='int64')

In [32]:
idx[1]

3

In [34]:
idx[::3]

Int64Index([2, 7], dtype='int64')

In [35]:
(idx.size, idx.shape, idx.ndim, idx.dtype)

(5, (5,), 1, dtype('int64'))