# Pandas

The pandas module is a widely used Python library for data analysis. It was designed to work with tabular and heterogeneous data. It is standard to use the alias ``pd`` when importing pandas.
~~~
import pandas as pd
~~~
I usually import numpy at the same time since pandas and numpy are often used in tandem.

In [2]:
import numpy as np
import pandas as pd

The two main data structures that we will use from pandas are the *Series* and the *DataFrame*.  

## Series
A Series is a one-dimensional array-like object containing a sequence of values and an associated array of data labels, called the *index*.  
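
As a small sketch of this definition, the data and the index of a Series can be inspected separately through its ``values`` and ``index`` attributes. The variable name ``s`` and the sample values below are illustrative only.

```python
import pandas as pd

# Illustrative Series; the name `s` and its contents are made up for this sketch.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

values = s.values        # the underlying data as a NumPy array
index_labels = s.index   # the associated labels, stored as a pandas Index

print(values)
print(list(index_labels))
```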

### Creating a Series
A Series can be created from a list, a NumPy ndarray, or a dictionary using the ``pd.Series`` constructor.

In [3]:
my_list = [45, 17, 16, 44, 28]  
labels = ['Utah', 'Ohio', 'Tennessee', 'Wyoming', 'Texas']

In [4]:
pd.Series(my_list)

0    45
1    17
2    16
3    44
4    28
dtype: int64

In [8]:
pd.Series(data = my_list, index = labels)

Utah         45
Ohio         17
Tennessee    16
Wyoming      44
Texas        28
dtype: int64

In [6]:
d = {'Utah':45, 'Ohio':17, 'Tennessee':16, 'Wyoming':44, 'Texas':28}

In [7]:
pd.Series(d)

Utah         45
Ohio         17
Tennessee    16
Wyoming      44
Texas        28
dtype: int64
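
A NumPy ndarray can be passed to ``pd.Series`` in the same way as a list. Here is a minimal sketch that reuses the values and labels from the examples above; the names ``arr`` and ``ser_from_array`` are illustrative.

```python
import numpy as np
import pandas as pd

# Same values and labels as the list example, with the data held in an ndarray.
arr = np.array([45, 17, 16, 44, 28])
labels = ['Utah', 'Ohio', 'Tennessee', 'Wyoming', 'Texas']

# The ndarray supplies the data; the list of strings supplies the index.
ser_from_array = pd.Series(arr, index=labels)
print(ser_from_array)
```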

You can think of a Series as an ordered dictionary where the index labels are the keys and the data are the values.
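
The dictionary analogy can be sketched with a few dictionary-style operations on a Series built from the same data as above. The variable names are illustrative, and this shows only a few of the ways a Series behaves like a dictionary.

```python
import pandas as pd

d = {'Utah': 45, 'Ohio': 17, 'Tennessee': 16, 'Wyoming': 44, 'Texas': 28}
ser = pd.Series(d)

ohio_value = ser['Ohio']         # look up a value by its label, like d['Ohio']
has_texas = 'Texas' in ser       # membership tests check the labels, not the data
has_maine = 'Maine' in ser
label_order = list(ser.keys())   # labels keep the dictionary's insertion order

print(ohio_value, has_texas, has_maine)
print(label_order)
```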

The data in a Series need not be numeric.

In [10]:
pd.Series(data=labels)

0         Utah
1         Ohio
2    Tennessee
3      Wyoming
4        Texas
dtype: object

## DataFrames
DataFrames are the main data structure of pandas and were directly inspired by the R programming language. A DataFrame can be viewed as a collection of Series objects that share the same (row) index. A DataFrame has both a row index and a column index.
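
To make the "Series sharing an index" picture concrete, here is a minimal sketch that builds a small DataFrame from two Series. The names ``w``, ``x``, and ``small_df`` and the numbers are made up for illustration.

```python
import pandas as pd

rows = ['A', 'B', 'C']

# Two illustrative Series that share the same row labels.
w = pd.Series([1.0, 2.0, 3.0], index=rows)
x = pd.Series([4.0, 5.0, 6.0], index=rows)

# Each dictionary key becomes a column label; the shared labels become the row index.
small_df = pd.DataFrame({'W': w, 'X': x})

row_index = list(small_df.index)     # row labels
col_index = list(small_df.columns)   # column labels
col_w = small_df['W']                # a single column comes back as a Series

print(small_df)
print(type(col_w))
```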

In [3]:
np.random.seed(229)

In [4]:
df = pd.DataFrame(np.random.randn(5,4), index='A B C D E'.split(), columns='W X Y Z'.split())

In [5]:
df

          W         X         Y         Z
A  0.260778 -0.440511 -1.130954  0.800912
B -0.159565  1.509733  1.475873 -1.115098
C  0.293871 -0.091970  1.494585  1.776313
D -0.284390  0.562143  0.616958 -0.962247
E  0.714923 -1.257296  0.561833  1.031310


In [None]:
df['W']  # e.g. select the column labeled 'W'