## Pandas Series Object

A Pandas `Series` is a one-dimensional array of indexed data.

In [None]:
import pandas as pd
import numpy as np

In [None]:
unemployment_rates = pd.Series([4.2,4.4,6.3,4.75])
unemployment_rates

We see that the output pairs a sequence of indices with a sequence of values, and we can access each of them separately:

In [None]:
unemployment_rates.values

In [None]:
unemployment_rates.index
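A quick way to see what these two attributes hold is to check their types. This is a minimal sketch, assuming a recent pandas version in which the default index is a `RangeIndex`:

```python
import numpy as np
import pandas as pd

unemployment_rates = pd.Series([4.2, 4.4, 6.3, 4.75])

# .values is a plain NumPy array holding the data
values = unemployment_rates.values
print(type(values))

# .index is a pandas Index; by default a RangeIndex running from 0 to n-1
index = unemployment_rates.index
print(index)
```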

As with NumPy arrays, we can access the data by indexing and slicing with the familiar square-bracket notation:

In [None]:
unemployment_rates[1]

In [None]:
unemployment_rates[2:4]
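One detail worth checking: a slice returns a new `Series`, not a bare array, and the original index labels travel with the values. A minimal sketch:

```python
import pandas as pd

unemployment_rates = pd.Series([4.2, 4.4, 6.3, 4.75])

# A positional slice returns a new Series
subset = unemployment_rates[2:4]

# The index labels of the selected rows are kept, not renumbered from 0
print(subset.index.tolist())
print(subset.tolist())
```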

### Setting the Index

By default, pandas sets the index for us as the integers 0, 1, 2, and so on. However, we can set it ourselves:

In [None]:
unemployment_rates = pd.Series([4.2,4.4,6.3,4.75],
                              index=['California', 'New York', 'Alabama', 'Washington'])
unemployment_rates
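Another common way to set the index is to build the `Series` from a Python dictionary: the keys become the index labels and the values become the data. A minimal sketch, relying on dictionaries preserving insertion order (Python 3.7+):

```python
import pandas as pd

# Dictionary keys become index labels, in insertion order
unemployment_rates = pd.Series({'California': 4.2,
                                'New York': 4.4,
                                'Alabama': 6.3,
                                'Washington': 4.75})
print(unemployment_rates['Alabama'])
```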

We can then index by label, as expected:

In [None]:
unemployment_rates['Alabama']

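or, perhaps unexpectedly, slice by label. Unlike positional slices, label-based slices include the end point, so `'Washington'` is part of the result: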

In [None]:
unemployment_rates['New York':'Washington']
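The difference between the two kinds of slice is easiest to see side by side. This sketch makes the choice explicit with `.loc` (by label) and `.iloc` (by position):

```python
import pandas as pd

unemployment_rates = pd.Series([4.2, 4.4, 6.3, 4.75],
                               index=['California', 'New York', 'Alabama', 'Washington'])

# Label-based slice: the end label IS included
by_label = unemployment_rates.loc['New York':'Washington']

# Position-based slice: the end position is excluded, as in NumPy
by_position = unemployment_rates.iloc[1:3]

print(by_label.index.tolist())
print(by_position.index.tolist())
```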

We are not limited to consecutive integers or to strings; for example, we can index by a non-contiguous sequence of years:

In [None]:
unemployment_rates = pd.Series([4.2,4.4,6.3,4.75],
                              index=[1983, 1985, 1990, 2016])
unemployment_rates

In [None]:
unemployment_rates[2016]
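With integer labels, plain square brackets look up by label, so position 0 no longer works. A minimal sketch of how `.loc` and `.iloc` remove the ambiguity:

```python
import pandas as pd

unemployment_rates = pd.Series([4.2, 4.4, 6.3, 4.75],
                               index=[1983, 1985, 1990, 2016])

# .loc always looks up by label
print(unemployment_rates.loc[2016])

# .iloc always looks up by position
print(unemployment_rates.iloc[0])

# Plain [] uses the integer labels here, so 0 is not found
try:
    unemployment_rates[0]
except KeyError:
    print("0 is not a label in this index")
```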

## The Pandas Data Frame

A `DataFrame` is a two-dimensional array with multiple columns. It can be thought of as a sequence of aligned arrays, or aligned pandas `Series`, where "aligned" means that they share the same index.

In [None]:
unemployment_rates = pd.Series([4.2,4.4,6.3,4.75],
                              index=['California', 'New York', 'Alabama', 'Washington'])
unemployment_rates

In [None]:
participation_rates = pd.Series([70.5,68.7,62.3,64.0],
                              index=['California', 'New York', 'Alabama', 'Washington'])
participation_rates

In [None]:
state_employment = pd.DataFrame({'unemployment_rate': unemployment_rates,
                                 'participation_rate': participation_rates})
state_employment
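Because the columns share an index, selecting a column gives back a `Series` with that same index, and `.loc` selects a row by its label. A minimal sketch:

```python
import pandas as pd

states = ['California', 'New York', 'Alabama', 'Washington']
unemployment_rates = pd.Series([4.2, 4.4, 6.3, 4.75], index=states)
participation_rates = pd.Series([70.5, 68.7, 62.3, 64.0], index=states)

state_employment = pd.DataFrame({'unemployment_rate': unemployment_rates,
                                 'participation_rate': participation_rates})

# Selecting a column returns a Series that shares the DataFrame's index
column = state_employment['unemployment_rate']

# .loc selects a row by label; the row is a Series indexed by column names
row = state_employment.loc['Alabama']
print(row['participation_rate'])
```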

Like a `Series`, the `DataFrame` has an `index` attribute, plus a `columns` attribute holding the column labels:

In [None]:
state_employment.index

In [None]:
state_employment.columns
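Both attributes are pandas `Index` objects, so the row labels and column labels behave the same way. A minimal sketch, which also shows the `shape` attribute:

```python
import pandas as pd

states = ['California', 'New York', 'Alabama', 'Washington']
state_employment = pd.DataFrame({'unemployment_rate': [4.2, 4.4, 6.3, 4.75],
                                 'participation_rate': [70.5, 68.7, 62.3, 64.0]},
                                index=states)

# Row labels and column labels are both pandas Index objects
print(isinstance(state_employment.index, pd.Index))
print(isinstance(state_employment.columns, pd.Index))

# shape gives (number of rows, number of columns)
print(state_employment.shape)
```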

Notice that a `DataFrame` copes when the indices of its Series do not match:

In [None]:
participation_rates = pd.Series([70.5,68.7,62.3,64.0],
                              index=['California', 'New York', 'Alabama', 'Nebraska'])
state_employment = pd.DataFrame({'unemployment_rate': unemployment_rates,
                                 'participation_rate': participation_rates})
state_employment

It does so by taking the union of the two indices and filling the gaps with `NaN` ("Not a Number") to mark missing values. We will return to missing data later on.
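To confirm what happened, we can count the rows and use `isna()` to flag the `NaN` placeholders. A minimal sketch:

```python
import pandas as pd

unemployment_rates = pd.Series([4.2, 4.4, 6.3, 4.75],
                               index=['California', 'New York', 'Alabama', 'Washington'])
participation_rates = pd.Series([70.5, 68.7, 62.3, 64.0],
                                index=['California', 'New York', 'Alabama', 'Nebraska'])
state_employment = pd.DataFrame({'unemployment_rate': unemployment_rates,
                                 'participation_rate': participation_rates})

# The row index is the union of the two indices: five states in total
print(len(state_employment))

# isna() marks missing entries; summing counts them per column
print(state_employment.isna().sum())
```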

### Making DataFrames from other objects

Pandas DataFrames don't have to come from collecting a bunch of `Series` together. You can assemble them from many different objects.
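For example, two common alternatives are a dictionary of lists (one list per column) and a list of dictionaries (one dictionary per row). A minimal sketch; the column names `a` and `b` and their values are made up for illustration:

```python
import pandas as pd

# Dictionary of lists: keys become column names, lists become columns
from_columns = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# List of dictionaries: each dictionary is one row; missing keys become NaN
from_rows = pd.DataFrame([{'a': 1, 'b': 4}, {'a': 2}])

print(from_columns.shape)
print(from_rows)
```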

Probably the most useful for us is converting NumPy arrays into DataFrames, and vice versa:

In [None]:
df = pd.DataFrame(np.random.rand(3, 2),
                 columns=['col1', 'col2'],
                 index=['row1', 'row2', 'row3'])
df



In [None]:
array = np.array(df)
array

In [None]:
type(array)
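pandas also provides `DataFrame.to_numpy()`, which the pandas documentation recommends for getting the underlying array. Either way, the row and column labels are dropped, so going back to a `DataFrame` means supplying them again. A minimal sketch of the round trip:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 2),
                  columns=['col1', 'col2'],
                  index=['row1', 'row2', 'row3'])

# to_numpy() returns a plain ndarray; the labels are not part of it
array = df.to_numpy()
print(array.shape)

# To rebuild the DataFrame, the labels have to be passed in again
df_again = pd.DataFrame(array, columns=df.columns, index=df.index)
print(df_again.equals(df))
```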

## A little more on the Pandas Index Object

The Pandas `Index` object has an interesting structure of its own and is worth understanding. It can be thought of as an immutable array or as an ordered set (strictly, a multiset, since an `Index` may contain repeated values), and each of these views has useful consequences.

In [None]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

It can be indexed and sliced like an array:

In [None]:
ind[1]

In [None]:
ind[::2]

It also has attributes familiar from NumPy arrays:

In [None]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

However, unlike NumPy arrays, an `Index` is immutable, so trying to assign to one of its elements raises a `TypeError`:

In [None]:
ind[1] = 70

Immutability is desirable when the same index is shared across multiple DataFrames and arrays: no object can accidentally modify an index that others depend on.
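Since an `Index` cannot be changed in place, "modifying" one means building a new one. A minimal sketch using the `Index.insert` and `Index.delete` methods, which return new objects and leave the original untouched:

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# insert(loc, item) and delete(loc) each return a NEW Index
with_70 = ind.insert(1, 70)
without_3 = ind.delete(1)

print(with_70.tolist())
print(without_3.tolist())
print(ind.tolist())  # the original is unchanged
```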

Indexes also behave like ordered sets, supporting intersection, union and symmetric difference:

In [None]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [None]:
indA.intersection(indB)  # intersection (older pandas also allowed indA & indB)

In [None]:
indA.union(indB)  # union (older pandas also allowed indA | indB)

In [None]:
indA.symmetric_difference(indB)  # symmetric difference (older pandas also allowed indA ^ indB)

These set operations are important when thinking about joins across multiple data sets.
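As a small illustration of that link, Series arithmetic aligns on the union of the two indices, while an inner concatenation keeps only their intersection. A minimal sketch with made-up values:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[1, 3, 5])
b = pd.Series([10, 20, 30], index=[3, 5, 7])

# Arithmetic aligns on the UNION of the indices; unmatched labels give NaN
total = a + b
print(total)

# An inner concat keeps only the INTERSECTION of the indices
inner = pd.concat([a, b], axis=1, join='inner')
print(inner.index.tolist())
```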