

# Introduction to Pandas

![pandas Logo](https://github.com/pandas-dev/pandas/raw/master/web/pandas/static/img/pandas.svg "pandas Logo")

## Questions
1. What are the important pandas data structures?
1. How do I interact with these?
1. What else can pandas do for me?

In [None]:
import pandas as pd

## The pandas [`DataFrame`](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe)...
... is a labeled, two-dimensional columnar structure, not unlike a table, spreadsheet, or the R `data.frame`.

![dataframe schematic](https://github.com/pandas-dev/pandas/raw/master/doc/source/_static/schemas/01_table_dataframe.svg "Schematic of a pandas DataFrame")

The `columns` that make up our `DataFrame` can be lists, dictionaries, NumPy arrays, pandas `Series`, or more. Within these `columns`, our data can be text, numbers, dates and times, or many of the other data types you may have encountered in Python and NumPy. Shown here on the left in dark gray, the very first `column` is special: it is referred to as the `Index`, and it contains information characterizing each row of our `DataFrame`. Like any other `column`, the `Index` can label our rows by text, numbers, `datetime`s (a popular one!), or more.
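
To make the paragraph above concrete, here is a minimal sketch of building a small `DataFrame` by hand. The column names, values, and dates are all made up for illustration and have nothing to do with the ENSO file we read below.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly dates used as row labels (not taken from the ENSO data)
dates = pd.date_range("2000-01-01", periods=4, freq="MS")

example_df = pd.DataFrame(
    {
        "from_list": [26.1, 26.4, 27.0, 27.5],          # plain Python list
        "from_array": np.array([0.1, -0.2, 0.3, 0.0]),  # NumPy array
        "from_series": pd.Series(["a", "b", "c", "d"], index=dates),  # pandas Series
    },
    index=dates,  # the Index that labels each row
)

print(example_df)
print(example_df.dtypes)
```

Each column keeps its own data type, while all of them share the same `Index`.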

Let's take a look by reading in some `.csv` data [[ref](https://www.ncdc.noaa.gov/teleconnections/enso/indicators/sst/)].

In [None]:
df = pd.read_csv('data/enso_data.csv')

df

In [None]:
df.index

The default index simply numbers our rows, which isn't particularly helpful here. pandas is clever, though! A few optional keyword arguments later, and...

In [None]:
df = pd.read_csv('data/enso_data.csv', index_col=0, parse_dates=True)

df

In [None]:
df.index

... now we have our data helpfully organized by a proper `datetime`-like object. Every row of our data can now be referenced by its date! This sneak preview of the pandas `DatetimeIndex` also unlocks much of pandas' most useful time series functionality. Don't worry, we'll get there. What are the actual columns of data we've read in here?

In [None]:
df.columns
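
If you don't have the data file handy, the effect of `index_col` and `parse_dates` can be sketched with a few lines of in-memory CSV text. The header names and values below are invented stand-ins, so treat this as an illustration of the two keyword arguments rather than a copy of the real file.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk; values are illustrative only
csv_text = """datetime,Nino34,Nino34anom
1982-01-01,26.72,0.15
1982-02-01,26.70,-0.02
1982-03-01,27.20,-0.02
"""

# Default read: the dates stay an ordinary column and rows are simply numbered
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the index; parse_dates=True parses it as dates
parsed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(raw.index).__name__)
print(type(parsed.index).__name__)
print(list(parsed.columns))
```

Note that the date column no longer appears in `parsed.columns` once it has become the index.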

## The pandas [`Series`](https://pandas.pydata.org/docs/user_guide/dsintro.html#series)...

... is essentially any one of the columns of our `DataFrame`, with its accompanying `Index`.

![pandas Series](https://github.com/pandas-dev/pandas/raw/master/doc/source/_static/schemas/01_table_series.svg "Schematic of a pandas Series")

The pandas `Series` is a fast and capable one-dimensional array of nearly any data type we could want, and it can behave very similarly to a NumPy `ndarray` or a Python `dict`. You can look at any of the `Series` that make up your `DataFrame` using its label with Python `dict`-like notation, or with dot-shorthand:

In [None]:
df["Nino34"]

In [None]:
df.Nino34
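
Here is a small sketch of what "`ndarray`-like" and "`dict`-like" mean in practice. The `Series` below uses made-up values and string labels purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values with string labels, chosen only for illustration
s = pd.Series([26.5, 27.1, 28.0], index=["jan", "feb", "mar"], name="sst")

# ndarray-like: arithmetic and NumPy functions apply element-wise
anomalies = s - s.mean()
rounded = np.round(s)

# dict-like: look up by label, test membership, list the keys
feb_value = s["feb"]
has_apr = "apr" in s
labels = list(s.keys())

print(anomalies)
print(rounded)
print(feb_value, has_apr, labels)
```

One caveat on the dot-shorthand: it only works when the column name is a valid Python identifier that doesn't clash with an existing attribute or method, so the bracket notation is the safer habit.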

## Investigating the `DataFrame` and `Series`

pandas has some helpful shorthand for quickly investigating our `DataFrame`.

In [None]:
df.head()

In [None]:
df.tail()

Let's take a look at a `Series` on its own. Recall selecting just one of our columns.

In [None]:
nino34_series = df["Nino34"]

nino34_series

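`Series` can be indexed, selected, and subset both by position, `ndarray`-style,

In [None]:
# .iloc selects by integer position; a bare [14] on a non-integer index is deprecated in recent pandas releases
nino34_series.iloc[14]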

and `dict`-like,

In [None]:
nino34_series["1993-04-01"]

and these can be extended in both ways you might expect and ways you might not:

In [None]:
# numpy-like interval slices
nino34_series[5:25:4]

In [None]:
# label-based slicing
nino34_series["2000-12-01":"2002-04-01"]
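
One of the "ways you might not expect": positional slices leave out the end point, as in NumPy, while label-based slices include it. A minimal sketch on a made-up monthly `Series` (the values are just the integers 0 to 5):

```python
import pandas as pd

# Hypothetical six-month Series for illustration
dates = pd.date_range("2000-01-01", periods=6, freq="MS")
s = pd.Series(range(6), index=dates)

# Positional slice: positions 1, 2, 3 (the end position is excluded)
by_position = s.iloc[1:4]

# Label slice: February through May (the end label is included)
by_label = s.loc["2000-02-01":"2000-05-01"]

print(len(by_position))  # 3
print(len(by_label))     # 4
```

Spelling out `.iloc` for positions and `.loc` for labels makes it explicit which of the two behaviors you are asking for.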

In [None]:
df.describe()

## The Powers of Pandas

### Quick Plots of Your Data

In [None]:
df.Nino12.plot()

### Calculations

In [None]:
df.describe()

### Slicing and Dicing

In [None]:
df[df.index.month == 1]

In [None]:
df.loc[df.index.year == 1995]

In [None]:
df[df.Nino34anom > 2]
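
Each of the selections above works the same way: the expression inside the brackets evaluates to a set of `True`/`False` values, one per row, and only the `True` rows are kept. A minimal sketch with invented anomaly values (only the column name `Nino34anom` is borrowed from our data):

```python
import pandas as pd

# Hypothetical two years of monthly anomalies rising from -1.0 to 4.75
dates = pd.date_range("1997-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"Nino34anom": [0.25 * x - 1.0 for x in range(24)]}, index=dates)

# A comparison returns a boolean Series aligned with the rows
mask = demo["Nino34anom"] > 2
print(mask.sum())  # number of rows where the condition holds

# Conditions can be combined with & (and) or | (or); keep the parentheses
is_january = demo.index.month == 1
combined = demo[(demo["Nino34anom"] > 0) & is_january]
print(combined)
```

The parentheses around each condition matter, because `&` binds more tightly than `>` in Python.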

### Resampling

In [None]:
# '1Y' bins the data by calendar year; newer pandas releases spell this alias 'YE'
df.Nino34.resample('1Y').mean()

In [None]:
df.resample('1Y').mean()

In [None]:
df.groupby?
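
Resampling to annual means is closely related to grouping by the year of each timestamp. The sketch below compares the two on a made-up monthly `Series`. It uses the year-start alias `'YS'` rather than the year-end alias from the cells above, on the assumption that `'YS'` is spelled the same way across pandas versions, so the bins here are labeled by the first day of each year.

```python
import pandas as pd

# Hypothetical two years of monthly values: 0.0, 1.0, ..., 23.0
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
demo = pd.Series(range(24), index=dates, name="value").astype(float)

# resample: time-aware binning, result indexed by timestamps
annual = demo.resample("YS").mean()

# groupby: group on any key, here the integer year of each timestamp
by_year = demo.groupby(demo.index.year).mean()

# groupby also handles groupings resample cannot, such as "all Januaries"
by_month = demo.groupby(demo.index.month).mean()

print(annual)
print(by_year)
print(by_month.head(3))
```

The annual values agree between the two approaches; the difference is that `resample` keeps a `DatetimeIndex`, while `groupby` returns whatever keys you grouped on.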

In [None]:
# Split the column names into anomaly columns and temperature columns
anom_cols = [column for column in df.columns if 'anom' in column]
temp_cols = [column for column in df.columns if 'anom' not in column]

In [None]:
df[temp_cols].plot()

In [None]:
df[anom_cols].plot()