<p style="float:right">
<img src="../../images/logos/cu.png" style="display:inline" />
<img src="../../images/logos/cires.png" style="display:inline" />
<img src="../../images/logos/nasa.png" style="display:inline" />
<img src="../../images/logos/nsidc_daac.png" style="display:inline" />
</p>

## Python, Jupyter & pandas: Solutions for Modules 4 & 5

Run the following cell as-is to do some initial setup. Some steps from the setup for Exercise 3 are repeated here, as well as some pieces of Module 4. Because pandas is better suited to time series than to gridded data, we reduce each daily grid to a single number: the total sea ice area for that day. Here, we'll save those values to a variable called `total_area` before plugging it into pandas.

In [None]:
%matplotlib inline
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import netCDF4
import numpy as np
import pandas as pd

data_file = '../../data/september-concentration.nc'
dataset = netCDF4.Dataset(data_file)
variables = dataset.variables

area = variables['area']
sic = variables['sea_ice_concentration']

# the time variable in the netCDF file is days since some epoch,
# let's just work with datetime objects
time = netCDF4.num2date(variables['time'][:], variables['time'].units)

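def seaice_area_km2(grid, area):
    """Return the total sea ice area for one daily concentration grid."""
    # mask flagged values (anything outside 0-100%) and convert percent to a 0.0-1.0 fraction
    fraction = np.ma.masked_outside(grid, 0, 100) / 100.0

    # weight each cell's area by its ice fraction; masked cells are left out of the sum
    return np.sum(area * fraction)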

days = sic.shape[0]
grid_area = area[:]
total_area = np.ma.zeros(days)
for i in np.arange(days):
    total_area[i] = seaice_area_km2(sic[i, :, :], grid_area)
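
To see what `seaice_area_km2` does on a small scale, here is a minimal sketch on a made-up 2x2 grid. The concentration values, the flag value 255, and the cell areas are all invented for illustration and do not come from the dataset.

```python
import numpy as np

# Hypothetical 2x2 concentration grid in percent; 255 stands in for a flag value
toy_grid = np.array([[100, 50],
                     [255, 0]])

# Hypothetical cell areas in km^2
toy_area = np.array([[10.0, 10.0],
                     [10.0, 10.0]])

# Mask anything outside 0-100, then convert percent to a fraction
toy_fraction = np.ma.masked_outside(toy_grid, 0, 100) / 100.0

# Masked cells are left out of the sum
toy_total = np.sum(toy_area * toy_fraction)
print(toy_total)
```

The flagged cell contributes nothing, so only the 100% and 50% cells add to the total.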

## Big list to a DataFrame

`total_area` is a NumPy masked array with one value per day: the total sea ice area on that day. Print out the value of `total_area`.

In [None]:
total_area

Construct a pandas DataFrame using `total_area` as the data, `time` as the index, and `['area']` as the columns. Assign it to the variable `df`.

In [None]:
df = pd.DataFrame({'area': total_area}, index=time)

`DataFrame` has a method `min()` that returns the minimum value of each column, and a method `idxmin()` that returns the index label where each minimum occurs. What is the lowest sea ice area found in this dataset, and on which date did it occur?

In [None]:
df.min()

In [None]:
df.idxmin()
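
The same two calls on a small made-up `DataFrame` show what to expect. This is only a sketch: the dates and values are invented. Both methods return a `Series` with one entry per column.

```python
import pandas as pd

# Hypothetical three-day series
toy_index = pd.to_datetime(['2001-09-01', '2001-09-02', '2001-09-03'])
toy_df = pd.DataFrame({'area': [5.0, 3.0, 4.0]}, index=toy_index)

toy_min = toy_df.min()       # smallest value in each column
toy_when = toy_df.idxmin()   # index label where that smallest value occurs

print(toy_min['area'])
print(toy_when['area'])
```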

## DataFrame with a column for each year

Since we're interested in plotting multiple years of data, it would be useful to have our data arranged such that each year is in its own column indexed by day of the month.  

The first step is to create a `DataFrame` indexed by both year and day. We can access the year and the day in the `DatetimeIndex` with `df.index.year` and `df.index.day`.

Use `set_index` to create a `MultiIndex` by year and day, and store the reindexed `DataFrame` in a new variable `df2`.

*Look in Module 4 for hints on using `set_index`.*

In [None]:
df2 = df.set_index([df.index.year, df.index.day])
df2.head()

The type of the new `DataFrame`'s index should be a pandas `MultiIndex`. Verify that it is.

In [None]:
type(df2.index)
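
As a small illustration of the same step, the sketch below builds a two-level index from a made-up four-row `DataFrame`. The dates and values are invented; only the `set_index` pattern matches the solution above.

```python
import pandas as pd

# Hypothetical data: two days in each of two years
toy_index = pd.to_datetime(['2001-09-01', '2001-09-02',
                            '2002-09-01', '2002-09-02'])
toy_df = pd.DataFrame({'area': [5.0, 4.0, 3.0, 2.0]}, index=toy_index)

# Passing a list of arrays to set_index builds one index level per array
toy_df2 = toy_df.set_index([toy_df.index.year, toy_df.index.day])

print(type(toy_df2.index).__name__)

# Rows are now addressed by a (year, day) pair
print(toy_df2.loc[(2002, 1), 'area'])
```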

pandas `DataFrame`s have a method `unstack()` to pivot values from indexes to columns. Use this on `df2` to create a `DataFrame` with the day of the month as the index and a column for each year.

Save this new `DataFrame` to `df3`.

In [None]:
df3 = df2.unstack(level=0)
df3.head()
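
A toy version of the pivot may make the result easier to read. The values below are invented. Note that the resulting columns are themselves a `MultiIndex` of `(original column, year)` pairs.

```python
import pandas as pd

# Hypothetical (year, day) index with two days in each of two years
toy_index = pd.MultiIndex.from_tuples([(2001, 1), (2001, 2),
                                       (2002, 1), (2002, 2)])
toy_df2 = pd.DataFrame({'area': [5.0, 4.0, 3.0, 2.0]}, index=toy_index)

# Level 0 (the year) moves from the row index into the columns
toy_df3 = toy_df2.unstack(level=0)

print(toy_df3)
print(list(toy_df3.columns))
```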

## Plots

Let's plot the 2002 and 2012 data on the same graph.

Caution: Normally you can select columns from a `DataFrame` with `DataFrame[colname]`. However, after `unstack`ing the index, our columns are a `MultiIndex` and you'll have to account for that.

The easiest way to handle this is to collapse the `MultiIndex` columns into a single-level column index by dropping the redundant level (named `'area'` if you followed the directions above).

The following command drops column level 0 and leaves you with ordinary year columns. If you're feeling bold, try the examples below without it.

`df3.columns = df3.columns.droplevel(0)`


Tell `plt` to produce a 10" by 10" figure, subset `df3` to the years we're interested in, and use `DataFrame.plot()` to render the graph.

In [None]:
df3.columns = df3.columns.droplevel(0)
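
To make the effect of `droplevel` concrete, here is a sketch on an invented two-year `DataFrame`. It compares column selection before and after dropping level 0.

```python
import pandas as pd

# Hypothetical unstacked frame with ('area', year) columns
toy_columns = pd.MultiIndex.from_tuples([('area', 2001), ('area', 2002)])
toy_df3 = pd.DataFrame([[5.0, 3.0], [4.0, 2.0]],
                       index=[1, 2], columns=toy_columns)

# Before: selecting a year needs the full (level 0, level 1) key
before = toy_df3[('area', 2002)]

# After dropping level 0, the year alone is enough
toy_df3.columns = toy_df3.columns.droplevel(0)
after = toy_df3[2002]

print(list(toy_df3.columns))
```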

In [None]:
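# DataFrame.plot() creates its own figure unless it is given an Axes, so pass one explicitly
fig, ax = plt.subplots(figsize=(10, 10))
df3[[2002, 2012]].plot(ax=ax)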


We can see that the more recent year is lower, but comparing values from just two years is not very informative. Let's plot how the September mean changes over the years.

First, compute the mean value of each year and store it in a `Series` named `mean`.

In [None]:
mean = df3.mean()
mean
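
For reference, `DataFrame.mean()` averages down each column by default and skips missing values. The sketch below uses invented numbers, with one `NaN` standing in for a missing day.

```python
import numpy as np
import pandas as pd

# Hypothetical per-year columns; 2002 is missing its second day
toy_df3 = pd.DataFrame({2001: [5.0, 4.0], 2002: [3.0, np.nan]},
                       index=[1, 2])

# One mean per column; NaN entries are ignored
toy_mean = toy_df3.mean()
print(toy_mean)
```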

Since we've got a simple Series now, we can just call `plot()` on it to get a
sense of how the mean sea ice area is changing over time.

Do this now:

In [None]:
mean.plot(marker='.')

In [None]:
# Finished with Exercises for Module 4

In [None]:
# Begin exercises for Module 5

Let's add a trend line to this graph.

First, put the `mean` Series into a DataFrame (since it's easy to plot multiple lines when they're just columns in a DataFrame).

In [None]:
df4 = mean.to_frame(name='mean')
df4

Compute a trend line for this data and add it to your DataFrame as a new column.

In [None]:
slope, intercept = np.polyfit(x=mean.index, y=mean.values, deg=1)
best_fit_fn = np.poly1d([slope, intercept])
df4['best-fit'] = best_fit_fn(mean.index)
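
The fit can be checked on data where the answer is known in advance. This sketch uses an invented series that falls by exactly 2 units per year, so a degree-1 fit should recover a slope of about -2 and reproduce the input values.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly means lying exactly on a straight line
toy_mean = pd.Series([10.0, 8.0, 6.0, 4.0], index=[2000, 2001, 2002, 2003])
years = np.asarray(toy_mean.index, dtype=float)

# A degree-1 fit returns coefficients highest power first: slope, then intercept
slope, intercept = np.polyfit(years, toy_mean.values, deg=1)

# poly1d turns the coefficients into a callable line
best_fit_fn = np.poly1d([slope, intercept])
fitted = best_fit_fn(years)

print(slope)
```

In the real data the slope is the change in mean September sea ice area per year, in the same units as `total_area`.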

Plot the September mean sea ice area along with the trend line.

In [None]:
df4.plot()