<p style="float:right">
<img src="images/logos/cu.png" style="display:inline" />
<img src="images/logos/cires.png" style="display:inline" />
<img src="images/logos/nasa.png" style="display:inline" />
</p>

# Python, Jupyter & pandas: Module 5

## Inference and Visualization

In [None]:
%matplotlib inline
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd


Read in the CSV file we saved in the last module.

In [None]:
monthly = pd.read_csv('monthly-extents.csv', index_col='date', parse_dates=True)

In [None]:
monthly.head()
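Here is a minimal sketch of what `index_col='date'` and `parse_dates=True` do, using a tiny made-up CSV held in memory. The column name `snowcover` matches the column used below, but the values are invented. The point is that the `date` column becomes a `DatetimeIndex`, which is what makes `monthly.index.year` and `monthly.index.month` available in the next steps.

```python
import io

import pandas as pd

# A tiny, made-up CSV with an assumed layout similar to monthly-extents.csv:
# a 'date' column plus one column of values.
csv_text = """date,snowcover
2000-01-01,46.1
2000-02-01,45.3
2000-03-01,40.2
"""

# index_col='date' uses the date column as the row index;
# parse_dates=True converts that index to datetimes.
df = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

print(type(df.index).__name__)   # DatetimeIndex
print(df.index.month.tolist())   # [1, 2, 3]
```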

Look for a trend in Northern Hemisphere June snow cover.

Reshape the DataFrame so that each month is a column and each year is a row.

In [None]:
year_by_month = monthly.set_index([monthly.index.year, monthly.index.month]).unstack(1)
year_by_month.head()
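To see what `set_index` followed by `unstack(1)` does, here is a sketch on two years of made-up monthly values (the numbers are just 0 to 23, not real snow cover). The (year, month) pairs become a two-level row index, and unstacking level 1 moves the months into columns.

```python
import pandas as pd

# Made-up monthly values spanning two years, indexed by month-start dates.
dates = pd.date_range('2000-01-01', periods=24, freq='MS')
monthly = pd.DataFrame({'snowcover': range(24)}, index=dates)

# Build a two-level index (year, month), then unstack level 1 (month)
# so that each month becomes a column and each year a row.
year_by_month = monthly.set_index([monthly.index.year, monthly.index.month]).unstack(1)

print(year_by_month.shape)                      # (2, 12)
print(year_by_month['snowcover'][6].tolist())   # June of each year: [5, 17]
```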

In [None]:
june_anomalies = year_by_month['snowcover'][6] - year_by_month['snowcover'][6].mean()
june_anomalies = june_anomalies.dropna()
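The anomaly step is plain arithmetic: each June value minus the mean of all June values. A sketch with made-up numbers, including one missing year, shows that `Series.mean()` skips `NaN` by default and that `dropna()` then removes the year with no data.

```python
import numpy as np
import pandas as pd

# Made-up June values with one missing year, indexed by year.
june = pd.Series([25.0, 23.0, np.nan, 21.0], index=[2000, 2001, 2002, 2003])

# An anomaly is each value minus the long-term mean.
# Series.mean() skips NaN by default, so the mean here is (25 + 23 + 21) / 3 = 23.
anomalies = (june - june.mean()).dropna()

print(anomalies.tolist())  # [2.0, 0.0, -2.0]
```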

In [None]:
with mpl.rc_context(rc={'figure.figsize': (15, 4)}):
    june_anomalies.plot(title='Northern Hemisphere Snow Cover Anomalies: June',kind='bar', color='r')
    

Compute a least-squares linear fit to the June anomalies.

In [None]:
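# Fit a straight line (degree-1 polynomial) to anomaly vs. year
slope, intercept = np.polyfit(june_anomalies.index.values, june_anomalies.values, 1)
fit_function = np.poly1d([slope, intercept])
# Evaluate the fitted line at each year
best_fit = fit_function(june_anomalies.index.values)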
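As a sanity check on what `np.polyfit(..., 1)` computes, this sketch fits synthetic data generated from a known line and compares the result with the closed-form ordinary least-squares formulas, slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2) and intercept = y_mean - slope * x_mean. The data, random seed, and noise level are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic data: a known line (slope -0.05, intercept 100) plus small noise.
rng = np.random.default_rng(0)
years = np.arange(1970, 2020)
values = -0.05 * years + 100 + rng.normal(scale=0.01, size=years.size)

# Degree-1 least-squares fit; polyfit returns coefficients highest power first.
slope, intercept = np.polyfit(years, values, 1)

# Closed-form ordinary least-squares estimates for comparison.
x_mean, y_mean = years.mean(), values.mean()
ols_slope = np.sum((years - x_mean) * (values - y_mean)) / np.sum((years - x_mean) ** 2)
ols_intercept = y_mean - ols_slope * x_mean

fit_function = np.poly1d([slope, intercept])
print(np.isclose(slope, ols_slope), np.isclose(intercept, ols_intercept))  # True True
```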

In [None]:
with mpl.rc_context(rc={'figure.figsize': (15, 4)}):
    june_anomalies.plot(title='Northern Hemisphere Snow Cover Anomalies: June',kind='Bar', color='r')
    plt.plot(best_fit, color='b', linestyle='--')
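Why does `plt.plot(best_fit)` line up with the bars even though the fit was computed against years? A pandas bar plot draws its bars at integer positions 0, 1, ..., n-1 and uses the index (the years) only as tick labels, and `plt.plot` with a single argument also uses 0, 1, ..., n-1 as its x values. The sketch below, on made-up anomalies, makes that explicit by plotting the fit against `np.arange(len(...))`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up anomalies indexed by year.
anomalies = pd.Series([1.0, -0.5, 0.3, -1.2], index=[2000, 2001, 2002, 2003])

ax = anomalies.plot(kind='bar', color='r')
# The bars sit at x = 0, 1, ..., n-1; the years are only tick labels.
bar_positions = ax.get_xticks()

# Fit against the years, then draw the line at the bar positions.
slope, intercept = np.polyfit(anomalies.index.values, anomalies.values, 1)
best_fit = np.poly1d([slope, intercept])(anomalies.index.values)
ax.plot(np.arange(len(anomalies)), best_fit, color='b', linestyle='--')
```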


[xarray](http://xarray.pydata.org/en/stable/)

"xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures."

With xarray you can open a netCDF file as an `xarray.Dataset`, and much of the grunt work of setting up dimensions and coordinates is done for you.
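For comparison, this is roughly the grunt work `open_dataset` saves you: a sketch that builds a small `xarray.Dataset` by hand. The variable and dimension names mirror the ones in the snow cover file, but the shapes, dates, and values are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical small grid: 3 weekly time steps on a 2 x 4 (rows x cols) grid.
times = pd.date_range('2003-01-06', periods=3, freq='7D')
extent = np.zeros((3, 2, 4))
land = np.ones((2, 4))

# Each variable names its dimensions; 'time' also gets a coordinate index.
ds = xr.Dataset(
    data_vars={
        'snow_cover_extent': (('time', 'rows', 'cols'), extent),
        'land': (('rows', 'cols'), land),
    },
    coords={'time': times},
)

print(dict(ds.sizes))    # length of each dimension
print(list(ds.indexes))  # only 'time' has an index
```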

In [None]:
import xarray as xr  # import as xr by convention
import pandas as pd
import numpy as np

In [None]:
snowcover_url = 'http://www.ncdc.noaa.gov/thredds/dodsC/cdr/snowcover/nhsce_v01r01_19661004_latest.nc'
dataset = xr.open_dataset(snowcover_url)

We're now attached to an `xarray.Dataset`.

In [None]:
print(dataset)

You can see the dimensions

In [None]:
dataset.dims

and the indexes

In [None]:
dataset.indexes

You can see that xarray has already converted the time coordinate into a `DatetimeIndex`, rather than leaving us to do it by hand as we did in Module 4.
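The conversion follows the CF conventions: a file stores time as numbers with a `units` attribute such as "days since ...", and xarray decodes it. This sketch applies `xr.decode_cf` (the same decoding `open_dataset` performs by default) to a hypothetical raw time coordinate. The reference date is chosen to match the `19661004` in the file name, but it is an assumption, not something read from the file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical raw contents: time stored as numbers plus a CF 'units' attribute.
raw = xr.Dataset(
    coords={'time': ('time', np.array([0, 7, 14]), {'units': 'days since 1966-10-04'})}
)

# decode_cf turns the numeric time coordinate into datetimes.
decoded = xr.decode_cf(raw)

time_index = decoded.indexes['time']
print(type(time_index).__name__)  # DatetimeIndex
print(time_index[1])              # 1966-10-11 00:00:00
```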

You can see what variables are in the file.

In [None]:
dataset.data_vars

You can access the variables as attributes or dictionary keys.

Accessing a variable of a `Dataset` either way yields a `DataArray`.

In [None]:
dataset['land']
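A minimal sketch of the two access styles on a made-up dataset. Both return the same `DataArray`. Key access is the safer habit, because attribute access only works for reading and does not work for names that clash with `Dataset` methods (for example a variable called `count`) or that are not valid Python identifiers.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with a single 2-D variable.
ds = xr.Dataset({'land': (('rows', 'cols'), np.ones((2, 3)))})

by_key = ds['land']   # dictionary-style access
by_attr = ds.land     # attribute-style access

print(type(by_key).__name__)       # DataArray
print(by_key.identical(by_attr))   # True
```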

So, just as before, we have access to all of the data and indexes from the endpoint.

In [None]:
sc_extent = dataset['snow_cover_extent']
print(sc_extent)

Look at the second line of output. Instead of data values, it shows
`[19933056 values with dtype=float64]`. This tells you that downloading the
data has been deferred: only the metadata has been fetched from the endpoint,
not the values themselves. This lazy loading lets you work with just the data
you are interested in, without having to download the entire file.
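Lazy loading can also be seen with a local file. This sketch writes a small made-up variable to a temporary netCDF file (it assumes a netCDF backend such as netCDF4 or scipy is installed), reopens it, and inspects the `repr` before and after calling `.load()`.

```python
import os
import tempfile

import numpy as np
import xarray as xr

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.nc')
    # Write a small made-up variable to disk.
    xr.Dataset({'x': ('t', np.arange(4.0))}).to_netcdf(path)

    with xr.open_dataset(path) as ds:
        lazy_repr = repr(ds['x'])          # values not read yet
        subset = ds['x'][1:3]              # indexing is still lazy
        subset_repr = repr(subset)
        loaded = subset.load()             # only now are the 2 values read
        loaded_values = loaded.values.tolist()

print('[4 values with dtype=float64]' in lazy_repr)    # True
print('[2 values with dtype=float64]' in subset_repr)  # True
print(loaded_values)                                    # [1.0, 2.0]
```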

You can access data in `DataArray`s a number of ways.

By integer position, in dimension order (time is the first dimension):

In [None]:
sc_extent.dims

In [None]:
a_slice = sc_extent[2400:2403, 1:5, 1:6]
a_slice

In [None]:
print(sc_extent)

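And you can see again that `sc_extent` itself is still not loaded: creating `a_slice` did not download the whole variable. When the values of `a_slice` are actually needed, only that subset is requested from the remote file or endpoint.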

You can also grab a slice by integer position along named dimensions with `DataArray.isel`.

In [None]:
sc_extent.isel(rows=slice(0, 5), time=slice(7, 9), cols=slice(40, 42))
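A sketch on a small made-up array contrasting the two integer-based styles: positional indexing has to follow the dimension order, while `isel` takes the dimension names as keywords, in any order.

```python
import numpy as np
import xarray as xr

# Made-up 3-D array with the same dimension names as snow_cover_extent.
da = xr.DataArray(np.arange(2 * 3 * 4).reshape(2, 3, 4), dims=('time', 'rows', 'cols'))

# Positional indexing must follow the dimension order (time, rows, cols) ...
by_position = da[0:2, 1:3, 0:2]

# ... while isel names each dimension, so the keyword order does not matter.
by_name = da.isel(cols=slice(0, 2), time=slice(0, 2), rows=slice(1, 3))

print(by_position.identical(by_name))  # True
print(dict(by_name.sizes))             # {'time': 2, 'rows': 2, 'cols': 2}
```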

Or you can slice by the native values of the index (here, dates) with `DataArray.sel`.

In [None]:
sc_extent.sel(time=slice('2010-01-01', '2011-01-02'))
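One detail worth knowing about `sel` with slices: label-based slices include both endpoints, unlike positional slices. A sketch on ten made-up daily values:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up daily series for the first 10 days of 2010.
times = pd.date_range('2010-01-01', periods=10, freq='D')
da = xr.DataArray(np.arange(10), coords={'time': times}, dims='time')

# Label-based slices include BOTH endpoints; the strings are parsed as dates.
window = da.sel(time=slice('2010-01-03', '2010-01-05'))
print(window.values.tolist())  # [2, 3, 4]
```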

In [None]:
print(sc_extent)

In [None]:
# .sel also works on the whole Dataset; no snow cover values are downloaded yet
subset_2003 = dataset.sel(time=slice('2003-01-01', '2003-02-01'))
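`sel` on a whole `Dataset` subsets every variable at once. This sketch uses a hypothetical dataset that mirrors `snow_cover_extent` (time, rows, cols) and `land` (rows, cols), with made-up weekly dates. Only variables that have a `time` dimension are subset; a static mask like `land` comes through unchanged.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: one time-dependent variable and one static mask.
times = pd.date_range('2002-12-02', periods=12, freq='7D')
ds = xr.Dataset(
    {
        'snow_cover_extent': (('time', 'rows', 'cols'), np.zeros((12, 2, 2))),
        'land': (('rows', 'cols'), np.ones((2, 2))),
    },
    coords={'time': times},
)

# Selecting on the Dataset subsets every variable with a 'time' dimension;
# variables without that dimension, like 'land', are carried through unchanged.
subset = ds.sel(time=slice('2003-01-01', '2003-02-01'))

print(subset.sizes['time'])                  # 4
print(subset['land'].identical(ds['land']))  # True
```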

In [None]:
print(sc_extent)

In [None]:
from mpl_toolkits.basemap import Basemap
from ipywidgets import interact
import ipywidgets as widgets

@interact(longitude_0=widgets.IntSlider(min=-165, max=-15, step=30, value=-105))
def plot_land(longitude_0=-105):
    plt.figure(figsize=(10, 10))
    # North polar stereographic map down to 30N, centered on longitude_0
    m = Basemap(projection='npstere', boundinglat=30, lon_0=longitude_0)
    m.drawcoastlines()

    # labels are [left, right, top, bottom]
    parallels = np.arange(0, 90, 20)
    m.drawparallels(parallels, labels=[True, False, False, False])
    meridians = np.arange(-180, 180, 45)
    m.drawmeridians(meridians, labels=[True, True, True, True])

    # Color each grid cell by the land mask; latlon=True means x/y are lon/lat
    m.pcolor(dataset.longitude.values, dataset.latitude.values, dataset.land.values, latlon=True, cmap='Accent')
    plt.show()



In [None]:
dataset.latitude