## Continuing with xarray



Let's import the modules



In [1]:
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
%matplotlib inline

Let's get the temperature data again



In [1]:
tmax = xr.open_dataarray("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis.derived/surface_gauss/tmax.2m.mon.mean.nc")
tmax = tmax.isel(lat=slice(21, 35), lon=slice(123, 160)).load()
tmin = xr.open_dataarray("http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis.derived/surface_gauss/tmin.2m.mon.mean.nc")
tmin = tmin.isel(lat=slice(21, 35), lon=slice(123, 160)).load()
ds = xr.Dataset({'tmax': tmax, 'tmin': tmin})
ds

## Plot the probability of freezing by month for three locations



-   Calculate the freezing probability
-   Stack the DataArray to create the location coordinate
-   Select three locations and plot (Hint: look at the `to_pandas` method)
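
As a starting point, the three steps above can be sketched on synthetic data. Everything here is an assumption made for illustration: the `tmin_demo` array, its coordinates, the 273.15 K threshold (which presumes temperatures in kelvin), and the three positions chosen. It is a minimal sketch, not the reference solution.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for monthly minimum temperature in kelvin (hypothetical data;
# the real `tmin` comes from the reanalysis file loaded earlier).
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", periods=120, freq="MS")
tmin_demo = xr.DataArray(
    273.15 + rng.normal(5.0, 10.0, size=(time.size, 3, 4)),
    coords={"time": time, "lat": [40.0, 42.0, 44.0], "lon": [250.0, 252.0, 254.0, 256.0]},
    dims=("time", "lat", "lon"),
    name="tmin",
)

# 1. Freezing probability: fraction of records below freezing per calendar month
freezing = (tmin_demo < 273.15).astype(float)
freeze_prob = freezing.groupby("time.month").mean("time")

# 2. Stack lat/lon into a single "location" dimension
stacked = freeze_prob.stack(location=("lat", "lon"))

# 3. Select three locations (by position here) and convert to a DataFrame
subset = stacked.isel(location=[0, 5, 11])
df = subset.to_pandas()  # rows: month, columns: (lat, lon) pairs
print(df.head())
```

Calling `df.plot()` on the result should then draw one line per location.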



## Calculate monthly anomalies and plot them



Plot the spatial average as a time series, as well as a map.
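
One common way to compute monthly anomalies is to subtract a monthly climatology using `groupby`. The sketch below uses a synthetic `tmax_demo` array (an assumption, as are its coordinates), and the spatial average is unweighted for simplicity; a careful area average would weight by latitude.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data with a seasonal cycle (hypothetical stand-in for tmax)
rng = np.random.default_rng(1)
time = pd.date_range("2000-01-01", periods=48, freq="MS")
seasonal = 10.0 * np.sin(2 * np.pi * (time.month.values - 1) / 12.0)
data = 280.0 + seasonal[:, None, None] + rng.normal(0.0, 1.0, size=(48, 3, 4))
tmax_demo = xr.DataArray(
    data,
    coords={"time": time, "lat": [40.0, 42.0, 44.0], "lon": [250.0, 252.0, 254.0, 256.0]},
    dims=("time", "lat", "lon"),
    name="tmax",
)

# Monthly climatology: mean of each calendar month over all years
climatology = tmax_demo.groupby("time.month").mean("time")

# Anomalies: each record minus the climatology of its calendar month
anomalies = tmax_demo.groupby("time.month") - climatology

# Unweighted spatial average (one value per time step) and a map for one month
spatial_mean = anomalies.mean(dim=("lat", "lon"))
first_month_map = anomalies.isel(time=0)
```

From here, `spatial_mean.plot()` gives a line plot and `first_month_map.plot()` gives a map.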



## Now for a different kind of exercise



-   Go to [http://xarray.pydata.org/en/stable/examples/monthly-means.html](http://xarray.pydata.org/en/stable/examples/monthly-means.html)
-   Let's walk through each section
-   Take a few minutes to understand the code and then discuss!



## Dask



![img](images/dask.png)



## A flexible library for parallel computing



-   Dynamic task scheduling
-   "Big data" collections
-   Interoperability with existing libraries (numpy, pandas, xarray)



## Dask's distributed scheduler



![img](images/collections-schedulers.png)



## Dask arrays



-   Dask divides arrays into many small pieces, called chunks
-   Each chunk is presumed to be small enough to fit into memory
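
To make the idea of chunks concrete, here is a small sketch using `dask.array` directly. The array shape and chunk sizes are arbitrary choices for illustration.

```python
import dask.array as dsa

# A 1000 x 1000 array split into 250 x 250 chunks (sizes chosen arbitrarily)
x = dsa.ones((1000, 1000), chunks=(250, 250))

print(x.chunks)     # chunk sizes along each axis
print(x.numblocks)  # number of chunks along each axis

# Operations are lazy: this builds a task graph but computes nothing yet
total = x.sum()

# compute() runs the graph chunk by chunk and returns the result
result = total.compute()
print(result)
```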



## Before we start computing



Start a Dask client



In [1]:
from dask.distributed import Client, progress
client = Client(processes=False, threads_per_worker=4, n_workers=4)
client

## Let's read some data



We'll use the [Gridded Ensemble Precipitation and Temperature Estimates over the Contiguous United States](https://www.earthsystemgrid.org/dataset/gridded_precip_and_temp.html) dataset.



In [1]:
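# combine='nested' is required by recent xarray versions when concat_dim is given
ds = xr.open_mfdataset('nc/*.nc4', engine='netcdf4', combine='nested', concat_dim='ensemble', chunks={'time': 366})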

How big are our data?



In [1]:
print('ds size in GB {:0.2f}\n'.format(ds.nbytes / 1e9))

In [None]:
ds.info()

## What's our domain?



In [1]:
ds['mask'].plot()

## What do our arrays look like?



In [1]:
for name, da in ds.data_vars.items():
    print(name, da.data)

## Let's calculate some things



In [1]:
da_mean = ds['t_mean'].mean(dim='time')
da_mean

In [1]:
da_spread = da_mean.max(dim='ensemble') - da_mean.min(dim='ensemble')
da_spread

## What's happening?



In [1]:
from dask import visualize
visualize(da_mean)

## Actually computing results



In [None]:
out = da_spread.load()
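
Until this point nothing has actually been calculated; the arrays only hold a task graph. The sketch below illustrates that on a tiny synthetic array (the `demo` data and its dimension names are assumptions). It uses `.compute()`, which returns a new in-memory object and leaves the original lazy, whereas `.load()` loads the data into the original object in place.

```python
import numpy as np
import dask.array as dsa
import xarray as xr

# Tiny synthetic array with one chunk per ensemble member (hypothetical data)
demo = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("ensemble", "lat", "lon"),
).chunk({"ensemble": 1})

# Lazy: the result is still backed by a dask array
spread = demo.max(dim="ensemble") - demo.min(dim="ensemble")
print(isinstance(spread.data, dsa.Array))

# compute() executes the graph and returns a NumPy-backed DataArray
computed = spread.compute()
print(isinstance(computed.data, np.ndarray))
```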


## Or plotting results



In [1]:
da_spread.plot(robust=True, figsize=(10, 6))

## Homework



Download Sea Surface Temperature (SST) monthly data from [http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCDC/.ERSST/.version3b/.sst/datafiles.html](http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCDC/.ERSST/.version3b/.sst/datafiles.html).

-   Rename the 'T' dimension and coordinate to 'time' and convert its values from `months since 1960-01-01` to a `datetime` type.
-   Plot a map of the time-averaged SST.
-   Plot the zonal-mean, time-averaged SST.
-   Plot the time series of SST at latitude 0 and longitude 230 degrees.
-   Calculate and plot the El Niño 3.4 index from the SST data (the index is the area-averaged SST anomaly over 5S-5N and 170W-120W).
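
As a hint for the first item, here is one possible way to convert a "months since" axis by hand. The `sst_demo` array and its mid-month `T` values (0.5, 1.5, ...) are assumptions about how the file encodes time; check the actual values before relying on the `floor` step. With the real file, opening with `decode_times=False` may be needed so the raw numbers are available.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the SST data with an undecoded time axis
sst_demo = xr.DataArray(
    np.zeros(4),
    coords={"T": [0.5, 1.5, 2.5, 3.5]},
    dims="T",
    name="sst",
)
sst_demo["T"].attrs["units"] = "months since 1960-01-01"

# Rename the dimension and its coordinate in one step
renamed = sst_demo.rename({"T": "time"})

# Whole months elapsed since the reference date (assumes mid-month values)
months = np.floor(renamed["time"].values).astype(int)
origin = pd.Timestamp("1960-01-01")
dates = pd.DatetimeIndex([origin + pd.DateOffset(months=int(m)) for m in months])

# Replace the numeric axis with real datetimes
renamed = renamed.assign_coords(time=dates)
print(renamed["time"].values)
```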

