# Day 2, after lunch: Gridded data analysis and plotting
Focus: netCDF files. netCDF is a very useful standard, because the files
 * provide metadata alongside the data
 * provide structure (dimensions versus variables)

## Raw access to netCDF

Using the ``netCDF4`` library.

In [None]:
import netCDF4 as nc
f = nc.Dataset('d2s2/ei.ans.2010072912.800.T.nc', 'r')

So far, so easy. But what's in this file?

In [None]:
print('Variables:', f.variables)
print('Attributes:', f.ncattrs())

Retrieving variables is straightforward, but one has to be aware of a subtlety:

In [None]:
temp = f.variables['t']
# Retrieving the variable object versus the data
print(type(temp), type(temp[:]))

There is a difference between the variable object ``temp`` and the ``numpy.ndarray`` that contains the actual data! The variable object carries all the metadata, while the ``numpy`` array does not.

In [None]:
temp = temp[:]
print(temp.shape, temp.min(), temp.max())

This raw access is not particularly difficult or cumbersome, but maybe there is an even simpler way?

## Access through ``xarray``

The xarray package is very convenient for working with netCDF data, because it merges the variable object and the data array in one structure. 

Todo: Better names for ``t_da``.

In [None]:
import xarray as xr
f = xr.open_dataset('d2s2/ei.ans.2010072912.800.T.nc')
temp_var = f.variables['t']
t_da = f['t']
print(type(t_da),type(temp_var))
print(t_da)

The access pattern is very much the same as through the ``netCDF4`` module. However, there is the additional option of accessing ``f['t']``, which contains both the data and the metadata and can be used directly for analysis and calculations.

For example, the ``DataArray`` allows array indexing through dimension values rather than grid point indexes.

In [None]:
print(t_da.sel(latitude=60.5))
print(t_da.sel(latitude=60.5, longitude=5.5))
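By default, ``sel`` expects coordinate values that exist exactly on the grid. The following sketch uses a small synthetic ``DataArray`` (the grid, the random values and the name ``demo`` are assumptions made for illustration, not the course data) to show exact selection, nearest-neighbour selection with ``method='nearest'``, and selection of a coordinate range with ``slice``.

```python
import numpy as np
import xarray as xr

# Synthetic 0.5-degree grid, for illustration only
lat = np.arange(50.0, 71.0, 0.5)
lon = np.arange(0.0, 11.0, 0.5)
demo = xr.DataArray(
    np.random.default_rng(0).normal(280.0, 5.0, (lat.size, lon.size)),
    coords={'latitude': lat, 'longitude': lon},
    dims=('latitude', 'longitude'),
    name='t',
    attrs={'units': 'K'},
)

# Exact match: the requested values must exist on the grid
exact = demo.sel(latitude=60.5, longitude=5.5)

# Off-grid values: pick the closest grid point instead
nearest = demo.sel(latitude=60.4, longitude=5.6, method='nearest')

# A range of coordinate values; both end points are included
band = demo.sel(latitude=slice(55.0, 60.0))

print(float(nearest.latitude), float(nearest.longitude))
print(band.sizes['latitude'], exact.attrs)
```

Note that ``slice`` follows the order of the coordinate: if the latitudes in a file are stored from north to south, the slice has to be given as ``slice(60.0, 55.0)``.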

The results of these indexing operations are again of type ``DataArray``, and can hence be further manipulated and analysed.

In addition, they contain up-to-date metadata, so they can directly be saved to netCDF files.

In [None]:
from datetime import datetime as dt
tosave = t_da.sel(time=dt(2010,7,29,6))
tosave.to_netcdf('example.nc')
!ls -l example.nc

If you have ``ncdump`` available, you can also take a direct look at what we saved and verify that all metadata has automatically been transferred (and even adapted!) to correctly describe the new netCDF file.

In [None]:
!ncdump -h example.nc

## Plotting maps

Now that we have found an easy way to read and analyse gridded data, let's see how to visualise it. One of the most common types of visualisation is the map. To plot data on a map, we use the ``cartopy`` module, which is based on ``matplotlib``.

So let's see how to set up a map to plot on.

In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

# Create a matplotlib axes using the Lambert Conic Conformal projection
proj = ccrs.LambertConformal(
    central_longitude=-30, 
    central_latitude=58, 
    standard_parallels=(58, 58)
)
ax = plt.axes(projection=proj)

# Add coastlines and the lat-lon grid
ax.coastlines()
ax.gridlines(ylocs=range(15,76,15))

# set map boundaries by setting the x- and y-limits in meters
#ax.set_xlim([-5.4e6, 5.4e6])
#ax.set_ylim([-3.6e6, 3.6e6])

That is a lot of code for an empty shell of a map. Now let's fill it with some data. Fortunately, that is very easy using the standard plotting commands of ``matplotlib``, like ``contourf`` for a filled-contour plot.

In [None]:
# Explicitly set the Figure size
fig = plt.figure(figsize=(10.0,6.0), dpi=96) 
ax = plt.axes(projection=proj)

# A temperature map
showdate = t_da.time.values[0]
cs = ax.contourf(t_da.longitude, t_da.latitude, t_da.sel(time=showdate), 
            range(266,301,2),
            cmap='RdBu_r',
            transform=ccrs.PlateCarree(), # the projection of the data grid, here lat/lon, i.e. PlateCarree
)
ax.coastlines()
ax.gridlines(ylocs=range(15,76,15))

ax.set_xlim([-5.4e6, 5.4e6])
ax.set_ylim([-3.6e6, 3.6e6])

plt.colorbar(cs, fraction=0.1, shrink=0.9)

This temperature data is from the peak of the Russian heat wave in 2010, so let's shift the map focus towards Eurasia, and mark some cities on the map. We'll use the cities where we have temperature time series available.

For now we're only interested in the location of these cities (given in the metadata of the netCDF file), but later we'll also include the temperature time series in the plot.

ToDo: Have a separate look at the time series netCDF files before.

In [None]:
fig = plt.figure(figsize=(10.0,6.0), dpi=96) 

# new projection, centred on Moscow
proj = ccrs.LambertConformal(
    central_longitude=30, 
    central_latitude=55, 
    standard_parallels=(55, 55)
)

# Plot map
ax = plt.axes(projection=proj) # A single map axes filling the figure
cs = ax.contourf(t_da.longitude, t_da.latitude, t_da.sel(time=showdate), 
            range(266,301,2),
            cmap='RdBu_r',
            transform=ccrs.PlateCarree(), # the projection of the data grid, here lat/lon, i.e. PlateCarree
)
ax.coastlines()
ax.gridlines(ylocs=range(15,76,15))

ax.set_xlim([-3.2e6, 3.2e6])
ax.set_ylim([-2.4e6, 2.4e6])
plt.colorbar(cs, fraction=0.1, shrink=0.9)


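# obs and info from different places
stations = ['StPeter', 'Moscow', 'NizNovg']
# Display names for the stations; spelled out here from the abbreviations above
stationnames = ['St. Petersburg', 'Moscow', 'Nizhny Novgorod']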

# Add locations of stations on the map
for sidx, station, name in zip(range(len(stations)), stations, stationnames):
    f = xr.open_dataset('d2s2/ei.ans.2010.800.%s.nc' % station)
    lat = f['lat'].values[0]
    lon = f['lon'].values[0]
    ax.scatter(lon, lat, s=100, facecolor='C%d' % sidx, edgecolor='w', 
               linewidth=1, transform=ccrs.PlateCarree())

Finally, let's extend the plot by adding panels showing the temperature time series at these three locations. We'll set up a grid of subplots with 3 rows and 4 columns, using the entire 3x3 square on the left for the map, and the remaining column of 3 panels on the right for the time series.

This kind of panel setup can be achieved through the ``gridspec`` module of ``matplotlib``.

Further, we'll also add text annotations in the lower left corner of each panel as a very brief description of what is shown.
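Before combining everything, it can help to look at the panel layout on its own. The following is a minimal sketch with empty axes and placeholder labels (the names ``ax_map`` and ``ax_series`` and the label texts are made up for illustration); it leaves out the map projection and the data entirely.

```python
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

fig = plt.figure(figsize=(10.0, 4.32))

# 3 rows, 4 columns, no space between the panels
gs = gridspec.GridSpec(3, 4, hspace=0.0, wspace=0.0)

# The large panel spans all rows and all but the last column
ax_map = fig.add_subplot(gs[:, :-1])
ax_map.text(0.05, 0.05, 'map panel', transform=ax_map.transAxes)

# One small panel per row in the last column
ax_series = [fig.add_subplot(gs[row, -1]) for row in range(3)]
for row, ax in enumerate(ax_series):
    ax.yaxis.tick_right()  # tick labels on the right, away from the map
    # transAxes places the text relative to the panel: (0, 0) is lower left
    ax.text(0.05, 0.05, 'series %d' % row, transform=ax.transAxes)

print(gs.get_geometry())
```

Slicing the ``GridSpec`` works like slicing a 2D ``numpy`` array, with rows first and columns second, so ``gs[:, :-1]`` selects the left block and ``gs[row, -1]`` a single cell in the rightmost column.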


ToDo: Keep this as an open-ended suggestion for how to make a fancy plot. Keep the hints and suggestions for individual parts, maybe as a list (as there are probably many hints required for the below). Encourage experimenting beyond what is shown.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.gridspec as gridspec

# width with cb: 6.48 / 0.9 = 7.2, leaving 2.8 for the temperature curves
fig = plt.figure(figsize=(10.0,4.32), dpi=96)

# Setup panels
gs = gridspec.GridSpec(3,4)
gs.update(hspace=0.0, wspace=0.0)

# Plot map
axm = plt.subplot(gs[:,:-1], projection=proj) # Map axes covers everything but the rightmost column
cs = axm.contourf(t_da.longitude, t_da.latitude, t_da.sel(time=showdate), 
            range(266,301,2),
            cmap='RdBu_r',
            transform=ccrs.PlateCarree(), # the projection of the data grid, here lat/lon, i.e. PlateCarree
)
axm.coastlines()
axm.gridlines(ylocs=range(15,76,15))

axm.set_xlim([-3.2e6, 3.2e6])
axm.set_ylim([-2.4e6, 2.4e6])
plt.colorbar(cs, fraction=0.1, shrink=0.9)

# Convert showdate to datetime, to then be able to format a string
datestr = pd.to_datetime(showdate).strftime('%Y-%m-%d, %H UTC')
axm.text(0.05, 0.05, datestr, transform=axm.transAxes, 
         bbox=dict(boxstyle="round", fc='w', alpha=0.7))

# Plot temperature time series, 3 weeks before and after the peak of the heat wave
showperiod = (showdate - np.timedelta64(21, 'D'), showdate + np.timedelta64(21, 'D'))
for sidx, station, name in zip(range(len(stations)), stations, stationnames):
    color = 'C%d' % sidx
    f = xr.open_dataset('d2s2/ei.ans.2010.800.%s.nc' % station)
    T = f['T'].squeeze() # Remove length-1 dimensions (lev,lat,lon), leaving only time in this case

    lat = f['lat'].values[0]
    lon = f['lon'].values[0]
    axm.scatter(lon, lat, s=100, facecolor=color, edgecolor='w', 
               linewidth=1, transform=ccrs.PlateCarree())    
    
    ax = plt.subplot(gs[sidx,-1])
    ax.plot([showdate, showdate], [278,296], color='0.2', linestyle='--') # vertical line
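    # Boolean mask for the display period; the & operator works directly on DataArrays
    # and avoids xr.ufuncs, which is not available in all xarray versions
    slc = (T.time >= showperiod[0]) & (T.time <= showperiod[1])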
    #T[slc].plot(color='red') 
    ax.plot(T.time[slc], T[slc], color)
    ax.yaxis.tick_right()
    
    if station == stations[-1]:
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%d'))
        ax.set_xlabel('Day of Jul/Aug 2010')
    else:
        ax.set_xticks([])
    
    ax.text(0.05, 0.05, name, color=color, transform=ax.transAxes)

# After all this work, finally, a title for the entire figure.
fig.suptitle('800 hPa Temperatures [K]')