# Day 3, after lunch: Gridded data analysis and plotting
Focus: netCDF files. This is a very useful standard, because netCDF files
 * provide metadata
 * provide structure (dimensions versus variables)

We will work with temperatures at 850 hPa from the ERA5 reanalysis; (c) ECMWF. Data accessed through the Copernicus Climate Data Store (2020) and licensed under the Licence to use Copernicus Products.

## Raw access to netCDF

Using the ``netCDF4`` library.

In [None]:
import netCDF4 as nc

# interactive, code-along

So far, so easy. But what's in this file?

In [None]:
# interactive, code-along

Retrieving variables is straightforward, but one has to be aware of a subtlety:

In [None]:
# Retrieving the variable object versus the data

# interactive, code-along

There is a difference between the variable object ``temp`` and the ``numpy.ndarray`` that contains the actual data! The variable object contains all metadata while the ``numpy.ndarray`` does not.

In [None]:
# interactive, code-along

This raw access is not particularly difficult or cumbersome, but maybe there is an even simpler way?

## Access through ``xarray``

The xarray package is very convenient for working with netCDF data, because it merges the variable object and the data array in one structure. 

In [None]:
import xarray as xr

# interactive, code-along

The access pattern is very much the same as through the ``netCDF4`` module. However, there is the additional option of accessing ``f['t']``, which contains both the data and the metadata and can be used directly for analysis and calculations.

For example, the ``DataArray`` allows array indexing through dimension values rather than grid point indexes.
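A minimal sketch of value-based indexing, using a synthetic ``DataArray`` in place of the ERA5 field. The grid, the values and the selected points are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a temperature field; all values are invented
lat = np.arange(80.0, 29.0, -10.0)   # 80, 70, ..., 30 (north to south)
lon = np.arange(0.0, 60.0, 10.0)     # 0, 10, ..., 50
t = xr.DataArray(
    280.0 + np.random.default_rng(0).normal(size=(lat.size, lon.size)),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
    name="t",
    attrs={"units": "K", "long_name": "Temperature"},
)

# Select by coordinate value; method="nearest" picks the closest grid point
point = t.sel(latitude=56.0, longitude=38.0, method="nearest")

# Label-based slices include both end points and follow the coordinate
# order, which runs from north to south here
box = t.sel(latitude=slice(70, 50), longitude=slice(20, 40))

print(point.latitude.item(), point.longitude.item())
print(box.shape, box.attrs["units"])
```

The index-based counterpart is ``isel``, for example ``t.isel(latitude=0)``.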

In [None]:
# interactive, code-along

The results of these indexing operations are again of type ``DataArray``, and can hence be further manipulated and analysed.

In addition, they contain up-to-date metadata, so they can directly be saved to netCDF files.
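As a rough sketch of saving a subset, the example below writes to a temporary file and reads it back. The dataset is synthetic, the file name ``example_sketch.nc`` is hypothetical, and adding a time-stamped ``history`` attribute is only one assumed use of ``datetime``. Writing requires a netCDF backend such as ``netCDF4`` or ``scipy`` to be installed.

```python
import os
import tempfile
from datetime import datetime as dt

import numpy as np
import xarray as xr

# Synthetic dataset; values and attributes are invented
ds = xr.Dataset(
    {
        "t": (
            ("latitude", "longitude"),
            np.full((3, 4), 280.0, dtype="float32"),
            {"units": "K"},
        )
    },
    coords={"latitude": [60.0, 55.0, 50.0], "longitude": [30.0, 35.0, 40.0, 45.0]},
    attrs={"title": "Synthetic example"},
)

# Selecting a subset keeps coordinates and attributes consistent
subset = ds.sel(latitude=slice(60, 55))
subset.attrs["history"] = f"{dt.now():%Y-%m-%d %H:%M} subset created in sketch"

# Write to a temporary directory and read the file back as a check
path = os.path.join(tempfile.mkdtemp(), "example_sketch.nc")
subset.to_netcdf(path)

with xr.open_dataset(path) as check:
    saved_shape = check["t"].shape
    saved_units = check["t"].attrs["units"]
    print(saved_shape, saved_units, check.attrs["history"])
```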

In [None]:
from datetime import datetime as dt

# interactive, code-along

If you have ncdump available, you can also take a direct look at what we saved and verify that all metadata has automatically been transferred (and even adapted!) to correctly describe the new netCDF file.

In [None]:
# Won't work here, because ncdump is only available with another module.
# So maybe open a second terminal, ssh into cyclone, module load ncview,
# and have a look with ncview and ncdump at the example.nc you created
!ncdump -h example.nc

## Plotting maps

Now that we have found an easy way to read and analyse gridded data, let's see how to visualise it. One of the most common types of visualisation is a map. To plot data on a map, we use the ``cartopy`` module, which is based on ``matplotlib``.

So let's see how to set up a map to plot on.

In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

# interactive, code-along

A lot of text for an empty shell of a map. Now let's fill it with some data. Fortunately, that is very easy, using the standard plotting commands of ``matplotlib``, like ``contourf`` for a filled-contour plot.

In [None]:
# interactive, code-along

This temperature data is from the peak of the Russian heat wave in 2010, so let's shift the map focus towards Eurasia, and mark some cities on the map. We'll use the cities where we have temperature time series available.

For now we're only interested in the location of these cities (given in the metadata of the netCDF file), but later we'll also include the temperature time series in the plot.

As a first step, we'll have a brief exploratory look at the data. These are also netCDF files, so let's use ``xarray`` again.

In [None]:
ds = xr.open_dataset('d2s2/e5.ans.2010.850.Moscow.nc')
ds

So we have three four-dimensional arrays with the dimensions time, level, latitude and longitude. All dimensions except time are of length 1, so we essentially have three time series from one location. Let's have a quick look at the temperature data before we continue.
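Length-1 dimensions can be dropped with ``squeeze``. The sketch below uses a synthetic dataset with the same dimension layout; the variable name ``t``, the values and the coordinates are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic data laid out like the file above: only time is longer than 1
time = pd.date_range("2010-07-01", periods=8, freq="D")
shape = (time.size, 1, 1, 1)
ds_sketch = xr.Dataset(
    {
        "t": (
            ("time", "level", "latitude", "longitude"),
            np.linspace(290.0, 297.0, 8).reshape(shape),
            {"units": "K"},
        )
    },
    coords={"time": time, "level": [850], "latitude": [55.75], "longitude": [37.5]},
)

# squeeze() drops all length-1 dimensions, leaving a plain time series
series = ds_sketch["t"].squeeze()
print(series.dims, series.shape)

# The dropped dimensions survive as scalar coordinates, so the location is kept
print(float(series.latitude), float(series.longitude))
```

A quick look at such a series is then possible with ``series.plot()``.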

In [None]:
# interactive, code-along

Now that we have a first impression of the data, let's put the cities on the map.

## Exercise 1: Plotting city locations on the map

 * Adapt the map domain to show Eurasia rather than the North Atlantic
 * Take the location information for the three cities Moscow, St. Petersburg and Nizhny Novgorod and mark the respective locations on the map in different colors.

In [None]:
# new projection, centred roughly on Moscow
proj = ccrs.LambertConformal(
    central_longitude=30, 
    central_latitude=55, 
    standard_parallels=(55, 55)
)

# do the plotting

## Final plot exercise: Fancy plot with subpanels 

Finally, extend the plot by adding panels showing the temperature time series at these three locations.

 * Set up a 3x4 matrix of subplots (3 rows, 4 columns), using the entire 3x3 square on the left for the map, and the 3x1 column on the right for the three temperature time series. This kind of panel setup can be achieved through the ``gridspec`` module of ``matplotlib``.
 * Add annotations to each panel to give a brief description of what is shown. ``plt.text`` provides the required functionality.
 * Match the colors of the temperature time series to the color of the markers on the map. 
 * You can add a title for the entire figure rather than the individual panels with ``fig.suptitle`` (``matplotlib.figure.Figure.suptitle``).

Don't worry if you do not finish this exercise during the session, and feel very much free to experiment making your own kind of fancy plot!
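As a hint for the panel layout only, the sketch below shows how slices of a ``GridSpec`` map to axes. It uses plain axes and placeholder labels. For the real figure, the map panel would additionally need a ``projection=`` argument, and the plotting itself remains part of the exercise.

```python
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

fig = plt.figure(figsize=(10.0, 4.32))
gs = gridspec.GridSpec(3, 4)  # 3 rows, 4 columns

# Slicing the GridSpec lets one axes span several grid cells
ax_map = fig.add_subplot(gs[:, :3])                            # all rows, first three columns
ax_series = [fig.add_subplot(gs[row, 3]) for row in range(3)]  # right-hand column

# Placeholder annotations in axes-relative coordinates (0 to 1)
ax_map.text(0.02, 0.95, "(a) map", transform=ax_map.transAxes, va="top")
for label, ax in zip("bcd", ax_series):
    ax.text(0.05, 0.85, f"({label}) time series", transform=ax.transAxes)

fig.suptitle("Placeholder figure title")
```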

In [None]:
import numpy as np
import matplotlib.dates as mdates
import matplotlib.gridspec as gridspec
from datetime import timedelta as td

# width with cb: 6.48 / 0.9 = 7.2, leaving 2.8 for the temperature curves
fig = plt.figure(figsize=(10.0,4.32), dpi=96)

# Setup panels
gs = gridspec.GridSpec(3,4) # set up a 3-by-4 grid of subpanels
gs.update(hspace=0.0, wspace=0.0) # no space between subpanels

# do the plotting