# ERA5

## Import packages

In [None]:
# To reload external files automatically (e.g. utils)
%load_ext autoreload
%autoreload 2

import xarray as xr
import dask
import pandas as pd
import numpy as np
import calendar as cld
import matplotlib.pyplot as plt
import proplot as plot # New plot library (https://proplot.readthedocs.io/en/latest/)
plot.rc['savefig.dpi'] = 300 # 1200 is too big! #https://proplot.readthedocs.io/en/latest/basics.html#Creating-figures
from scipy import stats
import xesmf as xe # For regridding (https://xesmf.readthedocs.io/en/latest/)

# Import some extra functions from utils folder
import sys
sys.path.insert(1, 'utils') # to include the util directory
import utils as u # my personal functions
u.check_python_version()
u.check_virtual_memory()

## Download ERA5

1. Go to https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5
2. Search `ERA5 monthly averaged data on single levels from 1979 to present`
3. Download:
    - Product type: Monthly averaged reanalysis
    - Variable: 2m temperature
    - Select all years (except 2021) / months / time
    - Geographical area: Whole available region
    - Format: NetCDF (experimental)
4. Log in to submit the request (create an account if you don't have one)
5. Scroll back down the page and click on Submit Form
6. Click on download, cancel, then right-click on the download button and copy the link address, then paste it in the cell below, after the `wget` command

Remark: The file should be around 1 GB

Other remark: using `cdsapi`

Copernicus also provides a Python module, `cdsapi`, to handle file downloads.

On the same screen where you clicked `download`, clicking `Show API request` gives you the code copied in the next cell. *Here it is commented out, because cdsapi does not work in Binder, but it would work on a local installation. See https://cds.climate.copernicus.eu/api-how-to for details.*

This API ([Application Programming Interface](https://www.howtogeek.com/343877/what-is-an-api/)) is valid for the entire Copernicus climate and atmosphere data store, and is particularly useful for systematic downloads.



In [None]:
"""
import cdsapi

c = cdsapi.Client()

c.retrieve(
    'reanalysis-era5-single-levels-monthly-means',
    {
        'product_type': 'monthly_averaged_reanalysis',
        'variable': '2m_temperature',
        'year': [
            '1979', '1980', '1981',
            '1982', '1983', '1984',
            '1985', '1986', '1987',
            '1988', '1989', '1990',
            '1991', '1992', '1993',
            '1994', '1995', '1996',
            '1997', '1998', '1999',
            '2000', '2001', '2002',
            '2003', '2004', '2005',
            '2006', '2007', '2008',
            '2009', '2010', '2011',
            '2012', '2013', '2014',
            '2015', '2016', '2017',
            '2018', '2019', '2020',
        ],
        'month': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
        ],
        'time': '00:00',
        'format': 'netcdf',
    },
    'download.nc')


"""

In [None]:
!wget https://download-0008.copernicus-climate.eu/cache-compute-0008/cache/data6/adaptor.mars.internal-1642500706.3691368-19649-5-5a435782-6700-4aa6-991b-f43bad55b9eb.nc

7. Rename the downloaded file to `ERA5.nc`
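
If you prefer to do the renaming from the notebook, a minimal sketch with `pathlib` could look like this. The glob pattern `adaptor.mars.internal-*.nc` is an assumption based on the file name in the `wget` link above, and the helper name `rename_download` is hypothetical.

```python
from pathlib import Path


def rename_download(folder=".", pattern="adaptor.mars.internal-*.nc", target="ERA5.nc"):
    """Rename the most recently modified file matching `pattern` to `target`.

    Hypothetical helper: the pattern is an assumption based on the
    name of the file fetched by wget above.
    """
    folder = Path(folder)
    # Sort candidates by modification time so the newest one comes last
    candidates = sorted(folder.glob(pattern), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no file matching {pattern!r} in {folder}")
    destination = folder / target
    candidates[-1].rename(destination)
    return destination
```

Calling `rename_download()` in the working directory should leave a file named `ERA5.nc`. Alternatively, `wget -O ERA5.nc <link>` saves the download under the right name directly.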

## Read ERA5 file

In [None]:
ds = xr.open_dataset('ERA5.nc')
ds # ds as dataset

In [None]:
# Let's try to get the temperature in °C (uncomment this cell and run this once)
# da = ds.t2m - 273.15
# da

Oops, if you are on Binder the kernel has shut down! This is because the available RAM is very limited (2 GB), so we need a trick to make this computation. Fortunately, `xarray` integrates with `dask`, which allows easy parallel computation. Here we are not going to use parallelization; instead we take advantage of Dask to split the data into multiple chunks and reduce RAM usage.

See more: http://xarray.pydata.org/en/stable/user-guide/dask.html

## Check size

In [None]:
# https://stackoverflow.com/a/14822210/6344670
import math

def convert_size(size_bytes):
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

In [None]:
convert_size(ds.t2m.nbytes)

## Make chunks
http://xarray.pydata.org/en/stable/user-guide/dask.html#what-is-a-dask-array

![](http://xarray.pydata.org/en/stable/_images/dask_array.png)

In [None]:
ds.dims

In [None]:
ds = ds.chunk(chunks={"longitude": 360, "latitude": 360})
ds

In [None]:
ds.t2m.data

You can see that our dataset has been split into 12 chunks of about 250 MB each, which is better than the full 2 GB array! So let's try the conversion to °C again!
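
These numbers can be checked by hand. The sketch below assumes the usual 0.25° ERA5 grid (721 latitudes × 1440 longitudes), 504 monthly time steps (1979-2020) and 4-byte `float32` values; none of these are stated above, so compare them with the output of `ds.dims` and `ds.t2m.dtype`.

```python
import math

# Assumed array layout (check against ds.dims and ds.t2m.dtype)
n_time, n_lat, n_lon = 504, 721, 1440
itemsize = 4  # bytes per float32 value
chunk_lat, chunk_lon = 360, 360  # chunk sizes passed to ds.chunk()

# Number of chunks along each dimension (the last one may be smaller)
n_chunks = math.ceil(n_lat / chunk_lat) * math.ceil(n_lon / chunk_lon)

# Memory footprint of one full-size chunk and of the whole array, in MiB
chunk_mib = n_time * chunk_lat * chunk_lon * itemsize / 1024**2
total_mib = n_time * n_lat * n_lon * itemsize / 1024**2

print(f"{n_chunks} chunks, up to {chunk_mib:.0f} MiB each, {total_mib:.0f} MiB in total")
```

Under these assumptions the latitude axis needs three chunks (two of 360 rows and one of a single row) and the longitude axis four, which gives the 12 chunks reported by Dask.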

In [None]:
da = ds.t2m - 273.15
da

As you can see, the execution of the cell is almost instantaneous. Why is this? Because a `dask.array` is not loaded into memory until explicitly requested with `.load()`, `.compute()` or `.values`. Until then, it only builds a task graph to prepare the computation.
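
The same behaviour can be seen on a small synthetic array, which avoids touching the ERA5 file. The array shape, values and chunk sizes below are arbitrary choices for illustration.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for ds.t2m (arbitrary values in kelvin)
rng = np.random.default_rng(0)
t2m = xr.DataArray(
    rng.uniform(250.0, 300.0, size=(12, 8, 16)),
    dims=("time", "latitude", "longitude"),
    name="t2m",
).chunk({"latitude": 4, "longitude": 8})

# Lazy: only the task graph is extended, nothing is computed yet
celsius = t2m - 273.15
print(type(celsius.data))

# Eager: .compute() runs the graph and returns a NumPy-backed array
result = celsius.compute()
print(type(result.data))
```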

In [None]:
dask.visualize(da)

## Compute and plot climatology
Using the examples in the notebook `01_xarray_get_started.ipynb`, try to calculate the climatology and make a plot with a projection (directly with Cartopy or Proplot).

Remember to check the size of your `dask.array` before loading it into memory (`clim.load()`). Loading it makes plotting faster; otherwise the calculation is redone every time you make a plot.
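
`nbytes` is available on a lazy array without triggering the computation, so the size check itself is cheap. A minimal sketch on synthetic data (the shape and chunking are arbitrary, and the `_demo` names are only for illustration):

```python
import numpy as np
import xarray as xr

# Synthetic, chunked stand-in for the temperature array
da_demo = xr.DataArray(
    np.zeros((24, 8, 16), dtype="float32"),
    dims=("time", "latitude", "longitude"),
).chunk({"latitude": 4})

clim_demo = da_demo.mean("time")  # still lazy

# Size the result will occupy once loaded, in bytes
print(clim_demo.nbytes)

# Small enough: load once, then reuse the in-memory result for every plot
clim_demo = clim_demo.load()
```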

In [None]:
clim = da.mean('time')
clim

In [None]:
clim.load()

In [None]:
clim.plot(robust=True)

### Exercise
Try to make a figure with a geographical projection of your choice using proplot (or directly matplotlib/cartopy). Refer back to `01_xarray_get_started.ipynb` for help.

### Solution
Example of a solution with proplot

In [None]:
cmap='RdBu_r'
levels=plot.arange(-30,30,5)
extend='both'

fig, axs = plot.subplots(nrows=1, ncols=1, proj='cyl', axwidth=5)

axs[0].contourf(
    clim, colorbar='r', cmap=cmap, levels=levels, extend=extend, 
    colorbar_kw={'label': 'Near-surface air temperature [°C]'}
)

axs.format(
    labels=True, coast=True, borders=True,
    suptitle='ERA5 near-surface air temperature climatology (1979-2020)'
)

## Seasonal and regional plots
### Exercise
Try to make seasonal climatology plots focused on the country you come from.

Since the longitude goes from 0 to 360, things are a bit more complicated if your region straddles longitude 0. There are two solutions: either use the `.roll()` function to shift the whole dataset, or use a mask with `.where()` (the latter seems the easiest to me). With proplot you can also pass `globe=True` to `contourf()` to fill the gap at longitude 0.

Also note that the latitudes are in descending order, so you have to reverse the bounds in the `slice()` passed to `.sel()`.
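
Both points can be tried on a coarse synthetic grid before applying them to ERA5. The 10° grid and the box limits below are arbitrary illustration values.

```python
import numpy as np
import xarray as xr

# Coarse synthetic grid laid out like ERA5: latitude descending, longitude 0..360
lat = np.arange(90.0, -90.1, -10.0)
lon = np.arange(0.0, 360.0, 10.0)
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

latmin, latmax, lonmin, lonmax = 40, 60, -15, 25

# Ascending bounds on a descending coordinate select nothing
empty = field.sel(latitude=slice(latmin, latmax))

# Reversed bounds select the expected latitude band
band = field.sel(latitude=slice(latmax, latmin))

# Mask keeping longitudes on both sides of the 0° meridian;
# drop=True also removes the fully masked columns
box = band.where((band.longitude > 360 + lonmin) | (band.longitude < lonmax), drop=True)

print(empty.latitude.size, band.latitude.values, box.longitude.values)
```

Without `drop=True` the masked points are kept as NaN, which is usually enough for plotting.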

### Solution
Example over France

#### Select zone

In [None]:
cmap='RdBu_r'
levels=plot.arange(-4,20,2)
extend='both'
latmin=38 ; latmax=56 ; lonmin=-10 ; lonmax=15

fig, axs = plot.subplots(nrows=1, ncols=1, proj='cyl', axwidth=4)

axs[0].contourf(
    clim.sel(latitude=slice(latmax,latmin)).where( (clim.longitude > 360+lonmin) | (clim.longitude < lonmax)), 
    colorbar='r', cmap=cmap, levels=levels, extend=extend, globe=True, 
    norm='div', # norm=plot.Norm('diverging', fair=False),
    colorbar_kw={'label': 'Near-surface air temperature [°C]'}
)

axs.format(
    labels=True, coast=True, borders=True, reso='med',
    latlim=(latmin, latmax), lonlim=(lonmin, lonmax),
    suptitle='France annual climatology ERA5 (1979-2020)'
)

#### Compute seasonal climatologies

In [None]:
# Build the mask from da's own longitude coordinate (same values as clim.longitude)
clim_seas = da.sel(latitude=slice(latmax,latmin)).where( (da.longitude > 360+lonmin) | (da.longitude < lonmax)) \
                .groupby('time.season').mean('time').load()

#### Plot seasonal climatologies

In [None]:
cmap='RdBu_r'
levels=plot.arange(-15,25,2)
extend='both'

fig, axs = plot.subplots(nrows=2, ncols=2, proj='cyl')

seasons = ['DJF', 'MAM', 'JJA', 'SON']
for i, ax in enumerate(axs):
    m = ax.contourf(
        clim_seas.sel(season=seasons[i]), cmap=cmap, levels=levels, extend=extend, globe=True, 
        norm=plot.Norm('diverging', fair=False)
    )
    ax.format(title=seasons[i])

axs.format(
    labels=True, coast=True, abc=True, reso='med',
    latlim=(latmin, latmax), lonlim=(lonmin, lonmax),
    suptitle='France seasonal climatologies ERA5 (1979-2020)'
)

fig.colorbar(m, label='Near-Surface Air Temperature [°C]')
fig.save('img/france_seasonal_clim_t2m.jpg')

## Share your plots!

Go to the following link and upload your figure for your country!

https://app.mural.co/t/variabiliteclimatique4363/m/variabiliteclimatique4363/1638958705203/74c3a90eb11e8e6898545872d80b8a41f0cd90ff?sender=ufcbfba826e94d93c633c7410

## Compute trends

### Exercise: Make yearly mean

Start by resampling the data to annual frequency.

In [None]:
da_year = ###

### Solution

In [None]:
da_year = da.resample(time='Y').mean('time')
da_year.load()

### Exercise: Spatially averaged time series
Compute the spatial average of the yearly means.

In [None]:
ts = ###

### Solution

In [None]:
# Spatially averaged time series (I use a function in the utils folder, same as seen in the tutorial)
ts = u.spatial_average(da_year)
ts.load()
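
The helper `u.spatial_average` is not shown in this notebook. One common way to do it, sketched here as an assumption rather than as the helper's actual implementation, is to weight each grid point by the cosine of its latitude, since cells on a regular lat/lon grid shrink towards the poles. The function name `area_weighted_mean` and the synthetic field are illustrative.

```python
import numpy as np
import xarray as xr


def area_weighted_mean(da, lat="latitude", lon="longitude"):
    """Mean over lat/lon with cos(latitude) weights (illustrative sketch)."""
    weights = np.cos(np.deg2rad(da[lat]))
    return da.weighted(weights).mean([lat, lon])


# Synthetic field: 1 poleward of 60°, 0 elsewhere
lat = np.arange(90.0, -90.1, -10.0)
lon = np.arange(0.0, 360.0, 10.0)
polar = xr.DataArray(
    np.repeat((np.abs(lat) > 60).astype(float)[:, None], lon.size, axis=1),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

# The plain mean over-represents the polar rows; the weighted mean does not
print(float(polar.mean()), float(area_weighted_mean(polar)))
```

The weighted value comes out smaller than the plain mean, because the polar rows cover far less area than their share of grid points suggests.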

### Plot it

In [None]:
ts.plot()

### Make linear regression
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html

In [None]:
# Make linear regression
reg = stats.linregress(ts['time.year'], ts)
reg
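
To see what the fields of the result mean, it can help to run `linregress` on a series whose trend is known. The numbers below (a warming of 0.02 °C per year starting from 14 °C) are made up for the illustration.

```python
import numpy as np
from scipy import stats

years = np.arange(1979, 2021)

# Made-up series: exact linear warming of 0.02 °C per year
temperature = 14.0 + 0.02 * (years - years[0])

reg_demo = stats.linregress(years, temperature)

# The slope is per unit of x (here per year); multiply by 10 for °C/decade
print(f"{reg_demo.slope * 10:.2f} °C/dec, r = {reg_demo.rvalue:.2f}")
```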

In [None]:
fig, axs = plot.subplots(axwidth=5, aspect=2)

x = ts['time.year']

# Plot time series
axs[0].plot(x, ts)

# Plot regression
y = reg.slope*x + reg.intercept
axs[0].plot(x, y, color='k', linewidth=1, linestyle='--')

# Show regression
axs[0].format(
    ultitle='{:.2f} °C/dec (p-value: {:.2f})'.format(reg.slope*10, reg.pvalue)
)

axs.format(
    xlabel='year',
    ylabel='Near-Surface Air Temperature [°C]',
    suptitle='ERA5 global temperature time series'
)

### Spatial trends
Looping by hand over each lat/lon point would be cumbersome, so we can apply the regression to every grid point at once with xarray's `apply_ufunc`.

In [None]:
def trend(x, y, dim):
    return xr.apply_ufunc(
        stats.linregress, x, y,
        input_core_dims=[[dim], [dim]],
        output_core_dims=[[], [], [], [], []],
        vectorize=True
    )

In [None]:
%%time
for arr_name, arr in zip(
    ['slope', 'intercept', 'rvalue', 'pvalue', 'stderr'], 
    trend(da_year['time.year'], da_year, 'time')
):
    da_year[arr_name] = arr
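
A quick way to gain confidence in this pattern is to run it on a tiny synthetic cube with known slopes. The sketch below is a reduced variant that returns only the slope and intercept through a small wrapper (`_slope_intercept`, a hypothetical name); the grid size and slope values are arbitrary.

```python
import numpy as np
import xarray as xr
from scipy import stats


def _slope_intercept(x, y):
    """Return (slope, intercept) of a 1-D regression as a plain tuple."""
    res = stats.linregress(x, y)
    return res.slope, res.intercept


years = xr.DataArray(np.arange(1979, 2021), dims="time")

# Known slopes on a 3 x 4 grid, in °C per year
true_slope = xr.DataArray(
    0.01 * np.arange(1, 13, dtype=float).reshape(3, 4),
    dims=("latitude", "longitude"),
)
cube = true_slope * (years - 1979)  # dims: (latitude, longitude, time)

# Regress every grid point's time series against the years
slope, intercept = xr.apply_ufunc(
    _slope_intercept, years, cube,
    input_core_dims=[["time"], ["time"]],
    output_core_dims=[[], []],
    vectorize=True,
)

print(np.allclose(slope.values, true_slope.values))
```

The `time` dimension is consumed by the regression, so each output keeps only the `latitude` and `longitude` dimensions.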

In [None]:
fig, axs = plot.subplots(proj='cyl', axwidth=6)

cmap='ColdHot'
levels=plot.arange(-1,1,0.1)
extend='both'

m = axs[0].contourf(da_year.slope*10, cmap=cmap, levels=levels, extend=extend)
axs[0].contourf(da_year.pvalue.where(da_year.pvalue>0.05), hatches=['////'], alpha=0)

fig.colorbar(m, label='Near-Surface Air Temperature trends [°C/dec]', formatter=('simple', 3))

# Format
axs.format(
    labels=True, coast=True,
    suptitle='ERA5 Near-Surface Air Temperature annual trends (1979-2020)'
)

### Exercise: Show Arctic Amplification
Try to show Arctic Amplification using time series.

### Solution

In [None]:
ts_arctic = u.spatial_average(da_year.sel(latitude=slice(90,60)))

In [None]:
fig, axs = plot.subplots(axwidth=5, aspect=2)

x = ts['time.year']

ts_list = [ts, ts_arctic]
labels = ['Global', 'Arctic (>60°N)']

for i, t in enumerate(ts_list):
    # Compute clim on reference period to make anomalies
    clim = t.sel(time=slice('1979', '2000')).mean()
    
    # Plot time series
    axs[0].plot(x, t-clim, color='C'+str(i), label=labels[i])

    # Plot regression
    reg = stats.linregress(t['time.year'], t-clim)
    y = reg.slope*x + reg.intercept
    axs[0].plot(x, y, color='C'+str(i), linewidth=1, linestyle='--', label='{:.2f} °C/dec ({:.2f})'.format(reg.slope*10, reg.pvalue))

    
axs[0].legend(ncol=2)

# Add 0 line
axs[0].hlines(0, x.min(), x.max(), linewidth=0.5, alpha=0.5)

axs.format(
    xlabel='year',
    ylabel='Temperature anomalies [°C]',
    suptitle='ERA5 global temperature anomalies (with respect to 1979-2000)',
    xlim=(x.min(), x.max())
)