# ERA5 Datasets

In this Colab, we describe the ERA5 datasets associated with the GenFocal
paper and show how to load and visualize each one. These datasets have been
resampled to match the LENS2 grid.
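The Zarr store names used in this notebook (`era5_240x121_lonlat_*.zarr`) indicate a 240 x 121 longitude/latitude grid. Assuming an equiangular grid that includes both poles, that is 1.5° spacing, compared with the 0.25° native ERA5 grid. The notebook does not say which regridding method produced these files. The sketch below is only an illustration: it builds a synthetic 0.25° field and keeps every sixth point (1.5 / 0.25 = 6). Unlike area averaging, point subsampling can alias small-scale features, so treat this as an assumption, not the GenFocal procedure.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a native 0.25-degree ERA5 field (721 lat x 1440 lon).
# The values depend only on latitude; they are NOT real ERA5 data.
fine_lat = np.linspace(-90.0, 90.0, 721)
fine_lon = np.arange(1440) * 0.25
profile = 288.0 - 30.0 * np.sin(np.deg2rad(fine_lat)) ** 2
fine = xr.DataArray(
    np.repeat(profile[:, None], fine_lon.size, axis=1),
    coords={"latitude": fine_lat, "longitude": fine_lon},
    dims=("latitude", "longitude"),
    name="2m_temperature",
    attrs={"units": "K"},
)

# Illustrative point subsampling: keeping every sixth point gives a
# 121 x 240 grid at 1.5-degree spacing that includes both poles.
coarse = fine.isel(latitude=slice(None, None, 6), longitude=slice(None, None, 6))

print(dict(coarse.sizes))
```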




First, install the packages needed to read the datasets from Google Cloud Storage and visualize them.

In [None]:
# @title Install dependencies
!pip install -q zarr xarray[complete] fsspec aiohttp requests gcsfs cartopy \
  cfgrib eccodes cf_xarray pint_xarray


In [None]:
# @title Imports
import h5py
import gcsfs
import matplotlib.pyplot as plt
from google.colab import auth
from google.cloud import storage

from datetime import datetime
import pandas as pd
import numpy as np
from cartopy import config
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import xarray as xr
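import cf_xarray.units  # CF-convention-aware pint unit registry (import before pint_xarray)
import pint_xarray      # adds the .pint accessor to xarray objects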


In [None]:
# @title Plotting Functions

def plot_scalar(data, lat_min, lat_max, lon_min, lon_max):
    """
    Plots a single 2D scalar field on a map with a colorbar.

    Args:
        data: 2D xarray DataArray with `latitude` and `longitude` coordinates.
        lat_min: Minimum latitude for the plot.
        lat_max: Maximum latitude for the plot.
        lon_min: Minimum longitude for the plot.
        lon_max: Maximum longitude for the plot.
    """

    fig, axs = plt.subplots(nrows=1, ncols=1,
                            subplot_kw={'projection': ccrs.PlateCarree()},
                            figsize=(12, 6))

    # Scale the colormap to the range of the field being plotted
    vmin = float(data.min())
    vmax = float(data.max())

    # Plot the field
    im1 = data.plot(
        ax=axs, transform=ccrs.PlateCarree(), add_colorbar=False,
        x='longitude', y='latitude',
        vmin=vmin, vmax=vmax,
        cmap='viridis'
    )

    # Add coastlines and country borders
    axs.coastlines()
    axs.add_feature(cfeature.BORDERS)
    # Set plot extent
    axs.set_extent([float(lon_min), float(lon_max), float(lat_min), float(lat_max)],
                   crs=ccrs.PlateCarree())

    # Label the colorbar with the variable name and its `units` attribute, if present,
    # so the label stays correct for non-temperature variables
    units = data.attrs.get('units', 'unknown units')
    cbar = plt.colorbar(im1, ax=axs, shrink=0.7)
    cbar.set_label(f'{data.name} ({units})')

    plt.show()

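`plot_scalar` passes its bounds straight to Cartopy's `set_extent`. If the longitude coordinate runs from 0 to 358.5 (an assumption based on the 240-point grid; check `era5_lon` in your session), a region that straddles the prime meridian, such as Europe, cannot be selected as one contiguous longitude slice. A common xarray idiom, sketched below on synthetic data with a hypothetical helper `to_pm180`, relabels longitudes to [-180, 180) and re-sorts them.

```python
import numpy as np
import xarray as xr

# Synthetic field on a 1.5-degree grid with longitudes in [0, 360) (hypothetical data).
lat = np.linspace(-90.0, 90.0, 121)
lon = np.arange(240) * 1.5
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)


def to_pm180(da):
    """Hypothetical helper: relabel longitudes from [0, 360) to [-180, 180) and sort."""
    wrapped = (da["longitude"].values + 180.0) % 360.0 - 180.0
    return da.assign_coords(longitude=("longitude", wrapped)).sortby("longitude")


shifted = to_pm180(field)

# A box straddling the prime meridian is now a single contiguous slice.
# Latitude is ascending here; for a descending coordinate, reverse the slice bounds.
europe = shifted.sel(latitude=slice(35, 70), longitude=slice(-10, 30))
print(dict(europe.sizes))
```

The shifted field could then be plotted with, for example, `plot_scalar(europe, 35, 70, -10, 30)`.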

Next, authenticate with Google Cloud so the notebook can access the GenFocal bucket.

In [None]:
auth.authenticate_user()

## ERA5 Datasets
We provide the copy of the ERA5 dataset used to train the LENS2 model (`era5_1980_2020_dataset`),
as well as a small 2010-2011 subset (`era5_2010_2011_dataset`) for demonstration purposes.

In [None]:
era5_1980_2020_dataset = xr.open_zarr(
    "gs://genfocal/data/era5/era5_240x121_lonlat_1980-2020_10_vars.zarr",
    consolidated=True,
)

era5_1980_2020_dataset

In [None]:
era5_2010_2011_dataset = xr.open_zarr(
    "gs://genfocal/data/era5/era5_240x121_lonlat_2010-2011_10_vars.zarr",
    consolidated=True,
)

era5_2010_2011_dataset
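As its filename indicates, the demonstration subset covers 2010-2011. A comparable period can be cut from any xarray dataset with label-based time selection. The sketch below uses a small synthetic daily dataset in place of the GenFocal stores; whether the published subset was produced exactly this way is an assumption. On the real Zarr-backed dataset, the same `.sel` call is lazy until `.compute()` is called.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily dataset standing in for the 1980-2020 store (hypothetical values).
time = pd.date_range("2009-01-01", "2012-12-31", freq="D")
ds = xr.Dataset(
    {"2m_temperature": (("time", "latitude", "longitude"),
                        np.full((time.size, 3, 4), 280.0))},
    coords={"time": time,
            "latitude": [-1.5, 0.0, 1.5],
            "longitude": [0.0, 1.5, 3.0, 4.5]},
)

# Partial date strings select whole periods: "2010" through "2011" covers
# 2010-01-01 up to and including 2011-12-31.
subset = ds.sel(time=slice("2010", "2011"))
print(subset.time.values[0], subset.time.values[-1], subset.sizes["time"])
```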

In [None]:
# @title Example Plot
surface_variable_name = "2m_temperature" # @param ["2m_temperature","geopotential","mean_sea_level_pressure"]
date = "2015-08-01" # @param {type:"date"}
time_slice = slice(date, date)

# Select the chosen day from the full 1980-2020 record (2015 is outside the 2010-2011 subset)
scalar_array_daily = (
    era5_1980_2020_dataset[surface_variable_name]
    .sel(time=time_slice)
    .squeeze()
    .compute()
)
era5_lat = era5_1980_2020_dataset.latitude
era5_lon = era5_1980_2020_dataset.longitude
plot_scalar(scalar_array_daily, era5_lat.min(), era5_lat.max(),
            era5_lon.min(), era5_lon.max())
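The Example Plot cell selects a day with `slice(date, date)`. With a partial date string, xarray (via pandas) includes every timestamp on that day, so `.squeeze()` only yields a 2D map if the store holds one time step per day. If the data were sub-daily (the time step is not stated here), averaging over `time` is a more robust way to get a single daily field. The sketch below shows the difference on synthetic 6-hourly data; the values are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 6-hourly data over two days (hypothetical; used only to show the indexing).
time = pd.date_range("2015-08-01", periods=8, freq="6h")
da = xr.DataArray(
    np.arange(8, dtype=float)[:, None, None] * np.ones((1, 3, 4)),
    coords={"time": time,
            "latitude": [-1.5, 0.0, 1.5],
            "longitude": [0.0, 1.5, 3.0, 4.5]},
    dims=("time", "latitude", "longitude"),
)

date = "2015-08-01"
day = da.sel(time=slice(date, date))  # all four timestamps on 2015-08-01
still_3d = day.squeeze()              # squeeze cannot drop a time axis of length 4
daily_mean = day.mean("time")         # 2D (latitude, longitude) field
print(still_3d.dims, daily_mean.dims)
```

On the real dataset, the equivalent would be `era5_1980_2020_dataset[surface_variable_name].sel(time=time_slice).mean("time").compute()`.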
