# Demo 1.2.1 - Downloading Data

In this demo, we download ERA5 data to use for the model predictions.
We use the CDS API through the `cdsapi` Python client.
You will need a CDS account; follow [these instructions](https://cds.climate.copernicus.eu/api-how-to) to get started.
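
The `cdsapi` client reads its credentials from a `.cdsapirc` file in the home directory, which holds a `url:` line and a `key:` line. A minimal sketch for checking that file before sending any request could look like this; the helpers `read_cdsapirc` and `check_credentials` are hypothetical and not part of `cdsapi`.

```python
from pathlib import Path


def read_cdsapirc(path: Path) -> dict:
    """Parse `name: value` lines from a CDS API credentials file (hypothetical helper)."""
    config = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # split on the first colon only, since URLs and keys may contain colons
        name, value = line.split(":", 1)
        config[name.strip()] = value.strip()
    return config


def check_credentials(path: Path = Path.home() / ".cdsapirc") -> bool:
    """Return True if the file exists and defines both `url` and `key`."""
    if not path.exists():
        return False
    config = read_cdsapirc(path)
    return "url" in config and "key" in config
```

Calling `check_credentials()` at the top of the notebook gives an early, readable failure instead of an error in the middle of a download.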

In this demo, we will do the following:
* Get the necessary variables for each model
* Download the Single Level Variables from the CDS
* Download the Multiple Level Variables from the CDS

In this demo, we do **not** cover:
* Downloading multiple timestamps
* Smart downloading of spatiotemporal fields
* Using other APIs

In [1]:
import autoroot
import ee
import xarray as xr
import cdsapi
import climetlab as cml
from pathlib import Path
import pprint
from ddf._src.data.era5.ops import parse_single_levels, parse_pressure_levels, parse_all_variables
from ddf._src.models.variables import EARTH2MIP_MODEL_VARIABLES
from ddf._src.variables.era5 import VARIABLE_CODES

%load_ext autoreload
%autoreload 2


### Model Variables

First, we gather the variables needed by each model we will experiment with.
Only the following models are available here:
* `pangu`
* `fcnv2_sm`

We take the union of the "channel names" across both models.

In [2]:
all_variables_names = list(set(EARTH2MIP_MODEL_VARIABLES["pangu"] + EARTH2MIP_MODEL_VARIABLES["fcnv2_sm"]))

Next, a custom parser goes through these names and sorts them into surface (single-level) variables and pressure-level variables.

In [3]:
# parse single level variables
sl_variables = parse_single_levels(all_variables_names)

# parse pressure level variables
pl_variables = parse_pressure_levels(all_variables_names)
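
To make the idea of the split concrete, here is a minimal sketch of one possible rule. It is not the implementation of `parse_single_levels` or `parse_pressure_levels`. It assumes that pressure-level channels are written as a short variable code followed by the level in hPa (for example `z500`), while single-level channels either have no digits or end in a unit suffix (for example `t2m`, `u10m`). The function `split_channels` is hypothetical.

```python
import re

# Assumption: pressure-level channels are letters followed by the level in hPa
# and nothing else (e.g. "z500"); everything else is a single-level field.
PRESSURE_LEVEL_PATTERN = re.compile(r"^([a-z]+?)(\d+)$")


def split_channels(names):
    """Split channel names into single-level names and (code, level) pairs."""
    single, pressure = [], []
    for name in sorted(names):
        match = PRESSURE_LEVEL_PATTERN.match(name)
        if match:
            pressure.append((match.group(1), int(match.group(2))))
        else:
            single.append(name)
    return single, pressure


single, pressure = split_channels(["t2m", "u10m", "z500", "msl", "q1000", "u100m"])
```

Here `t2m`, `u10m` and `u100m` stay in the single-level group because of their trailing `m`, while `z500` and `q1000` are split into a code and a level.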

### Inspection

This creates a custom data structure for each variable that records its name under the various naming conventions, along with its units.

In [4]:
sl_variables[0]

VWind10m(name='v10m', short_name='v10', long_name='10 metre V wind component', era5_name='10m_v_component_of_wind', ecmwf_gid=166, units='meters / second')

In [5]:
pl_variables[0]

WindU(short_name='u', long_name='U Component of Wind', era5_name='u_component_of_wind', standard_name='eastward_wind', ecmwf_gid=131, cmip_name='', level=1000, units='meters / second')
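
The records printed above can be pictured as small immutable dataclasses. The sketch below is an illustration under that assumption, not the actual class definition in `ddf`; the class name `SingleLevelVariable` is hypothetical, and the field values are copied from the `VWind10m` output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleLevelVariable:
    """Hypothetical record mirroring the fields printed for VWind10m."""
    name: str
    short_name: str
    long_name: str
    era5_name: str
    ecmwf_gid: int
    units: str


v10 = SingleLevelVariable(
    name="v10m",
    short_name="v10",
    long_name="10 metre V wind component",
    era5_name="10m_v_component_of_wind",
    ecmwf_gid=166,
    units="meters / second",
)

# One record translates between conventions, e.g. channel name -> ECMWF code
lookup = {v10.name: v10.ecmwf_gid}
```

Keeping every naming convention on one record means the model channel name, the ERA5 name and the ECMWF parameter code never drift apart.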

For example, we can list the distinct short names through the `short_name` attribute of each data structure.

In [6]:
set(map(lambda x: x.short_name, pl_variables))

{'q', 'r', 't', 'u', 'v', 'z'}

As another example, we can collect every pressure level that appears for at least one variable.
Not every variable necessarily needs every level (although that is very likely), but requesting the full set makes the download simpler.

In [7]:
set(map(lambda x: x.level, pl_variables))

{50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000}
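
To check whether every variable really uses every level, the levels can be grouped per short name. The following is a minimal sketch on made-up stand-in records; `Var` is a hypothetical substitute for the parsed data structures and only carries the two fields used here.

```python
from collections import defaultdict, namedtuple

# Stand-in for the parsed records; only the two fields used here.
Var = namedtuple("Var", ["short_name", "level"])
variables = [Var("z", 500), Var("z", 850), Var("t", 850), Var("q", 500), Var("q", 850)]

# Group the levels by variable short name
levels_by_variable = defaultdict(set)
for var in variables:
    levels_by_variable[var.short_name].add(var.level)

# Union of all levels, which is what a single request would ask for
all_levels = set().union(*levels_by_variable.values())

# Variables that do not use every level in the union
incomplete = {
    name: sorted(all_levels - levels)
    for name, levels in levels_by_variable.items()
    if levels != all_levels
}
```

An empty `incomplete` dictionary means the union request downloads nothing that is not needed.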

### Creating Requests

Now we create the requests for the CDS API.
The API has a standard request format, which the helper functions below partly automate.
We need separate requests for:
* single level variables
* multi-level variables

In [8]:
from ddf._src.dtypes.time import Time
from ddf._src.dtypes.grid import Grid, RES025
from ddf._src.dtypes.region import Region, GLOBE
from ddf._src.data.era5.download import (
    create_request_single_level, 
    create_request_pressure_level,
    create_request_single_level_multi,
    create_request_pressure_level_multi
)

Some syntactic sugar to define the key things we need.
The most important is the time.
For our API, everything is saved per timestamp.
You may also need the region boundaries and the grid resolution.

In [9]:
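import datetime

client = None
d = None

# a single timestamp: 2021-08-01 00:00
time = datetime.datetime(year=2021, month=8, day=1)
time = Time(time)
save_format = "netcdf"  # output file format
region = GLOBE  # global domain
grid = RES025  # 0.25 degree grid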
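
As an illustration of what a region and a grid resolution imply for the download, the sketch below computes the number of grid points for a bounding box. `BoundingBox` and `grid_shape` are hypothetical stand-ins, not the `Region` and `Grid` classes from `ddf`. The CDS `area` key expects the order North, West, South, East.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical stand-in for a region, in degrees."""
    north: float
    west: float
    south: float
    east: float

    def as_cds_area(self):
        # CDS order: North, West, South, East
        return (self.north, self.west, self.south, self.east)


def grid_shape(box: BoundingBox, resolution: float):
    """Number of (latitude, longitude) points on a regular grid.

    Both latitude bounds are included. Longitude wraps around, so a full
    360 degree span does not repeat the first meridian.
    """
    n_lat = round((box.north - box.south) / resolution) + 1
    n_lon = round((box.east - box.west) / resolution)
    if box.east - box.west < 360:
        n_lon += 1
    return n_lat, n_lon


globe = BoundingBox(north=90, west=-180, south=-90, east=180)
shape = grid_shape(globe, 0.25)
```

For the global domain at 0.25 degrees this gives 721 latitudes and 1440 longitudes per field.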

### CDS Requests - Single Levels

Now we can generate the dictionary needed to make the request through the API.
The helper also returns a file name derived from the timestamp.
Feel free to change it as you see fit.

In [10]:
# create request
dataset, request, save_name = create_request_single_level_multi(
    sl_variables,
    time=time,
    region=region,
    grid=grid,
    save_format="netcdf",
)
dataset, pprint.pprint(request), save_name

{'area': (90, -180, -90, 180),
 'day': ['01'],
 'format': 'netcdf',
 'grid': (0.25, 0.25),
 'month': ['08'],
 'param': '166/137/165/167/228247/151/228246/134',
 'product_type': 'reanalysis',
 'time': ['00:00'],
 'year': ['2021']}


('reanalysis-era5-single-levels', None, 'reanalysis-202108010000-sl.nc')
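
For readers who want to see where the time-related keys and the file name come from, here is a minimal sketch that rebuilds them from a `datetime`. It only reproduces the shape of the output above; the helpers `time_fields` and `save_name_for` are hypothetical and are not the functions in `ddf`.

```python
import datetime


def time_fields(when: datetime.datetime) -> dict:
    """Split a timestamp into the zero-padded string lists seen in the request."""
    return {
        "year": [when.strftime("%Y")],
        "month": [when.strftime("%m")],
        "day": [when.strftime("%d")],
        "time": [when.strftime("%H:%M")],
    }


def save_name_for(when: datetime.datetime, suffix: str) -> str:
    """Build a file name such as 'reanalysis-202108010000-sl.nc'."""
    return f"reanalysis-{when.strftime('%Y%m%d%H%M')}-{suffix}.nc"


when = datetime.datetime(year=2021, month=8, day=1)
manual_request = {
    "product_type": "reanalysis",
    "format": "netcdf",
    "area": (90, -180, -90, 180),
    "grid": (0.25, 0.25),
    "param": "166/137/165/167/228247/151/228246/134",
    **time_fields(when),
}
```

Because each time key is a list, the same structure could in principle hold several values per key, although this demo only requests one timestamp.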

Side note: we can verify the request by translating its parameter codes back into variable names.

In [11]:
print("Variable Names:", set(map(lambda x: x.short_name, sl_variables)))
request_names = [VARIABLE_CODES[int(code)]().short_name for code in request["param"].split("/")]
print("Requests Names:", request_names)

Variable Names: {'u10', 'u100', 'tcwv', 'v100', 'msl', 'v10', 't2m', 'sp'}
Requests Names: ['v10', 'tcwv', 'u10', 't2m', 'v100', 'msl', 'u100', 'sp']
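
Comparing the two printed collections by eye gets error-prone as the list grows, so the same check can be written as a set comparison. The sketch below is self-contained: the code-to-name table is read off the request and output above rather than taken from `VARIABLE_CODES`, and `names_in_request` is a hypothetical helper.

```python
# Code-to-short-name pairs read off the request and the printed names.
CODE_TO_SHORT_NAME = {
    166: "v10", 137: "tcwv", 165: "u10", 167: "t2m",
    228247: "v100", 151: "msl", 228246: "u100", 134: "sp",
}


def names_in_request(param: str) -> set:
    """Translate a '/'-separated parameter string into short names."""
    return {CODE_TO_SHORT_NAME[int(code)] for code in param.split("/")}


expected = {"u10", "u100", "tcwv", "v100", "msl", "v10", "t2m", "sp"}
requested = names_in_request("166/137/165/167/228247/151/228246/134")

# Anything in one set but not the other points at a parsing problem
missing, extra = expected - requested, requested - expected
assert not missing and not extra, (missing, extra)
```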


Now, we can make a request by using the CDSAPI client that they provide.

In [12]:
c = cdsapi.Client()
save_dir = "/pool/proyectos/CLINT/sa4attrs/data/raw/events/test"
save_path = Path(save_dir).joinpath(save_name)
c.retrieve(dataset, request, save_path) 

2024-06-26 17:52:52,665 INFO Welcome to the CDS
2024-06-26 17:52:52,666 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-single-levels
2024-06-26 17:52:52,737 INFO Request is queued
2024-06-26 17:52:53,772 INFO Request is running
2024-06-26 17:53:01,005 INFO Request is completed
2024-06-26 17:53:01,005 INFO Downloading https://download-0015-clone.copernicus-climate.eu/cache-compute-0015/cache/data6/adaptor.mars.internal-1719417178.5173264-8212-15-f5c94c85-8341-49ba-b7c0-4e758b407091.nc to /pool/proyectos/CLINT/sa4attrs/data/raw/events/test/reanalysis-202108010000-sl.nc (15.9M)
2024-06-26 17:53:01,728 INFO Download rate 22M/s    


Result(content_length=16623508,content_type=application/x-netcdf,location=https://download-0015-clone.copernicus-climate.eu/cache-compute-0015/cache/data6/adaptor.mars.internal-1719417178.5173264-8212-15-f5c94c85-8341-49ba-b7c0-4e758b407091.nc)
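
Since each request is queued on the CDS side, it can be worth skipping files that are already on disk. The following is a minimal sketch of such a guard; `retrieve_if_missing` is a hypothetical helper that works with any object exposing `retrieve(dataset, request, target)`, such as `cdsapi.Client()`.

```python
from pathlib import Path


def retrieve_if_missing(client, dataset: str, request: dict, target: Path) -> bool:
    """Download only when `target` does not exist yet.

    `client` is any object with a `retrieve(dataset, request, target)` method.
    Returns True if a download was triggered, False if the file was reused.
    """
    target = Path(target)
    if target.exists():
        return False
    # make sure the output directory exists before the client writes to it
    target.parent.mkdir(parents=True, exist_ok=True)
    client.retrieve(dataset, request, str(target))
    return True
```

With the names from the cell above, the call would be `retrieve_if_missing(c, dataset, request, save_path)`.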

### CDS Requests - Multiple Pressure Levels

We now repeat the process for the pressure-level variables.
The steps are the same as in the previous section.

In [13]:
dataset, request, save_name = create_request_pressure_level_multi(
    codes=pl_variables, 
    time=time, 
    save_format="netcdf",
)
pprint.pprint(request), save_name

{'area': (90, -180, -90, 180),
 'day': ['01'],
 'format': 'netcdf',
 'grid': (0.25, 0.25),
 'month': ['08'],
 'param': '129/133/130/132/157/131',
 'pressure_level': [100,
                    1000,
                    200,
                    300,
                    400,
                    50,
                    850,
                    500,
                    150,
                    600,
                    250,
                    700,
                    925],
 'product_type': 'reanalysis',
 'time': ['00:00'],
 'year': ['2021']}


(None, 'reanalysis-202108010000-pl.nc')

Again, we check the request against the parsed variables.

In [18]:
print("Variable Names:", set(map(lambda x: x.short_name, pl_variables)))
request_names = [VARIABLE_CODES[int(code)]().short_name for code in request["param"].split("/")]
print("Requests Names:", request_names)

Variable Names: {'q', 't', 'r', 'u', 'z', 'v'}
Requests Names: ['z', 'q', 't', 'v', 'r', 'u']


In [17]:
print("Variable Levels:", set(map(lambda x: x.level, pl_variables)))
print("Request Levels:", request["pressure_level"])

Variable Levels: {100, 1000, 200, 300, 400, 50, 850, 500, 150, 600, 250, 700, 925}
Request Levels: [100, 1000, 200, 300, 400, 50, 850, 500, 150, 600, 250, 700, 925]


In [15]:
c = cdsapi.Client()
save_dir = "/pool/proyectos/CLINT/sa4attrs/data/raw/events/test"
save_path = Path(save_dir).joinpath(save_name)
c.retrieve(dataset, request, save_path) 

2024-06-26 17:53:06,927 INFO Welcome to the CDS
2024-06-26 17:53:06,927 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
2024-06-26 17:53:06,991 INFO Request is queued
2024-06-26 17:53:09,560 INFO Request is running
2024-06-26 17:53:20,358 INFO Request is completed
2024-06-26 17:53:20,359 INFO Downloading https://download-0011-clone.copernicus-climate.eu/cache-compute-0011/cache/data6/adaptor.mars.internal-1719417195.7378318-312-6-13b124be-dc8a-4901-b409-d79bff872232.nc to /pool/proyectos/CLINT/sa4attrs/data/raw/events/test/reanalysis-202108010000-pl.nc (154.5M)
2024-06-26 17:53:28,856 INFO Download rate 18.2M/s 


Result(content_length=161976836,content_type=application/x-netcdf,location=https://download-0011-clone.copernicus-climate.eu/cache-compute-0011/cache/data6/adaptor.mars.internal-1719417195.7378318-312-6-13b124be-dc8a-4901-b409-d79bff872232.nc)
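
As a sanity check on the download sizes, the payload can be estimated from the number of fields and the grid size. The sketch below assumes a global 0.25 degree grid of 721 by 1440 points and assumes that each value is stored as a packed 16-bit integer, which is not confirmed by the logs; `estimate_bytes` is a hypothetical helper.

```python
def estimate_bytes(n_fields: int, n_lat: int = 721, n_lon: int = 1440,
                   bytes_per_value: int = 2) -> int:
    """Payload size of `n_fields` global 0.25 degree fields, excluding metadata.

    Assumption: values are stored as packed 16-bit integers.
    """
    return n_fields * n_lat * n_lon * bytes_per_value


single_level = estimate_bytes(8)         # 8 single-level variables
pressure_level = estimate_bytes(6 * 13)  # 6 variables on 13 levels
```

Under these assumptions the estimates are 16,611,840 and 161,965,440 bytes, each roughly 12 kB below the reported `content_length` values of 16,623,508 and 161,976,836, a gap that would be consistent with file metadata.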