# Climate Data Store Download Demo

This notebook illustrates how to use the Climate Data Store (CDS) API to download data files from the CDS directly to the Orion HPC system. The example downloads files from the AgERA5 dataset.

Created: March 2024

By: Kerrie Geil, Associate Research Professor, Geosystems Research Institute, Mississippi State University

# Set up

Prerequisites for running this notebook:
1) Self-register at the [CDS registration page](https://cds.climate.copernicus.eu/user/register?destination=%2F%23!%2Fhome) if you don't already have an account.
2) Create a file called .cdsapirc in your home directory containing the CDS API url and your API key ([instructions for installing the CDS API key](https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key)). You can use the nano text editor to create the file at the command line.
3) Create a conda environment with the cdsapi and ipykernel packages installed, if you don't already have one.
4) Create a Jupyter kernel from the conda environment of the previous step, if you don't already have one.
5) Launch a Jupyter session:
    - on a development node,
    - with the kernel path in the additional Slurm parameters (e.g. /your-personal-work-directory/envs/cdsapi/share/jupyter)
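Before creating the client, it can help to confirm that step 2 worked. The following is a minimal sketch that is not part of the original notebook: the helper name `check_cdsapirc` is hypothetical, and it only checks that `~/.cdsapirc` exists and has non-empty `url:` and `key:` entries. It does not contact the CDS or validate the key itself.

```python
import os


def check_cdsapirc(path=os.path.expanduser('~/.cdsapirc')):
    """Return True if `path` exists and defines non-empty 'url' and 'key' entries.

    Hypothetical helper: it only inspects the file's layout and never
    contacts the Climate Data Store.
    """
    if not os.path.isfile(path):
        return False
    found = set()
    with open(path) as fh:
        for line in fh:
            # entries look like "name: value"; split on the first colon only,
            # because the url value itself contains a colon
            name, sep, value = line.partition(':')
            if sep and value.strip():
                found.add(name.strip())
    return {'url', 'key'} <= found
```

Running `check_cdsapirc()` in a cell should return `True` once the file is in place.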


In [None]:
import cdsapi
import os
import sys
import glob
import subprocess

In [None]:
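basedir='/path/to/our/shared/datasets/dir/'   # edit to your own datasets directory (keep the trailing slash)
outputdir=basedir+'temporary/AgERA5/'
year_start=2015   # first year to download
year_end=2016     # last year to download (inclusive)

# create the output directory if it doesn't exist
os.makedirs(outputdir, exist_ok=True)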

Let's say we want to download daily 2m maximum temperature data for several years from the AgERA5 dataset (here 2015 and 2016, as set by year_start and year_end above). Go to the [CDS data download page for AgERA5](https://cds.climate.copernicus.eu/cdsapp#!/dataset/sis-agrometeorological-indicators?tab=form). Select 2m temperature, 24 hour maximum, a single year, all months, all days, the latest version, global geographic area, and compressed tar file. Then scroll to the bottom of the page to see the API request. The code below is that API request, copied from the page, with a few modifications.

Modifications:
1) We change the file name from download.tar.gz to one that includes the year, and we prepend the full path of the directory we want to download it to.
2) We loop the API call over years in order to download multiple years. A single request for several years of daily data exceeds the CDS request size limit, so we make one request per year instead.
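Taken together, the two modifications amount to one request and one full output path per year. As a minimal sketch (the helper `build_yearly_requests` is hypothetical, and its request keys are copied from the form-generated request used in this notebook), the pairs can be built and inspected without contacting the CDS. It also generates the zero-padded month and day lists instead of typing them out.

```python
import os


def build_yearly_requests(year_start, year_end, outputdir):
    """Return a list of (request, target_path) pairs, one per year (inclusive).

    Hypothetical helper: the request keys mirror the AgERA5 request in this
    notebook; nothing is sent to the Climate Data Store here.
    """
    pairs = []
    for year in range(year_start, year_end + 1):
        request = {
            'variable': '2m_temperature',
            'statistic': '24_hour_maximum',
            'year': str(year),                                # one year per request
            'month': [f'{m:02d}' for m in range(1, 13)],      # '01'..'12'
            'day': [f'{d:02d}' for d in range(1, 32)],        # '01'..'31'
            'version': '1_1',
            'format': 'tgz',
        }
        # modification 1: a year-specific name with the full path, not download.tar.gz
        target = os.path.join(outputdir, f'AgERA5_{year}.tar.gz')
        pairs.append((request, target))
    return pairs
```

Each pair could then be passed to the client as `c.retrieve('sis-agrometeorological-indicators', request, target)`.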


In [None]:
# connect to the Climate Data Store
c=cdsapi.Client()

In [None]:
# api call: one request per year to stay under the CDS request size limit
for year in range(year_start, year_end + 1):
    c.retrieve(
        'sis-agrometeorological-indicators',
        {
            'variable': '2m_temperature',
            'year': str(year),
            'month': [
                '01', '02', '03',
                '04', '05', '06',
                '07', '08', '09',
                '10', '11', '12',
            ],
            'day': [
                '01', '02', '03',
                '04', '05', '06',
                '07', '08', '09',
                '10', '11', '12',
                '13', '14', '15',
                '16', '17', '18',
                '19', '20', '21',
                '22', '23', '24',
                '25', '26', '27',
                '28', '29', '30',
                '31',
            ],
            'version': '1_1',
            'format': 'tgz',
            'statistic': '24_hour_maximum',
        },
        # full path and year-specific file name instead of download.tar.gz
        outputdir + 'AgERA5_' + str(year) + '.tar.gz',
    )
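Large CDS requests can wait in the queue for a long time, so if the download cell is interrupted, re-running it requests every year again. One way to avoid that, sketched here as an optional addition that is not part of the original workflow: the hypothetical helper `years_to_download` returns only the years whose AgERA5_<year>.tar.gz file is not yet in outputdir, and the download loop could iterate over its result instead of the full range. It only checks that a file exists, so a partially written file left by an interrupted transfer would also count as present.

```python
import os


def years_to_download(year_start, year_end, outputdir):
    """Return the years in [year_start, year_end] with no AgERA5_<year>.tar.gz in outputdir.

    Hypothetical helper: it checks only that the file exists, not that it is complete.
    """
    missing = []
    for year in range(year_start, year_end + 1):
        target = os.path.join(outputdir, f'AgERA5_{year}.tar.gz')
        if not os.path.exists(target):
            missing.append(year)
    return missing
```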

If you want to unpack the tar.gz files with Python instead of at the command line, you can do the following:

In [None]:
# unzip/untar into directories by year
for year in range(year_start, year_end + 1):
    print(year)

    # create the year directory if it doesn't exist
    yeardir = os.path.join(outputdir, str(year))
    os.makedirs(yeardir, exist_ok=True)

    # get the file name; stop with a clear message if the download is missing
    matches = glob.glob(os.path.join(outputdir, 'AgERA5_' + str(year) + '.tar*'))
    if not matches:
        raise FileNotFoundError(str(year) + ': no AgERA5 tar file found in ' + outputdir)
    filename = matches[0]

    # run tar to extract into the yearly directory
    subprocess.run(['tar', 'xf', filename, '-C', yeardir], check=True)
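The cell above still calls the tar program through subprocess. To stay entirely in Python, the standard-library `tarfile` module can do the same extraction. A minimal sketch, assuming the same AgERA5_<year>.tar* naming; the helper name `extract_year` is hypothetical. On Python versions that provide extraction filters, it passes `filter='data'`, which rejects archive members that would be written outside the destination directory.

```python
import glob
import os
import tarfile


def extract_year(outputdir, year):
    """Extract outputdir/AgERA5_<year>.tar* into outputdir/<year>/ and return that directory."""
    matches = glob.glob(os.path.join(outputdir, f'AgERA5_{year}.tar*'))
    if not matches:
        raise FileNotFoundError(f'no AgERA5 tar file for {year} in {outputdir}')
    yeardir = os.path.join(outputdir, str(year))
    os.makedirs(yeardir, exist_ok=True)
    # 'r:*' lets tarfile detect the compression (e.g. gzip) automatically
    with tarfile.open(matches[0], 'r:*') as tar:
        if hasattr(tarfile, 'data_filter'):
            # Pythons with extraction filters: refuse members that escape yeardir
            tar.extractall(yeardir, filter='data')
        else:
            tar.extractall(yeardir)
    return yeardir
```

In the notebook this would be called as `extract_year(outputdir, year)` inside the same loop over years.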

The following cell deletes the tar files:

In [None]:
# remove the tar files since we don't need them any more
for year in range(year_start, year_end + 1):
    # skip quietly if a year's file is already gone
    for f in glob.glob(os.path.join(outputdir, 'AgERA5_' + str(year) + '.tar*')):
        os.remove(f)