Install the cdsapi package to download the data. We will need xarray (which depends on pandas) to concatenate the data. eccodes, cfgrib, and ecmwflibs are needed to open the GRIB files, and netCDF4 (installed as netcdf4) is needed to convert them to NetCDF.
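Before running the download cells, it can help to confirm that every package above is importable from the notebook's kernel. This is a minimal sketch using the standard-library `importlib.util.find_spec`; `check_modules` is a hypothetical helper name. Note that it only checks that each module can be found, not that eccodes' underlying C library loads correctly, and that the pip package `netcdf4` is imported as `netCDF4`.

```python
import importlib.util

# Import names of the packages installed above (pip's "netcdf4" is imported as "netCDF4").
REQUIRED_MODULES = ["cdsapi", "xarray", "pandas", "eccodes", "cfgrib", "ecmwflibs", "netCDF4"]


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules(REQUIRED_MODULES).items():
    print(f"{name:10s} {'OK' if found else 'MISSING'}")
```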

To execute shell commands in a Jupyter notebook, start the command with "!". I am running this notebook in a virtual environment, so the packages are installed into that environment rather than globally.
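One caveat with "!": a command like `!python -m pip install ...` runs whichever `python` is first on the shell's PATH, which is not guaranteed to be the interpreter behind the notebook kernel. The sketch below prints the kernel's interpreter and checks whether it is a venv/virtualenv by comparing `sys.prefix` with `sys.base_prefix`; `in_virtualenv` is a hypothetical helper name, and this check does not detect conda environments. If the two interpreters differ, `!{sys.executable} -m pip install ...` or IPython's `%pip install ...` magic installs into the kernel's environment.

```python
import sys


def in_virtualenv():
    """True when running inside a venv/virtualenv, where sys.prefix differs from the base interpreter's prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Kernel interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```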

In [None]:
! python -m pip install cdsapi xarray pandas eccodes cfgrib ecmwflibs netcdf4

In [2]:
import cdsapi
import xarray as xr

In [3]:
c = cdsapi.Client()

As of this writing, because of the recent CDS server migration and slow download queues, we can only download the dataset one month at a time. We therefore loop over every month of every year we wish to download. **Important**: Make sure to change the range in 'years_list' to the years you wish to download. Because `range` excludes its end value, downloading 1951 through 1960 requires range(1951, 1961).

In this notebook, I download data for Ulaanbaatar, Mongolia. The region is centered at latitude 47.6 and longitude 106.6 (47.6° N, 106.6° E), the values set as 'city_lat' and 'city_long' below. **Important**: Since ERA5-Land covers land only, grid cells that fall over oceans or large bodies of water will likely have missing or unreliable values.
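The month-by-month loop described above can be sketched with the standard-library `calendar` module, which also handles leap years. This is a minimal sketch assuming one request per month, as described; `monthly_requests` is a hypothetical helper, not part of cdsapi, and each tuple it yields would supply the 'year', 'month', and 'day' keys of one `c.retrieve` call.

```python
import calendar


def monthly_requests(first_year, last_year):
    """Yield (year, month, days) for every month from first_year through last_year inclusive.

    All values are zero-padded strings ('01', '02', ...), the format used in the request dict.
    """
    # range() excludes its stop value, so add 1 to include last_year (1951-1960 -> range(1951, 1961)).
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # monthrange returns (weekday of the 1st, number of days in the month) and accounts for leap years.
            n_days = calendar.monthrange(year, month)[1]
            days = [f"{day:02d}" for day in range(1, n_days + 1)]
            yield str(year), f"{month:02d}", days


requests = list(monthly_requests(1951, 1960))
print(len(requests))  # one request per month: 10 years x 12 months
print(requests[13][:2], len(requests[13][2]))  # February 1952, a leap year
```

Computing the day lists this way avoids hard-coding month lengths and leap-year rules.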

The link to the dataset: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=overview 

Dataset name: ERA5-Land hourly data from 1950 to present

In [6]:
import calendar

years_list = [str(year) for year in range(2010, 2024)]
# calendar.isleap applies the Gregorian rules (divisible by 4, except century years not divisible by 400)
leap_years = [year for year in years_list if calendar.isleap(int(year))]

make_two_digits = ['01', '02', '03', '04', '05', '06', '07', '08', '09'] # The API expects single-digit days zero-padded, as here

thirty_one = make_two_digits + [str(day) for day in range(10, 32)]
thirty = make_two_digits + [str(day) for day in range(10, 31)]
feb_days_leap = make_two_digits + [str(day) for day in range(10, 30)]
feb_days_non_leap = make_two_digits + [str(day) for day in range(10, 29)]

months_list = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'] # The API accepts months in this format
month_days = {
    '01': thirty_one,
    '02': feb_days_non_leap,
    '03': thirty_one,
    '04': thirty,
    '05': thirty_one,
    '06': thirty,
    '07': thirty_one,
    '08': thirty_one,
    '09': thirty,
    '10': thirty_one,
    '11': thirty,
    '12': thirty_one,
}

# Currently downloading data for a model of Mongolia, centered near its capital, Ulaanbaatar.
city_long = 106.6
city_lat = 47.6

# Region boundaries around the center. This produces a 5x5 grid of values at each time step, since the spatial resolution of the dataset is 0.1° x 0.1°.
bottom_lat = str(round(city_lat - 0.2, 1))
top_lat = str(round(city_lat + 0.2, 1))
left_long = str(round(city_long - 0.2, 1))
right_long = str(round(city_long + 0.2, 1))

print(bottom_lat, top_lat, left_long, right_long)
print(leap_years)
print(thirty_one)
print(thirty)
print(feb_days_leap)
print(feb_days_non_leap)

47.4 47.8 106.4 106.8
['2012', '2016', '2020']
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30']
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29']
['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28']
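The region arithmetic in the cell above can be wrapped in a small function that also reports how many grid points to expect. This is a sketch assuming the 0.1° ERA5-Land grid and region edges that fall on that grid (hence the rounding to one decimal); `bounding_box` is a hypothetical helper name. It returns the edges in the [North, West, South, East] order used by the 'area' key of the request.

```python
def bounding_box(lat, lon, half_width=0.2, resolution=0.1):
    """Return the 'area' list [North, West, South, East] as strings and the expected (n_lat, n_lon) grid shape.

    half_width is the distance in degrees from the center to each edge; resolution is the grid spacing.
    Rounding to one decimal assumes a 0.1 degree grid.
    """
    north = round(lat + half_width, 1)
    south = round(lat - half_width, 1)
    west = round(lon - half_width, 1)
    east = round(lon + half_width, 1)
    # Both edges are included, so a 0.4 degree span at 0.1 degree spacing gives 5 points.
    n_lat = round((north - south) / resolution) + 1
    n_lon = round((east - west) / resolution) + 1
    return [str(north), str(west), str(south), str(east)], (n_lat, n_lon)


area, shape = bounding_box(47.6, 106.6)
print(area, shape)
```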


In [None]:
yearly_dataset = None

# IMPORTANT: Change 'ds_path' to the folder where you want to store the data
ds_path = 'C:/Users/sguti/preprocessing/Scorched Earth Data/'

for year in years_list:
    # Name of the NetCDF file to write after concatenating all monthly datasets
    nc_filename = ds_path + 'Mongolia_2D_ERA5_' + year + '.nc'
    for month in months_list:
        # Take care of leap years
        if month == '02' and year in leap_years:
            month_days['02'] = feb_days_leap
        else:
            month_days['02'] = feb_days_non_leap
        # Name of the raw monthly GRIB file
        grib_filename = ds_path + 'Mongolia_2D_ERA5_' + year + '_' + month + '.grib'
        c.retrieve(
            'reanalysis-era5-land',
            {
                'format': 'grib',
                'variable': [
                    '2m_temperature', '2m_dewpoint_temperature', 'surface_pressure', '10m_u_component_of_wind', '10m_v_component_of_wind'
                ],
                'year': year,
                'month': month,
                'day': month_days[month],
                'time': [
                    '00:00', '01:00', '02:00',
                    '03:00', '04:00', '05:00',
                    '06:00', '07:00', '08:00',
                    '09:00', '10:00', '11:00',
                    '12:00', '13:00', '14:00',
                    '15:00', '16:00', '17:00',
                    '18:00', '19:00', '20:00',
                    '21:00', '22:00', '23:00',
                ],
                # Region order expected by the API: North, West, South, East
                'area': [
                    top_lat, left_long, bottom_lat,
                    right_long,
                ],
            },
            grib_filename)

        # Open the GRIB file with the cfgrib engine. 'with' closes the file when execution leaves the block,
        # so load the data into memory with .load() before that happens.
        with xr.open_dataset(grib_filename, engine='cfgrib') as ds:
            # The GRIB file contains mostly empty coordinates [number, surface, valid_time].
            # Drop them before concatenating with the yearly dataset to save space.
            monthly_dataset = ds.drop_vars(['number', 'surface', 'valid_time'], errors='ignore').load()

        if yearly_dataset is None:
            # First month of the year: nothing to concatenate with yet
            yearly_dataset = monthly_dataset
        else:
            # Concatenate the two datasets along the time dimension
            yearly_dataset = xr.concat([yearly_dataset, monthly_dataset], dim='time')

    # Finally, save the yearly dataset to a NetCDF file. While GRIB is the native format,
    # GRIB data is generally "messier" than data in a self-describing format such as NetCDF.
    print("Storing to...", nc_filename)
    yearly_dataset.to_netcdf(nc_filename)
    # Reset before starting the next year
    yearly_dataset = None
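Since each yearly file is assembled from twelve separate downloads, a cheap sanity check before saving is to compare the number of time steps with the number of hours in that year. This sketch assumes the concatenated dataset has one 'time' entry per requested hour; `expected_hourly_steps` is a hypothetical helper name.

```python
import calendar


def expected_hourly_steps(year):
    """Number of hourly time steps in a full year: 24 per day, with 366 days in leap years."""
    days = 366 if calendar.isleap(year) else 365
    return 24 * days


print(expected_hourly_steps(2012), expected_hourly_steps(2013))
```

Inside the download loop, a line such as `assert yearly_dataset.sizes['time'] == expected_hourly_steps(int(year))` placed just before `to_netcdf` would flag a missing or duplicated month.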