# Carbon Plan CMIP6 downscaling example

In this notebook I'll show you a simple way to access Carbon Plan's downscaled CMIP6 data. It draws heavily on their own example notebook, [here](https://github.com/carbonplan/cmip6-downscaling/blob/main/notebooks/accessing_data_example.ipynb).

The first thing we're going to do is import the relevant Python libraries:

In [35]:
import intake
import matplotlib.pyplot as plt
import numpy as np

Now let's open the Carbon Plan datastore as a catalogue, so that I can subset it for the data I need:

In [36]:
cat = intake.open_esm_datastore(
    "https://cpdataeuwest.blob.core.windows.net/cp-cmip/version1/catalogs/global-downscaled-cmip6.json"
)

Next, let's look at what data is available across the entire catalogue:

In [37]:

cat_df = cat.df  # First convert the catalogue to a pandas dataframe

# Find the unique entries in each of the relevant columns of the dataframe
models_avail = np.unique(cat_df['source_id'])
methods_avail = np.unique(cat_df['method'])
exps_avail = np.unique(cat_df['experiment_id'])
vars_avail = np.unique(cat_df['variable_id'])

Now I know which models, downscaling methods, experiments and variables exist across the catalogue as a whole. However, not every downscaling method is available for every model, so we need to check what exists for each one. To do that, I loop over my list of models and print a summary of the data available for each:

In [38]:
# Loop over each of the models available
for model_to_check in models_avail:

    # Search the catalogue for all the data related to that specific model
    cat_subset = cat.search(source_id=model_to_check)

    # Turn it into a pandas dataframe again
    cat_subset_df = cat_subset.df

    # Summarise the data available for that model
    methods_avail_subs = np.unique(cat_subset_df['method'])
    exps_avail_subs = np.unique(cat_subset_df['experiment_id'])
    vars_avail_subs = np.unique(cat_subset_df['variable_id'])

    print(f'For the {model_to_check} climate model, the following data is available:')
    print(f'The downscaling methods available are {methods_avail_subs}')
    print(f'The experiments that are available are {exps_avail_subs}')
    print(f'The downscaled variables that are available are {vars_avail_subs}')
    print('\n')

For the BCC-CSM2-MR climate model, the following data is available:
The downscaling methods available are ['GARD-SV']
The experiments that are available are ['historical' 'ssp245' 'ssp370' 'ssp585']
The downscaled variables that are available are ['tasmax' 'tasmin']


For the CanESM5 climate model, the following data is available:
The downscaling methods available are ['DeepSD' 'DeepSD-BC' 'GARD-SV']
The experiments that are available are ['historical' 'ssp245' 'ssp370' 'ssp585']
The downscaled variables that are available are ['pr' 'tasmax' 'tasmin']


For the MIROC6 climate model, the following data is available:
The downscaling methods available are ['GARD-SV']
The experiments that are available are ['historical' 'ssp245' 'ssp370' 'ssp585']
The downscaled variables that are available are ['pr' 'tasmax' 'tasmin']


For the MPI-ESM1-2-HR climate model, the following data is available:
The downscaling methods available are ['GARD-SV']
The experiments that are available are ['historical
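Printing one block per model works, but a single table can be easier to scan when deciding which model and method to combine. The sketch below assumes only that `cat.df` is a pandas DataFrame with `source_id` and `method` columns, as used above; `summarise_availability` is a hypothetical helper, and the small DataFrame in the example is made-up toy data, not the real catalogue.

```python
import pandas as pd


def summarise_availability(df: pd.DataFrame) -> pd.DataFrame:
    """Count catalogue entries for each (model, downscaling method) pair.

    Rows are models, columns are methods; a zero means that
    combination is not in the catalogue.
    """
    return pd.crosstab(df["source_id"], df["method"])


# Toy stand-in for cat.df (hypothetical rows, for illustration only)
toy_df = pd.DataFrame({
    "source_id": ["CanESM5", "CanESM5", "MIROC6"],
    "method": ["DeepSD", "GARD-SV", "GARD-SV"],
    "experiment_id": ["historical", "ssp245", "historical"],
})

table = summarise_availability(toy_df)
print(table)
```

With the real catalogue you would call `summarise_availability(cat.df)` instead of passing the toy DataFrame.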

Now that we know what's available, let's choose a specific model, downscaling method, experiment and variable to look at.

In [39]:
model_to_search = 'MIROC6'  # The name of the climate model
method_to_search = 'GARD-SV'  # The downscaling method to use
experiment_to_search = 'historical'  # The name of the CMIP6 experiment I want to look at
variable_to_search = 'tasmax'  # The variable I want to look at, e.g. tasmax, the daily maximum near-surface air temperature

In this project we're interested in how the weather changes at the locations of our sporting events. With that in mind, let's choose the latitude and longitude of our event, and the season we want to study. Be careful with the hemisphere: `'JJA'` (June, July, August) is summer in the northern hemisphere but winter in the southern hemisphere, where summer is `'DJF'`.
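One way to avoid the hemisphere pitfall is to derive the season code from the latitude rather than typing it by hand. This is a minimal sketch: `season_code` is a hypothetical helper, and it assumes the standard meteorological seasons that xarray's `time.dt.season` returns (`'DJF'`, `'MAM'`, `'JJA'`, `'SON'`). Near the equator these seasons carry little meaning, so treat the result with care there.

```python
# Meteorological season codes for the northern hemisphere
NORTHERN_SEASONS = {"winter": "DJF", "spring": "MAM", "summer": "JJA", "autumn": "SON"}

# In the southern hemisphere each season falls in the opposite months
FLIP = {"DJF": "JJA", "JJA": "DJF", "MAM": "SON", "SON": "MAM"}


def season_code(season_name: str, latitude: float) -> str:
    """Return the three-letter season code for a named season at a given latitude."""
    code = NORTHERN_SEASONS[season_name.lower()]
    # Negative latitudes are in the southern hemisphere, where seasons are reversed
    return FLIP[code] if latitude < 0 else code


print(season_code("summer", 50.7260))  # a northern-hemisphere location
print(season_code("summer", -33.87))   # a southern-hemisphere location
```

You could then set `season_to_study = season_code('summer', latitude_of_location)` instead of hard-coding `'JJA'`.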

In [40]:
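latitude_of_location = 50.7260  # degrees north
longitude_of_location = -3.5275  # degrees east (negative values are west of Greenwich)
season_to_study = 'JJA'  # one of 'DJF', 'MAM', 'JJA', 'SON'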
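Gridded datasets store longitude either as -180 to 180 or as 0 to 360 degrees. If the convention of your coordinates doesn't match the dataset's, a `method='nearest'` selection can silently pick a grid cell far from your location. I haven't assumed which convention the Carbon Plan data uses; the hypothetical helper below checks the dataset's own longitude values and converts your longitude to match.

```python
import numpy as np


def match_lon_convention(lon: float, dataset_lons) -> float:
    """Express `lon` in the same convention as a dataset's longitude coordinate.

    Assumes the dataset uses either -180..180 or 0..360 degrees.
    """
    dataset_lons = np.asarray(dataset_lons)
    if dataset_lons.max() > 180:
        # Dataset runs 0..360, so wrap negative (western) longitudes
        return lon % 360
    # Dataset runs -180..180, so map any value above 180 back into range
    return ((lon + 180) % 360) - 180


# Toy longitude grids standing in for ds['lon'].values
print(match_lon_convention(-3.5275, np.arange(0, 360, 0.25)))
print(match_lon_convention(-3.5275, np.arange(-180, 180, 0.25)))
```

Inside the download loop further down, this would be used as `match_lon_convention(longitude_of_location, ds['lon'].values)` before calling `.sel(..., method='nearest')`.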

Now that I've made all these specifications, let's search the catalogue for data that match my criteria. If you comment out any of the search arguments in the following cell, the search becomes less restrictive and you'll see more datasets. For example, commenting out `experiment_id=experiment_to_search` returns every available experiment rather than just one, which might be useful to you later on.

In [41]:
#Search the catalogue for my search criteria
cat_search_subset = cat.search(
    source_id=model_to_search, 
    method=method_to_search, 
    experiment_id=experiment_to_search, 
    variable_id=variable_to_search,
)

#Turn my catalogue search into xarray datasets
dsets = cat_search_subset.to_dataset_dict()


--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.timescale.method'
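The key format printed above can be unpacked into its named parts, which is handy if you want the experiment or method for a dataset without relying on its attributes. A small sketch: `parse_dataset_key` and `KEY_FIELDS` are hypothetical names, and the code assumes no individual field contains a dot.

```python
# Field order taken from the intake-esm message printed above
KEY_FIELDS = ["activity_id", "institution_id", "source_id",
              "experiment_id", "timescale", "method"]


def parse_dataset_key(key: str) -> dict:
    """Split a key from to_dataset_dict() into its named components."""
    parts = key.split(".")
    if len(parts) != len(KEY_FIELDS):
        raise ValueError(f"Unexpected key format: {key!r}")
    return dict(zip(KEY_FIELDS, parts))


info = parse_dataset_key("CMIP.MIROC.MIROC6.historical.day.GARD-SV")
print(info["experiment_id"], info["method"])
```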


Sometimes a search won't match any data, for example if I choose a downscaling method that isn't available for the chosen climate model. Let's check whether that's the case:

In [42]:
# Check whether any datasets match my search
if len(dsets) == 0:
    raise ValueError('The catalogue search that you specified returned zero results. This could be because a name is misspelled somewhere, or because the data does not exist in the archive.')
else:
    print(f'You have found {len(dsets)} datasets that match your search criteria. These are:')
    for dset_name in dsets.keys():
        print(f'{dset_name} \n')

You have found 1 datasets that match your search criteria. These are:
CMIP.MIROC.MIROC6.historical.day.GARD-SV 
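If you later want to widen the search, as suggested earlier, without commenting lines in and out by hand, one option is to build the search arguments from a dictionary and drop any that are unset. The helper `build_search_kwargs` below is hypothetical; it only prepares keyword arguments for `cat.search`. intake-esm's `search` also accepts a list of values for a field, e.g. `experiment_id=['historical', 'ssp585']`.

```python
def build_search_kwargs(**criteria):
    """Drop criteria set to None, mimicking 'commenting out' a search line."""
    return {key: value for key, value in criteria.items() if value is not None}


search_kwargs = build_search_kwargs(
    source_id="MIROC6",
    method="GARD-SV",
    experiment_id=None,  # None means: return every available experiment
    variable_id="tasmax",
)
print(search_kwargs)

# With the catalogue opened above, this would become:
# cat_search_subset = cat.search(**search_kwargs)
```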



Now for the computationally intensive part. With everything specified, it's time to download the data at my chosen location and write it to a CSV file for analysis.

In [43]:
# Loop over each dataset in my list to extract a timeseries at my location,
# subset it to my chosen season, and write it out to a csv file.

for dset_name in dsets.keys():

    ds = dsets[dset_name]  # Open the relevant dataset

    exp_id_of_ds = ds.attrs['experiment_id']  # Find out which experiment it is

    member_to_use = ds['member_id'].values[0]  # Always look at the first ensemble member

    # Subset the data to the grid cell nearest my location and to one ensemble member
    ds_at_location = (
        ds[variable_to_search]
        .sel(lat=latitude_of_location, lon=longitude_of_location, method='nearest')
        .sel(member_id=member_to_use)
    )

    # Specify the time periods I want to look at
    if exp_id_of_ds == 'historical':
        # For historical data, look at 1950-2000
        time_slice = ds_at_location.sel(time=slice('1950-01-01', '2000-12-31'))
    else:
        # For a climate change scenario, look at the end of the century
        time_slice = ds_at_location.sel(time=slice('2080-01-01', '2099-12-31'))

    # Select only the chosen season (e.g. JJA, summer in the northern hemisphere)
    time_slice_season = time_slice.where(time_slice.time.dt.season == season_to_study, drop=True)

    # Specify the name of my output file
    filename_out = f'{dset_name}_{latitude_of_location:.2f}_{longitude_of_location:.2f}_{season_to_study}.csv'

    # Write it to a csv file
    time_slice_season.to_dataframe().to_csv(filename_out)
    print(f'File successfully written with name {filename_out}')

File successfully written with name CMIP.MIROC.MIROC6.historical.day.GARD-SV_50.73_-3.53_JJA.csv


Once this has run, you should find one CSV file per matching dataset in your working directory, which you can download and analyse in Python, R or whichever tool you prefer.
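As a starting point for that analysis, here is a sketch that reads one of the CSV files with pandas and computes the seasonal mean for each year. It assumes the file has a `time` column that pandas can parse and a column named after the variable, and that the values are in kelvin (the CMIP6 convention for `tasmax`); check the dataset's `units` attribute before relying on the conversion. `seasonal_means_celsius` is a hypothetical helper, and the tiny CSV below is made-up toy data.

```python
import io

import pandas as pd


def seasonal_means_celsius(csv_source, variable="tasmax"):
    """Mean of `variable` for each year in the CSV, converted from kelvin to Celsius."""
    df = pd.read_csv(csv_source, parse_dates=["time"])
    # Each file only holds one season, so grouping by year gives one value per season
    yearly = df.groupby(df["time"].dt.year)[variable].mean()
    return yearly - 273.15


# Tiny made-up CSV standing in for one of the files written above
toy_csv = io.StringIO(
    "time,tasmax\n"
    "1950-06-01,290.0\n"
    "1950-06-02,292.0\n"
    "1951-06-01,294.0\n"
)
print(seasonal_means_celsius(toy_csv))
```

For the file written above, you would pass its filename, e.g. `seasonal_means_celsius('CMIP.MIROC.MIROC6.historical.day.GARD-SV_50.73_-3.53_JJA.csv')`.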