# Running into the future with GCM data

This notebook is based on [this OGGM tutorial notebook](https://tutorials.oggm.org/stable/notebooks/10minutes/run_with_gcm.html).

In this example, we illustrate how to perform a typical projection run, i.e. using GCM data. Here, we’ll use already bias-corrected CMIP6 data from [ISIMIP3b](https://www.isimip.org/gettingstarted/isimip3b-bias-adjustment). OGGM also supports using raw CMIP5 or CMIP6 data ([as explained in this OGGM tutorial](https://tutorials.oggm.org/stable/notebooks/10minutes/run_with_gcm.html)). However, running projections with those directly requires downloading several gigabytes of data.

Fortunately, OGGM provides a set of standard projections that use all available climate models combined with the default OGGM workflow. We will explore these pre-run projections in the next session.

For now, we’ll focus on creating our own projection run.

There are three important steps:
- Download the OGGM preprocessed directories that include a pre-calibrated and spun-up glacier model. These contain all the results from the steps we’ve covered in previous sessions.
- Download the climate projection data and apply bias correction if you're working with raw CMIP5 or CMIP6 data.
- Simulate the glacier’s future evolution from the present day to the end of the century (2020–2100).

<div class="alert alert-warning">
    <b>Task</b>: Why do you think it's necessary to bias-correct projection climate data to match the historical climate data?
</div>

Your answer here:

## OGGM setup

In [None]:
# Libs
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Locals
import oggm.cfg as cfg
from oggm import utils, workflow, tasks, DEFAULT_BASE_URL
from oggm.shop import gcm_climate

## Pre-processed directories

Let’s run a projection for two Alpine glaciers: Hintereisferner and Aletsch.

In [None]:
# Initialize OGGM and set up the default run parameters
cfg.initialize()

# define working directory
path = 'future_working_dir'
utils.mkdir(path, reset=False)  # if you set reset=True, everything will be deleted and you can start from a fresh state
cfg.PATHS['working_dir'] = path

# select the glaciers of your choice, you can use the GLIMS viewer from the first session
rgi_ids = ['RGI60-11.00897',  # Hintereisferner
           'RGI60-11.01450']  # Aletsch

# Go - get the pre-processed glacier directories
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=5, prepro_base_url=DEFAULT_BASE_URL)

gdir = gdirs[0]

<div class="alert alert-warning">
    <b>Task</b>: Have a look in your working directory, do you know what each of the files means?
</div>

Your answer here:

## The `_spinup_historical` runs

The level 5 files include a pre-computed model run from the RGI outline date up to the last available date covered by the historical climate data. This run is generated using the dynamic spinup method discussed in the previous session.

In the case of the default climate dataset, [GSWP3_W5E5](https://www.isimip.org/gettingstarted/input-data-bias-adjustment/details/80/), the simulation runs up to the end of 2019, meaning the glacier volume is computed as of January 1st, 2020.
These results are stored in model output files with the `_spinup_historical` suffix inside each glacier directory (see last session or the ["10 minutes to... a dynamical spinup"](dynamical_spinup.ipynb) tutorial for more context).

Let’s now compile the spinup results for our two glaciers into a single file:

In [None]:
# compile the run output for both glaciers at once
ds = utils.compile_run_output(gdirs, input_filesuffix='_spinup_historical')

# normalize the volume of each glacier with the 2000 volume
vol_ref2000 = ds.volume / ds.volume.sel(time=2000) * 100

# plot the volume and color each line for the individual glaciers
vol_ref2000.plot(hue='rgi_id');

# adapt the ylabel
plt.ylabel('Volume (%, reference 2000)');
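The normalization above divides each glacier's volume by its own year-2000 volume and multiplies by 100, so every curve passes through 100 % in 2000 and glaciers of very different size become comparable. A minimal NumPy sketch of the same arithmetic, with invented volumes for two hypothetical glaciers (not OGGM output):

```python
import numpy as np

# Invented volumes (arbitrary units) for two glaciers:
# one row per glacier, one column per year
years = np.array([1990, 2000, 2010, 2020])
volume = np.array([
    [0.80, 0.70, 0.56, 0.42],   # small glacier
    [16.0, 15.0, 13.5, 12.0],   # large glacier
])

# Column index of the reference year
ref_idx = np.flatnonzero(years == 2000)[0]

# Indexing with a one-element list keeps a (n_glaciers, 1) column,
# so the division broadcasts each glacier's reference value across all years
vol_ref2000 = volume / volume[:, [ref_idx]] * 100

print(vol_ref2000.round(1))
```

Both rows read 100 in 2000; by 2020 the small glacier is at 60 % and the large one at 80 % of their year-2000 volume. In the notebook, xarray does this broadcasting by dimension name: `ds.volume.sel(time=2000)` drops the `time` dimension and the division aligns on `rgi_id`.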

The final model state in 2020 will serve as the starting point for our projections into the future. During these future runs, all model parameters we previously calibrated will remain constant over time.

<div class="alert alert-warning">
    <b>Task</b>: Do you think it’s a valid assumption to keep all model parameters constant over time during future projections?
</div>

Your answer here:

## Download and process GCM data from ISIMIP3b (bias-corrected CMIP6)

A typical use case for OGGM is to run simulations using climate model output, such as the bias-corrected CMIP6 GCMs from [ISIMIP3b](https://www.isimip.org/gettingstarted/isimip3b-bias-adjustment/). In this example, we use the files [mirrored on the OGGM cluster in Bremen](https://cluster.klima.uni-bremen.de/~oggm/cmip6/isimip3b/flat/monthly/), but it's easy to switch to other files if needed. From ISIMIP3b, we have access to 5 GCMs and 3 SSPs on the cluster. You can find more details on the [ISIMIP website](https://www.isimip.org/gettingstarted/isimip3b-bias-adjustment).
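With 5 GCMs and 3 SSPs there are 15 possible GCM-scenario combinations. Later in this notebook, each processed climate file and each run is identified by an output suffix of the form `_ISIMIP3b_{member}_{ssp}`. A small standard-library sketch that lists every suffix (member names copied from the code comments in this notebook) can help keep track of which combinations you have actually processed:

```python
from itertools import product

# The five ISIMIP3b members and three SSPs named in this notebook
members = [
    'gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1',
    'ipsl-cm6a-lr_r1i1p1f1', 'ukesm1-0-ll_r1i1p1f2',
]
ssps = ['ssp126', 'ssp370', 'ssp585']

# Same naming pattern as the output_filesuffix used in this notebook
suffixes = [f'_ISIMIP3b_{member}_{ssp}' for member, ssp in product(members, ssps)]

print(len(suffixes))   # 15
print(suffixes[0])     # _ISIMIP3b_gfdl-esm4_r1i1p1f1_ssp126
```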

<div class="alert alert-warning">
    <b>Task</b>: What do the abbreviations GCM and SSP stand for?
</div>

Your answer here:

Let's download the data:

In [None]:
# you can choose one of these 5 different GCMs:
# 'gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1' ("low sensitivity" models, within typical ranges from AR6)
# 'ipsl-cm6a-lr_r1i1p1f1', 'ukesm1-0-ll_r1i1p1f2' ("hotter" models, especially ukesm1-0-ll)
member = 'mri-esm2-0_r1i1p1f1' 

for ssp in ['ssp126', 'ssp370', 'ssp585']:
    # download and format the ISIMIP3b data (already bias-corrected, see below)
    workflow.execute_entity_task(gcm_climate.process_monthly_isimip_data, gdirs,
                                 ssp=ssp,
                                 # gcm member -> you can choose another one
                                 member=member,
                                 # recognize the climate file for later
                                 output_filesuffix=f'_ISIMIP3b_{member}_{ssp}'
                                 );

One advantage of using ISIMIP3b data is that it has already been bias-corrected by the ISIMIP consortium, using W5E5 as the observational reference. Since OGGM v1.6 also uses the [W5E5](https://docs.oggm.org/en/latest/climate-data.html#w5e5) dataset as its baseline historical climate, no further bias correction is needed when using ISIMIP3b projections. However, if you prefer to apply your own bias correction or want access to a larger selection of GCMs, you can also use the original CMIP5 or CMIP6 GCM datasets.
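If you do bias-correct raw GCM output yourself, one common and simple approach is the delta (anomaly) method: per calendar month, shift the GCM temperature so that its reference-period mean matches the observations, and scale the GCM precipitation by the ratio of the reference-period means. The sketch below applies that idea to synthetic monthly series. It is only an illustration of the technique, not a description of what OGGM's own GCM processing tasks do internally, and the function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
n_years = 30
months = np.tile(np.arange(12), n_years)  # calendar-month index (0-11) of each monthly step

# Synthetic "observations" for a reference period (stand-in for e.g. W5E5)
t_obs = 8 * np.sin(2 * np.pi * (months - 3) / 12) + rng.normal(0, 1, months.size)
p_obs = 100 + 30 * np.cos(2 * np.pi * months / 12) + rng.gamma(2.0, 5.0, months.size)

# Synthetic raw GCM output: biased (too warm, too wet) in the reference period ...
t_gcm_ref = t_obs + 1.5 + rng.normal(0, 0.5, months.size)
p_gcm_ref = p_obs * 1.2
# ... and a future period with 3 °C extra warming and 10 % less precipitation
t_gcm_fut = t_gcm_ref + 3.0
p_gcm_fut = p_gcm_ref * 0.9


def monthly_delta_factors(t_gcm, p_gcm, t_obs, p_obs, months):
    """Per calendar month: additive temperature offset, multiplicative precipitation factor."""
    dt = np.array([t_obs[months == m].mean() - t_gcm[months == m].mean() for m in range(12)])
    fp = np.array([p_obs[months == m].mean() / p_gcm[months == m].mean() for m in range(12)])
    return dt, fp


def apply_delta(t_gcm, p_gcm, months, dt, fp):
    """Apply the reference-period factors to any period of GCM data."""
    return t_gcm + dt[months], p_gcm * fp[months]


# Fit on the reference period only, then apply the same factors to both periods
dt, fp = monthly_delta_factors(t_gcm_ref, p_gcm_ref, t_obs, p_obs, months)
t_ref_bc, p_ref_bc = apply_delta(t_gcm_ref, p_gcm_ref, months, dt, fp)
t_fut_bc, p_fut_bc = apply_delta(t_gcm_fut, p_gcm_fut, months, dt, fp)

# The GCM's own change signal between the periods is preserved
print(round(t_fut_bc.mean() - t_ref_bc.mean(), 6))
print(round(p_fut_bc.mean() / p_ref_bc.mean(), 6))
```

After correction, the reference-period monthly means match the "observations", while the projected change (+3 °C, -10 % precipitation) is kept, so the script prints 3.0 and 0.9. Real bias-correction methods often go further (e.g. also adjusting variability), so treat this as the simplest possible variant.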

If you're curious about which historical climate dataset OGGM is using in your run, you can simply ask OGGM:

In [None]:
gdirs[0].get_climate_info()

<div class="alert alert-warning">
    <b>Task</b>: Take another look in your working directory, do you see any new files?
</div>

Your answer here:

Let’s take a closer look at one of these files:

In [None]:
ds_proj_example = xr.open_dataset(gdir.get_filepath('gcm_data', filesuffix='_ISIMIP3b_mri-esm2-0_r1i1p1f1_ssp126'))
ds_proj_example

<div class="alert alert-warning">
    <b>Task</b>: What information does the dataset contain, and what are the units of the variables? Create a plot for the temperature and precipitation and also add the historical data in the same plot (have a look at session 3). What do you observe?
</div>

Your answer here:

## Projection runs 

We now run OGGM under various future scenarios, starting from the end year of the historical spin-up run.
To tell the model where to start, we provide the argument `init_model_filesuffix='_spinup_historical'`. This ensures that the final state of the dynamic spin-up run is used as the starting point for our projections.

In [None]:
for ssp in ['ssp126', 'ssp370', 'ssp585']:
    # this is the same id we used when processing the climate data in process_monthly_isimip_data
    rid = f'_ISIMIP3b_{member}_{ssp}'
    workflow.execute_entity_task(tasks.run_from_climate_data, gdirs,
                                 climate_filename='gcm_data',  # use gcm_data, not climate_historical
                                 climate_input_filesuffix=rid,  # use the chosen scenario
                                 init_model_filesuffix='_spinup_historical',  # this is important! Start from 2020 glacier
                                 output_filesuffix=rid,  # recognize the run for later
                                );

That’s it, you have just completed your first projection runs through to the year 2100! Now, let’s take a look at the results by creating a plot in the next section.

## Plot projection runs 

We have performed three projection runs using the same model but with three different scenarios. Let’s now compare how the glacier evolves under each scenario:

In [None]:
# define one ax for each glacier
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

# define which color should be used for which scenario
color_dict={'ssp126':'blue', 'ssp370':'orange', 'ssp585':'red'}

# loop through each scenario and plot the result
for ssp in ['ssp126','ssp370', 'ssp585']:
    # the same file id we used in the previous steps
    rid = f'_ISIMIP3b_{member}_{ssp}'
    
    # Compile the output into one file
    ds = utils.compile_run_output(gdirs, input_filesuffix=rid)
    
    # Plot it, each glacier on a different ax
    ds.isel(rgi_id=0).volume.plot(ax=ax1, label=ssp, c=color_dict[ssp]);
    ds.isel(rgi_id=1).volume.plot(ax=ax2, label=ssp, c=color_dict[ssp]);

# add a legend to each subplot (plt.legend() alone would only label the last one)
ax1.legend();
ax2.legend();

<div class="alert alert-warning">
    <b>Task</b>: Analyze the plots: Are the results what you expected? What are the main differences between the two glaciers in terms of their response to the scenarios? (Hint: it may also help to look at the climate input data.)
</div>

Your answer here:

## Running multiple models for the same scenario

So far, we’ve only used one GCM across three different scenarios. Now, we’ll take a single scenario and add more GCM members to see how we can analyze and compare the results.

In [None]:
# you can choose one of these 5 different GCMs:
# 'gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1' ("low sensitivity" models, within typical ranges from AR6)
# 'ipsl-cm6a-lr_r1i1p1f1', 'ukesm1-0-ll_r1i1p1f2' ("hotter" models, especially ukesm1-0-ll) 

# we stick to the ssp370 scenario for now
ssp = 'ssp370'

# we select three GCMs as an example, and process the data for those
for member in ['gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1']:
    # download and format the (already bias-corrected) ISIMIP3b data
    workflow.execute_entity_task(gcm_climate.process_monthly_isimip_data, gdirs,
                                 ssp=ssp,
                                 # gcm member -> you can choose another one
                                 member=member,
                                 # recognize the climate file for later
                                 output_filesuffix=f'_ISIMIP3b_{member}_{ssp}'
                                 );

In [None]:
# our selected scenario
ssp = 'ssp370'

# loop through our selected GCMs and run OGGM until 2100
for member in ['gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1']:
    rid = f'_ISIMIP3b_{member}_{ssp}'
    workflow.execute_entity_task(tasks.run_from_climate_data, gdirs,
                                 climate_filename='gcm_data',  # use gcm_data, not climate_historical
                                 climate_input_filesuffix=rid,  # use the chosen scenario
                                 init_model_filesuffix='_spinup_historical',  # this is important! Start from 2020 glacier
                                 output_filesuffix=rid,  # recognize the run for later
                                );

After running the models, let’s examine the results in the same way as before:

In [None]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
# Pick some colors for the lines
color_dict={'gfdl-esm4_r1i1p1f1':'blue',
            'mpi-esm1-2-hr_r1i1p1f1':'orange',
            'mri-esm2-0_r1i1p1f1':'red'}
ssp = 'ssp370'
for member in ['gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1']:
    rid = f'_ISIMIP3b_{member}_{ssp}'
    # Compile the output into one file
    ds = utils.compile_run_output(gdirs, input_filesuffix=rid)
    # Plot it
    ds.isel(rgi_id=0).volume.plot(ax=ax1, label=member, c=color_dict[member]);
    ds.isel(rgi_id=1).volume.plot(ax=ax2, label=member, c=color_dict[member]);
# add a legend to each subplot
ax1.legend();
ax2.legend();

<div class="alert alert-warning">
    <b>Task</b>: What is the main difference compared to the similar plot above? When you imagine using more GCMs, do you think there might be a better way to analyze the results?
</div>

Your answer here:

Earlier, we saw how to open the output for a single GCM and scenario using `utils.compile_run_output`. Now, we want to open the outputs for multiple GCMs under one scenario and combine them into a single dataset.

To do this, we:
1. Open each GCM–scenario combination individually.
2. Add the GCM and scenario as new coordinates (and dimensions) to each dataset and append it to the list `ds_all`.
3. Finally, we merge all datasets using these newly added coordinates.
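Conceptually, these steps build one array with the dimensions (SCENARIO, GCM, time, rgi_id). A plain-NumPy sketch with invented relative volumes shows the same bookkeeping: `np.stack` adds a new GCM axis and `np.expand_dims` adds a length-1 SCENARIO axis. NumPy keeps no coordinate labels, so the label lists have to be tracked by hand; the xarray coordinates in the cell below take care of that for you.

```python
import numpy as np

gcms = ['gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1']
scenarios = ['ssp370']
years = np.arange(2020, 2101)   # 81 yearly time steps (invented time axis)
rgi_ids = ['RGI60-11.00897', 'RGI60-11.01450']

rng = np.random.default_rng(1)
# One invented (time, rgi_id) array of relative volumes per GCM run,
# shrinking by a random constant rate each year
runs = [np.cumprod(np.full((years.size, len(rgi_ids)), 1 - rng.uniform(0.005, 0.02)), axis=0)
        for _ in gcms]

stacked = np.stack(runs, axis=0)           # (GCM, time, rgi_id)
merged = np.expand_dims(stacked, axis=0)   # (SCENARIO, GCM, time, rgi_id)
print(merged.shape)                        # (1, 3, 81, 2)

# Reducing over the GCM axis (axis=1) mirrors volume.median(dim='GCM')
median_over_gcms = np.median(merged, axis=1)   # (SCENARIO, time, rgi_id)
print(median_over_gcms.shape)                  # (1, 81, 2)
```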

In [None]:
# this list will hold all datasets, each with the additional coordinates GCM and SCENARIO
ds_all = []

# our selected scenario
ssp = 'ssp370'

# loop through all GCMs, add the new coordinates and add the final ds to ds_all
for GCM in ['gfdl-esm4_r1i1p1f1', 'mpi-esm1-2-hr_r1i1p1f1', 'mri-esm2-0_r1i1p1f1']:  
    # put together the same filesuffix which was used during the projection runs
    rid = '_ISIMIP3b_{}_{}'.format(GCM, ssp)  

    # open one model run
    ds_tmp = utils.compile_run_output(gdirs, input_filesuffix=rid)  

    # add GCM as a coordinate
    ds_tmp.coords['GCM'] = GCM  
    ds_tmp.coords['GCM'].attrs['description'] = 'used General Circulation Model'  # add a description for GCM
    ds_tmp = ds_tmp.expand_dims("GCM")  # add GCM as a dimension to all Data variables

    # add scenario (here ssp) as a coordinate
    ds_tmp.coords['SCENARIO'] = ssp  
    ds_tmp.coords['SCENARIO'].attrs['description'] = 'used scenario (here SSPs)'
    ds_tmp = ds_tmp.expand_dims("SCENARIO")  # add SSP as a dimension to all Data variables

    # add the dataset with extra coordinates to our final ds_all array
    ds_all.append(ds_tmp)  

# after adding the new coordinates, we can use them to merge all datasets into one
ds_merged = xr.combine_by_coords(ds_all, fill_value=np.nan)  # define how the missing GCM, SCENARIO combinations should be filled

In [None]:
ds_merged

You will notice that our data variables now have four dimensions:
- SCENARIO
- GCM
- time
- rgi_id

This structure makes it very easy to analyze the results. For example, let’s plot the median across all GCMs for one scenario and one glacier:

In [None]:
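# select the scenario as well, so that a 1D time series (line plot) is left
ds_merged.sel(SCENARIO='ssp370').isel(rgi_id=0).volume.median(dim='GCM').plot();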

However, the median alone only tells part of the story. To provide more insight, you can also plot the interquartile range, which shows the spread of the data. Alternatively, you could use other metrics such as the mean and standard deviation to illustrate both the central tendency and variability of the results.
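To see how these metrics can differ, here is a NumPy sketch with five invented end-of-century volumes, one of them from a much "hotter" hypothetical model. Note that `np.std` (like xarray's `.std()`) defaults to the population standard deviation (`ddof=0`); with only a handful of GCMs this choice makes a visible difference.

```python
import numpy as np

# Invented relative volumes in 2100 (% of 2020) from five hypothetical GCM runs;
# the last one stands in for a much warmer model
vol_2100 = np.array([42.0, 45.0, 47.0, 50.0, 12.0])

mean, std = vol_2100.mean(), vol_2100.std()      # population std (ddof=0)
median = np.median(vol_2100)
q25, q75 = np.quantile(vol_2100, [0.25, 0.75])   # interquartile range bounds

print(f'mean   {mean:5.1f}  +/- std {std:4.1f}')
print(f'median {median:5.1f}  IQR {q25:.1f} to {q75:.1f}')
```

Here the single low value pulls the mean down to 39.2 and inflates the standard deviation to about 13.8, while the median (45) and the interquartile range (42 to 47) are much less affected.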

Let’s create a plot that includes both options:

In [None]:
# define one axis per glacier
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

# function for plotting the mean and the standard deviation
def plot_mean_and_std(ax, rgi_id, color):
    # first calculate the values we want to plot
    mean = ds_merged.isel(rgi_id=rgi_id).volume.mean(dim='GCM').values[0]
    std = ds_merged.isel(rgi_id=rgi_id).volume.std(dim='GCM').values[0]
    time = ds_merged.time.values

    # plot the mean
    ax.plot(time, mean,
            label='mean',
            color=color)

    # plot the standard deviation
    ax.fill_between(time, mean - std, mean + std,
                    label='standard deviation',
                    alpha=0.3,
                    color=color,)

# function for plotting the median and the interquartile range
def plot_median_and_interquartile_range(ax, rgi_id, color):
    # first calculate the values we want to plot (.values[0] picks the single SCENARIO entry)
    median = ds_merged.isel(rgi_id=rgi_id).volume.median(dim='GCM').values[0]
    p25 = ds_merged.isel(rgi_id=rgi_id).volume.quantile(0.25, dim='GCM').values[0]
    p75 = ds_merged.isel(rgi_id=rgi_id).volume.quantile(0.75, dim='GCM').values[0]
    time = ds_merged.time.values

    # plot the median
    ax.plot(time, median,
            label='median',
            color=color)

    # plot the interquartile range
    ax.fill_between(time, p25, p75,
                    label='interquartile range\n(25th-75th percentile)',
                    alpha=0.3,
                    color=color,)


# add some nice labels to an axis
def add_annotations_to_ax(ax):
    ax.set_ylabel('Volume in m³')
    ax.set_xlabel('Time')
    ax.set_title('3 GCM runs for SSP370')


plot_mean_and_std(ax=ax1, rgi_id=0, color='C0')  # with C0, C1 ... you can loop through the default colors
plot_median_and_interquartile_range(ax=ax1, rgi_id=0, color='C1')
add_annotations_to_ax(ax1)

plot_mean_and_std(ax=ax2, rgi_id=1, color='C0')
plot_median_and_interquartile_range(ax=ax2, rgi_id=1, color='C1')
add_annotations_to_ax(ax2)

# add a legend to each subplot
ax1.legend()
ax2.legend();

<div class="alert alert-warning">
    <b>Task</b>: Could you explain the differences you observe between the various metrics (e.g., median vs. mean, interquartile range vs. standard deviation)? Which metric do you think best fits our purpose, and why? (Hint: add the individual model runs as gray lines to the plot)
</div>

Your answer here:

Great! So far, we have run multiple models for the same SSP scenario. However, when it comes to communicating our results, grouping model runs strictly by scenario might not be the most intuitive approach. Scenario names (e.g., SSPs or RCPs) can be hard to interpret, especially for non-expert audiences.

Additionally, the transition from RCP scenarios in CMIP5 to SSP scenarios in CMIP6 introduced inconsistencies: there is no clear rule for how to directly compare or combine them. To explore what's available, here is a [list of all available CMIP5 model runs](https://cluster.klima.uni-bremen.de/~oggm/cmip5-ng/gcm_table_2100.html) and a [list of all available CMIP6 model runs](https://cluster.klima.uni-bremen.de/~oggm/cmip6/gcm_table_2100.html).

<div class="alert alert-warning">
    <b>Task</b>: Could you think of a better way to group our simulations, other than by scenario?
</div>

Your answer here:

## Categorizing projections by temperature level

One useful way to group the simulations is by the predicted global temperature increase at the end of the century, relative to pre-industrial levels. And that’s exactly what we’re going to do.

To support this, we have created a table that contains the global temperature change for each CMIP5 and CMIP6 model, calculated as the difference between the periods 2071-2100 and 1850-1900. You can [download the file here](https://cluster.klima.uni-bremen.de/~pschmitt/teaching/cryo_in_climate/cmip5_and_cmip6_warming_compared_to_preindustrial.csv) and save it in the same directory as your notebook.
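Each value in that table is a difference between two period means. As a sketch of the arithmetic only, here is the same calculation on an invented global-mean temperature series (not real CMIP data), with both periods taken as inclusive year ranges:

```python
import numpy as np

years = np.arange(1850, 2101)
# Invented annual global-mean temperature (°C): flat until 1950, then a linear rise of 0.02 °C/yr
temp = np.where(years < 1950, 13.8, 13.8 + 0.02 * (years - 1950))

preindustrial = temp[(years >= 1850) & (years <= 1900)].mean()
end_of_century = temp[(years >= 2071) & (years <= 2100)].mean()

warming = end_of_century - preindustrial
print(round(warming, 2))
```

For this made-up series the script prints 2.71, i.e. it would fall into the 2.7 ± 0.2 °C group used further down.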

Let’s take a look at it:

In [None]:
# reading the csv file
df_warming_levels = pd.read_csv('cmip5_and_cmip6_warming_compared_to_preindustrial.csv', index_col=0)

In [None]:
# look at the projected temperature changes in a histogram
plt.hist(df_warming_levels['global_temp_ch_2071-2100_preindustrial'],
         # define the bins the data should be displayed in, here from 0.5 to 6.5 in 0.5 steps
         bins=np.arange(0.5, 6.6, 0.5),  # 6.6 because the last value is not included in np.arange
         edgecolor='black',
        )
plt.ylabel('Number of realizations')
plt.xlabel('Temperature change in °C compared to preindustrial')
plt.grid(True)

<div class="alert alert-warning">
    <b>Task</b>: In which temperature range do we have the most model realizations available?
</div>

Your answer here:

Now we can use this data to group our simulations based on specific temperature levels. To do this, we will define a function which selects all simulations within a provided temperature range:

In [None]:
# the function takes a target temperature and a range, e.g. 2.7+/-0.2°C
def get_models_from_temp(temp, temp_range):
    pi_l = temp - temp_range  # our lower temperature limit
    pi_u = temp + temp_range  # our upper temperature limit

    # select only those which are inside of our temperature limits
    pd_cmip_sel = df_warming_levels.loc[
        # keep all realizations with a warming at or above our lower limit AND
        (df_warming_levels['global_temp_ch_2071-2100_preindustrial']>=pi_l) &
        # at or below our upper limit
        (df_warming_levels['global_temp_ch_2071-2100_preindustrial']<=pi_u)
    ]
    return pd_cmip_sel
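The function keeps every realization whose warming lies in the closed interval [temp - temp_range, temp + temp_range]. The same filter on a plain NumPy array, with invented warming values and hypothetical realization labels, looks like this (in pandas, `Series.between(lower, upper)` expresses the same inclusive test):

```python
import numpy as np

# Hypothetical realizations and their invented 2071-2100 warming (°C vs. 1850-1900)
labels = np.array(['gcm_a_ssp370', 'gcm_b_ssp370', 'gcm_c_ssp585', 'gcm_d_rcp45', 'gcm_e_ssp126'])
warming = np.array([2.55, 2.95, 3.40, 2.70, 1.60])

def select_by_warming(temp, temp_range):
    """Boolean mask of realizations within temp +/- temp_range (both ends included)."""
    lower, upper = temp - temp_range, temp + temp_range
    return (warming >= lower) & (warming <= upper)

mask = select_by_warming(2.7, 0.2)
print(labels[mask])
```

Only the realizations at 2.55 °C and 2.70 °C fall inside 2.5 to 2.9 °C.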

As an example, let’s select all models with a projected temperature increase of 2.7 ± 0.2 °C (i.e., between 2.5 °C and 2.9 °C):

In [None]:
models_2_7_deg = get_models_from_temp(2.7, 0.2)
models_2_7_deg

As you can see, the result includes a mix of CMIP5 and CMIP6 models, along with a variety of different scenarios.
The column `gcm` refers to the climate model used, while `rcp` and `ssp` refer to the scenario for CMIP5 and CMIP6, respectively.

<div class="alert alert-warning">
    <b>Task</b>: What should we check after selecting our model realizations?
</div>

Your answer here:

We should check how close the mean and median temperatures of our selected models are to the target temperature level:

In [None]:
models_2_7_deg['global_temp_ch_2071-2100_preindustrial'].mean()

In [None]:
models_2_7_deg['global_temp_ch_2071-2100_preindustrial'].median()

Okay, that’s quite good. However, it is always important to check for cases where the selection doesn’t work as well. Let’s look at an example:

In [None]:
get_models_from_temp(1.4, 0.2)['global_temp_ch_2071-2100_preindustrial'].mean()

In [None]:
get_models_from_temp(1.4, 0.2)['global_temp_ch_2071-2100_preindustrial'].median()

In this case, you can see that the mean temperature of our selection is 0.1 °C warmer than the target. This highlights why it is important to always check the actual temperature of your selection when analyzing the results: even small differences can influence the interpretation.

Another important aspect when comparing different temperature levels is to consider the number of selected models. In general, the more models you include, the more robust your statistics become, whether you're looking at the mean, standard deviation, median, or interquartile range.

In [None]:
len(models_2_7_deg)

In [None]:
len(get_models_from_temp(1.4, 0.2))

As you saw above, for the 2.7 ± 0.2 °C range, we have twice as many model realizations available compared to the 1.4 ± 0.2 °C range.

That’s why it’s important to ensure that, when defining your target temperature levels, the mean temperature of your selection closely matches your target, and that the number of realizations is balanced across the temperature groups. This ensures fair and robust comparisons.
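Both checks (how close the mean and median of a selection are to the target, and how many realizations it contains) can be wrapped in one small helper and run for every temperature level you plan to compare. A NumPy-only sketch with invented warming values; the helper `summarize_level` is hypothetical and not part of OGGM:

```python
import numpy as np

def summarize_level(warming, target, temp_range):
    """Return count, mean, median and mean offset from the target for one warming level."""
    warming = np.asarray(warming, dtype=float)
    sel = warming[(warming >= target - temp_range) & (warming <= target + temp_range)]
    if sel.size == 0:
        return {'target': target, 'n': 0, 'mean': np.nan, 'median': np.nan, 'offset': np.nan}
    return {
        'target': target,
        'n': int(sel.size),
        'mean': float(sel.mean()),
        'median': float(np.median(sel)),
        'offset': float(sel.mean() - target),
    }

# Invented warming values (°C) for ten hypothetical realizations
warming = [1.35, 1.50, 1.55, 2.55, 2.60, 2.75, 2.80, 2.85, 3.90, 4.40]

for target in (1.4, 2.7):
    s = summarize_level(warming, target, 0.2)
    print(f"{s['target']:.1f} °C: n={s['n']}, mean={s['mean']:.2f}, "
          f"median={s['median']:.2f}, offset={s['offset']:+.2f}")
```

With these made-up numbers, the 1.4 °C group contains fewer realizations (3 vs. 5) and its mean sits further from the target (about +0.07 °C vs. +0.01 °C), the same kind of imbalance described above.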

With your selected models, you now have a list of realizations you can use for further analysis. As mentioned at the beginning of this session, running all CMIP5 and CMIP6 realizations yourself would require downloading several gigabytes of data. Fortunately, OGGM provides standard projections for all of these, which you can use directly for your analysis.

In the next session, you will learn how to access and work with these projections, which will be especially useful for your group projects.

## Bonus: Sandbox

If you're interested, you can perform a similar sensitivity study as we did in the last session, but this time using future projections. To do this:

- Select one GCM-scenario combination.
- Conduct multiple projection runs, each with different model parameter settings (e.g., melt factor, precipitation factor, temperature bias, Glen A).
- Create plots to analyze how changes in these parameters affect the projected glacier volume.

This is a great way to understand the impact of model assumptions on long-term projections.
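One way to organize such a study is a one-at-a-time design: start from the calibrated values, vary a single parameter per run, and give each run its own output suffix so the results can be compiled and compared as in the sections above. The sketch below only builds that list of runs. The parameter names and multiplicative factors are hypothetical placeholders; how each value is actually passed to OGGM depends on the parameter, so check the OGGM documentation before wiring it up. An additive parameter such as a temperature bias would need offsets instead of factors.

```python
# Hypothetical parameter names and multiplicative perturbations (placeholders, not OGGM defaults)
factors = {
    'melt_f': [0.8, 0.9, 1.1, 1.2],
    'prcp_fac': [0.8, 0.9, 1.1, 1.2],
    'glen_a': [0.5, 2.0],
}

def one_at_a_time_runs(factors, base_suffix):
    """List (parameter, factor, output_suffix): one reference run plus one run per perturbation."""
    runs = [(None, 1.0, f'{base_suffix}_ref')]
    for param, values in factors.items():
        for factor in values:
            runs.append((param, factor, f'{base_suffix}_{param}_x{factor:g}'))
    return runs

runs = one_at_a_time_runs(factors, '_ISIMIP3b_mri-esm2-0_r1i1p1f1_ssp370')
print(len(runs))
for param, factor, suffix in runs[:3]:
    print(param, factor, suffix)
```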

In [None]:
# Sandbox experiments

## Recap

- Future climate data needs to be bias-corrected to match the historical data used in the model.
- OGGM supports output from a wide range of climate models and scenarios (e.g., CMIP5/RCPs and CMIP6/SSPs).
- For clearer communication, it is often more effective to group results by warming levels (e.g., 1.5°C, 2°C) rather than by scenario names, which can be harder to interpret.