# Group project

## Instructions

**Objectives**

In this final project, you will pick two glacierized basins, analyse the projected future of their glaciers under different climate scenarios, and compare the two basins with each other. For this, use the tools and knowledge you gained during the practical sessions of the last weeks.

**Deadline**

Please submit your project via OLAT before **Monday June 17 at 00H** (in the night from Monday to Tuesday).

**Formal requirements**

You will work in groups of two. If there is an odd number of students, one group can have three participants. *(Tip: I recommend that students who have not taken a programming class team up with students who have.)*

Each group will submit one executed Jupyter notebook on OLAT containing the code, plots, and answers to the questions (as Markdown text). Please also submit an HTML version of the notebook. **Please ensure that your HTML file is smaller than 10 MB. This helps us provide you with more detailed and readable feedback.**

Each group member must contribute to the notebook. The notebook should be self-contained and the answers must be well structured. The plots must be as understandable as possible (title, units, x and y axis labels, appropriate colors…). 

Please be concise in your answers. We expect a few sentences per answer at most; there is no need to write a new textbook in this project! Use links and references to the literature or your class slides where appropriate.

**Grading**

We will give one grade per project, according to the following table (total 10 points):
- correctness of the code and the plots: content, legends, colors, units, etc. (3 points)
- quality of the answers: correctness, preciseness, appropriate use of links and references to literature or external resources (5 points)
- contextualisation of your findings within the literature (2 points)

In [None]:
# Imports
import os
import urllib.request

import xarray as xr
import pandas as pd
import numpy as np

## International Network for Alpine Research Catchment Hydrology INARCH

We aim, with this group project, to contribute to the INARCH initiative. The goal of INARCH is to better understand hydrological processes in alpine cold regions, improve their prediction, diagnose their sensitivity to global change, and develop consistent measurement strategies. You can find more information on the [INARCH website](https://inarch.usask.ca/about-inarch/about.php).

In this project, we focus on OGGM glacier projections for the glacierized basins of INARCH. You are expected to analyze how the glaciers evolve and how their contribution to total runoff changes. The final outcome will be a dataset of OGGM projections for all CMIP5 and CMIP6 scenarios (which I have prepared), together with a first analysis of this dataset; this analysis is the core of your work in the group project.

For your individual group analysis, you should compare two basins. As a whole class, we aim to ensure that each basin is covered by at least one group. The selection of basins is up to each group and should be done using the link to the spreadsheet shared in the presentation (first come, first served).

In your analysis, make sure to incorporate the background knowledge about how OGGM works and how the data was generated, as discussed during the practical sessions.

## Selecting Temperature Scenarios for Analysis

To analyze how glaciers are evolving, we will group the projections by temperature warming levels, as done in a previous session. For this, you will again need the table containing the actual warming levels for each climate model realization. You can [download it here](https://cluster.klima.uni-bremen.de/~pschmitt/teaching/cryo_in_climate/cmip5_and_cmip6_warming_compared_to_preindustrial.csv) and save it at the same location as this notebook.

Below is the code used to select the model realizations based on temperature targets. Our analysis focuses on three temperature goals: 1.5°C, 2.7°C, and 4.0°C, each with a tolerance of ±0.2°C.

In [None]:
# read the csv file with the warming level of each climate model realization
df_warming_levels = pd.read_csv('cmip5_and_cmip6_warming_compared_to_preindustrial.csv', index_col=0)

# the function takes a target temperature and a tolerance, e.g. 2.7 +/- 0.2°C
def get_models_from_temp(temp, temp_range):
    pi_l = temp - temp_range  # lower temperature limit
    pi_u = temp + temp_range  # upper temperature limit

    # select only the realizations inside our temperature limits
    pd_cmip_sel = df_warming_levels.loc[
        # warming greater than or equal to the lower limit AND
        (df_warming_levels['global_temp_ch_2071-2100_preindustrial'] >= pi_l) &
        # warming less than or equal to the upper limit
        (df_warming_levels['global_temp_ch_2071-2100_preindustrial'] <= pi_u)
    ]
    return pd_cmip_sel

# define the models for each temperature goal in a dictionary
temp_scenarios = {
    '4°C': get_models_from_temp(4, 0.2),
    '2.7°C': get_models_from_temp(2.7, 0.2),
    '1.5°C': get_models_from_temp(1.5, 0.2),
}

<div class="alert alert-warning">
    <b>Task</b>: For each temperature target (1.5°C, 2.7°C, and 4.0°C), evaluate the selection of climate model realizations by calculating the mean, median, and number of realizations included in each group. Briefly discuss your findings.
</div>
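A minimal sketch of how such a summary could be computed, assuming each entry of `temp_scenarios` is a DataFrame containing the column `global_temp_ch_2071-2100_preindustrial` (as in the selection code above). The helper name `summarize_scenarios` is hypothetical, and the warming values in the example are made up for illustration only.

```python
import pandas as pd

# column with the 2071-2100 warming relative to pre-industrial (name taken from the CSV above)
WARMING_COL = "global_temp_ch_2071-2100_preindustrial"


def summarize_scenarios(scenarios):
    """Mean, median and number of realizations of the warming for each temperature group."""
    rows = {}
    for label, df_sel in scenarios.items():
        warming = df_sel[WARMING_COL]
        rows[label] = {
            "mean": warming.mean(),
            "median": warming.median(),
            "n_realizations": len(warming),
        }
    # one row per temperature group, one column per statistic
    return pd.DataFrame.from_dict(rows, orient="index")


# hypothetical warming values, NOT taken from the real CSV
example = {
    "1.5°C": pd.DataFrame({WARMING_COL: [1.4, 1.5, 1.6]}),
    "2.7°C": pd.DataFrame({WARMING_COL: [2.6, 2.8]}),
}
summary = summarize_scenarios(example)
print(summary)
```

With the real data, `summarize_scenarios(temp_scenarios)` would produce one row per temperature target; `df_sel['cmip'].value_counts()` can additionally show how many CMIP5 and CMIP6 realizations fall into each group.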

In [None]:
# add your code here

Your discussion here:

## Getting the Data for Your Basins

To begin your analysis, you need to load the glacier and runoff projection data for your assigned basins. Make sure you know which basins your group is working on (as selected in the shared spreadsheet). The data for each basin is stored in individual files, which you can load using the provided code template.  You can choose where to store the data locally by setting the variable `local_data_dir`. By default, this will create a new folder called `glacier_projection_data` in the same location as your notebook.

In [None]:
# add here the basin_ids of your selected basins (e.g. ['basin_1', 'basin_2'])
basin_ids = ['zugspitze', ]

# you can select here a location on your computer to store the glacier data
local_data_dir = 'glacier_projection_data'

# create the directory, if it does not exist
os.makedirs(local_data_dir, exist_ok=True)

# the url where all the data is stored
base_url = 'https://cluster.klima.uni-bremen.de/~pschmitt/teaching/cryo_in_climate/INARCH/data/'

# in this structure we will save the opened data
ds_all = {}

# Code for downloading the data; files that were already downloaded are skipped
for basin in basin_ids:

    # remote folder of this basin (URLs always use '/', so os.path.join is avoided here)
    basin_url = f"{base_url}{basin}/2100/"
    # create a local directory for each basin
    local_basin_dir = os.path.join(local_data_dir, basin)
    os.makedirs(local_basin_dir, exist_ok=True)

    ds_all[basin] = {}
    for temp_level in temp_scenarios:
        ds_tmp_all = []
        for _, realization in temp_scenarios[temp_level].iterrows():
            # the scenario column name depends on the CMIP generation (ssp for CMIP6, rcp for CMIP5)
            scenario_column = 'ssp' if realization['cmip'] == 'CMIP6' else 'rcp'
            filename = f"basin_{basin}_run_hydro_w5e5_gcm_merged_bc_2000_2019_{realization['gcm']}_{realization[scenario_column]}.nc"
            local_file = os.path.join(local_basin_dir, filename)

            # only download if the file is not already downloaded
            if os.path.isfile(local_file):
                print(f"File already downloaded: {filename}")
            else:
                print(f"Downloading {filename}")
                urllib.request.urlretrieve(basin_url + filename, local_file)

            # open the individual dataset and combine gcm and scenario into a new dimension
            with xr.open_dataset(local_file) as ds:
                ds_stacked = ds.stack(gcm_scenario=("gcm", "scenario"))
                ds_tmp_all.append(ds_stacked)

        print(f'{basin}: combining data for {temp_level}')
        ds_all[basin][temp_level] = xr.combine_by_coords(ds_tmp_all, fill_value=np.nan)

After downloading and processing the data, everything will be stored in the variable `ds_all`. You can access the data for a specific basin and temperature level using the syntax `ds_all[basin_name][temperature_level]`. Below you can find one example:

In [None]:
ds_all[basin_ids[0]]['1.5°C']

<div class="alert alert-warning">
    <b>Task</b>: Explore and discuss the available data (data structure, variables, units, temporal resolution, ...).
</div>
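One way to get a quick overview is to tabulate every variable with its dimensions and attributes. This is a minimal sketch: `describe_variables` is a hypothetical helper, and the attribute keys (`unit` or `units`, `description`) are assumptions about how the files are annotated, so check `ds.attrs` and the variable attributes of your own data. The synthetic dataset only stands in for `ds_all[basin][temp_level]`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def describe_variables(ds):
    """Table with dimensions, unit and description of every data variable in ds."""
    records = []
    for name, da in ds.data_vars.items():
        records.append({
            "variable": name,
            "dims": ", ".join(str(d) for d in da.dims),
            # the unit key may be 'unit' or 'units' depending on the producer (assumption)
            "unit": da.attrs.get("unit", da.attrs.get("units", "n/a")),
            "description": da.attrs.get("description", ""),
        })
    return pd.DataFrame(records).set_index("variable")


# small synthetic dataset standing in for ds_all[basin][temp_level]
ds_demo = xr.Dataset(
    {
        "volume": (("time", "rgi_id"), np.ones((3, 2)), {"unit": "m 3", "description": "Glacier volume"}),
        "area": (("time", "rgi_id"), np.ones((3, 2)), {"units": "m 2"}),
    },
    coords={"time": [2000, 2001, 2002], "rgi_id": ["g1", "g2"]},
)
table = describe_variables(ds_demo)
print(table)
# temporal resolution: spacing between consecutive time values
print(np.unique(np.diff(ds_demo.time.values)))
```

Applied to your data, the `dims` column also reveals variables that carry an extra dimension (for example a month dimension for sub-annual output).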

Your answer here:

## Common running glaciers

Some individual glacier projections may be missing or not available for certain scenarios. To avoid introducing errors due to differing glacier counts across scenarios, we will first extract only those glaciers that are available in all scenarios:

In [None]:
# in this variable the common running glaciers will be saved for each basin
not_nan_rgi_ids_all = {}

# loop through your basins
for basin in basin_ids:
    not_nan_rgi_ids = None

    # loop through the temperature scenarios; only glaciers available in all temperature scenarios are kept
    for temp in temp_scenarios:
        not_nan_rgi_ids_temp = ~ds_all[basin][temp].volume.isnull().any(dim=["time", "gcm_scenario"])

        if not_nan_rgi_ids is None:
            not_nan_rgi_ids = not_nan_rgi_ids_temp
        else:
            not_nan_rgi_ids &= not_nan_rgi_ids_temp

    # save the working rgi_ids for each basin
    not_nan_rgi_ids_all[basin] = not_nan_rgi_ids.rgi_id[not_nan_rgi_ids].values

We can now use this list of valid rgi_ids to filter our data and include only those glaciers that are available across all scenarios.

In [None]:
basin_example = basin_ids[0]
ds_all[basin_example]['1.5°C'].sel(rgi_id=not_nan_rgi_ids_all[basin_example])

<div class="alert alert-danger">
    <b>Important</b>: For all subsequent analyses, make sure to include only the glaciers that are available across all scenarios to ensure consistency!
</div>

<div class="alert alert-warning">
    <b>Task</b>: For each of your basins, check whether any glaciers were excluded during the filtering process. If so, calculate the total 2000 glacier area of the excluded glaciers (in km²) and the percentage of the basin's total 2000 glacier area that this represents.
</div>
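A minimal sketch of the calculation, assuming you already have one 2000 area value per glacier as a DataArray over `rgi_id` in m². The helper `excluded_area_stats` and the RGI ids in the example are hypothetical illustrations.

```python
import xarray as xr


def excluded_area_stats(area_2000_m2, valid_rgi_ids):
    """Count, area (km²) and share (%) of glaciers missing from valid_rgi_ids.

    area_2000_m2: DataArray with one value per glacier (dimension 'rgi_id'), in m².
    """
    excluded = ~area_2000_m2.rgi_id.isin(valid_rgi_ids)
    n_excluded = int(excluded.sum())
    excluded_km2 = float(area_2000_m2.where(excluded).sum()) / 1e6  # m² -> km²
    total_km2 = float(area_2000_m2.sum()) / 1e6
    return n_excluded, excluded_km2, 100 * excluded_km2 / total_km2


# synthetic example: three glaciers, the last one is not in the list of valid ids
area_demo = xr.DataArray(
    [1.0e6, 2.5e6, 0.5e6],
    dims="rgi_id",
    coords={"rgi_id": ["RGI60-11.00001", "RGI60-11.00002", "RGI60-11.00003"]},
)
n_excl, excl_km2, excl_pct = excluded_area_stats(area_demo, ["RGI60-11.00001", "RGI60-11.00002"])
print(n_excl, excl_km2, excl_pct)  # 1 glacier, 0.5 km², 12.5 %
```

For the real data, something like `ds.area.sel(time=2000).max(dim='gcm_scenario')` could provide the per-glacier values, assuming the time coordinate contains the year 2000, `area` is stored in m², and all realizations share the same initial state (`max` skips NaN values). A glacier without any valid value would contribute zero here, so its area would have to come from another source, such as the RGI outlines.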

Your answer here:

## Describe your basins

Before starting your data analysis, take some time to explore your selected basins and do a bit of background research. You can download all [basin shapefiles here](https://cluster.klima.uni-bremen.de/~pschmitt/teaching/cryo_in_climate/INARCH/basin_shapefile.zip) to examine them more closely.

<div class="alert alert-warning">
    <b>Please answer at least the following questions</b>: 
<ul>
  <li>Where are the basins located, and what are their climate conditions?</li>
  <li>How large is each basin, and what proportion of its area is glacierized? <em>(Tip: check the attributes of ds_all for the individual basins)</em></li>
</ul>
</div>

Your answer here:

## Volume and area evolution

Analyze the volume and area evolution of all glaciers in each of your basins. Tip: use `ds.sum(dim='rgi_id')` to sum over all glaciers.

For each basin, create:
- One plot showing total glacier volume evolution (in km³) from 2020 to 2100
- One plot showing total glacier area evolution (in km²) from 2020 to 2100

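Each plot should include all three temperature scenarios, displayed as the median together with the 17th to 83rd percentile range across the climate model realizations (note that this is wider than the interquartile range, which spans the 25th to 75th percentiles). The title of each plot should include the name of the basin and the glacierized area fraction (in percent).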
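The ensemble statistics for such a plot can be computed with xarray's `median` and `quantile` reductions. This is a minimal sketch: `ensemble_band` is a hypothetical helper, and the synthetic volumes (random numbers in m³) only mimic the `time`, `rgi_id` and `gcm_scenario` dimensions of the real data.

```python
import numpy as np
import xarray as xr


def ensemble_band(da, dim="gcm_scenario", q_low=0.17, q_high=0.83):
    """Median and lower/upper percentile across the ensemble dimension `dim`."""
    median = da.median(dim=dim)
    low = da.quantile(q_low, dim=dim).drop_vars("quantile")
    high = da.quantile(q_high, dim=dim).drop_vars("quantile")
    return median, low, high


# synthetic volumes (m³): 81 years x 3 glaciers x 5 ensemble members
years = np.arange(2020, 2101)
rng = np.random.default_rng(0)
volume = xr.DataArray(
    rng.uniform(1e8, 5e8, size=(years.size, 3, 5)),
    dims=("time", "rgi_id", "gcm_scenario"),
    coords={"time": years, "rgi_id": ["g1", "g2", "g3"], "gcm_scenario": [f"m{i}" for i in range(5)]},
)
total_km3 = volume.sum(dim="rgi_id") / 1e9  # sum over glaciers, m³ -> km³
median, low, high = ensemble_band(total_km3)
```

With matplotlib, the median could be drawn with `ax.plot(median.time, median)` and the range shaded with `ax.fill_between(median.time, low.values, high.values, alpha=0.3)`. For the real data, remember to select the common running glaciers and the years 2020 to 2100 first (assuming the time coordinate holds calendar years).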

In [None]:
# add your code here


<div class="alert alert-warning">
    <b>Questions</b>: 
<ul>
  <li>What do you observe when comparing the different scenarios within each basin?</li>
  <li>Do the two basins react similarly or differently?</li>
  <li>Is there a noticeable difference in the behavior between glacier area and volume?</li>
</ul>
</div>

Your answers here:

## Hydrological output

Analyze the hydrological output of your basins. For guidance, you can refer to the plots in [this tutorial](https://edu-notebooks.oggm.org/oggm-edu/glacier_water_resources_projections.html).

For each basin, create the following plots:
- Total runoff for all temperature scenarios in one plot (showing the median and the 17th to 83rd percentile range)
- Runoff components (median only), one plot per temperature scenario
- Monthly runoff (median only), as a 2D plot (x-axis: months, y-axis: years), one plot per temperature scenario
- Annual runoff at three time steps (e.g. 2020, 2060, 2100), showing the median and the 17th to 83rd percentile range, one plot per temperature scenario

The title of each plot should include the name of the basin and the glacierized area fraction (in percent), and, if needed, the temperature scenario.
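Total runoff is the sum of its components over all glaciers. The sketch below assumes the four component names used in the linked OGGM tutorial (`melt_off_glacier`, `melt_on_glacier`, `liq_prcp_off_glacier`, `liq_prcp_on_glacier`) and units of kg per year; verify both against `ds.data_vars` and the variable attributes of your files. `basin_runoff_mt` is a hypothetical helper, and the dataset is synthetic.

```python
import numpy as np
import xarray as xr

# assumed names of the annual runoff components, as used in the linked OGGM tutorial;
# check ds.data_vars of your own files before relying on them
RUNOFF_COMPONENTS = [
    "melt_off_glacier",
    "melt_on_glacier",
    "liq_prcp_off_glacier",
    "liq_prcp_on_glacier",
]


def basin_runoff_mt(ds, components=RUNOFF_COMPONENTS):
    """Annual basin runoff in Mt per year, assuming the components are given in kg per year."""
    per_component = ds[components].sum(dim="rgi_id")  # sum over all glaciers
    runoff_kg = sum(per_component[name] for name in components)  # add up the components
    return runoff_kg * 1e-9  # kg -> Mt


# synthetic dataset: 2 glaciers, 3 ensemble members, every component 1e9 kg per year
years = np.arange(2000, 2011)
shape = (years.size, 2, 3)
ds_demo = xr.Dataset(
    {name: (("time", "rgi_id", "gcm_scenario"), np.full(shape, 1e9)) for name in RUNOFF_COMPONENTS},
    coords={"time": years, "rgi_id": ["g1", "g2"], "gcm_scenario": ["m0", "m1", "m2"]},
)
runoff = basin_runoff_mt(ds_demo)  # 4 components x 2 glaciers x 1 Mt = 8 Mt per year
```

The result keeps the `time` and `gcm_scenario` dimensions, so the median and percentile range across realizations can be computed in the same way as for volume and area. The linked tutorial also drops the last, incomplete year of the hydrological output; check whether this applies to your files.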

In [None]:
# add your code here


<div class="alert alert-warning">
    <b>Questions</b>: 
<ul>
  <li>What do you observe when comparing the different scenarios within each basin?</li>
  <li>Do the two basins react similarly or differently?</li>
  <li>Can you identify any evidence of peak water?</li>
</ul>
</div>
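One simple way to look for peak water is to locate the maximum of a smoothed annual runoff series. This minimal sketch assumes a 1D DataArray over `time` (for example the median across `gcm_scenario` of the basin runoff); the helper name `peak_water_year` and the 11-year window are illustrative choices, not a prescribed method, and the runoff curve is synthetic.

```python
import numpy as np
import xarray as xr


def peak_water_year(runoff, window=11):
    """Year of maximum of the centred running mean of an annual runoff series."""
    smoothed = runoff.rolling(time=window, center=True).mean()
    return int(smoothed.idxmax(dim="time"))


# synthetic runoff (Mt per year) that rises until 2050 and declines afterwards
years = np.arange(2000, 2101)
runoff_demo = xr.DataArray(100 - 0.01 * (years - 2050) ** 2, dims="time", coords={"time": years})
print(peak_water_year(runoff_demo))  # 2050
```

Smoothing reduces the influence of single wet or warm years on the result. If the maximum falls on the first or last valid year of the smoothed series, peak water may lie outside the simulated period (already passed, or after 2100).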

Your answers here:

## Contextualize your results with the literature

<div class="alert alert-warning">
    <b>Task</b>: Compare your findings with existing studies or reports. For each basin, find at least one relevant source from the scientific literature. Compare your results to those presented in the literature, and discuss any similarities or differences. Use the knowledge you have gained during the practical sessions, particularly about OGGM and how the data is generated, to help interpret and explain any contrasts between your findings and those in other studies.
</div>

Your answer here: