# Capstone project by PAUL MELLET: Data-driven avalanche hazard prediction

# Part 1: Introduction and context

## 1) The problem

### a) Context

One of the missions of the SLF is to produce a daily avalanche forecast for Switzerland. SLF stands for *Institut für Schnee- und Lawinenforschung*, German for *Institute for Snow and Avalanche Research*. The SLF is part of the WSL, the *Eidgenössische Forschungsanstalt für Wald, Schnee und Landschaft*, or *Swiss Federal Institute for Forest, Snow and Landscape Research*.

The forecast is published either once a day, at 5 pm, or twice a day, at 8 am and at 5 pm [[1]](#ref1). The first forecast of the season generally follows the first significant snowfall, around mid- or late November, and publication continues until about late May or mid-June. Normally, no forecast is issued in summer unless significant snowfall occurs again [[1]](#ref1).

The forecast has many users and uses. Based on it, Swiss communes can take safety measures such as temporarily closing roads or, in extreme cases, schools; ski resorts can close some of their pistes (or even the entire resort); and mountain sports enthusiasts and mountain professionals can use it to plan their own activities [[2]](#ref2).

### b) How to read and interpret an avalanche forecast

Avalanches can be triggered by an additional load placed on the snow cover by people (e.g. a skier), or they can release spontaneously; the latter are called spontaneous or natural avalanches. We won't go into too much detail, but it is worth recalling the 5 typical situations that can lead to avalanches (more complete information can be found here [[3]](#ref3)):

* new-snow: current or recent snowfall puts an additional load on the snowpack

* wind slabs: the wind deposits additional accumulations of snow in the zone

* old-snow: persistent weak layers in the snowpack can fracture and release snow slabs

* wet-snow: a certain amount of liquid water is present in the snowpack; these avalanches are mainly spontaneous

* gliding avalanches: the entire snowpack glides on the ground because of low friction between the snowpack and the ground; these avalanches are mainly spontaneous

The SLF also classifies these situations into two categories [[4]](#ref4):

* "dry-snow" problem: covers the new-snow, wind slab and old-snow situations

* "wet-snow" problem: covers the wet-snow and gliding avalanche situations

The SLF uses the five-level European avalanche danger scale to describe the avalanche danger. Level 1 stands for *"low"*, 2 for *"moderate"*, 3 for *"considerable"*, 4 for *"high"* and 5 for *"very high"*. Since December 2022 [[5]](#ref5), danger levels from 2 upward can also be followed by a -, = or + sign to place the danger more precisely within the level (2- is less dangerous than 2=, which is less dangerous than 2+).
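
The ordering of these labels can be made concrete with a short Python sketch. The names `parse_danger` and `danger_sort_key` are illustrative and do not come from any SLF tool. The sketch assumes that a label is written as a digit from 1 to 5 optionally followed by `-`, `=` or `+`, and it treats a missing sign like `=`.

```python
import re

# Rank of each sign inside a level: "-" is below "=", which is below "+".
SUBLEVEL_RANK = {"-": 0, "=": 1, "+": 2}

# A label is one digit (1 to 5) optionally followed by one sign.
_LABEL_RE = re.compile(r"^([1-5])([-=+]?)$")


def parse_danger(label: str) -> tuple:
    """Split a label such as '3-' into (3, '-'); a missing sign becomes '='."""
    match = _LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a valid danger label: {label!r}")
    level = int(match.group(1))
    sign = match.group(2) or "="  # assumption: no sign is read as "="
    return level, sign


def danger_sort_key(label: str) -> tuple:
    """Sort key: first by level, then by the sign inside the level."""
    level, sign = parse_danger(label)
    return level, SUBLEVEL_RANK[sign]


labels = ["3-", "2+", "2", "1", "2-", "3="]
print(sorted(labels, key=danger_sort_key))
# ['1', '2-', '2', '2+', '3-', '3=']
```

Sorting on a `(level, sign rank)` pair keeps the integer level as the dominant criterion, so any label of level 3 ranks above any label of level 2 regardless of its sign.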

The danger level is in fact an index of the risk of avalanches. It follows the formula:

$$\text{risk} = \text{consequences}\cdot\text{probability of occurrence}$$

The higher the danger level, the larger the avalanches may be, the more zones they are likely to occur in, and the higher the probability that avalanches release [[6]](#ref6).

While reading the avalanche forecast is accessible to everyone (we explain briefly how to do so just below), producing it and studying avalanche hazard are really complex. The researchers at the SLF use a large variety of data coming from different parts of the country and contributed by SLF observers, SLF partners or mountain enthusiasts. To mention only a few: measurements made by SLF stations (amount of snow or snowfall, wind speed, wind direction, temperatures, ...), observations (signs of avalanches) reported by mountain enthusiasts in the field, weather forecasts, and snow profiles and stability tests carried out by SLF observers or partners [[7]](#ref7). Snow profiles involve cutting through the entire snowpack down to the ground to examine its layers, the cohesion between them, and the snowpack’s resistance to additional loads [[8]](#ref8).

Let us look at an example: the avalanche forecast published at 17:00 on 15 March 2025. The whole forecast can be found here [[9]](#ref9).

We will only focus on parts of it to explain the important points. 

The first thing that the avalanche forecast presents is a map of Switzerland:

<img src="../pictures_maps/avalanche_danger1.png" width="600" height="300">


This map is divided into regions colored according to their current highest danger degree. Green is used when the danger is 1 (low), yellow when it is 2 (moderate) and orange when it is 3 (considerable). The danger degree can also be 4 (high), colored in red, or 5 (very high), colored with a red and black pattern. The regions in white are those where the snow cover is too thin to pose a real avalanche danger.
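
As a minimal sketch of this coloring rule, the Python snippet below maps the danger degrees of a region to its map color. The names `LEVEL_COLORS` and `region_color` are hypothetical, and the color strings simply restate the description above; they are not taken from SLF software. An empty list is used here, by assumption, to represent a region without a danger degree.

```python
# Colors as described in the text; the keys are the integer danger degrees.
LEVEL_COLORS = {
    1: "green",
    2: "yellow",
    3: "orange",
    4: "red",
    5: "red and black",
}


def region_color(degrees):
    """Return the map color of a region from the danger degrees issued for it.

    `degrees` holds one integer per avalanche danger listed for the region.
    An empty list means that no degree was issued (assumption: white region).
    """
    if not degrees:
        return "white"
    # The region is colored according to its highest danger degree.
    return LEVEL_COLORS[max(degrees)]


print(region_color([2, 3]))  # orange
print(region_color([]))      # white
```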

The avalanche forecast then gives a more detailed description of the avalanche danger for every region. Let's have a look at region D of this forecast.

<img src="../pictures_maps/avalanche_danger2.png" width="600" height="300">

We can see different things:

- separate regions can belong to the same alert zone. This is because the SLF considers that the conditions in those regions are equivalent and that the dangers and the danger levels are the same, even if the regions are not contiguous. In our case, region D includes part of the south of Valais and part of the Grisons.

- as we said previously, the degree, which is an integer from 1 to 5, can be followed by a - or + sign, which indicates whether the danger lies in the lower or in the upper part of that level. When no sign is given, it is equivalent to the = sign.

- most of the time, the SLF gives an altitude above which the announced danger degree applies. For locations below this altitude, a rule of thumb says that the danger degree can be taken as the announced one minus 1 [[10]](#ref10). In our case, the danger is 3- above 2200m and 2- below. Also, *"If this information is not given, the indicated danger level applies to all aspects and altitude zone"* [[10]](#ref10).

- the aspect rose shows the orientations (in black) for which the danger degree applies. For locations with any other orientation (in white), the danger degree is the announced one minus 1. In our case, the danger is 3- for orientations from west to east through north (both included) and 2- for south, south-east and south-west oriented zones.

- the avalanche forecast lists all the types of danger currently present in the region. A single alert region can have two different dangers with different degrees, which often happens in spring. We won't explain the different types of avalanches here, but keep in mind that in winter most avalanches are caused by wind slabs, dry-snow avalanches or old-snow avalanches, while in spring, as temperatures rise, the danger can also come from wet snow. The overall avalanche danger degree (the one used for coloring the region) is always the highest one among all the avalanche dangers listed.

- for every type of danger, a text explains it in more detail.

It may be obvious, but it is important to keep in mind that the degree, its - or + sign (if any), the altitude limit and the orientations are only indicative. The danger doesn't switch abruptly from 3 to 2 just below an altitude of 2700m, for example.
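
The "minus 1" rules of thumb for altitude and orientation can be sketched in Python as follows. This is only an illustration under stated assumptions: `RegionForecast` and `indicative_danger` are hypothetical names, the degree is lowered once even when a location is both below the altitude limit and outside the marked orientations, the sign is kept when the degree is lowered (as in the 3- / 2- example of region D), and the sign is dropped at degree 1 because signs are only used from degree 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionForecast:
    level: int                       # announced danger degree, 1 to 5
    sign: str                        # "-", "=" or "+"
    min_altitude_m: Optional[int]    # degree applies above this; None = all altitudes
    aspects: Optional[frozenset]     # marked orientations; None = all orientations


def indicative_danger(forecast: RegionForecast, altitude_m: float, aspect: str) -> str:
    """Return an indicative label (e.g. '2-') for one location in the region."""
    in_altitude = (forecast.min_altitude_m is None
                   or altitude_m >= forecast.min_altitude_m)
    in_aspect = forecast.aspects is None or aspect in forecast.aspects
    if in_altitude and in_aspect:
        level = forecast.level
    else:
        # Rule of thumb: one degree lower, never below 1.
        # Assumption: the reduction is applied once, not once per criterion.
        level = max(1, forecast.level - 1)
    # Assumption: signs are only attached from degree 2 upward.
    sign = forecast.sign if level >= 2 else ""
    return f"{level}{sign}"


# Region D of the example: 3- above 2200m, from west to east through north.
region_d = RegionForecast(
    level=3,
    sign="-",
    min_altitude_m=2200,
    aspects=frozenset({"W", "NW", "N", "NE", "E"}),
)

print(indicative_danger(region_d, 2500, "N"))  # 3-
print(indicative_danger(region_d, 2500, "S"))  # 2-
print(indicative_danger(region_d, 1800, "N"))  # 2-
```

The hard threshold in this sketch is a simplification: as noted above, the altitude limit and the orientations are indicative, so the returned label should be read as a rough guide and not as a sharp boundary.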

### c) Our goal

As we said, the avalanche forecast and the danger degree rely, among other things, on meteorological data recorded by the SLF stations. In this project, we aim to train supervised machine learning models on the historical daily weather data recorded by these stations, using as targets the historical daily danger degrees that were in effect at the location of these stations.

In this project, we will first get acquainted with the variables that the SLF uses to evaluate the avalanche danger. Indeed, apart from the classic meteorological data one might expect the institute to use (such as temperature, wind, amount of snowfall, etc.), we will see that the SLF also takes into account some more "exotic" variables such as the *Skier penetration depth*.

Doing this will allow us to predict the avalanche danger degree, which again takes a value in 1, 2, 3, 4, 5, possibly followed by a - or + sign, using only data measured and provided by the SLF.

More generally, this project will tell us whether the avalanche forecast can be built from these data alone (without the empirical knowledge, experiments or field observations of scientists), or whether there is a good chance that it will be in the future.

## 2) Source and context of the dataset

[*EnviDat*](https://www.envidat.ch/#/) (Environmental Data) is the data portal of the WSL, which includes the SLF. It provides standardized, regulated access to environmental monitoring and research data.

During this project, we will work on the *"Data_weather_snowpack_danger_forecast"* dataset, which can be found here: 

https://www.doi.org/10.16904/envidat.330

*"This data set includes the meteorological variables (resampled 24-hour averages) and the profile variables extracted from the simulated profiles for each of the weather stations of the IMIS network in Switzerland, and, the danger ratings for dry-snow conditions assigned in the Swiss avalanche bulletin to the location of the weather station. This dataset provides daily meteorological variables, profile variables"* [[11]](#ref11). The IMIS network (Intercantonal Measurement and Information System) consists of 189 stations located in various parts of Switzerland, which are either snow stations, wind stations or specialized stations [[12]](#ref12).


There are a few important things to note:

Firstly, since this dataset only contains danger ratings for dry-snow conditions, our machine learning project will focus only on danger levels for dry-snow conditions and not for wet-snow conditions.

Secondly, this dataset was built for a machine learning project by an SLF team, which aimed to predict the avalanche danger automatically from data with two different Random Forest classifiers [[13]](#ref13). Although the objective of our project is similar, our aim is different: to build our own pipeline, train several of the machine learning models seen in class, and see how they behave on this task.

Finally, the SLF team used another dataset, *Data_RF2_tidy*, also available at https://www.doi.org/10.16904/envidat.330, to build the second Random Forest classifier: *"to reduce the uncertainty resulting from using the forecast danger level as target variable, we trained a second classifier (RF 2) that relies on a quality-controlled subset of danger level labels."* [[13]](#ref13) In this project, we plan to use only the first dataset.


Other tables will help us understand the features stored in the main dataset. The SLF provides links for extracting useful information on the following website:

https://www.slf.ch/en/services-and-products/slf-data-service/

We will use:

- the SLF stations list: https://measurement-data.slf.ch/imis/
- the warning regions and sectors identifiers: https://aws.slf.ch/api/warningregion/#/

Finally, we will use the *"Appendix C: Definition of features for developing RF models"* part of *Pérez-Guillén, C., Techel, F., Hendrick, M., Volpi, M., van Herwijnen, A., Olevski, T., Obozinski, G., Pérez-Cruz, F., and Schweizer, J.: Data-driven automated predictions of the avalanche danger level for dry-snow conditions in Switzerland, Nat. Hazards Earth Syst. Sci., 22, 2031–2056, https://doi.org/10.5194/nhess-22-2031-2022, 2022* [[13]](#ref13). In this subsection, the authors provide two tables that describe variables stored in the dataset. Their names and descriptions can be downloaded as XLSX files that will be very useful for us. 

## 3) References

<a id="ref1"></a>[1] [SLF - Interpretation Guide (PDF)](https://www.slf.ch/fileadmin/user_upload/SLF/Lawinenbulletin_Schneesituation/Wissen_zum_Lawinenbulletin/Interpretationshilfe/Interpretationshilfe_EN.pdf), p. 13.

<a id="ref2"></a>[2] [SLF - Interpretation Guide (PDF)](https://www.slf.ch/fileadmin/user_upload/SLF/Lawinenbulletin_Schneesituation/Wissen_zum_Lawinenbulletin/Interpretationshilfe/Interpretationshilfe_EN.pdf), p. 5.

<a id="ref3"></a>[3] [SLF - Interpretation Guide (PDF)](https://www.slf.ch/fileadmin/user_upload/SLF/Lawinenbulletin_Schneesituation/Wissen_zum_Lawinenbulletin/Interpretationshilfe/Interpretationshilfe_EN.pdf), p. 27-32.

<a id="ref4"></a>[4] [SLF - Interpretation Guide (PDF)](https://www.slf.ch/fileadmin/user_upload/SLF/Lawinenbulletin_Schneesituation/Wissen_zum_Lawinenbulletin/Interpretationshilfe/Interpretationshilfe_EN.pdf), p. 7.

<a id="ref5"></a>[5] [SLF - Subdivision of danger levels in the avalanche bulletin](https://www.slf.ch/en/news/subdivision-of-danger-levels-in-the-avalanche-bulletin/)

<a id="ref6"></a>[6] [SLF - Interpretation Guide (PDF)](https://www.slf.ch/fileadmin/user_upload/SLF/Lawinenbulletin_Schneesituation/Wissen_zum_Lawinenbulletin/Interpretationshilfe/Interpretationshilfe_EN.pdf), p. 18.

<a id="ref7"></a>[7] [SLF - Interpretation Guide (PDF)](https://www.slf.ch/fileadmin/user_upload/SLF/Lawinenbulletin_Schneesituation/Wissen_zum_Lawinenbulletin/Interpretationshilfe/Interpretationshilfe_EN.pdf), p. 14-16.

<a id="ref8"></a>[8] [SLF - Information about snow profiles](https://www.slf.ch/en/avalanche-bulletin-and-snow-situation/snow-maps/information-about-snow-profiles/)

<a id="ref9"></a>[9] [SLF - Avalanche Forecast, 2025-03-15, 5 pm](https://www.slf.ch/fileadmin/avalanche_bulletin/pdf/2025/03/Bulletin_2025-03-15_17-00_en.pdf?time=1751982262)

<a id="ref10"></a>[10] [SLF - Interpretation Guide (PDF)](https://www.slf.ch/fileadmin/user_upload/SLF/Lawinenbulletin_Schneesituation/Wissen_zum_Lawinenbulletin/Interpretationshilfe/Interpretationshilfe_EN.pdf), p. 32.

<a id="ref11"></a>[11] Pérez-Guillén, C., Techel, F., Hendrick, M., Volpi, M., van Herwijnen, A., Olevski, T., Obozinski, G., Pérez-Cruz, F., Schweizer, J. (2022). Weather, snowpack and danger ratings data for automated avalanche danger level predictions. EnviDat. https://www.doi.org/10.16904/envidat.330.

<a id="ref12"></a>[12] [SLF - Description of automated stations](https://www.slf.ch/en/avalanche-bulletin-and-snow-situation/measured-values/description-of-automated-stations/)

<a id="ref13"></a>[13] Pérez-Guillén, C., Techel, F., Hendrick, M., Volpi, M., van Herwijnen, A., Olevski, T., Obozinski, G., Pérez-Cruz, F., and Schweizer, J.: Data-driven automated predictions of the avalanche danger level for dry-snow conditions in Switzerland, Nat. Hazards Earth Syst. Sci., 22, 2031–2056, https://doi.org/10.5194/nhess-22-2031-2022, 2022.
