# Investigation of Avalanche Tendencies:
## Authors: Lucas Crichton, Omer Tahir

## Abstract:

## Introduction
In this investigation, we explore data from North America and Europe to reveal potential trends among avalanche occurrences. The data was provided by Avalanche Canada, the Colorado Avalanche Information Center (CAIC) and the European Avalanche Warning Services (EAWS). First, we conduct an exploratory analysis of the relationship between the type of activity performed at the time of an avalanche and the number of deaths it caused. Next, we investigate whether avalanche deaths are spread relatively evenly throughout the ski season or whether there is a time of year when deadly avalanches are more common.

## Sources:
- “Avalanche.org » Accidents.” Avalanche.org, Colorado Avalanche Information Center, 5 Feb. 2020, https://avalanche.org/avalanche-accidents/.
- “Fatalities.” EAWS, 25 Nov. 2021, https://www.avalanches.org/fatalities/fatalities-20/. 
- “Historical Incidents.” Avalanche Canada, https://www.avalanche.ca/incidents. 


## Preparing the Data:
First, we must prepare the data so that the data frame used in our analysis contains data from all three sources and all necessary variables.

## Importing Necessary Packages:

In [8]:
import matplotlib.pyplot as plt
import numpy as np
import json
import os
import urllib.request
import pandas as pd
import time
import requests
import io
import zipfile
import warnings
from itertools import chain

%matplotlib inline

## Scraping Data off the web

We begin by scraping avalanche accident data for three regions: Canada, the United States and Europe.

### Extracting Avalanche Data (Canada):

In [57]:
# More information about each incident is available on the website, e.g.
# "https://www.avalanche.ca/incidents/37d909e4-c6de-43f1-8416-57a34cd48255".
# The same information is also available through the API.
def get_incident_details(id):
    """Fetch the full record for one incident id from the Avalanche Canada API."""
    url = "http://incidents.avalanche.ca/public/incidents/{}?format=json".format(id)
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as response:
        result = json.loads(response.read().decode('utf-8'))
    return result

incidentsfile = "https://datascience.quantecon.org/assets/data/avalanche_incidents.csv"

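# To avoid overloading the Avalanche Canada servers, the incident details are cached in a CSV file.
# Note: os.path.isfile() only recognises local paths, so with a URL the API branch below runs;
# that branch expects `incidents_brief` (a dataframe with an `id` column) to be defined already.
if not os.path.isfile(incidentsfile):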
    incident_detail_list = incidents_brief.id.apply(get_incident_details).to_list()
    incidents = pd.DataFrame.from_dict(incident_detail_list, orient="columns")
    incidents.to_csv(incidentsfile)
else:
    incidents = pd.read_csv(incidentsfile)
    
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(incidents.head())

Unnamed: 0,id,ob_date,location,location_desc,location_coords,location_coords_type,location_elevation,location_province,num_involved,num_injured,num_fatal,comment,group_activity,avalanche_obs,weather_obs,weather_comment,snowpack_obs,snowpack_comment,documents
0,8bc4720d-498c-4793-81ef-c43db9f36ca4,2021-11-27,"Sunshine Bowl, Hasler Area",Approx. 17km East of Powder King ski area,"[55.366223, -122.34096]",Lat/lng,1700.0,BC,3.0,0.0,1,A party of four were snowmobiling in Sunshine ...,Snowmobiling,"[{'size': '3.0', 'type': 'S', 'trigger': 'Ma',...","{'temp_present': None, 'temp_max': None, 'temp...","Overcast, windy conditions were reported with ...","{'hs': None, 'hn24': None, 'hst': None, 'hst_r...",A snow profile near the avalanche on the follo...,"[{'date': '2021-11-30', 'title': 'Scene photo'..."
1,6a3a4698-d047-4082-bdea-92f4db7e63bf,2021-05-30,Mount Andromeda-Skyladder,Approximately 96km SE of Jasper,"[52.17836, -117.24785]",Lat/lng,3075.0,AB,2.0,0.0,2,A party of two people were climbing the Skylad...,Mountaineering,"[{'size': '2.5', 'type': 'S', 'trigger': 'Sa',...","{'temp_present': None, 'temp_max': 8.0, 'temp_...",,"{'hs': None, 'hn24': None, 'hst': None, 'hst_r...",,"[{'date': '2021-06-01', 'title': 'Mt Andromeda..."
2,ba14a125-29f7-4432-97ad-73a53207a5e7,2021-04-05,Haddo Peak,Approximately 6km SW of Lake Louise Village,"[51.38329, -116.23453]",Lat/lng,2950.0,AB,2.0,0.0,1,A party of two people were ski touring up the ...,Skiing,"[{'size': '2.0', 'type': 'S', 'trigger': 'Sa',...","{'temp_present': None, 'temp_max': None, 'temp...",,"{'hs': None, 'hn24': None, 'hst': None, 'hst_r...",,"[{'date': '2021-04-05', 'title': 'Overview pho..."
3,59023c05-b679-4e9f-9c06-910021318663,2021-03-29,Eureka Peak,Approximately 100km east of Williams Lake,"[52.33517, -120.69033]",Lat/lng,2170.0,BC,1.0,0.0,1,A group of snowmobilers rode to the upper reac...,Snowmobiling,"[{'size': '2.5', 'type': 'CS', 'trigger': 'Sa'...","{'temp_present': None, 'temp_max': None, 'temp...",,"{'hs': None, 'hn24': None, 'hst': None, 'hst_r...",,"[{'date': '2021-04-01', 'title': 'Overview', '..."
4,10774b2d-b7de-42ac-a600-9828cb4e6129,2021-03-04,Reco Mountain,Approximately 13km east of New Denver,"[49.99979, -117.18904]",Lat/lng,2465.0,BC,1.0,0.0,1,A group of five snowmobilers was riding in Ant...,Snowmobiling,"[{'size': '3.0', 'type': 'S', 'trigger': 'Ma',...","{'temp_present': None, 'temp_max': None, 'temp...",,"{'hs': None, 'hn24': None, 'hst': None, 'hst_r...",,"[{'date': '2021-03-05', 'title': 'Scene Overvi..."
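
The caching step above relies on a file check, which only works for paths on the local disk. A minimal sketch of that pattern follows, assuming a hypothetical local cache path and a hypothetical `fetch` callable standing in for the API calls (neither name comes from the notebook):

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(cache_path, fetch):
    """Return the cached DataFrame if cache_path exists, else call fetch() and cache it.

    cache_path and fetch are hypothetical names used for illustration only.
    """
    if os.path.isfile(cache_path):
        return pd.read_csv(cache_path)
    frame = fetch()
    # index=False avoids an extra "Unnamed: 0" column when the file is read back
    frame.to_csv(cache_path, index=False)
    return frame


# Toy stand-in for the API that records how often it is called.
calls = []

def fake_fetch():
    calls.append(1)
    return pd.DataFrame({"id": ["a", "b"], "num_fatal": [1, 2]})


cache_path = os.path.join(tempfile.mkdtemp(), "incidents_cache.csv")
first = load_or_fetch(cache_path, fake_fetch)   # fetches and writes the cache
second = load_or_fetch(cache_path, fake_fetch)  # reads the cache, no second fetch
```

Nested columns such as `avalanche_obs` are stored as text in a CSV, so after reloading they would need to be parsed back into Python objects (for example with `ast.literal_eval`) before they can be flattened.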


### Cleaning and Finalizing the scraped data

* We begin by recoding the categories in the `group_activity` column into a few broader groups to make them easier to interpret.
* Next, we flatten the observation records nested inside the `avalanche_obs` column of the `incidents` dataframe into their own dataframe, `cleaned_incidents`.
* Finally, we select only the useful columns from `incidents`, including the recoded `group_activity`, and merge them with `cleaned_incidents` to create the `ca_incidents` dataframe.
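
The flattening and merge steps can be illustrated on toy data. The sketch below assumes that `avalanche_obs` holds a list of dictionaries per incident. The toy values and the `incident_id` key are made up for illustration. Carrying the incident id into each flattened record keeps rows aligned even when an incident has zero or several observations, a case in which a purely positional merge would pair observations with the wrong incidents.

```python
from itertools import chain

import pandas as pd

# Toy incidents: the second has two observations, the third has none.
toy = pd.DataFrame({
    "id": ["a", "b", "c"],
    "num_fatal": [1, 2, 1],
    "avalanche_obs": [
        [{"size": "3.0", "type": "S"}],
        [{"size": "2.5", "type": "S"}, {"size": "1.5", "type": "L"}],
        [],
    ],
})

# Plain flattening loses track of which incident each observation belongs to.
flat = pd.DataFrame(chain.from_iterable(toy["avalanche_obs"]))

# Tagging every observation with its incident id makes the link explicit.
tagged = pd.DataFrame(
    [{"incident_id": incident_id, **obs}
     for incident_id, obs_list in zip(toy["id"], toy["avalanche_obs"])
     for obs in obs_list]
)

# Merge on the id instead of the row position; "c" gets NaN for its missing observation.
merged = toy[["id", "num_fatal"]].merge(
    tagged, left_on="id", right_on="incident_id", how="left"
)
```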


In [65]:
# clean up activity names
skiings = ['Skiing', 'Skiing/Snowboarding', 'Snowboarding', 'Backcountry Skiing', 'Ski touring', 'Heliskiing',
           'Mechanized Skiing', 'Out-of-bounds Skiing', 'Lift Skiing Closed', 'Lift Skiing Open', 'Out-of-Bounds Skiing']

mountaineering_and_climbing = ['Mountaineering', 'Snow Biking', 'Snowshoeing', 'Ice Climbing', 'Snowshoeing & Hiking']

snowmobiling = ['Snowmobiling']

non_leisure = ['Work', 'At Outdoor Worksite', 'Control Work', 'Inside Building', 'Car/Truck on Road',
               'Inside Car/Truck on Road', 'Outside Building']

other_or_unknown = ['Other Recreational', 'Hunting/Fishing', 'Unknown']

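# Labels produced by activities_can; used to keep the function safe to re-run.
cleaned_labels = ['Skiing', 'Mountaineering/Climbing', 'Snowmobiling',
                  'Non-Leisure Activities', 'Other/Unknown']

def activities_can(s):
    """
    Clean one entry of the group_activity column.
    Takes a raw activity name and returns the broader group it belongs to,
    so that we work with a few general groups that are easier to interpret.
    Labels that are already cleaned are returned unchanged, so re-running
    this cell does not turn them into "Other/Unknown".
    """
    if s in cleaned_labels:
        return s
    elif s in skiings:
        return "Skiing"
    elif s in mountaineering_and_climbing:
        return "Mountaineering/Climbing"
    elif s in snowmobiling:
        return "Snowmobiling"
    elif s in non_leisure:
        return "Non-Leisure Activities"
    else:
        return "Other/Unknown"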

incidents['group_activity'] = incidents['group_activity'].apply(activities_can)

# pd.DataFrame(chain.from_iterable(incidents.avalanche_obs)).replace(r'^s*$', float('NaN'), regex = True).dropna()
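# Flatten the lists of observation records in avalanche_obs into one dataframe.
cleaned_incidents = (pd.DataFrame(chain.from_iterable(incidents.avalanche_obs))
                     .drop(columns=['observation_date'])
                    )

# Keep only the useful incident columns and attach the avalanche observations.
# Merging on the index assumes exactly one observation per incident, in the same order.
useful_columns = ['ob_date', 'location_elevation', 'location_province', 'num_involved',
                  'num_injured', 'num_fatal', 'group_activity']
ca_incidents = (incidents[useful_columns]
                .merge(cleaned_incidents, left_index=True, right_index=True)
               )

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(ca_incidents.head())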

Unnamed: 0,ob_date,location_elevation,location_province,num_involved,num_injured,num_fatal,group_activity,size,type,trigger,aspect,elevation,slab_width,slab_thickness
0,2021-11-27,1700.0,BC,3.0,0.0,1,Snowmobiling,3.0,S,Ma,NE,1700.0,350.0,60.0
1,2021-05-30,3075.0,AB,2.0,0.0,2,Other/Unknown,2.5,S,Sa,N,3075.0,60.0,75.0
2,2021-04-05,2950.0,AB,2.0,0.0,1,Skiing,2.0,S,Sa,E,2950.0,40.0,50.0
3,2021-03-29,2170.0,BC,1.0,0.0,1,Snowmobiling,2.5,CS,Sa,E,2170.0,50.0,
4,2021-03-04,2465.0,BC,1.0,0.0,1,Snowmobiling,3.0,S,Ma,W,2465.0,125.0,85.0
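
In the output above, row 1 (2021-05-30), whose raw activity in the first table is `Mountaineering`, is listed as `Other/Unknown`. That is the result one gets when cleaned labels are passed back through a lookup that only knows the raw names, for example when a recoding cell is run twice on the same column. A minimal sketch of a dictionary-based variant that passes cleaned labels through unchanged is given below. The names `ACTIVITY_GROUPS` and `recode_activity` are illustrative, and only a few raw names are included.

```python
import pandas as pd

# Illustrative subset of the raw-name -> group lookup.
ACTIVITY_GROUPS = {
    "Skiing": "Skiing",
    "Ski touring": "Skiing",
    "Mountaineering": "Mountaineering/Climbing",
    "Ice Climbing": "Mountaineering/Climbing",
    "Snowmobiling": "Snowmobiling",
    "Work": "Non-Leisure Activities",
}
# Every label the recoding can produce, including the fallback.
CLEANED = set(ACTIVITY_GROUPS.values()) | {"Other/Unknown"}


def recode_activity(name):
    """Map a raw activity name to its group; leave cleaned labels untouched."""
    if name in CLEANED:
        return name
    return ACTIVITY_GROUPS.get(name, "Other/Unknown")


raw = pd.Series(["Mountaineering", "Ski touring", "Hunting/Fishing"])
once = raw.apply(recode_activity)
twice = once.apply(recode_activity)  # re-applying changes nothing
```

Writing the result to a new column rather than overwriting `group_activity` would be another way to keep the raw values available for re-runs.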


In [7]:
incidents.columns

Index(['id', 'ob_date', 'location', 'location_desc', 'location_coords',
       'location_coords_type', 'location_elevation', 'location_province',
       'num_involved', 'num_injured', 'num_fatal', 'comment', 'group_activity',
       'avalanche_obs', 'weather_obs', 'weather_comment', 'snowpack_obs',
       'snowpack_comment', 'documents'],
      dtype='object')

### Extracting avalanche accidents in the US

In [8]:
from bs4 import BeautifulSoup
import requests
from urllib.request import Request, urlopen

site = "https://avalanche.org/avalanche-accidents/"

# Send a browser-like User-Agent to prevent 'HTTPError: HTTP Error 403: Forbidden'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
page = urlopen(req)

# Parse the page source (naming the parser avoids a BeautifulSoup warning)
soup = BeautifulSoup(page, 'html.parser')

# The tables are embedded through an iframe; inspect it to find the page that holds them
soup.find('div', class_='content-area').iframe

# Read the tables from the iframe's source page into a list of dataframes
df = pd.read_html('https://avalanche.state.co.us/caic/acc/acc_us.php', parse_dates=True)

# Only select the useful tables
df = df[1::2]

# Clean the tables and merge them into a single dataframe representing cases in the US
def format_date_col(s, year):
    """
    Clean one entry of the Date column.
    Removes the dagger sign from the month/day string and prepends the year,
    giving a string of the form 'YYYY-MM-DD'.
    """
    month_day = s.replace('†', '').replace('/', '-')
    return str(year) + '-' + month_day

years = (2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009)
for data, yr in zip(df, years):
    data['Date'] = data['Date'].apply(format_date_col, args=(yr,))

us_incidents = pd.concat(df).reset_index(drop=True)

us_incidents

Unnamed: 0,Date,State,Location,Description,Killed
0,2021-12-17,ID,"Ryan Peak, Idaho",1 skier and 1 snowmobiler killed,2
1,2021-12-11,WA,"Silver Basin, closed portion of Crystal Mounta...",6 backcountry tourers caught and 1 killed,1
2,2020-05-13,AK,"Ruth Glacier, Denali National Park and Preserve","2 climbers caught in serac fall, 1 killed",1
3,2020-03-27,AK,Matanuska Glacier,1 heliskier killed,1
4,2020-03-22,CO,Lime Creek south of Edwards,"2 sidecountry skiers caught, 1 buried and killed",1
...,...,...,...,...,...
269,2009-01-06,CO,Battle Mountain - outside Vail Mountain ski area,"1 snowboader caught, partially buried critical...",1
270,2009-01-03,MT,"Scotch Bonnet Mountain, near Lulu Pass","1 Snowmobiler caught, buried, and killed",1
271,2009-01-02,OR,Near Paulina Peak,"1 Snowmobiler caught, buried, and killed",1
272,2009-12-17,ID,"Rock Lake, Cascade, Idaho","2 snowmobilers caught, buried, 1 rescued, 1 ki...",1
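
In the output above, `2009-01-02` is followed by `2009-12-17` in a list that otherwise runs from newest to oldest. This suggests that each table covers a winter season spanning two calendar years rather than a single year, so January dates would belong to the year after the December dates of the same table. Under that assumption (not confirmed by the source page), a season-aware variant of the date helper could look like the sketch below. The name `season_date` and the October cutoff are assumptions made for illustration.

```python
def season_date(month_day, season_start_year, season_start_month=10):
    """Build a 'YYYY-MM-DD' string from an 'MM/DD' entry of a seasonal table.

    Assumes the season starts in `season_start_month` of `season_start_year`
    and runs into the following calendar year. Both the helper name and the
    October cutoff are assumptions made for this sketch.
    """
    cleaned = month_day.replace("†", "").strip()
    month, day = (int(part) for part in cleaned.split("/"))
    # Months before the cutoff fall in the second calendar year of the season.
    year = season_start_year if month >= season_start_month else season_start_year + 1
    return f"{year:04d}-{month:02d}-{day:02d}"


december = season_date("12/17", 2009)  # stays in the season's first year
january = season_date("01/02†", 2009)  # rolls over to the following year
```

Only the year is affected by this choice. The month and day, which drive any within-season comparison, are the same under both versions.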


### Extracting avalanche accidents in Europe

In [58]:
# Make a list of urls to be read
url1 = "https://www.avalanches.org/fatalities/"
url2 = "https://www.avalanches.org/fatalities/fatalities-20/"
url3 = "https://www.avalanches.org/fatalities/fatalities-19/"
urls = [url1, url2, url3]

# Scrape the tables from each url; read_html returns a list of tables per page
df = [pd.read_html(url, parse_dates=True) for url in urls]

# Keep the first table from each page and concatenate them into a single dataframe
df = [tables[0] for tables in df]
eu_incidents = pd.concat(df)

eu_incidents

Unnamed: 0,ID,Location,Country,Date,...,Group Size,Avalanche Comment,Incident Comment,Type
0,2782,Mentet,Spain,2021-11-28 00:00:00,...,1.0,"""Destructive Avalanche Size of 2.5""","""Completely buried. Fatal result. Re-analisis ...",Mountaineering/Climbing
1,2781,"Val d\'Ayas, Gran Sommettaz",Italy,2021-11-29 12:05:00,...,,,,Off-piste skiing
2,2783,La Thuille,Italy,2021-12-07 13:09:00,...,3.0,,,Backcountry skiing
3,2785,Monte Sorbetta,Italy,2021-12-16 13:04:00,...,2.0,,,Backcountry skiing
0,1814,Großvenediger,Austria,2020-10-10 00:00:00,...,1.0,,,Mountaineering/Climbing
...,...,...,...,...,...,...,...,...,...
35,149,Mont Brûlé,Switzerland,2020-05-08 12:00:00,...,,,,
36,87,Tofana di Rozes - rifugio Giussani,Italy,2020-05-09 09:30:00,...,2.0,,,Off-piste skiing
37,108,Pizzo del Diavolo/Canalone della Malgina,Italy,2020-05-12 10:15:00,...,1.0,,,Off-piste skiing
38,180,Gråfonnfjellet,Norway,2020-05-24 12:00:00,...,3.0,,,Backcountry skiing
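
The output above shows the index labels restarting at 0 for each season, because `pd.concat` keeps each table's own index by default. A small sketch on toy frames (the values, the `Fatalities` column and the season labels are made up) shows two standard options: `ignore_index=True` for a fresh running index, or `keys=` to record which table each row came from.

```python
import pandas as pd

season_a = pd.DataFrame({"Country": ["Spain", "Italy"], "Fatalities": [1, 1]})
season_b = pd.DataFrame({"Country": ["Austria"], "Fatalities": [2]})

# Default: labels 0, 1, 0 (with a duplicate), so .loc[0] returns two rows.
default = pd.concat([season_a, season_b])

# Option 1: discard the old labels and number the rows 0..n-1.
renumbered = pd.concat([season_a, season_b], ignore_index=True)

# Option 2: keep the origin of each row as an outer index level,
# then move that level into an ordinary column.
labelled = (
    pd.concat([season_a, season_b], keys=["2021-22", "2020-21"], names=["season", "row"])
    .reset_index(level="season")
)
```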
