# Avalanche Data Scraper

I've wanted to assemble some of the data from CAIC for a while now, and I decided to use the Covid-19 situation of 2020 as an excuse to get going on this.

I am mostly concerned with what the last few years have looked like for Colorado in terms of avalanche danger and how that has corresponded to injuries and/or fatalities. CAIC does a really good job reporting on avalanche conditions, but unfortunately it is hard to find data from past reports assembled in one easy-to-find location. I built a little scraper to pull the data from CAIC so that I can assemble it somewhere and play around with it a little.

For anyone unfamiliar with CAIC, this is the Colorado Avalanche Information Center. They do the avalanche research and reports in Colorado, and their data is used by CDOT and backcountry explorers. https://avalanche.state.co.us/

The forecasts are split into 10 zones, identified by zone_id.

    zone_0 = Steamboat & Flat Tops
    zone_1 = Front Range
    zone_2 = Vail & Summit County
    zone_3 = Sawatch
    zone_4 = Aspen
    zone_5 = Gunnison
    zone_6 = Grand Mesa
    zone_7 = North San Juan
    zone_8 = South San Juan
    zone_9 = Sangre de Cristo
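
For use in code, the zone list above can be kept as a small lookup table. This is only a sketch: the names `ZONE_NAMES` and `zone_name` are made up for illustration and are not taken from **helpers.py**.

```python
# Hypothetical lookup table; the ids and names mirror the zone list above.
ZONE_NAMES = {
    0: "Steamboat & Flat Tops",
    1: "Front Range",
    2: "Vail & Summit County",
    3: "Sawatch",
    4: "Aspen",
    5: "Gunnison",
    6: "Grand Mesa",
    7: "North San Juan",
    8: "South San Juan",
    9: "Sangre de Cristo",
}


def zone_name(zone_id: int) -> str:
    """Return the readable name for a zone id (raises KeyError if unknown)."""
    return ZONE_NAMES[zone_id]


print(zone_name(1))  # Front Range
```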

## The scraper

The scraper I wrote works by iterating through the archives on CAIC's website and pulling the data of interest. If you take a look at the **helpers.py** file you can see a function called **getAviData()**. The function takes a zone id as an argument and then gathers all the avi danger data available from the CAIC archives. The data is saved as a csv named in the format **"[YYYY-MM-DD]_[zone_id]_aviDanger.csv"**. For example, if I pulled the data from zone 0 on September 10th, 2020, the file would be saved as **2020-09-10_zone_0_aviDanger.csv**. After the data is pulled it is saved in my personal archive, a folder named **data**.
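
To make the naming convention concrete, here is a minimal sketch of how such a filename could be built. The helper `avi_danger_filename` is hypothetical and is not the code inside **getAviData()**; it only reproduces the example name given above.

```python
from datetime import date


def avi_danger_filename(zone_id: int, pulled_on: date) -> str:
    """Build a name like 2020-09-10_zone_0_aviDanger.csv (hypothetical helper)."""
    # date.isoformat() gives the YYYY-MM-DD part of the name
    return f"{pulled_on.isoformat()}_zone_{zone_id}_aviDanger.csv"


print(avi_danger_filename(0, date(2020, 9, 10)))  # 2020-09-10_zone_0_aviDanger.csv
```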

## The Avi Danger Data

The data I pulled from CAIC is the danger rating given each day. CAIC avalanche forecasters assess the danger every day during the avi season. The danger is rated at three different altitudes, each with a rating between 0 and 5.

        0 = No Rating
        1 = Low
        2 = Moderate
        3 = Considerable
        4 = High
        5 = Extreme
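
The numeric scale above maps naturally onto an enum, which makes the scraped numbers easier to read. This is a sketch only: the class `DangerRating` is an illustrative name, and the sample row assumes the column names `danger_below`, `danger_near` and `danger_above` that appear in the csv files.

```python
from enum import IntEnum


class DangerRating(IntEnum):
    """Danger scale as listed above (illustrative, not part of the scraper)."""
    NO_RATING = 0
    LOW = 1
    MODERATE = 2
    CONSIDERABLE = 3
    HIGH = 4
    EXTREME = 5


# One day of ratings, using the assumed csv column names
row = {"date": "2013-12-23", "danger_below": 2, "danger_near": 3, "danger_above": 3}

# Translate each numeric rating into its label
labels = {
    key: DangerRating(value).name
    for key, value in row.items()
    if key.startswith("danger_")
}
print(labels)
```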

Here's an example of a forecast:
![](./images/avi_forecast_example.png)

After I pulled the data this is what a csv file looks like:
![](./images/data_frame_aviDanger_example.png)

# Connecting Burials to Avalanche Data

Now that I can gather and save avalanche data, I am ready to gather data on avalanche accidents in the backcountry.

I have saved all the accident data from the past seven years in a csv file. The data is organized as follows:
          
          date, zone_id, activity, number caught, number buried, number killed
          
Activities are abbreviated:

    sm = snowmobiling
    sk = skiing
    sb = snowboarding
    sc = snow cat
    ss = snow shoe
    ft = on foot (such as in climbing)
    ** = other, such as shoveling off one's roof
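
As a rough illustration of the record layout and the abbreviations above, one accident line could be parsed like this. The `Accident` class and `parse_accident` function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from datetime import date

# Activity abbreviations as listed above
ACTIVITY_NAMES = {
    "sm": "snowmobiling",
    "sk": "skiing",
    "sb": "snowboarding",
    "sc": "snow cat",
    "ss": "snow shoe",
    "ft": "on foot",
    "**": "other",
}


@dataclass
class Accident:
    day: date
    zone_id: int
    activity: str
    caught: int
    buried: int
    killed: int


def parse_accident(line: str) -> Accident:
    """Parse 'date, zone_id, activity, caught, buried, killed' into a record."""
    day, zone_id, code, caught, buried, killed = [p.strip() for p in line.split(",")]
    return Accident(
        day=date.fromisoformat(day),
        zone_id=int(zone_id),
        # Unknown codes are kept as-is rather than guessed
        activity=ACTIVITY_NAMES.get(code, code),
        caught=int(caught),
        buried=int(buried),
        killed=int(killed),
    )


accident = parse_accident("2017-02-14, 0, sm, 2, 1, 1")
print(accident.activity, accident.caught, accident.killed)
```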

## Zone 0: Steamboat & Flat Tops

In [1]:

    import pandas as pd
    import numpy as np
    %matplotlib inline

In [2]:

    # read_csv already returns a DataFrame, so no extra wrapping is needed
    df_example = pd.read_csv("./2020-09-11_zone_0_aviDanger.csv")

In [5]:

    df_example.head(10)

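| Unnamed: 0 | date | danger_below | danger_near | danger_above |
| --- | --- | --- | --- | --- |
| 0 | 2013-12-31 | 2 | 2 | 2 |
| 1 | 2013-12-30 | 2 | 2 | 2 |
| 2 | 2013-12-29 | 2 | 2 | 2 |
| 3 | 2013-12-28 | 2 | 2 | 2 |
| 4 | 2013-12-27 | 2 | 2 | 2 |
| 5 | 2013-12-26 | 2 | 2 | 2 |
| 6 | 2013-12-25 | 2 | 2 | 3 |
| 7 | 2013-12-24 | 2 | 2 | 3 |
| 8 | 2013-12-23 | 2 | 3 | 3 |
| 9 | 2014-12-31 | 2 | 2 | 3 |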


In [4]:

    # Avi danger history for zone 0 (Steamboat & Flat Tops)
    df_Steamboat = pd.read_csv("./zone_0_aviDanger.csv")

In [5]:

    # Accident records for all zones
    df_accidents = pd.read_csv("./accidents_.csv")

In [6]:

    df_accidents.head(100)

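| Unnamed: 0 | date | zone_id | activity | caught | buried | killed |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 2017-02-14 | 0 | sm | 2 | 1 | 1 |
| 1 | 2017-01-12 | 0 | sk | 1 | 0 | 0 |
| 2 | 2019-12-15 | 0 | sb | 2 | 1 | 0 |
| 3 | 2016-12-11 | 0 | sm | 1 | 2 | 0 |
| 4 | 2013-12-31 | 0 | sb | 1 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 89 | 2020-01-18 | 7 | ft | 1 | 1 | 1 |
| 90 | 2018-02-20 | 7 | sb | 2 | 1 | 0 |
| 91 | 2014-03-04 | 8 | sk | 1 | 1 | 1 |
| 92 | 2016-02-02 | 8 | sm | 2 | 1 | 1 |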
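
With both tables available, one way to connect accidents to the danger rating is to match rows on the date within a single zone. The sketch below uses only the standard library and a few rows copied from the tables above instead of the real csv files. The function name `join_on_date` is hypothetical, and this is one possible approach, not necessarily how the analysis continues.

```python
import csv
import io

# Small excerpts copied from the tables above; real use would open the csv files.
DANGER_CSV = """date,danger_below,danger_near,danger_above
2013-12-31,2,2,2
2013-12-30,2,2,2
"""

ACCIDENTS_CSV = """date,zone_id,activity,caught,buried,killed
2013-12-31,0,sb,1,1,1
2017-02-14,0,sm,2,1,1
2020-01-18,7,ft,1,1,1
"""


def join_on_date(danger_text: str, accidents_text: str, zone_id: int) -> list:
    """Attach the day's danger ratings to each accident in the given zone."""
    # Index the danger rows by date for quick lookup
    danger_by_date = {
        row["date"]: row for row in csv.DictReader(io.StringIO(danger_text))
    }
    joined = []
    for accident in csv.DictReader(io.StringIO(accidents_text)):
        if int(accident["zone_id"]) != zone_id:
            continue  # accident happened in another zone
        danger = danger_by_date.get(accident["date"])
        if danger is None:
            continue  # no danger rating on file for that day
        joined.append({**accident, **danger})
    return joined


rows = join_on_date(DANGER_CSV, ACCIDENTS_CSV, zone_id=0)
for row in rows:
    print(row["date"], row["activity"], row["danger_above"])
```

With the data frames loaded above, the same idea could also be expressed as a pandas merge on the `date` column after filtering `df_accidents` to one zone.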
