# Avalanche Data Scraper

I have wanted to assemble some of the data from CAIC for a while now, and I decided to use the Covid-19 situation of 2020 as an excuse to get going, now that I have more free time.

I am mostly concerned with what the last few years have looked like for Colorado in terms of avalanche danger and how that has corresponded to injuries and/or fatalities. CAIC does a really good job reporting on avalanche conditions, but unfortunately it is hard to find data from past reports assembled in one easy-to-find location. I built a little scraper to pull the data from CAIC so that I can assemble it somewhere and play around with it a little.

For anyone unfamiliar with CAIC, this is the Colorado Avalanche Information Center. They do the avalanche research and reports in Colorado, and their data is used by CDOT and backcountry explorers. https://avalanche.state.co.us/

The forecasts are split into 10 zones, identified by zone_id.

    zone_0 = Steamboat & Flat Tops
    zone_1 = Front Range
    zone_2 = Vail & Summit County
    zone_3 = Sawatch
    zone_4 = Aspen
    zone_5 = Gunnison
    zone_6 = Grand Mesa
    zone_7 = North San Juan
    zone_8 = South San Juan
    zone_9 = Sangre de Cristo
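The zone list above can be kept in code as a small lookup table, which makes it easy to label output files and plots later. This is only a sketch: the names `ZONE_NAMES` and `zone_csv_name` are made up for illustration, and the file name pattern simply mirrors the one the scraper uses.

```python
# Hypothetical lookup table built from the zone list above.
ZONE_NAMES = {
    0: "Steamboat & Flat Tops",
    1: "Front Range",
    2: "Vail & Summit County",
    3: "Sawatch",
    4: "Aspen",
    5: "Gunnison",
    6: "Grand Mesa",
    7: "North San Juan",
    8: "South San Juan",
    9: "Sangre de Cristo",
}


def zone_csv_name(zone_id):
    """Return the csv file name the scraper writes for a zone (hypothetical helper)."""
    if zone_id not in ZONE_NAMES:
        raise ValueError("unknown zone_id: " + str(zone_id))
    return "zone_" + str(zone_id) + "_aviDanger.csv"


if __name__ == "__main__":
    for zone_id, name in ZONE_NAMES.items():
        print(zone_id, name, zone_csv_name(zone_id))
```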

## Requirements

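To run this, install requests to fetch pages from CAIC's website and lxml to parse the HTML. The code also imports three functions from a local helpers module.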

In [6]:
import requests
import lxml.html as lh
from helpers import string_to_datetime, find_dates, danger_to_int
import time

# Build the forecast url and the output csv name for the given zone_id
def getAviData(zone_id):
    url = 'https://avalanche.state.co.us/caic/pub_bc_avo.php?zone_id=' + str(zone_id)
    csv_file = "zone_" + str(zone_id) + "_aviDanger" + ".csv"
    # Start a fresh file with the header row; "w" keeps re-runs from appending a second header
    with open(csv_file, "w") as file:
        file.write("date,danger_below,danger_near,danger_above" + "\n")

    # This code retrieves all the report_ids that have been archived
    response = requests.get(url)
    tree = lh.document_fromstring(response.content)
    value_script = tree.xpath("//div[@id='avalanche-forecast']/div[8]/ul/li[1]/script/text()")
    reports = find_dates(value_script)

    # Iterate over dates using nested loop year 2013 through 2020
    # This can be adjusted if just looking for specific dates. The current data goes back to 2013 though
    # This code builds the requests we are going to post
    # I had to submit a request for each report so that I could load the page associated with the report and then
    # parse the html from each page
    years = ['2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']

    for year in years:
        for day in reports[year]['values']:
            form_data = {
                    '_qf__arc_form' : '',
                    'arc_bc_avo_fx_sel[0]': year,
                    'arc_bc_avo_fx_sel[1]': day,
            }

            response = requests.post(url, data=form_data)

            # Occasionally I would overload CAIC's server and get kicked off, so this bit of code 
            # gives their page a quick break if the page does not load

            # if the request does not yield a 200 status code, wait 1 minute and try again
            # this should keep the server from getting overwhelmed
            while response.status_code != 200:
                time.sleep(60)
                response = requests.post(url, data=form_data)

            # Sweet! Now we can start parsing the html
            # GATHER DATA:

            tree = lh.document_fromstring(response.content)

            # Here we snag the date and time of the reports and use a "string_to_datetime" helper function to get 
            # the data in a form that will be more useful later. I used a sql friendly format in case I wanted to 
            # save my data in a personal database
            date_time = string_to_datetime(tree.xpath("//div[@id='avalanche-forecast']/table[1]/thead/tr/td[1]/h2/text()")[0])

            # Snag the avi_danger ratings for above, near, and below treeline,
            # each converted to an integer by danger_to_int
            danger_above = danger_to_int(tree.xpath("//div[@id='avalanche-forecast']/table[1]/tbody/tr[1]/td[3]/strong/text()")[0])
            danger_near = danger_to_int(tree.xpath("//div[@id='avalanche-forecast']/table[1]/tbody/tr[1]/td[5]/strong/text()")[0])
            danger_below = danger_to_int(tree.xpath("//div[@id='avalanche-forecast']/table[1]/tbody/tr[3]/td[3]/strong/text()")[0])

            # append one csv row per report in the format 2013-12-23 16:23:00,2,3,3 (date, below, near, above)
            with open(csv_file, "a") as file:
                file.write(str(date_time)+ "," + str(danger_below) + "," + str(danger_near) + "," + str(danger_above) + "\n")
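The retry loop in the function above waits a minute and tries again for as long as the server keeps returning a non-200 status, so a permanently failing request would never finish. One possible variation, sketched here under my own assumptions, caps the number of attempts. The name `post_with_retry` and its parameters are hypothetical, and the `post` and `sleep` arguments exist only so the function can be exercised without hitting CAIC's server.

```python
import time


def post_with_retry(url, form_data, max_attempts=5, wait_seconds=60,
                    post=None, sleep=time.sleep):
    """POST form_data to url, retrying on non-200 responses (hypothetical helper).

    Gives up with RuntimeError after max_attempts tries.
    """
    if post is None:
        # Imported lazily so the helper can be tested without requests installed.
        import requests
        post = requests.post

    for attempt in range(1, max_attempts + 1):
        response = post(url, data=form_data)
        if response.status_code == 200:
            return response
        # Give the server a break before the next try, but not after the last one.
        if attempt < max_attempts:
            sleep(wait_seconds)

    raise RuntimeError("no 200 response after " + str(max_attempts) + " attempts")
```

Inside the scraper this could stand in for the `requests.post` call and the `while` loop that follows it, for example `response = post_with_retry(url, form_data)`.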

In [8]:
getAviData(6)
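The `helpers` module imported by the scraper is not shown here. As a rough idea of what `danger_to_int` might do, here is a minimal sketch that maps the five levels of the avalanche danger scale (Low, Moderate, Considerable, High, Extreme) to the integers 1 through 5. It assumes the text pulled from the forecast table contains the danger word somewhere in it, and it returns 0 for anything it does not recognize; both of those choices are my assumptions, not necessarily what the real helper does.

```python
import re

# Danger scale words mapped to integers 1-5.
DANGER_LEVELS = {
    "low": 1,
    "moderate": 2,
    "considerable": 3,
    "high": 4,
    "extreme": 5,
}

# Match the danger word as a whole word so "below" does not count as "low".
_DANGER_PATTERN = re.compile(r"\b(low|moderate|considerable|high|extreme)\b")


def danger_to_int(text):
    """Convert a danger rating string to an int; 0 means no recognizable rating.

    Hypothetical stand-in for the helper of the same name.
    """
    match = _DANGER_PATTERN.search(text.strip().lower())
    if match is None:
        return 0
    return DANGER_LEVELS[match.group(1)]
```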

# Connecting Burials to Avalanche Data

Now that I can gather and save avalanche data, I am ready to gather data on avalanche accidents in the backcountry.

I have saved all the accident data for the past seven years in a csv file. The data is organized as follows:
          
          date, zone_id, activity, number caught, number buried, number killed
          
Activities are abbreviated:

    sm = snowmobiling
    sk = skiing
    sb = snowboarding
    sc = snow cat
    ss = snow shoe
    ft = on foot (such as in climbing)
    ** = other, such as shoveling off one's roof
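To make the accident rows easier to work with, the activity codes can be expanded and the counts converted to integers while reading the file. The sketch below assumes the header row uses the column names `date`, `zone_id`, `activity`, `caught`, `buried`, and `killed`; the names `ACTIVITY_NAMES` and `parse_accidents` are hypothetical. It reads from a string so it can run without the csv file on disk.

```python
import csv
import io

# Activity abbreviations from the list above.
ACTIVITY_NAMES = {
    "sm": "snowmobiling",
    "sk": "skiing",
    "sb": "snowboarding",
    "sc": "snow cat",
    "ss": "snow shoe",
    "ft": "on foot",
    "**": "other",
}


def parse_accidents(csv_text):
    """Parse accident csv text into a list of dicts with typed fields."""
    # Drop a leading byte-order mark so the first column is plain 'date'.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    records = []
    for row in reader:
        records.append({
            "date": row["date"],
            "zone_id": int(row["zone_id"]),
            # Fall back to the raw code if it is not in the table.
            "activity": ACTIVITY_NAMES.get(row["activity"], row["activity"]),
            "caught": int(row["caught"]),
            "buried": int(row["buried"]),
            "killed": int(row["killed"]),
        })
    return records


if __name__ == "__main__":
    sample = (
        "date,zone_id,activity,caught,buried,killed\n"
        "2017-02-14,0,sm,2,1,1\n"
        "2017-01-12,0,sk,1,0,0\n"
    )
    for record in parse_accidents(sample):
        print(record)
```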

In [40]:
import csv
# utf-8-sig strips the byte-order mark, so the first column is plain 'date'
with open("accidents_.csv", newline="", encoding="utf-8-sig") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print("date: " + row['date'],
              "zone id: " + row['zone_id'],
              "activity: " + row['activity'],
              "caught: " + row['caught'],
              "buried: " + row['buried'],
              "killed: " + row['killed'])

date: 2017-02-14 zone id: 0 activity: sm caught: 2 buried: 1 killed: 1
date: 2017-01-12 zone id: 0 activity: sk caught: 1 buried: 0 killed: 0
date: 2019-12-15 zone id: 0 activity: sb caught: 2 buried: 1 killed: 0
date: 2016-12-11 zone id: 0 activity: sm caught: 1 buried: 2 killed: 0
date: 2013-12-31 zone id: 0 activity: sb caught: 1 buried: 1 killed: 1
date: 2019-12-08 zone id: 0 activity: sk caught: 1 buried: 1 killed: 1
date: 2020-01-22 zone id: 1 activity: sk caught: 1 buried: 0 killed: 0
date: 2018-10-15 zone id: 1 activity: ft caught: 1 buried: 0 killed: 0
date: 2013-11-21 zone id: 1 activity: ft caught: 1 buried: 0 killed: 0
date: 2020-01-19 zone id: 1 activity: ft caught: 1 buried: 0 killed: 0
date: 2013-11-24 zone id: 1 activity: ft caught: 1 buried: 0 killed: 0
date: 2018-02-11 zone id: 1 activity: sk caught: 1 buried: 0 killed: 0
date: 2016-02-21 zone id: 1 activity: sb caught: 1 buried: 0 killed: 0
date: 2018-12-18 zone id: 1 activity: sk caught: 1 buried: 0 ki

## Zone 1: Steamboat & Flat Tops