## Collecting Avalanche Forecasts

Avalanche forecasts are quantized risk factors determined by the avalanche forecast professionals of the UAC. The risk factors range from 1 to 5 in severity: Low (1, Green), Moderate (2, Yellow), Considerable (3, Orange), High (4, Red), and Extreme (5, Black). They are calculated by weighing various avalanche problems, such as large storms, which lead to storm slabs; persistent weak layers from weather variations; and sudden warming, which leads to wet slides. Each avalanche problem posted includes the following information:

- Type (Rising Temps, New Snow, Depth Hoar)
- Location (Aspect and elevation in the form of a compass rose)
- Likelihood (5-point scale from Unlikely (1) to Likely (3) to Certain (5))
- Size (5-point scale from Small (1) to Medium (3) to Large (5))
- Description
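
As a rough sketch of how the rating scale and the problem fields listed above could be held in code, the snippet below uses an enum and a dataclass. The names `DangerRating`, `DANGER_COLORS`, and `AvalancheProblem` are hypothetical, and storing the location as a list of aspect/elevation labels is an assumption made for illustration; the UAC pages may encode the compass rose differently.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class DangerRating(IntEnum):
    """The 1-5 severity scale described above (hypothetical name)."""
    LOW = 1
    MODERATE = 2
    CONSIDERABLE = 3
    HIGH = 4
    EXTREME = 5


# Colors paired with each rating, as listed in the text
DANGER_COLORS = {
    DangerRating.LOW: "Green",
    DangerRating.MODERATE: "Yellow",
    DangerRating.CONSIDERABLE: "Orange",
    DangerRating.HIGH: "Red",
    DangerRating.EXTREME: "Black",
}


@dataclass
class AvalancheProblem:
    """One avalanche problem from a forecast (hypothetical container)."""
    problem_type: str      # e.g. "New Snow"
    location: List[str]    # assumed format: aspect/elevation labels
    likelihood: int        # 1 (Unlikely) to 5 (Certain)
    size: int              # 1 (Small) to 5 (Large)
    description: str = ""

    def __post_init__(self):
        # Both scales are 5-point, so reject anything outside 1..5
        for name in ("likelihood", "size"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


# Made-up example values, not taken from a real forecast
storm = AvalancheProblem("New Snow", ["N upper", "NE upper"], likelihood=3, size=2)
```

Using `IntEnum` keeps the ratings comparable as plain integers, which may be convenient if the ratings are later used as numeric targets.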

This forecast format is the primary one used by avalanche centers throughout the States, and it would be the most familiar way to present the findings and assessments we hope to generate from a given day's observations. To pick it apart, we first need to figure out how to scrape the Forecast page and the archived forecast page, which has forecasts from across the state starting in 2001 (https://utahavalanchecenter.org/archives/forecasts). The current forecasts are also displayed there.

### Forecast List

Using the archive link (https://utahavalanchecenter.org/archives/forecasts) we can get forecasts from the present day back to 2018.

In [None]:
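import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

avy_center_url = 'http://utahavalanchecenter.org'
forecast_archive_url = avy_center_url + '/archives/forecasts'
page = requests.get(forecast_archive_url)
soup = BeautifulSoup(page.content, 'html.parser')
tbl = soup.find("table")  # first table on the page
# extract_links='all' turns every cell into a (text, href) tuple;
# StringIO avoids the deprecated literal-HTML input in newer pandas
page_forecasts = pd.read_html(StringIO(str(tbl)), extract_links='all')[0]
page_forecasts.head()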

This is practically the same as collecting observation data, except that we substitute a different URL. We had get_table_obs() before, so let's make the equivalent helper here, get_page_table().

In [None]:
from io import StringIO

def get_page_table(url):
    '''Returns a dataframe of avalanche observations from url. Data in df
    includes Date, Region, Avalanche/Observation, (url) extension, and
    observer'''
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    tbl = soup.find("table")
    # wrap the HTML in StringIO; newer pandas deprecates literal HTML strings
    page_obs = pd.read_html(StringIO(str(tbl)), extract_links='all')[0]
    page_obs = clean_page_obs(page_obs)
    return page_obs

def clean_page_obs(page_obs):
    '''Cleans up the html parser's interpretation of the observation list'''
    old_columns = page_obs.columns
    # each cell is a (text, href) tuple; split it into two named columns
    page_obs[['Date', 'a']] = pd.DataFrame(page_obs[old_columns[0]].tolist(), index=page_obs.index)
    page_obs[['Region', 'b']] = pd.DataFrame(page_obs[old_columns[1]].tolist(), index=page_obs.index)
    page_obs[['Observation Title', 'extension']] = pd.DataFrame(page_obs[old_columns[2]].tolist(), index=page_obs.index)
    page_obs[['Observer', 'd']] = pd.DataFrame(page_obs[old_columns[3]].tolist(), index=page_obs.index)
    # remove the old tuple columns and the link columns that only hold None
    page_obs = page_obs.drop(old_columns, axis=1)
    page_obs = page_obs.drop(['a', 'b', 'd'], axis=1)
    return page_obs

page_obs = get_page_table(observations_url)
page_obs.head()
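
The cleaning step above hard-codes the four observation columns, and the forecast table may not share that layout. A more general splitter could look like the sketch below. The helper name `split_link_cells`, the generated `<name> url` column names, and the sample rows (including their link paths) are all made up for illustration; only the (text, href) tuple shape comes from `extract_links='all'`.

```python
import pandas as pd
from urllib.parse import urljoin


def split_link_cells(df, names):
    """Split (text, href) tuple cells into text columns, keeping a
    '<name> url' column only where at least one link is present."""
    out = pd.DataFrame(index=df.index)
    for i, new_name in enumerate(names):
        # Select by position, since header labels are tuples as well
        cells = pd.DataFrame(df.iloc[:, i].tolist(), index=df.index,
                             columns=["text", "href"])
        out[new_name] = cells["text"]
        if cells["href"].notna().any():
            out[new_name + " url"] = cells["href"]
    return out


# Made-up rows in the shape produced by extract_links='all'
sample = pd.DataFrame({
    "c0": [("1/1/2024", None), ("1/2/2024", None)],
    "c1": [("Forecast: Salt Lake", "/forecast/salt-lake/1/1/2024"),
           ("Forecast: Logan", "/forecast/logan/1/2/2024")],
})

forecasts = split_link_cells(sample, ["Date", "Forecast"])

# Turn each relative extension into a full URL on the avalanche center site
base_url = "http://utahavalanchecenter.org"
forecasts["Forecast url"] = forecasts["Forecast url"].map(
    lambda ext: urljoin(base_url, ext))
```

Because the new column names are passed in, the same function could serve both the observation list and the forecast list once the real forecast columns have been inspected with `page_forecasts.head()`.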