# Part II: Webscraping

# Read me

This is part II of our Webscraping project:

Planning winter hikes in Switzerland can be complicated. Hiking in the snow can be an exciting adventure, but unexpected snow can also be an unwelcome surprise. There is no shortage of trail databases that list how long or how difficult a trail is, but finding up-to-date snowfall information for those trails can be tedious. This project addresses the snowline problem by scraping trail GPX data from bergfex.com and snow depth data from meteocentrale.ch; this notebook covers the meteocentrale.ch part.

Weather data is obtained from weather towers across Switzerland that publish their current measurements. Because each tower sits at a fixed latitude and longitude, we approximate the chance of snow on a trail from its distance to the tower and its altitude: if a trail lies within a certain distance of a tower that has reported snow, and is also above a certain altitude, it is tagged with a snow alert. This is implemented with a nearest-neighbor search, which finds the hiking trails closest to a tower that has reported snow.
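
The tagging rule described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the thresholds `MAX_DISTANCE_KM` and `MIN_ALTITUDE_M`, the dictionary layout, and the coordinates are all assumptions made for the example, and a brute-force search stands in for whatever nearest-neighbor routine the later notebook uses.

```python
import math

# Hypothetical thresholds; the notebook does not state the real values.
MAX_DISTANCE_KM = 10.0
MIN_ALTITUDE_M = 1500.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def snow_alert(trail, towers, max_km=MAX_DISTANCE_KM, min_alt=MIN_ALTITUDE_M):
    """True if the nearest snow-reporting tower is close enough and the trail is high enough."""
    snowy = [t for t in towers if t["snow_cm"] > 0]
    if not snowy:
        return False
    # Brute-force nearest neighbor: smallest distance over all snow-reporting towers.
    nearest_km = min(
        haversine_km(trail["lat"], trail["lon"], t["lat"], t["lon"]) for t in snowy
    )
    return nearest_km <= max_km and trail["alt_m"] >= min_alt


# Rough, illustrative coordinates only.
towers = [
    {"name": "Grimsel-Hospiz", "lat": 46.57, "lon": 8.33, "snow_cm": 238},
    {"name": "Zürich-Affoltern", "lat": 47.43, "lon": 8.52, "snow_cm": 0},
]
high_trail = {"lat": 46.56, "lon": 8.34, "alt_m": 2000}  # hypothetical trail
low_trail = {"lat": 47.40, "lon": 8.50, "alt_m": 500}    # hypothetical trail

print(snow_alert(high_trail, towers))
print(snow_alert(low_trail, towers))
```

For a handful of towers and trails the brute-force search is enough; with many points, a spatial index would be the more natural choice.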

## 1 Prerequisites: Installs and Imports

In [1]:
# uncomment and run the line below once
#conda install -c anaconda beautifulsoup4

In [2]:
import requests
import re
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

## 2 Data Scraping: snow information

In [3]:
### initialize the future columns of our dataframe as empty lists

snowlevel = []  # height in cm
location = []  # town and elevation of town

### scrape a single page (no need to loop over several pages, since there are only a few data points)
link = 'http://www.meteocentrale.ch/de/wetter/hitlisten/schneehoehen.html'
page = requests.get(link, timeout=10)
print(page.status_code)
soup = BeautifulSoup(page.content, "html.parser")  # bs4.BeautifulSoup object
hitlist = soup.find_all('table', {'class': 'hitlist'})  # bs4.element.ResultSet

200


In [4]:
#soup

In [5]:
#hitlist

## 3 Filtering for the necessary information
We extract the village name, its elevation, and the snow level.

In [6]:
### get location
location_item = hitlist[0].find_all('a')
location.append([info.get_text().strip() for info in location_item])
loc = location[0]
#len(loc)
#loc

### get snowlevel
snowlevel_item = hitlist[0].find_all('td', {'class': 'value'})
snowlevel.append([info.get_text().strip() for info in snowlevel_item])
#len(snowlevel)
snow = snowlevel[0]
#len(snow)
#snow

### combine into DF
heights_df = pd.DataFrame({'location': loc,'snowlevel': snow})
heights_df

Unnamed: 0,location,snowlevel
0,"Grimsel-Hospiz, 1980 m",238 cm
1,"Glattalp, 1858 m",215 cm
2,"Melchsee-Frutt, 1940 m",170 cm
3,"Maloja/Maloggia, 1799 m",132 cm
4,"Les Attelas, 2733 m",131 cm
...,...,...
109,"Wassen, 920 m",0 cm
110,"Wasserauen, 868 m",0 cm
111,"Zollikofen, 553 m",0 cm
112,"Zürich-Affoltern, 443 m",0 cm
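
The cell above collects locations and snow levels in two separate passes, which works as long as both lists end up the same length. As a hedged alternative, the sketch below walks the table row by row so that each location stays paired with its value. The HTML fragment is hypothetical: it is shaped only to match the selectors used above (`table.hitlist`, `a`, `td.value`), and the live page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the selectors used in this notebook.
html = """
<table class="hitlist">
  <tr><th>Ort</th><th>Wert</th></tr>
  <tr><td><a href="#">Grimsel-Hospiz, 1980 m</a></td><td class="value">238 cm</td></tr>
  <tr><td><a href="#">Glattalp, 1858 m</a></td><td class="value">215 cm</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "hitlist"})

rows = []
for tr in table.find_all("tr"):
    link = tr.find("a")
    value = tr.find("td", {"class": "value"})
    if link is None or value is None:
        continue  # skip header rows or rows missing either cell
    rows.append((link.get_text(strip=True), value.get_text(strip=True)))

print(rows)
```

The resulting list of `(location, snowlevel)` tuples can be passed to `pd.DataFrame(rows, columns=['location', 'snowlevel'])` to obtain the same two-column frame.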


## 4 Data Manipulation

In [7]:
## remove the ' cm' unit and convert to int
heights_df['snowlevel_in_cm'] = heights_df['snowlevel'].str.replace(' cm', '', regex=False)
heights_df['snowlevel_in_cm'] = heights_df['snowlevel_in_cm'].astype(int)
#type(heights_df['snowlevel_in_cm'][0])
#heights_df

## split location into 'village' and 'elevation of village' and merge with previous DF
## (split once, at the last comma, so the result always has exactly two columns)
split_loc = heights_df['location'].str.rsplit(',', n=1, expand=True)
merged_df = pd.merge(heights_df, split_loc, left_index=True, right_index=True)
#merged_df

## drop unused columns, rename final columns, remove unit, sort columns
intermediate_df = merged_df.iloc[:,[2,3,4]].copy()
intermediate_df.columns = ['snowlevel_in_cm', 'location', 'height_in_m']
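## strip the ' m' unit and the leading space left by the split, then convert to int
intermediate_df['height_in_m'] = intermediate_df['height_in_m'].str.replace(' m', '', regex=False).str.strip().astype(int)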
final_df = pd.DataFrame(intermediate_df, columns = ['location', 'height_in_m', 'snowlevel_in_cm'])
final_df

Unnamed: 0,location,height_in_m,snowlevel_in_cm
0,Grimsel-Hospiz,1980,238
1,Glattalp,1858,215
2,Melchsee-Frutt,1940,170
3,Maloja/Maloggia,1799,132
4,Les Attelas,2733,131
...,...,...,...
109,Wassen,920,0
110,Wasserauen,868,0
111,Zollikofen,553,0
112,Zürich-Affoltern,443,0
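
As a compact alternative to the split, merge, and rename steps above, the same three columns can be derived with regular expressions. This is a sketch under the assumption that every location string has the form `"<village>, <elevation> m"` and every snow level the form `"<number> cm"`, as in the output shown earlier; the sample rows are copied from that output.

```python
import pandas as pd

raw = pd.DataFrame({
    "location": ["Grimsel-Hospiz, 1980 m", "Maloja/Maloggia, 1799 m", "Zürich-Affoltern, 443 m"],
    "snowlevel": ["238 cm", "132 cm", "0 cm"],
})

# Named groups become column names; the elevation is the digits before the trailing ' m'.
parts = raw["location"].str.extract(r"^(?P<location>.+),\s*(?P<height_in_m>\d+)\s*m$")

clean = parts.assign(
    height_in_m=parts["height_in_m"].astype(int),
    snowlevel_in_cm=raw["snowlevel"].str.extract(r"(\d+)", expand=False).astype(int),
)

print(clean)
```

Rows that do not match the pattern come back as missing values, so the `astype(int)` call would fail on them, which makes unexpected page formats visible early instead of producing wrong numbers silently.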


In [8]:
### Export to CSV

# change file_path to a location on your own machine
file_path = '/Users/sd/Documents/0-Coding/3-Propulsion/GitHub_Projects/1_Webscraping_Bergfex/snow_level.csv'
final_df.to_csv(file_path, index = False)
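
To avoid editing an absolute path on every machine, the export could be written relative to a chosen folder. The helper below is a hypothetical sketch (the name `export_csv` and its defaults are assumptions, not part of the original notebook); it builds the path with `pathlib`, creates the folder if needed, and returns the path it wrote to.

```python
from pathlib import Path

import pandas as pd


def export_csv(df, out_dir=".", filename="snow_level.csv"):
    """Write df to out_dir/filename without the index and return the resulting path."""
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    df.to_csv(out_path, index=False)
    return out_path
```

Calling `export_csv(final_df)` would write `snow_level.csv` next to the notebook, and `pd.read_csv` on the returned path is a quick way to confirm the round trip.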

## Continue in notebook part III