# Weather Scraping

This document outlines how the weather data for the project was collected.
I want to thank [Motor Sport Total](https://www.motorsport-total.com/) for recording this data at the various Grands Prix. Without it, the weather data would have been far less accurate.

## Setup & Helper Methods

In [226]:
# Import modules
import requests
import csv
from bs4 import BeautifulSoup as bs
import os

# Helper methods
def average(string):
    """Return the rounded midpoint of a range such as "35-48°"; other strings pass through unchanged."""
    if "-" in string:
        parts = string.replace("°", "").split("-")
        # round() rounds halves to the nearest even number, so 41.5 -> 42
        avg = round((int(parts[0]) + int(parts[1])) / 2)
        return str(avg)
    return string

def average_air(string):
    """Like average(), but removes any "." first so the values parse as integers."""
    return average(string.replace(".", ""))

def format_wind(string):
    """Split a string such as "2,0-4,5 m/s, Nordwest" into [speed, direction]."""
    result = string.split('m/s')
    # Removing the commas also removes the decimal comma, so "2,0" becomes "20"
    speed_part = result[0].replace(',', '')
    direction = result[1].replace(',', '').strip()
    speed = speed_part.split()[0].strip()
    if "-" in speed:
        speed = average(speed)
    return [speed, direction]

# Test function(s); a sample wind string is "2,0-4,5 m/s, Nordwest"
average("35-48°")

'42'
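
Because `format_wind` removes the decimal comma, a reading such as `2,0-4,5 m/s, Nordwest` is averaged from `20` and `45` rather than from 2.0 and 4.5. If speeds in m/s are preferred, a decimal-aware variant could look like the minimal sketch below. The name `parse_wind_decimal` is hypothetical and not part of the notebook, and the sketch assumes the input always has the form `<speed or range> m/s, <direction>`.

```python
def parse_wind_decimal(text):
    """Split a wind string into (speed in m/s, direction)."""
    # Hypothetical helper; assumes "<speed or range> m/s, <direction>".
    speed_part, _, direction_part = text.partition("m/s")
    # Turn the German decimal comma into a decimal point
    speed_part = speed_part.strip().replace(",", ".")
    # Drop the comma and spaces that separate the unit from the direction
    direction = direction_part.strip(" ,")
    if "-" in speed_part:
        low, high = (float(value) for value in speed_part.split("-"))
        speed = (low + high) / 2
    else:
        speed = float(speed_part)
    return round(speed, 2), direction


print(parse_wind_decimal("2,0-4,5 m/s, Nordwest"))  # (3.25, 'Nordwest')
```

Returning a float keeps the unit intact, at the cost of no longer matching the integer strings that the other helpers produce.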

In [232]:
def convert_date(string):
    """Convert a German date such as "27. Mai 2007" to ISO format ("2007-05-27")."""
    months = {
        "Januar": "01", "Februar": "02", "März": "03", "April": "04",
        "Mai": "05", "Juni": "06", "Juli": "07", "August": "08",
        "September": "09", "Oktober": "10", "November": "11", "Dezember": "12",
    }
    temp = string.split()
    # Zero-pad the day so single-digit days still give a valid ISO date
    day = temp[0].replace(".", "").zfill(2)
    year = temp[2]
    # "XX" marks an unrecognised month name
    month = months.get(temp[1], "XX")
    return year + "-" + month + "-" + day

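def convert_wind(wind):
    # Placeholder: wind-direction conversion is not implemented,
    # and scrape_weather() does not call this function.
    pass

# Test function(s)
convert_date("27. Mai 2007")

'2007-05-27'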

## Track information
Each track is described by a list holding its name and the URL parts that the scrape function below joins with a season year to build the address of the race results page.

In [234]:
# Track information that we want to scrape.
# Each entry: [track name, base URL, year (filled in per request), path suffix]

ital_url = ["monza", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-italien/rennen"]
mont_url = ["de-monaco", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-monaco/rennen"]
brit_url = ["silverstone", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-grossbritannien/rennen"]
belg_url = ["de-spa", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-belgien/rennen"]
braz_url = ["interlagos", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-brasilien/rennen"]
hung_url = ["hungaroring", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-ungarn/rennen"]
aust_url = ["albert-park", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-australien/rennen"]
span_url = ["catalunya", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-spanien/rennen"]

dates = ['2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019'] 
location_urls = [aust_url, span_url]
#location_urls = [ital_url, mont_url, brit_url, belg_url, braz_url, hung_url, aust_url, span_url]
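
To make the list layout concrete, here is a minimal sketch of how the parts combine into one address. The names `BASE` and `build_url` are hypothetical and only used for illustration; the notebook itself concatenates the list elements directly.

```python
# Base address shared by every track entry above
BASE = "https://www.motorsport-total.com/formel-1/ergebnisse/"


def build_url(year, suffix, base=BASE):
    """Join base URL, season year and race path into one results-page URL."""
    return f"{base}{year}{suffix}"


url = build_url("2007", "/grosser-preis-von-italien/rennen")
print(url)
# https://www.motorsport-total.com/formel-1/ergebnisse/2007/grosser-preis-von-italien/rennen
```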

## Scrape Function
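The core of the collection. For every track and year, the function builds the results-page URL, fetches the page, and reads the weather cells from the session information table. Ranges such as temperature or humidity are averaged and dates are converted to ISO format so that the data is easier to use in later analysis.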

In [235]:
def scrape_weather():
    # Make sure the output directory exists, then create an empty csv for writing
    os.makedirs('data', exist_ok=True)
    with open('data/track_weather.csv', 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['track', 'date', 'local_time', 'weather', 'temp', 'track_temp', 'humidity', 'air_pressure', 'wind_speed', 'wind_direction'])

        for raw_url in location_urls:
            track = raw_url[0]
            print("INFO: NOW SCRAPING " + track)

            for year in dates:
                # Build the URL, fetch the page & find raw weather data
                url = raw_url[1] + year + raw_url[3]
                response = requests.get(url, timeout=30)
                response.raise_for_status()  # stop early on HTTP errors
                soup = bs(response.content, 'html.parser')
                raw_weather = soup.find('div', {'id': 'session-info'}).find_all('td')

                # Assign raw cells to variables; the slices trim the trailing unit text
                date = raw_weather[1].text.strip()
                local_time = raw_weather[5].text.strip() + ":00"  # append seconds
                weather = raw_weather[13].text.strip()
                temp = raw_weather[15].text.strip()[:-2]
                track_temp = raw_weather[21].text.strip()[:-2]
                humidity = raw_weather[17].text.strip()[:-1]
                air_pressure = raw_weather[19].text.strip()[:-5]
                wind = raw_weather[23].text.strip()

                # Format particular data
                date = convert_date(date)
                temp = average(temp)
                track_temp = average(track_temp)
                humidity = average(humidity)
                air_pressure = average_air(air_pressure)
                wind_speed, wind_direction = format_wind(wind)

                # Write to file
                writer.writerow([track, date, local_time, weather, temp, track_temp, humidity, air_pressure, wind_speed, wind_direction])
                print("LOG: " + year + " FINISHED")

    print("INFO: COMPLETED SCRAPING WEB DATA")
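
The cell positions read by `scrape_weather` (1, 5, 13, 15, ...) are all odd, which suggests that the `session-info` table alternates label and value cells. The notebook does not confirm this layout, so treat it as an assumption. If it holds, pairing the cells into a dictionary would make the lookups less brittle than fixed positions. The following is a minimal sketch; `pair_cells` and the sample labels are hypothetical.

```python
def pair_cells(cells):
    """Turn [label, value, label, value, ...] into {label: value}."""
    cleaned = [cell.strip() for cell in cells]
    # Even positions are taken as labels, odd positions as their values
    return dict(zip(cleaned[0::2], cleaned[1::2]))


# Made-up sample cells for illustration; the real labels on the site may differ.
sample = ["Datum", " 27. Mai 2007 ", "Luftfeuchtigkeit", "35-48%"]
info = pair_cells(sample)
print(info["Datum"])  # 27. Mai 2007
```

With the real page, the input would be the text of each `td` element, for example `[td.text for td in raw_weather]`.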

## Calling Function
Calling the function writes `data/track_weather.csv`, the data set used in the assignment. The run below covers only the two tracks currently listed in `location_urls`.

In [237]:
scrape_weather()

INFO: NOW SCRAPING albert-park
LOG: 2007 FINISHED
LOG: 2008 FINISHED
LOG: 2009 FINISHED
LOG: 2010 FINISHED
LOG: 2011 FINISHED
LOG: 2012 FINISHED
LOG: 2013 FINISHED
LOG: 2014 FINISHED
LOG: 2015 FINISHED
LOG: 2016 FINISHED
LOG: 2017 FINISHED
LOG: 2018 FINISHED
LOG: 2019 FINISHED
INFO: NOW SCRAPING catalunya
LOG: 2007 FINISHED
LOG: 2008 FINISHED
LOG: 2009 FINISHED
LOG: 2010 FINISHED
LOG: 2011 FINISHED
LOG: 2012 FINISHED
LOG: 2013 FINISHED
LOG: 2014 FINISHED
LOG: 2015 FINISHED
LOG: 2016 FINISHED
LOG: 2017 FINISHED
LOG: 2018 FINISHED
LOG: 2019 FINISHED
INFO: COMPLETED SCRAPING WEB DATA
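
A quick sanity check on the written file can catch rows where a field failed to convert, for example a date that still carries the `XX` month placeholder returned by `convert_date` for an unrecognised month name. The sketch below runs against an in-memory sample with made-up values rather than the real file, and `find_bad_rows` is a hypothetical helper, not part of the notebook.

```python
import csv
import io


def find_bad_rows(handle):
    """Return rows whose date is not numeric Y-M-D or whose temp is not a whole number."""
    bad = []
    for row in csv.DictReader(handle):
        date_parts = row["date"].split("-")
        date_ok = len(date_parts) == 3 and all(p.isdigit() for p in date_parts)
        # isdigit() also rejects negative values, which is acceptable for this sketch
        if not (date_ok and row["temp"].isdigit()):
            bad.append(row)
    return bad


# Made-up sample rows for illustration only
sample = io.StringIO(
    "track,date,temp\n"
    "albert-park,2007-03-18,22\n"
    "catalunya,2007-XX-13,n/a\n"
)
bad_rows = find_bad_rows(sample)
print([row["track"] for row in bad_rows])  # ['catalunya']
```

To check the real output, the same function can be given the file opened with `open('data/track_weather.csv', newline='', encoding='utf-8')`.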
