# Preparation of Weather Data

This document describes how the weather data for the project was collected.
Thanks go to [Motor Sport Total](https://www.motorsport-total.com/) for recording this data at each Grand Prix. Without it, the weather data would have been far less accurate.

> ***Note:*** This notebook only works when it is run in a virtual environment that has the `googletrans` module installed.
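
One way to confirm the dependencies before running any cell is sketched below. The helper name `is_installed` is hypothetical and not part of the notebook; it only checks whether a module can be found in the active environment.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the named top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Third-party modules imported by the notebook
for name in ("requests", "bs4", "googletrans"):
    status = "ok" if is_installed(name) else "MISSING"
    print(name + ": " + status)
```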

## Setup & Helper Methods

In [1]:
# Import modules
import requests
import csv
from bs4 import BeautifulSoup as bs
import os
from googletrans import Translator
translator = Translator()

In [3]:
# Helper methods
def average(string):
    # Drop the degree sign and the Celsius unit, e.g. "24-26°C" -> "24-26"
    string = string.replace("°", "")
    if "C" in string.upper():
        string = string.upper().replace("C", "")

    # Reduce a range such as "24-26" to its rounded mean; pass single values through
    if "-" in string:
        parts = string.split("-")
        return str(round((float(parts[0]) + float(parts[1])) / 2))
    return string

def average_air(string):
    # Remove the thousands separator, e.g. "1.013" -> "1013"
    string = string.replace(".", "")

    # Reduce a range such as "1010-1014" to its rounded mean
    if "-" in string:
        parts = string.split("-")
        return str(round((int(parts[0]) + int(parts[1])) / 2))
    return string

def format_wind(string):
    # Split "<speed> m/s <direction>"; 'ms/s' presumably covers a misspelt unit on some pages
    if 'm/s' in string:
        result = string.split('m/s')
    else:
        result = string.split('ms/s')

    # Decimal comma -> decimal point in the speed; drop stray commas in the direction
    result[0] = result[0].replace(',', '.')
    result[1] = result[1].replace(',', '')

    speed = result[0].split()[0].strip()
    direction = result[1].strip()

    # Reduce a speed range such as "3-5" to its mean
    if "-" in speed:
        speed = average(speed)

    return [speed, direction]

def convert_date(string):
    # Convert a German date such as "18. März 2007" to "2007-03-18"
    months = {
        "Januar": "01", "Februar": "02", "März": "03", "April": "04",
        "Mai": "05", "Juni": "06", "Juli": "07", "August": "08",
        "September": "09", "Oktober": "10", "November": "11", "Dezember": "12",
    }
    temp = string.split()
    day = temp[0].replace(".", "").zfill(2)  # pad single-digit days to two digits
    month = months.get(temp[1], "XX")  # "XX" marks an unrecognised month name
    year = temp[2]
    return year + "-" + month + "-" + day

def translate(string):
    return translator.translate(string, src='de', dest='en').text.lower()

# Test Function(s)
translate("Regen")

'rain'
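
The range handling in `average` splits on every `-`, so a reading that starts with a minus sign would not parse. A regex-based variant is sketched below as one possible alternative. The name `mean_of_range` and the pattern are assumptions, not part of the notebook, and unlike `average` it returns an integer (or `None`) rather than a string.

```python
import re

# Optional sign, a number, then optionally a dash and a second number
RANGE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(?:-\s*(-?\d+(?:\.\d+)?))?")

def mean_of_range(text):
    """Return the rounded mean of 'low-high', or the single value, or None."""
    match = RANGE.match(text.replace(",", "."))  # accept decimal commas
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    return round((low + high) / 2)

print(mean_of_range("24-26°C"))  # 25
print(mean_of_range("-2-4°C"))   # 1
print(mean_of_range("31°C"))     # 31
```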

## Track information
The lists below hold what the scrape function needs for each track: the track name, the base URL, an empty slot for the year, and the path to the race page.

In [4]:
# Track information that we want to scrape
aust = ["albert-park", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-australien/rennen"]
mala = ["sepang", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-malaysia/rennen"]
sach = ["sachir", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-bahrain/rennen"]
span = ["catalunya", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-spanien/rennen"]
mont = ["de-monaco", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-monaco/rennen"]
cana = ["montreal", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-kanada/rennen"]
fren = ["magny-cours", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-frankreich/rennen"]
brit = ["silverstone", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-grossbritannien/rennen"]
germ = ["valencia", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-europa/rennen"]
hung = ["hungaroring", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-ungarn/rennen"]
turk = ["istanbul", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-der-tuerkei/rennen"]
ital = ["monza", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-italien/rennen"]
belg = ["de-spa", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-belgien/rennen"]
japa = ["fuji", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-japan/rennen"]
chin = ["shanghai", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-china/rennen"]
braz = ["interlagos", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-brasilien/rennen"]
abud = ["abu-dhabi", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-abu-dhabi/rennen"]
usa1 = ["indianapolis", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-der-usa/rennen"]
usa2 = ["austin", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-der-usa/rennen"]
mexi = ["mexico", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-mexiko/rennen"]

# Specify information here
dates = ['2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019'] 
location_urls = [aust, mala, sach, span, mont, cana, usa1, usa2, fren, brit, germ, hung, turk, ital, belg, japa, chin, braz, abud, mexi]

# Debugging details
#dates = ['2010', '2011', '2012']
#location_urls = [chin]
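
How the four parts of an entry combine into a page address can be sketched as follows. The helper `build_url` is hypothetical; it joins the parts without writing the year into the entry's empty slot.

```python
def build_url(entry, year):
    """Join the base URL, the season and the race path into one address."""
    _track, base, _year_slot, race_path = entry
    return base + year + race_path

# Entry copied from the track list above
aust = ["albert-park", "https://www.motorsport-total.com/formel-1/ergebnisse/", "", "/grosser-preis-von-australien/rennen"]

print(build_url(aust, "2007"))
# https://www.motorsport-total.com/formel-1/ergebnisse/2007/grosser-preis-von-australien/rennen
```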

## Scrape Function
This function is the core of the collection. It builds the URL for each track and year, scrapes the session information from the page, and reformats some fields so they are easier to analyse later.

In [91]:
def scrape_weather():
    # Create empty csv for writing (make sure the output directory exists first)
    os.makedirs('data', exist_ok=True)
    file = open('data/track_weather.csv', 'w', newline='')
    writer = csv.writer(file)
    writer.writerow(['track', 'date', 'local_time', 'weather', 'temp', 'track_temp', 'humidity', 'air_pressure', 'wind_speed', 'wind_direction'])
    
    for raw_url in location_urls:
        track = raw_url[0]
        print("INFO: NOW SCRAPING " + raw_url[0])
        
        for year in dates:
            
            # Create response & find raw weather data
            raw_url[2] = year
            url = raw_url[1] + raw_url[2] + raw_url[3]       
            response = requests.get(url)
            soup = bs(response.content, 'html.parser')
            
            # Check response has content (look the element up once and reuse it)
            session_info = soup.find('div', {'id': 'session-info'})
            if session_info is None:
                print("WARN: NO TRACK WEATHER FOR YEAR " + year.upper())
                writer.writerow([track, year, "null", "null", "null", "null", "null", "null", "null", "null"])
            else:
                raw_weather = session_info.find_all('td')
         
                # Assign raw to actual variables
                date = raw_weather[1].text.strip()
                local_time = raw_weather[5].text.strip() + ":00"
                weather = raw_weather[13].text.strip()
                temp = raw_weather[15].text.strip()
                track_temp = raw_weather[21].text.strip()
                humidity = raw_weather[17].text.strip()[:-1]
                air_pressure = raw_weather[19].text.strip()[:-5]
                wind = raw_weather[23].text.strip()

                # Format particular data
                date = convert_date(date)
                weather = translate(weather)
                temp = average(temp)
                track_temp = average(track_temp)
                humidity = average(humidity)
                air_pressure = average_air(air_pressure)
                wind_info = format_wind(wind)
                wind_speed = wind_info[0]
                wind_direction = translate(wind_info[1])

                # Write to file
                writer.writerow([track, date, local_time, weather, temp, track_temp, humidity, air_pressure, wind_speed, wind_direction])
                print("LOG: " + year + " FINISHED")
    
    print("INFO: COMPLETED SCRAPING WEB DATA")
    file.close()
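
The file handling above can also be written with a context manager, so the CSV is closed even if a request or a parse step raises. The sketch below shows only the writing side and uses a temporary directory. `COLUMNS`, `write_rows` and `missing_row` are hypothetical names, not part of the notebook.

```python
import csv
import os
import tempfile

COLUMNS = ['track', 'date', 'local_time', 'weather', 'temp', 'track_temp',
           'humidity', 'air_pressure', 'wind_speed', 'wind_direction']

def write_rows(path, rows):
    """Write the header plus the given rows; the file is closed even on error."""
    with open(path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

def missing_row(track, year):
    """Placeholder row in the shape the scraper writes when a page has no weather."""
    return [track, year] + ["null"] * (len(COLUMNS) - 2)

# Write one placeholder row to a throwaway location and read it back
path = os.path.join(tempfile.mkdtemp(), 'track_weather.csv')
write_rows(path, [missing_row("sepang", "2018")])

with open(path, newline='') as file:
    rows = list(csv.reader(file))
print(rows[1])
```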

## Calling Function
Now the function is called, producing the data set used in the assignment.

In [93]:
scrape_weather()

INFO: NOW SCRAPING albert-park
LOG: 2007 FINISHED
LOG: 2008 FINISHED
LOG: 2009 FINISHED
LOG: 2010 FINISHED
LOG: 2011 FINISHED
LOG: 2012 FINISHED
LOG: 2013 FINISHED
LOG: 2014 FINISHED
LOG: 2015 FINISHED
LOG: 2016 FINISHED
LOG: 2017 FINISHED
LOG: 2018 FINISHED
LOG: 2019 FINISHED
INFO: NOW SCRAPING sepang
LOG: 2007 FINISHED
LOG: 2008 FINISHED
LOG: 2009 FINISHED
LOG: 2010 FINISHED
LOG: 2011 FINISHED
LOG: 2012 FINISHED
LOG: 2013 FINISHED
LOG: 2014 FINISHED
LOG: 2015 FINISHED
LOG: 2016 FINISHED
LOG: 2017 FINISHED
WARN: NO TRACK WEATHER FOR YEAR 2018
WARN: NO TRACK WEATHER FOR YEAR 2019
INFO: NOW SCRAPING sachir
LOG: 2007 FINISHED
LOG: 2008 FINISHED
LOG: 2009 FINISHED
LOG: 2010 FINISHED
WARN: NO TRACK WEATHER FOR YEAR 2011
LOG: 2012 FINISHED
LOG: 2013 FINISHED
LOG: 2014 FINISHED
LOG: 2015 FINISHED
LOG: 2016 FINISHED
LOG: 2017 FINISHED
LOG: 2018 FINISHED
LOG: 2019 FINISHED
INFO: NOW SCRAPING catalunya
LOG: 2007 FINISHED
LOG: 2008 FINISHED
LOG: 2009 FINISHED
LOG: 2010 FINISHED
LOG: 2011 FINISHE