## Weather in F1: Data Mining

The dataframe below needs to be filled with ones corresponding to the weather conditions in each race. Since the grip of the track surface is one of the most important factors in a driver's performance, the presence or absence of clouds or sun is ignored, and only the degree of wetness of the track is recorded.

In [415]:
import numpy as np
import pandas as pd

# Load the race calendar and keep only the columns needed to label each race
races_df = pd.read_csv(r'C:\Users\joaoe\Documentos\JP\JP_GitHub\f1_predict_ML\data\races.csv')
races_df = races_df.drop(columns=['raceId', 'round', 'time', 'circuitId', 'url'])
races_df = races_df.sort_values(by=['date']).reset_index(drop=True)
length = len(races_df)

# Empty weather columns: a text description plus one 0/1 flag per track condition
weather = {'Descriptive': ['' for _ in range(length)], 'Dry': np.zeros(length), 'Wet': np.zeros(length), 'Drizzle': np.zeros(length), 'Rain': np.zeros(length)}
weather_df = pd.DataFrame(data=weather)

# Attach the weather columns to the race list (both frames share a 0..n-1 index)
weather_df = pd.concat([races_df, weather_df], axis=1)

In [437]:
weather_df.head()

| | year | name | date | Descriptive | Dry | Wet | Drizzle | Rain |
|---|---|---|---|---|---|---|---|---|
| 0 | 1950 | British Grand Prix | 1950-05-13 | Sun, Dry | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1950 | Monaco Grand Prix | 1950-05-21 | Sun, Dry | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1950 | Indianapolis 500 | 1950-05-30 | Sun, short drizzle | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 1950 | Swiss Grand Prix | 1950-06-04 | Sun, Dry | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 1950 | Belgian Grand Prix | 1950-06-18 | Sun, Dry | 1.0 | 0.0 | 0.0 | 0.0 |
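
The relationship between the `Descriptive` label and the four flag columns in the preview above can be illustrated with a small sketch. It is not the method used in this notebook (which maps icon codes, as described in the next section). It simply assumes that a flag is set whenever the condition's name appears in the label; the helper `wetness_flags` is a hypothetical name introduced here for illustration.

```python
# The four track conditions recorded in weather_df
CONDITIONS = ("Dry", "Wet", "Drizzle", "Rain")

def wetness_flags(description: str) -> dict:
    """Return a 0/1 flag per track condition whose name appears in the label."""
    text = description.lower()
    return {condition: int(condition.lower() in text) for condition in CONDITIONS}

# A label naming a single condition sets a single flag
flags = wetness_flags("Sun, short drizzle")

# A label naming two conditions sets two flags, so the encoding is not strictly one-hot
mixed = wetness_flags("Rain, then dry")
```

A label such as `Unknown` names no condition, so all four flags stay at zero.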


### **1st Attempt**: Data scraping [F1 Facts](https://f1-facts.com/), a website with historical statistics from F1 races, from 1950 to 2018
The first step was to use the Beautiful Soup library to parse the HTML and capture the label for each weather condition.

In [417]:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import pandas as pd
import numpy as np

# Reading through html file
myurl = 'https://f1-facts.com/stats/weather'
uClient = uReq(myurl)
page_html = uClient.read()
page_soup = soup(page_html,"lxml")

# Finding table containing the desired information
# (`string=` is the current name of the older `text=` argument)
table = page_soup.find(string="1950").find_parent("table")

Now we must capture the weather information, which is shown by an icon indicating whether there was rain, sun, clouds, etc. We can read the source of each icon and map that link to a specific weather state.

In [418]:
# Reading all image tags: ordered chronologically
images = table.find_all("img")

# Creating a list with each image tag (including its source link) converted to a string
image_list = [str(image) for image in images]

Going from 1950 to 2011, chronologically, the digits corresponding to the weather condition are read and converted into a descriptive text.
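
The cell that follows reads the icon code from fixed character positions (`[35:38]`) of each tag string, which only works while every tag has exactly the same prefix. As a hedged alternative, the code could be pulled out of the file name with a regular expression. This is a minimal sketch: the tag strings and the `/images/weather/` path are made up for illustration and may not match the real markup of the site, and `icon_code` is a hypothetical helper.

```python
import re

# Matches the icon code in a file name such as ".../w10.png" (path layout assumed)
ICON_CODE = re.compile(r"/(w\d*)\.\w+")

def icon_code(img_tag: str):
    """Return the icon code ('w1', 'w10', or 'w' for the unknown icon), or None."""
    match = ICON_CODE.search(img_tag)
    return match.group(1) if match else None

# Hypothetical tags, for illustration only
sample_tags = [
    '<img src="/images/weather/w1.png"/>',
    '<img src="/images/weather/w10.png"/>',
    '<img src="/images/weather/w.png"/>',
]
codes = [icon_code(tag) for tag in sample_tags]
```

With this approach, two-digit codes such as `w10` and the digit-less `w` need no special handling, because the match does not depend on the length of the code.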

In [433]:
# Up to 2018, 2015 missing 

pd.set_option('mode.chained_assignment', None)


# Each icon code maps to a description and the flag column(s) to set
icon_map = {
    'w1.': ('Sun, Dry', ['Dry']),
    'w2.': ('Dry, Clouds', ['Dry']),
    'w3.': ('Dry, Cold', ['Dry']),
    'w4.': ('Sun, short drizzle', ['Drizzle']),
    'w5.': ('Rain', ['Rain']),
    'w6.': ('Wet', ['Wet']),
    'w7.': ('Rain, then dry', ['Rain', 'Dry']),
    'w8.': ('Dry, Cloudy', ['Dry']),
    'w9.': ('Rain', ['Rain']),
    'w10': ('Dry, hot', ['Dry']),
}

for i, tag in enumerate(image_list):
    code = tag[35:38]  # three characters starting at the icon code, e.g. 'w1.' or 'w10'
    if code in icon_map:
        description, columns = icon_map[code]
        weather_df.loc[i, 'Descriptive'] = description
        weather_df.loc[i, columns] = 1
    elif code.startswith('w.'):
        # Icon without a digit: a three-character slice can never equal 'w.',
        # so the prefix is tested instead
        weather_df.loc[i, 'Descriptive'] = 'Unknown'

### **2nd Attempt**: Adding values from a .csv corresponding to 2012-2020
Because there were gaps in the weather data for those years, a file was used in which the information from different sources was compiled.

In [438]:
weather_df_2012 = pd.read_csv(r'C:\Users\joaoe\Documentos\JP\JP_GitHub\f1_predict_ML\data\weather_2012_to_2020.csv')
weather_df_2012['Dry'] = np.zeros(len(weather_df_2012))
weather_df_2012['Wet'] = np.zeros(len(weather_df_2012))
weather_df_2012['Drizzle'] = np.zeros(len(weather_df_2012))
weather_df_2012['Rain'] = np.zeros(len(weather_df_2012))

weather_df_2012['index'] = weather_df_2012['index'].astype(int)
weather_df_2012 = weather_df_2012.set_index('index')
weather_df_2012.head()

| index | year | name | date | Descriptive | Dry | Wet | Drizzle | Rain |
|---|---|---|---|---|---|---|---|---|
| 858 | 2012 | Australian Grand Prix | 3/18/2012 | Dry | 0.0 | 0.0 | 0.0 | 0.0 |
| 859 | 2012 | Malaysian Grand Prix | 3/25/2012 | Rain | 0.0 | 0.0 | 0.0 | 0.0 |
| 860 | 2012 | Chinese Grand Prix | 4/15/2012 | Dry | 0.0 | 0.0 | 0.0 | 0.0 |
| 861 | 2012 | Bahrain Grand Prix | 4/22/2012 | Dry | 0.0 | 0.0 | 0.0 | 0.0 |
| 862 | 2012 | Spanish Grand Prix | 5/13/2012 | Dry | 0.0 | 0.0 | 0.0 | 0.0 |


In [439]:
# Finding indexes
index_first = weather_df_2012.index[0]
index_last = weather_df_2012.index[-1]

# Setting the flag column that matches each descriptive label.
# Boolean masks cover every row, including the last one, which
# range(index_first, index_last) would have skipped.
for condition in ['Dry', 'Wet', 'Drizzle', 'Rain']:
    matches = weather_df_2012['Descriptive'] == condition
    weather_df_2012.loc[matches, condition] = 1

In [440]:
weather_df_2012.head()

| index | year | name | date | Descriptive | Dry | Wet | Drizzle | Rain |
|---|---|---|---|---|---|---|---|---|
| 858 | 2012 | Australian Grand Prix | 3/18/2012 | Dry | 1.0 | 0.0 | 0.0 | 0.0 |
| 859 | 2012 | Malaysian Grand Prix | 3/25/2012 | Rain | 0.0 | 0.0 | 0.0 | 1.0 |
| 860 | 2012 | Chinese Grand Prix | 4/15/2012 | Dry | 1.0 | 0.0 | 0.0 | 0.0 |
| 861 | 2012 | Bahrain Grand Prix | 4/22/2012 | Dry | 1.0 | 0.0 | 0.0 | 0.0 |
| 862 | 2012 | Spanish Grand Prix | 5/13/2012 | Dry | 1.0 | 0.0 | 0.0 | 0.0 |


In [441]:
weather_df.loc[index_first:index_last]=weather_df_2012.loc[index_first:index_last]
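
Because the overwrite above matches rows purely by index label, a hand-compiled file whose index is shifted by one row would silently attach weather to the wrong race. A possible sanity check, sketched here under the assumption that both frames carry the same `name` column, is to list the shared index labels where the race names disagree. The helper `mismatched_rows` and the toy frames are hypothetical and only stand in for `weather_df` and `weather_df_2012`.

```python
import pandas as pd

def mismatched_rows(base, patch, key="name"):
    """Return index labels present in both frames whose `key` values differ."""
    common = base.index.intersection(patch.index)
    differs = base.loc[common, key] != patch.loc[common, key]
    return differs[differs].index.tolist()

# Toy frames for illustration: the second row of `patch` is deliberately misaligned
base = pd.DataFrame(
    {"name": ["Australian Grand Prix", "Malaysian Grand Prix", "Chinese Grand Prix"]},
    index=[858, 859, 860],
)
patch = pd.DataFrame(
    {"name": ["Australian Grand Prix", "Chinese Grand Prix"]},
    index=[858, 859],
)
problems = mismatched_rows(base, patch)
```

An empty list would suggest the two files line up on the shared rows; any labels returned are worth inspecting by hand before saving the merged file.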

In [450]:
weather_df.to_csv(r'C:\Users\joaoe\Documentos\JP\JP_GitHub\f1_predict_ML\data\weather_1950_to_2020.csv')