# Getting the weather data of the airports

A CSV file with the IATA and ICAO codes of the airports can be created from the table at http://www.flugzeuginfo.net/table_airportcodes_country-location_en.php.

Using the ICAO codes, the historical weather data can be retrieved from Weather Underground, for example: https://www.wunderground.com/history/daily/de/frankfurt/EDDF/date/2015-3-18
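The example link suggests a fixed URL pattern: country, city, ICAO code, then a date without zero padding. A minimal sketch of a URL builder follows. The helper name `build_history_url` is hypothetical, and the `country` and `city` parts are assumptions that are not available in the IATA/ICAO table used here, so they would have to come from elsewhere.

```python
from datetime import date


def build_history_url(country: str, city: str, icao: str, day: date) -> str:
    """Return a daily-history URL that follows the pattern of the example link."""
    # The example URL uses a non-padded month and day (2015-3-18, not 2015-03-18).
    return (
        "https://www.wunderground.com/history/daily/"
        f"{country}/{city}/{icao}/date/{day.year}-{day.month}-{day.day}"
    )


print(build_history_url("de", "frankfurt", "EDDF", date(2015, 3, 18)))
```

Whether the site accepts other country/city spellings for the same ICAO code is not verified here.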

Given a list of destination airport codes and scheduled arrival times, the corresponding weather conditions (wind speed, visibility) can be web scraped.

This list of weather conditions can then be incorporated into the dataframe with the delays and subsequently be used in a predictive model.
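One possible way to attach the scraped observations to the flights is a nearest-time join per airport. The sketch below uses `pd.merge_asof` on toy data. The column names `WIND_SPEED_MPH` and `CONDITION`, the one-hour tolerance, and the two flight times are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Toy flights: airport (ICAO) and scheduled time.
flights = pd.DataFrame({
    "ICAO": ["EDDF", "EDDF"],
    "DATE_TIME": pd.to_datetime(["2015-03-18 00:30:00", "2015-03-18 07:40:00"]),
})

# Toy weather observations, shaped like the half-hourly rows of the history page.
weather = pd.DataFrame({
    "ICAO": ["EDDF", "EDDF", "EDDF"],
    "DATE_TIME": pd.to_datetime([
        "2015-03-18 00:20:00", "2015-03-18 07:20:00", "2015-03-18 07:50:00",
    ]),
    "WIND_SPEED_MPH": [9, 3, 2],
    "CONDITION": ["Fair", "Shallow Fog", "Shallow Fog"],
})

# merge_asof requires both frames to be sorted by the "on" key.
merged = pd.merge_asof(
    flights.sort_values("DATE_TIME"),
    weather.sort_values("DATE_TIME"),
    on="DATE_TIME",
    by="ICAO",                        # only match observations of the same airport
    direction="nearest",              # closest observation before or after the flight
    tolerance=pd.Timedelta("1h"),     # leave NaN if nothing lies within one hour
)
print(merged)
```

Flights without an observation inside the tolerance window keep missing values, which the model or a later imputation step would have to handle.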

In [1]:
import pandas as pd

In [2]:
# The airport and date and time of departure
df_dep = pd.read_csv("data/sanitized_Train_data.csv", usecols=[3, 5], parse_dates=[1])
df_dep.columns = ['IATA', 'DATE_TIME']

In [3]:
df_dep.head()

Unnamed: 0,IATA,DATE_TIME
0,CMN,2016-01-03 10:30:00
1,MXP,2016-01-13 15:05:00
2,TUN,2016-01-16 04:10:00
3,DJE,2016-01-17 14:10:00
4,TUN,2016-01-17 14:30:00


In [4]:
# The airport and date and time of arrival
df_dest = pd.read_csv("data/sanitized_Train_data.csv", usecols=[4, 6], parse_dates=[1])
df_dest.columns = ['IATA', 'DATE_TIME']

In [5]:
df_dest.head()

Unnamed: 0,IATA,DATE_TIME
0,TUN,2016-01-03 12:55:00
1,TUN,2016-01-13 16:55:00
2,IST,2016-01-16 06:45:00
3,NTE,2016-01-17 17:00:00
4,ALG,2016-01-17 15:50:00


In [6]:
# Stack departures and arrivals into one frame of (airport, time) pairs
df = pd.concat([df_dep, df_dest], axis=0)
df

Unnamed: 0,IATA,DATE_TIME
0,CMN,2016-01-03 10:30:00
1,MXP,2016-01-13 15:05:00
2,TUN,2016-01-16 04:10:00
3,DJE,2016-01-17 14:10:00
4,TUN,2016-01-17 14:30:00
...,...,...
107828,TUN,2018-07-06 02:00:00
107829,TUN,2018-01-13 09:00:00
107830,TUN,2018-11-07 12:50:00
107831,DJE,2018-01-23 18:45:00
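Because the history page covers one full day per airport, many rows of the combined frame map to the same page. A minimal sketch, on toy data with the same column names, of reducing the frame to unique airport/day pairs before scraping (the `DATE` column and the name `pages_needed` are assumptions introduced here):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "IATA": ["TUN", "TUN", "DJE", "TUN"],
    "DATE_TIME": pd.to_datetime([
        "2016-01-17 14:30:00", "2016-01-17 15:50:00",
        "2016-01-17 14:10:00", "2016-01-16 04:10:00",
    ]),
})

# Drop the time of day so that all flights of one day share the same key.
df_demo["DATE"] = df_demo["DATE_TIME"].dt.normalize()

# One row per (airport, day) corresponds to one page to fetch.
pages_needed = (
    df_demo[["IATA", "DATE"]]
    .drop_duplicates()
    .sort_values(["IATA", "DATE"])
    .reset_index(drop=True)
)
print(pages_needed)
```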


In [7]:
df_airport_codes = pd.read_csv('list_IATA_ICAO_codes.csv', usecols=[0, 1])

In [8]:
code = df_airport_codes[df_airport_codes['IATA'] == 'CMN']['ICAO']
code = code.iloc[0]
type(code)

str

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 215666 entries, 0 to 107832
Data columns (total 2 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   IATA       215666 non-null  object        
 1   DATE_TIME  215666 non-null  datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 4.9+ MB


In [10]:
df = df.join(df_airport_codes.set_index('IATA'), on='IATA', how='left', lsuffix='_left', rsuffix='_right')

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 215666 entries, 0 to 107832
Data columns (total 3 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   IATA       215666 non-null  object        
 1   DATE_TIME  215666 non-null  datetime64[ns]
 2   ICAO       212380 non-null  object        
dtypes: datetime64[ns](1), object(2)
memory usage: 6.6+ MB
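The second `df.info()` shows 212380 non-null ICAO values out of 215666 rows, so 3286 rows found no match in the code table. A small sketch of how the unmatched IATA codes could be listed, using toy frames with the same column names (`XXX` is a made-up code standing in for an airport missing from the table):

```python
import pandas as pd

flights = pd.DataFrame({"IATA": ["CMN", "TUN", "XXX", "XXX"]})
codes = pd.DataFrame({"IATA": ["CMN", "TUN"], "ICAO": ["GMMN", "DTTA"]})

# Same left join as above: unmatched IATA codes get NaN in the ICAO column.
joined = flights.join(codes.set_index("IATA"), on="IATA", how="left")

# Count how often each unmatched IATA code occurs.
missing = joined.loc[joined["ICAO"].isna(), "IATA"].value_counts()
print(missing)
```

The resulting codes could then be added to the CSV by hand or the affected rows excluded from scraping.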


### Retrieving the weather data from the website with a web scraper

In [13]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

In [59]:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Path to the local chromedriver binary; if omitted, Selenium searches the PATH.
service = Service('/home/fklein/zindi/Flight_Delay_Prediction_Challenge/chromedriver')
driver = webdriver.Chrome(service=service)

table_html = []
try:
    driver.get('https://www.wunderground.com/history/daily/de/frankfurt/EDDF/date/2015-3-18')

    time.sleep(1)  # Crude wait so the page has time to render its tables

    # Each observation row is rendered as:
    # <tr role="row" mat-row="" class="mat-row cdk-row ng-star-inserted">
    weather_table = driver.find_elements(By.XPATH, '//tr[@class="mat-row cdk-row ng-star-inserted"]')
    print("Len of weather_table: ", len(weather_table))
    for row in weather_table:
        display(row.text)
        table_html.append(row.get_attribute('outerHTML'))
except Exception as exc:
    # Report the actual error instead of silently swallowing everything
    print('Could not find the html element:', exc)
finally:
    # Always close the browser, even if scraping failed
    driver.quit()

Len of weather_table:  48


'12:20 AM 50 °F 39 °F 66 % NNE 9 mph 0 mph 29.81 in 0.0 in Fair'

'12:50 AM 48 °F 37 °F 66 % NNE 10 mph 0 mph 29.81 in 0.0 in Fair'

'1:20 AM 48 °F 39 °F 71 % NNE 6 mph 0 mph 29.81 in 0.0 in Fair'

'1:50 AM 48 °F 39 °F 71 % E 5 mph 0 mph 29.81 in 0.0 in Fair'

'2:20 AM 46 °F 39 °F 76 % E 6 mph 0 mph 29.81 in 0.0 in Fair'

'2:50 AM 45 °F 39 °F 81 % E 7 mph 0 mph 29.81 in 0.0 in Fair'

'3:20 AM 43 °F 39 °F 87 % ENE 6 mph 0 mph 29.81 in 0.0 in Fair'

'3:50 AM 43 °F 37 °F 81 % ENE 6 mph 0 mph 29.81 in 0.0 in Fair'

'4:20 AM 43 °F 39 °F 87 % ENE 6 mph 0 mph 29.81 in 0.0 in Fair'

'4:50 AM 43 °F 37 °F 81 % E 6 mph 0 mph 29.81 in 0.0 in Fair'

'5:20 AM 43 °F 39 °F 87 % E 6 mph 0 mph 29.84 in 0.0 in Fair'

'5:50 AM 43 °F 37 °F 81 % E 7 mph 0 mph 29.84 in 0.0 in Fair'

'6:20 AM 43 °F 39 °F 87 % ENE 5 mph 0 mph 29.84 in 0.0 in Fair'

'6:50 AM 43 °F 37 °F 81 % VAR 2 mph 0 mph 29.84 in 0.0 in Fair'

'7:20 AM 41 °F 37 °F 87 % NNE 3 mph 0 mph 29.84 in 0.0 in Shallow Fog'

'7:50 AM 43 °F 39 °F 87 % VAR 2 mph 0 mph 29.84 in 0.0 in Shallow Fog'

'8:20 AM 43 °F 39 °F 87 % E 3 mph 0 mph 29.87 in 0.0 in Shallow Fog'

'8:50 AM 46 °F 39 °F 76 % VAR 2 mph 0 mph 29.87 in 0.0 in Fair'

'9:20 AM 50 °F 41 °F 71 % NE 3 mph 0 mph 29.87 in 0.0 in Fair'

'9:50 AM 52 °F 39 °F 62 % ENE 7 mph 0 mph 29.87 in 0.0 in Fair'

'10:20 AM 52 °F 39 °F 62 % E 7 mph 0 mph 29.87 in 0.0 in Fair'

'10:50 AM 54 °F 39 °F 58 % E 6 mph 0 mph 29.87 in 0.0 in Fair'

'11:20 AM 55 °F 39 °F 54 % SE 6 mph 0 mph 29.87 in 0.0 in Fair'

'11:50 AM 57 °F 37 °F 48 % E 6 mph 0 mph 29.84 in 0.0 in Fair'

'12:20 PM 59 °F 39 °F 48 % ESE 5 mph 0 mph 29.84 in 0.0 in Fair'

'12:50 PM 61 °F 39 °F 45 % VAR 3 mph 0 mph 29.84 in 0.0 in Fair'

'1:20 PM 63 °F 39 °F 42 % ENE 7 mph 0 mph 29.84 in 0.0 in Fair'

'1:50 PM 63 °F 37 °F 39 % NNE 6 mph 0 mph 29.81 in 0.0 in Fair'

'2:20 PM 63 °F 39 °F 42 % VAR 5 mph 0 mph 29.81 in 0.0 in Fair'

'2:50 PM 64 °F 39 °F 40 % VAR 6 mph 0 mph 29.81 in 0.0 in Fair'

'3:20 PM 64 °F 41 °F 42 % N 3 mph 0 mph 29.81 in 0.0 in Fair'

'3:50 PM 64 °F 39 °F 40 % VAR 2 mph 0 mph 29.81 in 0.0 in Fair'

'4:20 PM 64 °F 41 °F 42 % VAR 2 mph 0 mph 29.81 in 0.0 in Fair'

'4:50 PM 64 °F 41 °F 42 % N 5 mph 0 mph 29.81 in 0.0 in Fair'

'5:20 PM 64 °F 39 °F 40 % N 3 mph 0 mph 29.81 in 0.0 in Fair'

'5:50 PM 63 °F 39 °F 42 % NE 6 mph 0 mph 29.81 in 0.0 in Fair'

'6:20 PM 61 °F 39 °F 45 % N 5 mph 0 mph 29.81 in 0.0 in Fair'

'6:50 PM 61 °F 41 °F 48 % NW 8 mph 0 mph 29.81 in 0.0 in Fair'

'7:20 PM 59 °F 41 °F 51 % NW 8 mph 0 mph 29.81 in 0.0 in Fair'

'7:50 PM 59 °F 43 °F 55 % NW 9 mph 0 mph 29.81 in 0.0 in Fair'

'8:20 PM 59 °F 43 °F 55 % NNW 9 mph 0 mph 29.81 in 0.0 in Fair'

'8:50 PM 57 °F 43 °F 59 % N 9 mph 0 mph 29.84 in 0.0 in Fair'

'9:20 PM 57 °F 41 °F 55 % N 9 mph 0 mph 29.84 in 0.0 in Fair'

'9:50 PM 55 °F 41 °F 58 % N 6 mph 0 mph 29.84 in 0.0 in Fair'

'10:20 PM 54 °F 41 °F 62 % N 8 mph 0 mph 29.84 in 0.0 in Fair'

'10:50 PM 55 °F 41 °F 58 % N 8 mph 0 mph 29.84 in 0.0 in Fair'

'11:20 PM 54 °F 39 °F 58 % NNE 6 mph 0 mph 29.84 in 0.0 in Fair'

'11:50 PM 52 °F 41 °F 67 % NNW 6 mph 0 mph 29.84 in 0.0 in Fair'
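Each scraped row is a single string, so it has to be split into fields before it can be joined to the flights. Below is a minimal sketch of a regex-based parser. The field names are assumptions derived from the table header (Time, Temperature, Dew Point, Humidity, Wind, Wind Speed, Wind Gust, Pressure, Precip., Condition), and the pattern only covers rows shaped like the ones printed above.

```python
import re

ROW_PATTERN = re.compile(
    r"(?P<time>\d{1,2}:\d{2} [AP]M) "
    r"(?P<temperature_f>-?\d+) °F "
    r"(?P<dew_point_f>-?\d+) °F "
    r"(?P<humidity_pct>\d+) % "
    r"(?P<wind_dir>[A-Z]+) "
    r"(?P<wind_speed_mph>\d+) mph "
    r"(?P<wind_gust_mph>\d+) mph "
    r"(?P<pressure_in>\d+\.\d+) in "
    r"(?P<precip_in>\d+\.\d+) in "
    r"(?P<condition>.+)"
)

INT_FIELDS = ("temperature_f", "dew_point_f", "humidity_pct",
              "wind_speed_mph", "wind_gust_mph")
FLOAT_FIELDS = ("pressure_in", "precip_in")


def parse_row(text):
    """Parse one observation row; return None if the row has a different shape."""
    match = ROW_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    record = match.groupdict()
    for name in INT_FIELDS:
        record[name] = int(record[name])
    for name in FLOAT_FIELDS:
        record[name] = float(record[name])
    return record


print(parse_row("7:20 AM 41 °F 37 °F 87 % NNE 3 mph 0 mph 29.84 in 0.0 in Shallow Fog"))
```

Rows that do not match, such as the summary rows of the other tables on the page, come back as `None` and can be filtered out before building a dataframe.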

In [57]:
len(table_html)

3888

In [58]:
table_html

['Temperature (° F) Actual Historic Avg. Record',
 'High Temp 64 50.3 --',
 'Low Temp 41 35.9 --',
 'Day Average Temp 53.35 43 -',
 'Precipitation (Inches) Actual Historic Avg. Record',
 'Precipitation (past 24 hours from 23:20:00) 0.00 0.10 -',
 'Dew Point (° F) Actual Historic Avg. Record',
 'Dew Point 39.38 - -',
 'High 43 - -',
 'Low 37 - -',
 'Average 39.38 - -',
 'Wind (MPH) Actual Historic Avg. Record',
 'Max Wind Speed 10 - -',
 'Visibility 6 - -',
 'Sea Level Pressure (Hg) Actual Historic Avg. Record',
 'Sea Level Pressure 29.87 - -',
 'Astronomy Day Length Rise Set',
 'Actual Time 12h 0m 6:35 AM 6:35 PM',
 'Civil Twilight 6:02 AM 7:07 PM',
 'Nautical Twilight 5:25 AM 7:45 PM',
 'Astronomical Twilight 4:45 AM 8:24 PM',
 'Moon: waning crescent 5:18 AM 4:19 PM',
 'Time\nTemperature\nDew Point\nHumidity\nWind\nWind Speed\nWind Gust\nPressure\nPrecip.\nCondition',
 '12:20 AM 50 °F 39 °F 66 % NNE 9 mph 0 mph 29.81 in 0.0 in Fair',
 '12:50 AM 48 °F 37 °F 66 % NNE 10 mph 0 mph 29.81 