# Filtering out the Weather Data

This file is used to filter out and form the final dataset related to the weather data.

##### OBJECTIVES:
1. Read all the weather files from the year 2016 and 2017
2. Filter out the weather data for the following columns
`['windspeedKmph','visibility','weatherCode','precipMM','WindGustKmph','pressure','cloudcover','winddirDegree','humidity',
'DewPointF','tempF','time','WindChillF']`
3. Filter it out based on the airport list as shown below
`['ATL','CLT','DEN','DFW','EWR','IAH','JFK','LAS','LAX','MCO','MIA','ORD','PHX','SEA','SFO']`
4. Merge the two datasets for 2016 and 2017

### Importing the Modules required for this notebook.
We will need
1. glob - for path reading 
2. pandas - for dataframe manipulation
3. json - for json file manipulation

In [1]:
import glob
import pandas as pd
import json

### Defining the list of airports and weather aspects in interest.
Define two lists which we will use later for the sake of filtering the datasets

In [2]:
# Airport List
airport_list = ['ATL', 'CLT', 'DEN', 'DFW', 'EWR', 'IAH', 'JFK', 'LAS', 
                'LAX', 'MCO', 'MIA', 'ORD', 'PHX', 'SEA', 'SFO']

# Weather list
weather_columns = ['windspeedKmph', 'winddirDegree',  'weatherCode', 'precipMM', 'visibility', 'pressure', 
                   'cloudcover', 'DewPointF', 'WindGustKmph', 'tempF',  'WindChillF', 'humidity', 'time']

## 2016 DATA WRANGLING

### Reading the path of the 2016 files
We read and save the paths of the weather datasets which we need from the year 2016 into a list `weather_files_2016`

In [3]:
weather_files_2016 = glob.glob('Data\WEATHER_DATA\\***\\2016-***.json')
weather_files_2016

['Data\\WEATHER_DATA\\ATL\\2016-1.json',
 'Data\\WEATHER_DATA\\ATL\\2016-10.json',
 'Data\\WEATHER_DATA\\ATL\\2016-11.json',
 'Data\\WEATHER_DATA\\ATL\\2016-12.json',
 'Data\\WEATHER_DATA\\ATL\\2016-2.json',
 'Data\\WEATHER_DATA\\ATL\\2016-3.json',
 'Data\\WEATHER_DATA\\ATL\\2016-4.json',
 'Data\\WEATHER_DATA\\ATL\\2016-5.json',
 'Data\\WEATHER_DATA\\ATL\\2016-6.json',
 'Data\\WEATHER_DATA\\ATL\\2016-7.json',
 'Data\\WEATHER_DATA\\ATL\\2016-8.json',
 'Data\\WEATHER_DATA\\ATL\\2016-9.json',
 'Data\\WEATHER_DATA\\CLT\\2016-1.json',
 'Data\\WEATHER_DATA\\CLT\\2016-10.json',
 'Data\\WEATHER_DATA\\CLT\\2016-11.json',
 'Data\\WEATHER_DATA\\CLT\\2016-12.json',
 'Data\\WEATHER_DATA\\CLT\\2016-2.json',
 'Data\\WEATHER_DATA\\CLT\\2016-3.json',
 'Data\\WEATHER_DATA\\CLT\\2016-4.json',
 'Data\\WEATHER_DATA\\CLT\\2016-5.json',
 'Data\\WEATHER_DATA\\CLT\\2016-6.json',
 'Data\\WEATHER_DATA\\CLT\\2016-7.json',
 'Data\\WEATHER_DATA\\CLT\\2016-8.json',
 'Data\\WEATHER_DATA\\CLT\\2016-9.json',
 'Data\\WE

### Creating the dataframe for the 2016 file
1. Opening each file from path
2. Reading json file
3. Getting the data from weather data in json file
4. Filtering based on the weather columns
5. Adding the date column
6. Adding the airport name column
7. Appending the dataset to the main 2016 dataframe

In [5]:
# Creating the main dataframe
weather_2016 = []

# Starting a loop which iterates through the paths of from the path list
for file in weather_files_2016:
    # Opening the file
    with open(file) as f:
        
        # Temporary json for manipulation
        temp_data = json.load(f)
        
        # Iterating through the file and the weather module
        for temp_series in temp_data['data']['weather']:
            
            # Creating the dataframe for appending later
            data1 = pd.DataFrame(temp_series['hourly'],columns=weather_columns)
            
            # Adding the date column
            data1['date'] = temp_series['date']
            
            # Adding the Airport code based on the file name
            for name in airport_list:
                if name in file.split('\\'):
                    data1['airport'] = name
            
            # Appending the temporary dataframe to the final dataframe
            weather_2016 = pd.DataFrame(weather_2016.append(data1))

In [6]:
weather_2016.head()

Unnamed: 0,windspeedKmph,winddirDegree,weatherCode,precipMM,visibility,pressure,cloudcover,DewPointF,WindGustKmph,tempF,WindChillF,humidity,time,date,airport
0,15,319,122,0.0,10,1026,86,33,23,40,34,76,0,2016-01-02,ATL
1,16,320,122,0.0,10,1026,81,33,23,39,33,78,100,2016-01-02,ATL
2,16,321,116,0.0,10,1026,76,33,23,38,32,80,200,2016-01-02,ATL
3,16,322,116,0.0,10,1026,71,33,23,38,31,83,300,2016-01-02,ATL
4,16,319,116,0.0,10,1026,79,32,23,37,30,83,400,2016-01-02,ATL


In [7]:
weather_2016.shape

(131736, 15)

## 2017 DATA WRANGLING

### Method
The next few kernels will be following the same method from the past dataframes the only change being the year from 2016 to 2017

In [8]:
weather_files_2017 = glob.glob('Data\WEATHER_DATA\\***\\2017-***.json')

# Creating the main dataframe
weather_2017 = []

# Starting a loop which iterates through the paths of from the path list
for file in weather_files_2017:
    # Opening the file
    with open(file) as f:
        
        # Temporary json for manipulation
        temp_data = json.load(f)
        
        # Iterating through the file and the weather module
        for temp_series in temp_data['data']['weather']:
            
            # Creating the dataframe for appending later
            data1 = pd.DataFrame(temp_series['hourly'],columns=weather_columns)
            
            # Adding the date column
            data1['date'] = temp_series['date']
            
            # Adding the Airport code based on the file name
            for name in airport_list:
                if name in file.split('\\'):
                    data1['airport'] = name
            
            # Appending the temporary dataframe to the final dataframe
            weather_2017 = pd.DataFrame(weather_2017.append(data1))

In [9]:
weather_2017.head()

Unnamed: 0,windspeedKmph,winddirDegree,weatherCode,precipMM,visibility,pressure,cloudcover,DewPointF,WindGustKmph,tempF,WindChillF,humidity,time,date,airport
0,8,100,353,6.1,7,1021,100,50,16,51,49,97,0,2017-01-02,ATL
1,8,99,353,5.1,7,1020,100,50,16,51,50,97,100,2017-01-02,ATL
2,7,99,356,4.1,8,1020,100,51,15,52,50,98,200,2017-01-02,ATL
3,7,99,356,3.1,9,1020,100,51,14,52,51,98,300,2017-01-02,ATL
4,6,92,356,4.1,8,1020,100,52,13,52,52,98,400,2017-01-02,ATL


In [10]:
weather_2017.shape

(131376, 15)

# Merging the two datasets
We use ```pd.concat``` 
to merge both the datasets.

In [11]:
# Using `pd.concat` to merge both the datasets.
final_weather_dataset = pd.concat([weather_2016,weather_2017])
# Looking at the shape
final_weather_dataset.shape

(263112, 15)

In [12]:
final_weather_dataset.isna().sum()

windspeedKmph    0
winddirDegree    0
weatherCode      0
precipMM         0
visibility       0
pressure         0
cloudcover       0
DewPointF        0
WindGustKmph     0
tempF            0
WindChillF       0
humidity         0
time             0
date             0
airport          0
dtype: int64

### Saving the dataframe to a new csv file

In [13]:
final_weather_dataset.to_csv('Data/Flight_Weather_Data.csv')