# Flight Delay Prediction 

This solution predicts whether a flight will arrive 15 or more minutes late (`ARR_DEL15`) from factors such as route, carrier, day, departure time and delay, and diversion status, using a scikit-learn random forest classifier.

## Contents

1. Prerequisites
2. Data Dictionary
3. Import Libraries
4. Load Input Data
5. Create Model
6. Predict Test Datapoints
7. Saving Predictions

## Prerequisites

To run this notebook, you need the following packages installed:

- `pandas` to read and write CSV files.
- `scikit-learn` (imported as `sklearn`) to train the model and generate predictions.
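Because scikit-learn is installed as `scikit-learn` but imported as `sklearn`, it can help to check the import names before running the notebook. A minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical:

```python
import importlib.util

# Import names (not install names) of the packages this notebook uses.
REQUIRED_MODULES = ["pandas", "sklearn"]


def missing_modules(modules):
    """Return the names in `modules` that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```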

## Data Dictionary

- The input must be a `.csv` file with UTF-8 encoding. Please note: if the file is not UTF-8 encoded, it may fail to load or be read incorrectly, and the model will not perform as expected.
- Required features: `DAY_OF_MONTH`, `DAY_OF_WEEK`, `OP_CARRIER_AIRLINE_ID`, `ORIGIN_AIRPORT_ID`, `DEST_AIRPORT_ID`, `DEP_TIME`, `DEP_DEL15`, `ARR_TIME`, `ARR_DEL15`, `DIVERTED`, `DISTANCE`.
- `ARR_DEL15` is the prediction target: it must be present in `train.csv` and is absent from `test.csv`.
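These requirements can be checked before training or predicting. A minimal sketch, assuming the file is read with pandas; `missing_columns` is a hypothetical helper, and the sample bytes reuse two rows of the test data:

```python
import io

import pandas as pd

# Columns from the data dictionary. ARR_DEL15 is the target,
# so it is only expected in the training file.
REQUIRED_FEATURES = [
    "DAY_OF_MONTH", "DAY_OF_WEEK", "OP_CARRIER_AIRLINE_ID",
    "ORIGIN_AIRPORT_ID", "DEST_AIRPORT_ID", "DEP_TIME", "DEP_DEL15",
    "ARR_TIME", "ARR_DEL15", "DIVERTED", "DISTANCE",
]
TARGET = "ARR_DEL15"


def missing_columns(df: pd.DataFrame, is_training: bool) -> list:
    """Return required columns absent from `df` (hypothetical helper)."""
    required = [c for c in REQUIRED_FEATURES if is_training or c != TARGET]
    return [c for c in required if c not in df.columns]


# Two test-set rows, encoded as UTF-8 bytes as they would be on disk.
sample = (
    ",DAY_OF_MONTH,DAY_OF_WEEK,OP_CARRIER_AIRLINE_ID,ORIGIN_AIRPORT_ID,"
    "DEST_AIRPORT_ID,DEP_TIME,DEP_DEL15,ARR_TIME,DIVERTED,DISTANCE\n"
    "0,1,3,19393,11193,13232,1050.0,0.0,1047.0,0.0,249.0\n"
    "1,1,3,19393,11193,13232,904.0,0.0,917.0,0.0,249.0\n"
).encode("utf-8")

# pandas decodes CSV input as UTF-8 by default; passing it explicitly documents
# the requirement. Bytes that are not valid UTF-8 raise UnicodeDecodeError.
df = pd.read_csv(io.BytesIO(sample), encoding="utf-8")
print(missing_columns(df, is_training=False))  # []
print(missing_columns(df, is_training=True))   # ['ARR_DEL15']
```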

## Import Libraries

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
```

## Load Input Data

```python
traindf = pd.read_csv('train.csv')
testdf = pd.read_csv('test.csv')
```
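Both previews include an `Unnamed: 0` column, which is the name pandas gives a column whose header cell is empty, typically a row index that was saved along with the data. Assuming that is the case here, reading with `index_col=0` turns it back into the index so it cannot leak into the feature set. A sketch on a hypothetical three-column CSV:

```python
import io

import pandas as pd

# Hypothetical CSV whose first header cell is empty, like the saved index
# in the notebook's files.
csv_text = (
    ",DAY_OF_MONTH,DAY_OF_WEEK,DISTANCE\n"
    "0,1,3,174.0\n"
    "1,1,3,585.0\n"
)

default = pd.read_csv(io.StringIO(csv_text))               # index kept as a column
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column becomes the index

print(list(default.columns))  # ['Unnamed: 0', 'DAY_OF_MONTH', 'DAY_OF_WEEK', 'DISTANCE']
print(list(indexed.columns))  # ['DAY_OF_MONTH', 'DAY_OF_WEEK', 'DISTANCE']
```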

```python
traindf.head()
```

| Unnamed: 0 | DAY_OF_MONTH | DAY_OF_WEEK | OP_CARRIER_AIRLINE_ID | ORIGIN_AIRPORT_ID | DEST_AIRPORT_ID | DEP_TIME | DEP_DEL15 | ARR_TIME | ARR_DEL15 | DIVERTED | DISTANCE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 20366 | 13930 | 11977 | 1003.0 | 0.0 | 1117.0 | 0.0 | 0.0 | 174.0 |
| 1 | 1 | 3 | 20366 | 15370 | 13930 | 1027.0 | 0.0 | 1216.0 | 0.0 | 0.0 | 585.0 |
| 2 | 1 | 3 | 20366 | 11618 | 15412 | 1848.0 | 0.0 | 2120.0 | 0.0 | 0.0 | 631.0 |
| 3 | 1 | 3 | 20366 | 10781 | 12266 | 1846.0 | 0.0 | 2004.0 | 0.0 | 0.0 | 253.0 |
| 4 | 1 | 3 | 20366 | 14524 | 12266 | 1038.0 | 0.0 | 1330.0 | 0.0 | 0.0 | 1157.0 |


```python
testdf.head()
```

| Unnamed: 0 | DAY_OF_MONTH | DAY_OF_WEEK | OP_CARRIER_AIRLINE_ID | ORIGIN_AIRPORT_ID | DEST_AIRPORT_ID | DEP_TIME | DEP_DEL15 | ARR_TIME | DIVERTED | DISTANCE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 19393 | 11193 | 13232 | 1050.0 | 0.0 | 1047.0 | 0.0 | 249.0 |
| 1 | 1 | 3 | 19393 | 11193 | 13232 | 904.0 | 0.0 | 917.0 | 0.0 | 249.0 |
| 2 | 1 | 3 | 19393 | 11193 | 13232 | 1407.0 | 0.0 | 1405.0 | 0.0 | 249.0 |
| 3 | 1 | 3 | 19393 | 11193 | 13232 | 1810.0 | 0.0 | 1806.0 | 0.0 | 249.0 |
| 4 | 1 | 3 | 19393 | 11259 | 10140 | 1703.0 | 1.0 | 1753.0 | 0.0 | 580.0 |
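The two previews differ by one column: the test set has no `ARR_DEL15`, since that is the value to predict. `Index.difference` gives a quick way to confirm which columns differ between the two files; the sketch uses small hypothetical frames in place of `traindf` and `testdf`:

```python
import pandas as pd

# Hypothetical stand-ins for traindf and testdf, with a subset of the columns.
train = pd.DataFrame(columns=["Unnamed: 0", "DEP_TIME", "DEP_DEL15", "ARR_TIME", "ARR_DEL15"])
test = pd.DataFrame(columns=["Unnamed: 0", "DEP_TIME", "DEP_DEL15", "ARR_TIME"])

only_in_train = train.columns.difference(test.columns)
only_in_test = test.columns.difference(train.columns)

print(list(only_in_train))  # ['ARR_DEL15']
print(list(only_in_test))   # []
```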


## Create Model

```python
# Features used for training; ARR_DEL15 is the target, so it is excluded.
X_train = traindf[['DAY_OF_MONTH', 'DAY_OF_WEEK', 'OP_CARRIER_AIRLINE_ID',
                   'ORIGIN_AIRPORT_ID', 'DEST_AIRPORT_ID', 'DEP_TIME', 'DEP_DEL15',
                   'ARR_TIME', 'DIVERTED', 'DISTANCE']]
y_train = traindf['ARR_DEL15']

# Shallow random forest (trees limited to depth 2); random_state=0 makes runs reproducible.
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)
```
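To see which inputs the forest relies on, a fitted `RandomForestClassifier` exposes `feature_importances_`. The sketch below uses hypothetical synthetic data (the target simply copies `DEP_DEL15`) rather than the notebook's files, with the same hyperparameters as the model above:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200

# Hypothetical synthetic data: DISTANCE and DEP_TIME are pure noise.
X = pd.DataFrame({
    "DEP_DEL15": rng.integers(0, 2, n),
    "DISTANCE": rng.uniform(100, 2000, n),
    "DEP_TIME": rng.integers(0, 2400, n),
})
y = X["DEP_DEL15"].copy()  # toy assumption: arrival delay equals departure delay

clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Impurity-based importances, normalised to sum to 1, largest first.
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

On the real data, `pd.Series(clf.feature_importances_, index=X_train.columns)` gives the same view. Impurity-based importances can favour features with many distinct values, so treat them as a rough guide.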

## Predict Test Datapoints

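```python
# Pass the same columns, in the same order, that the model was trained on.
# testdf also contains an extra "Unnamed: 0" column, which would make predict() fail.
predictions = clf.predict(testdf[X_train.columns])
```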

## Saving Predictions

```python
# Write one prediction per test row, without the DataFrame index.
pd.DataFrame(predictions, columns=['delay_predictions']).to_csv('output.csv', index=False)
```
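The saved file holds one `delay_predictions` value per test row, in test-file order, but no column identifying the row. If the predictions need to be joined back to flights later (an assumption about how the output is used), one option is to keep the test DataFrame's index as an identifier. The sketch writes to an in-memory buffer with hypothetical stand-in values:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's testdf and predictions.
testdf = pd.DataFrame({"DEP_DEL15": [0.0, 1.0], "DISTANCE": [249.0, 580.0]})
predictions = np.array([0.0, 1.0])

# Align each prediction with its test row via the shared index.
out = pd.DataFrame({"delay_predictions": predictions}, index=testdf.index)

buffer = io.StringIO()
out.to_csv(buffer, index_label="row_id")  # the index is written as a "row_id" column
print(buffer.getvalue())
```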