# IT Academy - Data Science with Python
## Sprint 12: Supervised Regressions
### [Github Supervised Regressions](https://github.com/jesussantana/Supervised-Regression)

[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)  
[![Made withJupyter](https://img.shields.io/badge/Made%20with-Jupyter-orange?style=for-the-badge&logo=Jupyter)](https://jupyter.org/try)  
[![wakatime](https://wakatime.com/badge/github/jesussantana/Supervised-Regression.svg)](https://wakatime.com/badge/github/jesussantana/Supervised-Regression)

In [1]:
import numpy as np
import pandas as pd
import warnings

from matplotlib import pyplot as plt
import seaborn as sns
from PIL import Image
%matplotlib inline

warnings.filterwarnings('ignore')
sns.set_theme(style='darkgrid', palette='deep')
# Set the default figure size instead of opening an empty figure
plt.rcParams['figure.figsize'] = (16, 6)

In [2]:
df_raw = pd.read_csv("../data/processed/DelayedFlightsProcessed.csv")

### Exercise 1:
  - Create at least three different regression models to predict the arrival delay (ArrDelay) of the flights in DelayedFlights.csv as accurately as possible.

In [3]:
df = df_raw.copy()

In [4]:
df.head()

Unnamed: 0.1,Unnamed: 0,ArrTime,Distance,TaxiIn,TaxiOut,DayOfWeek,DepDelay,CarrierDelay,ArrDelay,UniqueCarrier_9E,...,UniqueCarrier_HA,UniqueCarrier_MQ,UniqueCarrier_NW,UniqueCarrier_OH,UniqueCarrier_OO,UniqueCarrier_UA,UniqueCarrier_US,UniqueCarrier_WN,UniqueCarrier_XE,UniqueCarrier_YV
0,595091,2104.0,737.0,3.0,9.0,4.0,22.0,16.72516,9.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1,699277,1147.0,1438.0,18.0,13.0,4.0,7.0,16.72516,-3.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,513280,2118.0,1927.0,9.0,20.0,4.0,22.0,16.72516,13.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,555111,2201.0,989.0,6.0,14.0,1.0,16.0,16.72516,11.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1269142,1329.0,1142.0,6.0,8.0,2.0,19.0,0.0,19.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [5]:
df.columns

Index(['Unnamed: 0', 'ArrTime', 'Distance', 'TaxiIn', 'TaxiOut', 'DayOfWeek',
       'DepDelay', 'CarrierDelay', 'ArrDelay', 'UniqueCarrier_9E',
       'UniqueCarrier_AA', 'UniqueCarrier_AS', 'UniqueCarrier_B6',
       'UniqueCarrier_CO', 'UniqueCarrier_DL', 'UniqueCarrier_EV',
       'UniqueCarrier_F9', 'UniqueCarrier_FL', 'UniqueCarrier_HA',
       'UniqueCarrier_MQ', 'UniqueCarrier_NW', 'UniqueCarrier_OH',
       'UniqueCarrier_OO', 'UniqueCarrier_UA', 'UniqueCarrier_US',
       'UniqueCarrier_WN', 'UniqueCarrier_XE', 'UniqueCarrier_YV'],
      dtype='object')

In [None]:
# drop() returns a new DataFrame; combined with inplace=True it returns None
df = df.drop('Unnamed: 0', axis=1)

In [None]:
df.head()

In [None]:
type(df)

### Simple Linear Regression

In [None]:
# scikit-learn expects a 2-D feature matrix, so select the column with double brackets
X = df[['DepDelay']].values
y = df[['ArrDelay']].values

- Split the dataset into a training set and a test set

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

- Create Simple Linear Regression model with training set

In [None]:
from sklearn.linear_model import LinearRegression

regression = LinearRegression()
regression.fit(X_train, y_train)

- Predict the test set

In [None]:
y_pred = regression.predict(X_test)

In [None]:
y_pred
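
The predictions can be scored against the held-out targets. As a minimal sketch, assuming synthetic stand-in data rather than the flight dataset (the names `X_demo`, `y_demo`, and the noise level are illustrative assumptions), R² and RMSE could be computed like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the flight data): y = x - 5 + Gaussian noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 200, size=(500, 1))          # one feature, 2-D shape
y_demo = X_demo[:, 0] - 5.0 + rng.normal(0, 10, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# R^2: share of variance explained; RMSE: typical error in the target's units
r2 = r2_score(y_te, y_hat)
rmse = np.sqrt(mean_squared_error(y_te, y_hat))
print(f"R^2: {r2:.3f}  RMSE: {rmse:.2f}")
```

With the notebook's own variables, the same two metric calls would take `y_test` and `y_pred`.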

- View training results

In [None]:
plt.scatter(X_train, y_train, color = "red")
plt.plot(X_train, regression.predict(X_train), color = "blue")
plt.title("ArrDelay vs DepDelay (Training Set)")
plt.xlabel("DepDelay")
plt.ylabel("ArrDelay")
plt.show()

### Multiple Linear Regression

In [None]:
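# Features: every column except the target (keeping ArrDelay in X would leak the answer)
X = df.drop(columns='ArrDelay').values
# Target vector
y = df['ArrDelay'].values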

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [None]:
regression = LinearRegression()
regression.fit(X_train, y_train)

In [None]:
y_pred = regression.predict(X_test)

### Build the optimal multiple linear regression (MLR) model using Backward Elimination

In [None]:
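import statsmodels.api as sm
# Prepend a column of ones so OLS fits an intercept (statsmodels does not add one by default)
X = np.append(arr=np.ones((X.shape[0], 1)), values=X, axis=1)
# Significance level used to decide which predictors to drop
SL = 0.05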

In [None]:
# Backward elimination: refit OLS after removing one predictor at a time,
# using the p-values in the previous summary and the threshold SL
X_opt = X[:, [0, 1, 2, 3, 4, 5, 6, 8]]
regression_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regression_OLS.summary()

X_opt = X[:, [0, 1, 2, 3, 4, 5, 8]]
regression_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regression_OLS.summary()

X_opt = X[:, [0, 1, 2, 3, 4, 5]]
regression_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regression_OLS.summary()

X_opt = X[:, [0, 1, 3, 4, 5]]
regression_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regression_OLS.summary()

X_opt = X[:, [0, 3, 4, 5]]
regression_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regression_OLS.summary()

X_opt = X[:, [0, 3, 5]]
regression_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regression_OLS.summary()

X_opt = X[:, [0, 3]]
regression_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regression_OLS.summary()