# Part 6 - Advanced Regression Techniques
In this notebook we will investigate some popular advanced regression techniques:  
* XGBoost
* Random Forest
* MultiLayer Perceptron (Neural Network)
* Kernel Ridge Regression
* ElasticNet
  
We will use the exact same dataset and features as before and compare the results with our Linear Regressor.  
You will be happy to learn that the same `fit`/`predict` procedure we used to train the Linear Regressor applies to nearly all other scikit-learn-compatible regression models!
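Here is a minimal sketch of that shared interface. It assumes scikit-learn is installed and uses synthetic data from `make_regression` as a stand-in for our San Francisco listings; the `models` dictionary and `val_mse` results are illustrative names, not part of this notebook. Every model is trained and evaluated with exactly the same calls.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for our engineered features and prices
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.30, random_state=123)

# Illustrative set of very different models that share one interface
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "elastic_net": ElasticNet(alpha=0.0005, l1_ratio=0.9, max_iter=10000),
}

val_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)    # identical training call for every model
    y_pred = model.predict(X_val)  # identical prediction call for every model
    val_mse[name] = mean_squared_error(y_val, y_pred)
    print(f"{name}: validation MSE = {val_mse[name]:.2f}")
```

Swapping in a new regressor is then just a matter of adding another entry to the dictionary.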

In [None]:
import time
import pickle
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
%matplotlib inline

## Let's load the data and remind ourselves of the contents

In [None]:
df = pd.read_csv('./data/sf/data_clean_engineered.csv')
df.head()

In [None]:
features = [feature for feature in df.columns if feature != 'price']
X = df[features]
y = df['price']
X_np = X.values
y_np = y.values.reshape((len(df), 1))

In [None]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.30, random_state=123) # split 70% train, 30% validation

In [None]:
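def evaluate_model(model, X, y):
    y_pred = model.predict(X)  # predict y values from input X
    mse = mean_squared_error(y_true=y, y_pred=y_pred)
    print("Mean Squared Error: {}".format(mse))
    print("Root Mean Squared Error: {}".format(np.sqrt(mse)))  # same units as price
    # for regressors, model.score() returns R^2 (coefficient of determination), not accuracy:
    # 1.0 is a perfect fit, 0.0 matches always predicting the mean, and it can be negative
    print("R^2 score: {:.4f}".format(model.score(X, y)))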
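For regressors, scikit-learn's `model.score(X, y)` returns the coefficient of determination $R^2 = 1 - SS_{res}/SS_{tot}$, not a classification-style accuracy. The sketch below (synthetic data, assuming scikit-learn) computes it by hand and checks it against `score` and `r2_score`. It also shows that always predicting the mean gives $R^2 = 0$ and that a worse model goes negative, which is why reporting it as a percentage "accuracy" can be misleading.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=1)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1.0 - ss_res / ss_tot
print(r2_manual, model.score(X, y), r2_score(y, y_pred))

# A constant "model" that always predicts the mean scores 0 ...
baseline = np.full_like(y, y.mean())
# ... and a constant offset by 3 standard deviations scores 1 - 10 = -9
bad = np.full_like(y, y.mean() + 3 * y.std())
print(r2_score(y, baseline), r2_score(y, bad))
```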

## XGBoost - Extreme Gradient Boosting
XGBoost is a popular machine learning algorithm for tabular data. If tuned properly, it can perform very well across many different datasets, and we can even visualize the "feature importance" to get an idea of how the model generates its predictions.  
  
High accuracy *AND* intuitive results? Sign me up!  
  
  
Before proceeding, let's watch a few quick clips to learn more about **Boosting**

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo(id='2Mg8QD0F1dQ')

In [None]:
YouTubeVideo(id='GM3CDQfQ4sw')

### If time permits: Gradient Boosting whiteboard example
http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/
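The core idea of gradient boosting fits in a few lines: start from a constant prediction, then repeatedly fit a small tree to the current residuals and add a scaled-down copy of it to the ensemble. This is a simplified sketch for squared-error loss (where the residuals are the negative gradient), built on scikit-learn's `DecisionTreeRegressor`; the helpers `fit_gradient_boosting` and `predict_gradient_boosting` are hypothetical names. It is not how XGBoost is implemented internally, since XGBoost adds regularization, second-order gradient information and many engineering optimizations.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Hypothetical helper: squared-error gradient boosting with shallow trees."""
    init = y.mean()                # F_0: the best constant prediction
    F = np.full(len(y), init)      # current ensemble prediction on the training rows
    trees = []
    for _ in range(n_rounds):
        residuals = y - F          # negative gradient of 1/2 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)     # the new tree learns what the ensemble still gets wrong
        F = F + learning_rate * tree.predict(X)  # take a small step toward the residuals
        trees.append(tree)
    return init, learning_rate, trees


def predict_gradient_boosting(model, X):
    """Hypothetical helper: constant start plus the scaled sum of all tree outputs."""
    init, learning_rate, trees = model
    return init + learning_rate * sum(tree.predict(X) for tree in trees)


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

model = fit_gradient_boosting(X, y)
mse_constant = np.mean((y - y.mean()) ** 2)
mse_boosted = np.mean((y - predict_gradient_boosting(model, X)) ** 2)
print(f"training MSE: constant {mse_constant:.3f}, boosted {mse_boosted:.3f}")
```

In this sketch `learning_rate` and `n_rounds` trade off against each other: smaller steps need more trees to reach the same training error.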

Import the xgboost library and fit our regressor the same way as before.

In [None]:
from xgboost import XGBRegressor
xgb_regressor = XGBRegressor()
xgb_model = xgb_regressor.fit(X_train, y_train)
evaluate_model(xgb_model, X_val, y_val)
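The `XGBRegressor()` above uses default hyperparameters, while the promise of strong results comes with "if tuned properly". One common way to tune is a cross-validated grid search. This sketch uses scikit-learn's `GridSearchCV` with `GradientBoostingRegressor` as a stand-in so it runs without xgboost installed; `XGBRegressor` follows the scikit-learn estimator interface and also has `n_estimators`, `max_depth` and `learning_rate` parameters, so a similar grid could be passed to it. The grid values are illustrative, not recommendations, and on our data the search would be run on `X_train, y_train` only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=0)

# Illustrative grid: every combination is trained and cross-validated
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # GridSearchCV maximizes, so MSE is negated
    cv=3,
)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best cross-validated MSE:", -search.best_score_)
```

After the search, `search.best_estimator_` has already been refit on all the data passed to `fit`.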

### Visualize the Feature Importance that XGBRegressor has assigned

In [None]:
# pair each feature name with the importance score XGBoost assigned to it
df_importance = pd.DataFrame({'feature': X.columns, 'importance': xgb_model.feature_importances_})
df_importance = df_importance.sort_values(by='importance', ascending=False)
# horizontal bar chart, most important feature at the top
plt.figure(figsize=(8, 6))
sns.barplot(x='importance', y='feature', data=df_importance)
plt.title('XGBoost feature importances')
plt.show()
df_importance
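`feature_importances_` reflects how the trees used each feature during training, and for XGBoost its exact meaning depends on the model's `importance_type` setting. A model-agnostic cross-check is permutation importance: shuffle one column of the validation data and measure how much the score drops. Below is a minimal sketch with scikit-learn's `permutation_importance` on synthetic data where, by construction, only features 0 and 1 matter. A `RandomForestRegressor` stands in for our XGBoost model; any fitted estimator that follows the scikit-learn interface should work in its place.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
# only features 0 and 1 influence the target; features 2 and 3 are pure noise
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=500)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=123)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
# shuffle each column of the held-out data 10 times and record the drop in R^2
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```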

### Retrain on entire dataset and save model to disk

In [None]:
xgb_model = xgb_regressor.fit(X, y)
with open('./models/sf/xgb.pkl', 'wb') as f:
    pickle.dump(xgb_model, f)
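To reuse a saved model later, we load it back with `pickle.load`. A minimal round-trip sketch, using a throwaway temporary directory in place of `./models/sf/` and a `LinearRegression` on synthetic data. It also creates the parent folders first, because `open(path, 'wb')` raises `FileNotFoundError` if the directory does not exist. As a general caution, only unpickle files you trust, and load them with the same library versions used to save them.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
model = LinearRegression().fit(X, y)

# temporary stand-in for ./models/sf/model.pkl
path = os.path.join(tempfile.mkdtemp(), "models", "sf", "model.pkl")
os.makedirs(os.path.dirname(path), exist_ok=True)  # make sure the folder exists
with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# the restored model should make identical predictions
print(np.allclose(model.predict(X), restored.predict(X)))
```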

# Let's train a few different regressors

## Random Forest  
https://en.wikipedia.org/wiki/Random_forest
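A random forest trains many decision trees, each on a bootstrap sample of the rows (`bootstrap=True` by default, with `max_features` controlling how many features each split may consider), and for regression it predicts the average of the trees' outputs. This sketch on synthetic data checks that averaging claim directly using the fitted trees in `estimators_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
rf = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# rf.estimators_ holds the individual fitted DecisionTreeRegressor objects
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])  # shape (25, 200)
print(per_tree.shape)
# the forest's prediction is the mean over trees
print(np.allclose(per_tree.mean(axis=0), rf.predict(X)))
```

Averaging many noisy, decorrelated trees is what lets the forest reduce variance compared with a single deep tree.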

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf_regressor = RandomForestRegressor()
rf_model = rf_regressor.fit(X_train, y_train)
evaluate_model(rf_model, X_val, y_val)
rf_model = rf_regressor.fit(X, y)
with open('./models/sf/random_forest.pkl', 'wb') as f:
    pickle.dump(rf_model, f)

## MultiLayer Perceptron
https://en.wikipedia.org/wiki/Multilayer_perceptron
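The scikit-learn user guide notes that multi-layer perceptrons are sensitive to feature scaling, and the `MLPRegressor` below is fit on `X` directly. If our engineered features are not already on comparable scales, one option is to wrap the model in a pipeline with a `StandardScaler`. This is a sketch on synthetic data where we deliberately inflate one feature's scale; the name `mlp_scaled` is hypothetical. Because the scaler is part of the pipeline, it is fit on the training split only and applied automatically at prediction time.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X[:, 0] *= 1000.0  # give one feature a much larger scale than the others
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.30, random_state=123)

mlp_scaled = make_pipeline(
    StandardScaler(),  # learns per-feature mean/std from the training split only
    MLPRegressor(max_iter=20000, random_state=123, solver="lbfgs"),
)
mlp_scaled.fit(X_train, y_train)
print("validation R^2:", mlp_scaled.score(X_val, y_val))
```

The pipeline exposes the same `fit`/`predict`/`score` methods, so it can be passed to `evaluate_model` or pickled just like the bare regressor.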

In [None]:
from sklearn.neural_network import MLPRegressor
mlp_regressor = MLPRegressor(max_iter=20000, random_state=123, solver='lbfgs')
mlp_model = mlp_regressor.fit(X_train, y_train)
evaluate_model(mlp_model, X_val, y_val)
mlp_model = mlp_regressor.fit(X, y)
with open('./models/sf/mlp.pkl', 'wb') as f:
    pickle.dump(mlp_model, f)

## Kernel Ridge Regression
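Kernel ridge regression is ridge regression carried out in the feature space implied by a kernel function. With `kernel='polynomial'`, scikit-learn uses $k(x, x') = (\gamma \langle x, x' \rangle + c_0)^d$, where $c_0$ is `coef0`, $d$ is `degree`, and $\gamma$ defaults to $1/n_{\text{features}}$. Fitting reduces to solving $(K + \alpha I)\,a = y$ for the dual coefficients $a$, and a prediction is $f(x) = \sum_i a_i\, k(x, x_i)$. The sketch below (synthetic data, assuming scikit-learn) checks this closed form against `KernelRidge` with the same `alpha`, `degree` and `coef0` as our model. Note that $K$ is an $n \times n$ matrix over the training rows, so memory grows quadratically with dataset size.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.1, size=60)
X_new = rng.normal(size=(5, 3))

alpha, degree, coef0 = 0.6, 2, 2.5
krr = KernelRidge(alpha=alpha, kernel="polynomial", degree=degree, coef0=coef0).fit(X, y)

# Closed form: dual coefficients a = (K + alpha * I)^(-1) y, with K[i, j] = k(x_i, x_j)
K = polynomial_kernel(X, X, degree=degree, coef0=coef0)  # gamma=None -> 1 / n_features
dual = np.linalg.solve(K + alpha * np.eye(len(X)), y)

# Prediction for new points: f(x) = sum_i a_i * k(x, x_i)
pred_manual = polynomial_kernel(X_new, X, degree=degree, coef0=coef0) @ dual

print(np.allclose(dual, krr.dual_coef_), np.allclose(pred_manual, krr.predict(X_new)))
```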

In [None]:
from sklearn.kernel_ridge import KernelRidge
krr_regressor = KernelRidge(alpha=0.6, kernel='polynomial', degree=2, coef0=2.5)
krr_model = krr_regressor.fit(X_train, y_train)
evaluate_model(krr_model, X_val, y_val)
krr_model = krr_regressor.fit(X, y)
with open('./models/sf/krr.pkl', 'wb') as f:
    pickle.dump(krr_model, f)

## ElasticNet
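ElasticNet is linear regression with a blend of L1 (Lasso) and L2 (Ridge) penalties on the weights. Per the scikit-learn documentation, it minimizes

$$
\min_{w}\; \frac{1}{2\,n_{\text{samples}}} \lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

where $\alpha$ is `alpha` and $\rho$ is `l1_ratio`. With `l1_ratio=1` this is the Lasso, and with `l1_ratio=0` the penalty is pure L2. Our settings (`alpha=0.0005`, `l1_ratio=.9`) make the penalty weak and weighted mostly toward L1.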

In [None]:
from sklearn.linear_model import ElasticNet
eln_regressor = ElasticNet(alpha=0.0005, l1_ratio=.9, random_state=3, max_iter=1000)
eln_model = eln_regressor.fit(X_train, y_train)
evaluate_model(eln_model, X_val, y_val)
eln_model = eln_regressor.fit(X, y)
with open('./models/sf/eln.pkl', 'wb') as f:
    pickle.dump(eln_model, f)