# Machine Learning Example: Boston Housing Dataset

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from math import sqrt

## Useful Resources
 - [Scikit-Learn](http://scikit-learn.org/)
 - [Learning Data Science Boston Housing Example](https://medium.com/@haydar_ai/learning-data-science-day-9-linear-regression-on-boston-housing-dataset-cd62a80775ef) 
 - [Python Data Analysis Library: pandas](https://pandas.pydata.org/)
 - [10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html)
 - [Our Slides on ML](https://github.com/shivupa/QMMM_study_group/blob/master/04_Machine_Learning/ml_slides.pdf)   
 
## Setup
 1. Organize data
 2. Set up Classifier/Regressor
 3. Train, Test, Visualize!
 
## Organize Data
Data organization is one of the most important steps in machine learning. Disorganized data wastes compute time on improper inputs and makes it harder for others to understand and replicate your method. For data organization we will use pandas (unfortunately not this one 🐼).

In [None]:
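# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2.
from sklearn.datasets import load_boston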
boston = load_boston()
boston.keys()

In [None]:
print(boston.DESCR)

In [None]:
boston.data[0]

In [None]:
df_boston = pd.DataFrame(boston.data, columns=boston.feature_names)
df_boston['PRICE'] = boston.target
df_boston.head()

In [None]:
df_boston.describe()
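
The organization step above can also be tried on a small made-up table, which is handy when the Boston loader is not available. This is only a sketch: the column names `CRIME`, `ROOMS`, and `RATIO` and all values are hypothetical and are not taken from the Boston data. It shows how to count missing entries and drop incomplete rows before any modeling.

```python
import numpy as np
import pandas as pd

# Made-up feature matrix and target; names and values are illustrative only.
features = np.array([
    [0.1, 6.5, 15.0],
    [0.3, 5.9, 18.5],
    [np.nan, 7.1, 14.2],  # one deliberately missing entry
    [0.2, 6.0, 20.1],
])
feature_names = ['CRIME', 'ROOMS', 'RATIO']  # hypothetical column names
target = np.array([24.0, 19.5, 33.1, 18.2])

# Same pattern as above: features as columns, target appended as 'PRICE'.
df = pd.DataFrame(features, columns=feature_names)
df['PRICE'] = target

# Count missing values per column before any modeling.
missing_per_column = df.isnull().sum()

# Drop incomplete rows (one simple policy; imputation is another option).
df_clean = df.dropna()

print(missing_per_column)
print(df_clean.shape)
```

Dropping rows is the simplest policy; whether it is appropriate depends on how much data is lost.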

## Machine Learning
1. Split data into training and testing sets
2. Train classifier/regressor
3. Test trained classifier/regressor on test set
4. Visualize
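
As a rough illustration of steps 1 to 3, here is a minimal sketch on synthetic data. It assumes `make_regression` as a stand-in for a real dataset and `LinearRegression` as one arbitrary model choice; the names `X_syn`, `y_syn`, and `model` are introduced only for this example. Plotting (step 4) is left out to keep the sketch short.

```python
from math import sqrt

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples, 5 features, some noise.
X_syn, y_syn = make_regression(n_samples=200, n_features=5, noise=10.0,
                               random_state=0)

# 1. Split data into training and testing sets (90% / 10%).
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.1,
                                          random_state=0)

# 2. Train the regressor on the training set only.
model = LinearRegression()
model.fit(X_tr, y_tr)

# 3. Test the trained regressor on the held-out test set.
y_hat = model.predict(X_te)
mae_syn = mean_absolute_error(y_te, y_hat)
rmse_syn = sqrt(mean_squared_error(y_te, y_hat))
print('MAE:', mae_syn, '\tRMSE:', rmse_syn)
```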

## Import Scikit-Learn

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import BayesianRidge
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

## Split data into training and testing sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    boston.data, boston.target, train_size=0.9, test_size=0.1)
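
The split above is reshuffled on every run, so the metrics will vary from run to run. If repeatable results are wanted, `train_test_split` accepts a `random_state` seed. The sketch below uses random stand-in arrays with the same shape as the Boston data (506 samples, 13 features); the arrays and the seed value 42 are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays shaped like the Boston data (506 samples, 13 features).
rng = np.random.RandomState(0)
X_demo = rng.rand(506, 13)
y_demo = rng.rand(506)

# Fixing random_state makes the shuffle repeatable across runs.
split_a = train_test_split(X_demo, y_demo, train_size=0.9, test_size=0.1,
                           random_state=42)
split_b = train_test_split(X_demo, y_demo, train_size=0.9, test_size=0.1,
                           random_state=42)

X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = split_a
print(X_tr_demo.shape, X_te_demo.shape)

# The two calls produce identical test sets because the seed is the same.
print(np.array_equal(split_a[1], split_b[1]))
```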

## Train Regressor

In [None]:
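####################### Choose ML Model #######################
# LinearRegression is one possible choice; any regressor imported above
# (BayesianRidge, Ridge, KernelRidge, RandomForestRegressor) can be swapped in.
regr = LinearRegression()

####################### Train ML Model ########################
regr.fit(X_train, y_train)

###################### Predict Test Set #######################
predicted = regr.predict(X_test)

##################### Evaluate Prediction #####################
mae = mean_absolute_error(y_test, predicted)
mse = mean_squared_error(y_test, predicted)
rmse = sqrt(mse)
print('MAE:', mae, '\tMSE:', mse, '\tRMSE:', rmse)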
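
To make the three metrics concrete: MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and RMSE is the square root of MSE, which puts it back in the units of the target. The sketch below computes them by hand with NumPy on four made-up values; `y_true` and `y_pred` are illustrative and are not model output.

```python
import numpy as np

# Made-up actual and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_pred - y_true

mae_manual = np.mean(np.abs(errors))   # mean absolute error
mse_manual = np.mean(errors ** 2)      # mean squared error
rmse_manual = np.sqrt(mse_manual)      # root mean squared error

print('MAE:', mae_manual, '\tMSE:', mse_manual, '\tRMSE:', rmse_manual)
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE; a large gap between the two suggests a few predictions are far off.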

## Plot Results

In [None]:
# Percentage of the full dataset used for training
training_size = '{:0.1f}'.format(100 - len(y_test) / len(boston.target) * 100)
# Name of the regressor class, e.g. 'LinearRegression'
class_type = type(regr).__name__
label1 = 'MAE   = {:8.3f}'.format(mae)
label2 = class_type + '\nTraining size = ' + training_size + '%'

plt.figure(dpi=250)
# Dashed y = x reference line: a perfect prediction falls on it
price_range = [df_boston['PRICE'].min(), df_boston['PRICE'].max()]
plt.plot(price_range, price_range, ls="--", c="g")
plt.plot(y_test, predicted, 'o', markersize=1.5)
plt.xlabel("Actual Price")
plt.ylabel("Predicted Price")
# Two text-only legends: MAE (lower right), model and training size (upper left)
legend1 = plt.legend([label1], loc='lower right', markerscale=0, fontsize=6, handlelength=0)
plt.legend([label2], loc='upper left', markerscale=0, fontsize=6, handlelength=0)
# The second plt.legend call replaces the first, so re-add the first manually
plt.gca().add_artist(legend1)
plt.show()