# Example 2: Boston House-Price Dataset
In this example we will work on the <strong>Boston House-Price Dataset</strong>.<br>
This dataset contains information about houses and their prices.
### NOTE
To execute a cell, press <strong>Shift+Enter</strong>

## 1. Importing Libraries
We will need:
- numpy (NumPy) for arrays
- pandas (Pandas) for data manipulation
- matplotlib for visualisation
- sklearn (Scikit-Learn) for creating, fitting and evaluating the model
- seaborn (Seaborn), which gives a simpler syntax for visualisation

 

In [None]:
#Every single-line comment in Python starts with #
import numpy as np
#sklearn.datasets contains some predefined datasets
import sklearn.datasets as ds
#pandas is used for data manipulation
import pandas as pd
#matplotlib.pyplot is used for visualisation
import matplotlib.pyplot as plt
#seaborn is a user friendly library for visualisation built on top of matplotlib
import seaborn as sns
#we will use a regression model called LinearRegression
from sklearn.linear_model import LinearRegression
#we will split the data using train_test_split
from sklearn.model_selection import train_test_split
#OPTIONAL: confusion_matrix evaluates classification models; it is not needed for the regression model used here
from sklearn.metrics import confusion_matrix

## 2. Preprocessing: Loading Boston Dataset
Here we load the Boston dataset.<br>
We then extract the $X$ values and the $y$ values.<br>
Remember that:
- $X$ is a matrix where each row describes a particular house and each column describes a particular feature. $X$ is called the input set.
- The $y$ value corresponding to a row of $X$ gives the price of that house. $y$ is called the output set.
- Based on the <strong>information learnt</strong> from $X$ and $y$, and given some <strong>new input data</strong> $X'$, we want to <strong>predict</strong> $y'$

In [None]:
#Load the Boston dataset
#NOTE: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
boston_dataset=ds.load_boston()
#X is a DataFrame (matrix/2D array) containing the features of each house
X=pd.DataFrame(boston_dataset["data"])
#y contains the price of each house, in the same order as the rows of X
#y is a vector (Series/1D array)
y=pd.Series(boston_dataset["target"],name ="Price")
#Use the dataset's feature names as the column names of X
feature_names = boston_dataset["feature_names"]
X.columns=feature_names

### Viewing a sample of $X$
Complete the code below:

In [None]:
X.???

### Viewing the first elements of $y$
Feel free to change the number below

In [None]:
y.head(4)

## 3. Analysing Data
### Merging Tables
To analyse data, it is sometimes simpler to combine everything into one table.<br>
Complete the code below:

In [None]:
#This line combines the input data and the (numerical) output data into a new table (DataFrame)
U=pd.concat([???,???],axis=1)
#To see the last 10 examples
U.???
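
As a hedged illustration of how side-by-side concatenation behaves, the sketch below uses a small made-up table rather than the Boston data. The names `features`, `price` and `table` and all values are assumptions chosen only for this example.

```python
import pandas as pd

# Toy stand-ins for a feature table and a target column (made-up values)
features = pd.DataFrame({"rooms": [3, 4, 5], "age": [40, 12, 5]})
price = pd.Series([150.0, 220.0, 310.0], name="price")

# axis=1 places the objects side by side, aligning rows on the index
table = pd.concat([features, price], axis=1)

# tail(n) returns the last n rows of the table
print(table.tail(2))
```

Because the Series has a name, that name becomes the label of the new column.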

### Some Plots

In [None]:
ax=sns.scatterplot(x="LSTAT",y="Price",data=U)
ax.set_title("Relation Between LSTAT & Price")

## 4. Model Selection
### Creating Train & Test Sets
We will create a training set that is used to fit our model.<br>
The training set is a random sample containing $75\%$ of the Boston dataset; the remaining $25\%$ forms the test set.

In [None]:
X_train,X_test,y_train,y_test=???
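
To make the $75\%$ / $25\%$ split concrete, here is a minimal sketch on synthetic data (not the Boston data). The arrays `X_demo` and `y_demo`, the `Xd_`/`yd_` names and the choice `random_state=0` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 3 features (values are arbitrary)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# test_size=0.25 keeps 75% of the rows for training;
# random_state fixes the shuffle so the split is reproducible
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

print(Xd_train.shape, Xd_test.shape)
```

The rows of the inputs and outputs are shuffled together, so each training row keeps its matching target value.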

### Creating & Fitting Model
Here we will create a LinearRegression model and fit it to the <strong>training data</strong>.<br>
Complete the code below

In [None]:
#Creating LinearRegression
model = ???
#Fitting Model
model.???;
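
To see what fitting does, the sketch below trains a linear regression on synthetic data whose true coefficients are known. The data, the coefficients $(2, -1)$, the intercept $3$ and the name `demo_model` are all assumptions for this illustration; it is not the Boston model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
# Target built from known coefficients (2, -1) and intercept 3, without noise
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 3.0

# Create the model, then estimate its parameters from the data
demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# The learnt parameters are stored in coef_ and intercept_
print(demo_model.coef_, demo_model.intercept_)
```

With noise-free data the fitted parameters should match the ones used to generate the target, up to floating-point error.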

## 5. Testing Model
We will evaluate the performance of our model on the <strong>testing data</strong> using the R² score.<br>
Complete the code below

In [None]:
#R² Score
r2_score =??? 
print("Our model has an R² Score of {:.3f}%".format(100*r2_score))
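
For reference, the R² score compares the model's squared errors with those of a baseline that always predicts the mean: $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. The sketch below, on synthetic data with assumed names (`demo_model`, `r2_manual`, `r2_builtin`), checks this formula against the `score` method of a fitted regressor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a linear trend plus a little noise (arbitrary values)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 1))
y_demo = 4.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=80)

demo_model = LinearRegression().fit(X_demo, y_demo)
y_pred = demo_model.predict(X_demo)

# R² computed by hand: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_demo - y_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# For regressors, score(X, y) returns the same quantity
r2_builtin = demo_model.score(X_demo, y_demo)
print(r2_manual, r2_builtin)
```

A score of 1 means perfect predictions, while 0 means the model does no better than predicting the mean.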

## 6. Model Deployment
Now our model is ready for use 😃<br>
We will now save it with joblib
### a. Saving Model

In [None]:
from joblib import dump, load
dump(model, "boston_model.joblib") 

### b. Loading the Saved Model

In [None]:
model2 = load("boston_model.joblib")
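
A reloaded model should give the same predictions as the original. The following sketch checks this round trip on a tiny made-up model; the file name `demo_model.joblib`, the temporary directory and the data are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset following y = 2x + 1
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])
demo_model = LinearRegression().fit(X_demo, y_demo)

# Save to a temporary directory, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")
    dump(demo_model, path)
    restored = load(path)

# Both models should predict the same values for new inputs
X_new = np.array([[4.0], [5.0]])
same = np.allclose(demo_model.predict(X_new), restored.predict(X_new))
print(same)
```

Only load model files from sources you trust, since joblib files are based on pickle and loading them can execute arbitrary code.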