<b>We will use the Boston Housing dataset, which contains information about houses in Boston</b>

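This dataset is available from the scikit-learn library through `load_boston`. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the cells below need an older version. There are 506 samples and 13 feature variables in this dataset.
The objective is to predict house prices (MEDV) from the given features.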

In [None]:
# import the required libraries.
import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd  
import seaborn as sns 
import warnings
from numpy.linalg import inv
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

<b>Load the housing data from the scikit-learn library</b>

In [None]:
with warnings.catch_warnings():
    warnings.filterwarnings("ignore")
    boston_dataset = load_boston()

In [None]:
# Now load the data into a pandas dataframe using pd.DataFrame
boston = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)

# print the first 5 rows of the data
boston.head()

In [None]:
# the target value MEDV (house prices) is missing from the data. 
# We create a new column of target values and add it to the dataframe.
boston['MEDV'] = boston_dataset.target

<b>Create a correlation matrix that measures the linear relationships between the variables.</b>

In [None]:
corr = boston.corr()
corr

The correlation coefficient ranges from -1 to 1. If the value is close to 1,
it means that there is a strong positive correlation between the two variables. When it is close to -1,
the variables have a strong negative correlation.
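
As a small illustration of what the coefficient measures, the sketch below computes the Pearson correlation by hand on toy data. The helper name `pearson_r` and the toy arrays are assumptions made for this example; they are not part of the notebook's dataset.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = np.sum(a_centered * b_centered)
    denominator = np.sqrt(np.sum(a_centered**2) * np.sum(b_centered**2))
    return float(numerator / denominator)

x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# A perfectly increasing linear relationship gives +1
r_pos = pearson_r(x_toy, 2 * x_toy + 1)

# A perfectly decreasing linear relationship gives -1
r_neg = pearson_r(x_toy, -3 * x_toy + 10)

print(r_pos, r_neg)
```

`boston.corr()` computes the same quantity for every pair of columns at once.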

From the correlation matrix above we can see that MEDV is strongly correlated with two features: negatively with LSTAT and positively with RM.

<b>Preparing the data for training the model</b>

In [None]:
# Use both of these features together to train the model.
X = boston[['LSTAT', 'RM']].values  

# Target: prices of the house
y = boston_dataset.target

<b>Using a scatter plot to see how these features vary with MEDV.</b>

The plots show that prices increase roughly linearly as RM increases.

Prices tend to decrease as LSTAT increases, although the relationship does not follow an exactly straight line.

In [None]:
plt.figure(figsize=(15,4))
plt.subplot(1,2,1)
plt.xlabel('LSTAT')
plt.ylabel('MEDV')
plt.scatter(X[:,0], y ,marker='.')

plt.subplot(1,2,2)
plt.xlabel('RM')
plt.ylabel('MEDV')
plt.scatter(X[:,1], y ,marker='.')

<b>Splitting the data into training and testing sets</b>

Train the model with 80% of the samples and test with the remaining 20%

In [None]:
# test_size=0.2 gives the 80/20 split described above (the default is 0.25)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
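
To make the splitting step concrete, here is a minimal sketch of a random split written with NumPy only. The helper `manual_split`, its `seed` parameter, and the placeholder arrays are assumptions for illustration; `train_test_split` offers more options and should be preferred in practice.

```python
import numpy as np

def manual_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the row indices once, then cut them into test and train parts."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    n_test = int(np.ceil(test_fraction * n_samples))
    order = rng.permutation(n_samples)
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Index X and y with the same indices so rows stay paired with targets
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data with the same number of rows as the housing data
X_demo = np.arange(506 * 2, dtype=float).reshape(506, 2)
y_demo = np.arange(506, dtype=float)

X_tr, X_te, y_tr, y_te = manual_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape, y_tr.shape, y_te.shape)
```

With 506 rows and a 20% test fraction, this yields 404 training samples and 102 test samples.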

In [None]:
class LinearListSquare:
    def __init__(self):
        pass

    def fit(self,X,Y):
        # Closed-form least squares via the normal equations: w = (X^T X)^(-1) X^T Y.
        # There is no intercept term, so the fitted plane passes through the origin.
        self.w = np.matmul(inv(np.matmul(X.T,X)), np.matmul(X.T,Y))

    def predict(self, X):
        # Accepts one sample of shape (n_features,) or a batch of shape (n_samples, n_features)
        y_pred = np.matmul(X, self.w)
        return y_pred
    
    def evaluate(self , X , Y , loss="MAE"):
        # Predict all samples in one vectorized call
        Y_pred = self.predict(X)
        Error = Y - Y_pred
        
        if loss == "MAE":
            return np.mean(np.abs(Error))
        elif loss == "MSE":
            return np.mean(Error**2)
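
The closed-form solution used in `fit` is w = (XᵀX)⁻¹Xᵀy. As a sanity check, the sketch below compares that formula with `np.linalg.solve` and `np.linalg.lstsq` on synthetic data. The arrays and the "true" weights are made up for this example. Explicitly inverting XᵀX works here, but solving the linear system is generally considered more numerically robust.

```python
import numpy as np

# Synthetic regression problem with known weights (assumed for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
true_w = np.array([-0.5, 3.0])
y_demo = X_demo @ true_w + 0.1 * rng.normal(size=200)

# 1) Normal equations with an explicit inverse, as in the class above
w_normal = np.linalg.inv(X_demo.T @ X_demo) @ (X_demo.T @ y_demo)

# 2) Same equations, solved as a linear system without forming the inverse
w_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# 3) NumPy's least-squares routine
w_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(w_normal, w_solve, w_lstsq)
```

All three approaches should agree to within floating-point precision on a well-conditioned problem like this one.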

<b>Train the model by the above class</b>

In [None]:
lls = LinearListSquare()
lls.fit(X_train,y_train)

In [None]:
lls.w
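
The weight vector has one entry per feature and no bias, so the model is forced through the origin. One common way to add an intercept is to append a column of ones to the feature matrix. The sketch below shows this on synthetic data. The helper `add_bias_column` and all values are assumptions for illustration, not part of the notebook's model.

```python
import numpy as np

def add_bias_column(X):
    """Append a constant column of ones so the last weight acts as an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

# Synthetic data generated with a known intercept of 2.0
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(300, 2))
y_demo = X_demo @ np.array([-0.6, 5.0]) + 2.0 + 0.1 * rng.normal(size=300)

# Fit without and with the extra column
w_no_bias, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
w_bias, *_ = np.linalg.lstsq(add_bias_column(X_demo), y_demo, rcond=None)

mse_no_bias = np.mean((y_demo - X_demo @ w_no_bias) ** 2)
mse_bias = np.mean((y_demo - add_bias_column(X_demo) @ w_bias) ** 2)

print('weights with bias:', w_bias)
print('MSE without / with bias:', mse_no_bias, mse_bias)
```

The same idea could be applied to `X_train` and `X_test` before calling `fit` and `predict`, at the cost of a third weight.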

In [None]:
y_pred = lls.predict(X_test)

In [None]:
MAE = lls.evaluate(X_test, y_test,'MAE')
MSE = lls.evaluate(X_test, y_test,'MSE')

print('MAE = ',MAE)
print('MSE = ' ,MSE)
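
For reference, the two metrics can be worked through by hand on a tiny example. The numbers below are made up for illustration.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 4.0])

errors = y_true - y_hat            # [1, 0, -2, 3]

# Mean absolute error: (1 + 0 + 2 + 3) / 4 = 1.5
mae = np.mean(np.abs(errors))

# Mean squared error: (1 + 0 + 4 + 9) / 4 = 3.5
mse = np.mean(errors ** 2)

print('MAE =', mae)
print('MSE =', mse)
```

MSE penalizes large errors more heavily than MAE, which is why the single error of 3 dominates it here.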

In [None]:
fig = plt.figure(figsize=(10,10))
p = fig.add_subplot(1,1,1,projection='3d')
p.scatter(X_train[:,0], X_train[:,1], y_train , c = 'hotpink')
# p.scatter(X_test[:,0], X_test[:,1], y_pred , c = 'green')

p.set_xlabel('Percentage of lower status of the population')
p.set_ylabel('Average number of rooms')
p.set_zlabel('House Price')

plt.show()
print(y_train.shape ,y_pred.shape)

In [None]:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

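# Use separate grid names so the target array y is not overwritten
x_grid = np.linspace(X_train[:,0].min(), X_train[:,0].max(), 30)
y_grid = np.linspace(X_train[:,1].min(), X_train[:,1].max(), 30)

x_grid, y_grid = np.meshgrid(x_grid, y_grid)
plane = x_grid * lls.w[0] + y_grid * lls.w[1]
ax.plot_surface(x_grid, y_grid, plane ,alpha=0.25)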

ax.scatter(X_train[:,0], X_train[:,1], y_train , c = 'hotpink')
ax.view_init(20,50)

plt.show()