## Here we will build our model

We will begin by importing the required libraries.

In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


Now that our libraries are imported, we will define some global variables: the learning rate ALPHA, the number of gradient-descent iterations ITERATIONS, and the initial bias B.

In [23]:
ALPHA = 0.0001      # learning rate for gradient descent
ITERATIONS = 2680   # number of gradient-descent steps
B = 0               # initial bias term

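Next we will load the data and feature-scale it using z-score normalisation: each column x is replaced by (x - μ)/σ, where μ and σ are that column's mean and standard deviation.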
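A minimal sketch of z-score normalisation and its inverse, assuming a plain NumPy array with one feature per column. The helper names zscore and unscale are hypothetical, and the toy values are made up for illustration. The inverse step is what later turns standardised predictions back into prices.

```python
import numpy as np

def zscore(a):
    """Column-wise z-score normalisation: (a - mean) / std."""
    mu = np.mean(a, axis=0)
    sigma = np.std(a, axis=0)  # population std (ddof=0), NumPy's default
    return (a - mu) / sigma, mu, sigma

def unscale(a_scaled, mu, sigma):
    """Undo the normalisation to get values back in original units."""
    return a_scaled * sigma + mu

# Toy data: two columns on very different scales (made-up numbers).
raw = np.array([[5.0, 170000.0],
                [4.0,  45047.0],
                [8.0,  50226.0]])
scaled, mu, sigma = zscore(raw)
print(scaled.mean(axis=0))  # each column now has mean ~0
print(scaled.std(axis=0))   # ...and standard deviation 1
print(np.allclose(unscale(scaled, mu, sigma), raw))  # True
```

After scaling, both columns contribute on a comparable scale, which lets a single learning rate work for all weights.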

### After finding out that our model isn't learning
Now we will also add more features to the existing X: the squares of both 'no. of owners' and age.
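Appending squared columns can be sketched with np.column_stack. The small matrix below is a hypothetical stand-in for fX; as in getData, the squares are computed from the raw (unscaled) values, so the new columns are then normalised with their own mean and standard deviation.

```python
import numpy as np

# Hypothetical stand-in for fX: column 0 plays the role of age,
# the last column the role of 'no. of owners'.
fX = np.array([[5.0, 22.0, 1.0],
               [4.0,  8.0, 2.0],
               [8.0, 10.0, 1.0]])

# Append the squared columns (same idea as X[:, 8] and X[:, 9] in getData).
X = np.column_stack([fX, fX[:, -1] ** 2, fX[:, 0] ** 2])
print(X.shape)  # (3, 5)
```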

In [24]:
def getData():
    df = pd.read_csv('CleanDataNP.csv')
    fX = df.to_numpy()[:, 0:-1]            # the 8 raw feature columns
    X = np.zeros((fX.shape[0], 10))        # 8 raw features + 2 squared features
    X[:, 0:8] = fX
    X[:, 8] = fX[:, 7]**2                  # square of column 7
    X[:, 9] = fX[:, 0]**2                  # square of column 0
    Y = df.to_numpy()[:, -1]               # target: the last column (price)
    # z-score normalisation: per-column mean and standard deviation
    mux = np.mean(X, axis=0)
    muy = np.mean(Y, axis=0)
    sigx = np.std(X, axis=0)
    sigy = np.std(Y, axis=0)
    return (X - mux)/sigx, (Y - muy)/sigy, mux, muy, sigx, sigy

rX, rY, muX, muY, stdX, stdY = getData()

Before moving further, let's divide our data into 3 parts (by row order):
1. Training set (X, Y) - 60% (first 5200 rows)
2. Dev set (dX, dY) - 20% (next 1750 rows)
3. Test set (tX, tY) - 20% (remaining 1768 rows)
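Here the rows are sliced in file order, and getData computes the mean and standard deviation over all rows before the split. A common alternative, sketched below as an assumption rather than this notebook's method, is to shuffle first and compute the scaling statistics on the training rows only, so the dev and test sets do not influence them. The function split_and_scale and the random toy data are hypothetical; 8718 is the total row count implied by the split above.

```python
import numpy as np

def split_and_scale(data, train_frac=0.6, dev_frac=0.2, seed=0):
    """Shuffle rows, split them train/dev/test, and z-score every part
    with the mean and std of the training rows only.

    The last column of `data` is the target; the others are features.
    """
    rng = np.random.default_rng(seed)
    data = data[rng.permutation(len(data))]   # shuffle rows
    n_train = int(train_frac * len(data))
    n_dev = int(dev_frac * len(data))
    train = data[:n_train]
    dev = data[n_train:n_train + n_dev]
    test = data[n_train + n_dev:]

    mu = train.mean(axis=0)                   # statistics from training rows only
    sigma = train.std(axis=0)

    def xy(part):
        part = (part - mu) / sigma            # apply the training-set scaling
        return part[:, :-1], part[:, -1]      # features, target

    return xy(train), xy(dev), xy(test), mu, sigma

# Toy data: 8718 rows, 10 feature columns + 1 target column.
data = np.random.default_rng(1).normal(size=(8718, 11))
(X, Y), (dX, dY), (tX, tY), mu, sigma = split_and_scale(data)
print(X.shape, dX.shape, tX.shape)  # (5230, 10) (1743, 10) (1745, 10)
```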

In [25]:
X, Y = rX[:5200, :], rY[:5200]                  # training set
dX, dY = rX[5200:6950, :], rY[5200:6950]        # dev set
tX, tY = rX[6950:, :], rY[6950:]                # test set
W = np.zeros(np.shape(X[0]))                    # initial weights: one per feature
print(np.shape(X), np.shape(Y), np.shape(dX), np.shape(dY), np.shape(tX), np.shape(tY))

(5200, 10) (5200,) (1750, 10) (1750,) (1768, 10) (1768,)


Now the data is ready to be fed into the model,
so let's write a function that calculates the cost function.
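The cost computed here is the halved mean squared error, J(w, b) = (1/2m) Σᵢ (w·x⁽ⁱ⁾ + b - y⁽ⁱ⁾)². A vectorised sketch (the name cost_vectorised is hypothetical) replaces the Python loop with one matrix-vector product, which is typically much faster on thousands of rows:

```python
import numpy as np

def cost_vectorised(x, y, w, b):
    """Halved mean squared error: sum((x @ w + b - y)**2) / (2m)."""
    residuals = x @ w + b - y            # one prediction error per row
    return np.sum(residuals ** 2) / (2 * len(y))

# Tiny hand-checkable example (made-up numbers).
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, 0.0])
print(cost_vectorised(x, y, w, 0.0))   # 0.125
```

As a sanity check on the printed value: with W = 0 and B = 0 the cost reduces to mean(Y²)/2, and since Y is standardised over the whole dataset that is about 0.5, in line with the 0.5388 obtained on the first 5200 rows.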

In [26]:
def calculateCost(x, y, w, b):
    # halved mean squared error over all examples
    cost = 0
    for i in range(len(y)):
        cost += (((np.dot(w, x[i]) + b) - y[i])**2)
    return (cost/(2*len(y)))
print(calculateCost(X, Y, W, B))

0.5388097996627864


Let's also write a function to calculate the mean absolute error (MAE, called MOE in the code).
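A vectorised sketch of the same metric (the name mae_vectorised is hypothetical). Because the targets are standardised, the result is in units of standard deviations of the price; multiplying it by stdY gives the error in the original currency, since the mean shift cancels inside the absolute difference.

```python
import numpy as np

def mae_vectorised(x, y, w, b):
    """Mean absolute error: average of |prediction - target|."""
    return np.mean(np.abs(x @ w + b - y))

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, 0.0])
print(mae_vectorised(x, y, w, 0.0))   # 0.5
```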

In [27]:
def calculateMOE(x, y, w, b):
    # mean of |prediction - target| over all examples
    moe = 0
    for i in range(len(y)):
        moe += abs(np.dot(w, x[i]) + b - y[i])
    return (moe/(len(y)))
print(calculateMOE(X, Y, W, B))

0.5976416438749139


Now we will write a function that calculates the gradient for our gradient descent algorithm.
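For the halved-MSE cost, the partial derivatives are ∂J/∂w = (1/m) Σᵢ (w·x⁽ⁱ⁾ + b - y⁽ⁱ⁾) x⁽ⁱ⁾ and ∂J/∂b = (1/m) Σᵢ (w·x⁽ⁱ⁾ + b - y⁽ⁱ⁾). The sketch below is a vectorised version of that gradient plus a finite-difference check, a standard way to confirm a hand-written gradient before training with it. All names and the random test data are hypothetical.

```python
import numpy as np

def cost(x, y, w, b):
    r = x @ w + b - y
    return np.sum(r ** 2) / (2 * len(y))

def gradient(x, y, w, b):
    """Vectorised gradient of the halved MSE: returns (dJ/dw, dJ/db)."""
    r = x @ w + b - y                  # residuals, shape (m,)
    return x.T @ r / len(y), np.mean(r)

def numerical_gradient(x, y, w, b, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    gw = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        gw[j] = (cost(x, y, w + step, b) - cost(x, y, w - step, b)) / (2 * eps)
    gb = (cost(x, y, w, b + eps) - cost(x, y, w, b - eps)) / (2 * eps)
    return gw, gb

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w = rng.normal(size=3)
b = 0.3
gw, gb = gradient(x, y, w, b)
ngw, ngb = numerical_gradient(x, y, w, b)
print(np.allclose(gw, ngw, atol=1e-6), np.isclose(gb, ngb, atol=1e-6))  # True True
```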

In [28]:
def getGradient(x, y, w, b):
    # average gradient of the halved-MSE cost w.r.t. w and b
    grw = 0
    grb = 0
    for i in range(len(y)):
        grw += ((np.dot(w, x[i]) + b) - y[i])*x[i]   # error times features
        grb += ((np.dot(w, x[i]) + b) - y[i])        # error alone
    return (grw/(len(y))), (grb/(len(y)))

Now we are ready to iterate and learn the values of w and b.

In [29]:
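def getWB(x, y, w, b):
    global W, B, tX, tY
    for i in range(ITERATIONS):
        # NOTE: the gradient is evaluated at the arguments w and b, which are
        # never reassigned in this loop; only the globals W and B are updated
        grw, grb = getGradient(x, y, w, b)
        tw = W - ALPHA*grw
        tb = B - ALPHA*grb
        W, B = tw, tb
        if i%20 == 0:
            # every 20 steps, log training cost/MOE and MOE on the unseen test set
            print("At iteration:", i, "Cost:", calculateCost(X, Y, W, B), "MOE:", calculateMOE(X, Y, W, B), "|| Unseen MOE:", calculateMOE(tX, tY, W, B))
getWB(X, Y, W, B)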
print(W, B)

At iteration: 0 Cost: 0.538668939869814 MOE: 0.597514652617288 || Unseen MOE: 0.5606413826768367
At iteration: 20 Cost: 0.5358604029724232 MOE: 0.594974827464632 || Unseen MOE: 0.5581199492667612
At iteration: 40 Cost: 0.5330683593361791 MOE: 0.5924350023119802 || Unseen MOE: 0.5555985158566844
At iteration: 60 Cost: 0.5302928089610868 MOE: 0.5898963139162416 || Unseen MOE: 0.5530770824466106
At iteration: 80 Cost: 0.5275337518471436 MOE: 0.5873586983067516 || Unseen MOE: 0.5505563852111897
At iteration: 100 Cost: 0.5247911879943483 MOE: 0.584824403410384 || Unseen MOE: 0.5480382157827792
At iteration: 120 Cost: 0.5220651174027068 MOE: 0.5822941512229495 || Unseen MOE: 0.5455200463543692
At iteration: 140 Cost: 0.5193555400722141 MOE: 0.5797689441533722 || Unseen MOE: 0.5430028463317098
At iteration: 160 Cost: 0.5166624560028736 MOE: 0.5772470840353475 || Unseen MOE: 0.5404904175628237
At iteration: 180 Cost: 0.5139858651946831 MOE: 0.5747386793842647 || Unseen MOE: 0.5379894264950386
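In the log above, the unseen MOE falls by the same amount (about 0.00252) every 20 iterations, which is the pattern produced when every update applies the same gradient. For comparison, here is a sketch of batch gradient descent in which the parameters are passed in and returned rather than kept in globals, so each iteration computes the gradient from the current w and b. The name gradient_descent, the toy data and the step size 0.1 are assumptions for the example, not values from this notebook.

```python
import numpy as np

def gradient_descent(x, y, w, b, alpha, iterations, log_every=0):
    """Batch gradient descent on the halved-MSE cost; returns the learned (w, b)."""
    m = len(y)
    for i in range(iterations):
        r = x @ w + b - y                 # residuals at the current w, b
        w = w - alpha * (x.T @ r) / m     # update weights
        b = b - alpha * np.mean(r)        # update bias
        if log_every and i % log_every == 0:
            print(i, np.sum(r ** 2) / (2 * m))
    return w, b

# Toy problem with a known answer: y = 2*x0 - 3*x1 + 0.5 (no noise).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = x @ np.array([2.0, -3.0]) + 0.5
w, b = gradient_descent(x, y, np.zeros(2), 0.0, alpha=0.1, iterations=2000)
print(np.round(w, 3), round(b, 3))  # close to [2, -3] and 0.5
```

Because the function returns its parameters, it is also easy to try several learning rates side by side and compare the resulting costs.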


In [30]:
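prices = np.dot(W, dX[0]) + B                  # prediction for the first dev example (standardised)
print(prices*stdY + muY, dY[0]*stdY + muY)     # predicted vs actual price, de-normalised

print(X*stdX + muX, Y*stdY + muY)              # sanity check: undo the scaling on the training data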

2931836.910532183 1098000.0
[[5.0000e+00 2.2000e+01 1.7000e+05 ... 1.0000e+00 1.0000e+00 2.5000e+01]
 [4.0000e+00 8.0000e+00 4.5047e+04 ... 1.0000e+00 1.0000e+00 1.6000e+01]
 [8.0000e+00 1.0000e+01 5.0226e+04 ... 1.0000e+00 1.0000e+00 6.4000e+01]
 ...
 [1.3000e+01 2.2000e+01 7.4652e+04 ... 1.0000e+00 1.0000e+00 1.6900e+02]
 [6.0000e+00 1.0000e+01 6.4000e+04 ... 1.0000e+00 1.0000e+00 3.6000e+01]
 [1.2000e+01 1.2000e+01 7.1365e+04 ... 1.0000e+00 1.0000e+00 1.4400e+02]] [2000000.  950000.  850000. ...  188000.  425000.  218000.]


## So now our model isn't learning...
29.3L predicted vs 10.98L real is very, very wrong!

Since X and Y map to each other correctly (I checked),
we will now do some feature engineering and then apply regularization.
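The notebook does not yet say which regularization it will use; L2 (ridge) is one common choice. A minimal sketch under that assumption, using the convention of adding (λ/2m)·Σw² to the halved-MSE cost and leaving the bias unpenalised. The names ridge_cost, ridge_gradient and fit, and the toy data, are hypothetical.

```python
import numpy as np

def ridge_cost(x, y, w, b, lam):
    """Halved MSE plus an L2 penalty on w (b is not penalised)."""
    m = len(y)
    r = x @ w + b - y
    return np.sum(r ** 2) / (2 * m) + lam / (2 * m) * np.sum(w ** 2)

def ridge_gradient(x, y, w, b, lam):
    """Gradient of ridge_cost with respect to w and b."""
    m = len(y)
    r = x @ w + b - y
    return (x.T @ r + lam * w) / m, np.mean(r)

def fit(x, y, lam, alpha=0.1, iterations=2000):
    """Plain batch gradient descent on the ridge cost."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(iterations):
        gw, gb = ridge_gradient(x, y, w, b, lam)
        w, b = w - alpha * gw, b - alpha * gb
    return w, b

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
w0, _ = fit(x, y, lam=0.0)
w1, _ = fit(x, y, lam=50.0)
print(np.linalg.norm(w0), np.linalg.norm(w1))  # the penalised weights are smaller
```

Larger λ pulls the weights toward zero; the dev set is the natural place to choose its value.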