<a href="https://colab.research.google.com/github/namanpundir/buffalohousepricing/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### For this assignment you have been given a subset of a larger dataset collected by the Buffalo tax department. It contains information about properties in Buffalo.

Number of Instances: 92508

Number of Attributes: 16 (including the target variable)

Attribute Information:

| Column Name                | Description                                                                                                                                      | Type        |
|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|-------------|
| TOTAL VALUE                | The combined assessed value of the land and improvements on the parcel                                                                           | Number      |
| FRONT                      | The width of the front of property (in feet)                                                                                                     | Number      |
| DEPTH                      | The depth of the property (in feet)                                                                                                              | Number      |
| PROPERTY CLASS             | Property Type Classification Codes describe the primary use of each parcel of real property on the assessment roll                               | Number      |
| LAND VALUE                 | The assessed value of the land                                                                                                                   | Number      |
| SALE PRICE                 | The price that the parcel of real property was last sold for                                                                                     | Number      |
| YEAR BUILT                 | The year the primary building on the parcel was built                                                                                            | Number      |
| TOTAL LIVING AREA          | The amount of living space (in square feet)                                                                                                      | Number      |
| OVERALL CONDITION          | A grade of the condition of the property                                                                                                         | Number      |
| BUILDING STYLE             | A code for the style of building                                                                                                                 | Number      |
| HEAT TYPE                  | The type of heating system in the building (only applicable to residential properties)                                                           | Number      |
| BASEMENT TYPE              | The type of basement on the property (only applicable to residential properties)                                                                 | Number      |
| # OF STORIES               | The number of floors/Stories in the property                                                                                                     | Number      |
| # OF FIREPLACES            | The number of fireplaces in a dwelling (only applicable to residential properties)                                                               | Number      |
| # OF BEDS                  | The number of beds in a dwelling (only applicable to residential properties)                                                                     | Number      |
| # OF BATHS                 | The number of baths in a dwelling (only applicable to residential properties)                                                                    | Number      |
| # OF KITCHENS              | The number of kitchens in a dwelling (only applicable to residential properties)                                                                 | Number      |



There are no missing Attribute Values.

Your task is to implement a linear regression model that predicts the TOTAL VALUE of a property.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

%matplotlib inline

#### STEP 1 - Load Data (Already Done)

In [None]:
df = pd.read_csv('/content/data.csv', dtype=np.float64)

In [None]:
df.head()

Unnamed: 0,TOTAL VALUE,FRONT,DEPTH,PROPERTY CLASS,LAND VALUE,SALE PRICE,YEAR BUILT,TOTAL LIVING AREA,OVERALL CONDITION,BUILDING STYLE,HEAT TYPE,BASEMENT TYPE,# OF FIREPLACES,# OF BEDS,# OF BATHS,# OF KITCHENS
0,26600.0,11.0,0.0,411.0,2600.0,1.0,1985.0,1283.0,3.0,14.0,3.0,1.0,0.0,2.0,1.0,1.0
1,200.0,23.0,0.0,340.0,200.0,0.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0
2,25000.0,99.0,1440.0,311.0,25000.0,100000.0,-1.0,0.0,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0
3,26300.0,40.0,60.0,220.0,1600.0,1.0,1900.0,2444.0,3.0,8.0,2.0,4.0,0.0,5.0,2.0,2.0
4,52100.0,35.0,200.0,210.0,2800.0,0.0,1926.0,2144.0,3.0,8.0,2.0,4.0,1.0,4.0,2.0,1.0
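
The preview above shows `-1.0` in `YEAR BUILT` and `OVERALL CONDITION` for rows whose living area is 0, which suggests that -1 may be a "not applicable" placeholder rather than a measured value. That reading is an assumption; the attribute description does not say so. A minimal sketch for counting such entries, using a small hand-built frame copied from rows 0 to 2 of the preview (`sample`, `nan_counts` and `sentinel_counts` are hypothetical names):

```python
import pandas as pd

# Three rows copied from the df.head() preview (subset of the columns).
sample = pd.DataFrame({
    "TOTAL VALUE": [26600.0, 200.0, 25000.0],
    "YEAR BUILT": [1985.0, -1.0, -1.0],
    "OVERALL CONDITION": [3.0, -1.0, -1.0],
    "# OF BEDS": [2.0, 0.0, 0.0],
})

# True NaN values per column (zero if nothing is missing).
nan_counts = sample.isna().sum()

# Entries equal to -1, assumed here to be a "not applicable" placeholder.
sentinel_counts = (sample == -1).sum()

print(nan_counts)
print(sentinel_counts)
```

Running the same two lines on the full `df` would show how many rows per column carry the placeholder, which matters because min-max scaling treats -1 as the column minimum.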


In [None]:
# Target as a column vector of shape (n_samples, 1)
y = np.asarray(df['TOTAL VALUE']).reshape(-1, 1)
# Every remaining column is used as a feature
feature_cols = df.columns.to_list()
feature_cols.remove('TOTAL VALUE')
x = np.asarray(df[feature_cols])

Variable **y** contains the total value of each property

Variable **x** contains the features

#### STEP 2 - Split the Data into Training, Testing and Validation Sets (70% Training, 20% Testing, 10% Validation) (Hint: you can use the sklearn library for this step only) (5 Points)

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.125, random_state=1)  # 0.125 * 0.8 = 0.1 of the full data
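
The two-stage split can be sanity-checked on synthetic data before running it on the real file. The sketch below assumes 1000 made-up rows (`x_demo`, `y_demo` and the other names are hypothetical) and reports the fraction of rows that ends up in each part:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 3 features (not the Buffalo dataset).
x_demo = np.arange(3000, dtype=np.float64).reshape(1000, 3)
y_demo = np.arange(1000, dtype=np.float64).reshape(1000, 1)

# Hold out 20% of the rows for testing first.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=1)
# Then take 12.5% of the remaining 80% for validation (0.125 * 0.8 = 0.1).
x_tr, x_va, y_tr, y_va = train_test_split(
    x_tr, y_tr, test_size=0.125, random_state=1)

fractions = [len(part) / len(x_demo) for part in (x_tr, x_te, x_va)]
print(fractions)  # training, testing, validation
```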

#### STEP 3 - Scale Data Using Min Max Scaler (10 Points)
For each feature, the scaled value is calculated as $x_{scaled} = \frac{x - \min(x)}{\max(x) - \min(x)}$


In [None]:
#STEP 3

# Column-wise minimum and range of the training features
x_min = x_train.min(axis=0)
x_range = x_train.max(axis=0) - x_min
# Min-max scale every column at once via NumPy broadcasting
df_x_train_scaled = pd.DataFrame((x_train - x_min) / x_range)
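
The cell above scales only the training split. The validation and test splits are commonly scaled with the minimum and maximum taken from the training split, so that all three share one transformation. A minimal sketch on toy arrays follows; `fit_min_max` and `apply_min_max` are hypothetical helper names, and mapping constant columns to 0 is a choice made here, not something the assignment specifies:

```python
import numpy as np

def fit_min_max(x_fit):
    """Return the per-column minimum and range computed from x_fit."""
    col_min = x_fit.min(axis=0)
    col_range = x_fit.max(axis=0) - col_min
    # Constant columns would give 0/0; use a range of 1 so they map to 0.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def apply_min_max(x_new, col_min, col_range):
    """Scale x_new with statistics that were fitted on another split."""
    return (x_new - col_min) / col_range

# Toy stand-ins for the training and validation features.
train_demo = np.array([[2.0, 5.0], [4.0, 5.0], [10.0, 5.0]])
val_demo = np.array([[6.0, 5.0], [12.0, 5.0]])

col_min, col_range = fit_min_max(train_demo)
train_scaled = apply_min_max(train_demo, col_min, col_range)
val_scaled = apply_min_max(val_demo, col_min, col_range)
print(train_scaled)
print(val_scaled)
```

Scaled validation or test values can fall outside [0, 1] when a split contains values beyond the training range, as the second validation row does here.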



In [None]:
df_x_train_scaled

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,0.000360,0.007170,0.013141,0.002226,1.123596e-08,0.945022,0.225008,0.666667,0.500000,0.6,1.0,0.0,0.500000,0.250,0.666667
1,0.000360,0.001557,0.132720,0.000069,1.123596e-08,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000,0.000000
2,0.000309,0.006667,0.132720,0.000199,5.617978e-06,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000,0.000000
3,0.000360,0.007233,0.000000,0.000225,4.820225e-04,0.954928,0.101738,0.666667,0.500000,0.6,1.0,0.0,0.250000,0.125,0.333333
4,0.000360,0.006415,0.013141,0.000727,3.685393e-04,0.954433,0.256536,0.666667,0.500000,0.6,0.8,0.0,0.500000,0.250,0.666667
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64750,0.000412,0.004969,0.000000,0.000372,1.123596e-08,0.967806,0.102507,0.666667,0.277778,0.6,1.0,0.0,0.250000,0.125,0.333333
64751,0.000360,0.006730,0.013141,0.000823,2.247191e-04,0.947003,0.187019,0.666667,0.500000,0.6,1.0,0.0,0.333333,0.375,0.666667
64752,0.000257,0.006289,0.132720,0.000268,0.000000e+00,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000,0.000000
64753,0.000329,0.007673,0.132720,0.000147,0.000000e+00,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000,0.000000


#### STEP 4 - Initialize values for the weights, No. of Epochs and Learning Rate (5 Points)

In [None]:
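#STEP 4
# Closed-form least-squares weights (normal equation) computed from the
# unscaled training split and used as the starting weights
w = np.dot(np.linalg.inv(np.dot(x_train.T, x_train)), np.dot(x_train.T, y_train))
w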


array([[ 4.06749116e+01],
       [-1.00964420e+02],
       [ 1.09165684e+03],
       [ 3.51667282e+00],
       [ 3.04607501e-01],
       [-4.41468351e+02],
       [-8.43390683e+00],
       [ 1.32539280e+05],
       [ 7.05753114e+03],
       [ 8.16934327e+04],
       [ 1.89991627e+04],
       [-3.78742454e+03],
       [-3.63579482e+03],
       [-1.19128794e+04],
       [ 1.04994236e+04]])
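
The Step 4 cell above evaluates what is usually called the normal equation for ordinary least squares. With $X$ standing for `x_train` and $y$ for `y_train` (symbols introduced here for illustration), it reads

$$ w = (X^\top X)^{-1} X^\top y $$

Since `x` holds only the feature columns and no column of ones, this solution has no intercept term $w_0$, and it is computed on the unscaled features rather than on `df_x_train_scaled`.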

In [None]:
batch_size = 32
epochs = 9  # plain integer; a trailing comma here would create the tuple (9,)
learning_rate = 0.01

#### STEP 5 - Train Linear Regression Model (40 Points)
 5.1 Start a Loop For each Epoch
 
 5.2 Find the predicted value using $ y(x,w) = w_0 + w_1x $ for the training and validation splits (10 Points)
 
 5.3 Find the Loss using Mean Squared Error for the training and validation splits and store in a list (10 Points)
 
 5.4 Calculate the Gradients (15 Points)
 
 5.5 Update the weights using the gradients (5 Points)
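
The steps above can be sketched as full-batch gradient descent. With several features the prediction generalizes to $y(x,w) = w_0 + \sum_j w_j x_j$, which the sketch writes as a matrix product after prepending a column of ones. For the mean squared error $\frac{1}{n}\sum_i (\hat{y}_i - y_i)^2$ the gradient with respect to the weights is $\frac{2}{n} X_b^\top (X_b w - y)$, where $X_b$ is the feature matrix with the ones column. All function names, the synthetic data, the zero initialization and the hyperparameter values below are assumptions chosen for illustration. They are not the assignment's required settings, and `batch_size` is not used.

```python
import numpy as np

def add_bias(x):
    """Prepend a column of ones so that w[0] acts as the intercept w_0."""
    return np.hstack([np.ones((x.shape[0], 1)), x])

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_pred - y_true) ** 2))

def train_linear_regression(x_tr, y_tr, x_va, y_va, epochs, learning_rate):
    """Full-batch gradient descent; returns weights and per-epoch losses."""
    xb_tr, xb_va = add_bias(x_tr), add_bias(x_va)
    w = np.zeros((xb_tr.shape[1], 1))             # start from zero weights
    train_losses, val_losses = [], []
    for _ in range(epochs):                       # 5.1 loop over epochs
        pred_tr = xb_tr @ w                       # 5.2 predictions
        pred_va = xb_va @ w
        train_losses.append(mse(y_tr, pred_tr))   # 5.3 losses
        val_losses.append(mse(y_va, pred_va))
        # 5.4 gradient of the training MSE with respect to w
        grad = (2.0 / xb_tr.shape[0]) * xb_tr.T @ (pred_tr - y_tr)
        w = w - learning_rate * grad              # 5.5 weight update
    return w, train_losses, val_losses

# Synthetic, already-scaled data generated from y = 3 + 2*x1 - x2 (no noise).
rng = np.random.default_rng(0)
x_all = rng.random((200, 2))
y_all = 3.0 + x_all @ np.array([[2.0], [-1.0]])

w_fit, train_losses, val_losses = train_linear_regression(
    x_all[:150], y_all[:150], x_all[150:], y_all[150:],
    epochs=2000, learning_rate=0.3,
)
print(w_fit.ravel())
```

On this noise-free toy problem the fitted weights should approach the generating values 3, 2 and -1. On the real data the features would be the min-max scaled splits, and the number of epochs and the learning rate would need tuning.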

In [None]:
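# STEP 5
# `epochs` must be a plain integer here (a tuple such as (9,) would fail)
for epoch in range(epochs):
    pass  # steps 5.2 to 5.5 go inside this loop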

#### STEP 6 - Evaluate the Model (25 Points)
6.1 Plot a graph of the Training and Validation Loss with respect to the epochs (10 Points)

6.2 Find the R2 Score of the trained model for the Train, Test and Validation splits (15 Points)
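
Both evaluation steps can be sketched with NumPy and matplotlib alone. The R2 score is computed as $1 - SS_{res}/SS_{tot}$. The helper names `r2_score_np` and `plot_losses` are hypothetical, and the loss lists passed to `plot_losses` are assumed to hold one value per epoch, as collected in step 5.3.

```python
import numpy as np
import matplotlib.pyplot as plt

def r2_score_np(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def plot_losses(train_losses, val_losses):
    """Plot per-epoch training and validation loss on shared axes."""
    fig, ax = plt.subplots()
    epochs_axis = range(1, len(train_losses) + 1)
    ax.plot(epochs_axis, train_losses, label="Training loss")
    ax.plot(epochs_axis, val_losses, label="Validation loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Mean squared error")
    ax.legend()
    return fig

# Toy check: a perfect fit scores 1, predicting the mean scores 0.
y_true = np.array([[1.0], [2.0], [3.0], [4.0]])
r2_perfect = r2_score_np(y_true, y_true)
r2_mean = r2_score_np(y_true, np.full_like(y_true, y_true.mean()))
print(r2_perfect, r2_mean)
```

For the assignment, `r2_score_np` would be called once per split with that split's targets and the model's predictions on it.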

In [None]:
# STEP 6
