## Housing Data Prediction

### Dataset
The housing data is split into training and test sets, as required by the project description.
There are 60 rows of training data and 10 rows of test data.

Features (independent variables): 'housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households', 'median_income'
Target (dependent variable): 'median_house_value'
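
For reference, the fit performed by the code that follows can be written in closed form. The notation here is read off the code rather than stated in the original: $X$ is the design matrix with a leading column of ones, $y$ is the column of target values, $N$ is the number of rows, and $(\cdot)^{+}$ is the pseudo-inverse computed by `np.linalg.pinv`.

```latex
\begin{aligned}
w &= \left(X^{\top} X\right)^{+} X^{\top} y \\
J(w) &= \frac{1}{N}\,(Xw - y)^{\top}(Xw - y)
\end{aligned}
```

Under this reading, $J$ is the mean squared error, so its units are the square of the units of 'median_house_value'.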

In [106]:
import pandas as pd
import numpy as np

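def calculate_metrics(X, y):
    # Building blocks of the normal equation w = pinv(X^T X) . (X^T y)
    A = np.linalg.pinv(np.dot(X.T, X)) # Pseudo-inverse of X^T X
    B = np.dot(X.T, y) # X^T y
    return A, B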

np.set_printoptions(formatter={'float_kind':'{:f}'.format})

while True:
    try:
        filename = input('Enter the name of the Training File: ')
        train = pd.read_csv(filename)
        print('Length of Training Dataset', len(train))
        filename = input('Enter the name of the Test File: ')
        test = pd.read_csv(filename)
        print('Length of Testing Dataset', len(test))
        break
    except FileNotFoundError:
        print('File is not found')


X_train = train.iloc[ : , :-1].values 
y_train = train.iloc[ : , -1:].values

val_1 = np.ones(shape = y_train.shape) 
X_train = np.concatenate((val_1, X_train), 1) #Generating a Design Matrix

A, B = calculate_metrics(X_train, y_train)
w = np.dot(A, B) #Calculating weights using Training Data
residual = np.dot(X_train, w) - y_train #Prediction error on each training row
J = (1/len(train))*np.dot(residual.T, residual) #Mean squared error
print()

print("Weights: \n", w) 
print()

print("Overall Training error (J): " ,J.item()) #J is a 1x1 array; item() extracts the scalar

X_test = test.iloc[:, :-1].values
y_test = test.iloc[ : , -1:].values

val_1 = np.ones(shape = y_test.shape)
X_test = np.concatenate((val_1, X_test), 1) #Generating a Design Matrix

y_pred = np.dot(X_test, w) #Obtaining the prediction values
residual = y_pred - y_test #Prediction error on each test row
J = (1/len(test))*np.dot(residual.T, residual) #Mean squared error
print("Overall Testing error (J): " ,J.item())

y_pred = y_pred.astype(float)
result = pd.DataFrame(y_pred, columns=['Predicted Value'])
result['Actual Value'] = y_test
result.to_csv('Result.csv', index = False)
print(result)

Enter the name of the Training File: Train.csv
Length of Training Dataset 60
Enter the name of the Test File: Test.csv
Length of Testing Dataset 10

Weights: 
 [[-15414.653998]
 [1156.558335]
 [-22.410307]
 [373.553765]
 [-135.485149]
 [30.684460]
 [49211.758244]]

Overall Training error (J):  1376184447.2728136
Overall Testing error (J):  1018864721.6003481
   Predicted Value  Actual Value
0    145691.785159      112000.0
1    150391.913366      107200.0
2    129109.895095      115600.0
3    113695.043039       98300.0
4    164475.946386      116800.0
5     88321.278002       78100.0
6    126906.415418       77100.0
7     95420.396081       92300.0
8    118706.290356       84700.0
9    116678.760473       89400.0
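
Because J is a mean of squared errors, its magnitude is hard to compare with the house values directly. A minimal sketch, assuming J is the mean squared error as computed above, takes the square root of the two printed values to express the error in the same units as 'median_house_value'. The helper name `rmse` is illustrative and not part of the original notebook.

```python
import math

# Mean squared errors copied from the printed output above.
train_mse = 1376184447.2728136
test_mse = 1018864721.6003481


def rmse(mse: float) -> float:
    """Convert a mean squared error into a root mean squared error."""
    return math.sqrt(mse)


train_rmse = rmse(train_mse)
test_rmse = rmse(test_mse)

print(f"Training RMSE: {train_rmse:,.0f}")
print(f"Testing RMSE:  {test_rmse:,.0f}")
```

This gives roughly 37,100 on the training set and 31,900 on the test set, which is large compared with actual values in the 77,000 to 117,000 range shown in the table.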


In [107]:
result

|   | Predicted Value | Actual Value |
|---|-----------------|--------------|
| 0 | 145691.785159 | 112000.0 |
| 1 | 150391.913366 | 107200.0 |
| 2 | 129109.895095 | 115600.0 |
| 3 | 113695.043039 | 98300.0 |
| 4 | 164475.946386 | 116800.0 |
| 5 | 88321.278002 | 78100.0 |
| 6 | 126906.415418 | 77100.0 |
| 7 | 95420.396081 | 92300.0 |
| 8 | 118706.290356 | 84700.0 |
| 9 | 116678.760473 | 89400.0 |
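
The weights in the training cell come from the pseudo-inverse of X^T X. As a hedged cross-check, the sketch below compares that approach with `np.linalg.lstsq`, which solves the same least-squares problem from X directly. It runs on synthetic data shaped like the training set (60 rows, 6 features plus a bias column), not on the housing files, and the function names `fit_normal_equation` and `fit_lstsq` are illustrative.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Least-squares weights via the pseudo-inverse of X^T X."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)


def fit_lstsq(X, y):
    """Least-squares weights via NumPy's built-in solver."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


# Synthetic stand-in for the training data (assumption: random values,
# chosen only so the example runs without Train.csv).
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 6))
X = np.concatenate((np.ones((60, 1)), features), axis=1)  # design matrix
true_w = np.arange(1.0, 8.0).reshape(-1, 1)
y = X @ true_w + rng.normal(scale=0.1, size=(60, 1))

w_normal = fit_normal_equation(X, y)
w_lstsq = fit_lstsq(X, y)

# Both routes should agree to within floating-point tolerance.
print(np.allclose(w_normal, w_lstsq))
```

On well-conditioned data the two routes agree. Forming X^T X squares the condition number of the problem, so on features with very different scales, such as 'total_rooms' next to 'median_income', solving from X directly is generally the numerically safer option.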
