# **Final Assignment**
In this assignment, you will implement the K-Nearest Neighbour (KNN) and Ordinary Least Squares (OLS) regression methods. Download the [Excel file](https://docs.google.com/spreadsheets/d/17f6h4h-4x6XMuI4Budcw4Ujoxd0ceogv). The dataset contains 11 columns: `"bedrooms"`, `"bathrooms"`, `"sqft_living"`, `"sqft_lot"`, `"floors"`, `"condition"`, `"grade"`, `"sqft_above"`, `"sqft_basement"`, `"age"`, and `"price"`.
Use `pd.read_excel()` to read the file, then predict whether a given house is likely to be expensive based on its features.

Use KNN and OLS regression independently, then compare their performance in terms of accuracy and confusion matrix.
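
As a small illustration of the two metrics, the sketch below computes accuracy and a 2x2 confusion matrix with plain NumPy. The arrays `y_true` and `y_pred` are made-up labels, not values from the dataset. The layout (rows for the true class, columns for the predicted class) matches what scikit-learn's `confusion_matrix` returns.

```python
import numpy as np

# Hypothetical labels for illustration only: 1 = "expensive", 0 = "cheap".
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0])

# Accuracy: fraction of predictions that match the true label.
accuracy = np.mean(y_true == y_pred)

# 2x2 confusion matrix: rows index the true class, columns the predicted class.
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

print("Accuracy:", accuracy)
print("Confusion matrix:\n", cm)
```

The diagonal of `cm` counts correct predictions, so accuracy is always the diagonal sum divided by the total number of samples.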

To accomplish this task, you will have to convert the values under `"price"` into one of two possible values: **1** and **0**, denoting "expensive" and "cheap" respectively. A house priced below `450000` is "cheap"; otherwise it is "expensive".

**Note:** This conversion must be done before training for KNN, and after prediction for OLS regression, on each predicted `"price"`.
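
One way to read the note for OLS is: fit the regression on the raw prices, predict a price for each test house, and only then compare each predicted price with the `450000` threshold. The sketch below shows that order of operations on synthetic data; the feature matrix, coefficients, noise level, and the helper `add_intercept` are all made up for illustration. It solves the least-squares problem with `np.linalg.lstsq` instead of an explicit matrix inverse, a choice made here because it still returns a solution when some feature columns are linearly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing data (values are made up for illustration).
n_train, n_test = 200, 50
X = rng.uniform(500, 4000, size=(n_train + n_test, 2))  # two area-like features
price = 50_000 + 150 * X[:, 0] + 40 * X[:, 1] + rng.normal(0, 20_000, size=len(X))

X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = price[:n_train], price[n_train:]

def add_intercept(X):
    """Prepend a column of ones so the model has a bias term."""
    return np.column_stack([np.ones(len(X)), X])

# Fit OLS on the raw (continuous) prices.
beta, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# Predict prices first, then convert predictions and targets to 0/1.
THRESHOLD = 450_000
predicted_price = add_intercept(X_test) @ beta
y_pred = (predicted_price >= THRESHOLD).astype(int)
y_true = (y_test >= THRESHOLD).astype(int)

print("Accuracy:", np.mean(y_pred == y_true))
```

For KNN the order is reversed: the labels are converted to 0/1 first, and the classifier votes on those labels directly.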

In [6]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

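# Load the dataset
data = pd.read_excel('/content/house_data.xlsx')

# Binarize the target: 1 = "expensive" (price >= 450000), 0 = "cheap"
data['price'] = np.where(data['price']>=450000, 1, 0)

# Hold out 20% of the rows for testing
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)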

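def knn_predict(train_data, test_data, k):
    # Both arrays hold the features in every column but the last; the last column is the label
    y_pred = []
    for i in range(len(test_data)):
        # Euclidean distance from test row i to every training row
        distances = np.sqrt(np.sum(np.square(np.subtract(train_data[:,:-1],test_data[i,:-1])), axis=1))
        # Indices of the k closest training rows
        nearest_indices = np.argsort(distances)[:k]
        nearest_labels = train_data[nearest_indices,-1].tolist()
        # Majority vote among the k neighbours
        y_pred.append(max(set(nearest_labels), key=nearest_labels.count))
    return y_pred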

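def ols_predict(train_data, test_data):
    X_train = train_data[:,:-1]
    y_train = train_data[:,-1].reshape(-1,1)
    X_test = test_data[:,:-1]
    # Prepend a column of ones for the intercept term
    X_train = np.concatenate([np.ones((X_train.shape[0],1)), X_train], axis=1)
    X_test = np.concatenate([np.ones((X_test.shape[0],1)), X_test], axis=1)
    # Normal equations: beta = (X^T X)^(-1) X^T y
    # np.linalg.inv assumes X^T X is invertible (no linearly dependent feature columns)
    beta = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y_train
    # The target is already 0/1 at this point, so fitted values are thresholded at 0.5
    y_pred = np.where(X_test @ beta >= 0.5, 1, 0)
    return y_pred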

# Convert the DataFrames to NumPy arrays, as expected by the functions above
train_data = train_data.values
test_data = test_data.values

k = 5
y_test_knn = test_data[:,-1].tolist()
y_pred_knn = knn_predict(train_data, test_data, k)
y_test_ols = test_data[:,-1].reshape(-1,1)
y_pred_ols = ols_predict(train_data, test_data)

print('KNN:')
print('Accuracy:', accuracy_score(y_test_knn, y_pred_knn))
print('Confusion Matrix:\n', confusion_matrix(y_test_knn, y_pred_knn))
print('OLS Regression:')
print('Accuracy:', accuracy_score(y_test_ols, y_pred_ols))
print('Confusion Matrix:\n', confusion_matrix(y_test_ols, y_pred_ols))

KNN:
Accuracy: 0.7284293314827666
Confusion Matrix:
 [[1571  533]
 [ 641 1578]]
OLS Regression:
Accuracy: 0.5910247513300948
Confusion Matrix:
 [[1916  188]
 [1580  639]]
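
The reported accuracies can be cross-checked against the confusion matrices, since accuracy is the diagonal sum divided by the total count. The sketch below redoes that arithmetic with the numbers printed above and also derives per-class recall, which shows where each model's errors fall. The helper name `summarize` is hypothetical.

```python
import numpy as np

# Confusion matrices copied from the output above (rows: true class, columns: predicted).
cm_knn = np.array([[1571, 533], [641, 1578]])
cm_ols = np.array([[1916, 188], [1580, 639]])

def summarize(cm):
    """Return accuracy and per-class recall for a confusion matrix."""
    accuracy = np.trace(cm) / cm.sum()
    recall = np.diag(cm) / cm.sum(axis=1)  # recall[0]: "cheap", recall[1]: "expensive"
    return accuracy, recall

for name, cm in [("KNN", cm_knn), ("OLS", cm_ols)]:
    acc, recall = summarize(cm)
    print(f"{name}: accuracy={acc:.4f}, "
          f"recall(cheap)={recall[0]:.3f}, recall(expensive)={recall[1]:.3f}")
```

In the OLS run, 1580 of the 2219 expensive houses are predicted as cheap, so its lower accuracy comes almost entirely from the "expensive" class, while KNN's errors are spread more evenly across both classes.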
