**Boston Housing Dataset**
Predicting the median value of owner-occupied homes
The aim of this assignment is to learn how to apply machine learning algorithms to datasets. This involves understanding what the data means, how to handle it, training, cross-validation, prediction, testing the model, and so on.
This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. It was obtained from the StatLib archive and has been used extensively in the literature to benchmark algorithms. The data was originally published by Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978.
The dataset is small, with only 506 cases. It can be used to predict the median value of a home, which is what is done here. Each case has 14 attributes:
*    CRIM - per capita crime rate by town
*    ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
*    INDUS - proportion of non-retail business acres per town.
*    CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
*    NOX - nitric oxides concentration (parts per 10 million)
*    RM - average number of rooms per dwelling
*    AGE - proportion of owner-occupied units built prior to 1940
*    DIS - weighted distances to five Boston employment centres
*    RAD - index of accessibility to radial highways
*    TAX - full-value property-tax rate per $10,000
*    PTRATIO - pupil-teacher ratio by town
*    B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
*    LSTAT - % lower status of the population
*    MEDV - Median value of owner-occupied homes in $1000's

Aim
    • To implement linear regression with regularization via gradient descent.
    • To implement gradient descent with Lp-norm regularization for 3 different values of p in (1, 2].
    • To compare the performance of linear regression with Lp-norm regularization against L2-norm regularization for these 3 values of p.
    • To verify that gradient descent with L2 regularization gives the same result as the matrix-inversion (closed-form) solution.
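The last aim can be checked directly. For p = 2 the gradient used in this notebook vanishes at w = (ΦᵀΦ + λI)⁻¹Φᵀy, so gradient descent and a linear solve should land on the same weights. The sketch below compares the two on small synthetic data (not the Boston data); the function names `ridge_closed_form` and `ridge_gradient_descent` are hypothetical, and `np.linalg.solve` is used in place of an explicit matrix inverse because it computes the same solution more stably.

```python
import numpy as np

def ridge_closed_form(phi, y, lambd):
    # Stationary point of ||phi w - y||^2 + lambd * ||w||^2:
    # solve (phi^T phi + lambd I) w = phi^T y
    a = phi.T @ phi + lambd * np.eye(phi.shape[1])
    return np.linalg.solve(a, phi.T @ y)

def ridge_gradient_descent(phi, y, lambd, step, tol=1e-10, max_iter=200_000):
    # Gradient descent on the same objective, starting from w = 0
    w = np.zeros((phi.shape[1], 1))
    for _ in range(max_iter):
        grad = 2 * (phi.T @ (phi @ w) - phi.T @ y) + 2 * lambd * w
        w_new = w - step * grad
        if np.linalg.norm(w_new - w) <= tol:
            return w_new
        w = w_new
    return w

rng = np.random.default_rng(0)
# Synthetic data: 3 features in [0, 1] plus a bias column of 1s
phi_demo = np.hstack([rng.random((50, 3)), np.ones((50, 1))])
y_demo = rng.random((50, 1))

w_cf = ridge_closed_form(phi_demo, y_demo, lambd=0.2)
w_gd = ridge_gradient_descent(phi_demo, y_demo, lambd=0.2, step=1e-3)
print(np.allclose(w_cf, w_gd, atol=1e-6))
```

The step size must be small enough for the iteration to converge (below 2 divided by the largest eigenvalue of 2(ΦᵀΦ + λI)); the value 1e-3 is chosen for this synthetic data only.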
All the code is written in a single Python file. The program accepts as input the path of the data directory containing the train and test CSV files. The data directory contains two files: train.csv, used to train the model, and test.csv, for which output predictions are to be made. The predictions are written to a file named output.csv, which should have two comma-separated columns, [ID,Output].
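The notebook cells read train.csv and test.csv from the current working directory. A minimal sketch of taking the data directory from the command line, as the paragraph above describes, is shown here; the helper `data_paths` and the script name `boston.py` are hypothetical.

```python
import os
import sys

def data_paths(data_dir):
    # Hypothetical helper: build the train/test CSV paths inside data_dir
    return (os.path.join(data_dir, 'train.csv'),
            os.path.join(data_dir, 'test.csv'))

if __name__ == '__main__':
    # Hypothetical usage: python boston.py /path/to/data
    # Falls back to the current directory when no argument is given
    data_dir = sys.argv[1] if len(sys.argv) > 1 else '.'
    train_path, test_path = data_paths(data_dir)
    print(train_path, test_path)
```

The resulting `train_path` and `test_path` would then replace the literal 'train.csv' and 'test.csv' strings in the `np.loadtxt` calls.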
How the Code Works
    • The NumPy library is required, so the code begins by importing it
    • Import phi and phi_test from the train and test datasets using NumPy's loadtxt function
    • Import y from the train dataset using the loadtxt function
    • Concatenate a column of 1s (the bias term) to the right of phi and phi_test
    • Apply min-max scaling to each feature column of phi and phi_test, using the training-set minimum and maximum
    • Apply log scaling to y
    • Define a function that computes the gradient of the error function from phi, w and the norm parameter p
    • Make a dictionary with output filenames as keys and values of p as values
    • For each item in this dictionary
        ◦ Set w to all 0s
        ◦ Set appropriate values for lambda and the step size
        ◦ Calculate the new value of w with one gradient-descent step
        ◦ Repeat until the norm of the difference between consecutive values of w is below a threshold
        ◦ Load the ID values from the test data file
        ◦ Compute y for the test data from phi_test, applying the inverse of the log (exp)
        ◦ Save the IDs and predicted y to the file named in the dictionary
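The steps above amount to gradient descent on a regularized squared error. Writing Φ for phi, t for the step size and λ for lambda, the objective below is inferred from the gradient that the notebook's error-change function computes (it is not stated explicitly in the original), together with the update rule and stopping test:

$$
E(\mathbf{w}) = \lVert \Phi \mathbf{w} - \mathbf{y} \rVert_2^2 + \lambda \sum_{j} \lvert w_j \rvert^{p}
$$

$$
\nabla E(\mathbf{w}) = 2\left(\Phi^\top \Phi \mathbf{w} - \Phi^\top \mathbf{y}\right) + \lambda p \, \lvert \mathbf{w} \rvert^{p-1} \odot \operatorname{sign}(\mathbf{w})
$$

$$
\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - t \, \nabla E\big(\mathbf{w}^{(k)}\big), \qquad \text{stop when } \lVert \mathbf{w}^{(k+1)} - \mathbf{w}^{(k)} \rVert_2 \le 10^{-10}
$$

For p = 2 the regularizer's gradient is 2λw, so setting the gradient to zero gives the closed-form solution used for the matrix-inversion comparison:

$$
\left(\Phi^\top \Phi + \lambda I\right)\mathbf{w} = \Phi^\top \mathbf{y} \quad\Longrightarrow\quad \mathbf{w} = \left(\Phi^\top \Phi + \lambda I\right)^{-1} \Phi^\top \mathbf{y}
$$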

```python
import numpy as np
```

#### Import phi and phi_test from train and test datasets using NumPy's loadtxt function

```python
# Import phi from the training set (features are columns 1-13; column 0 is skipped)
phi = np.loadtxt('train.csv', dtype='float', delimiter=',', skiprows=1, usecols=tuple(range(1, 14)))
# Import phi_test from the test set, using the same feature columns
phi_test = np.loadtxt('test.csv', dtype='float', delimiter=',', skiprows=1, usecols=tuple(range(1, 14)))
```

#### Import y from train dataset using the loadtxt function

```python
# MEDV (the target) is column 14; ndmin=2 keeps y as a column vector
y = np.loadtxt('train.csv', dtype='float', delimiter=',', skiprows=1, usecols=14, ndmin=2)
```

#### Concatenate a column of 1s to the right of phi and phi_test

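```python
# Append a bias column of 1s as the last column; the row counts come from the
# arrays themselves instead of being hard-coded (105 test rows, 400 train rows)
phi_test = np.concatenate((phi_test, np.ones((phi_test.shape[0], 1))), axis=1)
phi = np.concatenate((phi, np.ones((phi.shape[0], 1))), axis=1)
```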

#### Apply min-max scaling on each feature column of phi and phi_test

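```python
# Scale the 13 feature columns (not the bias column) to [0, 1], using the
# training-set min and max for both the training and the test data
for i in range(13):
    col_max = np.max(phi[:, i])
    col_min = np.min(phi[:, i])
    phi[:, i] = (phi[:, i] - col_min) / (col_max - col_min)
    phi_test[:, i] = (phi_test[:, i] - col_min) / (col_max - col_min)
```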
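The column loop above can also be written with NumPy's column-wise reductions. This is a minimal sketch, not the notebook's code: the function name `min_max_scale` is hypothetical, and the guard for constant columns is an added assumption (the Boston features are not constant, so the original loop does not need it). Note that test values are scaled with the training statistics, so they can fall outside [0, 1].

```python
import numpy as np

def min_max_scale(train, test, n_features):
    # Scale the first n_features columns to [0, 1] using training-set statistics only;
    # any remaining columns (e.g. the bias column of 1s) are left unchanged
    train = train.astype(float).copy()
    test = test.astype(float).copy()
    col_min = train[:, :n_features].min(axis=0)
    col_range = train[:, :n_features].max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # assumption: avoid dividing by zero on constant columns
    train[:, :n_features] = (train[:, :n_features] - col_min) / col_range
    test[:, :n_features] = (test[:, :n_features] - col_min) / col_range
    return train, test

# Tiny example: two feature columns followed by a bias column
train = np.array([[1.0, 10.0, 1.0], [3.0, 20.0, 1.0], [5.0, 30.0, 1.0]])
test = np.array([[2.0, 40.0, 1.0]])
train_s, test_s = min_max_scale(train, test, n_features=2)
print(train_s)
print(test_s)
```

Here the second test value is 40, above the training maximum of 30, so it scales to 1.5 rather than staying within [0, 1].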

#### Apply log scaling on y.

```python
# Log-transform the target; predictions are mapped back with np.exp
y = np.log(y)
```

#### Define a function to calculate the gradient of the error function based on phi, w and the p norm

```python
def delta_w(p, phi, w):
    """Gradient of ||phi w - y||^2 + lambd * sum(|w_j|^p) with respect to w.

    Reads the globals y (log-scaled targets) and lambd (regularization strength).
    """
    if not 1 < p <= 2:
        raise ValueError('p must lie in (1, 2]')
    data_term = 2 * (phi.T @ (phi @ w) - phi.T @ y)
    # d/dw |w|^p = p |w|^(p-1) sign(w); for p = 2 this reduces to 2w,
    # so one expression covers every p (the sign factor is needed for negative weights)
    reg_term = lambd * p * np.power(np.abs(w), p - 1) * np.sign(w)
    return data_term + reg_term
```
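A finite-difference check is a quick way to confirm that a hand-derived gradient matches its objective. The sketch below assumes the objective is squared error plus λ·Σ|w_j|^p (inferred from the gradient formula, not stated in the original) and compares the analytic gradient, including the sign(w) factor, with central differences on synthetic data. The names `lp_objective`, `lp_gradient` and `numerical_gradient` are hypothetical. Using weights of both signs matters: a version without sign(w) would fail this check for negative entries.

```python
import numpy as np

def lp_objective(w, phi, y, lambd, p):
    # Assumed objective: squared error + lambd * sum |w_j|^p
    residual = phi @ w - y
    return np.sum(residual ** 2) + lambd * np.sum(np.abs(w) ** p)

def lp_gradient(w, phi, y, lambd, p):
    # Analytic gradient, with y and lambd passed explicitly instead of as globals
    return (2 * (phi.T @ (phi @ w) - phi.T @ y)
            + lambd * p * np.abs(w) ** (p - 1) * np.sign(w))

def numerical_gradient(f, w, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = eps
        grad[j, 0] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
phi_demo = rng.random((20, 4))          # synthetic design matrix
y_demo = rng.random((20, 1))
w_demo = rng.standard_normal((4, 1))    # weights of both signs
results = {}
for p in (2.0, 1.75, 1.5, 1.3):
    f = lambda w, p=p: lp_objective(w, phi_demo, y_demo, 0.2, p)
    results[p] = np.allclose(lp_gradient(w_demo, phi_demo, y_demo, 0.2, p),
                             numerical_gradient(f, w_demo),
                             rtol=1e-4, atol=1e-5)
print(results)
```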

#### Make a dictionary containing filenames as keys and p as values

```python
# Output file name -> value of p used for the regularizer
filenames = {'output.csv': 2.0,
             'output_p1.csv': 1.75,
             'output_p2.csv': 1.5,
             'output_p3.csv': 1.3}
```


#### For each item in this dictionary:
#### Set the w to all 0s

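```python
for (fname, p) in filenames.items():
    # Start each p from the zero vector (13 features + 1 bias term)
    w = np.zeros((14, 1))
    # Note: in this notebook only this line runs inside the loop; the cells that
    # follow run once, for the last (fname, p) pair. To produce every output file,
    # the steps in the following cells must be indented inside this loop.
```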

#### Set appropriate values for lambda (hyperparameter) and the step size

```python
lambd = 0.2      # Regularization strength
t = 0.00012      # Max step size
```

#### Calculate new value of w

```python
# One gradient-descent step from the current w
w_new = w - t * delta_w(p, phi, w)
w_new
```

```
array([[0.01009907],
       [0.03686453],
       [0.10625758],
       [0.02276236],
       [0.09557588],
       [0.1390052 ],
       [0.19031748],
       [0.07372563],
       [0.09978034],
       [0.11313201],
       [0.17698804],
       [0.26676462],
       [0.08052811],
       [0.29102296]])
```

#### Repeat until the norm of the difference between consecutive values of w is less than a threshold

```python
# Iterate until successive weight vectors differ by at most 1e-10
i = 0  # iteration counter
while np.linalg.norm(w_new - w) > 1e-10:
    w = w_new
    w_new = w - t * delta_w(p, phi, w)
    i += 1
```

#### Load values of id from test data file

```python
# Column 0 of test.csv holds the IDs
id_test = np.loadtxt('test.csv', dtype='int', delimiter=',', skiprows=1, usecols=0, ndmin=2)
```

#### Calculate y for the test data using phi_test and applying the inverse log

```python
# Predict in log space, then undo the log scaling with exp
y_test = np.exp(np.dot(phi_test, w_new))
```

#### Save the ids and y according to filename from dictionary

```python
np.savetxt(fname, np.concatenate((id_test, y_test), axis=1),
           delimiter=',', fmt=['%d', '%f'], header='ID,MEDV', comments='')
```
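In the notebook as written, the body of the `for` loop over `filenames` contains only `w = np.zeros((14, 1))`, so the training, prediction and saving cells run once, for whichever (fname, p) pair the loop ended on. A minimal sketch of the same steps with everything inside the loop is shown below. The functions `fit_lp` and `predict_all`, the `max_iter` cap and the `save` flag are hypothetical additions; λ = 0.2 and t = 0.00012 are the notebook's values, used as defaults. The demo uses synthetic data with `save=False`, so no files are written.

```python
import numpy as np

def fit_lp(phi, y, p, lambd=0.2, step=0.00012, tol=1e-10, max_iter=1_000_000):
    # Gradient descent on ||phi w - y||^2 + lambd * sum |w_j|^p, starting from w = 0
    w = np.zeros((phi.shape[1], 1))
    for _ in range(max_iter):
        grad = (2 * (phi.T @ (phi @ w) - phi.T @ y)
                + lambd * p * np.abs(w) ** (p - 1) * np.sign(w))
        w_new = w - step * grad
        if np.linalg.norm(w_new - w) <= tol:
            return w_new
        w = w_new
    return w  # max_iter reached without meeting the tolerance

def predict_all(phi, log_y, phi_test, ids, filenames, step=0.00012, save=True):
    # Train one model per p and write (or just return) its test predictions
    predictions = {}
    for fname, p in filenames.items():
        w = fit_lp(phi, log_y, p, step=step)
        y_test = np.exp(phi_test @ w)  # undo the log scaling of y
        predictions[fname] = y_test
        if save:
            np.savetxt(fname, np.hstack((ids, y_test)), delimiter=',',
                       fmt=['%d', '%f'], header='ID,MEDV', comments='')
    return predictions

# Synthetic demo: 3 scaled features plus a bias column, targets already in log space
rng = np.random.default_rng(2)
phi_demo = np.hstack([rng.random((40, 3)), np.ones((40, 1))])
log_y_demo = phi_demo @ np.array([[0.5], [-0.3], [0.2], [3.0]])
phi_test_demo = np.hstack([rng.random((5, 3)), np.ones((5, 1))])
ids_demo = np.arange(5).reshape(-1, 1)
preds = predict_all(phi_demo, log_y_demo, phi_test_demo, ids_demo,
                    {'a.csv': 2.0, 'b.csv': 1.5}, step=1e-3, save=False)
print({name: values.ravel().round(3) for name, values in preds.items()})
```

With the Boston data, calling `predict_all(phi, y, phi_test, id_test, filenames)` would write one file per entry in `filenames`.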