In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy

We shall work with the dataset found in the file 'murderdata.txt', which is a 20 x 5 data matrix whose columns correspond to:

Index (not for use in analysis)

Number of inhabitants

Percent with incomes below $5000

Percent unemployed

Murders per annum per 1,000,000 inhabitants

**References:**

Helmut Spaeth,
Mathematical Algorithms for Linear Regression,
Academic Press, 1991,
ISBN 0-12-656460-4.

D G Kleinbaum and L L Kupper,
Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, 1978, page 150.

http://people.sc.fsu.edu/~jburkardt/datasets/regression

**What to do?**

We start by loading the data. Today we will study how the number of murders relates to the percentage of inhabitants with incomes below $5000 and to the percentage of unemployed.

In [None]:
data = numpy.loadtxt('murderdata.txt')
N, d = data.shape
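`numpy.loadtxt` parses whitespace-separated numbers into a 2-D array with one row per line of text. Because 'murderdata.txt' is not reproduced here, the sketch below feeds a small stand-in through `io.StringIO`. The three rows and their values are invented for illustration only; the variable names `toy_data`, `n_rows` and `n_cols` are chosen so they do not overwrite `data`, `N` and `d` from the cell above.

```python
import io
import numpy

# Hypothetical stand-in for murderdata.txt: 3 rows x 5 columns, values invented
text = """1 500000 10.0 5.0 8.0
2 700000 20.0 7.0 15.0
3 900000 30.0 9.0 25.0
"""

# loadtxt accepts any file-like object; whitespace is the default delimiter
toy_data = numpy.loadtxt(io.StringIO(text))
n_rows, n_cols = toy_data.shape
print(n_rows, n_cols)  # 3 5
```

On the real file, `N` should be 20 and `d` should be 5.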

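We consider both features simultaneously: the percentage with incomes below $5000 (column 2) and the percentage unemployed (column 3). The target is the number of murders per annum per 1,000,000 inhabitants (column 4).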

In [None]:
t = data[:,4]
X = data[:,2:4]
print("Number of training instances: %i" % X.shape[0])
print("Number of features: %i" % X.shape[1])
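The slice `data[:,2:4]` is half-open, so it keeps columns 2 and 3 and excludes column 4, which is the target. The sketch below spells out the column layout described above as a dictionary. The dictionary keys and the two data rows are invented for illustration.

```python
import numpy

# Column layout described above (column 0 is not used in the analysis)
COLUMNS = {
    "index": 0,
    "inhabitants": 1,
    "pct_income_below_5000": 2,
    "pct_unemployed": 3,
    "murders_per_million": 4,
}

# Invented 2 x 5 stand-in for the real 20 x 5 matrix
toy = numpy.array([
    [1, 500000, 10.0, 5.0, 8.0],
    [2, 700000, 20.0, 7.0, 15.0],
])

# 2:4 selects columns 2 and 3 only (the upper bound is excluded)
X_toy = toy[:, 2:4]
# Same selection, written with the named columns
X_named = toy[:, [COLUMNS["pct_income_below_5000"], COLUMNS["pct_unemployed"]]]
t_toy = toy[:, COLUMNS["murders_per_million"]]

print(X_toy.shape)  # (2, 2)
print(numpy.array_equal(X_toy, X_named))  # True
```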

In [None]:
# NOTE: This template makes use of Python classes. If 
# you are not yet familiar with this concept, you can 
# find a short introduction here: 
# http://introtopython.org/classes.html

class LinearRegression():
    """
    Linear regression implementation.
    """

    def __init__(self):
        
        pass
            
    def fit(self, X, t):
        """
        Fits the linear regression model.

        Parameters
        ----------
        X : Array of shape [n_samples, n_features]
        t : Array of shape [n_samples, 1]
        """        

        # make sure that we have Numpy arrays; also
        # reshape the target array to ensure that we have
        # a N-dimensional Numpy array (ndarray), see
        # https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html
        X = numpy.array(X).reshape((len(X), -1))
        t = numpy.array(t).reshape((len(t), 1))

        # prepend a column of ones
        ones = numpy.ones((X.shape[0], 1))
        X = numpy.concatenate((ones, X), axis=1)

        # compute weights via the normal equations,
        # w = (X^T X)^+ X^T t, using the pseudo-inverse
        self.w = numpy.linalg.pinv(numpy.dot(X.T, X))
        self.w = numpy.dot(self.w, X.T)
        self.w = numpy.dot(self.w, t)
        
        # (2) TODO: Make use of numpy.linalg.solve instead!
        # Reason: Inverting the matrix is not very stable
        # from a numerical perspective; directly solving
        # the linear system of equations is usually better.
                
    def predict(self, X):
        """
        Computes predictions for a new set of points.

        Parameters
        ----------
        X : Array of shape [n_samples, n_features]

        Returns
        -------
        predictions : Array of shape [n_samples, 1]
        """                     
        
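        # (1) Compute the predictions for the array of input
        # points: reshape as in fit, prepend the column of ones
        # for the intercept, then multiply by the weights
        X = numpy.array(X).reshape((len(X), -1))
        ones = numpy.ones((X.shape[0], 1))
        X = numpy.concatenate((ones, X), axis=1)

        predictions = numpy.dot(X, self.w)

        return predictions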
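For TODO (2), one possible approach is to solve the normal equations $X^T X w = X^T t$ directly with `numpy.linalg.solve` instead of forming a (pseudo-)inverse. The sketch below writes this as a standalone function rather than as the template's method; the name `fit_solve` and the synthetic data are assumptions for illustration. Note that `solve` requires $X^T X$ to be non-singular, whereas `pinv` also copes with rank-deficient design matrices.

```python
import numpy

def fit_solve(X, t):
    """Least-squares weights via the normal equations (hypothetical helper)."""
    X = numpy.array(X, dtype=float).reshape((len(X), -1))
    t = numpy.array(t, dtype=float).reshape((len(t), 1))
    # prepend a column of ones for the intercept, as in LinearRegression.fit
    X = numpy.concatenate((numpy.ones((X.shape[0], 1)), X), axis=1)
    # solve (X^T X) w = X^T t without computing an inverse
    A = numpy.dot(X.T, X)
    b = numpy.dot(X.T, t)
    return numpy.linalg.solve(A, b)

# Synthetic check: targets generated exactly as t = 1 + 2*x1 + 3*x2
rng = numpy.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(20, 2))
t_demo = 1 + 2 * X_demo[:, 0] + 3 * X_demo[:, 1]
w_demo = fit_solve(X_demo, t_demo)
print(w_demo.ravel())  # approximately [1. 2. 3.]
```

Because the synthetic targets contain no noise, the recovered weights should match the generating coefficients up to floating-point error, and they should agree with the `pinv`-based weights computed in `fit`.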

Next, let us instantiate a LinearRegression object and call its fit method.

In [None]:
model = LinearRegression()
model.fit(X, t)

We can also print the computed model coefficients (the intercept first, followed by one weight per feature):

In [None]:
print("Model coefficients:")
print(model.w)
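Because `fit` prepends a column of ones, `model.w` has shape (3, 1): the first entry is the intercept $w_0$, and the next two multiply the income and unemployment percentages, so a prediction is $y = w_0 + w_1 x_1 + w_2 x_2$. The sketch below evaluates that formula by hand and in matrix form. The weights and the input row are made up and are not the fitted values.

```python
import numpy

# Made-up weights in the same layout as model.w: [[w0], [w1], [w2]]
w = numpy.array([[-30.0], [1.5], [4.0]])
# One invented input row: 20% low income, 8% unemployed
x = numpy.array([20.0, 8.0])

# Manual formula: intercept plus weighted sum of the features
y_manual = w[0, 0] + w[1, 0] * x[0] + w[2, 0] * x[1]
# Same computation in matrix form, with a leading 1 for the intercept
y_matrix = numpy.dot(numpy.concatenate(([1.0], x)), w)[0]
print(y_manual, y_matrix)  # 32.0 32.0
```

The matrix form is what `predict` does for every row at once.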

In [None]:
# (c) evaluation of results
preds = model.predict(X)

plt.figure(figsize=(10,10))
plt.scatter(t, preds)
plt.xlabel("Murders per annum per 1,000,000 inhabitants")
plt.ylabel("Predictions")
plt.xlim([0,50])
plt.ylim([0,50])
plt.title("Evaluation")
plt.show()
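The scatter plot gives a visual impression of the fit; a single number such as the root-mean-square error (RMSE) can complement it. Below is a minimal sketch; the helper name `rmse` and the example values are hypothetical. One detail matters here: `model.predict` returns shape (N, 1) while `t` has shape (N,), so subtracting them directly would broadcast to an (N, N) matrix. The helper therefore flattens both inputs first.

```python
import numpy

def rmse(t, preds):
    """Root-mean-square error between targets and predictions (hypothetical helper)."""
    t = numpy.asarray(t, dtype=float).ravel()
    # flatten (N, 1) predictions to (N,) so the subtraction is element-wise
    preds = numpy.asarray(preds, dtype=float).ravel()
    return numpy.sqrt(numpy.mean((t - preds) ** 2))

# Invented example: errors of 1, -1, 2, -2 give RMSE = sqrt(10 / 4)
t_example = numpy.array([10.0, 20.0, 30.0, 40.0])
p_example = numpy.array([[9.0], [21.0], [28.0], [42.0]])  # shaped like predict output
print(rmse(t_example, p_example))
```

In the notebook, this could be applied as `rmse(t, preds)`.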