# Linear Classification

In this lab you will implement parts of a linear classification model using the regularized empirical risk minimization principle. By completing this lab and analysing the code, you will gain a deeper understanding of this type of model and of gradient descent.


## Problem Setting

The dataset describes the diagnosis of cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal (1) and abnormal (0). The training data contains 80 SPECT images from which 22 binary features have been extracted. The goal is to predict the label for an unseen test set of 187 tomography images.

In [1]:
import urllib.request
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# download the training and test splits from the UCI repository
urllib.request.urlretrieve("http://archive.ics.uci.edu/ml/machine-learning-databases/spect/SPECT.train", "SPECT.train")
urllib.request.urlretrieve("http://archive.ics.uci.edu/ml/machine-learning-databases/spect/SPECT.test", "SPECT.test")

df_train = pd.read_csv('SPECT.train',header=None)
df_test = pd.read_csv('SPECT.test',header=None)

train = df_train.to_numpy()
test = df_test.to_numpy()

y_train = train[:,0]
X_train = train[:,1:]
y_test = test[:,0]
X_test = test[:,1:]



### Exercise 1

Analyze the function learn_reg_ERM(X,y,lambda) which for a given $n\times m$ data matrix $\textbf{X}$ and binary class label $\textbf{y}$ learns and returns a linear model $\textbf{w}$.
The binary class label has to be transformed so that its range is $\left \{-1,1 \right \}$. 
The trade-off parameter between the empirical loss and the regularizer is given by $\lambda > 0$. 
Try to understand each step of the learning algorithm and comment each line.
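
For orientation, the objective that learn_reg_ERM appears to minimize can be written out as follows. This is a reading of the code in this lab (hinge loss from Exercise 2, $\mathcal{L}_2$-regularizer from Exercise 3), not a formula stated in the lab text. Note that the code sums the per-sample losses rather than averaging them.

$$\min_{\textbf{w}} \; \sum_{i=1}^{n} \max(0, 1 - y_{i}\,\textbf{x}_{i}^{T}\textbf{w}) + \frac{\lambda}{2} \textbf{w}^{T}\textbf{w}$$

Under the same reading, the gradient used in each update is $\textbf{X}^{T}\textbf{g}_{loss} + \lambda \textbf{w}$, where $\textbf{g}_{loss}$ holds the derivatives of the per-sample losses with respect to the function values $\textbf{h} = \textbf{X}\textbf{w}$.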

In [2]:
def learn_reg_ERM(X,y,lbda):
    # maximum number of gradient descent iterations
    max_iter = 200
    # convergence tolerance on the norm of the update step
    e  = 0.001
    # initial step size (learning rate)
    alpha = 1.
    # random initial weights, one per feature (22 for the SPECT data)
    w = np.random.randn(X.shape[1])
    for k in np.arange(max_iter):
        # function values of the linear model for all samples: h = Xw
        h = np.dot(X,w)
        # per-sample loss and its derivative with respect to h
        l,lg = loss(h, y)
        # report the mean loss of the current model
        print('loss: {}'.format(np.mean(l)))
        # value and gradient of the regularizer at w
        r,rg = reg(w, lbda)
        # gradient of the full objective with respect to w:
        # chain rule for the loss term plus the regularizer gradient
        g = np.dot(X.T,lg) + rg
        if (k > 0):
            # adapt the step size using the change between successive gradients
            alpha = alpha * (np.dot(g_old.T,g_old))/(np.dot((g_old - g).T,g_old))
        # gradient step on the weights
        w = w - alpha * g
        # stop once the norm of the update falls below the tolerance
        if (np.linalg.norm(alpha * g) < e):
            break
        # remember the gradient for the next step-size update
        g_old = g
    # return the trained weights
    return w
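
The step-size update inside the loop is not named in the lab. One way to read it, offered here as an interpretation rather than the author's stated method, is as a Barzilai-Borwein step. With $\textbf{s} = \textbf{w}_{k} - \textbf{w}_{k-1} = -\alpha_{k-1}\textbf{g}_{k-1}$ and $\textbf{d} = \textbf{g}_{k} - \textbf{g}_{k-1}$ (symbols introduced here for illustration):

$$\alpha_{k} = \frac{\textbf{s}^{T}\textbf{s}}{\textbf{s}^{T}\textbf{d}} = \frac{\alpha_{k-1}^{2}\,\textbf{g}_{k-1}^{T}\textbf{g}_{k-1}}{-\alpha_{k-1}\,\textbf{g}_{k-1}^{T}(\textbf{g}_{k} - \textbf{g}_{k-1})} = \alpha_{k-1}\,\frac{\textbf{g}_{k-1}^{T}\textbf{g}_{k-1}}{(\textbf{g}_{k-1} - \textbf{g}_{k})^{T}\textbf{g}_{k-1}}$$

The last expression matches the code, with g_old playing the role of $\textbf{g}_{k-1}$ and g the role of $\textbf{g}_{k}$. The denominator can be zero or negative, so the step size is not guaranteed to stay positive or finite.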

### Exercise 2

Fill in the code for the function loss(h,y) which computes the hinge loss and its gradient. 
This function takes a given vector $\textbf{y}$ with the true labels $\in \left \{-1,1\right \}$ and a vector $\textbf{h}$ with the function values of the linear model as inputs. The function returns a vector $\textbf{l}$ with the hinge loss $l_{i} = \max(0, 1 - y_{i} h_{i})$ and a vector $\textbf{g}$ with the gradients of the hinge loss at the points $h_i$. The derivative of the hinge loss $l_{i}$ with respect to $h_{i}$ is $g_{i} = -y_{i}$ if $l_{i} > 0$, else $g_{i} = 0$. By the chain rule, the gradient with respect to the weight vector $\textbf{w}$ is then $-y_{i} \textbf{x}_{i}$ for every sample with $l_{i} > 0$, which learn_reg_ERM computes as $\textbf{X}^{T}\textbf{g}$.

In [3]:
def loss(h, y):
    # hinge loss per sample: max(0, 1 - y_i * h_i)
    l = np.maximum(0, 1 - y*h)
    # derivative with respect to h_i: -y_i if l_i > 0, else 0
    g = -y*(l > 0)
    return l, g
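
A quick way to gain confidence in the gradient is a finite-difference check. The sketch below is self-contained and makes some assumptions: hinge_loss_sketch repeats the computation of loss(h, y) so the block runs on its own, and the toy values of h and y are hypothetical, chosen so that no margin sits exactly at the kink $y_{i} h_{i} = 1$, where the hinge loss is not differentiable.

```python
import numpy as np

def hinge_loss_sketch(h, y):
    # Same computation as loss(h, y) in this lab, repeated so the block runs alone.
    l = np.maximum(0, 1 - y * h)
    g = -y * (l > 0)
    return l, g

# Hypothetical toy inputs; none of the margins y*h equals 1 exactly.
h = np.array([2.0, 0.3, -0.5, -1.5])
y = np.array([1.0, 1.0, -1.0, -1.0])

l, g = hinge_loss_sketch(h, y)

# Central finite differences with respect to each h_i.
# Each l_i depends only on h_i, so all entries can be perturbed at once.
eps = 1e-6
l_plus, _ = hinge_loss_sketch(h + eps, y)
l_minus, _ = hinge_loss_sketch(h - eps, y)
g_numeric = (l_plus - l_minus) / (2 * eps)

print(l)
print(g)
print(np.allclose(g, g_numeric))
```

The check compares against the derivative with respect to $\textbf{h}$, not $\textbf{w}$; the multiplication by $\textbf{X}^{T}$ happens later in learn_reg_ERM.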

In [4]:
w = np.random.randn(X_train.shape[1])

### Exercise 3

Fill in the code for the function reg(w,lambda) which computes the $\mathcal{L}_2$-regularizer and the gradient of the regularizer function at point $\textbf{w}$. 


$$r = \frac{\lambda}{2} \textbf{w}^{T}\textbf{w}$$

$$g = \lambda \textbf{w}$$

In [5]:
def reg(w, lbda):
    # inner product w^T w gives a scalar regularizer value
    r = lbda/2 * np.dot(w, w)
    g = lbda * w
    return r, g
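
The two formulas for $r$ and $g$ can be checked numerically. The sketch below is self-contained and rests on assumptions: l2_reg_sketch is a hypothetical stand-in that evaluates the formulas directly, with $\textbf{w}^{T}\textbf{w}$ computed as an inner product so that $r$ is a scalar, and the values of w and lbda are made up for illustration.

```python
import numpy as np

def l2_reg_sketch(w, lbda):
    # r = (lambda / 2) * w^T w is a scalar; g = lambda * w has the shape of w.
    r = lbda / 2 * np.dot(w, w)
    g = lbda * w
    return r, g

w = np.array([1.0, -2.0, 0.5])   # hypothetical weights
lbda = 0.1                       # hypothetical trade-off parameter

r, g = l2_reg_sketch(w, lbda)

# Central finite differences, one coordinate of w at a time.
eps = 1e-6
g_numeric = np.zeros_like(w)
for j in range(w.size):
    step = np.zeros_like(w)
    step[j] = eps
    r_plus, _ = l2_reg_sketch(w + step, lbda)
    r_minus, _ = l2_reg_sketch(w - step, lbda)
    g_numeric[j] = (r_plus - r_minus) / (2 * eps)

print(np.ndim(r), g.shape)
print(np.allclose(g, g_numeric))
```

An elementwise product of w with itself would instead return a vector, which does not match the scalar $r$ in the formula.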

### Exercise 4

Fill in the code for the function predict(w,x) which predicts the class label $y$ for a data point $\textbf{x}$ or a matrix $\textbf{X}$ of data points (row-wise) for a previously trained linear model $\textbf{w}$. If only a single data point is given, the function should return a scalar value. If a matrix is given, it should return a vector of predictions.

In [6]:
def predict(w, X):
    preds = 2*(np.dot(X,w) >0)  -1
    return preds
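
To illustrate the scalar and vector cases, here is a small self-contained sketch. predict_sketch repeats the rule used in predict so the block runs alone; the weights and data points are hypothetical.

```python
import numpy as np

def predict_sketch(w, X):
    # Same rule as predict(w, X) in this lab:
    # +1 where the linear score is positive, -1 otherwise (a score of 0 maps to -1).
    return 2 * (np.dot(X, w) > 0) - 1

w = np.array([1.0, -1.0])            # hypothetical weights
x_single = np.array([1.0, 0.0])      # one data point
X_batch = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])     # three data points, row-wise

p_single = predict_sketch(w, x_single)
p_batch = predict_sketch(w, X_batch)

print(np.ndim(p_single), p_single)   # 0-dimensional result for a single point
print(p_batch.shape, p_batch)        # one prediction per row
```

Because np.dot handles both a vector and a matrix as its first argument, the same expression covers both cases without an explicit branch.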

### Exercise 5

#### 5.1 
Train a linear model on the training data and then classify all 187 test instances using the function predict. 
Note that the given class labels are in the range $\left \{0,1 \right \}$, whereas the learning algorithm expects labels in the range $\left \{-1,1 \right \}$, so transform them first. Then compute the accuracy of your trained linear model on both the training and the test data. 
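
The label transformation is $y \mapsto 2y - 1$. In a notebook, applying it a second time (for example by re-running a cell) would map $\left \{-1,1 \right \}$ to $\left \{-3,1 \right \}$. The sketch below shows one possible guard; to_signed_labels is a hypothetical helper, not part of the lab code.

```python
import numpy as np

def to_signed_labels(y):
    # Hypothetical helper: map {0, 1} labels to {-1, 1};
    # labels that are already in {-1, 1} are returned unchanged.
    y = np.asarray(y)
    values = set(np.unique(y).tolist())
    if values <= {0, 1}:
        return 2 * y - 1
    if values <= {-1, 1}:
        return y
    raise ValueError("expected labels in {0, 1} or {-1, 1}")

y01 = np.array([0, 1, 1, 0])
y_signed = to_signed_labels(y01)
y_again = to_signed_labels(y_signed)   # second call leaves the labels as they are

print(y_signed)
print(np.array_equal(y_signed, y_again))
```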

In [7]:
y_train = 2 * y_train -1
y_test = 2 * y_test -1

w =  learn_reg_ERM(X_train,y_train,100)


loss: 1.6997279968032148
loss: 43.47041326062724
loss: 1.0133054523700866
loss: 0.7931857498351504
loss: 0.8627522143122558
loss: 0.8246089870480443
loss: 0.8230282229045887
loss: 0.8274494840005208
loss: 0.8254105139494946
loss: 0.82125
loss: 0.8248817969321761
loss: 0.834125
loss: 0.8244096300652952
loss: 0.8234323780894985
loss: 0.8248848925955568
loss: 0.8244759869552883
loss: 0.8212499999999998
loss: 0.8268102741389433
loss: 0.8256785699963419
loss: 0.8262500177302708


In [8]:
predict_train = predict(w, X_train)
print("Train Accuracy: ",accuracy_score(y_train,np.round(predict_train)))
predict_test = predict(w, X_test)
print("Test Accuracy: ",accuracy_score(y_test,np.round(predict_test)))

Train Accuracy:  0.625
Test Accuracy:  0.9358288770053476


#### 5.2
Compare the accuracy of the linear model with the accuracy of a random forest and a decision tree on the training and test data set.

In [9]:
from sklearn.ensemble import RandomForestClassifier

In [10]:
rf = RandomForestClassifier()
rf.fit(X_train,y_train)
print("Random Forest Accuracy")
y_predict_test = rf.predict(X_test)
print("Test Accuracy: ",accuracy_score(y_test,y_predict_test))
y_predict_train = rf.predict(X_train)
print("Train Accuracy: ",accuracy_score(y_train,y_predict_train))

Random Forest Accuracy
Test Accuracy:  0.7647058823529411
Train Accuracy:  0.925




In [11]:
from sklearn.tree import DecisionTreeClassifier

clf_tree = DecisionTreeClassifier(criterion='entropy', max_depth=1)
clf_tree.fit(X_train,y_train)
print("Decision Tree Accuracy")
y_predict_test = clf_tree.predict(X_test)
print("Test Accuracy: ",accuracy_score(y_test,y_predict_test))
y_predict_train = clf_tree.predict(X_train)
print("Train Accuracy: ",accuracy_score(y_train,y_predict_train))

Decision Tree Accuracy
Test Accuracy:  0.6149732620320856
Train Accuracy:  0.725
