***1.1.1 Please provide the definition of the Hinge Loss Function with offset $\theta_0$***


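Hinge loss, where $z = y(\theta^T x)$:

$\text{Loss}_h(z) = \max\{1 - z, 0\} = \max\{1 - y(\theta^T x), 0\}$

Hinge loss with offset $\theta_0$, where $z = y(\theta^T x + \theta_0)$:

$\text{Loss}_h(z) = \max\{1 - z, 0\} = \max\{1 - y(\theta^T x + \theta_0), 0\}$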
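As a quick sanity check of the definition, the formula can be evaluated directly with NumPy. This is a minimal sketch assuming labels $y \in \{-1, +1\}$; the function name `hinge_loss_with_offset` and the sample points are illustrative only. The three calls cover a point beyond the margin, a point on the decision boundary and a misclassified point.

```python
import numpy as np

def hinge_loss_with_offset(x, y, theta, theta_0):
    """Hinge loss max{1 - y(theta^T x + theta_0), 0} for a single example.

    x: feature vector, y: label in {-1, +1}, theta: weights, theta_0: offset.
    """
    z = y * (np.dot(theta, x) + theta_0)  # z = y(theta^T x + theta_0)
    return max(1.0 - z, 0.0)

theta = np.array([1.0, -1.0])
theta_0 = 0.5
print(hinge_loss_with_offset(np.array([2.0, 0.0]), 1, theta, theta_0))   # z = 2.5  -> loss 0.0
print(hinge_loss_with_offset(np.array([0.0, 0.5]), 1, theta, theta_0))   # z = 0.0  -> loss 1.0
print(hinge_loss_with_offset(np.array([1.0, 0.0]), -1, theta, theta_0))  # z = -1.5 -> loss 2.5
```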

***1.1.2 Describe how to update weights $\theta$ and $\theta_0$ to minimize the hinge loss function using stochastic sub-gradient descent***


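Stochastic gradient descent (SGD) minimizes a loss function such as the hinge loss by updating the parameters using one randomly selected training example at a time. Because the hinge loss is not differentiable where $y(\theta^T x + \theta_0) = 1$, a sub-gradient is used there, hence *stochastic sub-gradient descent*. Minimizing the hinge loss penalizes every point that is misclassified or lies inside the margin, pushing each point to the correct side of the decision boundary with $y(\theta^T x + \theta_0) \geq 1$.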

The hinge loss (without offset) is $\text{Loss}_h = \max\{1 - y_t(\theta \cdot x_t), 0\}$. If $y_t(\theta \cdot x_t) \geq 1$, the point is correctly classified with a margin of at least 1 and the loss is 0. If $y_t(\theta \cdot x_t) < 1$, the point is either inside the margin or misclassified, so the loss is positive.

To optimize the classifier, we compute the (sub-)gradient of the hinge loss with respect to $\theta$. If $y_t(\theta \cdot x_t) \geq 1$, the loss is 0, so no update is needed. If $y_t(\theta \cdot x_t) < 1$, the weight vector must be moved in the direction that increases $y_t(\theta \cdot x_t)$.

SGD updates the weight vector $\theta$ using one randomly selected sample $(x_t, y_t)$ at a time:

$\theta^{(k+1)} = \theta^{(k)} - \eta_k \nabla_\theta \text{Loss}_h$

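In SGD, $\theta$ is initialized to 0. At each iteration a random training example $(x_t, y_t)$ is selected and its hinge loss is evaluated. If $y_t(\theta \cdot x_t) \geq 1$, no update is needed. If $y_t(\theta \cdot x_t) < 1$, the sub-gradient of the hinge loss is $\nabla_\theta \text{Loss}_h = -y_t x_t$, which gives the direction in which to adjust $\theta$. Substituting into the update rule gives $\theta \leftarrow \theta - \eta_k(-y_t x_t) = \theta + \eta_k y_t x_t$, where $\eta_k$ is the learning rate.

With the offset, $\theta_0$ is also initialized to 0 and both updates are triggered by $y_t(\theta \cdot x_t + \theta_0) < 1$. $\theta$ is updated as above, and since $\nabla_{\theta_0} \text{Loss}_h = -y_t$, the offset is updated as $\theta_0 \leftarrow \theta_0 + \eta_k y_t$. This is repeated until the maximum number of iterations is reached.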
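A single update step can be traced with concrete numbers. This is a minimal sketch with an illustrative helper name (`hinge_sgd_step`) and made-up values; it applies the sub-gradients $-y_t x_t$ and $-y_t$ to $\theta$ and $\theta_0$ whenever $y_t(\theta \cdot x_t + \theta_0) < 1$, and leaves both unchanged otherwise.

```python
import numpy as np

def hinge_sgd_step(theta, theta_0, x_t, y_t, eta):
    """One stochastic sub-gradient step on the hinge loss for example (x_t, y_t)."""
    if y_t * (np.dot(theta, x_t) + theta_0) < 1:
        # inside the margin or misclassified: sub-gradients are -y_t x_t and -y_t
        theta = theta + eta * y_t * x_t
        theta_0 = theta_0 + eta * y_t
    # otherwise the loss is 0 and the zero sub-gradient leaves both unchanged
    return theta, theta_0

theta, theta_0 = np.zeros(2), 0.0
x_t, y_t = np.array([1.0, 2.0]), -1
theta, theta_0 = hinge_sgd_step(theta, theta_0, x_t, y_t, eta=0.1)
print(theta, theta_0)  # [-0.1 -0.2] -0.1
```

Starting from zero, every point has $y_t(\theta \cdot x_t + \theta_0) = 0 < 1$, so the first sampled example always triggers an update.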



In [53]:
import numpy as np


In [54]:
def SGD(X, y, loss_gradient, lr=0.01, max_iter=100):
    """
    Stochastic Gradient Descent

    X: numpy array of shape (n_samples, n_features) - features
    y: numpy array of shape (n_samples,) - target values
    loss_gradient: function (x_t, y_t, theta) -> (sub-)gradient of the loss for one example
    lr: learning rate
    max_iter: number of iterations
    """
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)  # initialize theta to 0

    for k in range(max_iter):
        t = np.random.randint(n_samples)  # index of a random training example
        grad = loss_gradient(X[t], y[t], theta)
        theta -= lr * grad  # step against the (sub-)gradient

    return theta

In [55]:
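def hinge_loss_gradient(x_t, y_t, theta):
    # sub-gradient of max{1 - y_t (theta . x_t), 0} with respect to theta
    if y_t * np.dot(x_t, theta) < 1:
        return -y_t * x_t  # inside the margin or misclassified

    return np.zeros_like(theta)  # loss is 0: zero sub-gradient, no update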
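The `SGD` function above learns only $\theta$. One way to also learn $\theta_0$ is to apply its sub-gradient directly inside the loop. The sketch below assumes labels in $\{-1, +1\}$ and, like the description above, has no regularization term; the function name `sgd_hinge_with_offset`, the `seed` parameter and the toy data are illustrative, not part of the assignment. Indices are drawn with NumPy's `default_rng` generator.

```python
import numpy as np

def sgd_hinge_with_offset(X, y, lr=0.1, max_iter=1000, seed=0):
    """Stochastic sub-gradient descent on the (unregularized) hinge loss.

    Returns the learned weights theta and offset theta_0.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)  # initialize theta to 0
    theta_0 = 0.0                 # initialize the offset to 0
    for _ in range(max_iter):
        t = rng.integers(n_samples)  # pick one training example uniformly at random
        if y[t] * (X[t] @ theta + theta_0) < 1:
            # sub-gradients are -y_t x_t (theta) and -y_t (theta_0)
            theta += lr * y[t] * X[t]
            theta_0 += lr * y[t]
    return theta, theta_0

# Toy linearly separable data (illustrative only)
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
theta, theta_0 = sgd_hinge_with_offset(X, y)
print(np.sign(X @ theta + theta_0))
```

An equivalent alternative is to append a column of ones to `X` and reuse `SGD` with `hinge_loss_gradient`; the last entry of the returned vector then plays the role of $\theta_0$.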


***1.2.1 Implement the Perceptron Algorithm of Linear Classification with offset***

In [56]:
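def perceptron_with_offset(theta, offset, feature, target, num_features, max_iter=100):
    """Perceptron with offset. Makes up to max_iter passes over the data and
    stops early once a full pass makes no mistakes. num_features is unused
    (kept so existing calls still work). Returns [theta, offset, training_error]."""
    theta = np.array(theta, dtype=float)  # work on a float copy of the weights
    for it in range(max_iter):
        mistakes = 0
        for i in range(len(target)):
            # mistake (or point on the boundary) if y_i (theta . x_i + theta_0) <= 0
            if target[i] * (np.dot(theta, feature[i]) + offset) <= 0:
                theta = theta + target[i] * feature[i]
                offset = offset + target[i]
                mistakes += 1
        if mistakes == 0:  # converged: every point is on the correct side
            break

    # training error: fraction of points misclassified by the final classifier
    predictions = np.sign(feature @ theta + offset)
    err = np.mean(predictions != target)
    return [theta, offset, err]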
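A single perceptron update on hand-picked numbers (illustrative, not from the homework data) shows how one mistake moves both $\theta$ and $\theta_0$, following the same update rule as `perceptron_with_offset`. The update raises $y_i(\theta \cdot x_i + \theta_0)$ by $\lVert x_i \rVert^2 + 1$; here that is enough to flip the sign, though in general one update need not fix the point.

```python
import numpy as np

theta, theta_0 = np.array([1.0, 0.0]), 0.0
x_i, y_i = np.array([-1.0, 2.0]), 1

# mistake check: y_i (theta . x_i + theta_0) = 1 * (-1 + 0) = -1 <= 0
if y_i * (np.dot(theta, x_i) + theta_0) <= 0:
    theta = theta + y_i * x_i  # [0.0, 2.0]
    theta_0 = theta_0 + y_i    # 1.0

print(theta, theta_0)
print(y_i * (np.dot(theta, x_i) + theta_0))  # 1 * (4 + 1) = 5.0 > 0
```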

***1.2.2 Train Linear Classifier with offset using Perceptron Algorithm and Evaluate***

In [57]:
import pandas as pd

In [58]:
training_data = pd.read_csv('data/hw1_train_1_5.csv')


In [59]:
X = training_data.iloc[:, :2].to_numpy()
y = training_data.iloc[:, 2].values

num_features = X.shape[1]
num_epochs = 1



In [60]:
theta = np.random.randn(num_features) * 0.01  # small random initial weights
offset = 0.0
num_epochs = 1

for i in range(num_epochs):
    theta, offset, err = perceptron_with_offset(theta, offset, X, y, num_features)

print('learned weights:', theta)
print('learned bias:', offset)
print('training error:', err)

learned weightd: [-439.74367432   51.72785564]
learned bias: 621.0
training error: 0


In [61]:
theta = np.random.randn(num_features) * 0.01  # small random initial weights
offset = 0.0
num_epochs = 5

for i in range(num_epochs):
    theta, offset, err = perceptron_with_offset(theta, offset, X, y, num_features)

print('theta:', theta)
print('offset:', offset)

theta: [-2198.79817644   258.64768387]
offset: 3105.0
