# Problem 8 - Logistic Regression


To update the weight vector $\mathbf{w}$ with (stochastic) gradient descent, we need the gradient of the error function.

The pointwise error in logistic regression is the cross-entropy error (see slide 3, lecture 10):

$e(h(\mathbf{x}), y) = \ln(1 + e^{-y \mathbf{w}^T \mathbf{x} })$
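As a quick sanity check, the error can be evaluated directly in NumPy. The helper name `pointwise_error` is ours (hypothetical, not from the lecture). `np.logaddexp(0, s)` computes $\ln(1 + e^s)$ and is used here instead of `math.log(1 + math.exp(...))` only for numerical stability; both give the same value.

```python
import numpy as np

def pointwise_error(w, x, y):
    """Cross-entropy error e(h(x), y) = ln(1 + exp(-y * w^T x)) for one point.

    np.logaddexp(0, s) = ln(e^0 + e^s) = ln(1 + e^s), evaluated without
    overflow even when s is large.
    """
    return np.logaddexp(0.0, -y * np.dot(w, x))

# Usage: with w = 0 every point has error ln 2, whatever its label.
w = np.zeros(3)
x = np.array([1.0, 0.5, -0.3])     # x_0 = 1 is the bias coordinate
print(pointwise_error(w, x, 1.0))  # ln 2, about 0.6931
```

A point classified correctly with a large margin $y\,\mathbf{w}^T\mathbf{x}$ has error close to 0, while a badly misclassified point has error roughly equal to $-y\,\mathbf{w}^T\mathbf{x}$.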


By the chain rule, the partial derivatives $\frac{\partial e}{\partial w_k}$ are:

$$\frac{\partial e}{\partial w_k} = \frac{1}{1 + e^{-y \mathbf{w}^T \mathbf{x}}} \cdot e^{-y \mathbf{w}^T \mathbf{x}} \cdot (-y x_k) = -\frac{y \, x_k}{e^{y \mathbf{w}^T \mathbf{x}} + 1}$$

with $k \in \{0, 1, 2\}$; the last step multiplies numerator and denominator by $e^{y \mathbf{w}^T \mathbf{x}}$ (see also slide 23, lecture 9).
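The partial derivatives can be checked numerically with central finite differences. This is a minimal sketch: the function names and the step size `h = 1e-6` are our own choices, not from the lecture. If the derivation is right, the two results agree up to round-off for every $k$.

```python
import numpy as np

def error(w, x, y):
    # e(w) = ln(1 + exp(-y w^T x)), written stably with logaddexp
    return np.logaddexp(0.0, -y * np.dot(w, x))

def grad_analytic(w, x, y):
    # de/dw_k = -y x_k / (e^{y w^T x} + 1), computed for all k at once
    return -y * x / (np.exp(y * np.dot(w, x)) + 1.0)

def grad_numeric(w, x, y, h=1e-6):
    # central difference for each coordinate k:
    # (e(w + h u_k) - e(w - h u_k)) / (2h), with u_k the k-th unit vector
    g = np.zeros_like(w)
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = h
        g[k] = (error(w + step, x, y) - error(w - step, x, y)) / (2 * h)
    return g

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, 3)
x = np.array([1.0, *rng.uniform(-1, 1, 2)])   # x_0 = 1 (bias coordinate)
y = -1.0
# maximum deviation between the two gradients: close to 0 (round-off level)
print(np.max(np.abs(grad_analytic(w, x, y) - grad_numeric(w, x, y))))
```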

So the gradient of $e$ is given by:

$$
\begin{aligned}
\nabla e &= \left[\frac{\partial e}{\partial w_0}, \frac{\partial e}{\partial w_1}, \frac{\partial e}{\partial w_2}\right] \\
&= \left[-\frac{y \, x_0}{e^{y \mathbf{w}^T \mathbf{x}} + 1}, -\frac{y \, x_1}{e^{y \mathbf{w}^T \mathbf{x}} + 1}, -\frac{y \, x_2}{e^{y \mathbf{w}^T \mathbf{x}} + 1}\right] \\
&= \frac{-y}{e^{y \mathbf{w}^T \mathbf{x}} + 1} [x_0, x_1, x_2] \\
&= \frac{-y \, \mathbf{x}}{e^{y \mathbf{w}^T \mathbf{x}} + 1}
\end{aligned}
$$
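With the vector form of the gradient, one stochastic gradient descent step on a single point is $\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla e$. The sketch below assumes $\eta = 0.01$ (the value used in the simulation); `gradient` and `sgd_step` are hypothetical helper names.

```python
import numpy as np

def gradient(w, x, y):
    # Vector form: grad e = -y x / (e^{y w^T x} + 1)
    return -y * x / (np.exp(y * np.dot(w, x)) + 1.0)

def sgd_step(w, x, y, eta=0.01):
    # One stochastic gradient descent update: w <- w - eta * grad e
    return w - eta * gradient(w, x, y)

w = np.zeros(3)
x = np.array([1.0, 0.2, -0.4])
y = 1.0
w_new = sgd_step(w, x, y)
# at w = 0 the gradient is -y x / 2, so w_new = eta * y * x / 2
print(w_new)   # approximately [0.005, 0.001, -0.002]
```

The step moves $\mathbf{w}$ in the direction of $y\,\mathbf{x}$, which increases the margin $y\,\mathbf{w}^T\mathbf{x}$ and therefore lowers the error on that point.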
________

We can also check the **partial derivative** with WolframAlpha (treating $w$ and $x$ as scalars):

https://www.wolframalpha.com/input/?i=d%2Fdw(ln(1+%2B+exp(-y+*+w+*+x)))


________

# Problem 9

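Problem 9 asks for the average number of epochs that stochastic gradient descent needs to converge. The code below answers both problems: over 100 runs it averages the out-of-sample cross-entropy error $E_{out}$ (Problem 8) and the number of epochs (Problem 9).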
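Written out, the training loop applies, for each point $(\mathbf{x}_n, y_n)$ of a fresh random permutation of the $N = 100$ training points, the update

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla e_n = \mathbf{w} + \eta \, \frac{y_n \mathbf{x}_n}{1 + e^{y_n \mathbf{w}^T \mathbf{x}_n}}, \qquad \eta = 0.01.$$

Writing $\mathbf{w}^{(t)}$ for the weights after epoch $t$ (with $\mathbf{w}^{(0)} = \mathbf{0}$), training stops after the first epoch $t$ for which

$$\left\| \mathbf{w}^{(t)} - \mathbf{w}^{(t-1)} \right\| < 0.01,$$

and that $t$ is the number of epochs counted for the run.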

```python
import random
import math
import numpy as np

def problem8_9():

    RUNS = 100
    E_out_total = 0
    epoch_total = 0

    for run in range(RUNS):
        # create training set with N = 100 points via separating line

        # target function f: a line through two random points A, B in [-1,1] x [-1,1]
        A = np.random.uniform(-1, 1, 2)
        B = np.random.uniform(-1, 1, 2)

        # the line can be described by x2 = m*x1 + b, where m is the slope
        m = (B[1] - A[1]) / (B[0] - A[0])
        b = B[1] - m * B[0]
        w_f = np.array([b, m, -1])      # f(x) = sign(b + m*x1 - x2)

        #-----------------------

        # Pick N data points (x1, x2) uniformly from the box [-1,1] x [-1,1]
        N = 100
        x1 = np.random.uniform(-1, 1, N)
        x2 = np.random.uniform(-1, 1, N)

        X = np.transpose(np.array([np.ones(N), x1, x2]))           # input, x_0 = 1

        # Classify these points with the target function
        y_f = np.sign(np.dot(X, w_f))

        #-----------------------

        # Run logistic regression with stochastic gradient descent,
        # starting from the zero weight vector
        eta = 0.01
        w_g = np.zeros(3)       # weight vector for hypothesis g

        # each iteration of this loop is one epoch
        for t in range(10**5):

            # create permutation of data points
            indices = list(range(N))
            random.shuffle(indices)
            w_old = w_g.copy()

            # one SGD update per data point, in permuted order
            for index in indices:
                xn = X[index, :]                 # pick a point
                yn = y_f[index]
                grad_e = -yn * xn / (1 + math.exp(yn * np.dot(w_g, xn)))

                # update w
                w_g = w_g - eta * grad_e

            # after each epoch check how much w_g changed
            if np.linalg.norm(w_g - w_old) < 0.01:
                break

        epoch_total += t + 1        # t starts at 0, so t + 1 epochs were run

        # Generate 1000 test points to calculate E_out
        N_test = 1000
        x1_test = np.random.uniform(-1, 1, N_test)                  # 1000 points
        x2_test = np.random.uniform(-1, 1, N_test)
        X_test = np.array([np.ones(N_test), x1_test, x2_test]).T    # feature matrix

        y_f_test = np.sign(np.dot(X_test, w_f))                     # true classification

        # Calculate E_out via cross entropy error
        E_out = 0
        for i in range(N_test):
            E_out += math.log(1 + math.exp(-y_f_test[i] * np.dot(X_test[i, :], w_g)))

        E_out_total += (E_out / N_test)

    E_out_avg = E_out_total / RUNS
    epoch_avg = epoch_total / RUNS

    return (E_out_avg, epoch_avg)


E_out_avg, epoch_avg = problem8_9()
print("Average cross entropy error E_out over 100 runs: ", E_out_avg)
print("Average number of epochs over 100 runs: ", epoch_avg)
```
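The per-point loop that computes $E_{out}$ can also be written as one vectorized NumPy expression. This is a sketch, not the notebook's method: `cross_entropy_error` is a hypothetical helper, and `np.logaddexp(0, s)` replaces `math.log(1 + math.exp(s))`, which would raise `OverflowError` once `s` exceeds roughly 709.

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """Mean of ln(1 + exp(-y_n * w^T x_n)) over the rows x_n of X."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

# Usage on a few hypothetical points (x_0 = 1 in the first column)
rng = np.random.default_rng(1)
X_demo = np.column_stack([np.ones(5), rng.uniform(-1, 1, (5, 2))])
y_demo = np.where(X_demo[:, 1] >= 0, 1.0, -1.0)   # illustrative labels only
w_demo = np.array([0.1, 2.0, -1.0])
print(cross_entropy_error(w_demo, X_demo, y_demo))
```

Inside `problem8_9`, the equivalent call would be `cross_entropy_error(w_g, X_test, y_f_test)`, which returns the same average as the explicit loop.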


So the correct answers are **8[d]** 0.100 and **9[a]** 350.