## Logistic Regression

The main idea is to code one iteration of gradient descent by hand.<br>
Forward Pass:<br>
$X(n,m)$ -> $m$ examples, each of size $n$, stacked as column vectors: $X := [x^1, x^2, \ldots, x^m]$ with $x^i \in \mathbb{R}^n$ <br>
$Y(1,m)$ -> $m$ responses <br>
$Z(1,m)$ -> $Z := [w^T x^1 + b,\; w^T x^2 + b,\; \ldots,\; w^T x^m + b]$. Each entry is the dot product of the weight vector $w$, of shape $(n,1)$, with one example, plus the bias $b$.<br>
$A(1,m)$ -> $A := [\theta(z^1), \theta(z^2), \ldots, \theta(z^m)]$. These are our predictions.<br>
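$y' = \theta(z) = \frac{1}{1 + e^{-z}}$ is the prediction for a single example (the prime marks a prediction, not a derivative).<br>
$\frac{dy'}{dz} = \frac{d\theta(z)}{dz} = \frac{e^{-z}}{(1+e^{-z})^2} = \theta(z)(1-\theta(z))$ <br>
$\frac{dA}{dZ} = [\frac{da^1}{dz^1}, \ldots, \frac{da^m}{dz^m}] = [\theta(z^1)(1-\theta(z^1)), \ldots, \theta(z^m)(1-\theta(z^m))]$ (taken element-wise)<br><br>
$\frac{dy'}{dw_1} = \frac{dy'}{dz} \cdot \frac{dz}{dw_1} = \theta(z)(1-\theta(z)) \, x_1$, since $\frac{dz}{dw_1} = x_1$<br><br>
$\frac{dy'}{db} = \frac{dy'}{dz} \cdot \frac{dz}{db} = \theta(z)(1-\theta(z))$, since $\frac{dz}{db} = 1$. Across all $m$ examples this gives $[\frac{da^1}{dz^1}, \ldots, \frac{da^m}{dz^m}]$.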
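
The `gd` function further down uses `dz = A - Y` rather than the sigmoid derivative on its own. These notes do not state a loss function, so the following is a sketch under the assumption that the loss is the usual binary cross-entropy, with $a = \theta(z)$ the prediction and $y$ the label for one example:

$$
\begin{aligned}
L(a, y) &= -\left[\, y \log a + (1-y)\log(1-a) \,\right] \\
\frac{dL}{da} &= -\frac{y}{a} + \frac{1-y}{1-a} \\
\frac{dL}{dz} &= \frac{dL}{da} \cdot \frac{da}{dz}
  = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right) a(1-a)
  = -y(1-a) + (1-y)\,a = a - y \\
\frac{dL}{dw_j} &= \frac{dL}{dz} \cdot \frac{dz}{dw_j} = (a - y)\, x_j,
\qquad
\frac{dL}{db} = \frac{dL}{dz} \cdot \frac{dz}{db} = a - y
\end{aligned}
$$

Under that assumption the factor $\theta(z)(1-\theta(z))$ cancels, which is why the update only needs the difference between predictions and labels.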


In [2]:
import numpy as np

In [3]:
def Z(w,x,b):
    # w: (n,1) weights, x: (n,m) examples as columns, b: scalar bias -> result is (1,m)
    return np.dot(w.T,x) + b

In [4]:
def theta(z):
    return 1/(1+np.exp(-z))

In [5]:
def dz(z):
    # derivative of the sigmoid: theta(z) * (1 - theta(z))
    a = theta(z)
    return a*(1-a)
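
The closed form $\theta(z)(1-\theta(z))$ can be sanity-checked against a central finite difference. This is only a minimal sketch: `dtheta_dz`, `numeric_derivative`, and the step size `eps` are hypothetical names and values chosen for illustration, and `theta` is repeated so the block runs on its own.

```python
import numpy as np

def theta(z):
    # Sigmoid, same definition as in the notebook.
    return 1.0 / (1.0 + np.exp(-z))

def dtheta_dz(z):
    # Closed-form derivative of the sigmoid.
    a = theta(z)
    return a * (1.0 - a)

def numeric_derivative(f, z, eps=1e-6):
    # Central finite difference; eps is an illustrative choice.
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

z = np.linspace(-5.0, 5.0, 11)
max_abs_error = np.max(np.abs(dtheta_dz(z) - numeric_derivative(theta, z)))
print(max_abs_error)
```

If the two agree to within a small tolerance, the analytic derivative is very likely coded correctly.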

In [9]:
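def gd(w,b,X,Y,alpha):
    # Shapes here: X is (m,n) with one example per row, w is (1,n), Y is (m,1).
    Z = (np.sum(w*X,axis=1) + b).reshape(X.shape[0],1)  # (m,1) linear scores
    A = theta(Z)                                        # (m,1) predictions
    dz = A - Y  # gradient of the cross-entropy loss w.r.t. Z (summed, not averaged, below)
    dw = (np.sum(dz*X,axis=0).reshape(w.shape[0],w.shape[1])) # OG says dw = X*dz_transpose
    db = np.sum(dz)
    w = w - (alpha*dw)
    b = b - (alpha*db)
    return w, b  # the assignments above are local, so return the updated parameters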
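
The comment in `gd` mentions the form `dw = X*dz_transpose`. That form applies when examples are stacked as columns, as in the notation at the top of these notes, whereas `gd` stores one example per row. The block below is a rough sketch of the same step in the column layout; `gd_step_columns`, `X_cols`, `Y_row`, and `w_col` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def gd_step_columns(w, b, X, Y, alpha):
    """One gradient-descent step with X of shape (n, m), w (n, 1), Y (1, m)."""
    Z = np.dot(w.T, X) + b            # (1, m) linear scores
    A = 1.0 / (1.0 + np.exp(-Z))      # (1, m) sigmoid predictions
    dZ = A - Y                        # (1, m)
    dw = np.dot(X, dZ.T)              # (n, 1), summed over the m examples
    db = np.sum(dZ)
    return w - alpha * dw, b - alpha * db

# Same data as the notebook's test cell, transposed into the column layout.
X_cols = np.array([[1, 1, 1, 2],
                   [1, 1, 1, 2],
                   [1, 1, 1, 2]], dtype=float)   # (3, 4)
Y_row = np.array([[0, 0, 0, 1]], dtype=float)    # (1, 4)
w_col = np.ones((3, 1))

w_new, b_new = gd_step_columns(w_col, 0.1, X_cols, Y_row, alpha=1.0)
print(w_new.ravel(), b_new)
```

With this data and `alpha=1.0`, each updated weight should come out near -1.8662, the same value the row-layout computation in the notebook prints.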

In [10]:
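w = np.array([[1,1,1]])   # (1,n) weights
print (w.shape)
X = np.array([[1,1,1],[1,1,1],[1,1,1],[2,2,2]])   # (m,n): one example per row
print (X.shape)
b = 0.1
# Names end in _check so the functions Z and dz defined earlier are not shadowed.
Z_check = (np.sum(w*X,axis=1) + b).reshape(X.shape[0],1)
print (theta(Z_check))   # predictions A
Y = np.array([[0],[0],[0],[1]])
dz_check = theta(Z_check) - Y
print (w - (np.sum(dz_check*X,axis=0).reshape(w.shape[0],w.shape[1])))   # updated w for alpha = 1
print (np.sum(dz_check))   # db
alpha = 1.0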

(1, 3)
(4, 3)
[[ 0.95689275]
 [ 0.95689275]
 [ 0.95689275]
 [ 0.99776215]]
[[-1.86620254 -1.86620254 -1.86620254]]
2.86844038666


In [11]:
gd(w,b,X,Y,alpha)
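
A single call performs one update. To see whether repeated updates actually reduce the error, the step can be run in a loop while tracking a loss. The sketch below assumes the binary cross-entropy loss (these notes do not name one) and a smaller step size of 0.1; `sigmoid`, `cross_entropy`, `step`, and the `*_demo` / `*_run` names are hypothetical and kept separate from the notebook's own variables.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(A, Y, eps=1e-12):
    # Summed binary cross-entropy; clipping avoids log(0).
    A = np.clip(A, eps, 1.0 - eps)
    return float(-np.sum(Y * np.log(A) + (1.0 - Y) * np.log(1.0 - A)))

def step(w, b, X, Y, alpha):
    # Row layout as in the notebook: X is (m, n), w is (1, n), Y is (m, 1).
    A = sigmoid(np.dot(X, w.T) + b)   # (m, 1) predictions
    dz = A - Y                        # (m, 1)
    dw = np.dot(dz.T, X)              # (1, n), summed over examples
    db = np.sum(dz)
    return w - alpha * dw, b - alpha * db

X_demo = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1], [2, 2, 2]], dtype=float)
Y_demo = np.array([[0], [0], [0], [1]], dtype=float)
w_run, b_run = np.ones((1, 3)), 0.1
alpha_demo = 0.1   # illustrative step size, not taken from the notebook

losses = []
for _ in range(200):
    losses.append(cross_entropy(sigmoid(np.dot(X_demo, w_run.T) + b_run), Y_demo))
    w_run, b_run = step(w_run, b_run, X_demo, Y_demo, alpha_demo)

print(losses[0], losses[-1])
```

Comparing the first and last recorded loss gives a quick check that the updates move in the right direction on this toy data.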