# Gradient Descent

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. To find a local minimum of a function using gradient descent, we take steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.
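
To make the update rule concrete before moving to regression, here is a minimal sketch on a one-dimensional function. The function $f(x) = (x - 3)^2$, the helper name `minimize_1d`, the starting point, and the step size are all illustrative assumptions, not part of the original example.

```python
def minimize_1d(grad, x0, learning_rate=0.1, iterations=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(iterations):
        # Move in the direction of the negative gradient.
        x = x - learning_rate * grad(x)
    return x

# Assumed test function: f(x) = (x - 3)^2, whose derivative is 2 * (x - 3)
# and whose minimum is at x = 3.
x_min = minimize_1d(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # very close to 3
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed number of iterations is enough for this simple function.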

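For linear regression with prediction $mx_i + b$, the cost function is the mean squared error $J = \frac{1}{n} \sum_{i=1}^{n} (y_i - (mx_i + b))^2$. Taking its partial derivatives with respect to the slope $m$ and the intercept $b$ gives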

\begin{equation}
    \frac{\partial J}{\partial m} = \frac{2}{n} \sum_{i=1}^{n} -x_i(y_i -(mx_i + b))
\end{equation}

\begin{equation}
    \frac{\partial J}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} -(y_i -(mx_i + b))
\end{equation}
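
One way to sanity-check these two derivatives is to compare them with a finite-difference approximation of the cost. The sketch below assumes the mean squared error cost; the helper names `cost` and `gradients`, the sample data, and the evaluation point `m = 0.5, b = 1.0` are chosen only for illustration.

```python
import numpy as np

def cost(m, b, x, y):
    # Mean squared error of the line m * x + b.
    return np.mean((y - (m * x + b)) ** 2)

def gradients(m, b, x, y):
    # Analytic partial derivatives from the two equations above.
    residual = y - (m * x + b)
    dm = -2 * np.mean(x * residual)
    db = -2 * np.mean(residual)
    return dm, db

# Illustrative data and evaluation point (assumptions).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])
m, b, eps = 0.5, 1.0, 1e-6

dm, db = gradients(m, b, x, y)
# Central differences approximate each partial derivative numerically.
dm_num = (cost(m + eps, b, x, y) - cost(m - eps, b, x, y)) / (2 * eps)
db_num = (cost(m, b + eps, x, y) - cost(m, b - eps, x, y)) / (2 * eps)
print(dm, dm_num)
print(db, db_num)
```

If the analytic and numerical values agree to several decimal places, the derivatives are very likely correct.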

Each iteration then updates both parameters by stepping against the gradient, scaled by the learning rate:

m = m - learning_rate * $\frac{\partial J}{\partial m}$

b = b - learning_rate * $\frac{\partial J}{\partial b}$



In [None]:
import numpy as np

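def gradient_descent(x, y):
    # Start from slope 0 and intercept 0.
    m_curr = b_curr = 0
    iterations = 10000
    n = len(x)
    learning_rate = 0.08

    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        # Mean squared error of the fit before this iteration's update.
        cost = (1/n) * np.sum((y - y_predicted)**2)
        # Partial derivatives of the cost with respect to m and b.
        md = -(2/n) * np.sum(x * (y - y_predicted))
        bd = -(2/n) * np.sum(y - y_predicted)
        # Step against the gradient.
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        print("m {}, b {}, cost {} iteration {}".format(m_curr, b_curr, cost, i))

    return m_curr, b_curr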

x = np.array([1,2,3,4,5])
y = np.array([5,7,9,11,13])

gradient_descent(x,y)
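
As a cross-check on the values printed by the loop above, the same line can be fitted in closed form. This is a minimal sketch using `np.polyfit` for an ordinary least-squares fit; the names `m_exact` and `b_exact` are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

# Degree-1 least-squares fit; coefficients come back highest power first,
# so the slope is first and the intercept second.
m_exact, b_exact = np.polyfit(x, y, 1)
print(m_exact, b_exact)
```

Because the sample data lie exactly on the line y = 2x + 3, the fit gives a slope close to 2 and an intercept close to 3, which is what the gradient descent iterates should approach.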

# Logistic Regression vs Support Vector Machine

In logistic regression, the output of the linear function is squashed into the range (0, 1) by the sigmoid function and interpreted as a class probability. The predicted class is the one with the higher estimated probability.
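
The squashing step can be sketched as follows. The weight `w`, bias `b`, input values, and the 0.5 decision threshold are hypothetical choices for illustration, not fitted parameters.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and inputs (assumptions for illustration).
w, b = 1.5, -3.0
x = np.array([0.0, 2.0, 4.0])

prob = sigmoid(w * x + b)         # estimated probability of class 1
pred = (prob >= 0.5).astype(int)  # pick the more probable class
print(prob)
print(pred)
```

An input whose linear score is exactly 0 gets probability 0.5, so the decision boundary is where the linear function crosses zero.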

An SVM tries to find the separating line (in general, a hyperplane) with the “best” margin, meaning the largest distance between the line and the closest training points (the support vectors). A wider margin tends to reduce the risk of error on unseen data.
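
To illustrate what the margin measures, the sketch below computes the distance from each point to a given line and picks out the closest points. The line (`w`, `b`), the four points, and their labels are hand-picked assumptions; no SVM is trained here.

```python
import numpy as np

# Hypothetical separating line w . x + b = 0 (an assumption, not a fitted model).
w = np.array([1.0, 1.0])
b = -3.0

# Hypothetical points with labels -1 and +1.
X = np.array([[1.0, 1.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
labels = np.array([-1, -1, 1, 1])

# Signed distance of each point to the line; positive means correctly classified.
distances = labels * (X @ w + b) / np.linalg.norm(w)

# The margin is the distance to the closest point(s), the support vectors.
margin = distances.min()
support = X[np.isclose(distances, margin)]
print(margin)
print(support)
```

An SVM searches over `w` and `b` to make this smallest distance as large as possible.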

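SVM is a geometric, non-probabilistic method: it assigns a class according to which side of the boundary a point falls on. Logistic regression is a statistical model: it outputs an estimated probability for each class.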