In [54]:
import numpy as np

import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')

from sklearn.datasets import make_blobs

In [55]:
def perceptron(X, W, b):
    # Linear score for every input row: X . W + b
    inputs = np.array(X)
    weights = np.array(W)
    
    prediction = inputs.dot(weights) + b
    
    return prediction

In [56]:
def sigmoid(X):
    probabilities = []
    for i in X:
        sigmoid_function = 1 / (1 + np.exp(-i))
        probabilities.append(sigmoid_function)
        
    return probabilities
    

Now we are going to test more than one point, not just (2, -10). Our goal is to see whether we can improve on the line we created manually earlier.

In [57]:
X = [[2, -10], [2,10],[-5,-4],[-3,5]]
W = [.38, 0.65]
b = 3

In [58]:
perceptron(X, W, b)

array([-2.74, 10.26, -1.5 ,  5.11])

Now we get an array of predictions, one for each of the 4 points we entered.

In [59]:
probability = sigmoid(perceptron(X, W, b))
probability

[0.060653903315652125,
 0.9999649955375166,
 0.18242552380635635,
 0.9940001327725162]

Anything below 50% in this example is red, and anything above is blue. So the first point is very red at only 6%.

So our points should be red, blue, red, blue.
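
The 50% rule above can be written as a simple threshold. This is a minimal sketch: the name `labels` and the choice to count a probability of exactly 0.5 as blue are assumptions, not something the notebook specifies.

```python
# Sigmoid outputs from the cell above, rounded for readability.
probability = [0.0607, 0.99996, 0.1824, 0.9940]

# Assumption: a probability of exactly 0.5 is counted as blue.
labels = ["blue" if p >= 0.5 else "red" for p in probability]
print(labels)
```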

We now want the model that is most correct. By this I mean we want each prediction to be either highly probable or highly improbable, and on the correct side. If one model assigns 55% to an event that actually happens and another model assigns 90% to the same event, the 90% model is most likely the better one. So we look at the probability each model gives to every point and prefer the model whose probabilities for the correct outcomes are highest.

For this we use the natural log of the probabilities so we can add them together rather than multiplying.

In [60]:
def max_like(x):
    p = np.array(x)
    ml = 0
    
    for i in p:
        ml += -np.log(i)
    return ml

In [61]:
max_like(probability)


4.510037509423813
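
The reason the logs can be added is that the log of a product equals the sum of the logs. The sketch below recomputes the four probabilities and checks that identity. The names `product`, `neg_log_of_product` and `sum_of_neg_logs` are illustrative, and treating the product as a joint probability assumes the points are independent.

```python
import numpy as np

X = np.array([[2, -10], [2, 10], [-5, -4], [-3, 5]], dtype=float)
W = np.array([0.38, 0.65])
b = 3

# Same probabilities as sigmoid(perceptron(X, W, b)), computed in one step.
p = 1 / (1 + np.exp(-(X @ W + b)))

product = np.prod(p)                   # multiply the four probabilities
neg_log_of_product = -np.log(product)  # one log of the product
sum_of_neg_logs = np.sum(-np.log(p))   # what max_like computes

print(product, neg_log_of_product, sum_of_neg_logs)
```

The last two numbers agree up to floating-point rounding. With many points the product shrinks toward zero, which is one reason to work with the sum instead.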

Similar to maximizing probability, but flipped: the higher a probability, the smaller its negative natural log. So instead of maximizing the product of the probabilities, we now minimize what is called cross-entropy (the sum of the negative natural logs of the sigmoid activations of the predictions).

One other thing to note is that the error needs to depend on whether each prediction is correct. So far we have not used the true labels at all: `max_like` sums $-\ln(p_i)$ for every point, which is the right error only when every point's true label is 1. The model's predicted classes are $[0,1,0,1]$. What if the true labels are different? In that case we need to include the labels in our calculation of the error.

We can do that by multiplying the classification of 0 or 1 with the natural log. 

$$ CE = - \sum_{i=1}^{m} \left[ y_i\ln(p_i) + (1-y_i)\ln(1-p_i) \right] $$

In the formula above, if the true label is 1, the second term is zero and we are left with the negative log of the probability $p_i$. If the true label is 0, the first term is zero and we are left with the second term, the negative log of $1-p_i$, which is the probability the model gave to the other class.

In [62]:
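def cross_entropy(Y, P):
    # np.float_ was removed in NumPy 2.0; asarray with dtype=float does the same job.
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    
    ce = -np.sum(Y * np.log(P) + (1-Y) * np.log(1-P))
    
    return ce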

In [63]:
cross_entropy([0,1,1,1],probability)

1.7700375094238117
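
To see where this total comes from, it can help to split it by point. The sketch below picks out the probability the model gave to each point's true class and takes its negative log. The names `p_true` and `per_point` are illustrative and do not appear elsewhere in the notebook.

```python
import numpy as np

X = np.array([[2, -10], [2, 10], [-5, -4], [-3, 5]], dtype=float)
Y = np.array([0, 1, 1, 1], dtype=float)
W = np.array([0.38, 0.65])
b = 3

P = 1 / (1 + np.exp(-(X @ W + b)))

# Probability assigned to the true class of each point:
# p_i when y_i = 1, and 1 - p_i when y_i = 0.
p_true = np.where(Y == 1, P, 1 - P)
per_point = -np.log(p_true)

print(per_point)
print(per_point.sum())
```

The third point is labeled 1 but was given a probability of only about 0.18, so it contributes roughly 1.70 of the total 1.77. The other three points are on the correct side and add very little.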

In practice, cross-entropy is usually reported as an average over the $m$ points rather than as a sum, so let's adjust our calculation.

In [64]:
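def cross_entropy(Y, P):
    # np.float_ was removed in NumPy 2.0; asarray with dtype=float does the same job.
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    m = len(Y)
    
    ce = -(1/m)*np.sum(Y * np.log(P) + (1-Y) * np.log(1-P))
    
    return ce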

In [65]:
cross_entropy([0,1,1,1],probability)

0.4425093773559529

Something to note is that the probabilities are actually the sigmoid function applied to the perceptron outputs. So you can view the cross-entropy as a function of the weights and bias:
$$ CE = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i\ln(\sigma(Wx^{(i)}+b)) + (1-y_i)\ln(1-\sigma(Wx^{(i)}+b)) \right] $$
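
Written this way, the whole calculation fits in one function of the inputs, labels, weights and bias. The following is a minimal sketch; `cross_entropy_from_weights` is a hypothetical helper name, not a function defined earlier in the notebook.

```python
import numpy as np

def cross_entropy_from_weights(X, Y, W, b):
    """Mean cross-entropy computed directly from inputs, labels, weights and bias."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    scores = X @ np.asarray(W, dtype=float) + b  # perceptron output W x + b
    P = 1 / (1 + np.exp(-scores))                # sigmoid of each score
    return -np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P))

X = [[2, -10], [2, 10], [-5, -4], [-3, 5]]
Y = [0, 1, 1, 1]
ce = cross_entropy_from_weights(X, Y, [0.38, 0.65], 3)
print(ce)
```

For the weights and labels used so far, this should reproduce the averaged value from the previous cell.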

Now we can adjust the weights to try to lower the cross-entropy.

In [66]:
W = [.37, 0.68]
probability = sigmoid(perceptron(X, W, b))
cross_entropy([0,1,1,1],probability)

0.4524701784215939

This adjustment made the model worse: the cross-entropy rose from about 0.4425 to about 0.4525.
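
One way to explore this by hand is to score a few candidate weight vectors side by side. The sketch below is only an illustration: the `candidates` list is hypothetical (the original weights, the adjustment tried above, and one that lowers the second weight instead of raising it), and `mean_cross_entropy` is a helper name introduced here.

```python
import numpy as np

X = np.array([[2, -10], [2, 10], [-5, -4], [-3, 5]], dtype=float)
Y = np.array([0, 1, 1, 1], dtype=float)
b = 3

def mean_cross_entropy(W):
    # Sigmoid of the perceptron output, then the averaged cross-entropy.
    P = 1 / (1 + np.exp(-(X @ np.asarray(W, dtype=float) + b)))
    return -np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P))

# Hypothetical candidates chosen by hand for comparison.
candidates = [[0.38, 0.65], [0.37, 0.68], [0.37, 0.64]]
results = {tuple(W): mean_cross_entropy(W) for W in candidates}

for W, ce in results.items():
    print(W, ce)

best = min(results, key=results.get)
print("lowest cross-entropy:", best)
```

For these four points, lowering the second weight instead of raising it gives a lower cross-entropy than the starting weights.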