## Sigmoid Function

### As a reminder:
When the activation function is a sigmoid, the perceptron multiplies each input by the weight on its edge, adds the results, and then applies the sigmoid function to that sum. Instead of returning 1 or 0 as before, it returns a value between 0 and 1, such as 0.99 or 0.67. That value is read as a probability: an output of 0.45, for example, means the point has a 45 percent chance of belonging to the class labeled 1.

The sigmoid function is defined as $\sigma(x) = \frac{1}{1+e^{-x}}$. If the score is defined by $\text{score} = 4x_1 + 5x_2 - 9$, then which of the following points has exactly a 50% probability of being blue or red?
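A point has exactly a 50% probability when its score is 0, because $\sigma(0) = 1/2$. The sketch below checks this numerically. The four candidate points are taken from the test comments in the following code cell and are assumed to be the quiz options; the helper name `score` is introduced here for illustration only.

```python
import numpy as np

def score(x1, x2):
    # Linear score from the text: 4*x1 + 5*x2 - 9
    return 4 * x1 + 5 * x2 - 9

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Candidate points (assumed to be the quiz options)
points = [(1, 1), (2, 4), (5, -5), (-4, 5)]
for x1, x2 in points:
    s = score(x1, x2)
    print((x1, x2), "score =", s, "probability =", sigmoid(s))
```

Any point whose score is 0 lies on the line $4x_1 + 5x_2 - 9 = 0$, which is the decision boundary.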






In [1]:
from scipy.stats import logistic
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def sigmoid_array(x):
    """Sigmoid function, used for binary classification in logistic regression
    and as the activation function of an artificial neuron.

    param x: a number or NumPy array of scores
    return: the sigmoid of x, element-wise, with values between 0 and 1
    """
    return 1 / (1 + np.exp(-x))

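# Testing: sigmoid of the score 4*x1 + 5*x2 - 9 at each candidate point
# print(sigmoid_array(0))    # point (1, 1),  score 0
# print(sigmoid_array(19))   # point (2, 4),  score 19
# print(sigmoid_array(-14))  # point (5, -5), score -14
# print(sigmoid_array(0))    # point (-4, 5), score 0

# Two models, 10*x1 + 10*x2 + b and x1 + x2 + b, at the points (1, 1) and (-1, -1).
# With b = 0 they give the scores 20, -20, 2 and -2 used below.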

print(sigmoid_array(20))
print(sigmoid_array(-20))
print(sigmoid_array(2))
print(sigmoid_array(-2))


0.9999999979388463
2.0611536181902037e-09
0.8807970779778823
0.11920292202211755


## SoftMax Function

### The SoftMax function generalizes the sigmoid activation function to problems with 3 or more classes.

Softmax computes a probability distribution over n different events: it calculates the probability of each target class over all possible target classes. These probabilities are then used to determine the target class for the given inputs.

The main advantage of softmax is the range of its outputs: each probability lies between 0 and 1, and the probabilities sum to 1. In a multi-class classification model, softmax returns a probability for each class, and the predicted class is the one with the highest probability.

The formula takes the exponential of each input value and divides it by the sum of the exponentials of all the input values. That ratio is the softmax output for the input.
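The formula can be sketched in a numerically safer form, which may help when the scores are large. Subtracting the maximum score before exponentiating is a common trick and is not part of the original notebook; the name `softmax_stable` is hypothetical. The second half of the sketch checks that, with two classes, softmax gives the sigmoid of the score difference.

```python
import numpy as np

def softmax_stable(scores):
    # Subtracting the maximum leaves the result unchanged (the common factor
    # cancels in the ratio) but keeps np.exp from overflowing on large scores.
    scores = np.asarray(scores, dtype=float)
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

probs = softmax_stable([5, 6, 7])
print(probs, probs.sum())  # the probabilities sum to 1

# With two classes, softmax reduces to the sigmoid of the score difference.
a, b = 2.0, -1.0
two_class = softmax_stable([a, b])
sigmoid_of_diff = 1 / (1 + np.exp(-(a - b)))
print(two_class[0], sigmoid_of_diff)
```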



In [11]:
# Softmax Activation Function

# Write a function that takes as input a list of numbers, and returns
# the list of values given by the softmax function.
def softmax(L):
    # Exponentiate every score once, then normalize by the total
    exp_L = np.exp(L)
    return (exp_L / np.sum(exp_L)).tolist()

print(softmax([5,6,7]))

[0.09003057317038046, 0.24472847105479764, 0.6652409557748219]


In [12]:
# Cross entropy with 2 classes, 1 or 0.
# Cross entropy is the sum of the negative logarithms of the probabilities
# the model assigns to each point's actual color.
# The smaller the cross entropy, the better.


def cross_entropy(Y, P):
    # np.float_ was removed in NumPy 2.0; np.asarray works in all versions
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))

Y = [1, 0, 1, 1]
P = [0.4, 0.6, 0.1, 0.5]
print(cross_entropy(Y, P))

4.828313737302301
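To illustrate "the smaller the cross entropy the better", the sketch below compares the predictions from the cell above with a second, made-up set of predictions that sit closer to the labels. The `eps` clipping is an added safeguard against `log(0)` and is not part of the original function; the name `binary_cross_entropy` is hypothetical.

```python
import numpy as np

def binary_cross_entropy(Y, P, eps=1e-12):
    # eps is an assumed safeguard: clipping keeps log() finite when a
    # predicted probability is exactly 0 or 1.
    Y = np.asarray(Y, dtype=float)
    P = np.clip(np.asarray(P, dtype=float), eps, 1 - eps)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))

Y = [1, 0, 1, 1]
poor_P = [0.4, 0.6, 0.1, 0.5]  # predictions used in the cell above
good_P = [0.9, 0.1, 0.8, 0.7]  # made-up predictions closer to the labels

print("poor predictions:", binary_cross_entropy(Y, poor_P))
print("good predictions:", binary_cross_entropy(Y, good_P))
```

The second value should come out smaller, since every made-up probability is closer to its label.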


In [None]:
# Some helper functions for plotting and drawing lines

def plot_points(X, y):
    # Split the points by label with boolean masks: 1 = admitted, 0 = rejected
    admitted = X[y == 1]
    rejected = X[y == 0]
    plt.scatter(rejected[:, 0], rejected[:, 1], s=25, color='blue', edgecolor='k')
    plt.scatter(admitted[:, 0], admitted[:, 1], s=25, color='red', edgecolor='k')

def display(m, b, color='g--'):
    plt.xlim(-0.05,1.05)
    plt.ylim(-0.05,1.05)
    x = np.arange(-10, 10, 0.1)
    plt.plot(x, m*x+b, color)

## Reading and plotting the data

In [None]:
data = pd.read_csv('data.csv', header=None)
X = np.array(data[[0,1]])
y = np.array(data[2])
plot_points(X,y)
plt.show()

## The basic functions


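- Sigmoid activation function

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

- Output (prediction) formula

$$\hat{y} = \sigma(w_1 x_1 + w_2 x_2 + b)$$

- Error function

$$Error(y, \hat{y}) = -y \log(\hat{y}) - (1-y) \log(1-\hat{y})$$

- The function that updates the weights

$$w_i \longrightarrow w_i + \alpha (y - \hat{y}) x_i$$

$$b \longrightarrow b + \alpha (y - \hat{y})$$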
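The update rule follows from the gradient of the error function. A short derivation, writing $s = w_1 x_1 + w_2 x_2 + b$ for the score (the symbol $s$ is introduced here only for convenience) and using the identity $\sigma'(s) = \sigma(s)(1 - \sigma(s))$:

$$\frac{\partial Error}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}} = \frac{\hat{y} - y}{\hat{y}(1-\hat{y})}$$

$$\frac{\partial \hat{y}}{\partial w_i} = \hat{y}(1-\hat{y})\, x_i, \qquad \frac{\partial \hat{y}}{\partial b} = \hat{y}(1-\hat{y})$$

$$\frac{\partial Error}{\partial w_i} = -(y - \hat{y})\, x_i, \qquad \frac{\partial Error}{\partial b} = -(y - \hat{y})$$

Gradient descent moves against the gradient with step size $\alpha$, so $w_i \longrightarrow w_i - \alpha \frac{\partial Error}{\partial w_i} = w_i + \alpha (y - \hat{y}) x_i$, and likewise for $b$.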

In [13]:
# Implement the following functions

# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Output (prediction) formula
def output_formula(features, weights, bias):
    return sigmoid(np.dot(features, weights) + bias)

# Error (log-loss) formula
def error_formula(y, output):
    return -y * np.log(output) - (1-y) * np.log(1-output)

# Gradient descent step
def update_weights(x, y, weights, bias, learnrate):
    output = output_formula(x, weights, bias)
    # y - output is the negative gradient of the log-loss with respect to the score
    d_error = y - output
    weights += learnrate * d_error * x
    bias += learnrate * d_error
    return weights, bias


## Training function

This function will help us iterate the gradient descent algorithm through all the data, for a number of epochs. It will also plot the data, and some of the boundary lines obtained as we run the algorithm.


In [None]:
np.random.seed(44)

epochs = 100
learnrate = 0.01

def train(features, targets, epochs, learnrate, graph_lines=False):
    
    errors = []
    n_records, n_features = features.shape
    last_loss = None
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)
    bias = 0
    for e in range(epochs):
        # One pass over the data, updating the weights after every sample
        for x, y in zip(features, targets):
            weights, bias = update_weights(x, y, weights, bias, learnrate)
        
        # Printing out the log-loss error on the training set
        out = output_formula(features, weights, bias)
        loss = np.mean(error_formula(targets, out))
        errors.append(loss)
        if e % (epochs / 10) == 0:
            print("\n========== Epoch", e,"==========")
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            predictions = out > 0.5
            accuracy = np.mean(predictions == targets)
            print("Accuracy: ", accuracy)
        if graph_lines and e % (epochs / 100) == 0:
            display(-weights[0]/weights[1], -bias/weights[1])
            

    # Plotting the solution boundary
    plt.title("Solution boundary")
    display(-weights[0]/weights[1], -bias/weights[1], 'black')

    # Plotting the data
    plot_points(features, targets)
    plt.show()

    # Plotting the error
    plt.title("Error Plot")
    plt.xlabel('Number of epochs')
    plt.ylabel('Error')
    plt.plot(errors)
    plt.show()

## Train the algorithm!

When we run the function, we'll obtain the following:

- 10 updates with the current training loss and accuracy.
- A plot of the data and some of the boundary lines obtained. The final one is in black. Notice how the lines get closer and closer to the best fit as we go through more epochs.
- A plot of the error function. Notice how it decreases as we go through more epochs.




In [None]:
train(X, y, epochs, learnrate, True)
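If `data.csv` is not at hand, the same update rule can be tried on synthetic data. The sketch below is a stand-in under stated assumptions, not the notebook's dataset: the two Gaussian blobs, the `_demo` names, the zero initial weights, and the learning rate of 0.1 are all made up for illustration, and plotting is left out.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def log_loss(y, output):
    return -y * np.log(output) - (1 - y) * np.log(1 - output)

# Hypothetical stand-in for data.csv: two Gaussian blobs in the unit square
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.3, 0.3], scale=0.1, size=(50, 2))  # label 0
X1 = rng.normal(loc=[0.7, 0.7], scale=0.1, size=(50, 2))  # label 1
X_demo = np.vstack([X0, X1])
y_demo = np.concatenate([np.zeros(50), np.ones(50)])

# Shuffle once so the two classes are interleaved during the per-sample updates
order = rng.permutation(len(y_demo))
X_demo, y_demo = X_demo[order], y_demo[order]

w_demo = np.zeros(2)
b_demo = 0.0
lr_demo = 0.1
losses = []
for epoch in range(100):
    # Per-sample updates: w_i -> w_i + alpha * (y - y_hat) * x_i
    for xi, yi in zip(X_demo, y_demo):
        y_hat = sigmoid(np.dot(xi, w_demo) + b_demo)
        w_demo = w_demo + lr_demo * (yi - y_hat) * xi
        b_demo = b_demo + lr_demo * (yi - y_hat)
    # Mean log-loss on the whole set after this epoch
    out_demo = sigmoid(X_demo @ w_demo + b_demo)
    losses.append(np.mean(log_loss(y_demo, out_demo)))

accuracy_demo = np.mean((out_demo > 0.5) == y_demo)
print("first loss:", losses[0], "last loss:", losses[-1])
print("accuracy:", accuracy_demo)
```

On well-separated blobs like these, the loss after the last epoch should be lower than after the first.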