# Likelihoods with Sigmoid Neuron

## Introduction

So far, we have learned about the hypothesis function of a neuron. We saw that when the linear function returns a positive number our neuron fires, and when it returns a negative number it does not.

<img src="neuron-general-2.png" width="50%">

In this lesson, we'll move away from an all-or-nothing response. Instead, we'll change *how strongly* a neuron fires, based on how positive or negative the output of the linear function is.

### Adding an Activation Function

So far we have focused on the linear component of a neuron, so let's see it again. We represent a single observation, like a potentially cancerous cell, with a feature vector like so.

In [1]:
import torch
# cell area is 3, and cell concavities is 4
x = torch.tensor([3, 4])

And we represent the neuron by a weight vector and the bias, like so.

In [14]:
def linear_function(x):
    w = torch.tensor([2, 1])
    b = -4
    return w.dot(x) + b

In [15]:
z = linear_function(x)
z

tensor(6)

<img src="./neuron_cancer.png" width="50%">

Now remember that this neuron fires or doesn't fire based on whether the linear component returns a positive number. We can represent this firing or not firing in Python with a simple `if`/`else` statement:

In [9]:
def activation_function(z):
    if z > 0:
        return 1
    else:
        return 0

In [11]:
z

tensor(6)

In [12]:
activation_function(z)

1

So our neuron really has two layers to it. The first is the linear function, which can output any positive or negative number. The second is the activation function, which outputs either a 1 or a 0 to represent the neuron firing or not.

In [19]:
z = linear_function(x)
z

tensor(6)

In [20]:
activation_function(z)

1

## From all or nothing to probabilities

Our code is looking good, but there is one thing we would like to change: the activation function returns an all-or-nothing response. When we use a neuron to make a prediction, like whether or not a cell is cancerous, it's generally preferable to express a degree of confidence in that prediction. For example, the neuron might report a 95% chance of cancer, or a 3% chance of cancer.

So this time, when the neuron is confident in a prediction of 1, the activation function will return a number *close* to 1 (like .98). And when it is confident in a prediction of 0, it will return a number close to 0 (like .02).

> If we think of our neuron as lighting up like a lightbulb, I like to think of this as going from a simple on/off switch to a dimmer, where the brighter the light, the stronger our prediction.
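To make the "confidence" reading concrete, here is a minimal plain-Python sketch of how a number between 0 and 1 can be read back as a prediction. It assumes a cutoff of 0.5, one common convention that is not fixed by this lesson. The function name `interpret` is hypothetical.

```python
def interpret(probability, threshold=0.5):
    """Turn an output between 0 and 1 into (predicted label, confidence).

    Assumption: outputs above the 0.5 cutoff predict 1, all others predict 0.
    """
    label = 1 if probability > threshold else 0
    # confidence is the probability given to whichever label was chosen
    confidence = probability if label == 1 else 1 - probability
    return label, confidence

print(interpret(0.98))  # predicts 1 with confidence 0.98
print(interpret(0.02))  # predicts 0 with confidence of about 0.98
```

Both .98 and .02 express the same strength of belief; they just point at opposite labels.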

PyTorch has a function, `torch.sigmoid`, that takes the positive or negative output of the linear function and turns it into a number between 0 and 1. The more negative the input, the closer the output is to 0; the more positive the input, the closer it is to 1.

In [28]:
positive_num = torch.tensor(2.5)
torch.sigmoid(positive_num)

tensor(0.9241)

In [29]:
negative_num = torch.tensor(-2.5)
torch.sigmoid(negative_num)

tensor(0.0759)
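The two outputs above add up to 1 (0.9241 + 0.0759). This is a general property of the sigmoid: the output for $-z$ is always 1 minus the output for $z$. Below is a quick check of that identity. It uses a plain-Python `sigmoid` built on the `math` module in place of `torch.sigmoid`, an assumption made only so the sketch runs without PyTorch.

```python
import math

def sigmoid(z):
    # plain-Python sigmoid, standing in for torch.sigmoid
    return 1 / (1 + math.exp(-z))

z = 2.5
print(round(sigmoid(z), 4))   # 0.9241
print(round(sigmoid(-z), 4))  # 0.0759
# mirror-image property: sigmoid(-z) equals 1 - sigmoid(z)
print(math.isclose(sigmoid(-z), 1 - sigmoid(z)))  # True
```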

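So putting it together, our neuron will have a linear function that looks like the following. Note that the bias is now written as the float `-4.`, so `z` comes out as a floating-point tensor: `tensor(6.)` rather than `tensor(6)`.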

In [37]:
def linear_function(x):
    w = torch.tensor([2, 1])
    b = -4.
    return w.dot(x) + b

And an updated activation function that uses the sigmoid to return a number between 0 and 1.

In [38]:
def sigmoid_activation_function(z):
    return torch.sigmoid(z)

In [39]:
z = linear_function(x)
z

tensor(6.)

In [40]:
sigmoid_activation_function(z)

tensor(0.9975)
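To see the switch-versus-dimmer difference side by side, here is a small plain-Python sketch that runs the same `z` values through both activations. The function names `step_activation` and `sigmoid_activation` are hypothetical, chosen for this illustration.

```python
import math

def step_activation(z):
    # all-or-nothing: fires (1) only when z is positive
    return 1 if z > 0 else 0

def sigmoid_activation(z):
    # "dimmer": output rises smoothly from near 0 to near 1 as z grows
    return 1 / (1 + math.exp(-z))

for z in [-6.0, -0.5, 0.5, 6.0]:
    print(f"z = {z:5.1f}   step: {step_activation(z)}   sigmoid: {sigmoid_activation(z):.4f}")
```

The step function returns 0, 0, 1, 1, so a `z` of 0.5 and a `z` of 6 look identical. The sigmoid returns roughly 0.0025, 0.3775, 0.6225 and 0.9975, so it keeps track of how far `z` is from 0.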

### Exploring the Sigmoid Function

Above we saw that the sigmoid function lets our activation function return a number between 0 and 1, instead of an all-or-nothing response. The sigmoid is a widely used function in mathematics, and it is defined as follows:

$\sigma(z) = \frac{1}{1 + e^{-z}}$

The sigmoid function is represented by the Greek letter $\sigma$ (sigma), and it does exactly what we want: it brings large positive numbers close to 1, and large negative numbers close to 0. Let's see why.

When $z$ is a large positive number, say $1000$, we have:

* $\sigma(1000) = \frac{1}{1 + e^{-1000}} = \frac{1}{1 + 1/e^{1000}}  = \frac{1}{1 + small\_num} \approx 1$

And when $z$ is a large negative number, say $-1000$, we have: 

* $\sigma(-1000) = \frac{1}{1 + e^{-(-1000)}} = \frac{1}{1 + e^{1000}} = \frac{1}{1 + big\_num} \approx 0$

Finally, when $z = 0$ we have: 

* $\sigma(0) = \frac{1}{1 + e^{0}} = \frac{1}{1 + 1} = \frac{1}{2}$ 

In [41]:
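def sigmoid(value):
    # accept plain Python numbers as well as tensors: torch.exp needs a tensor
    value = torch.as_tensor(value, dtype=torch.float32)
    return 1 / (1 + torch.exp(-value))

In [2]:
sigmoid(-7)  # about 0.000911
sigmoid(7)   # about 0.9991
sigmoid(0)   # exactly 0.5

tensor(0.5000)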
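The derivation above uses $z = \pm 1000$, and evaluating the formula literally with Python's `math.exp` breaks down there: `math.exp(1000)` raises `OverflowError`. One common remedy is to rewrite the formula for negative inputs as $\frac{e^{z}}{1 + e^{z}}$, which you get by multiplying top and bottom by $e^{z}$. That way `exp` is only ever called on a non-positive number. Here is a sketch assuming plain Python floats; the names `naive_sigmoid` and `stable_sigmoid` are hypothetical.

```python
import math

def naive_sigmoid(z):
    # literal formula: math.exp(-z) overflows when z is very negative
    return 1 / (1 + math.exp(-z))

def stable_sigmoid(z):
    if z >= 0:
        # exp(-z) lies in (0, 1], so it cannot overflow
        return 1 / (1 + math.exp(-z))
    # for negative z, use exp(z) / (1 + exp(z)); exp(z) lies in (0, 1)
    ez = math.exp(z)
    return ez / (1 + ez)

try:
    naive_sigmoid(-1000)
except OverflowError as err:
    print("naive version failed:", err)  # naive version failed: math range error

print(stable_sigmoid(-1000))  # 0.0
print(stable_sigmoid(1000))   # 1.0
print(stable_sigmoid(0))      # 0.5
```

The ±1000 results print as exactly `0.0` and `1.0` only because the true values are closer to 0 and 1 than a 64-bit float can represent. Mathematically, the sigmoid never actually reaches either end.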

So the sigmoid function takes any value between negative and positive infinity and maps it to a number between 0 and 1. Let's use it as our new activation function.

### Putting it together

So with our linear function and sigmoid activation function, we have just built a **sigmoid neuron**. 

> A sigmoid neuron consists of a linear function followed by a sigmoid activation function.

Mathematically, it looks like the following.

$z(x) = w_1x_1 + w_2x_2 + \dots + w_nx_n + b = w \cdot x + b$

Then we pass this output to our activation function, the sigmoid function:

$\sigma(z) = \frac{1}{1 + e^{-z}} $

So to summarize, our sigmoid neuron is a linear function wrapped in a sigmoid function: 

In [44]:
x = torch.tensor([2, 1])  # a new observation: cell area 2, cell concavities 1
sigmoid_activation_function(linear_function(x))

tensor(0.7311)

Or to write it mathematically: 

$\sigma(z(x)) =  \frac{1}{1 + e^{-z(x)}} $

where $z(x) = w \cdot x + b$
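The formula $z(x) = w \cdot x + b$ works for any number of features. Below is a framework-free sketch of the full sigmoid neuron. It assumes weights and features are plain Python lists of equal length, and `linear` and `sigmoid_neuron` are illustrative names, not from any library.

```python
import math

def linear(x, w, b):
    # z = w1*x1 + w2*x2 + ... + wn*xn + b  (assumes len(x) == len(w))
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    # the linear function wrapped in the sigmoid: sigma(z(x))
    return sigmoid(linear(x, w, b))

w, b = [2, 1], -4  # same weights and bias as the neuron in this lesson
print(round(sigmoid_neuron([3, 4], w, b), 4))  # 0.9975  (z = 6)
print(round(sigmoid_neuron([2, 1], w, b), 4))  # 0.7311  (z = 1)
```

Both results match the PyTorch cells earlier in the lesson.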

#### Thinking in Layers

Finally, even though we are describing a single neuron, we can think of the linear function and activation function as two different layers of the network.  To reinforce this, we could express our hypothesis function with the following:

* $z(x) = w \cdot x + b $
* $a(z) = \frac{1}{1 + e^{-z}}$

So we can think of information as flowing through these two layers of the neural network in sequence: first through the linear layer, then through the activation layer.
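One way to make the "layers" picture concrete is to store the layers as a list of functions and pass the data through them in order. This is a minimal sketch, and the names `linear_layer`, `sigmoid_layer` and `forward` are hypothetical. PyTorch packages the same idea as `torch.nn.Sequential`, which runs its modules one after another.

```python
import math

def linear_layer(x):
    # weights and bias match the neuron in this lesson
    w, b = [2, 1], -4
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid_layer(z):
    return 1 / (1 + math.exp(-z))

def forward(x, layers):
    # each layer's output becomes the next layer's input
    out = x
    for layer in layers:
        out = layer(out)
    return out

layers = [linear_layer, sigmoid_layer]
print(round(forward([2, 1], layers), 4))  # 0.7311
```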

### Summary

We have now made it to the final form of our neuron's hypothesis function. Our artificial neuron takes in weighted inputs and now returns a value between 0 and 1. An output of 0.5 means that the neuron is not leaning toward either prediction.

We calculate the hypothesis function in two steps: 

1. A linear component, which we represent with `z` 
2. Passing the output of that linear component to our activation function

So for a sigmoid neuron that takes in two inputs, we calculate the output with the following:

$z = w_1x_1 + w_2x_2 + b$

$ \sigma(z) = \frac{1}{1 + e^{-z}} $
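As a worked example, take the lesson's weights ($w_1 = 2$, $w_2 = 1$, $b = -4$) and the first cell ($x_1 = 3$, $x_2 = 4$). The two steps give the same 0.9975 as the earlier code cell:

$z = w_1x_1 + w_2x_2 + b = 2 \cdot 3 + 1 \cdot 4 - 4 = 6$

$\sigma(6) = \frac{1}{1 + e^{-6}} \approx \frac{1}{1 + 0.00248} \approx 0.9975$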

<center>
<a href="https://www.jigsawlabs.io/free" style="position: center"><img src="jigsaw-icon.png" width="15%" style="text-align: center"></a>
</center>