![Practicum AI Logo image](https://github.com/PracticumAI/practicumai.github.io/blob/main/images/logo/PracticumAI_logo_250x50.png?raw=true) <img src='https://github.com/PracticumAI/deep_learning/blob/main/images/deep_learning_foundations.png?raw=true' align='right' width=50 padding=50>
***
# *Practicum AI:* Deep Learning - Perceptron


> This exercise is adapted from the [W3 Schools Perceptrons](https://www.w3schools.com/ai/ai_perceptrons.asp) article and from Baig et al. (2020) The Deep Learning Workshop from [Packt Publishers](https://www.packtpub.com/product/the-deep-learning-workshop/9781839219856) (Exercise 2.01, page 55).

<img alt="A cartoon of Dr. Amelia, a nutrition researcher, sitting at a computer thinking about food items which appear in a thought bubble." src="images/DrAmelia.jpg" align="right" width=250>Amelia is back! This time, she needs your help to analyze some of her survey data. We'll use a simple [perceptron](https://developers.google.com/machine-learning/glossary#perceptron) to predict if patients follow her special Dr. Amelia Recommended Nutrition Plan (the DARN Plan!). 

**Note:** Dr. Amelia's cartoon was generated with AI's assistance.
 
As a note, this exercise lies somewhere between coding everything from scratch and relying on the pre-coded APIs (Application Programming Interfaces) that underlie the power of TensorFlow, Keras, and PyTorch. **You will not need to create weight tensors beyond this exercise**. Still, by doing it once, you will hopefully gain a better understanding (*and appreciation*) of the details often hidden behind an API call such as `model.fit()`.

The table below shows some data Amelia has gathered from patient surveys about their nutrition. She's looking at how different factors predict whether patients follow her DARN Plan ($y$, the output or [labels](https://developers.google.com/machine-learning/glossary#label) in our example) based on three input variables: whether patients submit photos of three meals a day ($x_1$), whether patients report being satisfied with their food choices ($x_2$), and whether patients report being generally happy ($x_3$). We will combine $x_1$, $x_2$, and $x_3$ into our input tensor $X$. Here, we are simplifying the question of the likelihood of following the DARN Plan to a Yes/No answer.

Case # | Photos of 3 meals submitted? ($x_1$) | Satisfied with food choices? ($x_2$) | Generally happy? ($x_3$) | Following the DARN Plan? ($y$)
--|--------------------------|---------------------|-----------------------|----------------
1 | 1 (Yes) | 1 (Yes) | 1 (Yes) | Yes (1)
2 | 0 (No) | 1 (Yes) | 1 (Yes) | Yes (1)
3 | 1 (Yes) | 0 (No) | 1 (Yes) | Yes (1)
4 | 0 (No) | 0 (No) | 1 (Yes) | Yes (1)
5 | 1 (Yes) | 1 (Yes) | 0 (No) | Yes (1)
6 | 0 (No) | 1 (Yes) | 0 (No) | No (0)
7 | 1 (Yes) | 0 (No) | 0 (No) | No (0)
8 | 0 (No) | 0 (No) | 0 (No) | No (0)


## 1. Import libraries

In [None]:
import tensorflow as tf
import pandas as pd
from tensorflow.keras import activations

from matplotlib import pyplot as plt

## 2. Create an input data matrix

Create an 8 x 3 matrix for our input data: one row for each of the eight cases and one column for each of our three inputs (we'll call them $x_1$, $x_2$, and $x_3$ for now).

The matrix below has the three input columns of our data table, using just the 0/1 values corresponding to the no/yes entries in the table. The comments help line up rows of the table with entries in our `X` variable. (Remember, we are using the capital letter `X` as our variable name here to remind us that this is a matrix with our input data).

We'll probably stop reminding you after this, but...remember not all red output is bad!

In [None]:
X = tf.Variable([[1.,1.,1.], # Case 1
                 [0.,1.,1.], # Case 2
                 [1.,0.,1.], # Case 3
                 [0.,0.,1.], # Case 4
                 [1.,1.,0.], # Case 5
                 [0.,1.,0.], # Case 6
                 [1.,0.,0.], # Case 7
                 [0.,0.,0.]], # Case 8
                 dtype = tf.float32)  # 8x3, input data table
print(X)


## 3. Create a label tensor

Create a tensor of labels to hold our 'ground truth': for each set of inputs, whether or not the patient is following the DARN Plan.

```python
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8--one for each case in the table         
y = tf.Variable([1, 1, 1, 1, 1, 0, 0, 0], dtype = tf.float32) 

y = tf.reshape(y, [8,1]) # Reshape to be 8 rows of 1 column  
print(y)
```

In [None]:
# Code it!


## 4. Define some constants to set the shape of the weight matrix

Define two constants to be used in the next step when we define the connections weight matrix.

We can use the number of columns in the `X` table to determine the number of features (how many $x_i$'s we have) and, therefore, how many weights we need to store (one for each feature). We only need one output value since we are looking for a binary decision about whether the patient is following the DARN Plan (Yes/No).

```python
num_features = X.shape[1]
output_size = 1
```

In [None]:
# Code it!


***

## 5. Define connections weight matrix

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, w2, w3) and the bias term. The perceptron body multiplies the inputs by the weights and sums them and the bias, resulting in the output--whether or not the patient is following the DARN Plan. The three weights are highlighted here.](images/02_perceptron_section5.png)

We need one weight for each feature $x_i$ in our feature matrix $X$ (three photos submitted, satisfied with food choices, etc.). These weights are our $w_i$'s. We don't know what values they should take, so we initialize them to 0. Another common option is to use a random number to initialize the weights--this is one reason different runs of model training may give different answers.

```python
W = tf.Variable(tf.zeros([num_features, output_size]), dtype = tf.float32)
print(W)
```

In [None]:
# Code it!


***

## 6. Define bias variable

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, w2, w3) and the bias term. This is similar to the above image, but is highlighting the bias term](images/02_perceptron_section6.png)

Since we only have one neuron, we only need one bias value. Again, we'll initialize it to 0--a random number would be another option here. We can write each bias term as $b_i$ and the matrix of all biases as $B$.

```python
B = tf.Variable(tf.zeros([output_size, 1]), dtype = tf.float32)
print(B)
```

In [None]:
# Code it!
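
To see why the weight matrix has shape `[num_features, output_size]` and the bias has shape `[output_size, 1]`, it can help to check how the shapes combine. The sketch below uses NumPy instead of TensorFlow (an assumption made only to keep the example lightweight); the names mirror `X`, `W`, and `B` from the steps above.

```python
import numpy as np

# Same 8 cases x 3 features as the data table.
X = np.array([[1, 1, 1], [0, 1, 1], [1, 0, 1], [0, 0, 1],
              [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=np.float32)

num_features = X.shape[1]   # 3 columns -> 3 features
output_size = 1             # one neuron -> one output per case

W = np.zeros((num_features, output_size), dtype=np.float32)  # one weight per feature
B = np.zeros((output_size, 1), dtype=np.float32)             # one bias for the neuron

# (8, 3) @ (3, 1) -> (8, 1); the (1, 1) bias is broadcast over all 8 rows.
z = X @ W + B
print(X.shape, W.shape, B.shape, z.shape)
```

The result has one row per case, which matches the 8 x 1 shape of the label tensor `y`.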


***

## 7. Define a perceptron function

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, w2, w3) and the bias term. This is similar to the above image, but is highlighting the perceptron body.](images/02_perceptron_section7.png)


In the following code block, we define a perceptron function with one input argument, $X$, containing our three input data features. 

The function's first line implements a net input function. It multiplies the input data matrix ($X$) by the weights ($W$) using the matrix multiplication function (`tf.matmul`). It then adds the bias ($B$) value to that product.

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'>Note
> This is the essential function of a neuron: gather the inputs, multiply each input by the weight for that input, add the products up and add in the bias.
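
As a small worked example of that calculation, the sketch below computes the net input for Case 2 of the table by hand and then as a dot product. It uses NumPy, and the weight and bias values are made up purely for illustration; they are not the values the perceptron will learn.

```python
import numpy as np

x = np.array([0.0, 1.0, 1.0])      # Case 2: no photos, satisfied, generally happy
w = np.array([0.5, -0.25, 2.0])    # hypothetical weights, for illustration only
b = 0.1                            # hypothetical bias, for illustration only

# Multiply each input by its weight, add the products up, then add the bias.
z_by_hand = x[0] * w[0] + x[1] * w[1] + x[2] * w[2] + b

# The same calculation written as a dot product.
z_dot = np.dot(x, w) + b

print(z_by_hand, z_dot)
```

Because $x_1$ is 0 for this case, the first weight contributes nothing to the sum; only the weights for the inputs that are "on" matter.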

The function's second line implements an activation function. The activation function determines how the neuron's output (calculated above) is changed before passing it on. Here, we use the `tanh` activation function.  However, there are other TensorFlow options.  For example, you could use the `tf.sigmoid` function.  Or, select a function from the Keras activation (`activations`) library.  Search the [Keras documentation](https://keras.io/api/layers/activations/) for a complete list of available functions.

Try out these other options, retrain the network, and see what happens.

```python
output = tf.sigmoid(z)
output = activations.relu(z)
output = activations.linear(z)
```
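
To get a feel for how these options differ before retraining, the sketch below applies each one to the same few net-input values. It uses NumPy equivalents written out by hand (an assumption for readability, not the TensorFlow or Keras implementations themselves).

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])             # example net-input values

tanh_out = np.tanh(z)                      # squashes values into (-1, 1)
sigmoid_out = 1.0 / (1.0 + np.exp(-z))     # squashes values into (0, 1)
relu_out = np.maximum(z, 0.0)              # replaces negative values with 0
linear_out = z                             # passes values through unchanged

for name, out in [("tanh", tanh_out), ("sigmoid", sigmoid_out),
                  ("relu", relu_out), ("linear", linear_out)]:
    print(f"{name:8s}", np.round(out, 3))
```

Note that a net input of 0 maps to 0 for `tanh`, `relu`, and `linear`, but to 0.5 for `sigmoid`, so the untrained predictions will differ depending on the activation you choose.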

In [None]:
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)  # Net input function
    output = tf.tanh(z)             # Activation function
    return output

Execute the perceptron function to see its initial predictions before any training. All of its predictions ought to be 0: we set all the weights and the bias to 0, so whatever the inputs are, they are all multiplied by 0 and have 0 added to the sum, and $\tanh(0) = 0$.

In [None]:
print(perceptron(X))        # Execute the perceptron to see its initial predictions before training.

## 8. Training the Perceptron

Now that we have the elements of a simple, single-node perceptron in place, let's train the network using backpropagation. TensorFlow computes the gradients and the optimizer applies the resulting weight updates, so we don't need to code that ourselves.

The [learning rate](https://developers.google.com/machine-learning/glossary#learning-rate) determines the size of the steps taken towards a minimum of the loss, while the optimizer manages the weight update process during backpropagation. Here, the Stochastic Gradient Descent (SGD) optimizer has been selected.
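
The core idea behind gradient descent can be sketched in a few lines of plain Python: repeatedly move a weight a small step against the gradient of the loss. The toy loss $(w - 3)^2$, the starting value, and the learning rate below are all made up for illustration and are not part of this notebook's model.

```python
learning_rate = 0.1   # illustrative value, larger than the 0.01 used in this notebook
w = 0.0               # starting guess for the single weight

for step in range(50):
    gradient = 2 * (w - 3.0)            # derivative of the toy loss (w - 3)^2
    w = w - learning_rate * gradient    # take a step against the gradient

print(round(w, 4))    # ends up very close to 3, the minimum of the toy loss
```

A smaller learning rate takes smaller steps and needs more iterations to get close to the minimum; a learning rate that is too large can overshoot it.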

In [None]:
learning_rate = 0.01
optimizer = tf.optimizers.SGD(learning_rate)

## 9. Train the perceptron for 1000 epochs

An [epoch](https://developers.google.com/machine-learning/glossary#epoch) is a complete training pass over the entire dataset. Our loss or error function is defined as a lambda function (a single-line, inline function) in the first line of code in the loop block. We use the `sigmoid_cross_entropy_with_logits` function, an appropriate choice for this application, to calculate how far our predicted results are from the known results. We will not get into the technical details here as that is outside the scope of this learning experience. In the second line, our SGD optimizer updates the weights and bias to reduce the model's total error.
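
For the curious, the sketch below reproduces the per-case calculation in NumPy, following the numerically stable formula given in the TensorFlow documentation for this function: `max(x, 0) - x * z + log(1 + exp(-abs(x)))`, where `x` is the raw score (logit) and `z` is the label. The helper name `sigmoid_cross_entropy` and the example scores are hypothetical.

```python
import numpy as np

def sigmoid_cross_entropy(labels, logits):
    """Per-case loss: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

labels = np.array([1.0, 1.0, 0.0, 0.0])
logits = np.array([3.0, 0.0, 0.0, 3.0])   # hypothetical raw scores

per_case = sigmoid_cross_entropy(labels, logits)
print(np.round(per_case, 4))   # one loss value per case
print(per_case.mean())         # the training loop averages these with reduce_mean
```

The loss is small when a large positive score meets a label of 1, and large when the same score meets a label of 0, which is what pushes the weights in the right direction during training.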

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'>Note
> The code below uses a `for` loop. This common programming construct allows you to loop, or iterate, through
> a list of items (the numbers 0 to 999 in our case). *Implicitly*, training will use `for` loops--for each epoch do 
> this thing. *Explicitly*, however, after this notebook, we will use the API that automatically does this for us.
> Thus we dropped coverage of `for` loops and other "flow control" methods from the *Python for AI* course. It's
> helpful to know about them, but they are rarely used explicitly in AI research.
> [Click here for more details](https://wiki.python.org/moin/ForLoop).
>
> The block also uses a special Python function called a `lambda` function. These are functions that can be 
> written as a single line of code. [Click here for more
> details](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions).
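
Here is a short, self-contained illustration of both constructs in plain Python. The variable names are made up for this example. The second half hints at why the loss is wrapped in a `lambda`: a function can be called again on every pass through the loop, and each call uses the current values of the variables it refers to.

```python
square = lambda v: v * v          # single-line function: takes v, returns v squared

total = 0
for n in range(4):                # n takes the values 0, 1, 2, 3
    total += square(n)

print(total)                      # 0 + 1 + 4 + 9 = 14

# A lambda with no arguments is evaluated each time it is called,
# so it sees the current value of any variable it refers to.
weight = 1.0
loss = lambda: (weight - 3.0) ** 2
first = loss()                    # evaluated with weight = 1.0
weight = 2.0
second = loss()                   # evaluated again with weight = 2.0
print(first, second)
```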

In [None]:
no_of_epochs = 1000

for n in range(no_of_epochs):
    loss = lambda:abs(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels = y, logits = perceptron(X))))
    optimizer.minimize(loss, [W, B])

## 10. Print the weights
<img alt="AI Generated cartoon of happy people eating healthy food." src="images/happy_people.jpg" align="right" width="300">

Notice that the model has learned that the general happiness of a patient is the best predictor of whether or not they are following the DARN Plan! Of the weights, the 3rd one has the largest value.

Given that the input from each feature will be a 0 or a 1, multiplying by a larger weight will increase the contribution of that feature in the summation of all input-by-weight products ($x_i * w_i$) in determining the output of the neuron.

The perceptron has learned how to take the three input variables and weigh them to predict the output. 
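
The sketch below shows how the individual $x_i * w_i$ terms can be inspected for a single case and how to pick out the feature with the largest weight. It uses NumPy, and the weight values are hypothetical placeholders, not the weights your trained perceptron will print.

```python
import numpy as np

feature_names = ["Photos of 3 meals submitted?",
                 "Satisfied with food choices?",
                 "Generally happy?"]
weights = np.array([0.4, 0.4, 1.2])   # hypothetical values, NOT the trained weights

x = np.array([1.0, 0.0, 1.0])         # Case 3 from the table
contributions = x * weights           # each x_i * w_i term in the sum

strongest = feature_names[int(np.argmax(weights))]
print(contributions, strongest)
```

With these placeholder weights, a feature that is "off" (0) contributes nothing, and the third feature contributes the most whenever it is "on".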

**Note:** The image was generated with AI's assistance.

```python
print(W)
```

In [None]:
# Code it!


## 11. Print the bias

```python
print(B)
```

In [None]:
# Code it!


## 12. Test the perceptron

The numbers in the output tensor reflect the perceptron's predictions for each input case. These are not probabilities but the **model's estimate of the output value**. We could set a threshold value and conclude the patient is following the DARN Plan when the value exceeds some number.

```python
print(perceptron(X))
```

In [None]:
# Code it!
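
One way to turn these estimates into Yes/No decisions is to apply a threshold. The sketch below uses NumPy with made-up prediction values and an assumed cutoff of 0.5; both are for illustration only, and a different cutoff may suit your trained model better.

```python
import numpy as np

# Hypothetical model outputs for the 8 cases (not real results).
predictions = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.2, 0.1])
threshold = 0.5                                           # assumed cutoff

following_plan = (predictions > threshold).astype(int)    # 1 = Yes, 0 = No
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0])               # y from the data table

accuracy = (following_plan == labels).mean()              # fraction of cases matched
print(following_plan, accuracy)
```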

### Print things more clearly

Let's bring the `X`, `y` and predictions together to make it easier to read. Remember that `Yes=1` and `No=0` in the table.

In [None]:
X_df = pd.DataFrame(X.numpy(), columns=['Photos of 3 meals submitted?', 'Satisfied with food choices?', 'Generally happy?'])
y_df = pd.DataFrame(y.numpy(), columns=['Following the DARN Plan?'])
pred_df = pd.DataFrame(perceptron(X).numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df

## 13. Let's see how different choices would change the results

Let's change the patient outcomes and see what happens to the learned weights and predictions. 

### Change 1: Patients are more likely to follow the DARN Plan when they like the food choices:

`y = tf.Variable([1, 1, 0, 0, 1, 0, 0, 0], dtype = tf.float32)`

### Change 2: Patients are more likely to follow the DARN Plan when they regularly submit three photos a day:

`y = tf.Variable([1, 0, 1, 0, 1, 0, 1, 0], dtype = tf.float32)`

Feel free to play with other parts of the model; everything but the X inputs is replicated below to put it all in one place for easy reference. Comments point out hyperparameters that you might want to change.

In [None]:
## From step 3
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8--one for each case in the table         
y = tf.Variable([1, 1, 0, 0, 1, 0, 0, 0], dtype = tf.float32) # Change 1 has been made; you'll need to make change 2 yourself
y = tf.reshape(y, [8,1])  # Reshape to be 8 rows of 1 column

## From step 4
num_features = X.shape[1]
output_size = 1

## From step 5
W = tf.Variable(tf.zeros([num_features, output_size]), dtype = tf.float32)

## From step 6
B = tf.Variable(tf.zeros([output_size, 1]), dtype = tf.float32)

## From step 7
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)      
    output = tf.tanh(z)                  # Activation function is a good hyperparameter to change 
    return output

## From step 8
learning_rate = 0.01  # Learning rate is a good hyperparameter to change
optimizer = tf.optimizers.SGD(learning_rate)

## From step 9
no_of_epochs = 1000 # Number of epochs is a good hyperparameter to change

for n in range(no_of_epochs):
    loss = lambda:abs(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels = y, logits = perceptron(X))))
    optimizer.minimize(loss, [W, B])
    
## From steps 10 on, printing the output
print(f'Weights: {W}')
print(f'Bias: {B}')

X_df = pd.DataFrame(X.numpy(), columns=['Photos of 3 meals submitted?', 'Satisfied with food choices?', 'Generally happy?'])
y_df = pd.DataFrame(y.numpy(), columns=['Following the DARN Plan?'])
pred_df = pd.DataFrame(perceptron(X).numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df