<a href="https://colab.research.google.com/github/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/neural_networks_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## A Single Neuron

Let’s say we have a single neuron, and there are three inputs to this neuron. As in most cases when you initialize parameters in neural networks, our network will have weights initialized randomly and biases set to zero to start.

The input will be either actual training data or the outputs of neurons from the previous layer in the neural network. We’re just going to make up values to start with as input for now:

In [1]:
inputs = [1, 2, 3]

Each input also needs a weight associated with it. Inputs are the data that we pass into the model
to get desired outputs, while the weights are the parameters that we’ll tune later on to get these
results. Weights are one of the types of values that change inside the model during the training
phase, along with biases that also change during training. The values for weights and biases are
what get “trained,” and they are what make a model actually work (or not work).

Let’s say the first input, at index 0, which is a 1, has a weight of
0.2, the second input has a weight of 0.8, and the third input has a weight of -0.5. 

Our input and weights lists should now be:

In [2]:
inputs = [1, 2, 3]
weights = [0.2, 0.8, -0.5]

Next, we need the bias. At the moment, we’re modeling a single neuron with three inputs. Since
we’re modeling a single neuron, we only have one bias, as there’s just one bias value per neuron.

The bias is an additional tunable value but is not associated with any input in contrast to the
weights. We’ll randomly select a value of 2 as the bias for this example:

In [3]:
inputs = [1, 2, 3]
weights = [0.2, 0.8, -0.5]
bias = 2

This neuron sums each input multiplied by that input’s weight, then adds the bias. All the neuron
does is take fractions of the inputs, where these fractions (the weights) are adjustable parameters,
add another adjustable parameter (the bias), and then output the result.
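
Written as a formula, with $x_i$ for the inputs, $w_i$ for the weights, and $b$ for the bias (symbol names chosen here for illustration, not taken from the text above), the calculation for our three inputs would be:

$$
\text{output} = \sum_{i=1}^{3} x_i w_i + b = 1 \cdot 0.2 + 2 \cdot 0.8 + 3 \cdot (-0.5) + 2 = 2.3
$$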

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/1.png?raw=1' width='800'/>

In [4]:
output = (inputs[0] * weights[0] +
          inputs[1] * weights[1] +
          inputs[2] * weights[2] +
          bias)
print(output)

2.3


What might we need to change if we have 4 inputs, rather than the 3 we’ve just shown? 

Along with
the additional input, we need to add an associated weight, by which this new input will be
multiplied.

In [5]:
inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0

Which could be depicted visually as:

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/2.png?raw=1' width='800'/>

All together in code, including the new input and weight, to produce output:

In [6]:
inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0

output = (inputs[0] * weights[0] +
          inputs[1] * weights[1] +
          inputs[2] * weights[2] +
          inputs[3] * weights[3] +
          bias)
print(output)

4.8


<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/3.png?raw=1' width='800'/>

## A Layer of Neurons

Neural networks typically have layers that consist of more than one neuron. Layers are nothing
more than groups of neurons. Each neuron in a layer takes exactly the same input — the input
given to the layer (which can be either the training data or the output from the previous layer),
but contains its own set of weights and its own bias, producing its own unique output. The layer’s
output is the collection of these outputs, one per neuron.

Let’s say we have a scenario with
3 neurons in a layer and 4 inputs:

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/4.png?raw=1' width='800'/>

We’ll keep the initial 4 inputs and set of weights for the first neuron the same as we’ve been using
so far. We’ll add 2 additional, made up, sets of weights and 2 additional biases to form 2 new
neurons for a total of 3 in the layer. 

The layer’s output is going to be a list of 3 values, not just a
single value like for a single neuron.

In [7]:
inputs = [1.0, 2.0, 3.0, 2.5]

weights1 = [0.2, 0.8, -0.5, 1.0]
weights2 = [0.5, -0.91, 0.26, -0.5]
weights3 = [-0.26, -0.27, 0.17, 0.87]

bias1 = 2.0
bias2 = 3.0
bias3 = 0.5

output = [
    # Neuron 1
    inputs[0] * weights1[0] +
    inputs[1] * weights1[1] +
    inputs[2] * weights1[2] +
    inputs[3] * weights1[3] + bias1,

    # Neuron 2
    inputs[0] * weights2[0] +
    inputs[1] * weights2[1] +
    inputs[2] * weights2[2] +
    inputs[3] * weights2[3] + bias2,

    # Neuron 3
    inputs[0] * weights3[0] +
    inputs[1] * weights3[1] +
    inputs[2] * weights3[2] +
    inputs[3] * weights3[3] + bias3,
]

print(output)

[4.8, 1.21, 2.385]


<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/5.png?raw=1' width='800'/>

Each
neuron is “connected” to the same inputs. The difference is in the separate weights and bias
that each neuron applies to the input. This is called a fully connected neural network — every
neuron in the current layer has connections to every neuron from the previous layer.

At this point, we have only shown code for a single layer
with very few neurons. Imagine coding many more layers and more neurons. This would get
very challenging to code using our current methods. 

Instead, we could use a loop to scale and
handle dynamically-sized inputs and layers. We’ve turned the separate weight variables into a
list of weights so we can iterate over them, and we changed the code to use loops instead of
the hardcoded operations.

In [8]:
inputs = [1.0, 2.0, 3.0, 2.5]

weights = [
  [0.2, 0.8, -0.5, 1.0],
  [0.5, -0.91, 0.26, -0.5],
  [-0.26, -0.27, 0.17, 0.87]
]

biases = [2.0, 3.0, 0.5]

# Output of current layer
layer_outputs = []
# For each neuron
for n_weights, n_bias in zip(weights, biases):
  # Zeroed output of given neuron
  n_output = 0
  # For each input and weight to the neuron
  for n_input, weight in zip(inputs, n_weights):
    # Multiply this input by associated weight and add to the neuron’s output variable
    n_output += n_input * weight
  # Add bias
  n_output += n_bias
  # Put neuron’s result to the layer’s output list
  layer_outputs.append(n_output)

print(layer_outputs)

[4.8, 1.21, 2.385]


This does the same thing as before, just in a more dynamic and scalable way.

Again, all we’re doing is, for each neuron (the outer loop in the code above, over neuron weights
and biases), taking each input value multiplied by the associated weight for that input (the inner
loop in the code above, over inputs and weights), adding all of these together, then adding a bias
at the end. Finally, we append the neuron’s output to the layer’s output list.

That’s it! How do we know we have three neurons? Why do we have three?

We can tell we have
three neurons because there are 3 sets of weights and 3 biases.

When you make a neural network
of your own, you also get to decide how many neurons you want for each of the layers. You can
combine however many inputs you are given with however many neurons that you desire.

**With our above code that uses loops, we could modify our number of inputs or neurons in our
layer to be whatever we wanted, and our loop would handle it.**
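
As a rough sketch of that claim, the same nested loops can be wrapped in a function and called with a different number of inputs and neurons. The name `layer_forward` and the values below are made up for illustration; they do not come from the text above.

```python
def layer_forward(inputs, weights, biases):
    """Compute one output per neuron with the same nested loops (hypothetical helper)."""
    layer_outputs = []
    # Outer loop: one pass per neuron (one row of weights and one bias)
    for n_weights, n_bias in zip(weights, biases):
        n_output = 0.0
        # Inner loop: multiply each input by its weight and accumulate
        for n_input, weight in zip(inputs, n_weights):
            n_output += n_input * weight
        layer_outputs.append(n_output + n_bias)
    return layer_outputs


# 2 inputs feeding 4 neurons: four weight rows of length 2, four biases
small = layer_forward(
    [1.0, 2.0],
    [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0], [-1.0, 0.0]],
    [0.0, 1.0, 0.5, 2.0],
)
print(small)  # [1.5, 0.0, 4.5, 1.0]
```

One caveat of this sketch: `zip` stops at the shorter sequence, so a weight row of the wrong length would be silently truncated rather than reported as an error.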

It would be a disservice not to show NumPy here, since Python alone doesn’t do matrix/tensor/array math very efficiently.

But first, a note on naming: one of the most popular deep learning libraries in Python is
called **TensorFlow** because it’s all about doing operations on tensors.

## Dot Product and Vector Addition

Let’s now address vector multiplication, as that’s one of the most important operations we’ll
perform on vectors. We can achieve the same result as in our pure Python implementation, which
multiplied the inputs and weights vectors element-wise and summed the products, by using a dot product,
which we’ll explain shortly.

Traditionally, we use dot products for vectors (yet another name for
a container), and we can certainly refer to what we’re doing here as working with vectors just as
we can call them “tensors.”

When multiplying vectors, you either perform a dot product or a cross product. A cross
product results in a vector while a dot product results in a scalar (a single value/number).

First, let’s explain what a dot product of two vectors is. Mathematicians would say:

$$
\overrightarrow{a} \cdot \overrightarrow{b} = \sum_{i=1}^n a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n
$$

A dot product of two vectors is the sum of the products of corresponding vector elements (those at the same index). Both vectors
must be of the same size (have an equal number of elements).


Let’s write out how a dot product is calculated in Python. For it, you have two vectors, which we
can represent as lists in Python. We then multiply their elements from the same index values and
then add all of the resulting products. Say we have two lists acting as our vectors:

In [9]:
a = [1, 2, 3]
b = [2, 3, 4]

# calculate the dot product
dot_product = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
print(dot_product)

20


Now, what if we called a “inputs” and b “weights?” Suddenly, this dot product looks like a
succinct way to perform the operations we need and have already performed in plain Python. We
need to multiply our weights and inputs of the same index values and add the resulting values
together. The dot product performs this exact type of operation; thus, it makes lots of sense to use
here.
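
To make the connection concrete, here is a minimal sketch of a general dot product in plain Python, applied to the inputs and weights from the single-neuron example. The function name `dot` is a hypothetical helper defined here, not a built-in or a NumPy function.

```python
def dot(a, b):
    """Dot product of two equal-length lists (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    # Multiply elements at the same index, then add up the products
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


print(dot([1, 2, 3], [2, 3, 4]))  # 20

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0

# Same calculation as the hand-written neuron: dot product plus bias
neuron = dot(inputs, weights) + bias
print(round(neuron, 6))
```

The rounding in the last line is only there because floating-point sums can carry tiny representation errors.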

Plain Python does
not contain a built-in function for the dot product, so we’ll use the NumPy package.

We also need vector addition, so that we can add the biases later. NumPy lets us perform this in a natural way, using the plus sign with the variables containing
vectors of the data. The addition of two vectors is an operation performed element-wise,
which means that both vectors have to be of the same size, and the result will be a vector of
this size as well. The result is a vector calculated as the sum of the corresponding vector elements:

$$
\overrightarrow{a} + \overrightarrow{b} = [a_1 + b_1, a_2 + b_2, \dots, a_n + b_n]
$$
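
A small sketch of element-wise addition with NumPy, reusing the two example vectors from the dot product above. The contrast with plain lists is included because the plus sign means something different for them.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])

# Element-wise sum of two NumPy arrays of the same shape
vector_sum = a + b
print(vector_sum)  # [3 5 7]

# Plain Python lists behave differently: + concatenates instead of adding
list_sum = [1, 2, 3] + [2, 3, 4]
print(list_sum)  # [1, 2, 3, 2, 3, 4]
```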

## A Single Neuron with NumPy

Let’s code the solution, for a single neuron to start, using the dot product and the addition of the
vectors with NumPy. 

This makes the code much simpler to read and write (and faster to run):

In [10]:
import numpy as np

In [11]:
inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0

outputs = np.dot(weights, inputs) + bias
print(outputs)

4.8


In [12]:
outputs = np.dot([0.2, 0.8, -0.5, 1.0], [1.0, 2.0, 3.0, 2.5]) + 2.0
print(outputs)

4.8


## A Layer of Neurons with NumPy

Now, we’d like to calculate the output of a layer of 3 neurons,
which means the weights will be a matrix or list of weight vectors. In plain Python, we wrote this as a list of lists. With NumPy, this will be a 2-dimensional array, which we’ll call a matrix.


The weights are now a matrix, and we need
to perform a dot product of them and the input vector. NumPy makes this very easy for us —
treating this matrix as a list of vectors and performing the dot product one by one with the vector
of inputs, returning a list of dot products.

The biases can be easily added to the result of the dot product operation as they are a vector of the
same size. We can also use the plain Python list directly here, as NumPy will convert it to an
array internally.

When we add two vectors using NumPy, each i-th element is added together, resulting
in a new vector of the same size. This is both a simplification and an optimization, giving us
simpler and faster code.

In [13]:
inputs = [1.0, 2.0, 3.0, 2.5]

weights = [
  [0.2, 0.8, -0.5, 1.0],
  [0.5, -0.91, 0.26, -0.5],
  [-0.26, -0.27, 0.17, 0.87]
]

biases = [2.0, 3.0, 0.5]

layer_outputs = np.dot(weights, inputs) + biases  # shapes: (3, 4) dot (4,) -> (3,)
print(layer_outputs)

[4.8   1.21  2.385]


In [16]:
# layer_outputs = np.dot(inputs, weights) + biases  # shapes: (4,) dot (3, 4) -> ValueError, 4 != 3
# print(layer_outputs)

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-c5d15def04c9> in <module>()
----> 1 layer_outputs = np.dot(inputs, weights) + biases # shape: (4 x 1) (3 x 4) = (1 x 3) mis-match
      2 print(layer_outputs)

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (4,) and (3,4) not aligned: 4 (dim 0) != 3 (dim 0)
```

To explain the order of the arguments we pass to np.dot(), it helps to think of the first argument
as deciding the output shape. In our case, we pass the list of neuron weights first
and then the inputs, as our goal is to get a list of neuron outputs, one per row of weights.

A dot product of a matrix and a vector results in a list of dot products. The np.dot() function treats the matrix as
a list of vectors and performs a dot product of each of those vectors with the other vector.

## A Batch of Data

**To train, neural networks tend to receive data in batches.** So far, the example input data have
been only one sample (or observation) of various features, called a feature set:

In [17]:
inputs = [1, 2, 3, 2.5]

Each of these values is a feature observation datum, and together they form a
feature set instance, also called an observation, or most commonly, a sample.

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/5b1.png?raw=1' width='800'/>

Often, neural networks expect to take in many samples at a time for two reasons. One reason
is that it’s faster to train in batches, since the samples can be processed in parallel, and the other reason is that batches help with generalization during training.

If you fit (perform a step of a training process) on one
sample at a time, you’re highly likely to keep fitting to that individual sample, rather than
slowly producing general tweaks to weights and biases that fit the entire dataset. 

Fitting or
training in batches gives you a higher chance of making more meaningful changes to weights and biases.

An example of a batch of data could look like:

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/5b2.png?raw=1' width='800'/>

Recall that in Python, lists are useful containers for holding a single sample as well
as multiple samples that make up a batch of observations. A batch of
observations, where each inner list is one sample, looks like:


In [18]:
inputs = [[1, 2, 3, 2.5],
          [2, 5, -1, 2],
          [-1.5, 2.7, 3.3, -0.8]]

This list of lists could be made into an array since it is homogeneous: every inner list has the same length and holds the same kind of value. Note that each “list” in this
larger list is a sample representing a feature set. `[1, 2, 3, 2.5]`, `[2, 5, -1, 2]`, and
`[-1.5, 2.7, 3.3, -0.8]` are all samples, and are also referred to as feature set instances or
observations.
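
As a quick check, the batch can be converted to a NumPy array and its shape inspected. The name `batch` is introduced here for illustration.

```python
import numpy as np

inputs = [[1, 2, 3, 2.5],
          [2, 5, -1, 2],
          [-1.5, 2.7, 3.3, -0.8]]

# Every inner list has 4 numbers, so this becomes a 2-dimensional array
batch = np.array(inputs)
print(batch.shape)  # (3, 4): 3 samples, 4 features each

# Indexing the first axis selects one sample
second_sample = batch[1]
print(second_sample)
```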

We have a matrix of inputs and a matrix of weights now, and we need to perform the dot product
on them somehow, but how and what will the result be?

Previously, when we performed a dot product
on a matrix and a vector, we treated the matrix as a list of vectors, resulting in a list of dot
products.

In this example, we need to manage both matrices as lists of vectors and perform dot
products on all of them in all combinations, resulting in a list of lists of outputs, or a matrix; this
operation is called the matrix product.

## Matrix Product

The matrix product is an operation in which we have 2 matrices, and we are performing dot
products of all combinations of rows from the first matrix and the columns of the 2nd matrix,
resulting in a matrix of those atomic dot products:

<img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/6.png?raw=1' width='400'/>

To perform a matrix product, the size of the second dimension of the left matrix must match the size of the first dimension of the right matrix.

For example, if the left matrix has a shape of (5, 4),
then the right matrix must match this 4 in its first shape value, as in (4, 7). The shape of the
resulting array is always the first dimension of the left array and the second dimension of the right array, here (5, 7).

In the figure above, the left matrix has a shape of (5, 4), and the upper-right matrix
has a shape of (4, 5). The second dimension of the left array and the first dimension of the right
array are both 4, so they match, and the resulting array has a shape of (5, 5).
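
The shape rule can be seen in a minimal pure-Python sketch of the matrix product. The function `matrix_product` and the small matrices below are made up for illustration; in practice NumPy would do this work.

```python
def matrix_product(left, right):
    """Matrix product of two lists of lists (hypothetical helper for illustration)."""
    if len(left[0]) != len(right):
        raise ValueError("second dimension of left must equal first dimension of right")
    # zip(*right) iterates over the columns of the right matrix
    columns = list(zip(*right))
    # One dot product for every (row of left, column of right) combination
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in left]


left = [[1, 2],
        [3, 4],
        [5, 6]]          # shape (3, 2)
right = [[1, 0, 2, 1],
         [0, 1, 3, 1]]   # shape (2, 4)

result = matrix_product(left, right)
print(result)                       # [[1, 2, 8, 3], [3, 4, 18, 7], [5, 6, 28, 11]]
print(len(result), len(result[0]))  # 3 4
```

The result has as many rows as the left matrix and as many columns as the right matrix, matching the rule described above.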

In
mathematics, we can have something called a column vector and row vector, which we’ll explain
better shortly. They’re vectors, but represented as matrices with one of the dimensions having a size of 1:

$a =
 \begin{pmatrix}
  1 & 2 & 3 & 4
 \end{pmatrix}$

 $b =
 \begin{pmatrix}
  1 \\
  2 \\
  3 \\
  4 
 \end{pmatrix}$

a is a row vector. It looks very similar to a plain vector $\overrightarrow a$.

The differences in notation are that a row vector has no commas between its values and no arrow above the symbol a. It’s called a
row vector as it’s a vector of a row of a matrix. b, on the other hand, is called a column vector
because it’s a column of a matrix. As row and column vectors are technically matrices, we do not
denote them with vector arrows anymore.

When we perform the matrix product on them, the result becomes a matrix as well, but containing just a single value, the same value as in the dot product.

$ab =
 \begin{pmatrix}
  1 & 2 & 3
 \end{pmatrix}
 \begin{pmatrix}
  2 \\
  3 \\
  4 
 \end{pmatrix} = \begin{pmatrix}
  20
 \end{pmatrix}$

 <img src='https://github.com/rahiakela/machine-learning-algorithms/blob/main/neural-networks-from-scratch/02-coding-neurons/images/7.png?raw=1' width='400'/>

In other words, row and column vectors are matrices with one of their dimensions having a
size of 1, and we perform the matrix product on them instead of the dot product, which
results in a matrix containing a single value.

In this case, we performed a matrix multiplication
of matrices with shapes `(1, 3)` and `(3, 1)`, so the resulting array has the shape `(1, 1)`, or a size of `1x1`.
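
In NumPy, this could be sketched by writing the row and column vectors as 2-dimensional arrays. The variable names `product` and `scalar` are introduced here for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3]])      # row vector, shape (1, 3)
b = np.array([[2], [3], [4]])  # column vector, shape (3, 1)

# Matrix product of a (1, 3) and a (3, 1) array
product = np.dot(a, b)
print(product)        # [[20]]
print(product.shape)  # (1, 1)

# The plain dot product of the 1-D versions gives the same value as a scalar
scalar = np.dot([1, 2, 3], [2, 3, 4])
print(scalar)  # 20
```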

### Transposition for the Matrix Product