# Machine Learning Overview

In machine learning, we assume that there is some connection between the input data and the output labels.

For example, Google's stock price may be related to the company's performance, the ability of its CEO, the global market, its competitors, and so on.

This can be modeled as a function:

$$ \text{stock price of Google} = F(\text{performance}, \text{CEO}, \text{global market}, \text{competitors}, \dots) $$

If we could somehow figure out how these factors connect to the stock price, we would all become billionaires, since we would know tomorrow's stock price.

This naturally leads to two questions:

1. *How can we find the right input that is related to the output (stock price)?*
1. *How can we write the function in such a way that it represents this connection, instead of being a random number generator?*

For the first question, we really need to think carefully about what type of input data we use. *For example, predicting the stock price from previous stock prices would be a **BAD** idea*, since what makes Google valuable is not its stock price. A bad decision by the CEO can make the stock price drop, but neither an increase nor a decrease in the previous stock price is what drives tomorrow's stock price.

The second question is where neural networks come in. Since it is far too complicated for any human to measure the exact connections, we use a neural network for this part: we want the neural network to find the connection by itself.

# A single neuron

A single neuron is just a function! It receives some input and produces some output.
$$ y = f(x) $$
But as we saw, multiple factors may affect the output, so our neuron may need to take multiple inputs.
$$ y = f(x_0, x_1, x_2, \cdots, x_n) $$
This makes the function unnecessarily long. Instead of thinking of these parameters as separate inputs, we can think of them as a group of inputs.
We represent them as a vector, which is just an array of numbers.
$$ \vec{x} = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} $$
In Python, we can represent them as a list.
```python
inputs = [1, 2, 3, 4, 5]  # example values for x0, x1, x2, x3, x4
```
As we mentioned before, we assume there is some connection between the inputs and the output. So we need another vector, or list, to store these connections. We call them 'weights'.
```python
weights = [0.1, 0.2, 0.3, 0.4, 0.5]  # example values for w0, w1, w2, w3, w4
```
Since each weight is just a factor that determines the influence of one input, we multiply each weight by its corresponding input and then add all the influences together.
$$ output = w_0 \cdot x_0 + w_1 \cdot x_1 + \cdots + w_n \cdot x_n = \sum_{i=0}^{n} w_i \cdot x_i $$
Mathematically, we can represent this as a dot product of two vectors.
$$ output = \vec{w} \cdot \vec{x}$$
There may also be a constant term that affects the output, like $b$ in $y = mx + b$. We call it the 'bias'.
$$ output = \vec{w} \cdot \vec{x} + b $$
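
To connect the formula to code with numbers you can check by hand, here is a minimal sketch. The function name `neuron_output` and the input, weight, and bias values are made up for illustration; the length check is an extra safeguard that follows from the formula needing exactly one weight per input.

```python
def neuron_output(inputs: list[float], weights: list[float], bias: float) -> float:
    """Compute w . x + b for one neuron."""
    # the formula needs exactly one weight per input
    if len(inputs) != len(weights):
        raise ValueError(f"expected {len(inputs)} weights, got {len(weights)}")
    # multiply each input by its weight, add up the contributions, then add the bias
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# hand-picked example values: 0.5*1 + (-1.0)*2 + 2.0*3 + 0.5 = 5.0
print(neuron_output([1, 2, 3], [0.5, -1.0, 2.0], 0.5))  # 5.0
```

In a real network the weights are not picked by hand; they start out random, as in the implementation that follows.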

## Implementation in Python

```python
# Since we humans do not know the connection,
# we initialize the weights randomly and hope the artificial neuron finds it.
import random

random.seed(0)  # set the seed for reproducibility

inputs = [1, 2, 3, 4, 5]  # the input data are known
weights = [random.random() for _ in range(len(inputs))]
bias = random.random()  # the bias is random too

output = sum([x * w for x, w in zip(inputs, weights)]) + bias
print(output)  # 7.619020145363499
```

We can already see a weakness: 5 random weights and 1 bias are probably not enough to represent the connection. We also want to collect more inputs than just 5.
Let's start with the input data. Holding all the data in a single vector would be confusing, so we group them by where they come from.
```python
data_from_source_1 = [1, 2, 3, 4, 5]
data_from_source_2 = [6, 7, 8, 9, 10]
data_from_source_3 = [11, 12, 13, 14, 15]
```
But we still want a single input instead of separate '$x_0, x_1, \dots$', so we take all the data vectors and put them in another vector.
```python
all_inputs = [
    data_from_source_1, 
    data_from_source_2, 
    data_from_source_3,
]
```

## Shape of Matrix

This is called a matrix: a list of lists, or a two-dimensional list.
All the lists inside the parent list must be *homogeneous*, meaning they must all have the same length.
We describe the size of a matrix by its *shape*.  
For example, *all_inputs* has a shape of (3, 5), since there are three elements in the first level and five in each list of the second level.  
Here's the mathematical representation of the matrix:
$$
all\_inputs = \begin{bmatrix}
1 & 2 & 3 & 4 & 5 \\
6 & 7 & 8 & 9 & 10 \\
11 & 12 & 13 & 14 & 15 \\
\end{bmatrix}
$$
This matrix has 3 rows and 5 columns. Working with rows and columns can be tricky, so we use the term *axis* to refer to them.
Axis 0 refers to the first level, axis 1 refers to the second level, and so on.  
Some may argue that working with axes is even more confusing, and that is perfectly fair. So we will try to avoid these terms and just use *shape*.
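```python
shape = (
    len(matrix),     # shape[0]: number of lists in the first level
    len(matrix[0]),  # shape[1]: length of each inner list
)
```
This assumes the matrix is valid (homogeneous), so looking at the first inner list is enough.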
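
A small sketch of how the shape and the two levels (axes) map onto plain Python indexing, using the same numbers as `all_inputs` above. The first index moves along axis 0 and picks an inner list; the second index moves along axis 1 and picks a number inside that list.

```python
all_inputs = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]

# shape[0] counts the first level, shape[1] the length of each inner list
shape = (len(all_inputs), len(all_inputs[0]))
print(shape)  # (3, 5)

# the first index moves along axis 0 and picks one inner list (a row)
print(all_inputs[1])  # [6, 7, 8, 9, 10]

# the second index moves along axis 1 and picks one number inside that row
print(all_inputs[1][2])  # 8
```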

## Implementation in Python

```python
import pprint

class Matrix:
    def __init__(self, vector_of_vectors: list):
        self.matrix = vector_of_vectors

    def __str__(self):
        return pprint.pformat(self.matrix)

    @property
    def shape(self) -> tuple[int, ...]:
        # Walk down the nesting levels, always following the first element,
        # and record the length of each level. This assumes the matrix is homogeneous.
        shape_list = []
        next_layer = self.matrix
        while isinstance(next_layer, list):
            if len(next_layer) == 0:
                break
            shape_list.append(len(next_layer))
            next_layer = next_layer[0]
        return tuple(shape_list)

matrix = Matrix([
    [1, 2, 3, 4],
    [4, 5, 6, 7],
    [7, 8, 9, 10],
])
assert matrix.shape == (3, 4)
```
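
Note that `shape` only follows the first element at each level, so it trusts that the matrix is homogeneous: for `[[1, 2, 3], [4, 5]]` it reports `(2, 3)` without complaint. If you want the check done for you, one option is a recursive version that compares the shapes of all children. `validated_shape` is a hypothetical helper, not part of the original code, and unlike `Matrix.shape` it reports an empty list as `(0,)`.

```python
def validated_shape(nested) -> tuple[int, ...]:
    """Return the shape of a nested list, raising ValueError if it is ragged."""
    if not isinstance(nested, list):
        return ()  # a single number has no dimensions
    if len(nested) == 0:
        return (0,)  # an empty list: one dimension of length 0
    # every child must have the same shape for the list to be homogeneous
    child_shapes = [validated_shape(child) for child in nested]
    if any(s != child_shapes[0] for s in child_shapes):
        raise ValueError(f"inhomogeneous shape: child shapes are {child_shapes}")
    return (len(nested),) + child_shapes[0]


print(validated_shape([[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]]))  # (3, 4)
print(validated_shape([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))  # (2, 2, 2)

try:
    validated_shape([[1, 2, 3], [4, 5]])
except ValueError as err:
    print(err)  # inhomogeneous shape: child shapes are [(3,), (2,)]
```

Checking every child costs a full pass over the data, which is why a quick first-element walk is tempting when the input is already known to be valid.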

## Shape of Vector
What about the shape of a vector? Since vectors have only one dimension, we can use a tuple with a single element to represent the shape.
```python
def shape(vector) -> tuple[int]:
    return (len(vector),)
```
The output is simply:
```python
>>> shape([1, 2, 3, 4, 5])
(5,)
```

Now that we have the input, it's time for our first neuron to take it and produce some output.  
We now have three groups of inputs. For each group, we calculate the dot product between the group and the weights, then add the bias.

```python
random.seed(0)  # re-seed so the printed output below is reproducible

matrix = Matrix([
    [1, 2, 3, 4],
    [4, 5, 6, 7],
    [7, 8, 9, 10],
])
weights = [random.random() for _ in range(matrix.shape[1])]  # one weight per column
bias = random.random()
output = []
for group in matrix.matrix:
    # dot product of one input group with the weights, plus the bias
    output.append(sum([x * w for x, w in zip(group, weights)]) + bias)
print(output)  # [5.16898712243865, 12.014580879206125, 18.860174635973603]
```

There are several improvements we can make to this implementation:
1. We can use a simpler list comprehension instead of the for loop to compute the outputs.
1. We may also want a class for vectors.
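
The first improvement might look like this: the loop collapses into one list comprehension. This sketch uses plain lists instead of the `Matrix` class so it can run on its own, and it seeds with `random.seed(0)` so that it should print approximately `[5.169, 12.015, 18.860]`, the values recorded in the loop cell above.

```python
import random

random.seed(0)  # same seed as before, so the weights match the loop version

matrix = [
    [1, 2, 3, 4],
    [4, 5, 6, 7],
    [7, 8, 9, 10],
]
weights = [random.random() for _ in range(len(matrix[0]))]  # one weight per column
bias = random.random()

# the for loop collapsed into a single list comprehension: one dot product plus bias per row
output = [sum(x * w for x, w in zip(row, weights)) + bias for row in matrix]
print(output)
```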

## Implementation

```python
from __future__ import annotations
import pprint
from typing import Union

class Vector:
    def __init__(self, vector: list):
        self.vector = vector

    def __str__(self):
        return pprint.pformat(self.vector)

    @property
    def shape(self) -> tuple[int]:
        return (len(self.vector),)

    # dot product: multiply element-wise, then sum the products
    def __mul__(self, other: 'Vector'):
        return sum(x * w for x, w in zip(self.vector, other.vector))

    __rmul__ = __mul__

    # add a function for the dot product plus bias
    # it's probably a bad idea to have the dot function attached to 'Vector', but we will leave it here...
    # We will add a full implementation in later Python scripts.
    def dot(self, other: Union['Matrix', 'Vector'], bias: float):
        if isinstance(other, Matrix):
            # one weighted sum plus bias for each row of the matrix
            return Vector([sum([x * w for x, w in zip(group, self.vector)]) + bias for group in other.matrix])
        elif isinstance(other, Vector):
            return self * other + bias
        else:
            raise TypeError(f"Expected Matrix or Vector, got {type(other).__name__}.")
```

## Single Neuron with Vector input

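```python
random.seed(0)  # re-seed so the output matches the first implementation

inputs = Vector([1, 2, 3, 4, 5])  # the input data are known
weights = Vector([random.random() for _ in range(inputs.shape[0])])
bias = random.random()  # the bias is random too

output = inputs * weights + bias  # Vector.__mul__ computes the dot product
print(output)  # 7.619020145363499
```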

## Single Neuron with Matrix input

```python
random.seed(0)  # re-seed so the output matches the loop version

inputs = Matrix([
    [1, 2, 3, 4],
    [4, 5, 6, 7],
    [7, 8, 9, 10],
])
weights = Vector([random.random() for _ in range(inputs.shape[1])])
bias = random.random()
output = weights.dot(inputs, bias)  # a Vector with one value per row
print(output)  # [5.16898712243865, 12.014580879206125, 18.860174635973603]
```

There are still things we need to consider. A single neuron is not enough to capture a complex connection between the input and the output, but before we dive into layers of neurons, we need to clarify some concepts about shapes.

## Shape

As we know, each vector inside the matrix must have the same length. Otherwise, libraries such as NumPy raise a shape error like this one:

```python
--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 2
      1 import numpy as np
----> 2 np.array([
      3     [1, 2, 3, 4, 5],
      4     [6, 7, 7, 9] # incorrect shape, ERROR!
      5 ])

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
```

We also want to make sure that the length of the weights vector is the same as the length of the vectors inside the input matrix.
Think about the operation we perform:
$$
output = \begin{bmatrix}
    1 & 2 & 3 & 4 \\
    4 & 5 & 6 & 7 \\
    7 & 8 & 9 & 10 \\
\end{bmatrix}
\begin{bmatrix}
    w_1 \\
    w_2 \\
    w_3 \\
    w_4 \\
\end{bmatrix}
+ b
$$
This is equal to:
$$
\begin{bmatrix}
    (1 \cdot w_1 + 2 \cdot w_2 + 3 \cdot w_3 + 4 \cdot w_4) + b \\
    (4 \cdot w_1 + 5 \cdot w_2 + 6 \cdot w_3 + 7 \cdot w_4) + b \\
    (7 \cdot w_1 + 8 \cdot w_2 + 9 \cdot w_3 + 10 \cdot w_4) + b
\end{bmatrix}
$$
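
To check the expansion with actual numbers, here is a sketch with hand-picked weights $w_1, w_2, w_3, w_4 = 0.5, 1, -1, 2$ and $b = 1$ (arbitrary example values, not learned ones). The first row, for instance, gives $1 \cdot 0.5 + 2 \cdot 1 + 3 \cdot (-1) + 4 \cdot 2 + 1 = 8.5$.

```python
# the input matrix from the equation above (shape (3, 4))
inputs = [
    [1, 2, 3, 4],
    [4, 5, 6, 7],
    [7, 8, 9, 10],
]
weights = [0.5, 1.0, -1.0, 2.0]  # example values for w_1, w_2, w_3, w_4
bias = 1.0  # example value for b

# each output entry is one row's weighted sum plus the same bias b
output = [sum(x * w for x, w in zip(row, weights)) + bias for row in inputs]
print(output)  # [8.5, 16.0, 23.5]
```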
But what if we have one fewer weight?
$$
\begin{bmatrix}
    w_1 \\
    w_2 \\
    w_3 \\
\end{bmatrix}
$$
Then there is nothing to multiply with 4, 7, and 10!

That's why the length of the weights vector must equal the length of each vector inside the input matrix, which is the second value of the input matrix's shape (`shape[1]`).

We also find that the length of the output vector equals the length of the input matrix, that is, the number of vectors inside it (`shape[0]`).
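
Both rules can be enforced in code. This matters because Python's `zip` stops at the shorter sequence, so the `zip`-based `Vector.dot` above would not complain about a missing weight; it would quietly skip 4, 7, and 10. A minimal sketch of a checked version follows. `neuron_forward` is a hypothetical name, and raising `ValueError` is a choice made for this sketch, not something the original code does.

```python
def neuron_forward(matrix: list[list[float]], weights: list[float], bias: float) -> list[float]:
    """One neuron applied to every input vector, with explicit shape checks."""
    n_rows = len(matrix)                      # shape[0]
    n_cols = len(matrix[0]) if n_rows else 0  # shape[1]
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("input matrix is not homogeneous")
    if len(weights) != n_cols:
        raise ValueError(f"need {n_cols} weights (shape[1]), got {len(weights)}")
    output = [sum(x * w for x, w in zip(row, weights)) + bias for row in matrix]
    assert len(output) == n_rows  # one output per input vector (shape[0])
    return output


inputs = [
    [1, 2, 3, 4],
    [4, 5, 6, 7],
    [7, 8, 9, 10],
]

# a plain zip() stops at the shorter list, so 4, 7 and 10 are silently ignored
print([sum(x * w for x, w in zip(row, [1.0, 1.0, 1.0])) for row in inputs])  # [6.0, 15.0, 24.0]

# the checked version refuses the same weights instead
try:
    neuron_forward(inputs, [1.0, 1.0, 1.0], 0.0)
except ValueError as err:
    print(err)  # need 4 weights (shape[1]), got 3
```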