In [2]:
import numpy as np

This text introduces the basic building blocks of neural networks by explaining how a neuron uses inputs, weights, and a bias to produce an output, and then connects these concepts to linear algebra principles such as vectors, matrices, and dot products, with practical examples using NumPy.

Here we have a neuron, which looks like a 2D version of the Sputnik satellite. The inputs are our data (famous numbers, btw; extra credit if you know what they are **without** looking them up), the weights are the things we 'tweak', and the bias is another number we can tweak to get the desired output.

![NEURON](neuron.png) 

In [1]:
inputs = [3.141, 2.1718, 1.618]
weights = [.32, .37, .62]
bias = 1.38

Each input is paired with its own weight.

Multiplying each input by its weight, adding up the products, and then adding the bias (simple elementary school multiplication and addition) gives us an output. We're skipping the activation function for now.

In [3]:
output = (inputs[0]*weights[0]+inputs[1]*weights[1]+inputs[2]*weights[2] + bias)
print(output)

4.191846


That's fine for something this small, but scale it up to hundreds of millions or billions of weights and that's way too much typing.

So, NumPy and Linear Algebra to the rescue!

Python has lists, like our inputs. Linear Algebra also has lists but calls them vectors. There is, however, a difference between a row vector and a column vector. A row vector is a 1xN matrix (a single row of N numbers): [3.141, 2.1718, 1.618]. A column vector is an Nx1 matrix (the same numbers arranged vertically):
$$\begin{bmatrix}3.141 \\ 2.1718 \\ 1.618\end{bmatrix}$$

A scalar in Linear Algebra is just a number, like our bias, 1.38. The poor little thing has no direction and no dimensions; it's called a 0-dimensional value, or a 0th-order tensor. It just sits there waiting to be called upon. Vectors, on the other hand, have both magnitude and direction. They've got things to do! A vector could describe the speed and direction of a baseball pitcher's fastball. Our inputs and weights are three-dimensional vectors because each has three numbers. There's no limit to how many numbers a vector can contain; a vector with n numbers is called n-dimensional.
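NumPy's `ndim` attribute counts how many axes an array has, which lines up with the tensor "order" described above. A minimal sketch reusing the bias and inputs from earlier; the two-row `batch` matrix is just a hypothetical extra example for comparison:

```python
import numpy as np

bias = np.array(1.38)                          # scalar: 0th-order tensor
inputs = np.array([3.141, 2.1718, 1.618])      # vector: 1st-order tensor
batch = np.array([[3.141, 2.1718, 1.618],
                  [1.0, 2.0, 3.75]])           # matrix: 2nd-order tensor (hypothetical)

for name, t in [("bias", bias), ("inputs", inputs), ("batch", batch)]:
    # ndim = number of axes (the tensor's order); shape = length along each axis
    print(name, t.ndim, t.shape)
```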

In NumPy, a vector is usually stored as a 1-dimensional array, `np.array([3.141, 2.718, 1.618])`, with shape `(3,)`. Strictly speaking, a 1D array has no row or column orientation; NumPy treats it as a flat list of numbers. An explicit row vector is a 2-dimensional array with shape `(1, 3)`, `np.array([[3.141, 2.718, 1.618]])`, and a column vector is a 2-dimensional array with shape `(3, 1)`, `np.array([[3.141], [2.718], [1.618]])`. Watch out for the two meanings of "dimension": a vector in 3D space ([x, y, z]) is conceptually "3-dimensional," but it's stored as a 1D array in NumPy (one axis holding three numbers).
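A quick shape check makes the 1D vs. row vs. column distinction concrete. This sketch just builds the three arrays from the paragraph above and shows how `.T` (transpose) and `reshape` behave on each:

```python
import numpy as np

flat = np.array([3.141, 2.718, 1.618])          # 1D array, shape (3,): no orientation
row = np.array([[3.141, 2.718, 1.618]])         # explicit row vector, shape (1, 3)
col = np.array([[3.141], [2.718], [1.618]])     # explicit column vector, shape (3, 1)

print(flat.shape, row.shape, col.shape)         # (3,) (1, 3) (3, 1)

# Transposing a 1D array does nothing: there's only one axis to "flip"
print(flat.T.shape)                             # (3,)

# Transposing a row vector gives a column vector, and vice versa
print(row.T.shape, col.T.shape)                 # (3, 1) (1, 3)

# reshape is another way to add the missing axis to a 1D array
print(flat.reshape(1, -1).shape, flat.reshape(-1, 1).shape)  # (1, 3) (3, 1)
```

The `flat.T` line is a common gotcha: if you need a real column vector, reshape or start from a 2D array.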

Lots of things can be done to vectors: addition, subtraction, scaling, and the dot product. I'll assume you made it out of elementary school, so you're familiar with the first two (they work element by element). Scaling changes the magnitude (length) of the vector; when talking about vectors, those two words are synonymous. To scale a vector, you **multiply** it by a scalar. Scaling does not change the direction, with one exception: a negative scalar flips the vector to point the opposite way. Note the difference from a bias, which shifts the value of the output by a fixed amount but doesn't scale the vector.
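In NumPy, addition, subtraction, and scaling all work element by element on arrays. A minimal sketch with two made-up 2D vectors `u` and `v`:

```python
import numpy as np

u = np.array([3, 4])
v = np.array([1, 2])

print(u + v)    # [4 6]  element-wise addition
print(u - v)    # [2 2]  element-wise subtraction
print(3 * v)    # [3 6]  scaling: every component multiplied by 3
```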

Scaling:
- Changes the magnitude (length) of a vector.
- Does not change the direction of the vector if the scalar is positive.
- Flips the direction of the vector if the scalar is negative (reverses direction).
- Multiplies every component of the vector by the scalar.

Example: 
- 2⋅[3,4]=[6,8] (Magnitude doubles, direction remains the same).

Example: 
- −2⋅[3,4]=[−6,−8] (Magnitude doubles, direction flips).
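The two examples can be checked numerically. `np.linalg.norm` gives a vector's magnitude (its Euclidean length), and dividing a vector by its magnitude leaves a unit vector that captures direction alone. This sketch compares the direction of each scaled vector with the original [3, 4]:

```python
import numpy as np

v = np.array([3.0, 4.0])
direction = v / np.linalg.norm(v)          # unit vector (length 1): direction only

for s in (2.0, -2.0):
    scaled = s * v
    scaled_dir = scaled / np.linalg.norm(scaled)
    print(f"scalar {s:+}: {scaled}, magnitude {np.linalg.norm(scaled)}")
    # Same direction as v when s > 0, exactly opposite when s < 0
    print("  same direction:", np.allclose(scaled_dir, direction),
          "| flipped:", np.allclose(scaled_dir, -direction))
```

The magnitude of [3, 4] is 5, so both scaled vectors have magnitude 10; only the sign of the scalar decides whether the direction flips.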

Bias:
- Shifts the output of a computation, but does not change the magnitude or direction of a vector.
- Adds a constant value to the weighted sum of inputs, providing an offset.
- Does not affect the vector's components directly; it only changes the output after the weighted sum.
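- Used in neural networks to shift a neuron's output independently of its inputs (for example, giving a nonzero output even when every input is zero), which adds flexibility in learning and moves the threshold at which an activation function kicks in.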
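To contrast scaling with the bias, this sketch reuses the inputs, weights, and bias from the first example. Doubling the weights doubles the weighted sum (multiplication), while the bias only adds a fixed amount on top (addition). The all-zero input at the end is a made-up case to show the bias acting alone:

```python
import numpy as np

inputs = np.array([3.141, 2.1718, 1.618])
weights = np.array([0.32, 0.37, 0.62])
bias = 1.38

weighted_sum = np.dot(inputs, weights)

# Scaling the weights scales the weighted sum...
print(np.isclose(np.dot(inputs, 2 * weights), 2 * weighted_sum))   # True
# ...while the bias just shifts the result by a fixed amount
print(np.isclose((weighted_sum + bias) - weighted_sum, bias))      # True

# With all-zero inputs the weighted sum is 0, so the bias alone is the output
print(np.dot(np.zeros(3), weights) + bias)                         # 1.38
```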

The **dot product** is what we used to calculate the output earlier (before adding the bias). Take two vectors that have the same number of elements, multiply each pair of matching elements, then add all the products together. It's easier to see what's going on than to describe it in words: `(inputs[0]*weights[0] + inputs[1]*weights[1] + inputs[2]*weights[2])`. All of that becomes one number, 2.811846 (4.191846 once the bias is added), which is what? That's right, a scalar.
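Written out as a loop, the "multiply matching elements, then add" recipe looks like this. `dot` is a hypothetical helper for illustration only; in practice you'd use `np.dot`:

```python
def dot(a, b):
    """Dot product of two equal-length sequences: multiply pairwise, then sum."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    total = 0.0
    for x, y in zip(a, b):
        total += x * y          # multiply matching elements, add to the running sum
    return total

inputs = [3.141, 2.1718, 1.618]
weights = [.32, .37, .62]
bias = 1.38

print(dot(inputs, weights))          # weighted sum only
print(dot(inputs, weights) + bias)   # matches the earlier output, up to float rounding
```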

Let's use NumPy to do the exact same thing.

In [5]:
np_outputs = np.dot(inputs, weights) +  bias
print(np_outputs)

4.191846


When taking the dot product of two vectors, the order doesn't matter: it's commutative, so `np.dot(inputs, weights)` gives the same result as `np.dot(weights, inputs)`. That is **not** the case when multiplying a vector * matrix or a matrix * matrix.
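A quick sanity check of both claims. `@` is Python's matrix-multiplication operator, which for 2D arrays gives the same result as `np.dot`. The 2x2 matrices `A` and `B` are made up for illustration:

```python
import numpy as np

inputs = np.array([3.141, 2.1718, 1.618])
weights = np.array([0.32, 0.37, 0.62])

# Vector . vector: order doesn't matter
print(np.isclose(np.dot(inputs, weights), np.dot(weights, inputs)))  # True

# Matrix . matrix: order usually does matter
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
print(A @ B)   # swaps A's columns: values [[2, 1], [4, 3]]
print(B @ A)   # swaps A's rows:    values [[3, 4], [1, 2]]
```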

## Vector * Matrix

$$a = \left[\begin{array}{ccc} 1 & 2 & 3 \end{array}\right]$$
$$b = \left[\begin{array}{ccc} 4 & 5 & 6 \\  7 & 8 & 9 \\ 10 & 11 & 12 \end{array}\right]$$

'a' is 1 row of 3 columns, 1x**3**. 'b' is 3 rows and 3 columns, **3**x3. The vector, having only one row, is dotted with each column of the 'b' matrix. So the number of elements in the vector (its 3 columns) **must** match the number of rows in the matrix (also 3) for the math to work. Those are the two bolded numbers, the "inner" dimensions.

For np.dot(a, b), the single row of 'a' (three elements) is multiplied element by element with each column of matrix 'b', and each set of products is summed. The output shape comes from the "outer" dimensions, (**1**x3)·(3x**3**), so in this case the output is a 1x3 vector.

(1×4)+(2×7)+(3×10) = 48 \
(1×5)+(2×8)+(3×11) = 54 \
(1×6)+(2×9)+(3×12) = 60
$$\left[\begin{array}{ccc} 48 & 54 & 60 \end{array}\right]$$
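The same vector * matrix product in NumPy, plus a column-by-column version that mirrors the hand calculation (each output element is `a` dotted with one column of `b`):

```python
import numpy as np

a = np.array([1, 2, 3])                  # 1 x 3 (stored as a 1D array)
b = np.array([[4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])             # 3 x 3

result = np.dot(a, b)                    # a dotted with each column of b
print(result)                            # [48 54 60]

# Same thing one column at a time; b[:, j] selects column j
print([int(np.dot(a, b[:, j])) for j in range(b.shape[1])])   # [48, 54, 60]
```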

What if we did np.dot(b, a)? Would the result be the same? Remember that the number of columns in the first matrix **must** match the number of rows in the second. 'b' has 3 columns, but 'a' has only one row, so they can't be multiplied in that order as things stand. BUT, if we turn the 'a' vector from 1 row and 3 columns into 3 rows and 1 column, then it works: $$\left[\begin{array}{ccc} 4 & 5 & 6 \\  7 & 8 & 9 \\ 10 & 11 & 12 \end{array}\right] \left[\begin{array}{c} 1 \\ 2 \\ 3 \end{array}\right]$$

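(4×1)+(5×2)+(6×3) = 32 \
(7×1)+(8×2)+(9×3) = 50 \
(10×1)+(11×2)+(12×3) = 68
$$\left[\begin{array}{c} 32 \\ 50 \\ 68 \end{array}\right]$$

So no, the result is not the same: we get a 3x1 column vector with different values instead of the 1x3 row vector [48 54 60].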
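Here's the b * a case in NumPy, using explicit 2D row and column vectors so the shape rule is enforced. The last part shows a caveat worth knowing: with a plain 1D array, NumPy doesn't complain, because a 1D array has no row/column orientation.

```python
import numpy as np

b = np.array([[4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])              # 3 x 3
a_row = np.array([[1, 2, 3]])             # explicit 1 x 3 row vector
a_col = a_row.T                           # 3 x 1 column vector

print(np.dot(b, a_col))                   # (3x3).(3x1) -> 3 x 1, values 32, 50, 68

try:
    np.dot(b, a_row)                      # (3x3).(1x3): inner dimensions 3 and 1 don't match
except ValueError as err:
    print("ValueError:", err)

# Caveat: a plain 1D array is accepted, and np.dot(b, a) returns a 1D result
a = np.array([1, 2, 3])
print(np.dot(b, a))                       # [32 50 68]
```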

## Matrix * Matrix

The number of columns in the first matrix must equal the number of rows in the second matrix. The output has as many rows as the first matrix and as many columns as the second: an (m×n) matrix times an (n×p) matrix gives an (m×p) matrix. Confusing, I know.
$$a = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\\ a_{31} & a_{32} \end{bmatrix} \qquad b = \begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix}$$

'a' is a 3x2 matrix and 'b' is a 2x2 matrix. The inner dimensions match (2 and 2), and the output will be a 3x2 matrix because 'a' has 3 rows and 'b' has 2 columns:

$$a\,b = \left[\begin{matrix}
a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\
a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \\
a_{31}b_{11} + a_{32}b_{21} & a_{31}b_{12} + a_{32}b_{22}
\end{matrix}\right]$$
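Plugging made-up numbers into the 3x2 by 2x2 pattern shows the shape rule in action. The `assert` checks one entry against the formula, and the `try` block shows that reversing the order fails because the inner dimensions (2 and 3) no longer match:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3 x 2 (hypothetical values)
B = np.array([[1, 2],
              [3, 4]])        # 2 x 2 (hypothetical values)

C = A @ B                     # (3x2).(2x2): inner dims match, result is 3 x 2
print(C.shape)                # (3, 2)
print(C)

# C[0, 0] = a11*b11 + a12*b21, exactly as in the formula
assert C[0, 0] == A[0, 0] * B[0, 0] + A[0, 1] * B[1, 0]

try:
    B @ A                     # (2x2).(3x2): inner dims 2 and 3 don't match
except ValueError as err:
    print("ValueError:", err)
```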

In a dense layer of the network, when you process a batch of inputs, the input matrix (the batch, one row per data point) is multiplied by the weight matrix (which is the same for every data point in the batch). Here's the skinny:

Let the input matrix 𝑋 have a shape of 𝐵×𝑁, where:
- 𝐵 is the batch size (number of data points in the batch)
- 𝑁 is the number of features (input dimension of each data point)

The weight matrix 𝑊 has a shape of 𝑁×𝑀, where:
- 𝑁 is the number of features in the input
- 𝑀 is the number of neurons in the layer

Multiplying the input matrix 𝑋 by the weight matrix 𝑊 gives an output matrix 𝑌 with a shape of 𝐵×𝑀 (one row per data point, one column per neuron): **𝑌=𝑋⋅𝑊**


To clarify,
- **N** is the number of features in the input, which would correspond to columns in a spreadsheet for example. Each column corresponds to one feature (like closing price, opening price, volume, etc.) for each data point (day).
- **B** is the batch size, the number of data points (rows) processed together. Say your dataset is a year's worth of daily stock price data, one row per day. If you're using batch processing during training, you might feed the network a batch of consecutive days (or randomly sampled days) at once. For example, if your batch size 𝐵 is 32, the network will process 32 days' worth of data (32 rows) in one training iteration.
- **M** is the number of neurons in a particular layer of the neural network; it describes the layout or structure of the network, not the data itself.

While B (the batch size) and N (the number of features) are tied to the data, M (the number of neurons) is part of the network's internal structure and how it processes that data.
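A minimal sketch of the dense-layer shapes. The sizes (32 days, 4 features, 3 neurons) and the random values are made up, and the per-neuron bias vector `b` is an assumption added for completeness, since the formula above only shows 𝑌=𝑋⋅𝑊:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the example is repeatable

B, N, M = 32, 4, 3   # hypothetical: 32 days, 4 features per day, 3 neurons

X = rng.normal(size=(B, N))      # input batch: one row per data point
W = rng.normal(size=(N, M))      # weights: one column per neuron
b = np.zeros(M)                  # one bias per neuron (assumed)

Y = X @ W + b                    # (B x N).(N x M) -> B x M; b is added to every row
print(Y.shape)                   # (32, 3)
```

If the weights are instead stored one row per neuron (shape 𝑀×𝑁), you'd transpose them first and compute `X @ W.T`.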

## Transpose a matrix

So, what happens if our rows and columns don't match up nicely? You kick it in the shins, and while it's hopping around in pain, put a judo throw on it and **Transpose** it to comply. Transposing swaps rows and columns: the first row becomes the first column, the second row becomes the second column, and so on, so a 2x3 matrix turns into a 3x2 matrix. This happens a lot in backpropagation.
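In NumPy, `.T` and `np.transpose` both transpose an array. A small sketch with a made-up 2x3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2 x 3

t = m.T                        # 3 x 2: row i of m becomes column i of t
print(t.shape)                 # (3, 2)
print(t)

# Two equivalent spellings
assert np.array_equal(t, np.transpose(m))
# Transposing twice gets you back where you started
assert np.array_equal(t.T, m)
```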

You'll often see vector notation written as: $$\vec{v}$$

If we want the dot product of two row vectors, the second one has to be transposed first, which looks like this: $$\vec{a} \cdot \vec{b} = a\,b^{T}$$

Row vectors: 𝑎=[1 2 3] and 𝑏=[4 5 6]. The transpose of 𝑏, denoted 𝑏<sup>𝑇</sup>, turns the row vector into a column vector: $$b^{T} = \begin{bmatrix}4 \\5\\6\end{bmatrix}$$

Now, the dot product between 𝑎 (a row vector) and 𝑏<sup>𝑇</sup> (a column vector) is computed as: 𝑎⋅𝑏<sup>𝑇</sup> = (1⋅4)+(2⋅5)+(3⋅6) \
= 4+10+18 = 32 \
So, the scalar from the multiplication of these two vectors is 32. \
Written in NumPy: `a = np.array([[1, 2, 3]])` and `b = np.array([[4, 5, 6]]).T`
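Running the row-vector version confirms the result is technically a 1x1 matrix, not a bare number; `.item()` pulls out the scalar. With 1D arrays there's nothing to transpose, and `np.dot` returns the scalar directly:

```python
import numpy as np

a = np.array([[1, 2, 3]])          # 1 x 3 row vector
b = np.array([[4, 5, 6]])          # 1 x 3 row vector

result = a @ b.T                   # (1x3).(3x1) -> 1 x 1 matrix
print(result, result.shape)        # [[32]] (1, 1)
print(result.item())               # 32, the plain scalar inside the 1 x 1 matrix

# With 1D arrays, np.dot gives the scalar directly
print(np.dot([1, 2, 3], [4, 5, 6]))   # 32
```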

Another minor notation note: a **row vector** is typically written as a 1 × 𝑛 matrix, a single row containing *n* elements, one per column: $$\left[\begin{array}{ccc} 1 & 2 & 3 \end{array}\right]$$ (this is a 1 × 3 row vector). In NumPy it would be written as `np.array([[1, 2, 3]])`.

A **column vector** is written as an 𝑛 × 1 matrix, a single column containing *n* elements, one per row. 𝑏<sup>𝑇</sup> above is an example: a 3 × 1 column vector, written `np.array([[4], [5], [6]])` in NumPy.

In the dot product example, the row vector 𝑎 (1 × 3) is multiplied by the column vector 𝑏<sup>𝑇</sup> (3 × 1), resulting in a scalar (a 1 × 1 matrix).

Now, back to kicking shins and judo throws. Say we now have two Sputnik satellites in the cosmos (two neurons, each with its own row of weights) and two sets of inputs: \
$$\text{inputs}=
\begin{bmatrix}
3.141 & 2.718 & 1.618 \\
1.0 & 2.0 & 3.75
\end{bmatrix}
\qquad
\text{weights}=
\begin{bmatrix}
0.32 & 0.37 & 0.62 \\
0.19 & 0.43 & 0.58
\end{bmatrix}$$

Both are 2x3 matrices, so we can't multiply them together unless we transpose one of them. Transposing the weights, we get:
$$\text{inputs}=
\begin{bmatrix}
3.141 & 2.718 & 1.618 \\
1.0 & 2.0 & 3.75
\end{bmatrix}
\qquad
\text{weights}^{T}=
\begin{bmatrix}
0.32 & 0.19 \\
0.37 & 0.43 \\
0.62 & 0.58
\end{bmatrix}$$

inputs is a 2x3 matrix and the transposed weights are a 3x2 matrix, so the output will be a 2x2 matrix: each entry is a row of the first matrix dotted with a column of the second. Row 𝑖 of the output holds the results for input set 𝑖, and column 𝑗 holds neuron 𝑗's outputs.

In [5]:
inputs = [[3.141, 2.718, 1.618], [1.0, 2.0, 3.75]]
weights = [[0.32, 0.37, 0.62], [0.19, 0.43, 0.58]]
outputs = np.dot(inputs, np.array(weights).T)
outputs

array([[3.01394, 2.70397],
       [3.385  , 3.225  ]])

The long-hand version:
- First row, first column (resulting value at 𝐶<sub>11</sub>): \
𝐶<sub>11</sub> = (3.141×0.32)+(2.718×0.37)+(1.618×0.62) \
𝐶<sub>11</sub> = 1.005 + 1.006 + 1.003 = **3.014**

- First row, second column (resulting value at 𝐶<sub>12</sub>): \
𝐶<sub>12</sub> = (3.141×0.19)+(2.718×0.43)+(1.618×0.58) \
𝐶<sub>12</sub> = 0.597 + 1.169 + 0.938 = **2.704**

- Second row, first column (resulting value at 𝐶<sub>21</sub>): \
𝐶<sub>21</sub> = (1.0×0.32)+(2.0×0.37)+(3.75×0.62) \
𝐶<sub>21</sub> = 0.32 + 0.74 + 2.325 = **3.385**

- Second row, second column (resulting value at 𝐶<sub>22</sub>): \
𝐶<sub>22</sub> = (1.0×0.19)+(2.0×0.43)+(3.75×0.58) \
𝐶<sub>22</sub> = 0.19 + 0.86 + 2.175 = **3.225**
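To see what `np.dot` is doing under the hood, here's a plain-Python sketch of matrix multiplication that mirrors the long-hand version: the outer loops pick a row and a column, and the innermost loop does the dot product. `matmul` is a hypothetical helper written for illustration (NumPy's real implementation is far faster); it reuses the two-Sputnik inputs and weights:

```python
def matmul(A, B):
    """Plain-Python matrix multiply: C[i][j] = row i of A dotted with column j of B."""
    n_inner = len(A[0])
    if n_inner != len(B):
        raise ValueError("columns of A must equal rows of B")
    rows, cols = len(A), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):              # pick a row of A
        for j in range(cols):          # pick a column of B
            for k in range(n_inner):   # dot product of that row and column
                C[i][j] += A[i][k] * B[k][j]
    return C

inputs = [[3.141, 2.718, 1.618], [1.0, 2.0, 3.75]]
weights = [[0.32, 0.37, 0.62], [0.19, 0.43, 0.58]]
weights_T = [list(col) for col in zip(*weights)]   # transpose: rows become columns

for row in matmul(inputs, weights_T):
    print([round(x, 5) for x in row])
```

The printed rows should match the NumPy output above, up to floating-point rounding.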

To summarize, the text (hopefully) demonstrates how neurons compute outputs by taking dot products between inputs and weights and adding a bias, and clarifies key linear algebra operations, like vector scaling and matrix multiplication, while showing how they apply in machine learning through code examples. These are my notes, written while trying to organize my thoughts.