In [None]:
'''
 * Copyright (c) 2004 Radhamadhab Dalai
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
'''

# Chapter 1 : 1.2 Tensors (The Krypton Knights)  

What are “tensors”?
Tensors are closely related to arrays. If you interchange tensor/array/matrix when it comes to
machine learning, people probably won’t give you too hard of a time. But there are subtle
differences, and they are primarily either the context or attributes of the tensor object. To
understand a tensor, let’s compare and describe some of the other data containers in Python
(things that hold data). Let’s start with a list. A Python list is defined by comma-separated
objects contained in brackets. So far, we’ve been using lists.
This is an example of a simple list:

In [None]:
## What are Tensors?

In machine learning, **tensors** are closely related to **arrays** and **matrices**. If you use the terms tensor, array, or matrix interchangeably, most people won’t mind, but there are subtle differences. The primary differences are in the **context** of their use and the **attributes** of the tensor object.

To understand what a tensor is, let’s first review some of the more familiar data containers in Python, like lists.

### Python List

A **list** in Python is defined by comma-separated values (elements) enclosed in square brackets. A list can hold any type of object — numbers, strings, other lists, etc. For example:

```python
my_list = [1, 2, 3, 4, 5]
```

In [2]:
# This is an example of a simple list:
l = [1, 5, 6, 2]
# list of lists:
lol = [[1, 5, 6, 2],
       [3, 2, 1, 3]]
# list of lists of lists!
lolol = [[[1, 5, 6, 2],
          [3, 2, 1, 3]],
         [[5, 2, 1, 2],
          [6, 4, 8, 4]],
         [[2, 8, 5, 3],
          [1, 1, 9, 4]]]

Everything shown so far could also be an array or an array representation of a tensor. A list is just a list, and it can do pretty much whatever it wants, including:

    another_list_of_lists = [[4, 2, 3], [5, 1]]

The above list of lists cannot be an array because it is not homologous. A list of lists is homologous if each list along a dimension is identically long, and this must be true for each dimension. The list shown above is a 2-dimensional list. The first dimension’s length is the number of sublists in the total list (2). The second dimension is the length of each of those sublists (3, then 2). Reading across the “row” dimension (also called the second dimension), the first list is 3 elements long and the second list is 2 elements long; this is not homologous and, therefore, cannot be an array. While failing to be consistent in one dimension is enough to show that this example is not homologous, we could also read down the “column” dimension (the first dimension): the first two columns are 2 elements long, while the third column contains only 1 element. Note that the dimensions do not need to be the same length as one another; it is perfectly acceptable to have an array with 4 rows and 3 columns (i.e., 4x3).
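
The homologous check described above can be sketched in plain Python. The helper name `shape_of` is hypothetical (it does not come from the original text); under that assumption, it returns the shape of a nested list, or `None` when the sublists along some dimension differ in length.

```python
def shape_of(nested):
    """Return the shape of a nested list as a tuple, or None if it is ragged."""
    if not isinstance(nested, list):
        return ()  # a single number has an empty shape
    sub_shapes = [shape_of(item) for item in nested]
    if any(shape is None for shape in sub_shapes):
        return None  # something deeper down is already ragged
    if len(set(sub_shapes)) > 1:
        return None  # sublists along this dimension differ in shape
    inner_shape = sub_shapes[0] if sub_shapes else ()
    return (len(nested),) + inner_shape


another_list_of_lists = [[4, 2, 3], [5, 1]]
list_matrix_array = [[4, 2], [5, 1], [8, 2]]

print(shape_of(another_list_of_lists))  # None: rows have lengths 3 and 2
print(shape_of(list_matrix_array))      # (3, 2)
```

A list that yields a shape here can be turned into an array; one that yields `None` cannot.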

A matrix is pretty simple. It’s a rectangular array. It has columns and rows. It is two dimensional.
So a matrix can be an array (a 2D array). Can all arrays be matrices? No. An array can be far
more than just columns and rows, as it could have four dimensions, twenty dimensions, and so on.

In [3]:
list_matrix_array = [[4, 2],
                     [5, 1],
                     [8, 2]]

In [None]:
The above list could also be a valid matrix (because of its columns and rows), which
automatically means it could also be an array. The “shape” of this array would be 3x2, or more
formally described as a shape of  (3, 2)  as it has 3 rows and 2 columns.
To denote a shape, we need to check every dimension. As we’ve already learned, a matrix is a
2-dimensional array. The first dimension is what’s inside the outermost brackets, and if we look
at the above matrix, we can see 3 lists there: [4, 2], [5, 1], and [8, 2]; thus, the size in this
dimension is 3, and each of those lists has to be the same shape to form an array (and a matrix in this
case). The next dimension’s size is the number of elements inside each innermost pair of brackets,
and we see that it’s 2, as all of them contain 2 elements.

In [None]:
#### With 3-dimensional arrays, like in  lolol  below, we’ll have a 3rd level of brackets:
lolol = [[[1, 5, 6, 2],
          [3, 2, 1, 3]],
         [[5, 2, 1, 2],
          [6, 4, 8, 4]],
         [[2, 8, 5, 3],
          [1, 1, 9, 4]]]

## The first level of this array contains 3 matrices:

[[ 1  ,  5  ,  6  ,  2  ],
[ 3  ,  2  ,  1  ,  3  ]]
[[ 5  ,  2  ,  1  ,  2  ],
[ 6  ,  4  ,  8  ,  4  ]]
And
[[ 2  ,  8  ,  5  ,  3  ],
[ 1  ,  1  ,  9  ,  4  ]]


That’s what’s inside the outermost brackets, so the size of this dimension is 3. If we look at
the first matrix, we can see that it contains 2 lists, [1, 5, 6, 2] and [3, 2, 1, 3], so the size of
this dimension is 2, while each list of this inner matrix includes 4 elements. These 4 elements
make up the 3rd and last dimension of this array since there are no more inner brackets.
Therefore, the shape of this array is (3, 2, 4), and it’s a 3-dimensional array, since the shape
contains 3 dimensions.
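
As a quick check of the shape worked out by hand, the sketch below converts the same nested list to a NumPy array and reads its `shape` and `ndim` attributes. The variable name `array_3d` is chosen only for this illustration.

```python
import numpy as np

lolol = [[[1, 5, 6, 2],
          [3, 2, 1, 3]],
         [[5, 2, 1, 2],
          [6, 4, 8, 4]],
         [[2, 8, 5, 3],
          [1, 1, 9, 4]]]

array_3d = np.array(lolol)

print(array_3d.shape)  # (3, 2, 4): 3 matrices, 2 rows each, 4 elements per row
print(array_3d.ndim)   # 3
```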

![neuron](img/dl1.7.png)

Finally, what’s a tensor? When it comes to the discussion of tensors versus arrays in the context
of computer science, pages and pages of debate have ensued. This intense debate appears to be
caused by the fact that people are arguing from entirely different places. There’s no question that
a tensor is not just an array, but the real question is: “What is a tensor, to a computer scientist, in
the context of deep learning?” We believe that we can solve the debate in one line:
A tensor object is an object that can be represented as an array.
What this means is, as programmers, we can (and will) treat tensors as arrays in the context of
deep learning, and that’s really all the thought we have to put into it. Are all tensors just arrays?
No, but they are represented as arrays in our code, so, to us, they’re only arrays, and this is why
there’s so much argument and confusion.
Now, what is an array? In this book, we define an array as an ordered homologous container for
numbers, and mostly use this term when working with the NumPy package since that’s what the
main data structure is called within it. A linear array, also called a 1-dimensional array, is the
simplest example of an array, and in plain Python, this would be a list. Arrays can also consist
of multi-dimensional data, and one of the best-known examples is what we call a matrix in
mathematics, which we’ll represent as a 2-dimensional array. Each element of the array can be
accessed using a tuple of indices as a key, which means that we can retrieve any array element.
We need to learn one more notion: a vector. Put simply, a vector in math is what we call a list
in Python or a 1-dimensional array in NumPy. Of course, lists and NumPy arrays do not have
the same properties as a vector, but, just as we can write a matrix as a list of lists in Python, we
can also write a vector as a list or an array! Additionally, we’ll look at the vector algebraically
(mathematically) as a set of numbers in brackets. This is in contrast to the physics perspective,
where the vector’s representation is usually seen as an arrow, characterized by a magnitude and
a direction.
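
To make the "tuple of indices as a key" idea concrete, here is a small sketch using NumPy. The names `matrix` and `vector` are illustrative only; the values reuse the 3x2 matrix and the simple list from earlier in this section.

```python
import numpy as np

matrix = np.array([[4, 2],
                   [5, 1],
                   [8, 2]])

# Each element is addressed by one index per dimension (row, column).
print(matrix[1, 0])    # 5
print(matrix[(2, 1)])  # 2, the same lookup written as an explicit tuple

# A vector is simply a 1-dimensional array, so a single index is enough.
vector = np.array([1, 5, 6, 2])
print(vector.shape)    # (4,)
print(vector[2])       # 6
```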

### Dot Product and Vector Addition

Let’s now address vector multiplication, as that’s one of the most important operations we’ll
perform on vectors. We can achieve the same result as in our pure Python implementation, which
multiplied the inputs and weights vectors element-wise and summed the products, by using a dot product,
which we’ll explain shortly. Traditionally, we use dot products for vectors (yet another name for
a container), and we can certainly refer to what we’re doing here as working with vectors just as
we can call them “tensors.” Nevertheless, this seems to add to the mysticism of neural networks,
as if they’re these objects out in a complex multi-dimensional vector space that we’ll never
understand. Keep thinking of vectors as arrays: a 1-dimensional array is just a vector (or a list
in Python).
Because of the sheer number of variables and interconnections made, we can model very complex
and non-linear relationships with non-linear activation functions, and truly feel like wizards, but
this might do more harm than good. Yes, we will be using the “dot product,” but we’re doing this
because it results in a clean way to perform the necessary calculations. It’s nothing more in-depth
than that — as you’ve already seen, we can do this math with far more rudimentary-sounding
words. When multiplying vectors, you either perform a dot product or a cross product. A cross
product results in a vector while a dot product results in a scalar (a single value/number).
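
The difference between the two products can be seen in a short NumPy sketch. The vectors `a` and `b` below are arbitrary example values; `np.dot` and `np.cross` are the NumPy functions for the two operations.

```python
import numpy as np

a = [1, 2, 3]
b = [2, 3, 4]

dot_result = np.dot(a, b)      # a single number (scalar)
cross_result = np.cross(a, b)  # another 3-element vector

print(dot_result)    # 20
print(cross_result)  # [-1  2 -1]
```

Only the dot product is needed for the neuron calculations in this chapter.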

## What is the Dot Product of Two Vectors?

In mathematics, the **dot product** of two vectors is defined as the sum of the products of corresponding elements from each vector. Both vectors must be of the same size, meaning they must contain the same number of elements. 

Mathematically, if you have two vectors:

$$
\mathbf{a} = [a_1, a_2, a_3, \dots, a_n]
$$
$$
\mathbf{b} = [b_1, b_2, b_3, \dots, b_n]
$$

The **dot product** is calculated as:

$$
\mathbf{a} \cdot \mathbf{b} = a_1 \cdot b_1 + a_2 \cdot b_2 + a_3 \cdot b_3 + \dots + a_n \cdot b_n
$$

In other words, the dot product involves multiplying each element in vector **a** by the corresponding element in vector **b**, then summing up all these products.

### Example of a Dot Product

Let’s say we have two vectors:

$$
\mathbf{a} = [1, 2, 3]
$$
$$
\mathbf{b} = [4, 5, 6]
$$

The dot product would be:

$$
\mathbf{a} \cdot \mathbf{b} = (1 \cdot 4) + (2 \cdot 5) + (3 \cdot 6) = 4 + 10 + 18 = 32
$$

So, the dot product of vectors **a** and **b** is **32**.
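
The definition above translates almost directly into plain Python. This is a minimal sketch; the function name `dot` is our own choice and not part of any library.

```python
def dot(a, b):
    """Sum of the products of corresponding elements of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return sum(x * y for x, y in zip(a, b))


result = dot([1, 2, 3], [4, 5, 6])
print(result)  # 32
```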

### Use in Machine Learning

The dot product is widely used in machine learning, especially in the context of neural networks. For example, in a fully connected neural network layer, the dot product is used to calculate the weighted sum of inputs, which is then passed through an activation function to produce the neuron’s output.


![neon](img/dl1.8.png)

In [None]:
'''A dot product of two vectors is a sum of products of corresponding vector elements. Both vectors
must be of the same size (have an equal number of elements).
Let’s write out how a dot product is calculated in Python. For it, you have two vectors, which we
can represent as lists in Python. We then multiply their elements from the same index values and
then add all of the resulting products. Say we have two lists acting as our vectors:
'''

In [7]:
a = [1, 2, 3]
b = [2, 3, 4]

dot_product = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
print(dot_product)

20


![neuron](img/dl1.9.png)

In [None]:
Now, what if we called a “inputs” and b “weights”? Suddenly, this dot product looks like a
succinct way to perform the operations we need and have already performed in plain Python. We
need to multiply our weights and inputs of the same index values and add the resulting values
together. The dot product performs this exact type of operation; thus, it makes lots of sense to use
here. Returning to the neural network code, let’s make use of this dot product. Plain Python does
not contain methods or functions to perform such an operation, so we’ll use the NumPy package,
which is capable of this, and many more operations that we’ll use in the future.
We’ll also need to perform a vector addition operation in the not-too-distant future. Fortunately,
NumPy lets us perform this in a natural way: using the plus sign with the variables containing
vectors of the data. The addition of the two vectors is an operation performed element-wise,
which means that both vectors have to be of the same size, and the result will become a vector of
this size as well.
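
As a sketch of the idea, here is a single neuron computed with `np.dot`. The inputs, weights, and bias are taken from the first neuron of the layer example used later in this section; the variable names are illustrative.

```python
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [0.2, 0.8, -0.5, 1.0]
bias = 2.0

# Multiply inputs and weights at matching indices, sum the products, add the bias.
output = np.dot(weights, inputs) + bias
print(output)  # approximately 4.8
```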

![neon](img/dl1.10.png)

## Sum of Two Vectors

The **sum of two vectors** is calculated by adding their corresponding elements. If you have two vectors of the same size:

$$
\mathbf{a} = [a_1, a_2, a_3, \dots, a_n]
$$
$$
\mathbf{b} = [b_1, b_2, b_3, \dots, b_n]
$$

The sum of these vectors, denoted as \( \mathbf{c} \), is computed as:

$$
\mathbf{c} = \mathbf{a} + \mathbf{b}
$$

Where each element of \( \mathbf{c} \) is given by:

$$
c_i = a_i + b_i \quad \text{for} \; i = 1, 2, \dots, n
$$

### Example

Consider two vectors:

$$
\mathbf{a} = [2, 4, 6]
$$
$$
\mathbf{b} = [1, 3, 5]
$$

Their sum is:

$$
\mathbf{c} = \mathbf{a} + \mathbf{b} = [2+1, 4+3, 6+5] = [3, 7, 11]
$$

So, the resulting vector \( \mathbf{c} \) is:

$$
\mathbf{c} = [3, 7, 11]
$$
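
The same example can be reproduced with NumPy arrays. One caveat worth showing, and the reason arrays are used here: with plain Python lists, the plus sign concatenates instead of adding element-wise.

```python
import numpy as np

a = np.array([2, 4, 6])
b = np.array([1, 3, 5])

c = a + b  # element-wise addition
print(c)   # [ 3  7 11]

# Plain lists behave differently: + joins them end to end.
print([2, 4, 6] + [1, 3, 5])  # [2, 4, 6, 1, 3, 5]
```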

### Use in Applications

Vector addition is fundamental in many areas such as physics and computer science. It is used to represent combined forces, additive color mixing, and in machine learning, operations on feature vectors or embeddings.


# A Layer of Neurons with NumPy
Let’s return to the point where we’d like to calculate the output of a layer of 3 neurons, which means the weights will be a matrix, or a list of weight vectors. In plain Python, we wrote this as a list of lists. With NumPy, this will be a 2-dimensional array, which we’ll call a matrix. Previously, with the 3-neuron example, we performed a multiplication of those weights with a list containing inputs, which resulted in a list of output values, one per neuron. We also described the dot product of two vectors, but the weights are now a matrix, and we need to perform a dot product of them and the input vector. NumPy makes this very easy for us: it treats this matrix as a list of vectors and performs the dot product one by one with the vector of inputs, returning a list of dot products.
The dot product’s result, in our case, is a vector (or a list) of sums of the weight and input products for each of the neurons. From here, we still need to add the corresponding biases to them. The biases can be easily added to the result of the dot product operation as they are a vector of the same size. We can also use the plain Python list directly here, as NumPy will convert it to an array internally.
Previously, we calculated the output of each neuron by performing a dot product and adding a bias, one by one. Now we have changed the order of those operations: we perform the dot product first, as one operation on all neurons and inputs, and then add the biases in the next operation. When we add two vectors using NumPy, the i-th elements are added together, resulting in a new vector of the same size. This is both a simplification and an optimization, giving us simpler and faster code.


In [1]:
import numpy as np
inputs = [1.0, 2.0, 3.0, 2.5]
weights = [[0.2, 0.8, -0.5, 1],
[0.5, -0.91, 0.26, -0.5],
[-0.26, -0.27, 0.17, 0.87]]
biases = [2.0, 3.0, 0.5]
layer_outputs = np.dot(weights, inputs) + biases
print(layer_outputs)



[4.8   1.21  2.385]


![neon](img/dl1.12.png)
![neon](img/dl1.13.png)

## Calculating the Output of a Layer of Neurons

To calculate the output of a layer with multiple neurons, we need to handle weights as a matrix (or 2-dimensional array). In plain Python, we represented this as a list of lists. With NumPy, this is more efficiently represented as a 2-dimensional array, also known as a matrix.

### Dot Product with NumPy

Previously, we calculated the dot product of weights and inputs for each neuron individually. With NumPy, this operation is simplified. When we have a weight matrix and an input vector, NumPy performs the dot product operation efficiently:

$$
\mathbf{W} \cdot \mathbf{x}
$$

where:

- \( \mathbf{W} \) is a matrix of weights.
- \( \mathbf{x} \) is the input vector.

The result of this dot product is a vector (or list) of sums of the products of weights and inputs for each neuron. 

### Adding Biases

After computing the dot product, we add the biases to the resulting vector. Biases can be represented as a vector of the same size as the output of the dot product operation:

$$
\mathbf{y} = (\mathbf{W} \cdot \mathbf{x}) + \mathbf{b}
$$

where:

- $ \mathbf{b} $ is the bias vector.
- $ \mathbf{y} $ is the final output vector.

In NumPy, adding two vectors is straightforward because the addition is performed element-wise:

$$
y_i = (\mathbf{W} \cdot \mathbf{x})_i + b_i
$$

### Example

Suppose we have the following:

- Weight matrix \( \mathbf{W} \):
  $$
  \mathbf{W} = \begin{bmatrix}
  w_{11} & w_{12} & w_{13} \\
  w_{21} & w_{22} & w_{23} \\
  w_{31} & w_{32} & w_{33}
  \end{bmatrix}
  $$

- Input vector \( \mathbf{x} \):
  $$
  \mathbf{x} = \begin{bmatrix}
  x_1 \\
  x_2 \\
  x_3
  \end{bmatrix}
  $$

- Bias vector \( \mathbf{b} \):
  $$
  \mathbf{b} = \begin{bmatrix}
  b_1 \\
  b_2 \\
  b_3
  \end{bmatrix}
  $$

The output vector \( \mathbf{y} \) is calculated as:

$$
\mathbf{y} = (\mathbf{W} \cdot \mathbf{x}) + \mathbf{b}
$$

This approach simplifies and speeds up the calculations by leveraging NumPy's optimized operations.

### Summary

By using NumPy, we perform the dot product operation across all neurons in one go and then add biases efficiently. This method is both a simplification and an optimization, resulting in cleaner and faster code.
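
To illustrate that the one-step NumPy calculation matches the neuron-by-neuron approach, the sketch below computes both and compares them. The variable names `loop_outputs` and `vectorized_outputs` are ours; the data are the same inputs, weights, and biases used in this section's code cells.

```python
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2.0, 3.0, 0.5]

# Neuron by neuron: dot product plus bias, one at a time.
loop_outputs = []
for neuron_weights, neuron_bias in zip(weights, biases):
    weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs))
    loop_outputs.append(weighted_sum + neuron_bias)

# All neurons at once: one dot product, then one vector addition.
vectorized_outputs = np.dot(weights, inputs) + biases

print(np.allclose(loop_outputs, vectorized_outputs))  # True
```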



In [3]:
import numpy as np

# Define inputs, weights, and biases
inputs = [1.0, 2.0, 3.0, 2.5]
weights = [
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87]
]
biases = [2.0, 3.0, 0.5]

# Calculate the output of the layer
layer_outputs = np.dot(weights, inputs) + biases

# Print the results
print(layer_outputs)


[4.8   1.21  2.385]


![neon](img/d14.png)

![neon](img/d15.png)

This syntax involving the dot product of weights and inputs followed by the vector addition of bias is the most commonly used way to represent this calculation of inputs·weights+bias. To explain the order of parameters we are passing into np.dot(), we should think of it as whatever comes first will decide the output shape. In our case, we are passing a list of neuron weights first and then the inputs, as our goal is to get a list of neuron outputs. As we mentioned, a dot product of a matrix and a vector results in a list of dot products. The np.dot() function treats the matrix as a list of vectors and performs a dot product of each of those vectors with the other vector. In this example, we used that property to pass a matrix, which was a list of neuron weight vectors, and a vector of inputs, and get a list of dot products: the neuron outputs.
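
The effect of argument order can be checked directly. In this sketch (the flag name `swapped_ok` is ours), passing the weights first yields one output per neuron, while swapping the arguments fails because a 4-element vector cannot be combined with a matrix whose first dimension is 3.

```python
import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]

# Weights (3, 4) first, inputs (4,) second: one dot product per neuron.
outputs = np.dot(weights, inputs)
print(outputs.shape)  # (3,)

# Inputs (4,) first, weights (3, 4) second: the sizes 4 and 3 do not line up.
try:
    np.dot(inputs, weights)
    swapped_ok = True
except ValueError as error:
    swapped_ok = False
    print("Swapped order failed:", error)
```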

### A Batch of Data
To train, neural networks tend to receive data in batches. So far, the example input data have been only one sample (or observation) of various features called a feature set:

    inputs = [1, 2, 3, 2.5]

Here, the [1, 2, 3, 2.5] data are somehow meaningful and descriptive to the output we desire. Imagine each number as a value from a different sensor, from the example in chapter 1, all simultaneously. Each of these values is a feature observation datum, and together they form a feature set instance, also called an observation, or most commonly, a sample.
![neon](img/d16.png)

Often, neural networks expect to take in many samples at a time for two reasons. One reason is that it’s faster to train in batches in parallel processing, and the other reason is that batches help with generalization during training. If you fit (perform a step of a training process) on one sample at a time, you’re highly likely to keep fitting to that individual sample, rather than slowly producing general tweaks to weights and biases that fit the entire dataset. Fitting or training in batches gives you a higher chance of making more meaningful changes to weights and biases. For the concept of fitment in batches rather than one sample at a time, the following animation can help:

![neon](img/d17.png)

An example of a batch of data could look like:
![neon](img/d18.png)

## Working with Batches of Observations

In Python, lists are useful for holding individual samples and batches of observations. Consider the following batch of observations, where each sub-list represents a sample:

$$
\text{inputs} = \left[ \begin{array}{cccc}
1 & 2 & 3 & 2.5 \\
2 & 5 & -1 & 2 \\
-1.5 & 2.7 & 3.3 & -0.8
\end{array} \right]
$$

Each row in this matrix represents a sample, also referred to as a feature set instance or observation.

### Matrix Operations

When dealing with batches of observations, we need to perform matrix operations. Specifically, we want to compute the dot product of a matrix of inputs with a matrix of weights. This operation is known as the **matrix product**.

Given:
- A matrix of inputs: 

$$
\mathbf{X} = \left[ \begin{array}{cccc}
1 & 2 & 3 & 2.5 \\
2 & 5 & -1 & 2 \\
-1.5 & 2.7 & 3.3 & -0.8
\end{array} \right]
$$

- A matrix of weights:

$$
\mathbf{W} = \left[ \begin{array}{cccc}
0.2 & 0.8 & -0.5 & 1.0 \\
0.5 & -0.91 & 0.26 & -0.5 \\
-0.26 & -0.27 & 0.17 & 0.87
\end{array} \right]
$$

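Both matrices above have a shape of (3, 4), so they cannot be multiplied directly; the weight matrix has to be transposed first. The matrix product is then computed as:

$$
\mathbf{Y} = \mathbf{X} \cdot \mathbf{W}^T
$$

where \( \mathbf{Y} \) is the resulting matrix of outputs. Each element \( y_{ij} \) in the result matrix is obtained by performing the dot product of the \(i\)-th row of \( \mathbf{X} \) with the \(j\)-th column of \( \mathbf{W}^T \) (that is, the \(j\)-th row of \( \mathbf{W} \)).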

### Example

If the input matrix \( \mathbf{X} \) is:

$$
\mathbf{X} = \left[ \begin{array}{cccc}
1 & 2 & 3 & 2.5 \\
2 & 5 & -1 & 2 \\
-1.5 & 2.7 & 3.3 & -0.8
\end{array} \right]
$$

And the weight matrix \( \mathbf{W} \) is:

$$
\mathbf{W} = \left[ \begin{array}{cccc}
0.2 & 0.8 & -0.5 & 1.0 \\
0.5 & -0.91 & 0.26 & -0.5 \\
-0.26 & -0.27 & 0.17 & 0.87
\end{array} \right]
$$

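The resulting matrix \( \mathbf{Y} \) will be:

$$
\mathbf{Y} = \mathbf{X} \cdot \mathbf{W}^T = \left[ \begin{array}{ccc}
2.8 & -1.79 & 1.885 \\
6.9 & -4.81 & -0.3 \\
-0.59 & -1.949 & -0.474
\end{array} \right]
$$

This matrix product operation efficiently handles the dot products between all pairs of rows from the input matrix and columns from the transposed weight matrix (the rows of \( \mathbf{W} \)), producing a 3 × 3 matrix of outputs with one row per sample and one column per neuron.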

### Summary

In matrix operations, the dot product of matrices (or a matrix and a vector) produces a matrix where each element represents the result of a dot product between a row from one matrix and a column from the other matrix. This is fundamental in neural network computations, especially when handling batches of data.
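
The batch example above can be sketched in NumPy as follows. Note one assumption made explicit here: both matrices have a shape of (3, 4), so the weights are transposed with `.T` before the product so that the inner dimensions match. Biases are left out to keep the focus on the matrix product itself.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 2.5],
              [2.0, 5.0, -1.0, 2.0],
              [-1.5, 2.7, 3.3, -0.8]])
W = np.array([[0.2, 0.8, -0.5, 1.0],
              [0.5, -0.91, 0.26, -0.5],
              [-0.26, -0.27, 0.17, 0.87]])

# (3, 4) times (4, 3) gives (3, 3): one row per sample, one column per neuron.
Y = np.dot(X, W.T)

print(Y.shape)  # (3, 3)
print(Y)
```

The first row of `Y` holds the three neuron outputs (before biases) for the first sample, matching the single-sample example earlier in this section.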


# Matrix Product
The matrix product is an operation in which we have 2 matrices, and we are performing dot
products of all combinations of rows from the first matrix and the columns of the 2nd matrix,
resulting in a matrix of those atomic dot products:
    
  ![n](img/d19.png) 
    
To perform a matrix product, the size of the second dimension of the left matrix must match the
size of the first dimension of the right matrix. For example, if the left matrix has a shape of (5, 4),
then the right matrix must match this 4 within the first shape value, e.g. (4, 7). The shape of the
resulting array is always the first dimension of the left array and the second dimension of the right
array, (5, 7). In the above example, the left matrix has a shape of (5, 4), and the upper-right matrix
has a shape of (4, 5). The second dimension of the left array and the first dimension of the second
array are both 4; they match, and the resulting array has a shape of (5, 5).
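
The shape rule can be verified with placeholder matrices. The arrays below are filled with ones purely for illustration; only their shapes matter.

```python
import numpy as np

left = np.ones((5, 4))
right = np.ones((4, 7))

# Inner sizes (4 and 4) match, so the result takes the outer sizes: (5, 7).
product = np.dot(left, right)

print(product.shape)  # (5, 7)
print(product[0, 0])  # 4.0, the dot product of two length-4 vectors of ones
```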
To elaborate, we can also show that we can perform the matrix product on vectors. In
mathematics, we can have something called a column vector and row vector, which we’ll explain
better shortly. They’re vectors, but represented as matrices with one of the dimensions having a
size of 1:


![neon](img/d20.png)

a is a row vector. It looks very similar to the vector a (with an arrow above it) described earlier
along with the dot product. The differences in notation between a row vector and a vector are the
commas between values and the arrow above the symbol a, both of which are missing on a row vector. It’s called a
row vector as it’s a vector of a row of a matrix. b, on the other hand, is called a column vector
because it’s a column of a matrix. As row and column vectors are technically matrices, we do not
denote them with vector arrows anymore.
When we perform the matrix product on them, the result becomes a matrix as well, like in the
previous example, but containing just a single value, the same value as in the dot product example
we have discussed previously:

![n](d21.png)

In other words, row and column vectors are matrices with one of their dimensions being of a
size of 1, and we perform the matrix product on them instead of the dot product, which
results in a matrix containing a single value. In this case, we performed a matrix multiplication
of matrices with shapes (1, 3) and (3, 1), so the resulting array has the shape (1, 1), or a size of
1x1.
     

### Transposition for the Matrix Product
How did we suddenly go from 2 vectors to row and column vectors? We used the relation of the
dot product and matrix product saying that a dot product of two vectors equals a matrix product of
a row and column vector (the arrows above the letters signify that they are vectors):
 ![n](img/d25.png) 
 
 Now we need to get back to row and column vector definitions and update them with what we
have just learned.

A row vector is a matrix whose first dimension’s size (the number of rows) equals 1 and the second dimension’s size (the number of columns) equals n, the vector size. In other words, it’s a 1×n array or an array of shape (1, n):
![image.png](attachment:image.png)
With NumPy and with 3 values, we would define it as:
![image-2.png](attachment:image-2.png)
Note the use of double brackets here. To transform a list into a matrix containing a single row
(performing the equivalent of turning a vector into a row vector), we can put it into a list and
create a NumPy array:

    a = [1, 2, 3]
    np.array([a])
    >>>
    array([[1, 2, 3]])

Alternatively, np.expand_dims() adds a new dimension at the index given by its axis argument, which produces the same row vector when starting from a 1-dimensional array.
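
A minimal sketch of both routes to a row vector follows. The variable names `row_from_list` and `row_from_expand` are ours; `np.expand_dims` is the NumPy function mentioned above, called here with `axis=0` so the new dimension becomes the first one.

```python
import numpy as np

a = [1, 2, 3]

# Route 1: wrap the list in another list before creating the array.
row_from_list = np.array([a])

# Route 2: create a 1-D array, then add a new first dimension.
row_from_expand = np.expand_dims(np.array(a), axis=0)

print(row_from_expand)        # [[1 2 3]]
print(row_from_expand.shape)  # (1, 3)
print(np.array_equal(row_from_list, row_from_expand))  # True
```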
A column vector is a matrix where the second dimension’s size equals 1; in other words, it’s an array of shape (n, 1):
![image-2.png](attachment:image-2.png)
![image.png](attachment:image.png)

We have achieved the same result as the dot product of two vectors, but performed on matrices
and returning a matrix, which is exactly what we expected and wanted. It’s worth mentioning that
np.dot() handles both cases here: for 1-dimensional arrays it computes the dot product, and for
2-dimensional arrays it computes the matrix product (NumPy also offers np.matmul() and the @
operator for matrix products).
As we can see, to perform a matrix product on two vectors, we took one as is, transforming it into
a row vector, and the second one using transposition on it to turn it into a column vector. That
allowed us to perform a matrix product that returned a matrix containing a single value. We also
performed the matrix product on two example arrays to learn how a matrix product works: it
creates a matrix of dot products of all combinations of row and column vectors.
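
The row-times-column calculation described here can be sketched as follows. The values of `a` and `b` reuse the earlier dot product example; the names `row` and `column` are ours. The `.T` attribute performs the transposition, and `np.matmul` and `@` are shown only to confirm they agree with `np.dot` for 2-dimensional arrays.

```python
import numpy as np

a = [1, 2, 3]
b = [2, 3, 4]

row = np.array([a])       # shape (1, 3)
column = np.array([b]).T  # transposed to shape (3, 1)

result = np.dot(row, column)
print(result)        # [[20]]
print(result.shape)  # (1, 1)

# For 2-D arrays, np.matmul and the @ operator give the same matrix product.
print(np.array_equal(result, np.matmul(row, column)))  # True
print(np.array_equal(result, row @ column))            # True
```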

## From Dot Product to Matrix Product

The dot product of two vectors can be extended to matrix operations, where vectors can be viewed as special cases of matrices. To understand this transition, let's explore the concepts of dot products, row vectors, and column vectors.

### Dot Product of Two Vectors

For two vectors $ \mathbf{a} $ and $ \mathbf{b} $, the dot product is calculated as:

$$
\mathbf{a} \cdot \mathbf{b} = a_1 \cdot b_1 + a_2 \cdot b_2 + \cdots + a_n \cdot b_n
$$

where $ \mathbf{a} $ and $ \mathbf{b} $ are both vectors of the same dimension \( n \).

### Transition to Matrix Product

1. **Vectors as Matrices**: Vectors can be thought of as matrices with a single row or a single column:
   - **Row Vector**: A 1 × \( n \) matrix.
   - **Column Vector**: An \( n \) × 1 matrix.

2. **Dot Product as Matrix Product**:
   - If you have a row vector $ \mathbf{a} $ (1 × $ n $) and a column vector $ \mathbf{b} $ (\( n \) × 1), their dot product can be seen as the matrix product of these two vectors:
   
     $$
     \mathbf{a} \cdot \mathbf{b} = \mathbf{A} \cdot \mathbf{B}
     $$
   
   - Where:
     $$
     \mathbf{A} = [a_1, a_2, \dots, a_n]
     $$
     $$
     \mathbf{B} = \begin{bmatrix}
     b_1 \\
     b_2 \\
     \vdots \\
     b_n
     \end{bmatrix}
     $$

   - The resulting product is a 1 × 1 matrix whose single entry is the scalar dot product, the sum of the element-wise products.

### Matrix Product

When extending to matrices, the concept of dot products scales to matrix products:
- **Matrix Product**: If $ \mathbf{X} $ is a matrix of size $ m \times n $ and $ \mathbf{W} $ is a matrix of size $ n \times p $, their product $ \mathbf{Y} $ is an $ m \times p $ matrix:

  $$
  \mathbf{Y} = \mathbf{X} \cdot \mathbf{W}
  $$

  Each element $ y_{ij} $ of $ \mathbf{Y} $ is computed as:

  $$
  y_{ij} = \sum_{k=1}^n x_{ik} \cdot w_{kj}
  $$

  where $ x_{ik} $ is an element from the \(i\)-th row of $ \mathbf{X} $ and $ w_{kj} $ is an element from the $j$-th column of $ \mathbf{W} $.
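
The summation formula can be written out as explicit loops. This is a sketch for illustration, not how NumPy implements it; the function name `matrix_product` and the small example matrices are our own, and the result is compared against `np.dot`.

```python
import numpy as np


def matrix_product(X, W):
    """Compute Y where y_ij is the sum over k of x_ik * w_kj, using nested lists."""
    m, n = len(X), len(X[0])
    if len(W) != n:
        raise ValueError("columns of X must equal rows of W")
    p = len(W[0])
    Y = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            Y[i][j] = sum(X[i][k] * W[k][j] for k in range(n))
    return Y


X = [[1, 2],
     [3, 4],
     [5, 6]]        # shape (3, 2)
W = [[1, 0, 2, 1],
     [0, 1, 3, 1]]  # shape (2, 4)

Y = matrix_product(X, W)
print(Y[0])                          # [1, 2, 8, 3]
print(np.allclose(Y, np.dot(X, W)))  # True
```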

### Summary

- **Dot Product**: The dot product of two vectors can be viewed as the matrix product of a row vector and a column vector.
- **Matrix Product**: Extending this concept, a matrix product involves performing dot products between rows of the first matrix and columns of the second matrix, resulting in a matrix of output values.

This transition from vectors to matrices shows how the operations scale and how similar operations are applied in broader contexts, such as neural network computations where batches of data and layers are involved.


# A Layer of Neurons & Batch of Data with NumPy

Let’s get back to our inputs and weights. When covering them, we mentioned that we need to perform dot products on all of the vectors that make up the input and weight matrices. As we have just learned, that’s exactly the operation the matrix product performs. We just need to transpose its second argument, which is the weights matrix in our case, to turn the row vectors it currently consists of into column vectors.

Initially, we were able to perform the dot product on the inputs and the weights without a transposition because the weights were a matrix, but the inputs were just a vector. In that case, the dot product results in a vector of atomic dot products, one for each row of the matrix paired with this single vector. When inputs become a batch of inputs (a matrix), we need to perform the matrix product. It takes all of the combinations of rows from the left matrix and columns from the right matrix, performs the dot product on each pair, and places the results in an output array. Both arrays currently have the same shape, but to perform the matrix product, the size at index 1 of the first matrix’s shape must match the size at index 0 of the second matrix’s shape, and right now they don’t.

![image.png](attachment:image.png)
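
The shape mismatch can be seen directly in NumPy. This sketch reuses the sample inputs and weights from this section's code cell; the `mismatch` flag is a hypothetical name introduced only for the demonstration.

```python
import numpy as np

inputs = np.array([[1.0, 2.0, 3.0, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])

print(inputs.shape, weights.shape)  # (3, 4) (3, 4)

# Index 1 of the first shape (4) does not match index 0 of the second (3)
try:
    np.dot(inputs, weights)
    mismatch = False
except ValueError as err:
    mismatch = True
    print("Shape mismatch:", err)

# Transposing the weights makes the inner dimensions agree: (3, 4) x (4, 3)
print(weights.T.shape)                  # (4, 3)
print(np.dot(inputs, weights.T).shape)  # (3, 3)
```
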

If we look at this from the perspective of the input and weights, we need to perform the dot
product of each input and each weight set in all of their combinations. The dot product takes the
row from the first array and the column from the second one, but currently the data in both arrays
are row-aligned. Transposing the second array shapes the data to be column-aligned. The matrix
product of inputs and transposed weights will result in a matrix containing all atomic dot products
that we need to calculate. The resulting matrix consists of outputs of all neurons after operations
performed on each input sample:

![](img/b2.png)

Figure B2: The dot product of inputs and transposed weights.

We mentioned that the second argument for `np.dot()` is going to be our transposed weights, so the inputs come first, whereas previously the weights were the first parameter. We changed the order here on purpose. Before, we were modeling neuron output using a single sample of data, a vector, but now we are taking a step forward and modeling layer behavior on a batch of data. We could retain the previous parameter order, but, as we’ll soon learn, it’s more useful to have a result consisting of a list of layer outputs for each sample than a list of neurons with their outputs for each sample. We want the resulting array to be sample-related and not neuron-related, as we’ll pass those samples further through the network, and the next layer will expect a batch of inputs.
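
The effect of the argument order can be sketched as follows, again with this section's sample values. The names `sample_wise` and `neuron_wise` are hypothetical labels chosen for this illustration.

```python
import numpy as np

inputs = np.array([[1.0, 2.0, 3.0, 2.5],
                   [2.0, 5.0, -1.0, 2.0],
                   [-1.5, 2.7, 3.3, -0.8]])
weights = np.array([[0.2, 0.8, -0.5, 1.0],
                    [0.5, -0.91, 0.26, -0.5],
                    [-0.26, -0.27, 0.17, 0.87]])

# Inputs first: row i holds the outputs of every neuron for sample i
sample_wise = np.dot(inputs, weights.T)

# Weights first: row j holds the outputs of neuron j for every sample
neuron_wise = np.dot(weights, inputs.T)

# Both hold the same numbers; one is the transpose of the other
print(np.allclose(sample_wise, neuron_wise.T))  # True
```

With the inputs first, each row of the result can be passed to the next layer as one sample, which is the layout a batch of inputs needs.
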

We can code this solution using NumPy now. We can perform `np.dot()` on a plain Python list of lists, as NumPy will convert them to matrices internally. We convert the weights to a NumPy array ourselves, though, so that we can transpose them first (`T` in the code), as a plain Python list of lists does not support transposition. As for the biases, we do not need to make them a NumPy array for the same reason: NumPy is going to do that internally.

Biases are a list, though, so they become a 1D array once NumPy converts them. The addition of this bias vector to a matrix (of the dot products in this case) works similarly to the dot product of a matrix and a vector that we described earlier: the bias vector is added to each row vector of the matrix. Since each column of the matrix product result is the output of one neuron, and the vector is added to each row vector, the first bias is added to the first element of each of those vectors, the second to the second, and so on. That’s what we need: the bias of each neuron needs to be added to all of the results of this neuron computed on all input vectors (samples).

![](img/b3.png)

Figure B3: Inputs multiplied by the weights, plus the bias
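
The row-wise bias addition can be checked in isolation. In this sketch, `products` is a small made-up matrix standing in for the matrix product (2 samples by 3 neurons, an assumption for illustration), while `biases` uses the values from this section.

```python
import numpy as np

# Stand-in for the matrix product result: 2 samples x 3 neurons (assumed values)
products = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])
biases = [2.0, 3.0, 0.5]  # one bias per neuron, left as a plain list

# NumPy converts the list to a 1D array and adds it to every row
broadcast_sum = products + biases

# The same addition written out explicitly, row by row
manual_sum = np.array([[value + bias for value, bias in zip(row, biases)]
                       for row in products])

print(broadcast_sum)
print(np.allclose(broadcast_sum, manual_sum))  # True
```

Each column keeps its own bias, so the first neuron's bias is added to the first element of every row, the second neuron's bias to the second element, and so on.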

In [1]:
import numpy as np

# Input samples (3 samples with 4 features each)
inputs = [
    [1.0, 2.0, 3.0, 2.5],
    [2.0, 5.0, -1.0, 2.0],
    [-1.5, 2.7, 3.3, -0.8]
]

# Weights for each neuron (3 neurons, each with 4 weights corresponding to the 4 inputs)
weights = [
    [0.2, 0.8, -0.5, 1.0],
    [0.5, -0.91, 0.26, -0.5],
    [-0.26, -0.27, 0.17, 0.87]
]

# Biases for each neuron
biases = [2.0, 3.0, 0.5]

# Compute the dot product of inputs and the transposed weights, and add the biases
layer_outputs = np.dot(inputs, np.array(weights).T) + biases

# Output the result
print(layer_outputs)


[[ 4.8    1.21   2.385]
 [ 8.9   -1.81   0.2  ]
 [ 1.41   1.051  0.026]]


- The **inputs** are a list of three input samples, each with four features.
- The **weights** represent the weight matrix for the three neurons, each with four weights.
- The **biases** hold one value per neuron and are added after the matrix product of the inputs and the transposed weights has been computed.
- **`np.dot(inputs, np.array(weights).T)`** computes the matrix product. The weights are transposed so that the second dimension of the inputs (4) matches the first dimension of the transposed weights (4). The biases are then added to every row of the result, so each neuron receives its own bias for every sample.
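
As a cross-check, the same layer output can be reproduced in plain Python, one dot product per sample and neuron pair. This is a sketch for verification only, using the values from the cell above; the name `manual_outputs` is hypothetical.

```python
import numpy as np

inputs = [[1.0, 2.0, 3.0, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2.0, 3.0, 0.5]

# One dot product per (sample, neuron) pair, plus that neuron's bias
manual_outputs = []
for sample in inputs:
    sample_outputs = []
    for neuron_weights, neuron_bias in zip(weights, biases):
        output = sum(x * w for x, w in zip(sample, neuron_weights)) + neuron_bias
        sample_outputs.append(output)
    manual_outputs.append(sample_outputs)

# The NumPy version from this section, for comparison
layer_outputs = np.dot(inputs, np.array(weights).T) + biases
print(np.allclose(manual_outputs, layer_outputs))  # True
```

The nested loops make the structure of the result explicit: the outer loop walks over samples (rows) and the inner loop over neurons (columns).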
