<a href="https://colab.research.google.com/github/rahiakela/data-learning-research-and-practice/blob/main/deep-learning-with-python-by-francois-chollet/2-mathematical-building-blocks/1_gears_of_neural_networks_tensor_operations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##The gears of neural networks: Tensor operations

Much as any computer program can be ultimately reduced to a small set of binary
operations on binary inputs (AND, OR, NOR, and so on), all transformations learned
by deep neural networks can be reduced to a handful of tensor operations (or tensor functions)
applied to tensors of numeric data. For instance, it’s possible to add tensors,
multiply tensors, and so on.

A Keras layer instance looks like this:

```python
keras.layers.Dense(512, activation="relu")
```

This layer can be interpreted as a function, which takes as input a matrix and returns
another matrix—a new representation for the input tensor. 

Specifically, the function
is as follows (where W is a matrix and b is a vector, both attributes of the layer):

```python
output = relu(dot(input, W) + b)
```

Let’s unpack this. We have three tensor operations here:

- A dot product (dot) between the input tensor and a tensor named W
- An addition (+) between the resulting matrix and a vector b
- A relu operation: relu(x) is max(x, 0); “relu” stands for “rectified linear unit”
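
As a rough illustration, the three operations can be written directly in NumPy. This is a minimal sketch, not Keras's actual implementation: the helper name `dense_forward`, the input size of 784, and the random values used for `W` are assumptions made for the example.

```python
import numpy as np

def dense_forward(inputs, W, b):
    """Hypothetical helper: relu(dot(inputs, W) + b) written with NumPy."""
    z = np.dot(inputs, W)        # dot product: (batch, in) . (in, units) -> (batch, units)
    z = z + b                    # addition: the vector b is added to every row of z
    return np.maximum(z, 0.0)    # relu: element-wise max(z, 0)

rng = np.random.default_rng(0)
inputs = rng.random((4, 784))                 # 4 samples, 784 features (assumed sizes)
W = rng.standard_normal((784, 512)) * 0.01    # illustrative weights, not trained values
b = np.zeros(512)

output = dense_forward(inputs, W, b)
print(output.shape)   # (4, 512)
```

The output has one row per input sample and one column per unit of the layer, and every entry is non-negative because of the relu.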

##Element-wise operations

The relu operation and addition are element-wise operations: operations that are
applied independently to each entry in the tensors being considered. This means
these operations are highly amenable to massively parallel implementations.

If you want to write a naive Python implementation of
an element-wise operation, you use a for loop, as in this naive implementation of an
element-wise relu operation:

In [1]:
def naive_relu(x):
  # x is a rank-2 NumPy tensor
  assert len(x.shape) == 2

  # Avoid overwriting the input tensor
  x = x.copy()
  for i in range(x.shape[0]):
    for j in range(x.shape[1]):
      x[i, j] = max(x[i, j], 0)
  return x

You could do the same for addition:

In [2]:
def naive_add(x, y):
  # x and y are rank-2 NumPy tensors
  assert len(x.shape) == 2
  assert x.shape == y.shape
  
  # Avoid overwriting the input tensor
  x = x.copy()
  for i in range(x.shape[0]):
    for j in range(y.shape[1]):
      x[i, j] += y[i, j]
  return x

On the same principle, you can do element-wise multiplication, subtraction, and so on.
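
For instance, element-wise multiplication follows the same pattern as the addition above. The function name `naive_multiply` is an illustrative choice for this sketch and is not defined elsewhere in the text.

```python
import numpy as np

def naive_multiply(x, y):
    # x and y are rank-2 NumPy tensors with identical shapes
    assert len(x.shape) == 2
    assert x.shape == y.shape

    # Avoid overwriting the input tensor
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] *= y[i, j]
    return x

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0]])
product = naive_multiply(a, b)
print(product)                        # entries: 10, 40, 90, 160
print(np.array_equal(product, a * b)) # True: matches NumPy's * operator
```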

In practice, when dealing with NumPy arrays, these operations are available as well-optimized
built-in NumPy functions implemented in compiled code. Linear-algebra routines such as the dot product
additionally delegate the heavy lifting to a Basic Linear Algebra Subprograms (BLAS) implementation.
BLAS are low-level, highly parallel, efficient tensor-manipulation routines that are typically implemented
in Fortran or C.

So, in NumPy, you can do the following element-wise operation, and it will be blazing
fast:

In [3]:
import numpy as np

In [4]:
x = np.random.random((20, 100))
y = np.random.random((20, 100))

In [5]:
z = x + y
z

array([[1.14745926, 1.23403304, 1.07145376, ..., 0.69298026, 0.97656831,
        0.91711773],
       [1.00544481, 1.33681472, 0.49318897, ..., 0.48758283, 0.46381589,
        0.75134547],
       [0.58401856, 0.84126636, 1.53273405, ..., 1.22265976, 0.87427216,
        0.80675042],
       ...,
       [0.69448852, 1.619114  , 0.5624287 , ..., 1.54145763, 1.70469161,
        1.20280307],
       [1.01686823, 0.18868716, 1.2857621 , ..., 0.59698089, 1.49297035,
        0.94123989],
       [0.80501938, 0.87875109, 1.20982281, ..., 1.07455795, 1.06342163,
        1.07836931]])

In [6]:
z = np.maximum(z, 0)
z

array([[1.14745926, 1.23403304, 1.07145376, ..., 0.69298026, 0.97656831,
        0.91711773],
       [1.00544481, 1.33681472, 0.49318897, ..., 0.48758283, 0.46381589,
        0.75134547],
       [0.58401856, 0.84126636, 1.53273405, ..., 1.22265976, 0.87427216,
        0.80675042],
       ...,
       [0.69448852, 1.619114  , 0.5624287 , ..., 1.54145763, 1.70469161,
        1.20280307],
       [1.01686823, 0.18868716, 1.2857621 , ..., 0.59698089, 1.49297035,
        0.94123989],
       [0.80501938, 0.87875109, 1.20982281, ..., 1.07455795, 1.06342163,
        1.07836931]])
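
Because every entry of `x + y` is already non-negative here (both inputs are drawn from the interval [0, 1)), `np.maximum(z, 0)` leaves `z` unchanged, which is why the two printed arrays are identical. A small sketch with hand-picked values (chosen only for illustration) makes the effect of relu visible:

```python
import numpy as np

z_small = np.array([[-1.5, 0.0, 2.0],
                    [ 3.0, -0.5, -2.0]])

relu_small = np.maximum(z_small, 0.0)   # negative entries are clipped to 0
print(relu_small)
# [[0. 0. 2.]
#  [3. 0. 0.]]
```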

Let’s actually time the difference:

In [7]:
import time

In [8]:
t0 = time.time()

for _ in range(1000):
  z = x + y
  z = np.maximum(z, 0.0)
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 0.01 s


This takes 0.01 s. Meanwhile, the naive version takes a stunning 2.79 s:

In [9]:
t0 = time.time()

for _ in range(1000):
  z = naive_add(x, y)
  z = naive_relu(z)
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 2.79 s
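
Before trusting a timing comparison, it is worth confirming that both versions compute the same thing. The sketch below bundles the two naive loops into a single hypothetical helper, `naive_add_relu`, and compares it with the vectorized expression. It also uses `time.perf_counter`, which is generally better suited to measuring short intervals than `time.time`. Timings depend on the machine, so no expected numbers are shown.

```python
import time
import numpy as np

def naive_add_relu(x, y):
    # Hypothetical helper: element-wise add followed by relu, using plain loops
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = max(x[i, j] + y[i, j], 0.0)
    return out

rng = np.random.default_rng(0)
x = rng.random((20, 100)) - 0.5   # shifted so that some sums are negative
y = rng.random((20, 100)) - 0.5

vectorized = np.maximum(x + y, 0.0)
looped = naive_add_relu(x, y)
same = np.allclose(vectorized, looped)
print("Same result:", same)

t0 = time.perf_counter()
for _ in range(100):
    np.maximum(x + y, 0.0)
fast = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    naive_add_relu(x, y)
slow = time.perf_counter() - t0

print(f"vectorized: {fast:.4f} s, naive: {slow:.4f} s")
```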


Likewise, when running TensorFlow code on a GPU, element-wise operations are executed
via fully vectorized CUDA implementations that can best utilize the highly parallel
GPU chip architecture.

##Broadcasting

When an element-wise operation is applied to two tensors of different shapes, the smaller tensor will be broadcast to match the shape of the larger tensor, when possible and if there’s no ambiguity.

Broadcasting consists of two steps:
1. Axes (called broadcast axes) are added to the smaller tensor to match the ndim of the larger tensor.
2. The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor.

Let’s look at a concrete example. Consider X with shape `(32, 10)` and y with shape `(10,)`:

In [10]:
X = np.random.random((32, 10))    # X is a random matrix with shape (32, 10).
y = np.random.random((10, ))      # y is a random vector with shape (10,).

First, we add an empty first axis to y, whose shape becomes `(1, 10)`:

In [11]:
y = np.expand_dims(y, axis=0)    # The shape of y is now (1, 10).
y.shape

(1, 10)

Then, we repeat y 32 times alongside this new axis, so that we end up with a tensor Y with shape `(32, 10)`, where `Y[i, :] == y` for `i in range(0, 32)`:

In [12]:
# Repeat y 32 times along axis 0 to obtain Y, which has shape (32, 10).
Y = np.concatenate([y] * 32, axis=0)
Y.shape

(32, 10)

At this point, we can proceed to add X and Y, because they have the same shape.

The repetition operation is entirely virtual: it happens at the
algorithmic level rather than at the memory level. But thinking of the vector being repeated 32 times alongside a new axis is a helpful mental model. Here’s what a naive implementation would look like:

In [13]:
def naive_add_matrix_and_vector(x, y):
  assert len(x.shape) == 2   # x is a rank-2 NumPy tensor
  assert len(y.shape) == 1   # y is a NumPy vector
  assert x.shape[1] == y.shape[0]

  # Avoid overwriting the input tensor
  x = x.copy()
  for i in range(x.shape[0]):
    for j in range(x.shape[1]):
      x[i, j] += y[j]
  return x            
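
The earlier remark that the repetition is virtual can be observed with `np.broadcast_to`, which returns a read-only view of the original vector rather than a copy. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

X = np.random.random((32, 10))
y = np.random.random((10,))

Y = np.broadcast_to(y, (32, 10))   # a view: no 32-fold copy of y is made
print(Y.shape)                     # (32, 10)
print(Y.strides[0])                # 0: every row points at the same memory
print(np.shares_memory(Y, y))      # True

# Broadcasting inside an expression matches explicit repetition.
explicit = np.concatenate([y[np.newaxis, :]] * 32, axis=0)
print(np.array_equal(X + y, X + explicit))   # True
```

A stride of 0 along axis 0 means that moving from one row to the next does not advance in memory, so all 32 rows reuse the 10 values of `y`.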

With broadcasting, you can generally perform element-wise operations that take two input tensors if one tensor has shape `(a, b, … n, n + 1, … m)` and the other has shape `(n, n + 1, … m)`. The broadcasting will then automatically happen for axes `a` through `n - 1`.

The following example applies the element-wise maximum operation to two tensors
of different shapes via broadcasting:

In [14]:
x = np.random.random((64, 3, 32, 10))   # x is a random tensor with shape (64, 3, 32, 10).
y = np.random.random((32, 10))          # y is a random tensor with shape (32, 10).

z = np.maximum(x, y)                    # The output z has shape (64, 3, 32, 10) like x
z.shape

(64, 3, 32, 10)

In [15]:
x = np.random.random((64, 3, 32, 10))   # x is a random tensor with shape (64, 3, 32, 10).
y = np.random.random((3, 32, 10))       # y is a random tensor with shape (3, 32, 10).

z = np.maximum(x, y)                    # The output z has shape (64, 3, 32, 10) like x
z.shape

(64, 3, 32, 10)
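
To check whether two shapes are compatible without allocating any arrays, `np.broadcast_shapes` (available in NumPy 1.20 and later) can be used. The sketch below also shows that shapes whose trailing axes do not line up are rejected. Note that NumPy's actual rule is slightly more general than the one stated above: axes of size 1 are stretched to match as well.

```python
import numpy as np

# Trailing axes match, so the smaller shape is broadcast.
shape_a = np.broadcast_shapes((64, 3, 32, 10), (32, 10))
print(shape_a)      # (64, 3, 32, 10)

# An axis of size 1 is also stretched to the size of the other tensor.
shape_b = np.broadcast_shapes((64, 3, 32, 10), (3, 1, 10))
print(shape_b)      # (64, 3, 32, 10)

# Here the trailing axes (10 vs. 3) disagree, so broadcasting fails.
try:
    np.broadcast_shapes((64, 3, 32, 10), (64, 3))
    compatible = True
except ValueError:
    compatible = False
print(compatible)   # False
```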

##Tensor product

The tensor product, or dot product (not to be confused with an element-wise product, the * operator), is one of the most common, most useful tensor operations.

In NumPy, a tensor product is done using the `np.dot` function (because the mathematical notation for tensor product is usually a dot):

In [16]:
x = np.random.random((32, ))
y = np.random.random((32, ))

z = np.dot(x, y)
z

7.890335521359871

In mathematical notation, you’d note the operation with a dot (•):

```python
z = x • y
```

Mathematically, what does the dot operation do? 

Let’s start with the dot product of
two vectors, x and y. It’s computed as follows:

In [17]:
def naive_vector_dot(x, y):
  # x and y are NumPy vectors
  assert len(x.shape) == 1
  assert len(y.shape) == 1
  assert x.shape[0] == y.shape[0]

  z = 0.0
  for i in range(x.shape[0]):
    z += x[i] * y[i]
  return z

You’ll have noticed that the dot product between two vectors is a scalar and that only vectors with the same number of elements are compatible for a dot product.
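
Both observations can be checked with `np.dot` on small hand-picked vectors (the values are chosen only to make the arithmetic easy to follow):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

z = np.dot(x, y)      # 1*4 + 2*5 + 3*6
print(z)              # 32.0
print(np.ndim(z))     # 0: the result is a scalar

# Vectors with different numbers of elements are not compatible.
try:
    np.dot(x, np.array([1.0, 2.0]))
    mismatch_ok = True
except ValueError:
    mismatch_ok = False
print(mismatch_ok)    # False
```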

You can also take the dot product between a matrix x and a vector y, which returns a vector where the coefficients are the dot products between y and the rows of x.

In [18]:
def naive_matrix_vector_dot(x, y):
  # x is a NumPy matrix
  assert len(x.shape) == 2
  # y is a NumPy vector
  assert len(y.shape) == 1
  # Axis 1 of x must have the same size as axis 0 of y.
  assert x.shape[1] == y.shape[0]

  # Create a vector of 0s with one entry per row of x.
  z = np.zeros(x.shape[0])
  for i in range(x.shape[0]):
    for j in range(x.shape[1]):
      z[i] += x[i, j] * y[j]
  return z

You could also reuse the code we wrote previously, which highlights the relationship between a matrix-vector product and a vector product:

In [19]:
def naive_matrix_vector_dot(x, y):
  # x is a NumPy matrix
  assert len(x.shape) == 2
  # y is a NumPy vector
  assert len(y.shape) == 1
  # Axis 1 of x must have the same size as axis 0 of y.
  assert x.shape[1] == y.shape[0]

  # Create a vector of 0s with one entry per row of x.
  z = np.zeros(x.shape[0])
  for i in range(x.shape[0]):
    z[i] = naive_vector_dot(x[i, :], y)
  return z

Note that as soon as one of the two tensors has an ndim greater than 1, dot is no longer symmetric, which is to say that `dot(x, y)` isn’t the same as `dot(y, x)`.
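
A short sketch of this asymmetry, using a small matrix and vector with illustrative values:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)
v = np.array([1.0, 0.0, -1.0])    # shape (3,)

Av = np.dot(A, v)                 # rows of A dotted with v
print(Av)                         # [-2. -2.]

# Swapping the arguments pairs 3 elements of v with 2 rows of A, which fails.
try:
    np.dot(v, A)
    reversed_ok = True
except ValueError:
    reversed_ok = False
print(reversed_ok)                # False

# Even when both orders are valid, the results generally differ.
B = A.T                           # shape (3, 2)
print(np.dot(A, B).shape)         # (2, 2)
print(np.dot(B, A).shape)         # (3, 3)
```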

Of course, a dot product generalizes to tensors with an arbitrary number of axes.

The most common application may be the dot product between two matrices. You can take the dot product of two matrices `x` and `y` (`dot(x, y)`) if and only if `x.shape[1] == y.shape[0]`. The result is a matrix with shape `(x.shape[0], y.shape[1])`, where the coefficients are the vector products between the rows of `x` and the columns of `y`.

In [20]:
def naive_matrix_dot(x, y):
  # x and y are NumPy matrices
  assert len(x.shape) == 2
  assert len(y.shape) == 2
  # Axis 1 of x must have the same size as axis 0 of y.
  assert x.shape[1] == y.shape[0]

  # This operation returns a matrix of 0s with a specific shape.
  z = np.zeros((x.shape[0], y.shape[1]))
  for i in range(x.shape[0]):     # Iterates over the rows of x . . .
    for j in range(y.shape[1]):    # . . . and over the columns of y
      row_x = x[i, :]
      column_y = y[:, j]
      z[i, j] = naive_vector_dot(row_x, column_y)
  return z

To understand dot-product shape compatibility, it helps to visualize the input and output tensors by aligning them as shown in the following figure.

<img src='https://github.com/rahiakela/data-learning-research-and-practice/blob/main/deep-learning-with-python-by-francois-chollet/2-mathematical-building-blocks/images/1.png?raw=1' width='800'/>

In the figure, x, y, and z are pictured as rectangles (literal boxes of coefficients). Because the rows of x and the columns of y must have the same size, it follows that the width of x must match the height of y. 
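
The same alignment can be checked numerically. In this sketch (the matrices are hand-picked for easy arithmetic), `x` has width 2 and `y` has height 2, so the product is defined and has the height of `x` and the width of `y`:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # shape (3, 2)
y = np.array([[1, 0, 2, 0],
              [0, 1, 0, 2]])    # shape (2, 4)

z = np.dot(x, y)
print(z.shape)                  # (3, 4)
print(z)
# [[ 1  2  2  4]
#  [ 3  4  6  8]
#  [ 5  6 10 12]]

# Each entry is the dot product of a row of x with a column of y.
print(z[2, 3] == np.dot(x[2, :], y[:, 3]))   # True (5*0 + 6*2 = 12)
```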

If you go on to develop new machine learning algorithms, you’ll likely be drawing such diagrams often.

More generally, you can take the dot product between higher-dimensional tensors,
following the same rules for shape compatibility as outlined earlier for the 2D case:

```python
(a, b, c, d) • (d,) → (a, b, c)
(a, b, c, d) • (d, e) → (a, b, c, e)
```

And so on.
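
These shape rules can be confirmed with `np.dot` on random tensors. The concrete sizes below (2, 3, 4, 5, 6) are arbitrary stand-ins for `a`, `b`, `c`, `d`, and `e`:

```python
import numpy as np

t = np.random.random((2, 3, 4, 5))   # (a, b, c, d)
v = np.random.random((5,))           # (d,)
m = np.random.random((5, 6))         # (d, e)

tv = np.dot(t, v)
tm = np.dot(t, m)
print(tv.shape)   # (2, 3, 4)
print(tm.shape)   # (2, 3, 4, 6)

# Each entry of tv is an ordinary vector dot product along the last axis of t.
print(np.allclose(tv[0, 1, 2], np.dot(t[0, 1, 2], v)))   # True
```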

##Tensor reshaping