This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.

**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**

This notebook was generated for TensorFlow 2.6.

# The mathematical building blocks of neural networks

To provide sufficient context for introducing tensors and gradient descent, we’ll begin the
chapter with a practical example of a neural network. Then we’ll go over every new concept
that’s been introduced, point by point.

### Geometric interpretation of tensor operations

A vector with two coordinates is a point in a 2D space (see figure 2.6). It’s common to picture a vector as an arrow linking the origin to the point.

![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/ch02-geometric_interpretation_1.png)
![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/ch02-geometric_interpretation_2.png)

Adding the same vector to every point of an object moves each point by the same offset. Tensor addition thus represents the action of **translating** an object (moving the object without distorting it) by a certain amount in a certain direction.
![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/ch02-geometric_interpretation_3.png)
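
As a rough illustration of translation by tensor addition, here is a minimal NumPy sketch. The square, the offset, and the `side_lengths` helper are made-up examples and do not come from the book. It adds one vector to every corner of a shape and checks that the side lengths are unchanged, which is what "moving without distorting" means here.

```python
import numpy as np

# Corners of a unit square, one 2D point per row (hypothetical example data).
square = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [1.0, 1.0],
                   [0.0, 1.0]])

# Translation vector: 2 units to the right, 0.5 units up (arbitrary choice).
offset = np.array([2.0, 0.5])

# Broadcasting adds the same offset to every point of the object.
translated = square + offset

def side_lengths(points):
    # Distance from each corner to the next one, wrapping around at the end.
    return np.linalg.norm(np.roll(points, -1, axis=0) - points, axis=1)

print(translated)
print(side_lengths(square))
print(side_lengths(translated))
```

Both calls to `side_lengths` should print the same four values: the object has moved, but its shape is intact.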

In general, elementary geometric operations such as translation, rotation, scaling, skewing, and so on can be expressed as tensor operations. 

- Translation:
    ![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/translation.png)
- Rotation:
    ![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/rotation.png)
- Scaling:
    ![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/scaling.png)
- Linear transform: A dot product with an arbitrary matrix implements a linear transform. Note that scaling and rotation, seen above, are by definition linear transforms.
- Affine transform: An affine transform is the combination of a linear transform (achieved via a dot product with some matrix) and a translation (achieved via a vector addition). As you have probably recognized, that’s exactly the $y = W \cdot x + b$ computation implemented by the `Dense` layer! A `Dense` layer without an activation function is an affine layer.
    ![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/affine_transform.png)
   
- `Dense` layer with relu activation: An important observation about affine transforms is that if you apply many of them repeatedly, **you still end up with an affine transform** (so you could just have applied that one affine transform in the first place). Let’s try it with two: affine2(affine1(x)) = $W_2 \cdot (W_1 \cdot x + b_1) + b_2 = (W_2 \cdot W_1) \cdot x + (W_2 \cdot b_1 + b_2)$. That’s an affine transform where the linear part is the matrix $W_2 \cdot W_1$ and the translation part is the vector $W_2 \cdot b_1 + b_2$. As a consequence, a multi-layer neural network made entirely of `Dense` layers without activations would be equivalent to a **single `Dense` layer**. This "deep" neural network would just be a linear model in disguise! This is why we need activation functions, like relu. Thanks to activation functions, **a chain of `Dense` layers can be made to implement very complex, non-linear geometric transformations**, resulting in very rich hypothesis spaces for your deep neural networks. We cover this idea in more detail in the next chapter.

    ![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/dense_transform.png)
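
The items above can be checked numerically. The following is a minimal NumPy sketch, not code from the book: the helper names (`rotation_matrix`, `scaling_matrix`, `affine`, `dense_relu`), the random weights, and the test points are all assumptions chosen for illustration. It applies a rotation and a scaling as dot products, verifies that two stacked affine transforms collapse into one, and shows that a relu layer breaks a property every affine transform has (mapping midpoints to midpoints).

```python
import numpy as np

def rotation_matrix(theta):
    # Counterclockwise rotation by theta radians around the origin.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def scaling_matrix(sx, sy):
    # Independent scaling along the horizontal and vertical axes.
    return np.array([[sx, 0.0],
                     [0.0, sy]])

def affine(W, b, x):
    # y = W . x + b: a linear transform followed by a translation.
    return W @ x + b

def dense_relu(W, b, x):
    # Affine transform followed by an elementwise relu.
    return np.maximum(affine(W, b, x), 0.0)

x = np.array([1.0, 0.0])
rotated = rotation_matrix(np.pi / 2) @ x  # quarter turn: approximately [0, 1]
scaled = scaling_matrix(2.0, 0.5) @ x     # [2, 0]

# Two arbitrary affine transforms (random weights, fixed seed).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=2)

# Applying them in sequence equals one affine transform with
# linear part W2 . W1 and translation part W2 . b1 + b2.
two_steps = affine(W2, b2, affine(W1, b1, x))
one_step = affine(W2 @ W1, W2 @ b1 + b2, x)
print(np.allclose(two_steps, one_step))  # True

# Any affine transform maps the midpoint of u and v to the midpoint of
# their images. A relu layer does not, so it cannot be an affine transform.
u, v = np.array([1.0, -1.0]), np.array([-1.0, 1.0])
mid = (u + v) / 2
identity, zero = np.eye(2), np.zeros(2)

affine_mid_ok = np.allclose(
    affine(W1, b1, mid), (affine(W1, b1, u) + affine(W1, b1, v)) / 2)
relu_mid_ok = np.allclose(
    dense_relu(identity, zero, mid),
    (dense_relu(identity, zero, u) + dense_relu(identity, zero, v)) / 2)
print(affine_mid_ok, relu_mid_ok)  # True False
```

The midpoint check is only one convenient way to expose the non-linearity. The point is that once a relu sits between two layers, the algebraic collapse shown for `two_steps` no longer applies.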

### A geometric interpretation of deep learning

In 3D, the following mental image may prove useful. Imagine two sheets of colored paper: one red and one blue. Put one on top of the other. Now crumple them together into a small ball. That crumpled paper ball is your input data, and each sheet of paper is a class of data in a classification problem. What a neural network is meant to do is **figure out a transformation of the paper ball that would uncrumple it, so as to make the two classes cleanly separable again**. With deep learning, this would be implemented as a series of simple transformations of the 3D space, such as those you could apply on the paper ball with your fingers, one movement at a time.

![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/ch02-geometric_interpretation_4.png)

Uncrumpling paper balls is what machine learning is about: finding neat representations for complex, highly folded data **manifolds** in high-dimensional spaces (a manifold is a continuous surface, like our crumpled sheet of paper). At this point, you should have a pretty good intuition as to why deep learning excels at this: it takes the approach of incrementally decomposing a complicated geometric transformation into a long chain of elementary ones, which is pretty much the strategy a human would follow to uncrumple a paper ball. **Each layer in a deep network applies a transformation that disentangles the data a little**, and a deep stack of layers makes tractable an extremely complicated disentanglement process.
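
To make "disentangling" slightly more concrete, here is a toy sketch that is not taken from the book. It uses four points in the classic XOR layout, which no single straight line can separate, and one relu layer with hand-picked (not learned) weights. The weights `W`, `b` and the separating direction are assumptions chosen by hand so the arithmetic is easy to follow.

```python
import numpy as np

# Four 2D points in the XOR layout: the two classes are interleaved,
# so no straight line in the input space separates them.
points = np.array([[0.0, 0.0],
                   [1.0, 1.0],
                   [0.0, 1.0],
                   [1.0, 0.0]])
labels = np.array([0, 0, 1, 1])

# One Dense-style layer with relu, using hand-picked weights.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, -1.0])

# Each row of `hidden` is relu(W . x + b) for one input point.
hidden = np.maximum(points @ W.T + b, 0.0)

# In the transformed space a single linear boundary is enough.
scores = hidden @ np.array([1.0, -2.0])
predicted = (scores > 0.5).astype(int)

print(hidden)
print(predicted)
```

After the layer, both class 1 points land on the same location and the class 0 points move away from it, so a plain linear threshold recovers the labels. A trained network finds such transformations by gradient descent instead of by hand.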