In [1]:
%run ../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# Tensors

Tensors are generalisations of vectors and matrices to higher dimensions. They are widely used in neural networks to represent data!

* a 0D tensor (tensor of rank 0) is a scalar (a number)
* a 1D tensor (tensor of rank 1) is a vector
* a 2D tensor (tensor of rank 2) is a matrix
* a 3D tensor (tensor of rank 3) is an array of matrices
* ...

Each dimension of a tensor is called an *axis*. Note that NumPy lets you create arrays of (practically) any rank; the rank of an array is stored in its `ndim` attribute.
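
A minimal sketch of the ranks listed above, assuming NumPy is available as `np` (the variable names are illustrative only):

```python
import numpy as np

scalar = np.array(7)                 # rank 0: a single number
vector = np.array([1, 2, 3])         # rank 1: one axis
matrix = np.array([[1, 2], [3, 4]])  # rank 2: two axes
cube = np.zeros((2, 3, 4))           # rank 3: three axes

tensors = (scalar, vector, matrix, cube)
ranks = [t.ndim for t in tensors]    # number of axes of each tensor
shapes = [t.shape for t in tensors]  # size along each axis

print(ranks)   # [0, 1, 2, 3]
print(shapes)  # [(), (3,), (2, 2), (2, 3, 4)]
```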

## Operations

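Operations on tensors are scaled-up versions of the operations on vectors and matrices: element-wise operations act entry by entry, and products such as the matrix product extend to higher ranks.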
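
As a rough illustration, assuming NumPy, the sketch below applies an element-wise sum and a matrix product to rank-3 tensors. The shapes are arbitrary choices for the example.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)  # rank-3 tensor with entries 0..23
b = np.ones((2, 3, 4))              # same shape, all ones

# Element-wise addition works entry by entry, exactly as for matrices.
elementwise_sum = a + b             # shape (2, 3, 4)

# np.matmul treats a rank-3 tensor as a stack of matrices and
# multiplies each (3, 4) slice by the (4, 5) matrix w.
w = np.ones((4, 5))
product = np.matmul(a, w)           # shape (2, 3, 5)
```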

## Tensors for data representation in Deep Learning

When passing a training set to a deep model, it is customary to stack the samples into a single tensor whose first axis indexes the samples.

Suppose you're working on an image task, for instance. A grayscale image is a matrix, that is, a tensor of rank 2. The typical way to pass a training set of such images to a deep network is to stack them into a tensor of rank 3 whose first axis denotes the sample. The same goes for colour images, which are tensors of rank 3, the third axis being the colour channel: a set of those is a tensor of rank 4 where the first axis is the sample, the second and third are height and width, and the fourth is the colour channel.
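
A small sketch of this stacking, assuming NumPy; the image sizes (28x28 and 32x32) and the random pixel values are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten grayscale images, each a rank-2 tensor (height, width).
gray_images = [rng.random((28, 28)) for _ in range(10)]
gray_batch = np.stack(gray_images, axis=0)      # (samples, height, width)

# Ten colour images, each a rank-3 tensor (height, width, channels).
colour_images = [rng.random((32, 32, 3)) for _ in range(10)]
colour_batch = np.stack(colour_images, axis=0)  # (samples, height, width, channels)

print(gray_batch.shape)    # (10, 28, 28)
print(colour_batch.shape)  # (10, 32, 32, 3)
```

Indexing the first axis recovers a single sample, so `gray_batch[3]` is the fourth image.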

Note that TensorFlow uses the convention described here (channels last), whereas Theano puts the colour channel on the second axis, followed by height and width (channels first).
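
Converting between the two layouts is just a permutation of the axes. A minimal sketch with plain NumPy (not the frameworks' own utilities), on a made-up batch of two 4x5 colour images:

```python
import numpy as np

# Batch in channels-last layout: (samples, height, width, channels).
channels_last = np.arange(2 * 4 * 5 * 3).reshape(2, 4, 5, 3)

# Move the channel axis to position 1: (samples, channels, height, width).
channels_first = np.transpose(channels_last, (0, 3, 1, 2))

print(channels_first.shape)  # (2, 3, 4, 5)
```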

This structure is employed whatever the dimensionality of the samples: you build a tensor whose first axis indexes the samples. A set of colour videos goes in a 5D tensor (samples, frames, height, width, channels), as each frame is an image!

## Broadcasting

Broadcasting is the procedure that makes it possible to apply element-wise operations to tensors of different rank, for instance adding a vector to a matrix. Broadcasting "extends" the smaller of the two tensors by replicating it across the missing axes so that it matches the shape of the other tensor. This requires the shapes to be compatible: in the example below, the length of the vector equals the number of columns of the matrix. For example, if you want to add the vector $v = (1, 2, 2)$ to the matrix $A = \begin{bmatrix}
    2  & 3 & 1 \\
    1  & 1 & 1
\end{bmatrix}
$, you replicate the vector over the missing axis to build the matrix $V = \begin{bmatrix}
    1  & 2 & 2 \\
    1  & 2 & 2
\end{bmatrix}$ and then compute $A + V = \begin{bmatrix}
    3  & 5 & 3 \\
    2  & 3 & 3
\end{bmatrix}$.
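
The same example can be reproduced in NumPy, which broadcasts automatically. In this sketch `np.broadcast_to` is used only to make the replicated matrix $V$ visible; the addition itself does not need it.

```python
import numpy as np

v = np.array([1, 2, 2])
A = np.array([[2, 3, 1],
              [1, 1, 1]])

# Explicit view of v replicated along the missing (row) axis.
V = np.broadcast_to(v, A.shape)

# NumPy performs the same replication implicitly.
result = A + v

print(V)
# [[1 2 2]
#  [1 2 2]]
print(result)
# [[3 5 3]
#  [2 3 3]]
```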

## Symbolic computation

Modern deep learning frameworks, such as TensorFlow, are capable of symbolic differentiation. In the words of Chollet (reference 1):

"This means that, given a chain of operations with a known derivative, they can compute a gradient function for the chain (by applying the chain rule) that maps network parameter values to gradient values. When you have access to such a function, the backward pass is reduced to a call to this gradient function. Thanks to symbolic differentiation, you’ll never have to implement the Backpropagation algorithm by hand."

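To make the idea of "a gradient function for the chain" concrete, here is a toy sketch in plain Python. It is only an illustration of the chain rule for scalar functions, not how any framework implements differentiation; the operation pairs and the helper `gradient_of_chain` are hypothetical names invented for this example.

```python
import math

# Each operation is a pair (function, derivative), both known in closed form.
# These names are illustrative only and not part of any framework.
SQUARE = (lambda x: x * x, lambda x: 2.0 * x)
SIN = (math.sin, math.cos)
SCALE3 = (lambda x: 3.0 * x, lambda x: 3.0)


def gradient_of_chain(ops):
    """Return a function computing d/dx of ops[-1](...ops[0](x)...)."""
    def grad(x):
        result = 1.0
        value = x
        for f, df in ops:
            result *= df(value)  # chain rule: multiply by the local derivative
            value = f(value)     # this output is the input of the next operation
        return result
    return grad


# y = 3 * sin(x^2), so dy/dx = 3 * cos(x^2) * 2x
grad_fn = gradient_of_chain([SQUARE, SIN, SCALE3])
x0 = 0.5
analytic = 3.0 * math.cos(x0 ** 2) * 2.0 * x0
print(abs(grad_fn(x0) - analytic) < 1e-12)  # True
```

Once `grad_fn` has been built, evaluating the derivative at any point is a single function call, which is the sense in which the backward pass "is reduced to a call to this gradient function".
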
## References

1. F Chollet, **Deep Learning with Python**, *Manning*, 2017
2. [**TensorFlow** on tensors](https://www.tensorflow.org/programmers_guide/tensors)
3. [**TensorFlow** on broadcasting](https://www.tensorflow.org/performance/xla/broadcasting)