# Thinking in tensors, writing in PyTorch

A hands-on course by [Piotr Migdał](https://p.migdal.pl) (2019).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/stared/thinking-in-tensors-writing-in-pytorch/blob/master/1%20Vectors%2C%20matrices%20and%20tensors.ipynb)


## Notebook 1: Vectors, matrices and tensors

So, what is a tensor?

**WORK IN PROGRESS**

### TO DO:

* more examples
* [Deep Spreadsheets with ExcelNet](http://www.deepexcel.net/)

In [1]:
import torch
from torch import tensor



Linear algebra is the language of deep learning... and quantum mechanics.

Note: in physics and engineering, a tensor is not just any array. Names follow a one-two-many rule based on the number of dimensions:

* 0: scalar
* 1: vector
* 2: matrix
* 3 and above: n-dimensional tensor

In theory, tensors can have an arbitrarily high number of dimensions. In deep learning, they rarely exceed 5.
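To make the naming concrete, here is a minimal sketch that creates zero-filled tensors of each rank with `torch.zeros` and checks their `.dim()`. The particular shapes, and reading the 4-dimensional one as a batch of RGB images, are illustrative choices rather than anything fixed by PyTorch.

```python
import torch

# Tensors of increasing dimension (all zeros; only the shapes matter here)
scalar = torch.zeros(())               # 0 dimensions
vector = torch.zeros(3)                # 1 dimension
matrix = torch.zeros(2, 3)             # 2 dimensions
tensor3d = torch.zeros(2, 3, 4)        # 3 dimensions
images = torch.zeros(16, 3, 32, 32)    # 4 dimensions, e.g. a batch of RGB images

for t in [scalar, vector, matrix, tensor3d, images]:
    print(t.dim(), tuple(t.shape))
```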

## Scalar

A scalar is "just a number". Real-world examples of scalars include temperature, pressure, the price of an apple in a given shop, etc.

In [2]:
x = tensor(42.)
x

tensor(42.)

In [3]:
x.dim()

0

In [4]:
2 * x

tensor(84.)

In [5]:
x.item()

42.0
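A short sketch of two details about scalar tensors: their shape is empty (no axes at all), and `.item()` turns a one-element tensor into a plain Python number. The second tensor `y` is a hypothetical example showing that `.item()` only cares about the number of elements, not the number of dimensions.

```python
import torch

x = torch.tensor(42.)

# A scalar tensor has an empty shape: no axes at all
print(x.shape)          # torch.Size([])

# .item() converts a one-element tensor into a plain Python number
value = x.item()
print(type(value))      # <class 'float'>

# .item() also works for any tensor with exactly one element, whatever its shape
y = torch.tensor([[7.]])
print(y.dim(), y.item())   # 2 7.0
```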

### Food for thought

> The scalar fallacy is the false but pervasive assumption that real-world things (hotels, sandwiches, people, mutual funds, chemo drugs, whatever) have some single-dimension ordering of "goodness".

> When you project a multi-dimensional space down to one dimension, you are involving a lot of context and preferences in the act of projecting. - [rlucas on HN](https://news.ycombinator.com/item?id=8132525)

See also: [Scalar fallacy](http://observationalepidemiology.blogspot.com/2011/01/scalar-fallacy.html).


## Vector

A vector is an ordered list of numbers, such as `[-5., 2., 0.]`.

In physics and mechanical engineering, not every list of numbers is a vector:

> it is not generally true that any three numbers form a vector. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way. - [II 02: Differential Calculus of Vector Fields](http://www.feynmanlectures.caltech.edu/II_02.html) from [The Feynman Lectures on Physics](http://www.feynmanlectures.caltech.edu/)

Examples of vectors in physics include:

* position
* velocity
* electric field
* spatial gradient of a scalar field ($\nabla T$)
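Feynman's criterion can be illustrated numerically. The sketch below assumes a 2D velocity with made-up components `[3., 4.]` and rotates the coordinate axes by 30 degrees: the components mix among themselves, while the physical length (the speed) stays the same.

```python
import math
import torch

# A velocity vector in the original coordinate system (hypothetical values)
velocity = torch.tensor([3., 4.])

# Rotating the coordinate axes by theta mixes the components:
# x' = x cos(theta) + y sin(theta), y' = -x sin(theta) + y cos(theta)
theta = math.radians(30)
R = torch.tensor([[ math.cos(theta), math.sin(theta)],
                  [-math.sin(theta), math.cos(theta)]])
velocity_new = R @ velocity

# The components change, but the length (speed) does not
print(velocity.norm(), velocity_new.norm())   # both approximately 5
```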


In deep learning we are more... relaxed. Any ordered list of numbers counts as a vector, and usually vectors are abstract, for example:

* a feature vector from an ImageNet-trained network
* a word representation, i.e. a word embedding (see: [king - man + woman is queen; but why?](https://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html))
* user and product vectors in [Factorization Machines](https://www.reddit.com/r/MachineLearning/comments/65d3lt/r_factorization_machines_2010_a_classic_paper_in/) and related recommendation systems


$$\vec{v} = \left[ v_1, v_2, \ldots, v_n \right]$$

In [6]:
v = tensor([1.5, -0.5, 3.0])
v

tensor([ 1.5000, -0.5000,  3.0000])

In [7]:
v.dim()

1

In [8]:
v.size()

torch.Size([3])

### Vector arithmetic

We can multiply vectors by a scalar: 

$$a \vec{v} = \left[a v_1, a v_2, \ldots, a v_n \right]$$

Or, provided that two vectors have the same length, add or subtract them elementwise:

$$\vec{v} \pm \vec{u} = \left[v_1 \pm u_1, v_2 \pm u_2, \ldots, v_n \pm u_n \right]$$

In [None]:
2 * v

In [None]:
u = tensor([1., 0., 1.])

In [None]:
v + u
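A small sketch collecting these operations, plus what happens when lengths do not match. The extra vector `w` is a hypothetical example. One caveat: PyTorch's broadcasting would silently allow adding a tensor of length 1, but lengths 3 and 2 are incompatible and raise a `RuntimeError`.

```python
import torch

v = torch.tensor([1.5, -0.5, 3.0])
u = torch.tensor([1., 0., 1.])

# Elementwise arithmetic on vectors of the same length
print(2 * v)   # values [3, -1, 6]
print(v + u)   # values [2.5, -0.5, 4]
print(v - u)   # values [0.5, -0.5, 2]

# Vectors of different lengths cannot be added
w = torch.tensor([1., 2.])
try:
    v + w
except RuntimeError as e:
    print("RuntimeError:", e)
```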

### Vector length


The length (Euclidean norm) of a vector is:

$$|\vec{v}| = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2} = \sqrt{\sum_{i=1}^n v_i^2}$$

In [None]:
# vector length: square each entry, sum, take the square root
v.pow(2).sum().sqrt()

In [None]:
v**2

In [None]:
torch.pow(v, 2)

In [None]:
v.pow(2).sum()

In [None]:
# v.norm() computes the length directly; dividing by it normalizes v to unit length
v / v.norm()
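As a quick sanity check, here is a sketch comparing the step-by-step length with the built-in `norm` (which defaults to the Euclidean norm), and confirming that the normalized vector has length 1.

```python
import torch

v = torch.tensor([1.5, -0.5, 3.0])

# Length computed step by step: square, sum, square root
manual = v.pow(2).sum().sqrt()

# The same quantity via the built-in Euclidean (L2) norm
builtin = v.norm()
print(manual, builtin)

# Dividing by the length gives a unit vector pointing in the same direction
unit = v / v.norm()
print(unit.norm())   # approximately 1
```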

## Matrix

[![Matrix transform - xkcd](https://imgs.xkcd.com/comics/matrix_transform.png)](https://xkcd.com/184/)

Typical operations:

* rotations
* next step in a stochastic process
* unitary operations and projections in quantum mechanics (these use complex numbers)
* scalar products
* [Hessian matrix](https://en.wikipedia.org/wiki/Hessian_matrix) of a scalar function (i.e. second-order derivatives of a scalar with respect to a vector)
* channel mixing (e.g. `RGB` to gray-scale and R-G)
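Two of these uses can be sketched in a few lines: a rotation of the plane, and channel mixing. For gray-scale, the sketch assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common choice among several; the image is random, hypothetical data. The trick for images is to treat every pixel as a 3-vector of (R, G, B) values and apply the same matrix to each of them.

```python
import math
import torch

# Rotation: a 2x2 matrix turning the plane by 90 degrees counter-clockwise
theta = math.pi / 2
R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])
rotated = R @ torch.tensor([1., 0.])   # approximately [0., 1.]

# Channel mixing: a 2x3 matrix mapping (R, G, B) to (gray, R - G).
# Gray uses the ITU-R BT.601 luma weights (an assumption; other weights exist).
mix = torch.tensor([[0.299, 0.587, 0.114],
                    [1.0,  -1.0,   0.0]])
image = torch.rand(3, 4, 5)            # hypothetical image: 3 channels, 4x5 pixels

# Flatten the pixels, multiply every pixel vector by the matrix, restore the layout
mixed = (mix @ image.reshape(3, -1)).reshape(2, 4, 5)
gray, red_minus_green = mixed[0], mixed[1]
```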

In [None]:
M = tensor([[1., 2.], [3., 4.]])
M

In [None]:
M.matmul(M)

In [None]:
# vector-matrix product: [1, 0] picks out the first row of M
tensor([1., 0.]).matmul(M)

In [None]:
# the @ operator (Python 3.5+) is shorthand for matmul
M @ M

In [None]:
# elementwise (Hadamard) product, not matrix multiplication
M * M

In [None]:
tensor([1., 2.]).matmul(M)
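The difference between the products above is easy to miss, so here is a sketch with the values worked out by hand in the comments: in the matrix product, entry (i, j) is the dot product of row i with column j; in the elementwise product, each entry is simply squared.

```python
import torch

M = torch.tensor([[1., 2.], [3., 4.]])

# Matrix product: [[1*1 + 2*3, 1*2 + 2*4], [3*1 + 4*3, 3*2 + 4*4]] = [[7, 10], [15, 22]]
matrix_product = M @ M

# Elementwise product: [[1, 4], [9, 16]]
elementwise = M * M

# Row vector times matrix: a weighted sum of the rows, 1 * [1, 2] + 2 * [3, 4] = [7, 10]
weighted_rows = torch.tensor([1., 2.]) @ M
print(matrix_product, elementwise, weighted_rows)
```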

In [None]:
M.det()

In [None]:
# Singular Value Decomposition (SVD), the key step of Principal Component Analysis
# (newer PyTorch versions recommend torch.linalg.svd instead)
M.svd()
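A sketch checking both results. For a 2x2 matrix the determinant is $ad - bc = 1 \cdot 4 - 2 \cdot 3 = -2$. For SVD, note a difference in conventions: `M.svd()` returns `V` (with $M = U \operatorname{diag}(S) V^T$), while `torch.linalg.svd` returns `Vh`, which is already transposed. The product of singular values equals the absolute value of the determinant.

```python
import torch

M = torch.tensor([[1., 2.], [3., 4.]])

# For a 2x2 matrix, det = a*d - b*c = 1*4 - 2*3 = -2
print(M.det())

# torch.linalg.svd returns U, S and Vh (V already transposed), so M = U diag(S) Vh
U, S, Vh = torch.linalg.svd(M)
reconstructed = U @ torch.diag(S) @ Vh
print(torch.allclose(reconstructed, M, atol=1e-5))   # True

# Singular values come sorted in descending order; their product is |det M|
print(S, S.prod())
```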

[Matrix factorization visualized](https://p.migdal.pl/matrix-decomposition-viz/) by Piotr Migdał (work in progress):

![](imgs/matrix_factorization_city_temperature.png)

## Tensor


A tensor is a generalization of vectors and matrices to more dimensions.

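In physics and engineering, tensors have additional properties: like vectors, their components must transform in a specific way under a change of coordinates, as in: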


![](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fe/StressEnergyTensor_contravariant.svg/250px-StressEnergyTensor_contravariant.svg.png)

Stress-energy tensor (image) from [Introduction to the mathematics of general relativity - Wikipedia](https://en.wikipedia.org/wiki/Introduction_to_the_mathematics_of_general_relativity), see also: [Electromagnetic tensor](https://en.wikipedia.org/wiki/Electromagnetic_tensor) and [Tensor](https://en.wikipedia.org/wiki/Tensor).

In deep learning, tensors are simply multidimensional arrays, with no extra requirements.
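A typical deep learning tensor, sketched under the assumption of a hypothetical batch of 8 random RGB images of 32x32 pixels, stored in the (batch, channels, height, width) order that PyTorch's 2D convolutions expect. Indexing removes axes, and reductions such as `mean` can collapse several axes at once.

```python
import torch

# A hypothetical batch of 8 RGB images, 32x32 pixels each: (batch, channels, height, width)
batch = torch.rand(8, 3, 32, 32)

first_image = batch[0]                          # shape (3, 32, 32)
red_channels = batch[:, 0]                      # shape (8, 32, 32)
mean_per_channel = batch.mean(dim=(0, 2, 3))    # average over batch and pixels, shape (3,)
print(batch.dim(), first_image.shape, red_channels.shape, mean_per_channel.shape)
```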


# Further reading


For an introduction to linear algebra, I recommend [immersive linear algebra](http://immersivemath.com/ila/index.html) by J. Ström, K. Åström, and T. Akenine-Möller (from my [Interactive Machine Learning, Deep Learning and Statistics websites](http://p.migdal.pl/interactive-machine-learning-list/) collection).

I made some points about [tensor diagrams here](https://medium.com/@pmigdal/in-the-topic-of-diagrams-i-did-write-a-review-simple-diagrams-of-convoluted-neural-networks-6418a63f9281). In particular, I recommend:

* [Einsum is All you Need - Einstein Summation in Deep Learning](https://rockt.github.io/2018/04/30/einsum) by Tim Rocktäschel.
* [Matrices as Tensor Network Diagrams](https://www.math3ma.com/blog/matrices-as-tensor-network-diagrams) by [Tai-Danae Bradley](https://twitter.com/math3ma):

![Scalar, vector, matrix, tensor - a drawing by Tai-Danae Bradley](https://uploads-ssl.webflow.com/5b1d427ae0c922e912eda447/5cd99a73f8ce4494ad86852e_arraychart.jpg)
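To get a feel for Einstein summation, here is a minimal sketch with `torch.einsum` on random tensors: each letter labels an axis, and letters that appear in the inputs but not in the output are summed over. The specific operations chosen are illustrative, each checked against its conventional PyTorch equivalent.

```python
import torch

A = torch.rand(2, 3)
B = torch.rand(3, 4)
v = torch.rand(3)

# 'j' is shared and absent from the output, so it is summed over
matmul = torch.einsum('ij,jk->ik', A, B)     # same as A @ B
matvec = torch.einsum('ij,j->i', A, v)       # same as A @ v

# A repeated letter on one input walks along the diagonal
trace = torch.einsum('ii->', B[:, :3])       # trace of a 3x3 slice

# No shared letters: nothing is summed, giving an outer product of shape (3, 4)
outer = torch.einsum('i,k->ik', v, B[0])
```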

Beware that PyTorch can be tricky with tensor dimension order:

* [Inconsistent dimension ordering for 1D networks - NCL vs NLC vs LNC]()
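One concrete instance: `nn.Conv1d` expects inputs as (batch, channels, length), i.e. NCL, while `nn.LSTM` by default expects (length, batch, features), i.e. LNC, unless `batch_first=True`. The sketch below assumes data stored as NLC with arbitrary sizes, and shows the transpose needed for the convolution.

```python
import torch
from torch import nn

N, L, C = 4, 10, 6   # batch size, sequence length, channels (arbitrary example values)
x = torch.rand(N, L, C)            # data stored as (N, L, C), i.e. NLC

# nn.Conv1d expects (N, C, L), so the channel axis must come before the length axis
conv = nn.Conv1d(in_channels=C, out_channels=8, kernel_size=3)
conv_out = conv(x.transpose(1, 2))   # shape (4, 8, 8): no padding, so length 10 - 3 + 1 = 8

# nn.LSTM expects (L, N, C) by default; batch_first=True makes it accept (N, L, C)
lstm = nn.LSTM(input_size=C, hidden_size=5, batch_first=True)
lstm_out, _ = lstm(x)              # shape (4, 10, 5)
```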

For a critique of tensors whose dimensions are identified only by position, see:

* [Named tensors](http://nlp.seas.harvard.edu/NamedTensor) and [Named tensors (part 2)](http://nlp.seas.harvard.edu/NamedTensor2) by Alexander Rush - a proposal of type-checking tensor dimensions

> Is it only me, or does "Theano tensor dimension order" sound like some secret convent? - [Piotr Migdał's tweet](https://twitter.com/pmigdal/status/961344490500952070)
