In [1]:
!pip install -q --upgrade tf-nightly-2.0-preview

In [0]:
import tensorflow as tf

### scalar * scalar
$$
y = ax
$$

In [6]:
x = tf.constant(1.0)
a = tf.constant(2.0)

y = tf.einsum(",->", a, x)
y

<tf.Tensor: id=20, shape=(), dtype=float32, numpy=2.0>

### scalar * vector

$$
{\bf y} = a {\bf x}
$$

In index (component) notation, this calculation is written as follows.

$$
y[i] = a x[i]
$$

In [7]:
x = tf.constant([1.0, 2.0])
a = tf.constant(1.0)

y = tf.einsum(",i->i", a, x)
y

<tf.Tensor: id=32, shape=(2,), dtype=float32, numpy=array([1., 2.], dtype=float32)>

### vector * vector with dot product

$$
y =\bf a^T x
$$

In index (component) notation, this calculation is written as follows.

$$
y = \sum_i a[i]x[i]
$$

In [8]:
x = tf.constant([1.0, 2.0, 3.0])
a = tf.constant([1.0, 1.0, 1.0])

y = tf.einsum("i,i->", a, x)
y

<tf.Tensor: id=48, shape=(), dtype=float32, numpy=6.0>
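
As a sanity check on the index notation, the sum over $i$ can be written out as an explicit loop. This is a minimal pure-Python sketch; the helper name `dot_by_index` is hypothetical and is not part of TensorFlow.

```python
def dot_by_index(a, x):
    """Compute y = sum_i a[i] * x[i] with an explicit loop over i."""
    assert len(a) == len(x), "both vectors must share the range of the index i"
    y = 0.0
    for i in range(len(a)):
        y += a[i] * x[i]
    return y


result = dot_by_index([1.0, 1.0, 1.0], [1.0, 2.0, 3.0])
print(result)  # 6.0, the same value as the tf.einsum("i,i->", a, x) output above
```

The index `i` appears in both inputs but not in the output, which is why the subscript string ends with an empty right-hand side.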

### vector * vector with hadamard product

$$
\bf y = a \odot x
$$

In index (component) notation, this calculation is written as follows.

$$
y[i] = a[i]x[i]
$$

In [9]:
x = tf.constant([1.0, 2.0, 3.0])
a = tf.constant([1.0, 1.0, 1.0])

y = tf.einsum("i,i->i", a, x)
y

<tf.Tensor: id=58, shape=(3,), dtype=float32, numpy=array([1., 2., 3.], dtype=float32)>

### vector * vector with direct product

$$
\bf y = a \otimes x
$$

In index (component) notation, this calculation is written as follows.

$$
y[i, j] = a[i]x[j]
$$

In [10]:
x = tf.constant([1.0, 2.0, 3.0])
a = tf.constant([1.0, 1.0, 1.0])

y = tf.einsum("i,j->ij", a, x)
y

<tf.Tensor: id=72, shape=(3, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]], dtype=float32)>
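
In the direct (outer) product no index disappears, so nothing is summed: every pair $(i, j)$ produces one output entry. The following pure-Python sketch makes this explicit. The helper name `outer_by_index` is hypothetical, and the two vectors are deliberately given different lengths to show that $i$ and $j$ range independently.

```python
def outer_by_index(a, x):
    """Compute y[i][j] = a[i] * x[j]; no index is repeated, so there is no sum."""
    return [[a[i] * x[j] for j in range(len(x))] for i in range(len(a))]


y = outer_by_index([1.0, 1.0], [1.0, 2.0, 3.0])
print(y)  # [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
```

The result has `len(a)` rows and `len(x)` columns, which corresponds to the `ij` on the right-hand side of `"i,j->ij"`.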

### scalar * matrix


$$
{\bf Y} = a {\bf X}
$$

In index (component) notation, this calculation is written as follows.

$$
y[i, j] = ax[i, j]
$$

In [11]:
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])
a = tf.constant(2.0)

y = tf.einsum(",ij->ij", a, x)
y

<tf.Tensor: id=86, shape=(3, 3), dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.],
       [14., 16., 18.]], dtype=float32)>

### vector * matrix

$$
{\bf y} = \bf a^T X
$$

In index (component) notation, this calculation is written as follows.

$$
y[j] = \sum_i a[i]x[i, j]
$$

In [12]:
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])
a = tf.constant([2.0, 2.0, 2.0])


y = tf.einsum("i,ij->j", a, x)
y

<tf.Tensor: id=100, shape=(3,), dtype=float32, numpy=array([24., 30., 36.], dtype=float32)>

### matrix * vector

$$
{\bf y} = \bf A x
$$

In index (component) notation, this calculation is written as follows.

$$
y[i] = \sum_j a[i,j]x[ j]
$$

In [13]:
x = tf.constant([1.0, 2.0, 3.0])
a = tf.constant([[2.0, 2.0, 2.0],
                 [2.0, 2.0, 2.0],
                 [2.0, 2.0, 2.0]])

y = tf.einsum("ij,j->i", a, x)
y

<tf.Tensor: id=114, shape=(3,), dtype=float32, numpy=array([12., 12., 12.], dtype=float32)>
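
The vector * matrix and matrix * vector cases differ only in which index of the matrix is summed away. The pure-Python sketch below spells out both sums with loops. The helper names `vec_mat` and `mat_vec` are hypothetical, and a non-symmetric matrix is used so that the two results visibly differ.

```python
def vec_mat(a, x):
    """y[j] = sum_i a[i] * x[i][j]: the row index i is summed away."""
    return [sum(a[i] * x[i][j] for i in range(len(a))) for j in range(len(x[0]))]


def mat_vec(a, x):
    """y[i] = sum_j a[i][j] * x[j]: the column index j is summed away."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


m = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
v = [2.0, 2.0, 2.0]

print(vec_mat(v, m))  # [24.0, 30.0, 36.0], as with "i,ij->j"
print(mat_vec(m, v))  # [12.0, 30.0, 48.0], as with "ij,j->i"
```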

###  Einstein summation convention
In index notation, the order of the factors does not matter, because every factor is just a scalar. The calculation above,

$$
\bf y = Ax
$$

can therefore also be written in index notation with the factors swapped:

$$
y[i] = \sum_j x[ j]a[i, j]
$$

The following is the same calculation with the indices renamed:

$$
y[k] = \sum_i x[i]a[k, i]
$$

The only thing that matters is which axis is summed over; the letter used for an index is irrelevant. Now suppose we see the following expression written without $\sum$:

$$
y[i] = x[ j]a[i, j]
$$

We can still tell what it means. The index $j$ does not appear on the left-hand side, so we take it to be summed over, as in $\sum_j$. But what if we see an expression such as

$$
 ... + x[ j]a[i, j] + z[k]b[k,i] + ...
$$

How should we read it? Even if the full equation is very long, we must be able to work out the calculation from a fragment like this one. In this case (in fact, in every case) focus on the index that is repeated within a single term. The term $x[j]a[i,j]$ repeats the index $j$, so $j$ is summed over. The term $z[k]b[k,i]$ repeats the index $k$, so $k$ is summed over.

This rule is called the "Einstein summation convention".
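
The two observations in this section (the order of the factors is irrelevant, and so are the index letters) can be checked numerically. The sketch below uses NumPy's `np.einsum` as a stand-in, on the assumption that NumPy is installed; for these subscript strings it follows the same convention as the `tf.einsum` calls in this notebook.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)
x = np.array([1.0, 0.0, -1.0])    # shape (3,)

y1 = np.einsum("ij,j->i", a, x)   # y[i] = a[i, j] x[j]
y2 = np.einsum("j,ij->i", x, a)   # factors swapped: y[i] = x[j] a[i, j]
y3 = np.einsum("ki,i->k", a, x)   # indices renamed: y[k] = a[k, i] x[i]

print(y1, y2, y3)  # each one is [-2. -2.]
```

In all three strings, the index that is repeated across the inputs and absent from the output is the one that gets summed.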

### matrix * matrix

$$
\bf Y = AX
$$

In the Einstein summation convention, this is written as follows.

$$
Y[i, k] = A[i,j]X[j,k]
$$

In [15]:
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])
a = tf.constant([[1.0, 1.0, 1.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 1.0, 1.0]])


y = tf.einsum("ij,jk->ik", a, x)
y

<tf.Tensor: id=134, shape=(3, 3), dtype=float32, numpy=
array([[12., 15., 18.],
       [12., 15., 18.],
       [12., 15., 18.]], dtype=float32)>
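
With square matrices it is hard to see which axis ends up where, so here is a small check with non-square shapes. It is a sketch using NumPy (assumed to be installed) rather than TensorFlow; the array values are arbitrary.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)    # shape (2, 3), indices i, j
x = np.arange(12.0).reshape(3, 4)   # shape (3, 4), indices j, k

y = np.einsum("ij,jk->ik", a, x)    # j is repeated, so it is summed over
y_t = np.einsum("ij,jk->ki", a, x)  # same sum, output axes written in the order k, i

print(y.shape)                      # (2, 4)
print(np.allclose(y, a @ x))        # True
print(np.allclose(y_t, (a @ x).T))  # True
```

The order of the letters after `->` fixes the axis order of the result, so writing `ki` instead of `ik` yields the transposed product.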

### Batch matmul
Let the batch of data be ${\bf X} \in \mathbb R^{N\times D}$, the weight matrix be ${\bf W}\in \mathbb R^{D\times M}$, and the bias vector be ${\bf b} \in \mathbb R^{M}$. The batch matmul with a bias term is

$$
\bf Y = XW + 1_Nb^T
$$

where $\bf 1_N$ is the $N$-dimensional vector whose components are all $1$, so ${\bf 1_N b^T} \in \mathbb R^{N\times M}$.

In the Einstein summation convention, this is written as follows.

$$
Y[n, m] = X[n,d]W[d,m] + 1[n]b[m]
$$

In [25]:
X = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0],
                 [-7.0, -8.0, -9.0],
                 [-4.0, -5.0, -6.0]])
W = tf.constant([[1.0, 2.0],
                 [-1.0, -1.0],
                 [3.0, 2.0]])

b = tf.constant([3., -1])
ones = tf.ones(shape=[5])

y = tf.einsum("nd,dm->nm", X, W) + tf.einsum("n,m->nm", ones, b)
y

<tf.Tensor: id=306, shape=(5, 2), dtype=float32, numpy=
array([[ 11.,   5.],
       [ 20.,  14.],
       [ 29.,  23.],
       [-23., -25.],
       [-14., -16.]], dtype=float32)>

TensorFlow supports broadcasting, so we do not need to expand the bias vector $\bf b$ into ${\bf 1_N b^T}$ explicitly. Adding `b` to the $N \times M$ product adds it to every row.

In [28]:
y = tf.einsum("nd,dm->nm", X, W) + b
y

<tf.Tensor: id=324, shape=(5, 2), dtype=float32, numpy=
array([[ 11.,   5.],
       [ 20.,  14.],
       [ 29.,  23.],
       [-23., -25.],
       [-14., -16.]], dtype=float32)>
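
The equivalence between the explicit ${\bf 1_N b^T}$ term and broadcasting can be verified directly. The sketch below repeats the example in NumPy (assumed to be installed), whose broadcasting rule for adding a shape `(M,)` vector to a shape `(N, M)` array matches what is used above.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [-7.0, -8.0, -9.0],
              [-4.0, -5.0, -6.0]])   # shape (N, D) = (5, 3)
W = np.array([[1.0, 2.0],
              [-1.0, -1.0],
              [3.0, 2.0]])           # shape (D, M) = (3, 2)
b = np.array([3.0, -1.0])            # shape (M,) = (2,)
ones = np.ones(X.shape[0])           # the vector 1_N

# Explicit form: X[n, d] W[d, m] + 1[n] b[m]
explicit = np.einsum("nd,dm->nm", X, W) + np.einsum("n,m->nm", ones, b)
# Broadcast form: b is added to every row of the (N, M) product
broadcast = np.einsum("nd,dm->nm", X, W) + b

print(np.allclose(explicit, broadcast))  # True
print(broadcast[0])                      # [11.  5.]
```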