# Einstein summation

For matrices $a_{(i,j)}$ and $b_{(j,k)}$, a common subscript $j$ specifies the axis along which the element-wise products $a_{i j} * b_{j k}$ are summed. Writing this sum as $a_{i j} b_{j k}$, with the summation sign left implicit, is called Einstein summation. The dimension size ```J``` along the axis ```j``` must be the same in the shapes ```a:(I,J)``` and ```b:(J,K)```.


$
\begin{align*}
a_{i j} b_{j k}
&= \sum\limits_{j=1}^{J} a_{ij} * b_{jk}
\\
&=  a_{i 1} * b_{1 k} + a_{i 2} * b_{2 k} + \dots + a_{i j} * b_{j k} + \dots + a_{i J} * b_{J k}
\end{align*}
$
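
As a quick check of the formula above, the following sketch compares ```np.einsum("ij,jk", a, b)``` with the sum over ```j``` written out as explicit loops, and with ```a @ b```. The array values and the name ```manual``` are made up for illustration only.

```python
import numpy as np

# Illustrative arrays: a has shape (I, J) = (2, 3), b has shape (J, K) = (3, 4).
a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

# Implicit mode: j is repeated, so it is summed over; the output axes are (i, k).
c = np.einsum("ij,jk", a, b)

# The same sum written out term by term over j.
manual = np.zeros((2, 4), dtype=a.dtype)
for i in range(2):
    for k in range(4):
        for j in range(3):
            manual[i, k] += a[i, j] * b[j, k]

print(np.array_equal(c, manual))  # compare with the explicit loops
print(np.array_equal(c, a @ b))   # compare with matrix multiplication
```

Both comparisons should agree, since the contraction over a shared inner index is exactly matrix multiplication.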



* [Einstein Summation](https://mathworld.wolfram.com/EinsteinSummation.html)

>Einstein summation is a notational convention for simplifying expressions including summations of vectors, matrices, and general tensors. There are essentially three rules of Einstein summation notation, namely:
>1. Repeated indices are implicitly summed over.
>2. Each index can appear at most twice in any term.
>3. Each term must contain identical non-repeated indices.




For ```np.einsum("ij,j", a, b)```, shown as the **green rectangle** in the diagram from the YouTube video below, ```j``` is the dummy index: the element-wise products ```a[i][j] * b[j]``` are summed along the ```j``` axis as $ \sum\limits_{j} (a_{ij} * b_j) $.


* [Einstein Summation Convention: an Introduction](https://www.youtube.com/watch?v=CLrTj7D2fLM)
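
A minimal sketch of the ```"ij,j"``` case, assuming a small matrix ```m``` and vector ```v``` whose names and values are hypothetical and chosen only so that the sums are easy to read:

```python
import numpy as np

# Hypothetical data: m is a (2, 3) matrix, v is a length-3 vector.
m = np.arange(6).reshape(2, 3)
v = np.array([1, 10, 100])

# j is shared by both operands, so each row m[i, :] is multiplied
# element-wise by v and the products are summed along j.
r = np.einsum("ij,j", m, v)

print(r)      # [210 543]
print(m @ v)  # matrix-vector product, for comparison
```

The free index ```i``` survives, so the result has one entry per row of ```m```.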


<img src="images/einstein_summation.png" align="left">

In [1]:
import numpy as np

In [2]:
a = np.arange(6).reshape((2,3))
b = np.arange(10,14).reshape((2,2))
print(a)
print(b)

[[0 1 2]
 [3 4 5]]
[[10 11]
 [12 13]]


## $np.einsum(\text{"ik,il"}, a, b)$

For the dummy index $i$, which is common to $a_{ik}$ and $b_{il}$, ```np.einsum("ik,il", a, b)``` applies the operation $a_{i k} b_{i l}$ for every (k, l) combination.

$
\begin{align*}
a_{i k} b_{i l} &= a_{0 k} * b_{0 l} + a_{1 k} * b_{1l} 
\\
&= \sum\limits _{i} a_{ik} * b_{il}
\\
&= a^T[k] \cdot b^T[l]
\end{align*}
$

The dummy index can appear in any position as long as the rules (see the YouTube video for details) are met. In ```np.einsum("ik,il", a, b)```, the dummy index ```i``` is the row index of both ```a``` and ```b```, hence column ```k``` of ```a``` and column ```l``` of ```b``` are extracted to form each **dot product**.
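
The last line of the derivation, $a^T[k] \cdot b^T[l]$, suggests that the whole result should equal the matrix product of the transpose of ```a``` with ```b```. The sketch below checks this with the same ```a``` and ```b``` as in the cells above; it is a sanity check, not a statement about how ```np.einsum``` is implemented.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))       # same a as in the cells above
b = np.arange(10, 14).reshape((2, 2))  # same b as in the cells above

c = np.einsum("ik,il", a, b)

# Summing over the row index i pairs column k of a with column l of b,
# which is what the matrix product a.T @ b computes.
print(c.shape)                     # (3, 2)
print(np.array_equal(c, a.T @ b))  # True
```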

<img src="images/einsum_operation.png" align="left">



## Output form

Because the summation runs along the **dummy index**, that index disappears from the result: ```i``` from ```"ik,il"``` is dropped and the result has shape ```(k,l)```. The output form can be specified explicitly with ```np.einsum("... -> <shape>")```, where the **output subscript labels** follow the ```->``` identifier.

See the **explicit mode** in [numpy.einsum](https://numpy.org/doc/stable/reference/generated/numpy.einsum.html) for details.

>In explicit mode the output can be directly controlled by specifying output subscript labels. This requires the identifier ```'->'``` as well as the list of output subscript labels. This feature increases the flexibility of the function since summing can be disabled or forced when required. The call ```np.einsum('i->', a)``` is like ```np.sum(a, axis=-1)```, and ```np.einsum('ii->i', a)``` is like ```np.diag(a)```. The difference is that einsum does not allow broadcasting by default. Additionally ```np.einsum('ij,jh->ih', a, b)``` directly specifies the order of the output subscript labels and therefore returns matrix multiplication, unlike the example above in implicit mode.
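
To illustrate explicit mode on this notebook's arrays, the sketch below reorders the output axes and then keeps ```i``` in the output so that no summation happens. The variable names ```kl```, ```lk``` and ```ikl``` are illustrative only.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))
b = np.arange(10, 14).reshape((2, 2))

# Same contraction over i, with the output order stated explicitly.
kl = np.einsum("ik,il->kl", a, b)    # shape (3, 2)
lk = np.einsum("ik,il->lk", a, b)    # shape (2, 3), the transpose of kl

# Keeping i in the output disables the summation over i.
ikl = np.einsum("ik,il->ikl", a, b)  # shape (2, 3, 2)

print(kl.shape, lk.shape, ikl.shape)
print(np.array_equal(lk, kl.T))
print(np.array_equal(ikl.sum(axis=0), kl))
```

Summing ```ikl``` over its first axis recovers ```kl```, which shows that dropping ```i``` from the output labels is what triggers the sum.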


For ```a```, ```i``` is both the dummy index and the row index, so the operation extracts column ```k``` (all ```i``` values). For ```b```, ```i``` is likewise the dummy index and the row index, so it extracts column ```l```. The dot product of each pair of columns is then computed, e.g.
```
a[i=0,1][k=0] from a -> [0,3]
b[i=0,1][l=0] from b -> [10, 12]
36 = np.dot([0,3], [10, 12])
```
The result shape is (3,2) because
```
a[i=*][k=0,1,2] -> 3
b[i=*][l=0,1]   -> 2
```
Hence ```c = np.einsum("ik,il", a, b)``` has shape (3,2).

In [3]:
np.einsum("ik,il", a, b)

array([[36, 39],
       [58, 63],
       [80, 87]])
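
The column-by-column description can be written as explicit loops. This is a sketch of the arithmetic only, assuming the same ```a``` and ```b``` as above; it is not how ```np.einsum``` works internally.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))
b = np.arange(10, 14).reshape((2, 2))

# Build the (3, 2) result one entry at a time:
# entry (k, l) is the dot product of column k of a and column l of b.
c = np.empty((a.shape[1], b.shape[1]), dtype=a.dtype)
for k in range(a.shape[1]):
    for l in range(b.shape[1]):
        c[k, l] = np.dot(a[:, k], b[:, l])

print(c)
print(np.array_equal(c, np.einsum("ik,il", a, b)))
```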

## No dummy index 

With no repeated index there is no summation.
1. Each term (subscript indices, e.g. "ij") selects an element in its array.
2. Each element selected on the left-hand side is multiplied by each element selected on the right-hand side (the multiplication always happens).

```a``` has shape (2,3), and each of its elements is multiplied with all of ```b```, which has shape (2,2). The result therefore has shape ```(2,3,2,2)``` with no summation, because ```(i,j)``` and ```(k,l)``` are all free indices.

In [4]:
# --------------------------------------------------------------------------------
# For np.einsum("ij,kl", a, b)
# 1-1: Term "ij" or (i,j), two free indices, selects an element a[i][j].
# 1-2: Term "kl" or (k,l), two free indices, selects an element b[k][l].
# 2:   Each a[i][j] is multiplied by each b[k][l], giving a[i][j] * b[k][l]
# --------------------------------------------------------------------------------
# for (i,j) in a:
#    for(k,l) in b:
#        a[i][j] * b[k][l]
np.einsum("ij,kl", a, b)

array([[[[ 0,  0],
         [ 0,  0]],

        [[10, 11],
         [12, 13]],

        [[20, 22],
         [24, 26]]],


       [[[30, 33],
         [36, 39]],

        [[40, 44],
         [48, 52]],

        [[50, 55],
         [60, 65]]]])
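
With all four indices free, the result is an outer product of the two arrays. As a cross-check, the sketch below compares it with plain broadcasting and with ```np.multiply.outer```; the names ```outer``` and ```by_broadcast``` are illustrative.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))
b = np.arange(10, 14).reshape((2, 2))

outer = np.einsum("ij,kl", a, b)

# Broadcasting gives the same result: a is viewed as (2, 3, 1, 1) and
# multiplied against b, which broadcasts as (1, 1, 2, 2).
by_broadcast = a[:, :, None, None] * b

print(outer.shape)                                     # (2, 3, 2, 2)
print(np.array_equal(outer, by_broadcast))
print(np.array_equal(outer, np.multiply.outer(a, b)))
```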

---

## Dot products of matrix A rows and matrix B columns


<img src="images/dot_proucts_from_matrix_A_rows_and_B_rows.png" align="left" />

In [4]:
import numpy as np
A = np.matrix('0 1 2; 3 4 5')
B = np.matrix('0 -3; -1 -4; -2 -5')

fmt="np.einsum('ij,ji->i', A, B)"
print(f"{fmt} = {np.einsum('ij,ji->i', A, B)}")

fmt="np.diagonal(np.matmul(A,B))"
print(f"{fmt} = {np.diagonal(np.matmul(A,B))}")

fmt="(A*B).diagonal()"
print(f"{fmt} = {(A*B).diagonal()}")

np.einsum('ij,ji->i', A, B) = [ -5 -50]
np.diagonal(np.matmul(A,B)) = [ -5 -50]
(A*B).diagonal() = [[ -5 -50]]
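
The same computation can be sketched with plain ```ndarray``` objects instead of ```np.matrix```. The names ```A_arr``` and ```B_arr``` are introduced here only to avoid clashing with ```A``` and ```B``` above. Note that for an ```ndarray```, ```*``` is element-wise, so the matrix product is written with ```@```.

```python
import numpy as np

# Plain ndarrays with the same values as A and B above.
A_arr = np.array([[0, 1, 2], [3, 4, 5]])
B_arr = np.array([[0, -3], [-1, -4], [-2, -5]])

# Row i of A_arr dotted with column i of B_arr, without forming the full product.
d = np.einsum("ij,ji->i", A_arr, B_arr)

print(d)                                                 # [ -5 -50]
print(np.array_equal(d, np.diagonal(A_arr @ B_arr)))
print(np.array_equal(d, (A_arr * B_arr.T).sum(axis=1)))
```

Unlike ```np.diagonal(A_arr @ B_arr)```, the ```einsum``` call asks only for the diagonal entries, so the off-diagonal entries of the product are never requested.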
