Mental model to keep in mind

•  Each letter = one axis.
•  A letter that appears:
◦  In the inputs but not in the output → that axis is summed over.
◦  In the inputs and in the output → that axis is kept (not summed).
◦  Exactly once in an implicit-mode expression (no ->) → that axis is kept and carried through to the output.
•  Implicit mode: 'ij,jk' → letters that repeat are summed; the output indices are the letters that appear exactly once, in alphabetical order.
•  Explicit mode: 'ij,jk->ik' → you control output order explicitly.
•  ... (ellipsis) = “all the leftover axes here”.
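
The rules above can be checked on small arrays. This is a minimal sketch rather than part of the exercises; the arrays A, B and T are made up for illustration.

```python
import numpy as np

# Illustrative arrays (not part of the exercises).
A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
T = np.arange(24).reshape(2, 3, 4)

# 'j' is in the inputs but not in the output, so it is summed: matrix product.
C = np.einsum('ij,jk->ik', A, B)
print(np.array_equal(C, A @ B))

# Implicit mode: 'j' repeats, so it is summed; 'i' and 'k' appear once and
# are kept in alphabetical order, which matches 'ij,jk->ik'.
print(np.array_equal(np.einsum('ij,jk', A, B), C))

# Implicit mode sorts the output labels: the input axes are labelled (b, a),
# the output is ordered (a, b), so the array comes back transposed.
print(np.array_equal(np.einsum('ba', A), A.T))

# Ellipsis covers every axis that is not named: sum over the last axis only.
print(np.array_equal(np.einsum('...k->...', T), T.sum(axis=-1)))
```

Each line should print True if the mental model holds.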

### Exercises

In [2]:
import numpy as np

rng = np.random.default_rng(0)  # seeded generator; Level 1 refers to rng.normal

0.1. Create a 1D array a of length 5.  
Task:  
•  Write an einsum call that is exactly a no-op view: same data, no sum, no axis reordering.  
◦  Check: the result has the same shape as a. Does np.einsum(?, a) is a give True or False?

0.2. Same a.  
Task:  
•  Sum all elements of a using einsum.  
◦  Compare to np.sum(a).

0.3. Create a 2D array A of shape (3, 4).  
Tasks:  
•  Sum all elements with einsum in two different ways (one implicit, one explicit).  
◦  Compare to np.sum(A).

0.4. For A (3, 4):  
Tasks:  
•  Sum over rows (axis 0) with einsum. Compare to np.sum(A, axis=0).  
•  Sum over columns (axis 1) with einsum. Compare to np.sum(A, axis=1).

In [3]:
a = np.arange(5)
print(a)

[0 1 2 3 4]


In [4]:
# 0.1. No-op: the only label is kept, so nothing is summed or reordered
ans = np.einsum('i', a)
print(ans)

[0 1 2 3 4]
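
Exercise 0.1 also asks whether the result is the same object as a. A minimal sketch of that check follows, using np.shares_memory to see whether the two arrays share a buffer. It assumes the default optimize=False code path, where a single-operand call with no summation is expected to return a view rather than a copy.

```python
import numpy as np

a = np.arange(5)
ans = np.einsum('i', a)

# `is` compares object identity: einsum returns a new array object.
print(ans is a)

# A view is a different object that still points at the same data buffer.
print(np.shares_memory(ans, a))
```

So `is` is not the right test for "no-op view"; comparing shapes and checking shared memory says more.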


In [5]:
sum_all = np.einsum('i->', a)
print(sum_all)

10


In [6]:
b = np.arange(12).reshape(3,4)
print(b)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [7]:
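# b has 3 rows and 4 columns: index 'i' labels rows (axis 0), index 'j' labels columns (axis 1)
# 'ij->i' drops j, so it sums along axis 1 and returns one total per row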
einsum_sum_rows = np.einsum('ij->i', b)
ref_sum_rows = b.sum(axis=1)
print(einsum_sum_rows)
print(np.allclose(einsum_sum_rows, ref_sum_rows), einsum_sum_rows.shape, ref_sum_rows.shape)


# 'ij->j' drops i, so it sums along axis 0 and returns one total per column
einsum_sum_cols = np.einsum('ij->j', b)
ref_sum_cols = b.sum(axis=0)
print(einsum_sum_cols)
print(np.allclose(einsum_sum_cols, ref_sum_cols), einsum_sum_cols.shape, ref_sum_cols.shape)


einsum_total_sum = np.einsum('ij->', b)
ref_total_sum = b.sum()
print(einsum_total_sum)
print(np.allclose(einsum_total_sum, ref_total_sum), einsum_total_sum.shape, ref_total_sum.shape)

[ 6 22 38]
True (3,) (3,)
[12 15 18 21]
True (4,) (4,)
66
True () ()


Level 1 – Vectors, inner/outer, and explicit vs implicit

Use vectors x, y of length 4: x = rng.normal(size=4); y = rng.normal(size=4).

Exercises

1.1. Inner product of two vectors  
•  Use einsum to compute x·y.  
•  Compare to np.dot(x, y) and np.inner(x, y).

1.2. Outer product of two vectors  
•  Use einsum to compute the outer product of x and y.  
•  Compare to np.outer(x, y).

1.3. Squared L2 norm  
•  Use einsum to compute ||x||² = ∑ᵢ xᵢ² in two ways:
◦  Using one operand.
◦  Using x twice as operands.  
•  Compare to np.sum(x**2).

1.4. Implicit vs explicit  
Take A of shape (3, 4).

•  Write an implicit expression 'ij' (no ->) and inspect np.einsum('ij', A).shape.
•  Write an implicit expression 'ji' and inspect the shape.
•  Write an explicit expression 'ij->ji' and compare to A.T.  
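Explain to yourself why 'ij' and 'ij->ij' give the same result here even though they follow different rules: implicit mode sums repeated labels and sorts the remaining ones alphabetically, while explicit mode outputs exactly the labels you list.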


In [8]:
x = np.arange(4)
y = np.arange(4)
print(x)
print(y)

[0 1 2 3]
[0 1 2 3]


In [9]:
ref_dot = np.dot(x, y)
print(ref_dot)

14


In [10]:
# Explicit mode ('->' present): you choose the exact output labels and their order.
# Implicit mode (no '->'): repeated labels are summed and the remaining labels are output in alphabetical order (this can reorder axes).
einsum_dot = np.einsum('i,i->', x, y)
print(einsum_dot)
# Keeping 'i' in the output gives the elementwise product instead of the dot product
einsum_elementwise = np.einsum('i,i->i', x, y)
print(einsum_elementwise)
# Implicit mode: 'i' is repeated, so it is summed and this is the dot product again
einsum_dot_implicit = np.einsum('i,i', x, y)
print(einsum_dot_implicit)

14
[0 1 4 9]
14


In [11]:
# 1.2. Outer product of two vectors  
ref_outer = np.outer(x, y)
print(ref_outer)
# Keep both i and j in the output; 'i,j->i' would sum over j instead
einsum_outer = np.einsum('i,j->ij', x, y)
print(einsum_outer)

[[0 0 0 0]
 [0 1 2 3]
 [0 2 4 6]
 [0 3 6 9]]
[[0 0 0 0]
 [0 1 2 3]
 [0 2 4 6]
 [0 3 6 9]]


In [12]:
# 1.3. Squared L2 norm (computed here on a 2D array z, i.e. the squared Frobenius norm)
z = np.arange(12).reshape(3,4)
l2_norm_ref = np.sum(z**2)
print(l2_norm_ref)
einsum_l2_sum = np.einsum('ij,ij->',z,z)
print(einsum_l2_sum)

506
506
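
Exercise 1.3 asks for the vector case in two ways, which the cell above does not cover. One possible reading is sketched below: "one operand" is taken to mean squaring first and then summing a single operand, which is an assumption about what the exercise intends.

```python
import numpy as np

x = np.arange(4)

# Two operands: pass x twice and sum over the shared index i.
norm_sq_two_operands = np.einsum('i,i->', x, x)

# One operand: square elementwise first, then sum the single operand.
norm_sq_one_operand = np.einsum('i->', x**2)

print(norm_sq_two_operands, norm_sq_one_operand, np.sum(x**2))
```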


In [13]:
# shapes
# same shape
print(np.einsum('ij',z).shape)
print(np.einsum('ij->ij',z).shape)
# transposed shape
print(np.einsum('ji',z).shape)
print(np.einsum('ij->ji',z).shape)


(3, 4)
(3, 4)
(4, 3)
(4, 3)


In [14]:
m = np.arange(16).reshape(4,4)
print(m)
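# Hint: a repeated index on one operand walks the diagonal
print(np.einsum('ii', m))     # implicit: i repeats, so the diagonal is summed (trace)
print(np.einsum('jj', m))     # the letter itself does not matter
print(np.einsum('ii->i', m))  # explicit: i is kept, so the diagonal is returned
print(np.einsum('jj->j', m))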

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
30
30
[ 0  5 10 15]
[ 0  5 10 15]
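
The two patterns above correspond to np.trace and np.diagonal. A short check against those reference functions, using the same m as the cell above:

```python
import numpy as np

m = np.arange(16).reshape(4, 4)

# 'ii' sums the diagonal, which is the trace.
trace_matches = np.einsum('ii', m) == np.trace(m)

# 'ii->i' keeps the diagonal as a 1D array.
diag_matches = np.array_equal(np.einsum('ii->i', m), np.diagonal(m))

print(trace_matches, diag_matches)
```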


Level 2 – Matrix–vector, matrix–matrix, and batched matmul

In [15]:
A = np.arange(12).reshape(3,4)
B = np.arange(20).reshape(4,5)
v = np.arange(4)
print(A)
print(B)
print(v)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
[0 1 2 3]


In [16]:
# 2.1. Matrix–vector product  
ref_mat_vec_prod = np.matmul(A,v) # 3,4 @ 4 = 3
print(ref_mat_vec_prod)
# Sum over j (length 4). 'ij,i->i' would fail: i has length 3 in A but 4 in v.
einsum_mat_vec_prod = np.einsum('ij,j->i',A,v)
print(einsum_mat_vec_prod)


[14 38 62]
[14 38 62]
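
A small sketch of why the labels have to line up with the shapes: reusing a letter for two axes of different length makes einsum reject the call. The exact wording of the error message may differ between NumPy versions, so it is only printed here.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
v = np.arange(4)

# Correct: j labels the length-4 axis in both operands.
good = np.einsum('ij,j->i', A, v)
print(good)

# Incorrect: i would have to be 3 (from A) and 4 (from v) at the same time.
raised = False
try:
    np.einsum('ij,i->i', A, v)
except ValueError as err:
    raised = True
    print("shape mismatch:", err)
```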


In [26]:
# matrix-matrix product
ref_mat_mat_prod = np.matmul(A,B) # 3,4 @ 4,5 = 3,5
print(ref_mat_mat_prod)
einsum_mat_mat_prod = np.einsum('ij,jk->ik',A,B)
print(einsum_mat_mat_prod)
print(np.allclose(ref_mat_mat_prod, einsum_mat_mat_prod), einsum_mat_mat_prod.shape, ref_mat_mat_prod.shape)

[[ 70  76  82  88  94]
 [190 212 234 256 278]
 [310 348 386 424 462]]
[[ 70  76  82  88  94]
 [190 212 234 256 278]
 [310 348 386 424 462]]
True (3, 5) (3, 5)


In [31]:
# Batched matrix–matrix product
X_BLD = np.arange(120).reshape(10,3,4) 
y_BDF = np.arange(200).reshape(10,4,5)

In [32]:
ref_batched_mat_mat_prod_BLF = np.matmul(X_BLD,y_BDF) # (B,L,D) @ (B,D,F) = (B,L,F): 10,3,4 @ 10,4,5 = 10,3,5
print(ref_batched_mat_mat_prod_BLF.shape)
einsum_batched_mat_mat_prod_BLF = np.einsum('ijk,ikm->ijm',X_BLD,y_BDF) # i = batch, k = summed axis
print(einsum_batched_mat_mat_prod_BLF.shape)

(10, 3, 5)
(10, 3, 5)
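
The cell above only compares shapes. The sketch below also compares values, and shows the same product written with an ellipsis for the batch axes. The label letters b, l, d, f are chosen here to mirror the shape suffixes; any distinct letters would do.

```python
import numpy as np

X_BLD = np.arange(120).reshape(10, 3, 4)
Y_BDF = np.arange(200).reshape(10, 4, 5)

ref = np.matmul(X_BLD, Y_BDF)

# Named batch axis: b is kept, d is summed.
named = np.einsum('bld,bdf->blf', X_BLD, Y_BDF)

# Ellipsis: "all leading axes", so the same string also works for more batch dims.
with_ellipsis = np.einsum('...ld,...df->...lf', X_BLD, Y_BDF)

print(np.array_equal(ref, named), np.array_equal(ref, with_ellipsis))
```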


In [None]:
#  Batched matrix–vector product
ref_batched_mat_vec_prod_BLD =