## Exercises

1. Prove that the transpose of the transpose of a matrix is the matrix itself: $(\mathbf{A}^\top)^\top = \mathbf{A}$.
1. Given two matrices $\mathbf{A}$ and $\mathbf{B}$, show that sum and transposition commute: $\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top$.
1. Given any square matrix $\mathbf{A}$, is $\mathbf{A} + \mathbf{A}^\top$ always symmetric? Can you prove the result by using only the results of the previous two exercises?
1. We defined the tensor `X` of shape (2, 3, 4) in this section. What is the output of `len(X)`? Write your answer without implementing any code, then check your answer using code. 
1. For a tensor `X` of arbitrary shape, does `len(X)` always correspond to the length of a certain axis of `X`? What is that axis?
1. Run `A / A.sum(axis=1)` and see what happens. Can you analyze the results?
1. When traveling between two points in downtown Manhattan, what is the distance that you need to cover in terms of the coordinates, i.e., in terms of avenues and streets? Can you travel diagonally?
1. Consider a tensor of shape (2, 3, 4). What are the shapes of the summation outputs along axes 0, 1, and 2?
1. Feed a tensor with three or more axes to the `linalg.norm` function and observe its output. What does this function compute for tensors of arbitrary shape?
1. Consider three large matrices, say $\mathbf{A} \in \mathbb{R}^{2^{10} \times 2^{16}}$, $\mathbf{B} \in \mathbb{R}^{2^{16} \times 2^{5}}$ and $\mathbf{C} \in \mathbb{R}^{2^{5} \times 2^{14}}$, initialized with Gaussian random variables. You want to compute the product $\mathbf{A} \mathbf{B} \mathbf{C}$. Is there any difference in memory footprint and speed, depending on whether you compute $(\mathbf{A} \mathbf{B}) \mathbf{C}$ or $\mathbf{A} (\mathbf{B} \mathbf{C})$? Why?
1. Consider three large matrices, say $\mathbf{A} \in \mathbb{R}^{2^{10} \times 2^{16}}$, $\mathbf{B} \in \mathbb{R}^{2^{16} \times 2^{5}}$ and $\mathbf{C} \in \mathbb{R}^{2^{5} \times 2^{16}}$. Is there any difference in speed depending on whether you compute $\mathbf{A} \mathbf{B}$ or $\mathbf{A} \mathbf{C}^\top$? Why? What changes if you initialize $\mathbf{C} = \mathbf{B}^\top$ without cloning memory? Why?
1. Consider three matrices, say $\mathbf{A}, \mathbf{B}, \mathbf{C} \in \mathbb{R}^{100 \times 200}$. Construct a tensor with three axes by stacking $[\mathbf{A}, \mathbf{B}, \mathbf{C}]$. What is the dimensionality? Slice out the second coordinate of the third axis to recover $\mathbf{B}$. Check that your answer is correct.


In [1]:
import torch

In [10]:
# QUESTION 1
torch.manual_seed(1)
m = torch.randint(1, 10, (1,)).item()  
n = torch.randint(1, 10, (1,)).item()  
A = torch.rand(m, n)
print(A)
print((A.T).T)

torch.equal(A, (A.T).T)

tensor([[0.4031, 0.7347, 0.0293, 0.7999, 0.3971, 0.7544],
        [0.5695, 0.4388, 0.6387, 0.5247, 0.6826, 0.3051],
        [0.4635, 0.4550, 0.5725, 0.4980, 0.9371, 0.6556],
        [0.3138, 0.1980, 0.4162, 0.2843, 0.3398, 0.5239],
        [0.7981, 0.7718, 0.0112, 0.8100, 0.6397, 0.9743]])
tensor([[0.4031, 0.7347, 0.0293, 0.7999, 0.3971, 0.7544],
        [0.5695, 0.4388, 0.6387, 0.5247, 0.6826, 0.3051],
        [0.4635, 0.4550, 0.5725, 0.4980, 0.9371, 0.6556],
        [0.3138, 0.1980, 0.4162, 0.2843, 0.3398, 0.5239],
        [0.7981, 0.7718, 0.0112, 0.8100, 0.6397, 0.9743]])


True
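The code above only checks one random matrix. The general proof works entrywise: write $a_{ij}$ for the entries of $\mathbf{A} \in \mathbb{R}^{m \times n}$ and use $(\mathbf{A}^\top)_{ij} = a_{ji}$.

$$((\mathbf{A}^\top)^\top)_{ij} = (\mathbf{A}^\top)_{ji} = a_{ij} \quad \text{for all } i, j.$$

Since $(\mathbf{A}^\top)^\top$ is again $m \times n$ and agrees with $\mathbf{A}$ in every entry, $(\mathbf{A}^\top)^\top = \mathbf{A}$.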

In [13]:
# QUESTION 2
torch.manual_seed(2)
m = torch.randint(1, 10, (1,)).item()  
n = torch.randint(1, 10, (1,)).item()  
A = torch.rand(m, n)
B = torch.rand(m, n)
print(A.T + B.T)
print((A + B).T)

torch.equal(A.T + B.T, (A + B).T)

tensor([[0.6944],
        [1.0402],
        [1.2468],
        [1.0091],
        [1.3514],
        [0.6291],
        [1.3215]])
tensor([[0.6944],
        [1.0402],
        [1.2468],
        [1.0091],
        [1.3514],
        [0.6291],
        [1.3215]])


True
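Likewise, for $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{m \times n}$ with entries $a_{ij}$ and $b_{ij}$, an entrywise argument proves the identity that the code checks numerically:

$$(\mathbf{A}^\top + \mathbf{B}^\top)_{ij} = a_{ji} + b_{ji} = (\mathbf{A} + \mathbf{B})_{ji} = \left((\mathbf{A} + \mathbf{B})^\top\right)_{ij}.$$

Both sides are $n \times m$, so $\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top$.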

In [16]:
# QUESTION 3
# Yes. A matrix is symmetric if it equals its own transpose, so we need
# A + A.T == (A + A.T).T. Numerical check on a random square matrix
# (a sanity check, not a proof):

torch.manual_seed(3)
n = torch.randint(1, 10, (1,)).item()
A = torch.rand(n, n)
print(torch.equal(A + A.T, (A + A.T).T))

# Proof using only the previous two exercises:
#   Question 2 with B = A.T:  A.T + (A.T).T = (A + A.T).T
#   Question 1, (A.T).T = A:  A.T + A       = (A + A.T).T
#   Addition commutes:        A + A.T       = (A + A.T).T

True


In [21]:
# QUESTION 4
# Prediction: len(X) == 2, the size of axis 0 (the first axis)
X = torch.rand(2, 3, 4)
X.shape, len(X)

(torch.Size([2, 3, 4]), 2)

In [None]:
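# QUESTION 5
# Yes, for any tensor with at least one axis: len(X) == X.shape[0], the length of axis 0.
# For a 0-d (scalar) tensor, len(X) is undefined and raises a TypeError.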
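A short check of this answer, assuming a handful of arbitrary shapes chosen for illustration: `len` matches `shape[0]` for every tensor with at least one axis, and raises for a scalar tensor.

```python
import torch

# len() on a tensor returns the size of its first axis (axis 0), whatever the rank.
shapes = [(5,), (2, 3), (2, 3, 4), (7, 1, 2, 9)]
lengths = []
for shape in shapes:
    X = torch.zeros(shape)
    lengths.append(len(X))
    print(shape, "-> len:", len(X), "shape[0]:", X.shape[0])

# A 0-d (scalar) tensor has no axes, so len() is undefined and raises TypeError.
scalar = torch.tensor(3.0)
try:
    len(scalar)
    scalar_len_error = False
except TypeError as err:
    scalar_len_error = True
    print("TypeError:", err)
```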

In [26]:
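# QUESTION 6
# A is still the square matrix from Question 3 (5 x 5 here). A.sum(axis=1) has shape (5,),
# and broadcasting aligns it with the LAST axis of A, i.e. it acts as a row vector of shape (1, 5).
# Element (i, j) is therefore divided by the sum of row j, not row i, so the rows of the
# result do NOT sum to 1 (e.g. the first output row sums to about 0.38).
# The call only runs because A happens to be square; for an (m, n) matrix with m != n
# (and m, n > 1) the shapes are incompatible and PyTorch raises a RuntimeError.
# To normalize each row, keep the summed axis: A / A.sum(dim=1, keepdim=True).
print(A)
print(A / A.sum(axis=1))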

tensor([[0.1056, 0.2858, 0.0270, 0.4716, 0.0601],
        [0.7719, 0.7437, 0.5944, 0.8879, 0.4510],
        [0.7995, 0.1498, 0.4015, 0.0542, 0.4594],
        [0.1756, 0.9492, 0.8473, 0.8749, 0.6483],
        [0.2148, 0.9493, 0.0121, 0.1809, 0.1877]])
tensor([[0.1111, 0.0829, 0.0145, 0.1349, 0.0389],
        [0.8124, 0.2156, 0.3188, 0.2540, 0.2920],
        [0.8415, 0.0434, 0.2153, 0.0155, 0.2974],
        [0.1848, 0.2752, 0.4545, 0.2503, 0.4197],
        [0.2260, 0.2753, 0.0065, 0.0517, 0.1215]])
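The broadcasting explanation can be checked directly. The sketch below uses a non-square 5 x 4 matrix (an arbitrary choice for illustration) to show the failure case and the `keepdim=True` fix, then confirms on a random square matrix that the vector of row sums is broadcast as a row.

```python
import torch

# Non-square case: shapes (5, 4) and (5,) are not broadcast-compatible.
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
try:
    A / A.sum(axis=1)
    non_square_failed = False
except RuntimeError as err:
    non_square_failed = True
    print("RuntimeError:", err)

# Row normalization: keepdim=True keeps the row sums as a (5, 1) column,
# so each element is divided by the sum of its own row.
row_normalized = A / A.sum(dim=1, keepdim=True)
print(row_normalized.sum(dim=1))  # every row now sums to 1 (up to rounding)

# Square case: the (n,) vector of row sums is broadcast as a (1, n) row,
# so column j is divided by the sum of row j.
torch.manual_seed(0)
A_sq = torch.rand(5, 5)
s = A_sq.sum(axis=1)
same_as_row_vector = torch.allclose(A_sq / s, A_sq / s.reshape(1, -1))
print(same_as_row_vector)
```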


In [29]:
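# QUESTION 7
# On the street grid you can only move along avenues and streets, so no diagonal travel.
# The distance is |difference in avenues| + |difference in streets| (in blocks),
# i.e. the L1 ("Manhattan") norm of the coordinate difference, not the Euclidean (L2) norm.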
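To make the comparison concrete, here is a hypothetical trip from (avenue 1, street 10) to (avenue 4, street 14); the coordinates are invented for illustration and measured in blocks.

```python
import torch

# Hypothetical coordinates (avenue, street), measured in blocks.
start = torch.tensor([1.0, 10.0])
end = torch.tensor([4.0, 14.0])
diff = end - start  # 3 avenues and 4 streets apart

manhattan = torch.abs(diff).sum()              # L1 norm: distance along the grid
manhattan_ord1 = torch.linalg.norm(diff, ord=1)  # same value via the vector 1-norm
euclidean = torch.linalg.norm(diff)            # L2 norm: straight-line distance
print(manhattan, manhattan_ord1, euclidean)   # tensor(7.) tensor(7.) tensor(5.)
```

The straight-line distance of 5 blocks is only reachable if you could cut diagonally through buildings; along the grid you walk 7 blocks.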

In [37]:
# QUESTION 8
# Summing along an axis removes that axis: axis 0 -> (3, 4), axis 1 -> (2, 4), axis 2 -> (2, 3)
X = torch.rand(2, 3, 4)
print(X)
print(X.shape)
print(X.sum(axis=0))
print(X.sum(axis=0).shape)
print(X.sum(axis=1))
print(X.sum(axis=1).shape)
print(X.sum(axis=2))
print(X.sum(axis=2).shape)

tensor([[[0.6303, 0.8147, 0.1841, 0.0459],
         [0.5515, 0.8334, 0.8044, 0.4928],
         [0.3666, 0.6594, 0.8477, 0.0693]],

        [[0.4527, 0.8780, 0.8852, 0.9630],
         [0.6931, 0.9937, 0.0165, 0.5946],
         [0.5719, 0.8448, 0.5996, 0.8202]]])
torch.Size([2, 3, 4])
tensor([[1.0829, 1.6927, 1.0693, 1.0090],
        [1.2446, 1.8271, 0.8208, 1.0875],
        [0.9385, 1.5042, 1.4472, 0.8895]])
torch.Size([3, 4])
tensor([[1.5484, 2.3074, 1.8361, 0.6081],
        [1.7176, 2.7166, 1.5013, 2.3779]])
torch.Size([2, 4])
tensor([[1.6750, 2.6821, 1.9430],
        [3.1790, 2.2979, 2.8365]])
torch.Size([2, 3])
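For comparison with Question 6, the same reductions with `keepdim=True` keep the summed axis with size 1 instead of dropping it; this is what makes the result broadcast back against the original tensor. A minimal sketch:

```python
import torch

X = torch.rand(2, 3, 4)
shapes = {}
for axis in range(X.dim()):
    # Without keepdim the summed axis disappears; with keepdim it stays with size 1.
    dropped = tuple(X.sum(dim=axis).shape)
    kept = tuple(X.sum(dim=axis, keepdim=True).shape)
    shapes[axis] = (dropped, kept)
    print(axis, dropped, kept)
```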


In [44]:
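# QUESTION 9
X = torch.rand(2, 3, 4, 5)
norm = torch.linalg.norm(X)
print(norm)
# With the defaults ord=None and dim=None, torch.linalg.norm flattens X and returns
# the 2-norm of the resulting vector: the square root of the sum of the squares of
# all entries (the Frobenius norm generalized to any number of axes).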

tensor(6.5932)
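A quick way to confirm this reading of the default behavior is to compare it with the 2-norm of the flattened tensor and with the sum-of-squares formula written out by hand (the seed is arbitrary):

```python
import torch

torch.manual_seed(0)
X = torch.rand(2, 3, 4, 5)

default_norm = torch.linalg.norm(X)               # ord=None, dim=None
flattened_norm = torch.linalg.norm(X.flatten())   # 2-norm of the flattened vector
manual_norm = torch.sqrt((X ** 2).sum())          # square root of the sum of squares
print(default_norm, flattened_norm, manual_norm)
```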


In [46]:
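# QUESTION 10
# Multiplying a (p x q) matrix by a (q x r) matrix costs p*q*r multiply-adds and gives a p x r result.
#
# (AB)C:
#   AB:     (2^10 x 2^16)(2^16 x 2^5)  -> 2^31 multiply-adds, intermediate 2^10 x 2^5 = 2^15 entries
#   (AB)C:  (2^10 x 2^5)(2^5 x 2^14)   -> 2^29 multiply-adds
#
# A(BC):
#   BC:     (2^16 x 2^5)(2^5 x 2^14)   -> 2^35 multiply-adds, intermediate 2^16 x 2^14 = 2^30 entries (4 GiB in float32)
#   A(BC):  (2^10 x 2^16)(2^16 x 2^14) -> 2^40 multiply-adds
#
# Both orders give the same 2^10 x 2^14 result, but (AB)C needs roughly 400 times fewer
# operations and only a tiny intermediate, so it is much faster and far less memory-intensive than A(BC).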
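The operation counts above can be reproduced with a few lines of arithmetic. The helper `matmul_cost` is a hypothetical name introduced here; it assumes the textbook cost of p·q·r multiply-adds and ignores any cleverness in the underlying library. A small-scale product confirms that both orders give the same result.

```python
import torch

def matmul_cost(p, q, r):
    """Multiply-adds and number of output entries for a (p x q) @ (q x r) product."""
    return p * q * r, p * r

a_rows, inner, b_cols, c_cols = 2**10, 2**16, 2**5, 2**14

ab_ops, ab_size = matmul_cost(a_rows, inner, b_cols)    # A @ B
abc_left_ops, _ = matmul_cost(a_rows, b_cols, c_cols)   # (AB) @ C
bc_ops, bc_size = matmul_cost(inner, b_cols, c_cols)    # B @ C
abc_right_ops, _ = matmul_cost(a_rows, inner, c_cols)   # A @ (BC)

left_total = ab_ops + abc_left_ops
right_total = bc_ops + abc_right_ops
print(f"(AB)C: {left_total:.3e} multiply-adds, intermediate {ab_size * 4 / 2**10:.0f} KiB (float32)")
print(f"A(BC): {right_total:.3e} multiply-adds, intermediate {bc_size * 4 / 2**30:.0f} GiB (float32)")
print(f"ratio: {right_total / left_total:.1f}")

# Small-scale sanity check: both orders give the same product (up to rounding).
torch.manual_seed(0)
A = torch.randn(8, 64, dtype=torch.float64)
B = torch.randn(64, 4, dtype=torch.float64)
C = torch.randn(4, 32, dtype=torch.float64)
print(torch.allclose((A @ B) @ C, A @ (B @ C)))
```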

In [47]:
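# QUESTION 11
# AB and AC^T have the same shapes, (2^10 x 2^16)(2^16 x 2^5), and the same number of
# multiply-adds (2^31), so any speed difference comes from memory layout, not arithmetic.
# B is stored row-major (contiguous). C.T is a view of C with swapped strides, i.e. a
# column-major layout; no data is copied by the transpose itself. Whether A @ C.T is slower
# depends on the backend: BLAS-style matmul routines can often consume a transposed operand
# directly; otherwise the strided access is less cache-friendly or may force a copy.
# If C = B.T without cloning, C is a view sharing B's storage, so C.T has exactly B's
# layout and A @ C.T performs the same computation as A @ B (same speed, no extra memory).
# Side effect: in-place changes to B also show up in C.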
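The layout claims can be inspected through strides and data pointers. This sketch uses scaled-down sizes (an assumption, to keep memory small) and a rough wall-clock timing helper, `time_matmul`, which is a hypothetical name; timings vary by machine and backend, so no particular numbers are implied.

```python
import time
import torch

# Scaled-down sizes: A is 256 x 4096, B is 4096 x 32, C is 32 x 4096.
torch.manual_seed(0)
A = torch.randn(256, 4096)
B = torch.randn(4096, 32)
C = torch.randn(32, 4096)  # independent matrix, stored contiguously (row-major)

# C.T is a view: same storage, swapped strides, no copy.
print(C.T.data_ptr() == C.data_ptr(), C.stride(), C.T.stride(), C.T.is_contiguous())

# C_view = B.T without cloning: it shares B's storage, and C_view.T has B's exact layout.
C_view = B.T
print(C_view.data_ptr() == B.data_ptr(), C_view.T.is_contiguous())
print(torch.allclose(A @ B, A @ C_view.T))

def time_matmul(x, y, repeats=20):
    """Average wall-clock seconds for x @ y (rough timing, no warm-up control)."""
    start = time.perf_counter()
    for _ in range(repeats):
        x @ y
    return (time.perf_counter() - start) / repeats

print("A @ B  :", time_matmul(A, B))
print("A @ C.T:", time_matmul(A, C.T))
```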

In [53]:
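# QUESTION 12
# torch.stack adds a new axis 0, so the result has shape (3, 100, 200) and B sits at index 1 of axis 0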
A = torch.randn((100,200))
B = torch.randn((100,200))
C = torch.randn((100,200))
ABC = torch.stack([A, B, C])
print(ABC.shape)
print(ABC[1])
print(torch.equal(B, ABC[1]))

torch.Size([3, 100, 200])
tensor([[-0.1947, -1.4064,  0.7817,  ..., -0.8303, -0.2056, -0.0417],
        [ 0.0946,  0.1747, -0.2156,  ...,  1.4695, -0.2440,  1.3265],
        [-1.1548,  1.0822,  0.3364,  ..., -0.4410, -2.6759, -1.0082],
        ...,
        [ 0.5379, -0.0781,  0.3327,  ..., -0.2385,  0.2094, -0.5183],
        [ 1.2246, -0.6464, -0.0763,  ..., -0.4052,  0.5520, -0.8774],
        [ 1.5563,  1.0051, -0.1847,  ...,  0.8655,  0.4168, -0.2630]])
True
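The exercise phrases the slice as "the second coordinate of the third axis". Under that reading, one option is to stack along a new last axis instead of axis 0; a minimal sketch with freshly generated matrices:

```python
import torch

torch.manual_seed(0)
A, B, C = (torch.randn(100, 200) for _ in range(3))

# Stacking along a new LAST axis gives shape (100, 200, 3); B is then the
# second coordinate (index 1) of the third axis.
X = torch.stack([A, B, C], dim=2)
print(X.shape)                     # torch.Size([100, 200, 3])
print(torch.equal(X[:, :, 1], B))  # True
```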
