In [7]:
import torch

### 1. Prove that the transpose of the transpose of a matrix is the matrix itself: $(A^T)^T = A$.

In [11]:
A = torch.tensor([[1.0, 2.0, 4.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]])
print(A)


tensor([[1., 2., 4.],
        [3., 4., 5.],
        [5., 6., 7.]])


In [15]:
B = A.T
print(B)

tensor([[1., 3., 5.],
        [2., 4., 6.],
        [4., 5., 7.]])


In [17]:
B.T == A

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

### 2. Given two matrices A and B, show that sum and transposition commute: $(A + B)^T = A^T + B^T$.

In [22]:
C = A + B

In [24]:
C.T == A.T + B.T

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
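
The check above uses `B = A.T`, which is a special case. As a minimal sketch, the same identity can be checked on two unrelated random matrices; `P` and `Q` are illustrative names that do not appear elsewhere in this notebook, and the shape is arbitrary.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# P and Q are illustrative; any two matrices of the same shape work
P = torch.randn(3, 5)
Q = torch.randn(3, 5)

lhs = (P + Q).T   # transpose of the sum
rhs = P.T + Q.T   # sum of the transposes

commutes = torch.equal(lhs, rhs)
print(commutes)
```

A numerical check is evidence rather than a proof. The proof is entrywise: $((A + B)^T)_{ij} = (A + B)_{ji} = A_{ji} + B_{ji} = (A^T + B^T)_{ij}$.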

### 3. Given any square matrix A, is $A + A^T$ always symmetric? Can you prove the result by using only the results of the previous two exercises?

In [25]:
A + A.T

tensor([[ 2.,  5.,  9.],
        [ 5.,  8., 11.],
        [ 9., 11., 14.]])
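
The printed matrix is symmetric, and the general result follows from the previous two exercises: $(A + A^T)^T = A^T + (A^T)^T = A^T + A = A + A^T$. The sketch below checks this numerically on a random square matrix; `M` and `S` are illustrative names.

```python
import torch

torch.manual_seed(0)
M = torch.randn(4, 4)   # illustrative square matrix
S = M + M.T

# A matrix is symmetric when it equals its own transpose
is_symmetric = torch.equal(S, S.T)
print(is_symmetric)
```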

### 4. We defined the tensor X of shape (2, 3, 4) in this section. What is the output of len(X)? Write your answer without implementing any code, then check your answer using code.

In [50]:
X = torch.arange(24).reshape(2, 3, 4)
print(X)

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])


In [51]:
len(X)

2

### 5. For a tensor X of arbitrary shape, does len(X) always correspond to the length of a certain axis of X? What is that axis?

Yes. `len(X)` always returns the length of axis 0 (the first axis).

In [52]:
len(X)

2

* If X has shape (3, 4), len(X) is 3.
* If X has shape (2, 5, 6), len(X) is 2.
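
The two cases above can be checked directly. This is a small sketch with illustrative shapes that compares `len(T)` against `T.shape[0]`.

```python
import torch

shapes = [(3, 4), (2, 5, 6), (7,)]   # illustrative shapes
lengths = []
for shape in shapes:
    T = torch.zeros(shape)
    # len(T) matches the size of axis 0 in every case
    lengths.append(len(T))
    print(shape, len(T), T.shape[0])
```

A 0-dimensional (scalar) tensor has no axis 0, and calling `len` on it raises a `TypeError`.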

### 6. Run A / A.sum(axis=1) and see what happens. Can you analyze the results?

In [53]:
A / A.sum(axis=1)


tensor([[0.1429, 0.1667, 0.2222],
        [0.4286, 0.3333, 0.2778],
        [0.7143, 0.5000, 0.3889]])

In [55]:
A / A.sum(axis=1, keepdims=True)

tensor([[0.1429, 0.2857, 0.5714],
        [0.2500, 0.3333, 0.4167],
        [0.2778, 0.3333, 0.3889]])
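
The two outputs differ because of broadcasting. `A.sum(axis=1)` has shape `(3,)`, so it is aligned with the last axis of `A`, and entry `[i, j]` is divided by the sum of row `j`. This only runs because `A` is square; for a non-square matrix the shapes would not broadcast. With `keepdims=True` the sums have shape `(3, 1)`, so each row is divided by its own sum. A minimal sketch of this reading, using the same `A` as above (the other variable names are illustrative):

```python
import torch

A = torch.tensor([[1.0, 2.0, 4.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]])

row_sums = A.sum(axis=1)                       # shape (3,)
row_sums_kept = A.sum(axis=1, keepdims=True)   # shape (3, 1)
print(row_sums.shape, row_sums_kept.shape)

# Without keepdims, the (3,) vector broadcasts along the last axis:
# entry [i, j] is divided by the sum of row j, not row i.
col_scaled = A / row_sums

# With keepdims, entry [i, j] is divided by the sum of row i.
row_normalized = A / row_sums_kept
print(row_normalized.sum(axis=1))   # each row sums to 1, up to rounding
```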

In [56]:
A.mean(), A.sum() / A.numel()

(tensor(4.1111), tensor(4.1111))

### 7. When traveling between two points in downtown Manhattan, what is the distance that you need to cover in terms of the coordinates, i.e., in terms of avenues and streets? Can you travel diagonally?

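> Manhattan distance: $|x_2 - x_1| + |y_2 - y_1|$, where $(x_1, y_1)$ and $(x_2, y_2)$ are the avenue and street coordinates of the two points. You cannot travel diagonally, because movement is restricted to the street grid.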
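
As a small illustration, the distance is the sum of absolute coordinate differences, which is the $\ell_1$ norm of the difference vector. The helper name `manhattan_distance` and the example coordinates are hypothetical.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between points p and q."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical trip: from (avenue 3, street 10) to (avenue 7, street 4)
d = manhattan_distance((3, 10), (7, 4))
print(d)  # 10
```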

### 8. Consider a tensor of shape (2, 3, 4). What are the shapes of the summation outputs along axes 0, 1, and 2?

In [61]:
X

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

In [57]:
X.sum(axis=0)

tensor([[12, 14, 16, 18],
        [20, 22, 24, 26],
        [28, 30, 32, 34]])

In [59]:
X.sum(axis=1)

tensor([[12, 15, 18, 21],
        [48, 51, 54, 57]])

In [60]:
X.sum(axis=2)

tensor([[ 6, 22, 38],
        [54, 70, 86]])
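
Summing along an axis removes that axis from the shape, so the three outputs above have shapes (3, 4), (2, 4), and (2, 3). A short sketch that collects the shapes directly:

```python
import torch

X = torch.arange(24).reshape(2, 3, 4)

# Summing over axis i drops axis i from the shape
shapes = [tuple(X.sum(axis=i).shape) for i in range(3)]
print(shapes)  # [(3, 4), (2, 4), (2, 3)]
```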

### 9. Feed a tensor with three or more axes to the linalg.norm function and observe its output. What does this function compute for tensors of arbitrary shape?

In [66]:

# Shape: (2, 2, 2), a 3-dimensional tensor
tensor = torch.tensor([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])

result = torch.linalg.norm(tensor)
print(result)  # tensor(14.2829)

tensor(14.2829)
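
With the default arguments, `torch.linalg.norm` treats the input as if it were flattened into a vector and returns its 2-norm, that is, the square root of the sum of all squared entries. The sketch below checks this on an illustrative tensor `T` with three axes.

```python
import torch

T = torch.arange(1.0, 25.0).reshape(2, 3, 4)   # illustrative tensor, entries 1..24

norm = torch.linalg.norm(T)             # default: 2-norm over all entries
manual = torch.sqrt((T ** 2).sum())     # square root of the sum of squares
flat = torch.linalg.norm(T.flatten())   # same value on the flattened vector

print(norm, manual, flat)
```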


### 10. Consider three large matrices, say $A \in \mathbb{R}^{210 \times 216}$, $B \in \mathbb{R}^{216 \times 25}$ and $C \in \mathbb{R}^{25 \times 214}$, initialized with Gaussian random variables. You want to compute the product ABC. Is there any difference in memory footprint and speed, depending on whether you compute (AB)C or A(BC)? Why?

In [97]:
A = torch.randn(210, 216)
B = torch.randn(216, 25)
C = torch.randn(25, 214)

In [None]:
D = torch.mm(A, B)

In [104]:
torch.mm(D, C)

tensor([[  56.6528,   26.2997,   40.8603,  ...,   21.3501, -100.0776,
          -71.0474],
        [ -15.0712,  223.0193,    9.4851,  ...,  -88.7356,  -87.2293,
          -32.0949],
        [ -58.4807,   54.9840, -132.1190,  ...,  107.7717,   78.8377,
           61.3944],
        ...,
        [ -94.5637,  -88.8546,  -52.8458,  ...,   39.6504,   88.7390,
          -93.2231],
        [  68.4794,  143.8714,   58.9820,  ...,  -32.5493,   17.5290,
          -46.4282],
        [  57.9212,   40.8753,  -44.9338,  ...,    5.5768,   46.6423,
          -51.4375]])

In [107]:
E = torch.mm(B, C)

In [108]:
torch.mm(A, E)

tensor([[  56.6528,   26.2997,   40.8603,  ...,   21.3501, -100.0776,
          -71.0474],
        [ -15.0712,  223.0194,    9.4851,  ...,  -88.7356,  -87.2294,
          -32.0949],
        [ -58.4808,   54.9840, -132.1190,  ...,  107.7717,   78.8377,
           61.3944],
        ...,
        [ -94.5637,  -88.8546,  -52.8458,  ...,   39.6504,   88.7390,
          -93.2231],
        [  68.4794,  143.8714,   58.9820,  ...,  -32.5493,   17.5290,
          -46.4282],
        [  57.9212,   40.8753,  -44.9339,  ...,    5.5768,   46.6424,
          -51.4374]])
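
Both orders give the same product up to floating-point rounding (the last printed digits differ in a few entries), but their cost differs. Multiplying an m × k matrix by a k × n matrix takes roughly m·k·n multiply-adds and produces an m × n result. The sketch below applies that rough count to the shapes used above; `matmul_cost` is a hypothetical helper, and the count ignores hardware and library effects.

```python
def matmul_cost(m, k, n):
    """Approximate multiply-add count for an (m x k) @ (k x n) product."""
    return m * k * n


# Shapes used in the cells above: A is a x b, B is b x c, C is c x d
a, b, c, d = 210, 216, 25, 214

# (AB)C: AB is an a x c intermediate, then multiply by C
cost_ab_c = matmul_cost(a, b, c) + matmul_cost(a, c, d)
temp_ab = a * c

# A(BC): BC is a b x d intermediate, then multiply A by it
cost_a_bc = matmul_cost(b, c, d) + matmul_cost(a, b, d)
temp_bc = b * d

print(cost_ab_c, temp_ab)   # 2257500 5250
print(cost_a_bc, temp_bc)   # 10862640 46224
```

Under this count, (AB)C needs about 4.8 times fewer multiply-adds and a much smaller intermediate matrix, because the small inner dimension 25 is contracted late in A(BC) but early in (AB)C. Measured wall-clock time also depends on the backend and hardware.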

### 11. Consider three large matrices, say $A \in \mathbb{R}^{210 \times 216}$, $B \in \mathbb{R}^{216 \times 25}$ and $C \in \mathbb{R}^{25 \times 216}$. Is there any difference in speed depending on whether you compute AB or AC^T? Why? What changes if you initialize C = B^T without cloning memory? Why?
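
One factor that may matter here is memory layout. In PyTorch, `B.T` is a view that shares storage with `B` and only swaps the strides, so `C = B.T` copies nothing and `C.T` has the same layout as `B`. A separately stored `C` is laid out row by row, so `C.T` is read with different strides. The sketch below only inspects layout and checks that the products agree; it makes no timing claim, and `C_dense` is an illustrative name.

```python
import torch

torch.manual_seed(0)
A = torch.randn(210, 216)
B = torch.randn(216, 25)

# C = B^T without copying: a transposed view on the same storage
C = B.T
shares_storage = C.data_ptr() == B.data_ptr()
print(shares_storage)                          # True
print(B.is_contiguous(), C.is_contiguous())    # True False

# A separately stored copy of B^T in row-major layout
C_dense = B.T.contiguous()
print(C_dense.data_ptr() == B.data_ptr())      # False

# Both routes compute the same product, up to rounding
same = torch.allclose(A @ B, A @ C_dense.T, rtol=1e-4, atol=1e-4)
print(same)
```

Which variant is faster depends on the linear algebra backend and the hardware, so timing both on the target machine is the reliable way to find out.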

### 12. Consider three matrices, say $A, B, C \in \mathbb{R}^{100 \times 200}$. Construct a tensor with three axes by stacking [A, B, C]. What is the dimensionality? Slice out the second coordinate of the third axis to recover B. Check that your answer is correct.
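
A minimal sketch of one way to do this, assuming the matrices are stacked along a new third axis with `torch.stack(..., dim=2)`. The random initialization and the names `stacked` and `recovered` are illustrative.

```python
import torch

torch.manual_seed(0)
A, B, C = (torch.randn(100, 200) for _ in range(3))

# Stack along a new third axis of length 3
stacked = torch.stack([A, B, C], dim=2)
print(stacked.shape)   # torch.Size([100, 200, 3])

# Index 1 is the second coordinate of the third axis
recovered = stacked[:, :, 1]
print(torch.equal(recovered, B))   # True
```

Stacking with `dim=0` instead would give shape (3, 100, 200), and `B` would then be recovered with `stacked[1]`.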