# **HW 1. PyTorch practice**
### Course: Deep Learning (DSBA and ICEF), 2026, HSE
### Authors: Alexey Boldyrev, ML Teaching Team

Release Date: 23.01.2026

Deadline: \
(Soft) 23:59 MSK 01.02.2026 \
(Hard) 18:00 MSK 02.02.2026

### About the assignment

The assignment is in two parts. The first part (PyTorch basics) is not assessed, but if you are not familiar with PyTorch or are unsure, we strongly recommend that you start with it. The second part (Practice) has graded and ungraded exercises on working with tensors and building your first neural network.\
The goal of the assignment is to gain practical skills in working with PyTorch.

### Assessment and penalties
Each task has a certain “cost” (indicated in brackets next to the task). The maximum grade for the assignment is 20 points.

You may not turn in the assignment after the strict (Hard) deadline.

The assignment is completed independently. “Similar” solutions are considered plagiarism and all students involved (including those who have been copied from) cannot receive more than 0 points for it. If you have found a solution to any of the assignments (or part of it) in an open source, you must provide a link to that source (most likely you will not be the only one who found it, so to exclude suspicion of plagiarism, a link to the source is required).

---

**When using AI, chatbots, generative and large language models** (ChatGPT, DeepSeek, Qwen, Llama, Mistral, Falcon, Gemma, Microsoft Copilot, Gemini, Claude, Grok, Perplexity, YandexGPT, GigaChat and others), **you must specify the following for each of them in LLM Documentation in the next cell**:
- The full name and version of the model, as well as a link to the service used
   * The application or browser version with built-in assistant, TG bot, etc.
- **All** used prompts
- Your assessment of the AI's work and the specific problem it helped you solve
- The teaching team decides whether a solution was written by AI without following these rules.
   * Such cases are treated as plagiarism.

**CONSENT**.
<input type="checkbox" disabled checked /> I confirm that I will use AI agents in this home assignment only on the condition that they are documented.

---

### Format of submission
Assignments are submitted through the Smart LMS system here https://edu.hse.ru/mod/assign/view.php?id=1944225. You should send a notebook with the completed assignment. Name the notebook itself in the format **HW1-pytorch-NameLastname.ipynb**, where Name and Lastname are your first and last name.

For ease of checking yourself, calculate your maximum grade (based on the set of problems solved) and indicate it below.

Score: **xx**

## 0. PyTorch basics

Inspired by https://nrehiew.github.io/blog/pytorch/

Load necessary libraries:

In [2]:
import torch
from torch import nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
import random

PyTorch uses its own data representation (i.e. **tensor** objects) for several important reasons that are key to its purpose as an efficient, flexible, and scalable deep learning framework.

PyTorch's custom data representation enables:
- High performance on hardware accelerators.
- Support for automatic differentiation.
- Flexibility through dynamic computational graphs.
- Advanced multidimensional data manipulation.
- Multi-device scalability.

These features make PyTorch tensors ideal for deep learning, unlike traditional Python data structures such as lists or arrays.

In the context of machine learning, a tensor is an *n*-dimensional array. OK, but what is a `torch.tensor`? More specifically, what actually happens when the following piece of code is executed: `a = torch.tensor(1.0, requires_grad=True)`? It turns out that PyTorch allocates the data on the heap and returns a shared pointer to that data (see more [here](https://discuss.pytorch.org/t/where-does-torch-tensor-create-the-object-stack-or-heap/182753)). To better understand pointers in PyTorch, let's look at some examples.

The following cell creates a tensor with shape `(2, 6)`, two rows and six columns, containing values randomly distributed according to a normal distribution with mean zero and standard deviation one.

In [3]:
features = torch.randn((2, 6))

The next cell creates another tensor with the same shape as the features, again containing values from a normal distribution.

In [4]:
weights = torch.randn_like(features)

PyTorch tensors can be added, multiplied, subtracted, etc., just like NumPy arrays. In general, you'll use PyTorch tensors pretty much the same way you use NumPy arrays.
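As a small illustration of this NumPy-like behaviour, the sketch below applies a few element-wise operations to two random tensors and then wraps a NumPy array with `torch.from_numpy`. The variable names and the seed are chosen for illustration only.

```python
import numpy as np
import torch

torch.manual_seed(0)  # illustrative seed, only for reproducibility

features = torch.randn((2, 6))
weights = torch.randn_like(features)

# Element-wise arithmetic works as it does for NumPy arrays
total = features + weights
product = features * weights
weighted_sum = (features * weights).sum()  # 0-dimensional tensor

print(total.shape, product.shape, weighted_sum.shape)

# torch.from_numpy shares memory with the source array,
# so an in-place change to the tensor is visible in the array
array = np.ones((2, 3))
tensor = torch.from_numpy(array)
tensor += 1
print(array)
```

Note that `torch.from_numpy` does not copy the data, which is why the in-place update shows up in `array` as well.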

In [5]:
a = torch.arange(9).reshape(3, 3) # 3x3 tensor
a

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])

Original and derived tensors in PyTorch are objects linked to the same memory area:

In [6]:
b = a.t() # Transpose
b[0, 0] = 123
print(a, '\n\n', b)

tensor([[123,   1,   2],
        [  3,   4,   5],
        [  6,   7,   8]]) 

 tensor([[123,   3,   6],
        [  1,   4,   7],
        [  2,   5,   8]])


Note that `.t()` returns a view rather than a copy: `a` and `b` point to the same underlying data, so any change to that data is visible through both of them.
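One way to check that two tensors share memory is to compare their `data_ptr()` values and look at their strides. The following is a minimal sketch; the variable `c` is introduced here only to show an explicit copy.

```python
import torch

a = torch.arange(9).reshape(3, 3)
b = a.t()  # transposed view, no data is copied

# Both tensors start at the same memory address
print(a.data_ptr() == b.data_ptr())

# The view differs only in metadata: the strides are swapped
print(a.stride(), b.stride())

# The transposed view is no longer laid out row by row
print(a.is_contiguous(), b.is_contiguous())

# .clone() makes an independent copy, so writing to it leaves `a` untouched
c = a.t().clone()
c[0, 0] = 123
print(a[0, 0].item())
```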

Compare with:

In [7]:
a = torch.arange(9).reshape(3, 3) # torch.int64
b = a.to(torch.float16) # dtype conversion copies the data
b[0][0] = 42
print(a[0][0].item(), b[0][0].item())

0 42.0


PyTorch needs to represent a `float16` differently to how `int64` is represented in memory. So in this example PyTorch needs to make a copy of the data and represent it differently.
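The difference can also be seen without modifying any values. The sketch below (variable names are illustrative) compares memory addresses and element sizes after calling `.to()` with the same and with a different dtype.

```python
import torch

a = torch.arange(9).reshape(3, 3)  # torch.int64

same = a.to(torch.int64)         # dtype already matches: no copy is needed
converted = a.to(torch.float16)  # new dtype: data is copied and re-encoded

# Same dtype: the result still points to the original memory
print(same.data_ptr() == a.data_ptr())

# Different dtype: the result lives in newly allocated memory
print(converted.data_ptr() == a.data_ptr())

# Bytes per element: 8 for int64, 2 for float16
print(a.element_size(), converted.element_size())
```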

### PyTorch broadcasting

If necessary, check the PyTorch [broadcasting semantics](https://pytorch.org/docs/stable/notes/broadcasting.html#general-semantics).


In [8]:
a = torch.tensor([1, 2]).reshape(1, 2) # 1 x 2
b = torch.tensor([[3, 4], [5, 6]]) # 2 x 2
c = torch.zeros((2, 2)) # we know that a + b gives a 2x2 tensor

# Now working on: First dimension from the right
c[0][0] = a[0][0] + b[0][0]
c[0][1] = a[0][1] + b[0][1]

# Now working on: Second dimension from the right
# Note that a has been broadcast from 1x2 to 2x2 along this dimension:
# the broadcast size here is 2, which is not equal to a's original
# size of 1, so we know this is a broadcast dimension.
# Thus, we always read the 0th element of a along this dimension
c[1][0] = a[0][0] + b[1][0]
c[1][1] = a[0][1] + b[1][1]

torch.equal(a + b, c) # true

True
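The manual loop above can be cross-checked with two built-in helpers. This is a minimal sketch: `torch.broadcast_shapes` reports the resulting shape, and `expand` exposes the broadcast as a view; the name `a_expanded` is introduced here for illustration.

```python
import torch

a = torch.tensor([1, 2]).reshape(1, 2)  # 1 x 2
b = torch.tensor([[3, 4], [5, 6]])      # 2 x 2

# Shape that results from broadcasting the two operands
print(torch.broadcast_shapes(a.shape, b.shape))

# expand() expresses the broadcast as a view: no data is copied,
# the broadcast dimension simply gets stride 0
a_expanded = a.expand(2, 2)
print(a_expanded)
print(a_expanded.stride())

# Adding the expanded tensor gives the same result as implicit broadcasting
print(torch.equal(a_expanded + b, a + b))
```

A stride of 0 means that moving along that dimension never advances in memory, which matches the "always read the 0th element" rule used in the manual version.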

#### Matrix Multiplication

In [9]:
a = torch.randn((3, 4, 1, 2)) # 3 x 4 x 1 x 2
b = torch.randn((1, 2, 3)) # 1 x 2 x 3

# Matrix Multiply Shape: 1x2 @ 2x3 -> 1x3
# Batch Shape: We broadcast (3, 4) and (1) -> (3, 4)
# Result shape: 3 x 4 x 1 x 3
c = torch.zeros((3, 4, 1, 3))

# iterate over the batch dimensions of (3, 4)
for i in range(3):
    for j in range(4):
        a_slice = a[i][j] # 1 x 2
        b_slice = b[0] # 2 x 3
        c[i][j] = a_slice @ b_slice # 1 x 3

print(torch.matmul(a, b).shape, c.shape)

torch.Size([3, 4, 1, 3]) torch.Size([3, 4, 1, 3])
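Matching shapes do not yet show that the values agree. The sketch below repeats the loop and compares the values with `torch.allclose`; the seed, the tolerance `atol=1e-6` and the `einsum` variant are illustrative choices, not part of the original example.

```python
import torch

torch.manual_seed(0)  # illustrative seed
a = torch.randn((3, 4, 1, 2))  # 3 x 4 x 1 x 2
b = torch.randn((1, 2, 3))     # 1 x 2 x 3

# Manual batched multiplication, as in the loop above
c = torch.zeros((3, 4, 1, 3))
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] @ b[0]

# Compare values, allowing for floating-point rounding
matches_matmul = torch.allclose(torch.matmul(a, b), c, atol=1e-6)
print(matches_matmul)

# The same contraction written with einsum: sum over the shared index k
d = torch.einsum("ijmk,kn->ijmn", a, b[0])
matches_einsum = torch.allclose(d, c, atol=1e-6)
print(matches_einsum)
```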


### Backpropagation

The core of PyTorch is its automatic differentiation engine, autograd. In general terms, each time a differentiable operation is applied to tensors that require gradients, PyTorch records it in a computational graph: the result stores a `grad_fn` that knows how to propagate gradients back to its inputs. When `.backward()` is called, the graph is traversed in reverse and the gradient is accumulated into the `.grad` attribute of each leaf tensor. This is PyTorch's biggest abstraction. Often `.backward()` is called and we just hope that the gradients flow properly. In this section I will try to build some intuition for visualising gradient flows.

In [10]:
# Define a tensor with requires_grad=True to track computation
x = torch.tensor([2.0], requires_grad=True)

# Define a function y = x^2
y = x**2

# Compute the gradient (dy/dx)
y.backward()

# x.grad will hold the derivative of y with respect to x
print(x.grad)  # Output: tensor([4.])

tensor([4.])
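Two details of this mechanism are easy to miss: results of tracked operations carry a `grad_fn`, and repeated backward passes add to `.grad` instead of overwriting it. A minimal sketch follows; the helper variables `grad_after_first` and `grad_after_second` are introduced only for illustration.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x**2

# The result of a tracked operation has a grad_fn; a leaf tensor does not
print(y.grad_fn is not None, x.grad_fn is None)

y.backward()
grad_after_first = x.grad.clone()
print(grad_after_first)   # dy/dx = 2x = 4

# A second backward pass (through a freshly built graph) adds to x.grad
(x**2).backward()
grad_after_second = x.grad.clone()
print(grad_after_second)  # 4 + 4 = 8

# Reset the accumulated gradient in place before the next pass
x.grad.zero_()
print(x.grad)
```

This accumulation is the reason training loops usually reset gradients before each backward pass.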


A common problem in trying to understand backpropagation is that most people understand derivatives and the chain rule for scalars, but how this translates to higher-dimensional tensors is not particularly obvious.

One way to bridge the gap is to treat every element of a tensor as a scalar with its own gradient: the gradient tensor has the same shape as the tensor, and each entry is the derivative of the final scalar with respect to the corresponding element. Thinking about gradients from this local scalar perspective has the added benefit of making the effect of tensor operations on gradients intuitive. For example, operations such as `.reshape()`, `.transpose()`, `.cat()` and `.split()` only move values around; they do not change any single value or its gradient. It follows naturally that the effect of such an operation on the gradient tensor is the operation itself. For example, if a tensor is flattened with `.reshape(-1)`, the gradient with respect to the flattened tensor is simply the gradient with respect to the original tensor, flattened with `.reshape(-1)` as well.
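The reshape case can be checked directly. In the sketch below, `weights` is an arbitrary illustrative tensor used to give every element a different gradient, and `retain_grad()` is called so that the gradient of the intermediate (non-leaf) tensor is kept.

```python
import torch

x = torch.arange(6.0).reshape(2, 3).requires_grad_()  # shape (2, 3)
flat = x.reshape(-1)                                   # shape (6,)
flat.retain_grad()  # keep the gradient of this non-leaf tensor

# Illustrative weights: element k of `flat` gets gradient weights[k]
weights = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
loss = (flat * weights).sum()
loss.backward()

print(flat.grad)  # gradient in the flattened layout
print(x.grad)     # the same values in the original (2, 3) layout

# The two gradients are related by the same reshape as the tensors
print(torch.equal(x.grad, flat.grad.reshape(2, 3)))
```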

In [11]:
z = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)  # Shape: (2, 2)

# Define a vector-valued function: f(z) = z^3 + 2z
f = z**3 + 2 * z  # Element-wise

# Compute gradients by summing over all outputs (scalarization)
# Without an explicit `gradient` argument, .backward() requires a scalar output, hence `sum()`
f.sum().backward()

# Gradients of f w.r.t the input z
print(z.grad)  # Contains derivatives for each element in z

tensor([[ 5., 14.],
        [29., 50.]])
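Summing is not the only option for a non-scalar output: `.backward()` also accepts a `gradient` argument with the same shape as the output. The sketch below passes a tensor of ones, which weights every output element equally and therefore reproduces the result of `f.sum().backward()`.

```python
import torch

z = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
f = z**3 + 2 * z  # element-wise, shape (2, 2)

# A tensor of ones as the upstream gradient is equivalent to f.sum().backward()
f.backward(gradient=torch.ones_like(f))

# Element-wise derivative 3 * z^2 + 2
print(z.grad)
```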


PyTorch also supports higher-order derivatives (e.g., second or third derivatives). To compute these, you need to set `create_graph=True` for the first derivative computation. For example:

In [12]:
x = torch.tensor([2.0], requires_grad=True)

# Define a function f(x) = x^3
f = x**3

# Compute first derivative df/dx using autograd.grad
first_derivative = torch.autograd.grad(f, x, create_graph=True)[0]

print("First derivative:", first_derivative)

# Compute second derivative d^2f/dx^2
second_derivative = torch.autograd.grad(first_derivative, x)[0]

print("Second derivative:", second_derivative)

First derivative: tensor([12.], grad_fn=<MulBackward0>)
Second derivative: tensor([12.])


To compute the Jacobian (partial derivatives of each output w.r.t each input):

In [13]:
x = torch.tensor([1.0, 2.0], requires_grad=True)

# Define a vector-valued function
y = torch.stack([x[0]**2, x[1]**3])  # Outputs: [x[0]^2, x[1]^3]

# Compute Jacobian (manually populate each row)
jacobian = []
for i in range(y.size(0)):
    x.grad = None  # Clear previous gradients
    y[i].backward(retain_graph=True)  # Compute partial derivatives
    jacobian.append(x.grad.clone())  # Store the gradient for output i

jacobian = torch.stack(jacobian)

print("Jacobian matrix:")
print(jacobian)  # Displays the partial derivatives

Jacobian matrix:
tensor([[ 2.,  0.],
        [ 0., 12.]])
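The same matrix can be obtained without a manual loop by using `torch.autograd.functional.jacobian`. This is a minimal sketch; the function name `func` is introduced here for illustration.

```python
import torch
from torch.autograd.functional import jacobian


def func(x):
    # Same vector-valued function as above: [x[0]^2, x[1]^3]
    return torch.stack([x[0] ** 2, x[1] ** 3])


x = torch.tensor([1.0, 2.0])

# Row i holds the partial derivatives of output i w.r.t. each input
J = jacobian(func, x)
print(J)
```

The manual version is still useful for seeing what happens under the hood: each row corresponds to one backward pass.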


## 1. Practice

### Documentation
Please use the PyTorch documentation in case of difficulties:
* The documentation on [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html#torch-tensor)
* The documentation on [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics)


### [0 points] Create a random tensor with shape `(5, 5)`.

In [None]:
# Create random tensor of given shape

### [0 points] Perform matrix multiplication of the resulting tensor with another random tensor of shape `(1, 5)`.
Hint: you may have to transpose the second tensor.

In [None]:
# Create another random tensor

# Perform matrix multiplication

### [1 point] Set the random seed to `42` and repeat the code from the two previous cells.
The output should be:
```
tensor([[1.1697],
        [0.9278],
        [0.9534],
        [0.6976],
        [0.7460]])
```

In [None]:
# Set manual seed

# Create two random tensors

# Matrix multiply tensors

### [1 point] Set random seed on the GPU.

Hint: You'll need to read the [`torch.cuda`](https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics) documentation for this.

In [None]:
# Set random seed on the GPU

### [1 point] Create two random tensors of shape `(3, 4)` and send them both to the GPU. Set random seed `123` when creating the tensors (this doesn't have to be the GPU random seed).

Hint: The output should contain `device='cuda:0'`.\
Hint: How to access the GPU in Colab?\
Setting up the runtime: in Google Colab, go to the "Runtime" menu and select "Change runtime type." A dialog box will appear where you can choose the runtime type and hardware accelerator. Select "T4 GPU" as the hardware accelerator and click "Save." This step ensures that your Colab notebook is configured to use the GPU.



In [None]:
# Set random seed

# Check for access to GPU

# Create two random tensors on GPU

### [0 points] Perform a matrix multiplication on the tensors you created in the previous cell.
Hint: you may have to adjust the shape of one of the tensors.


In [None]:
# Perform matrix multiplication

### [0 points] Find the minimum and maximum values of the output of the previous cell.

In [None]:
# Find min

# Find max

### [0 points] Find the indices of these minimum and maximum values.

In [None]:
# Find arg min

# Find arg max

### [2 points] Make a random tensor with shape `(1, 1, 1, 8)` and then create a new tensor with all the `1` dimensions removed to be left with a tensor of shape `(8,)`. Set the seed to `999` when you create it and print out the first tensor and its shape as well as the second tensor and its shape.

In [None]:
# Set seed

# Create random tensor

# Remove single dimensions

# Print out tensors and their shapes

### [1 point] Print the index of the maximum value of the second tensor.

In [None]:
# Find arg max

### Your first neural network


In this exercise you will implement a small neural network with two linear layers. The first layer takes an eight-dimensional input and the last layer outputs a one-dimensional tensor.
The following exercises are inspired by the DataCamp course [Introduction to Deep Learning with PyTorch](https://app.datacamp.com/learn/courses/introduction-to-deep-learning-with-pytorch).

### [0 points] Create a neural network of **two linear layers** that takes `input_tensor` as input, representing `8` features, and outputs a tensor of dimensions `1`.
Hint: use [`torch.nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) and [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html).

In [None]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2]])

# Implement a small neural network with two linear layers
model = nn.Sequential(
  # write your code here
)

output = model(input_tensor)
print(output)

### [0 points] Create a [`torch.nn.Sigmoid`](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html) and apply it on `input_tensor` to generate a probability for a binary classification task.

In [None]:
input_tensor = torch.tensor([[1.3]])

# Create a sigmoid function and apply it on input_tensor
sigmoid = # write your code here
probability = sigmoid(input_tensor)
print(probability)

### [0 points] Create a [`torch.nn.Softmax`](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) and apply it on `input_tensor` to generate a probability for a multiclass classification task.

In [None]:
input_tensor = torch.tensor([[6.6, -3.2, -4.3, 0.3, -0.7, -4.7]])

# Create a softmax function and apply it on input_tensor
softmax = # write your code here
probabilities = softmax(input_tensor)
print(probabilities)

### [0 points] How can you avoid the following warning message when calling the softmax function from PyTorch?

> UserWarning: Implicit dimension choice for softmax has been deprecated.



### Neural Networks for Classification and Regression

### [2 points] Create a neural network that takes a tensor of dimension `1x8` as input and returns an output of the correct shape for binary classification.

In [None]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2]])

# Implement a small neural network for binary classification
# Pass the output of the linear layer to a sigmoid, which both takes in and returns a single float.
model = nn.Sequential(
# write your code here
)

output = model(input_tensor)
print(output)

### [2 points] Create a 4-layer linear neural network that takes `input_tensor` as input and returns a regression value as output.

In [None]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2, 6, 2]])

# Implement a neural network with exactly four linear layers
model = nn.Sequential(
# write your code here
)

output = model(input_tensor)
print(output)

### [1 point] Create a one-hot encoded vector of the ground truth label `y` using [`torch.nn.functional.one_hot`](https://pytorch.org/docs/stable/generated/torch.nn.functional.one_hot.html).

In [None]:
y = 1
num_classes = 4

# Create the one-hot encoded vector using PyTorch
one_hot = # write your code here

### [2 points] Calculate the cross entropy loss using the one-hot encoded vector of the ground truth label `y`, with `4` features (one for each class).

In [None]:
from torch.nn import CrossEntropyLoss

y = [2]
scores = torch.tensor([[3.1, -5.0, 1.8, 4.3]])

# Create a one-hot encoded vector of the label y
one_hot_label = # write your code here

# Create the cross entropy loss function
criterion = # write your code here

# Calculate the cross entropy loss
loss = # write your code here

print(loss)

### [1 point] Accessing the model parameters.
Hint: model parameters are weights and biases.\
Hint: use [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) documentation.\
Hint: take a look at this thread on the [PyTorch discussion forum](https://discuss.pytorch.org/t/access-weights-of-a-specific-module-in-nn-sequential/3627).

In [None]:
model = nn.Sequential(nn.Linear(16, 8),
                      nn.Linear(8, 4)
                     )

# Access the weight of the first linear layer
weight_0 = # write your code here

# Access the bias of the second linear layer
bias_1 = # write your code here

print(weight_0, bias_1)

### [3 points] Create a 3-layer neural network.



In [None]:
input_tensor = torch.Tensor([[1, 5, 4, 7, 3, 6, 0, 2, 6, 2, 4, 1, 4, 0, 5, 12]])

# Create the model of three linear layers and Softmax activation
model = nn.Sequential(
# write your code here
)

# Run a forward pass
prediction = model(input_tensor)

# Calculate the loss
criterion = CrossEntropyLoss()
target = torch.tensor([[1., 0.]])

loss = criterion(prediction, target)

# Compute the gradients
loss.backward()

### [2 points] Update the weights of the first layer using the gradients scaled by the learning rate `0.001`.

In [None]:
# Learning rate is typically small
lr = 0.001

weight = # write your code here

# Access the gradients of the weight of the first linear layer
grads = # write your code here

# Update the weights using the learning rate and the gradients
weight = # write your code here

### [1 point] Update the model's parameters using the [`torch.optim.SGD`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) optimizer and the learning rate `0.001`.

In [None]:
# Create the SGD optimizer
optimizer = # write your code here

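# Re-run the forward pass: the graph used by the previous backward() call has been freed
prediction = model(input_tensor)
loss = criterion(prediction, target)
loss.backward()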

# Update the model's parameters using the optimizer

# write your code here

That's all for this PyTorch practice.

**Don't forget to rename the Jupyter notebook to HW1-pytorch-NameLastname.ipynb, where Name and Lastname are your first and last name. Then submit the notebook file to Smart LMS.**