# CS6493 - Tutorial 1
## Introduction to JupyterHub and PyTorch

Welcome to the CS6493 tutorial. In this tutorial, you will get familiar with our experimental environment and practice some basic PyTorch operations.

## 1. JupyterHub

You can use JupyterHub to run the toy models. Here are some notes on JupyterHub:

- You are expected to be familiar with Python and Jupyter;
- Go to https://mljh.cs.cityu.edu.hk/ and log in with your **EID** to use JupyterHub;
- Each student is allocated **8 GB of GPU memory**;
- JupyterHub can accommodate up to **60 students** at the same time;
- **DO NOT** attempt to load large models, such as T5 and GPT-3, on JupyterHub;
- **DO NOT** run a program for more than 1 week.

## 2. PyTorch

We use the [PyTorch](https://pytorch.org/) framework for our implementations. In this section, we introduce the installation and the basic operations of PyTorch.

## 2.1 Installation
Here, we install PyTorch with GPU support, so we first need to know the CUDA version on the server.

In [None]:
!nvidia-smi

Then, find the matching package version on https://pytorch.org/. Paste the install command into a cell and run it.

Here we choose version 1.10.1 with CUDA 10.2.

In [None]:
!pip3 install torch torchvision torchaudio

Now we can check the version of the installed package and whether it supports GPUs:

In [None]:
import torch
print("PyTorch version: ", torch.__version__)
print("GPU support: ", torch.cuda.is_available())
print("Available devices count: ", torch.cuda.device_count())
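
To confirm that the installed build matches the CUDA version reported by `nvidia-smi`, it can help to inspect a few more fields. The following is a minimal sketch; the variable names `cuda_build` and `gpu_names` are only illustrative.

```python
import torch

# CUDA version this PyTorch build was compiled against (None for CPU-only builds)
cuda_build = torch.version.cuda
print("Built with CUDA:", cuda_build)

# Name of each visible GPU; the list is empty when no GPU is visible
gpu_names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
print("Visible GPUs:", gpu_names)
```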

## 2.2 Quick start - Tensors in PyTorch

In this section, we introduce some basic concepts and operations of tensors.

In [None]:
import numpy as np

Tensors are a specialized data structure, very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators.

### Create Tensors

Tensors can be created directly from data or from NumPy arrays. You can specify the data type of the tensor explicitly; otherwise, it is inferred automatically.

In [None]:
data = [[0,1], [2,3]]
tensor_data = torch.tensor(data)
tensor_data_float = torch.tensor(data).float()
print(f"Long Tensor: \n {tensor_data} \n")  # the data type is LongTensor
print(f"Float Tensor: \n {tensor_data_float} \n")

In [None]:
np_data = np.array(data)
tensor_np_data = torch.tensor(np_data)
tensor_np_data_float = torch.tensor(np_data).float()
print(f"Long Tensor: \n {tensor_np_data} \n")  # the data type is LongTensor
print(f"Float Tensor: \n {tensor_np_data_float} \n")
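
The cells above convert with `.float()` after creation. As a small sketch of two related options, the `dtype` argument sets the type at creation time, and `torch.from_numpy` shares memory with the NumPy array instead of copying it. The variable names here are illustrative.

```python
import numpy as np
import torch

data = [[0, 1], [2, 3]]

# Set the dtype at creation time instead of converting afterwards
float_tensor = torch.tensor(data, dtype=torch.float32)
print(float_tensor.dtype)  # torch.float32

# torch.tensor copies the NumPy data; torch.from_numpy shares its memory
np_data = np.array(data)
copied = torch.tensor(np_data)
shared = torch.from_numpy(np_data)

np_data[0, 0] = 100  # modify the NumPy array in place
print(copied[0, 0].item())  # 0   (the copy is unaffected)
print(shared[0, 0].item())  # 100 (the shared tensor sees the change)
```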

You can also create tensors filled with constant values (e.g., 0 or 1) or with random values:

In [None]:
zeros_tensor = torch.zeros((2,3))
ones_tensor = torch.ones((2,3))
random_tensor = torch.rand((2,3))
print(f"Zeros Tensor: \n {zeros_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Random Tensor: \n {random_tensor} \n")
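
Random tensors differ on every run unless the seed is fixed. The sketch below shows one way to make them reproducible with `torch.manual_seed`, along with the `*_like` constructors and `torch.randn`; the seed value 0 is an arbitrary choice.

```python
import torch

torch.manual_seed(0)  # fix the seed so the random values are reproducible
first = torch.rand((2, 3))
torch.manual_seed(0)  # resetting the seed replays the same random sequence
second = torch.rand((2, 3))
print(torch.equal(first, second))  # True

# *_like functions reuse the shape (and by default the dtype and device) of an existing tensor
zeros_like_first = torch.zeros_like(first)
# randn samples from a standard normal distribution instead of uniform [0, 1)
randn_tensor = torch.randn((2, 3))
print(zeros_like_first.shape, randn_tensor.shape)
```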

### Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [None]:
tensor = torch.rand(2,3)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
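
A few more attributes are often handy when debugging shapes. This is a short sketch, not an exhaustive list:

```python
import torch

tensor = torch.rand(2, 3)

print(tensor.ndim)           # 2, the number of dimensions
print(tensor.numel())        # 6, the total number of elements
print(tensor.shape[0])       # 2, torch.Size can be indexed like a tuple
print(tensor.requires_grad)  # False, gradients are not tracked by default
```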

### Operations on Tensors

There are over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation, and more. In this section, we introduce only some operations that are frequently used in our later tutorials and projects.

**Move Tensor to Device**

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using the `.to()` method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

In [None]:
# move tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor = tensor.to(device)
print(f"Device tensor is stored on: {tensor.device}")
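
Since cross-device copies are costly, one option is to allocate a tensor directly on the target device with the `device` argument. The sketch below also moves the result back to the CPU before converting it to NumPy, because NumPy only works with CPU memory. The names `on_device` and `as_numpy` are illustrative.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Allocate directly on the target device instead of creating on CPU and copying
on_device = torch.ones((2, 3), device=device)

# NumPy only works with CPU memory, so move the tensor back before converting
as_numpy = on_device.cpu().numpy()
print(on_device.device, as_numpy.shape)
```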

**Tensor indexing, slicing and reshape**

In [None]:
tensor = torch.rand(4, 6)
tensor

In [None]:
# let's take a look at its first row, first column, and last column
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:,0]}")
print(f"Last column: {tensor[:, -1]}")

In [None]:
# reshape
print(f"Reshape to (2,12): \n {tensor.view(2, 12)} \n")
print(f"Reshape to (2,2,6): \n {tensor.view(-1, 2, 6)} \n")
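
Two properties of `.view()` are worth knowing: the result shares storage with the original tensor, and it only works when the memory layout allows it. The sketch below uses `torch.arange` instead of random values so the effect is easy to see, and shows `.reshape()` as a fallback for non-contiguous tensors.

```python
import torch

tensor = torch.arange(24).view(4, 6)

# A view shares storage with the original tensor
flat = tensor.view(-1)
flat[0] = 100
print(tensor[0, 0].item())  # 100

# Transposing makes the tensor non-contiguous, so .view() is rejected
transposed = tensor.T
print(transposed.is_contiguous())  # False
try:
    transposed.view(-1)
except RuntimeError as err:
    print("view failed:", type(err).__name__)

# .reshape() copies the data when it has to, so it works in both cases
reshaped = transposed.reshape(-1)
print(reshaped.shape)  # torch.Size([24])
```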

**Joining tensors.** You can use `torch.cat` to concatenate a sequence of tensors along a given dimension.

In [None]:
t1 = torch.zeros(4, 2)
new_t = torch.cat([tensor, t1, t1], dim=1)
new_t
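
A related function, `torch.stack`, is easy to confuse with `torch.cat`. As a brief sketch of the difference: `cat` joins along an existing dimension, while `stack` creates a new one.

```python
import torch

a = torch.zeros(4, 6)
b = torch.ones(4, 6)

# cat joins along an existing dimension: sizes add up along dim
cat_rows = torch.cat([a, b], dim=0)
print(cat_rows.shape)  # torch.Size([8, 6])

# stack inserts a new dimension: inputs must have identical shapes
stacked = torch.stack([a, b], dim=0)
print(stacked.shape)  # torch.Size([2, 4, 6])
```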

**Arithmetic operations**

The basic arithmetic operations of PyTorch are similar to those in NumPy, such as `.pow()`, `.div()`, `.sum()` and more. Here we talk more about multiplication in PyTorch.

In [None]:
# This computes the matrix multiplication between two tensors. y1, y2 will have the same value
print(f"Shape of original tensor: {tensor.shape}")
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

print(f"Shape of matrix multiplication resulting tensor: {y1.shape}")

# This computes the element-wise product. z1 and z2 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

print(f"Shape of element-wise product resulting tensor: {z1.shape}")
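
The same operators also work on batched, 3-D tensors. In the sketch below, `@` multiplies the last two dimensions for each batch entry, and `*` broadcasts a vector over every row. The shapes and the `(batch, sequence, feature)` layout are assumptions chosen for illustration.

```python
import torch

q = torch.rand((2, 4, 8))  # (batch, sequence, feature)
k = torch.rand((2, 4, 8))

# .T is meant for 2-D tensors; for batches, swap only the last two dimensions
scores = q @ k.transpose(-2, -1)
print(scores.shape)  # torch.Size([2, 4, 4])

# Element-wise ops broadcast: the (8,) vector is applied to every row
scale = torch.arange(8, dtype=torch.float32)
scaled = q * scale
print(scaled.shape)  # torch.Size([2, 4, 8])
```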

## 2.3 Practice

In NLP, there is a very popular technique termed **Attention**, which measures the importance of each component relative to the others. Formally, we define the attention mechanism as:

$\text{Attention}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \text{Softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}}\right)\mathbf{V}$

$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$

where $d_k$ is the dimension of the keys. Try to implement the softmax function and attention yourself.

In [None]:
v = torch.rand((2,4,8))
k = v
q = torch.rand((2,4,8))
d_k = 8

In [None]:
# insert your code
def softmax(x, dim=0):
    pass

def attention(q, k, v):
    pass
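
After trying the exercise yourself, you can compare against the sketch below. It is one possible solution rather than the reference answer: it subtracts the maximum before exponentiating for numerical stability, and it reads `d_k` from the last dimension of `k` instead of using the global variable, both of which are choices made here.

```python
import math
import torch

def softmax(x, dim=0):
    # Subtracting the max along dim leaves the result unchanged but avoids overflow in exp
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp_x = torch.exp(shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)

def attention(q, k, v):
    d_k = k.shape[-1]
    # Scaled dot-product scores, shape (batch, len_q, len_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Normalize over the key dimension so each row of weights sums to 1
    weights = softmax(scores, dim=-1)
    # Weighted sum of the values, shape (batch, len_q, d_v)
    return weights @ v

v = torch.rand((2, 4, 8))
k = v
q = torch.rand((2, 4, 8))
out = attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 8])
```

A quick sanity check is to compare `softmax(x, dim=-1)` with the built-in `torch.softmax(x, dim=-1)` using `torch.allclose`.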

**Think more:** how would you implement self-attention if you only have access to a hidden state **H**? (Hint: map **H** to **Q, K, V** with learnable matrices **W**.)

In [None]:
# insert your code
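
One possible approach, sketched below, is to project **H** with three separate `nn.Linear` layers and then apply the attention formula. The class name `SelfAttention`, the bias-free projections, and the dimensions are assumptions made for illustration, not the tutorial's reference solution.

```python
import math
import torch
from torch import nn

class SelfAttention(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, d_model, d_k):
        super().__init__()
        # Three separate learnable projections W_q, W_k, W_v (no bias)
        self.w_q = nn.Linear(d_model, d_k, bias=False)
        self.w_k = nn.Linear(d_model, d_k, bias=False)
        self.w_v = nn.Linear(d_model, d_k, bias=False)
        self.d_k = d_k

    def forward(self, h):
        # Q, K and V are all computed from the same hidden state
        q, k, v = self.w_q(h), self.w_k(h), self.w_v(h)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        weights = torch.softmax(scores, dim=-1)
        return weights @ v

h = torch.rand((2, 4, 8))  # (batch, sequence, d_model)
layer = SelfAttention(d_model=8, d_k=8)
out = layer(h)
print(out.shape)  # torch.Size([2, 4, 8])
```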