<a href="https://colab.research.google.com/github/heimmer/NLP/blob/main/tutorial-full%20version/Tutorials/tutorial_1/tutorial_week1_introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS6493 - Tutorial 1
## Introduction to JupyterHub and PyTorch

Welcome to CS6493 tutorial. In this tutorial, you will get familiar with our exprimental environment, and also practice with some basic PyTorch operations.

## 1. JupyterHub

You can use the JupyterHub to run the toy models. Here are some notes for JupyterHub:

- You are supposed to be familiar with Python and Jupyter;
- Enter https://mljh.cs.cityu.edu.hk/ and login with your **EID** to use JupyterHub;
- Each student will be allocated with **8GB GPU memories**;
- Accommodate up to **60 students** to simultaneously use the JpuyterHub;
- **DO NOT** attempt to load large models, such as T5 and GPT-3 on JupyterHub;
- **DO NOT** run a program more than 1 week.

## 2. PyTorch

We use [PyTorch](https://pytorch.org/) framework to finish the implementations. In this section, we will introduce the installation, the basic operations of PyTorch.

### 2.1 Installation
Here, we install PyTorch with GPU supports. So, we first need to know the CUDA version in the server.

In [1]:
!nvidia-smi

Tue Apr 25 08:09:57 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

Then, we find the suitable version package from https://pytorch.org/. Past the command and run in the cell.

Here we choose the version 1.10.1 with CUDA 10.2.

In [2]:
!pip3 install torch torchvision torchaudio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Now we can check the version of our installed package and whether it supports to GPUs,

In [3]:
import torch
print("PyTorch version: ", torch.__version__)
print("GPU support: ", torch.cuda.is_available())
print("Available devices count: ", torch.cuda.device_count())

PyTorch version:  2.0.0+cu118
GPU support:  True
Available devices count:  1


## 2.2 Quick start - Tensor in PyTorch

In this section, we introcue some basic concepts and operations of Tensor.

In [4]:
import numpy as np

Tensors are a specialized data structure that are very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators.

### Create Tensors

Tensors can be created directly from data or NumPy arrays. You can assign the data type to the tensor. Otherwise, the data type would be automatically inferred.

In [5]:
data = [[0,1], [2,3]]
tensor_data = torch.tensor(data)
tensor_data_float = torch.tensor(data).float()
print(f"Long Tensor: \n {tensor_data} \n")  # the data type is LongTensor
print(f"Float Tensor: \n {tensor_data_float} \n")

Long Tensor: 
 tensor([[0, 1],
        [2, 3]]) 

Float Tensor: 
 tensor([[0., 1.],
        [2., 3.]]) 



In [6]:
np_data = np.array(data)
tensor_np_data = torch.tensor(np_data)
tensor_np_data_float = torch.tensor(np_data).float()
print(f"Long Tensor: \n {tensor_np_data} \n")  # the data type is LongTensor
print(f"Float Tensor: \n {tensor_np_data_float} \n")

Long Tensor: 
 tensor([[0, 1],
        [2, 3]]) 

Float Tensor: 
 tensor([[0., 1.],
        [2., 3.]]) 



You can also create the tensors filled with constant (e.g., 0 and 1) or random values,

In [7]:
zeros_tensor = torch.zeros((2,3))
ones_tensor = torch.ones((2,3))
random_tensor = torch.rand((2,3))
print(f"Zeros Tensor: \n {zeros_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Random Tensor: \n {random_tensor} \n")

Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]]) 

Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Random Tensor: 
 tensor([[0.7450, 0.1870, 0.7166],
        [0.6036, 0.4082, 0.9448]]) 



### Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [8]:
tensor = torch.rand(2,3)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([2, 3])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Operations on Tensors

There are over 100 tensor operations, including arthmetic, linear algebra, matrix manipulation and more. In this section, we only introduce some frequently used operations in our later tutorials and projects.

**Move Tensor to Device**

By default, tensors are created on the CPU. We need to explicitly move tensors to the GPU using `.to()` method (after checking for GPU availability). Keep in mind that copying large tensors across devices can be expensive in terms of time and memory!

In [9]:
# move tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor = tensor.to(device)
print(f"Device tensor is stored on: {tensor.device}")

Device tensor is stored on: cuda:0


**Tensor indexing, slicing and reshape**

In [10]:
tensor = torch.rand(4, 6)
tensor

tensor([[0.3975, 0.0207, 0.3443, 0.3125, 0.4004, 0.7201],
        [0.6078, 0.8387, 0.5852, 0.0792, 0.2421, 0.3974],
        [0.9580, 0.8436, 0.4111, 0.6594, 0.4107, 0.7563],
        [0.4197, 0.8792, 0.0617, 0.5252, 0.8393, 0.4848]])

In [11]:
# let take a look at its first row and column
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:,0]}")
print(f"Last column: {tensor[:, -1]}")

First row: tensor([0.3975, 0.0207, 0.3443, 0.3125, 0.4004, 0.7201])
First column: tensor([0.3975, 0.6078, 0.9580, 0.4197])
Last column: tensor([0.7201, 0.3974, 0.7563, 0.4848])


In [12]:
# reshape
print(f"Reshape to (2,12): \n {tensor.view(2, 12)} \n")
print(f"Reshape to (2,2,6): \n {tensor.view(-1, 2, 6)} \n")

Reshape to (2,12): 
 tensor([[0.3975, 0.0207, 0.3443, 0.3125, 0.4004, 0.7201, 0.6078, 0.8387, 0.5852,
         0.0792, 0.2421, 0.3974],
        [0.9580, 0.8436, 0.4111, 0.6594, 0.4107, 0.7563, 0.4197, 0.8792, 0.0617,
         0.5252, 0.8393, 0.4848]]) 

Reshape to (2,2,6): 
 tensor([[[0.3975, 0.0207, 0.3443, 0.3125, 0.4004, 0.7201],
         [0.6078, 0.8387, 0.5852, 0.0792, 0.2421, 0.3974]],

        [[0.9580, 0.8436, 0.4111, 0.6594, 0.4107, 0.7563],
         [0.4197, 0.8792, 0.0617, 0.5252, 0.8393, 0.4848]]]) 



**Joining tensors.** You can use torch.cat to concatenate a sequence of tensors along a given dimension.

In [13]:
t1 = torch.zeros(4, 2)
new_t = torch.cat([tensor, t1, t1], dim=1)
new_t

tensor([[0.3975, 0.0207, 0.3443, 0.3125, 0.4004, 0.7201, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.6078, 0.8387, 0.5852, 0.0792, 0.2421, 0.3974, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.9580, 0.8436, 0.4111, 0.6594, 0.4107, 0.7563, 0.0000, 0.0000, 0.0000,
         0.0000],
        [0.4197, 0.8792, 0.0617, 0.5252, 0.8393, 0.4848, 0.0000, 0.0000, 0.0000,
         0.0000]])

**Arithmetic operations**

The basic arithmetic operations of Pytorch are similar with those in Numpy, such as `.pow()`, `.div()`, `.sum()` and more. Here we talk more about multiplication in Pytorch.

In [14]:
# This computes the matrix multiplication between two tensors. y1, y2 will have the same value
print(f"Shape of original tensor: {tensor.shape}")
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

print(f"Shape of matrix multiplication resulting tensor: {y1.shape}")

# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)

print(f"Shape of element-wise product resulting tensor: {z1.shape}")

Shape of original tensor: torch.Size([4, 6])
Shape of matrix multiplication resulting tensor: torch.Size([4, 4])
Shape of element-wise product resulting tensor: torch.Size([4, 6])


## 2.3 Practice

In NLP, we have a very popular and famous techique, termed **Attention** which is used to measure the improtance among each components. Formally, we define the attention mechanism as:

$Attention(\mathbf{Q},\mathbf{K},\mathbf{V}) = \text{Softmax}(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}})\mathbf{V}$

$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$,

you can attempt to implement softmax function and attention by yourself.

In [19]:
import math
v= torch.rand((2,4,8))
k = v
q = torch.rand((2,4,8))
d_k = 8

In [21]:
# insert your code
def softmax(x, dim=0):
  # s = x.clone().detach()
  # denominator = torch.sum(torch.exp(x))
  # for i in x:
  #   s[i] = torch.exp(x[i])/denominator
  x_exp = x.exp()
  partition = x_exp.sum(dim=dim, keepdim=True)
  return x_exp/partition

def attention(q, k, v):
  # return softmax((q@k.T)/torch.sqrt(d_k)) @ v
  return torch.bmm(softmax(torch.bmm(q,k.transpose(2,1))/math.sqrt(d_k),dim=1),v)

In [22]:
attention(q,k,v)

tensor([[[0.3062, 0.5595, 0.5241, 0.4910, 0.3022, 0.6360, 0.3312, 0.5216],
         [0.4192, 0.7582, 0.6774, 0.6444, 0.3777, 0.8300, 0.4416, 0.6756],
         [0.4569, 0.8445, 0.7502, 0.7205, 0.4239, 0.9129, 0.4603, 0.7712],
         [0.3435, 0.6333, 0.5775, 0.5476, 0.3299, 0.7010, 0.3552, 0.5877]],

        [[0.5498, 0.3704, 0.4734, 0.3011, 0.3211, 0.2853, 0.6813, 0.2172],
         [0.7086, 0.4699, 0.5952, 0.3611, 0.4021, 0.3706, 0.8162, 0.2893],
         [0.5982, 0.4007, 0.4965, 0.3083, 0.3414, 0.3131, 0.6943, 0.2414],
         [0.6934, 0.4863, 0.5570, 0.3725, 0.3759, 0.3805, 0.8019, 0.2751]]])

**Think more** how wo implement self-attention if you only have access to a hidden state **H**. (Mapping **H** with a learnable matrix **W** to **Q, K, V**)

In [None]:
# insert your code