# A Quick PyTorch Introduction for ML4NLP1

### Tutorial 4 (18.10.2021)

- PyTorch: https://pytorch.org/
- PyTorch tutorials: https://pytorch.org/tutorials/
- PyTorch documentation: https://pytorch.org/docs/stable/index.html
- PyTorch forums: https://discuss.pytorch.org/
- A collection of PyTorch example code snippets: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

In [2]:
import numpy as np
import torch

## Basic Data Types

In [3]:
t = torch.tensor([1, 2, 3])
print(t.dtype)
t = torch.tensor([1.0, 2.0, 3.0])
print(t.dtype)

torch.int64
torch.float32


In [4]:
t = torch.tensor([1, 2, 3], dtype=torch.long)
print(t.dtype)
t = torch.tensor([1, 2, 3], dtype=torch.float64)
print(t.dtype)

torch.int64
torch.float64


In [5]:
t = torch.LongTensor([1, 2, 3])
print(t.dtype)
t = torch.FloatTensor([1, 2, 3])
print(t.dtype)

torch.int64
torch.float32
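Converting between dtypes comes up often, for example when integer features need to become floats before they are fed to a layer. The following is a small sketch using standard tensor methods; the variable names are only illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])        # inferred dtype: torch.int64
floats = ints.float()                 # shorthand for converting to torch.float32
doubles = ints.to(torch.float64)      # .to() accepts any target dtype
halves = ints / 2                     # true division promotes to the default float dtype

print(ints.dtype, floats.dtype, doubles.dtype, halves.dtype)
```

Note that each conversion returns a new tensor; `ints` itself keeps its integer dtype.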


In [6]:
a = np.array([1, 2, 3])
t = torch.from_numpy(a)
b = t.numpy()
print(a, type(a))
print(t, type(t))
print(b, type(b))

[1 2 3] <class 'numpy.ndarray'>
tensor([1, 2, 3]) <class 'torch.Tensor'>
[1 2 3] <class 'numpy.ndarray'>
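One detail worth knowing about the conversion above: `torch.from_numpy()` shares memory with the NumPy array, whereas `torch.tensor()` makes a copy. A minimal sketch (variable names are illustrative):

```python
import numpy as np
import torch

a = np.array([1, 2, 3])
shared = torch.from_numpy(a)   # shares the underlying buffer with a
copied = torch.tensor(a)       # independent copy of the data

a[0] = 100                     # modify the NumPy array in place

print(shared.tolist())         # the change is visible through the shared tensor
print(copied.tolist())         # the copy is unaffected
```

If the array and the tensor should evolve independently, copying is the safer choice.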


For a list of PyTorch tensor types, go to: https://pytorch.org/docs/stable/tensors.html

Tensors have a shape:

In [7]:
s = torch.FloatTensor([0.5])
t = torch.FloatTensor([1, 3.2, 19])
u = torch.FloatTensor([[0.2, 2], 
                       [0.3, 3], 
                       [0.4, 4]])
print(s.shape)
print(t.shape)
print(u.shape)

torch.Size([1])
torch.Size([3])
torch.Size([3, 2])


Alternatively you may sometimes see `tensor.size()`:

In [8]:
print(s.size())
print(t.size())
print(u.size())

torch.Size([1])
torch.Size([3])
torch.Size([3, 2])


You can change the shape of a tensor using [`tensor.view()`](https://pytorch.org/docs/stable/tensor_view.html):



In [9]:
u.shape

torch.Size([3, 2])

In [10]:
u.view(2, 3)

tensor([[0.2000, 2.0000, 0.3000],
        [3.0000, 0.4000, 4.0000]])

In [11]:
u.view(1, 6)

tensor([[0.2000, 2.0000, 0.3000, 3.0000, 0.4000, 4.0000]])

In [12]:
u.view(2, -1)

tensor([[0.2000, 2.0000, 0.3000],
        [3.0000, 0.4000, 4.0000]])

In [13]:
u.view(-1, 2)

tensor([[0.2000, 2.0000],
        [0.3000, 3.0000],
        [0.4000, 4.0000]])

More information on `view()`: https://pytorch.org/docs/stable/tensor_view.html
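Two properties of `view()` are easy to miss: the result shares memory with the original tensor, and it only works when the requested shape is compatible with the tensor's memory layout. The sketch below illustrates both, using a transposed tensor as an example of a non-contiguous layout; `reshape()` is shown as the fallback that copies when needed.

```python
import torch

u = torch.tensor([[0.2, 2.0],
                  [0.3, 3.0],
                  [0.4, 4.0]])

v = u.view(2, 3)
v[0, 0] = -1.0                          # writes through to u, since v is a view
shares_memory = u[0, 0].item() == -1.0

ut = u.t()                              # transpose: same data, non-contiguous layout
try:
    ut.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True                  # view() cannot flatten this layout

flat = ut.reshape(-1)                   # reshape() copies when a view is impossible
print(shares_memory, view_failed, flat.shape)
```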

## Flatten and Squeeze

In [14]:
u.flatten()  # same as u.view(-1); note that u.view(1, -1) would give shape [1, 6] instead

tensor([0.2000, 2.0000, 0.3000, 3.0000, 0.4000, 4.0000])

In [15]:
a = torch.FloatTensor([[0.35, 16.1]])
b = torch.FloatTensor([[[1.3, 2.9]]])
print(a.shape)
print(b.shape)

torch.Size([1, 2])
torch.Size([1, 1, 2])


In [16]:
c = a.squeeze()
d = b.squeeze()
print(c, c.shape)
print(d, d.shape)

tensor([ 0.3500, 16.1000]) torch.Size([2])
tensor([1.3000, 2.9000]) torch.Size([2])


In [17]:
e = c.unsqueeze(dim=0)
f = c.unsqueeze(dim=1)
g = c.unsqueeze(dim=-1)
print(e, e.shape)
print(f, f.shape)
print(g, g.shape)

tensor([[ 0.3500, 16.1000]]) torch.Size([1, 2])
tensor([[ 0.3500],
        [16.1000]]) torch.Size([2, 1])
tensor([[ 0.3500],
        [16.1000]]) torch.Size([2, 1])
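Calling `squeeze()` without arguments removes every axis of size 1, which can be surprising when a batch happens to contain a single example. Passing `dim` restricts it to one axis. A short sketch (the tensors are made up for illustration):

```python
import torch

b = torch.zeros(1, 1, 2)
print(b.squeeze(dim=0).shape)        # only the first axis is removed
print(b.squeeze(dim=2).shape)        # axis 2 has size 2, so nothing changes

labels = torch.zeros(1, 1)           # e.g. a batch holding one label
print(labels.squeeze().shape)        # all size-1 axes removed: 0-dim tensor
print(labels.squeeze(dim=-1).shape)  # the batch axis is kept
```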


## Automatically Fill Tensors with Values 

In [18]:
print(torch.zeros(2, 3))

tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [19]:
print(torch.ones(2, 3))

tensor([[1., 1., 1.],
        [1., 1., 1.]])


In [20]:
print(torch.rand(2, 3))  # numbers taken from uniform distribution on interval [0,1)

tensor([[0.5403, 0.4143, 0.8308],
        [0.0635, 0.8832, 0.2031]])


In [21]:
print(torch.randint(low=0, high=6, size=(2, 3)))  # integers drawn uniformly from [low, high): low is inclusive, high is exclusive

tensor([[4, 1, 1],
        [3, 0, 3]])
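Because `torch.rand` and `torch.randint` produce different values on every run, it can help to fix the seed when results need to be reproducible. The sketch below also shows two related constructors, `torch.zeros_like` and `torch.full`; the seed value 42 is arbitrary.

```python
import torch

torch.manual_seed(42)
first = torch.rand(2, 3)
torch.manual_seed(42)                # resetting the seed replays the same sequence
second = torch.rand(2, 3)
same_values = torch.equal(first, second)

zeros = torch.zeros_like(first)      # same shape and dtype as `first`, filled with 0
sevens = torch.full((2, 3), 7.0)     # arbitrary constant fill value

print(same_values, zeros.shape, sevens[0, 0].item())
```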


## Tensor Slicing

In [22]:
t = torch.FloatTensor([1, 3.2, 19])
print(t[0])
print(t[1:])
print(t[-1])

tensor(1.)
tensor([ 3.2000, 19.0000])
tensor(19.)


In [23]:
u = torch.FloatTensor([[0.2, 2], 
                       [0.3, 3], 
                       [0.4, 4]])
print(u, u.shape)
print(u[1, 1]) # second element of second row
print(u[1])  # second row
print(u[:, 1])  # second column  
print(u[:-1, 1])  # second column without last element

tensor([[0.2000, 2.0000],
        [0.3000, 3.0000],
        [0.4000, 4.0000]]) torch.Size([3, 2])
tensor(3.)
tensor([0.3000, 3.0000])
tensor([2., 3., 4.])
tensor([2., 3.])
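Besides integer indices and slices, tensors can be indexed with a boolean mask. It is also worth noting that basic slices are views, so writing to a slice modifies the original tensor. A small sketch with the same values as above:

```python
import torch

u = torch.tensor([[0.2, 2.0],
                  [0.3, 3.0],
                  [0.4, 4.0]])

mask = u[:, 1] > 2.5       # one boolean per row
selected = u[mask]         # keeps only the rows where the mask is True

first_col = u[:, 0]        # a view onto the first column
first_col[0] = 9.0         # this also changes u[0, 0]

print(mask.tolist(), selected.shape, u[0, 0].item())
```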


## Mathematical Operations

In [24]:
t = torch.FloatTensor([1, 3.2, 19])

In [25]:
t + 1

tensor([ 2.0000,  4.2000, 20.0000])

In [26]:
t / 3

tensor([0.3333, 1.0667, 6.3333])

In [27]:
t ** 2

tensor([  1.0000,  10.2400, 361.0000])
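The operations above combine a tensor with a scalar. The same operators also work element-wise between tensors, and `@` performs dot products and matrix multiplication. A brief sketch with made-up values:

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, 0.5, 2.0])

elementwise = t * w        # element-wise product, shape [3]
dot = t @ w                # dot product, a 0-dim tensor

m = torch.ones(2, 3)
matvec = m @ t             # matrix-vector product, shape [2]
scaled = m * t             # broadcasting: t is applied to every row of m

print(elementwise.tolist(), dot.item(), matvec.tolist(), scaled.shape)
```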

In [28]:
torch.max(t)

tensor(19.)

In [29]:
print(torch.argmax(t))
print(torch.argmin(t))

tensor(2)
tensor(0)


In [30]:
x = torch.FloatTensor([[.2, -0.3, 1.3], 
                       [0.1, 0.5, 0.01]])
print(x.shape)
print(torch.argmax(x))
print(torch.argmax(x, dim=0))
print(torch.argmax(x, dim=1))
print(torch.argmax(x, dim=-1))

torch.Size([2, 3])
tensor(2)
tensor([0, 1, 0])
tensor([2, 1])
tensor([2, 1])


If the dim argument is confusing, go to: https://stackoverflow.com/questions/55691819/why-does-dim-1-return-row-indices-in-torch-argmax
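One way to read `dim`: it names the axis that is reduced away, so the result has the shape of the input with that axis removed. The sketch below applies this to `sum` and to `torch.max`, which returns both values and indices when `dim` is given.

```python
import torch

x = torch.tensor([[0.2, -0.3, 1.3],
                  [0.1, 0.5, 0.01]])   # shape [2, 3]

col_sums = x.sum(dim=0)   # reduces over rows, leaving shape [3]
row_sums = x.sum(dim=1)   # reduces over columns, leaving shape [2]

# With dim given, torch.max returns the maxima and their positions along that axis
values, indices = torch.max(x, dim=1)

print(col_sums.shape, row_sums.shape, indices.tolist())
```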

## [More attributes](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html#attributes-of-a-tensor)

In [31]:
a = torch.FloatTensor([0.4, 0.5, 1.3])
print(a.shape)
print(a.dtype)
print(a.device)

torch.Size([3])
torch.float32
cpu


### Moving a Tensor to a Device

In [32]:
print(torch.cuda.is_available())
print(torch.cuda.device_count())

True
1


In [33]:
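# note: torch.cuda.device(i) is a context manager for selecting a GPU; use torch.device(f'cuda:{i}') for a handle to pass to .to()
gpus = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]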
print(gpus)

[<torch.cuda.device object at 0x7f767cc21b10>]


In [34]:
device = 'cuda:0'
b = a.to(device)
b

tensor([0.4000, 0.5000, 1.3000], device='cuda:0')

The typical line you see in most PyTorch training scripts:

In [35]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
# then move tensors and the model to this device with .to(device)
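Putting the pieces together, a device-agnostic snippet might look like the following. This is a sketch that runs on machines with or without a GPU; the tensor values are only examples.

```python
import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

a = torch.tensor([0.4, 0.5, 1.3])
b = a.to(device)                    # copy to the chosen device (no-op if already there)
c = torch.zeros(3, device=device)   # or create the tensor on the device directly

total = b + c                       # operands must live on the same device
as_numpy = total.cpu().numpy()      # move back to the CPU before converting to NumPy

print(total.device, as_numpy)
```

Combining tensors that live on different devices raises an error, so it is usually simplest to pick the device once and use it consistently.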

## PyTorch Datasets and Dataloaders

For more information, go to: https://pytorch.org/docs/stable/data.html

Best practices sheet:

In [36]:
from torch.utils.data import IterableDataset, DataLoader

In [37]:
num_features = 10
num_train_examples = 100
num_test_examples = 50
num_classes = 4

train_X = torch.rand(num_train_examples, num_features)
train_y = torch.randint(low=0, high=num_classes, size=(num_train_examples, 1))
test_X = torch.rand(num_test_examples, num_features)
test_y = torch.randint(low=0, high=num_classes, size=(num_test_examples, 1))

In [38]:
class MyDataset(IterableDataset):
    def __init__(self, data_X, data_y):
        assert len(data_X) == len(data_y)
        self.data_X = data_X.to(device)
        self.data_y = data_y.to(device)
    
    def __len__(self):
        return len(self.data_X)
    
    def __iter__(self):
        for i in range(len(self.data_X)):
            yield (self.data_X[i], self.data_y[i])

In [39]:
train_set = MyDataset(train_X, train_y)
test_set = MyDataset(test_X, test_y)

In [40]:
train_loader = DataLoader(train_set, batch_size=8)
test_loader = DataLoader(test_set, batch_size=8)

Recommended DataLoader settings:

In [41]:
for batch in train_loader:
    print(batch[0].shape, batch[1].shape)

torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([8, 10]) torch.Size([8, 1])
torch.Size([4, 10]) torch.Size([4, 1])
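An `IterableDataset` is consumed in its own iteration order, so `DataLoader` does not accept `shuffle=True` for it. When the whole dataset fits in memory as tensors, a map-style dataset is a common alternative. The following sketch uses `TensorDataset` with random data of the same sizes as above; the one-dimensional label shape is a choice made here, not something taken from the cells above.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
X = torch.rand(100, 10)
y = torch.randint(low=0, high=4, size=(100,))   # 1-D labels, so no squeeze is needed later

dataset = TensorDataset(X, y)                   # map-style: supports len() and indexing
loader = DataLoader(dataset, batch_size=8, shuffle=True)

batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(len(loader), batch_shapes[0], batch_shapes[-1])
```

With `shuffle=True`, the examples are visited in a new random order on every pass over the loader.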


## A Small Neural Net

In [42]:
class TinyNN(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(TinyNN, self).__init__()
        self.num_features = num_features
        self.num_classes = num_classes
        self.hidden_size = int(round(num_features / 2))
        self.input_layer = torch.nn.Linear(num_features, self.hidden_size)
        self.hidden_layer = torch.nn.Linear(self.hidden_size, self.hidden_size)
        self.output_layer = torch.nn.Linear(self.hidden_size, num_classes)

    def forward(self, inputs):
        out1 = self.input_layer(inputs)
        out2 = self.hidden_layer(out1)
        out3 = self.output_layer(out2)
        log_probs = torch.nn.functional.log_softmax(out3, dim=-1)
        return log_probs
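Note that `TinyNN` stacks three linear layers without any activation in between, so the whole network is mathematically equivalent to a single linear map followed by `log_softmax`. A variant with ReLU activations could be sketched as follows; the class name `TinyNNWithReLU` is made up for this illustration and is not part of the tutorial.

```python
import torch


class TinyNNWithReLU(torch.nn.Module):
    """Same layout as TinyNN, with a ReLU after the first two layers."""

    def __init__(self, num_features, num_classes):
        super().__init__()
        hidden_size = int(round(num_features / 2))
        self.input_layer = torch.nn.Linear(num_features, hidden_size)
        self.hidden_layer = torch.nn.Linear(hidden_size, hidden_size)
        self.output_layer = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, inputs):
        out1 = torch.relu(self.input_layer(inputs))
        out2 = torch.relu(self.hidden_layer(out1))
        out3 = self.output_layer(out2)
        return torch.nn.functional.log_softmax(out3, dim=-1)


model_relu = TinyNNWithReLU(num_features=10, num_classes=4)
log_probs = model_relu(torch.rand(8, 10))   # a batch of 8 random examples
print(log_probs.shape)
```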

## Let's Train it

In [43]:
import torch.optim as optim

model = TinyNN(num_features=num_features, num_classes=num_classes) 
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
model.to(device)

loss_func = torch.nn.NLLLoss()  # for nllloss vs cat-cross-ent: https://discuss.pytorch.org/t/difference-between-cross-entropy-loss-or-log-likelihood-loss/38816/2
lr = 0.001
optimizer = optim.Adam(model.parameters(), lr=lr)

num_epochs = 10
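Regarding the choice of loss: `NLLLoss` expects log-probabilities, which is why the model ends with `log_softmax`. Applying cross-entropy directly to the raw scores gives the same value, as the following sketch with random inputs shows.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4)                        # raw scores for 5 examples, 4 classes
targets = torch.randint(low=0, high=4, size=(5,))

nll = F.nll_loss(F.log_softmax(logits, dim=-1), targets)
ce = F.cross_entropy(logits, targets)             # applies log_softmax internally

print(torch.allclose(nll, ce))
```

So a model that returns raw scores would be paired with `torch.nn.CrossEntropyLoss`, while a model that returns log-probabilities is paired with `torch.nn.NLLLoss`.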

In [44]:
num_batches = len(train_loader)
for epoch in range(1, num_epochs + 1):
    for batch_num, (inputs, y_true) in enumerate(train_loader, 1):
        optimizer.zero_grad()
        y_pred = model(inputs)
        loss = loss_func(y_pred, y_true.squeeze(dim=-1))  # drop only the label axis: [batch, 1] -> [batch]
        loss_batch = loss.item()
        loss.backward()
        optimizer.step()
        print(f'Epoch [{epoch}/{num_epochs}], batch: [{batch_num}/{num_batches}], loss: {loss_batch:.4f}')

Epoch [1/10], batch: [1/13], loss: 1.3361
Epoch [1/10], batch: [2/13], loss: 1.4134
Epoch [1/10], batch: [3/13], loss: 1.4105
Epoch [1/10], batch: [4/13], loss: 1.3792
Epoch [1/10], batch: [5/13], loss: 1.4159
Epoch [1/10], batch: [6/13], loss: 1.4143
Epoch [1/10], batch: [7/13], loss: 1.3302
Epoch [1/10], batch: [8/13], loss: 1.3608
Epoch [1/10], batch: [9/13], loss: 1.3590
Epoch [1/10], batch: [10/13], loss: 1.4079
Epoch [1/10], batch: [11/13], loss: 1.4315
Epoch [1/10], batch: [12/13], loss: 1.4166
Epoch [1/10], batch: [13/13], loss: 1.4533
Epoch [2/10], batch: [1/13], loss: 1.3216
Epoch [2/10], batch: [2/13], loss: 1.4040
Epoch [2/10], batch: [3/13], loss: 1.4085
Epoch [2/10], batch: [4/13], loss: 1.3636
Epoch [2/10], batch: [5/13], loss: 1.4231
Epoch [2/10], batch: [6/13], loss: 1.4053
Epoch [2/10], batch: [7/13], loss: 1.3358
Epoch [2/10], batch: [8/13], loss: 1.3533
Epoch [2/10], batch: [9/13], loss: 1.3621
Epoch [2/10], batch: [10/13], loss: 1.4066
Epoch [2/10], batch: [11/13], ...

Let's test the accuracy:

In [45]:
predictions = []
true_labels = []
model.eval()  # not strictly needed here: the model has no dropout or batch-norm layers
with torch.no_grad():  # disable gradient tracking; gradients are only needed for backward()
    for batch_X, batch_y in test_loader:  # distinct names avoid overwriting test_X and test_y
        pred_y = model(batch_X)
        predictions.extend(torch.argmax(pred_y, dim=-1).tolist())
        true_labels.extend(batch_y.squeeze(dim=-1).tolist())

In [46]:
from sklearn.metrics import accuracy_score
acc = accuracy_score(true_labels, predictions)
acc

0.22
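Since features and labels here are random, an accuracy near the chance level of 1/4 = 0.25 is what one would expect. Accuracy can also be computed directly with tensor operations, without scikit-learn. A minimal sketch with made-up predictions and labels:

```python
import torch

preds = torch.tensor([0, 1, 2, 3, 1])    # hypothetical predicted classes
labels = torch.tensor([0, 1, 1, 3, 2])   # hypothetical true classes

correct = preds == labels                # boolean tensor, True where the prediction matches
accuracy = correct.float().mean().item()

print(f'{accuracy:.2f}')
```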