<img src="https://drive.google.com/uc?export=view&id=1x-QAgitB-S5rxGGDqxsJ299ZQTfYtOhb" width=180 align="center"/>

Master's degree in Intelligent Systems

Subject: 11754 - Deep Learning

Year: 2022-2023

Professor: Miguel Ángel Calafat Torrens

# Lab 3 - Pytorch

[PyTorch](http://pytorch.org/) is a Python library designed for developing _Deep Learning_ applications at a high level. It has the advantage of being fairly easy to understand compared to other frameworks.

In a way, pytorch can be used quite similarly to numpy. In previous labs, numpy was used to deal with numpy arrays, which are basically vectors and matrices. Pytorch works with tensors, which are the same kind of object but admit more dimensions. Ultimately, tensors are arrays; when they have a single dimension we call them vectors, when they have two dimensions we call them matrices, and when they have more dimensions we simply call them tensors.
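As a quick illustration of the vector / matrix / tensor naming, the sketch below builds one object of each kind and prints its number of dimensions. The variable names are only illustrative.

```python
import numpy as np
import torch

vector = torch.tensor([1.0, 2.0, 3.0])            # 1 dimension
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # 2 dimensions
tensor3d = torch.zeros(2, 3, 4)                   # 3 dimensions

for name, t in [("vector", vector), ("matrix", matrix), ("tensor3d", tensor3d)]:
    print(name, t.ndim, tuple(t.shape))

# A numpy array of the same shape reports the same number of dimensions
array3d = np.zeros((2, 3, 4))
print(array3d.ndim == tensor3d.ndim)
```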

Pytorch tensors have a number of features that make them easy to use, such as the ability to run on a GPU or the fact that they are optimized for automatic differentiation. Otherwise, from a mathematical point of view, they are used in much the same way as the numpy arrays you have already worked with. Many of the features you saw in numpy are available in Pytorch. Some of the most relevant are listed below:

* torch.tensor()  --> It creates a tensor. The different data types are available [here](https://pytorch.org/docs/stable/tensors.html)
* torch.zeros()  --> It creates a zeros tensor.
* torch.zeros_like()  --> It creates a zeros tensor with the shape of a given tensor.
* torch.ones()  --> It creates a ones tensor.
* torch.ones_like()  --> It creates a ones tensor with the shape of a given tensor.
* torch.full()  --> It creates a tensor of constant values.
* torch.full_like()  --> It creates a tensor of constant values with the shape of a given tensor.
* torch.rand()  --> It creates a tensor of random numbers between 0 and 1.
* torch.rand_like()  --> It creates a tensor of random numbers between 0 and 1 with the shape of a given tensor.
* torch.randn()  --> It creates a tensor of random numbers drawn from a standard normal distribution (mean 0, variance 1).
* torch.randn_like()  --> It creates a tensor of random numbers drawn from a standard normal distribution, with the shape of a given tensor.
* torch.arange()  --> It creates a tensor from a sequence defined by start, end and step.
* torch.linspace()  --> It creates a tensor from a sequence defined by start, end and number of evenly spaced points.
* torch.cat()  --> It concatenates tensors.
* torch.split()  --> It splits tensors.
* torch.squeeze()  --> It returns a tensor with the dimensions of size 1 removed.
* torch.unsqueeze()  --> It returns a new tensor with a dimension of size 1 added at the specified position.
* torch.reshape()  --> It returns a tensor with the specified shape.
* torch.from_numpy()  --> It creates a tensor from a numpy-array.
* tensor.numpy()  --> It creates a numpy-array from a tensor.

In addition, you will also find basic mathematical operations such as: _abs, add, div, mul, sub, ceil, floor, sqrt, exp, log, round, pow_, etc.
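A few of the functions listed above can be tried on small tensors. The following is a minimal sketch with arbitrary values, not part of the lab code; it also shows that `torch.from_numpy()` shares memory with the source array.

```python
import numpy as np
import torch

a = torch.arange(0, 10, 2)               # step-based sequence: 0, 2, 4, 6, 8
b = torch.linspace(0.0, 1.0, steps=5)    # count-based sequence: 5 evenly spaced points
z = torch.zeros_like(b)                  # zeros with the shape of b

c = torch.cat([b, z])                    # shape (10,)
u = torch.unsqueeze(b, 0)                # shape (1, 5)
s = torch.squeeze(u)                     # back to shape (5,)
r = torch.reshape(c, (2, 5))             # shape (2, 5)
print(c.shape, u.shape, s.shape, r.shape)

# from_numpy does not copy: the tensor and the array share the same memory
arr = np.ones(3, dtype=np.float32)
t = torch.from_numpy(arr)
arr[0] = 7.0
print(t)  # the change made through numpy is visible in the tensor
```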

In [None]:
# This cell connects to your drive
from google.colab import drive
drive.mount('/content/gdrive')
%cd '/content/gdrive/MyDrive/Colab Notebooks/2022-2023-Lab.DL'
%ls -l

# Here the path of the project folder (which is where this file is) is inserted
# into the python path. There's nothing to do; just execute the cell.
import pathlib
import sys

PROJECT_DIR = str(pathlib.Path().resolve())
sys.path.append(PROJECT_DIR)

In [None]:
# Importing some libraries
import torch
import numpy as np
import matplotlib.pyplot as plt
import helper_PR3 as hp
from google.colab import files

In [None]:
# This is a way to view the content of the help file without having to edit it
# in a new tab. You can also browse to it through the left bar.
files.view('helper_PR3.py')

You probably remember that in the last lab we defined an activation function called sigmoid. In that case numpy was used:
```
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```

Note how doing the same in pytorch is straightforward (just replace `np.exp` with `torch.exp`). Let's define a function that can handle both objects.

In [None]:
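def sigmoid(x):
    # Dispatch on the input type instead of relying on a bare except
    # (a return inside 'finally' would also hide any real error)
    if isinstance(x, torch.Tensor):
        return 1 / (1 + torch.exp(-x))
    return 1 / (1 + np.exp(-x))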

In [None]:
# Random seed is selected to ensure reproducibility
np.random.seed(42)
torch.manual_seed(42)

## Gradient calculation in numpy and pytorch

### _Forward pass_

Next, the calculation of the forward pass is broken down step by step, both in numpy and in pytorch for the case of the perceptron. The goal is not really to learn how to do this step by step, but rather to lay the groundwork for understanding how automatic gradient calculation works in pytorch.

<img src="https://drive.google.com/uc?export=view&id=1as6Vm-uivPHatB_ly73LlxEr2n9wHPDw" width=600>

In [None]:
# A test dot with three coordinates is created. The third coordinate is 1,
# and is used to calculate the weight of the bias.

# This is a case analogous to the one seen in the previous lab. In this case,
# for clarity, instead of an array of dots, the calculation is done with
# a single dot.

# Dot x in numpy
np_x = np.array([[0.5, 0.7, 1.0]], dtype=np.float32)
print('np_x shape: {}'.format(np_x.shape))

# Dot x in pytorch
tr_x = torch.from_numpy(np_x)
print('tr_x shape: {}'.format(tr_x.shape))

In [None]:
# Some initial weights are defined to put them into the matrix of weights W
weight1 = 0.1
weight2 = 1.0
bias = -0.6

# An np-array and a tensor are created with the initial weights
tr_W = torch.tensor([[weight1], [weight2], [bias]], dtype=torch.float32)
np_W = tr_W.numpy().copy()

print('np_W shape:{}\n{}\n'.format(np_W.shape, np_W))
print('tr_W shape:{}\n{}'.format(tr_W.shape, tr_W))


Tensor objects, unlike numpy arrays, have a property called `requires_grad`. This Boolean property indicates whether automatic gradient calculation is enabled for the tensor. Tensors are created with it set to `False` by default, so here it has to be set to `True` explicitly, since we want gradients for the weights.
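Besides assigning to the property, there are other standard ways to enable gradient tracking. The sketch below shows the default value, the `requires_grad=True` constructor argument and the in-place `requires_grad_()` method; the tensors are arbitrary examples.

```python
import torch
from torch import nn

plain = torch.ones(3, 1)
default_flag = plain.requires_grad               # False: the default for new tensors
print(default_flag)

tracked = torch.ones(3, 1, requires_grad=True)   # enabled at creation time
plain.requires_grad_()                           # enabled afterwards, in place
print(tracked.requires_grad, plain.requires_grad)

# Parameters of nn layers are created with gradient tracking already enabled
layer = nn.Linear(2, 1)
print(layer.weight.requires_grad)
```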

In [None]:
# The requires_grad property of this tensor is activated
# (the default value is disabled)
tr_W.requires_grad = True

In [None]:
# The linear combination h is calculated

# Case numpy
np_h = np.dot(np_x, np_W)
print('np_h value: {:.6f}'.format(float(np_h)))

# Case pytorch
tr_h = torch.mm(tr_x, tr_W)
print('tr_h value: {:.6f}'.format(float(tr_h)))

In [None]:
# The predicted output is calculated by applying the activation function
np_yp = sigmoid(np_h)
print('numpy yp value: {:.6f}'.format(float(np_yp)))

tr_yp = sigmoid(tr_h)
print('pytorch yp value: {:.6f}'.format(float(tr_yp)))

In [None]:
# The current correct output is selected.

np_y = np.zeros((1, 1), dtype=np.float32)
print('np_y shape: {}'.format(np_y.shape))

tr_y = torch.from_numpy(np_y)
print('tr_y shape: {}'.format(tr_y.shape))

### Loss functions (lossFcn)

In the previous lab, a loss function was used that calculated the squared error. In this case, both numpy and pytorch will do the same thing; however in pytorch a predefined function will be used.

To use the default loss functions in pytorch see the following [link](https://pytorch.org/docs/stable/nn.html#loss-functions)
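To make the link between the hand-written squared error and the predefined loss explicit, the following sketch compares `nn.MSELoss` with a manual mean of squared differences. The prediction and target values are made up for illustration.

```python
import torch
from torch import nn

pred = torch.tensor([[0.2], [0.9], [0.4]])
target = torch.tensor([[0.0], [1.0], [1.0]])

# Manual computation: mean of the squared differences
manual = torch.mean((pred - target) ** 2)

# Predefined loss; the default reduction is 'mean'
builtin = nn.MSELoss()(pred, target)

# With reduction='sum' the squared differences are added instead of averaged
summed = nn.MSELoss(reduction='sum')(pred, target)

print(manual.item(), builtin.item(), summed.item())
```

With a single sample, as in this lab's step-by-step example, the mean and the plain squared error coincide.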

In [None]:
# Calculation of losses in numpy
# As losses, as was done in the previous practice, the squared differences are
# used. In this case, the scaling factor 2 is eliminated.
np_loss = np.square(np_y - np_yp)
print('loss value: {:.6f}'.format(float(np_loss)))

In [None]:
# Calculation of losses in pytorch

# Import the neural networks module
from torch import nn

# The MSELoss function will be used, which means Mean Squared Error Loss,
# that is, the loss function that calculates the mean squared error. This is
# the same as what was done in numpy in the previous cell
criterion = nn.MSELoss()
tr_loss = criterion(tr_yp, tr_y)
print('loss value: {:.6f}'.format(float(tr_loss)))

### Backpropagation

Now it's time to start backpropagation. As a reminder and as a check of the result, it is done first step by step in numpy and afterwards in pytorch.

In [None]:
# Backpropagation

# Output derivative
np_dyp_dh = np_yp * (1 - np_yp)

# Error term delta
np_delta = -2 * (np_y - np_yp) * np_dyp_dh

# W increments (prior to multiplication by lr)
np_incW = np.dot(np_x.T, np_delta)

print('The W increment is:\n{}'.format(np_incW))

Next, notice how backpropagation in pytorch is much easier.

Previously, the `requires_grad` property of tensor W was enabled. Because of this, pytorch has been recording the operations carried out with it. When the `.backward()` method is applied to the loss, every leaf tensor with `requires_grad` enabled (such as W) that took part in computing that loss gets its gradient calculated and stored in its `.grad` property.
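The mechanism can be checked on a tiny example where the derivative is known analytically. This is a sketch with made-up values, independent of the perceptron above.

```python
import torch

w = torch.tensor([2.0, -1.0], requires_grad=True)   # gradients are tracked
x = torch.tensor([3.0, 4.0])                        # gradients are not tracked

# loss = sum_i (w_i * x_i)^2
loss = torch.sum((w * x) ** 2)
loss.backward()

# Analytically, d(loss)/d(w_i) = 2 * w_i * x_i^2
expected = 2 * w.detach() * x ** 2
print(w.grad, expected)

# Tensors without requires_grad get no gradient
print(x.grad)
```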

In [None]:
# Backpropagation using pytorch
tr_loss.backward()  # Calculates the gradients of the enabled tensors

# Returns the gradient of the tensor tr_W
print('The W increment is:\n{}'.format(tr_W.grad))

Finally, with the calculated gradients, we would just need to update the weights and continue with the training. Later on, it will be seen how to do this automatically.
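For reference, a manual update could look like the sketch below. It rebuilds the same single-dot example so that it is self-contained; the learning rate `lr = 0.1` is an assumed value, and the update is done inside `torch.no_grad()` so that it is not recorded for gradient calculation.

```python
import torch

lr = 0.1  # assumed learning rate, for illustration only

W = torch.tensor([[0.1], [1.0], [-0.6]], requires_grad=True)
x = torch.tensor([[0.5, 0.7, 1.0]])
y = torch.zeros(1, 1)

# Forward pass and gradient calculation
loss_before = torch.mean((torch.sigmoid(x @ W) - y) ** 2)
loss_before.backward()

# Gradient descent step: W <- W - lr * dLoss/dW
with torch.no_grad():
    W -= lr * W.grad

# Gradients accumulate, so they are reset before the next step
W.grad.zero_()

loss_after = torch.mean((torch.sigmoid(x @ W) - y) ** 2)
print(loss_before.item(), loss_after.item())
```

After one step the loss on this dot should be slightly lower than before.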

## Building neural networks in pytorch

Pytorch provides a module for creating neural network type objects. The bases for its use are developed below.

First of all, don't forget that you must have imported the nn module from torch. It is common to directly name it as nn.

The class to be defined will be a child class of the nn.Module class.

At least two methods must be defined:
* The `__init__` method: It defines the different components of the neural network; that is, its layers. When overriding this method, you must call `super().__init__()` so that the parent class is initialized properly.
* The `forward` method: This method is in charge of doing all the calculations of the forward pass.

Let's see a simple example.

In [None]:
# Example of a simple neural network

# First, the class to be created is indicated, which will inherit all its
# attributes from the parent class nn.Module
class Network_1(nn.Module):
    # The __init__ method is defined, which is the one that initializes the
    # current instance of the object
    def __init__(self):
        # Attributes of the parent class are imported
        super().__init__()
        
        # The different types of layers that will be in the neural network are
        # defined as modules. See https://pytorch.org/docs/stable/nn.html
        
        # The first layer is a linear layer; that is, made up of perceptrons.
        # We named it fc for 'fully connected'. It is also called a dense layer.
        # Notice that this layer only does the linear combination; that is, it
        # does not have any activation function.
        self.fc = nn.Linear(in_features=2, out_features=1, bias=True)
        
        # Activation layer. We use the sigmoid function layer.
        self.sigmoid = nn.Sigmoid()
    
    # The method for the forward pass is defined
    def forward(self, x):
        # The inputs will be propagated forward through all the defined layers.
        # The behavior is specified by each function.
        x = self.fc(x)
        x = self.sigmoid(x)
        
        return x

Earlier in this lab, the output of a neural network consisting of a single perceptron was calculated for an input with two coordinates. In that case, the coordinates of dot x were 0.5 and 0.7 respectively (a third coordinate with value 1 was added to emulate the bias), and the weights were 0.1, 1.0 and -0.6 (the last one being the bias). The result obtained at the output was ŷ = 0.537430. Now let's compare it with the behavior of the new network.

In [None]:
# A new object of the class Network_1 is instantiated
model = Network_1()

### Objects exploration

Before we can continue reproducing the previous result, the initial weights have to be entered into the network. Weight initialization will be discussed in detail later; for now we are just going to see how this check could be done quickly.

In general, it is convenient to know how to explore objects in python, since there can be a lot of hidden information in them that can be extracted without having to go looking for the specifications or the source code. If you type `dir(model)` you will get a list of all the methods and properties of the given object. In principle you can skip all the ones that start with an underscore, and go directly to the others. You will see that there is a property called `fc`, which refers to the dense layer that has been defined in the `__init__` method by doing `self.fc = ...`
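As an illustration of this kind of exploration without the helper file, the sketch below filters the output of `dir()` on a standalone linear layer and then lists its parameters with `named_parameters()`, a standard `nn.Module` method. The `layer` object is just an example, not the lab's model.

```python
from torch import nn

layer = nn.Linear(in_features=2, out_features=1)

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(layer) if not name.startswith('_')]
print('weight' in public_names, 'bias' in public_names)

# named_parameters() goes straight to the trainable tensors
for name, param in layer.named_parameters():
    print(name, type(param).__name__, tuple(param.shape))
```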

In [None]:
# In this case, instead of using dir(), you can use a helper function (open
# the helper_PR3.py file for details). This function makes use of 'dir()', but
# it makes it easier to get just the properties or methods of an object.

# print(dir(model))
print(hp.inspect_obj(model)['methods'])

Obviously the weights have to be somewhere in `model.fc`. You can continue exploring it with `dir()` and `type()`, or keep using the helper function.

In [None]:
# Explore the fc layer
# Note how it has a property called weight and another called bias. These are
# the weights we're searching for.

# print(dir(model.fc))
print(hp.inspect_obj(model.fc)['attributes'])

In [None]:
# Determine the type of object for model.fc.weight and model.fc.bias
print(type(model.fc.weight))
print(type(model.fc.bias))

# If possible, print out the contents of model.fc.weight
print(model.fc.weight)

In [None]:
# These are objects of type Parameter. Beyond the specifications
# of this type of object, it is clear that they contain a tensor whose values
# constitute the weights we are looking for. Let's see what properties they have. 
# Some of them will allow us to access the weights.

# print(dir(model.fc.weight))
print(hp.inspect_obj(model.fc.weight)['attributes'])

In [None]:
# You'll see that it has a lot of properties and methods, many of which look
# like tensor methods.

# There is a property called 'data'. This property contains the data we are
# looking for.
print(type(model.fc.weight.data))

# As you can see, it's a tensor
print(model.fc.weight.data)

In [None]:
# Now, since we already know how to create tensors, we are ready to replace
# the given weight tensor with the one we consider appropriate

# A tensor of the required dimensions is created
W = torch.ones_like(model.fc.weight.data, dtype=torch.float32)

# The values to be tested are assigned to it
W[0, 0], W[0, 1] = 0.1, 1.0

# This tensor is assigned to the model weights
model.fc.weight.data = W

In [None]:
# Do the same with the bias
# Create a tensor of the required dimensions
b = torch.ones_like(model.fc.bias.data, dtype=torch.float32)

# Assign the values to be tested
b[0] = -0.6

# Assign this tensor to the model bias
model.fc.bias.data = b

### Checking _forward pass_

At this point you are ready to carry out the test in order to verify that the network works as expected. The expected output is `ŷ = 0.537430`

In [None]:
# The test dot x is defined
x = torch.tensor([[0.5, 0.7]], dtype=torch.float32)

# Forward pass is carried out
output = model.forward(x)

print('Expected output = 0.537430   Real output = {:.6f}'.format(float(output.data)))

At this point you are already sure that the model works perfectly with regard to the forward pass.

### Other ways to define the model

The explained way of defining the neural network model is not the only one that can be done in pytorch. There are other ways to do it that at some point may be convenient, especially when it comes to small networks, as in this case.

The first alternative shown below is to use the activation functions as functions and not as layers (as they were defined in the previous model). The code would be as shown below.

In [None]:
# Simple neural network example defining activation functions with "functional"

class Network_2(nn.Module):
    # The __init__ method is defined, which is the one that initializes the
    # current instance of the object
    def __init__(self):
        # The attributes of the parent class are imported
        super().__init__()
        
        # The linear layer is defined.
        # ACTIVATION LAYERS ARE NOT DEFINED
        self.fc = nn.Linear(in_features=2, out_features=1, bias=True)

    
    # The method for the forward pass is defined
    def forward(self, x):
        # The activation function is applied to the output of the dense layer
        x = torch.sigmoid(self.fc(x))
        
        return x

model_2 = Network_2()

There is a third, even simpler way of defining the model. In this case, a new class is not created explicitly, but the model would be created directly using pytorch tools.

It consists of specifying the different layers directly in the call.

In [None]:
# First of all we import OrderedDict
from collections import OrderedDict

# nn.Sequential is used to specify each layer, being able to refer to each
# one of them by name
model_3 = nn.Sequential(OrderedDict([('fc', nn.Linear(2, 1)),
                                     ('sigmoid', nn.Sigmoid())]))

Note that the above way is much quicker to write for small networks. In fact, it wasn't really necessary to use an ordered dictionary; the definition could be done directly, without reference to layer names. It would be as follows:
```
model = nn.Sequential(nn.Linear(2, 1),
                      nn.Sigmoid())
```

Logically, for any network of a certain size, it is recommended to use the versions that define a new class.
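When no layer names are given, the layers of an `nn.Sequential` can still be reached by position. The sketch below builds such a model under the name `net` (an illustrative name), loads the weights used throughout this lab and reproduces the forward pass.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 1),
                    nn.Sigmoid())

# Layers are accessed by index when no names were given
first = net[0]

# Load the weights used in the lab (0.1, 1.0) and the bias (-0.6)
with torch.no_grad():
    first.weight.copy_(torch.tensor([[0.1, 1.0]]))
    first.bias.copy_(torch.tensor([-0.6]))

out = net(torch.tensor([[0.5, 0.7]]))
print('{:.6f}'.format(out.item()))
```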

Let's see below how all versions work the same.

In [None]:
# Version 1:

W = model.fc.weight.data.clone()
b = model.fc.bias.data.clone()
print(model.forward(x))

In [None]:
# Version 2:

model_2.fc.weight.data = W.clone()
model_2.fc.bias.data = b.clone()
print(model_2.forward(x))

In [None]:
# Version 3:

model_3.fc.weight.data = W.clone()
model_3.fc.bias.data = b.clone()
print(model_3.forward(x))

## Weights initialization

In the previous exercises you have seen that, in order to reproduce the same results, the weights were initialized to certain values. In the networks seen in this course, this initialization may not be very relevant; however, in large networks that take a long time to train, it is desirable to initialize the values in such a way that the training time can be reduced.

In general, it is considered that in a linear network the best initialization is one in which the weights will be close to 0, but without being too small or equal to each other. So, for example, an initialization with all weight values set to zero would not be a good idea.

When defining a neural network with linear layers in pytorch the weights are initialized by default following a given distribution; however, there is the possibility of initializing the values of the weights according to various distributions. This is done by using `nn.init`. See all the options in: https://pytorch.org/docs/1.9.1/nn.init.html?highlight=init


In general, the initialization of weights does not cause problems; however, note that the weights are tensors with the _requires_grad_ property enabled by default. Any manipulation in them once the training has started may have an effect on the calculation of the gradient. Therefore, it is a good practice, if this is the case, to disable the calculation of gradients. This can be done directly or by making the changes within a block `with torch.no_grad():`
```
with torch.no_grad():
    model.fc.weight = nn.Parameter(torch.randn_like(model.fc.weight))
```
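One point worth keeping in mind, sketched below under the assumption that an optimizer has already been created: assigning a new `nn.Parameter` creates a new object that the existing optimizer does not track, whereas an in-place `copy_` inside `torch.no_grad()` keeps the same Parameter object. The `layer` and `opt` names are only illustrative.

```python
import torch
from torch import nn, optim

layer = nn.Linear(2, 1)
opt = optim.SGD(layer.parameters(), lr=0.1)

# The weight Parameter as seen by the optimizer
tracked = opt.param_groups[0]['params'][0]

# In-place change: the Parameter object stays the same
with torch.no_grad():
    layer.weight.copy_(torch.full_like(layer.weight, 0.5))
same_after_copy = layer.weight is tracked
print(same_after_copy)

# Replacement: a new Parameter object that the optimizer does not know about
layer.weight = nn.Parameter(torch.zeros_like(layer.weight))
same_after_replace = layer.weight is tracked
print(same_after_replace)
```

If weights are replaced this way after the optimizer exists, the optimizer would need to be created again.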

In [None]:
# Let's look at the set of weights of the model object
print(model.fc.weight)

In [None]:
# Initialization of values following a uniform distribution
nn.init.uniform_(model.fc.weight.data)

# Notice that this instruction is executed in-place; that is, the tensor
# has been modified
print(model.fc.weight)

In [None]:
# As an example let's see other possible initializations
# Initialization of values following a normal distribution
nn.init.normal_(model.fc.weight.data)
print(model.fc.weight)

# Initialization of all values to a constant
nn.init.constant_(model.fc.weight.data, 0.5)
print(model.fc.weight)

# Initialization of values following a kaiming uniform distribution
nn.init.kaiming_uniform_(model.fc.weight.data)
print(model.fc.weight)

Once you see this, you are ready to answer the following question:

Which distribution does pytorch use by default to initialize linear layers, and with what parameters?

For this, the following [link](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py) is provided, where you can find the source code of the linear layers.

**Answer:**

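It initializes the weights with the `kaiming_uniform_` function, called with `a=math.sqrt(5)`, and the bias with a uniform distribution in the range ±1/√fan_in, where fan_in is the number of input features. You just have to follow the code of the `reset_parameters` method.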
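The default initialization can also be checked empirically. With `a=math.sqrt(5)`, the Kaiming uniform bound works out to 1/√fan_in, so all default weights and biases of a linear layer should fall within that range. The sketch below tests this on a layer with arbitrary sizes.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=100, out_features=50)

# Expected bound for both weights and bias: 1 / sqrt(fan_in)
bound = 1 / math.sqrt(layer.in_features)

w_max = layer.weight.abs().max().item()
b_max = layer.bias.abs().max().item()
print('bound = {:.4f}, max |w| = {:.4f}, max |b| = {:.4f}'.format(bound, w_max, b_max))
```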

On the other hand, you have seen that the functions in `nn.init` are used to initialize the weights of a single layer. In the most common case you will have more than one layer. To initialize all layers you can pass a custom initialization function to `nn.Module.apply`.

In [None]:
# Custom function for initializing weights in linear layers
def init_linear_weights(my_module):
    # Check if it is a linear layer instance
    if isinstance(my_module, nn.Linear):
        torch.nn.init.kaiming_normal_(my_module.weight.data)
        torch.nn.init.constant_(my_module.bias.data, 0.05)

# The way to apply the function would be
model.apply(init_linear_weights)

## The training

At this point it seems that all the ingredients are already there to be able to do a basic training. You already know how to operate with tensors, define neural networks, do the forward pass and the backward pass. We also know how to initialize the weights if required. So let's go.


### Problem definition
To begin with, the idea is to do a basic training. In this case we will try to fit a network so that it can indicate whether a given dot of the plane belongs to one class or another. This is the same problem that was developed in the previous lab.

In [None]:
import importlib
import helper_PR3_0
importlib.reload(helper_PR3_0)

In [None]:
# Creation of a cloud of dots.
# dots_gt is short for dots ground truth
# dots_tst is short for dots test
np.random.seed(0)

# Two np-arrays with shape 3 x n_dots are generated
dots_tst, dots_gt = hp.p_gen3(100)

# Data adaptation (this is to fit the data type)
# They are cast to float32
# dots_gt = dots_gt.astype(dtype=np.float32)
# dots_tst = dots_tst.astype(dtype=np.float32)

# Visualization of the cloud
hp.my_plot(dots_gt, dots_gt)

### The inputs
The input to the system will be a dot tensor. Right now we have the numpy-array called _dots_gt_. It is a matrix with 3 rows and 100 columns. In the first row there are the x coordinates of the dots, in the second row there are the y coordinates, while in the third row there are the labels (0 -> red, 1 -> blue)

In [None]:
# The system inputs will be the x and y coordinates, which correspond to rows 0
# and 1 of the np-array dots
print(dots_gt.shape)

# Let's look at the first few points. Notice how the points are already
# shuffled; that is, not all the reds are together, but red and blue
# points appear interleaved
print(dots_gt[:, :13])

In [None]:
# In order to emulate loading data in batches, we are going to resize the inputs
# considering batch = 20 (the dots will enter in batches of 20 elements)
inputs = torch.from_numpy(dots_gt[:2, :]).reshape(2, 20, -1).permute(2, 1, 0)
labels = torch.from_numpy(dots_gt[2, :]).reshape(1, 20, -1).permute(2, 1, 0)

# There will be 5 batches of 20 points each, with two coordinates per point,
# so that inputs and labels can be iterated over their first dimension
print(inputs.shape)
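To see which points end up in which batch with this `reshape` / `permute` combination, the same operations can be applied to a small toy array. The sketch below uses 12 made-up points and a batch size of 4; `toy` is a hypothetical stand-in for `dots_gt`.

```python
import numpy as np
import torch

n_points, batch_size = 12, 4

# Row 0: x = point index, row 1: y = index + 100, row 2: alternating labels
toy = np.vstack([np.arange(n_points),
                 np.arange(n_points) + 100,
                 np.arange(n_points) % 2]).astype(np.float32)

toy_inputs = torch.from_numpy(toy[:2, :]).reshape(2, batch_size, -1).permute(2, 1, 0)
toy_labels = torch.from_numpy(toy[2, :]).reshape(1, batch_size, -1).permute(2, 1, 0)

print(toy_inputs.shape, toy_labels.shape)   # batches x batch_size x features
print(toy_inputs[0, :, 0])                  # x coordinates of the first batch
print(toy_labels[0, :, 0])                  # labels of the first batch
```

Each batch takes every third point here (indices 0, 3, 6, 9 for the first one), and each point keeps its own coordinates and label.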

### The neural network

At this point we are going to see the behavior of the most basic network possible, the perceptron.

In [None]:
class Network_01(nn.Module):

    def __init__(self):
        super().__init__()
        # Dense layer
        self.fc1 = nn.Linear(in_features=2, out_features=1, bias=True)
        # Activation layer
        self.sigmoid = nn.Sigmoid()


    def forward(self, x):
        x = self.sigmoid(self.fc1(x))
        return x

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
model = Network_01()

### The optimizer

Backpropagation was explained earlier, but the pytorch example ended at the point where the weight gradients were calculated; that is, the algorithm was not finished.

Pytorch has a type of object that is in charge of applying a specific optimization algorithm to the indicated tensors. It also controls a series of additional parameters such as the learning rate or the resetting of gradients. This type of object is the [_optimizer_](https://pytorch.org/docs/1.10.1/optim.html?highlight=optimizer#torch.optim.Optimizer)


In [None]:
from torch import optim

# We will use the optimizer with the SGD algorithm (Stochastic Gradient Descent)
optimizer = optim.SGD(model.parameters(), lr=0.1)

The optimizer controls the first and the last step of each training iteration. The first is to reset the gradients (with `optimizer.zero_grad()`) and the last is to carry out an optimization step that updates the weights (with `optimizer.step()`).

### The loss function

In this case, as seen in the previous examples, we will use the mean squared error.

In [None]:
criterion = nn.MSELoss()

### The algorithm

Let's go with the algorithm. One detail to take into account is that in each training step you have to reset the gradients (set them to zero). This is necessary because pytorch accumulates gradients across calls to `.backward()`, and we only want them to be calculated on the operations of the current step, and not on all the accumulated history.
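The accumulation behavior is easy to observe on a single weight. In this sketch (made-up values), the same gradient is computed twice without resetting, and then once more after resetting.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added
(3 * w).sum().backward()
accumulated = w.grad.clone()

# After resetting, the gradient reflects only the latest pass
w.grad.zero_()
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```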

In [None]:
epochs = 1000

for epoch in range(epochs):
    batch_loss = 0.0
    for Xb, Yb in zip(inputs, labels):
        # Training step
        #_______________________________________________________________________
        # Reset gradients
        optimizer.zero_grad()

        # Forward pass
        Yp = model.forward(Xb)

        # Calculate loss
        loss = criterion(Yp, Yb)

        # Calculate gradients
        loss.backward()

        # Update weights
        optimizer.step()
        #_______________________________________________________________________

        batch_loss += loss.item()
    
    print('Epoch {}: loss = {}'.format(epoch, batch_loss))


In [None]:
# The first test training is already done. Let's see graphically what
# the result is.

# The test dots (first two rows) are picked up and passed to tensors to be able
# to calculate the output of the model
t_test_points = torch.from_numpy(dots_tst[:2, :].T)

# The model outputs are calculated
with torch.no_grad():
    model.eval()
    y_test = model.forward(t_test_points)

# Tensors are transformed into np-arrays for the my_plot function
y_test = y_test.numpy().round().T
test_block = np.concatenate((dots_tst[:2, :], y_test), axis=0)
hp.my_plot(test_block, dots_gt)

What you see above is the result of the fit achieved. The blue area should contain the dark blue dots, while the red area should contain the dark red dots. It is clear that it is not so; this fit is not good, but maybe it's not that bad for a first approximation, without making any consideration about the learning rate, the complexity of the network or the number of epochs to train.

## Final considerations

###  _train_ / _eval_ modes

Pytorch models have a _train_ (training) mode and an _eval_ (evaluation) mode. The difference is that in evaluation mode some layers that only act during training are disabled (for example, the Dropout layers, which have not yet been used in this lab). You will have seen that the evaluation cell above puts the model into evaluation mode, for this very reason. By default, a newly created model is in _train_ mode, but it is good practice to set the mode explicitly.

The way to do it is by doing:
 * `model.train()`  -> Select train mode
 * `model.eval()`   -> Select evaluation mode
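The effect of the two modes can be seen on a small model that includes a Dropout layer. This is only a sketch; the network `net` below is hypothetical and is not used elsewhere in the lab.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

# A newly created model starts in train mode
starts_in_train = net.training
print(starts_in_train)

# In eval mode Dropout is disabled, so repeated calls give the same output
net.eval()
with torch.no_grad():
    out1 = net(x)
    out2 = net(x)
print(torch.equal(out1, out2))

# Back to train mode before continuing training
net.train()
print(net.training)
```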


### Work on CPU/GPU

As mentioned at the beginning of the lab, pytorch tensors can run on a GPU. To do this, all you have to do is move the input data tensors and the model to the GPU, and train there.
Be careful: a tensor that is on the GPU can only be combined with other tensors that are on the GPU, through a model that is on the GPU. The same applies if the tensor is on the CPU. In short, everything must be on the same device:
 
 * model.cuda()  --> Move the model to the GPU (can also be applied to a tensor: tensor.cuda())
 * model.to('cuda')  --> Move the model to the GPU
 * model.cpu()  --> Move the model to the CPU
 * model.to('cpu')  --> Move the model to the CPU
 * torch.cuda.is_available()  --> Returns `True` if a GPU is available

Possibly the most general way to deal with the location of the tensors and the model on one or the other device (CPU or GPU) is as follows:

```
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
inputs = inputs.to(device)  # tensors are not moved in place; keep the returned tensor
labels = labels.to(device)
```
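A device-agnostic forward pass could then look like the following sketch, which runs on either CPU or GPU. The small `net` and `batch` objects are illustrative. Note that `.to()` on a module moves its parameters in place, while `.to()` on a tensor returns a tensor on the target device that must be kept.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Modules: parameters are moved in place (the module itself is returned)
net = nn.Linear(2, 1).to(device)

# Tensors: the result of .to() has to be assigned
batch = torch.rand(5, 2)
batch = batch.to(device)

out = net(batch)
print(out.device)

# To use the result with numpy it must be detached and brought back to the CPU
result = out.detach().cpu().numpy()
print(result.shape)
```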

In [None]:
# Define the global variable 'DEVICE' to be used where appropriate
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(DEVICE)