
In [None]:
import torch

In [None]:
x = torch.Tensor(10).random_(0,10)

In [None]:
x.to("cuda")

In [None]:
x.to("cpu")

In [None]:
torch.cuda.is_available()

In [None]:
tensor_1 = torch.tensor([2,6,1,0,9])
tensor_2 = torch.tensor([[8,10,2,3,7],[2,5,67,3,1]])

In [None]:
tensor_1.shape

In [None]:
tensor_2.shape

In [None]:
tensor = torch.tensor([7,3,0,12,9]).cuda()

On a GPU-enabled machine, appending .cuda() as in the cell above creates the tensor on the GPU. On a machine without a GPU, this call raises an error.
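
Because .cuda() only works when a GPU is present, a device-agnostic variant can be convenient. The following is a minimal sketch, assuming an illustrative variable named `device` that is not defined elsewhere in this notebook:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the chosen device
tensor = torch.tensor([7, 3, 0, 12, 9], device=device)
print(tensor.device)
```

The same cell then runs unchanged on machines with and without a GPU.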

In [None]:
example_1 = torch.randn(3,3)
example_2 = torch.randint(low=0, high=2, size=(3,3)).type(torch.FloatTensor)

Creating dummy data using PyTorch tensors is fairly simple, similar to what you would do in NumPy. For instance, torch.randn() returns a tensor of the given shape filled with random numbers drawn from a standard normal distribution, while torch.randint() returns a tensor of the given shape filled with random integers between a minimum (inclusive) and a maximum (exclusive) value, as in the code above.

As can be seen, example_1 is a two-dimensional tensor filled with random numbers, with each dimension of size equal to 3, while example_2 is a two-dimensional tensor filled with 0s and 1s (the high parameter is upper-bound exclusive), with each dimension's size equal to 3.

Tensors filled with integers generally need to be converted to floats before being fed to a PyTorch model, because layers such as nn.Linear expect floating-point inputs (that's why we used torch.FloatTensor).
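
The conversion can be checked by inspecting the dtype attribute before and after. This is a small sketch; the names `ints` and `floats` are illustrative, and .float() is used here as a shorthand for the .type(torch.FloatTensor) call shown earlier:

```python
import torch

# randint produces 64-bit integers by default
ints = torch.randint(low=0, high=2, size=(3, 3))
print(ints.dtype)

# .float() converts the tensor to 32-bit floating point
floats = ints.float()
print(floats.dtype)
```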

**The PyTorch autograd Library**

The autograd library implements a technique called automatic differentiation. Its purpose is to calculate the derivatives of a function automatically, by recording the operations applied to tensors and applying the chain rule to them. This is crucial for a concept we will learn about in the next chapter called backward propagation, which is carried out while training a neural network.

The derivative of a function measures how quickly its output changes in response to a small change in its input; for a function of several variables, the collection of these derivatives is called the gradient. In deep learning, gradients determine the direction and magnitude of the update applied to the parameters of the neural network in a training step in order to minimize the loss function.

In [None]:
a = torch.tensor([5.0, 8.9], requires_grad=True)
b = torch.tensor([9.2, 1.0])

ab = ((a + b)**2).sum()
ab.backward()

In the preceding code, two tensors were created. The requires_grad argument tells PyTorch to track the operations on a tensor so that its gradients can be calculated. When building a neural network with the nn module, this argument does not need to be set by hand, because the parameters of the layers track gradients by default.

Next, a function was defined using the values of both tensors. Finally, the backward() function was used to calculate the gradients.

By printing the gradients for both a and b, it is possible to confirm that they were only calculated for the first variable (a). For the second one (b), b.grad is None, so accessing b.grad.data raises an AttributeError:

In [None]:
print(a.grad.data)

In [None]:
print(b.grad.data)
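
For this particular function, the result can be verified by hand: the derivative of ((a + b)**2).sum() with respect to a is 2 * (a + b). The sketch below repeats the computation and compares it with that expression; the name `expected` is illustrative:

```python
import torch

a = torch.tensor([5.0, 8.9], requires_grad=True)
b = torch.tensor([9.2, 1.0])

ab = ((a + b) ** 2).sum()
ab.backward()

# Analytical gradient of sum((a + b)^2) with respect to a
expected = 2 * (a + b).detach()

print(a.grad)
print(torch.allclose(a.grad, expected))
print(b.grad)  # b does not track gradients, so this is None
```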

The autograd library alone can be used to build simple neural networks, considering that the trickier part (the calculation of gradients) has been taken care of. However, defining and updating every parameter by hand quickly becomes tedious, hence the introduction of the nn module.

The nn module is a complete PyTorch module used to create and train neural networks, and its building blocks support both simple and complex architectures. For instance, the Sequential() container allows for the easy creation of network architectures that follow a sequence of predefined modules (or layers) without the need for much knowledge of defining network architectures.

This module also has the capability to define the loss function to evaluate the model and many more advanced features that will be discussed in this course.

The process of building a neural network architecture as a sequence of predefined modules can be achieved in just a couple of lines, as shown below:

In [None]:
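import torch.nn as nn

#Illustrative layer sizes (assumed values; adjust them to your data)
input_units = 10
hidden_units = 5
output_units = 1

model = nn.Sequential(nn.Linear(input_units, hidden_units),
                      nn.ReLU(),
                      nn.Linear(hidden_units, output_units),
                      nn.Sigmoid())
loss_funct = nn.MSELoss()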

First, the module is imported, and then the model architecture is defined. input_units refers to the number of features that the input data contains, hidden_units refers to the number of nodes of the hidden layer, and output_units refers to the number of nodes of the output layer.

As can be seen in the preceding code, the architecture of the network contains one hidden layer, followed by a ReLU activation function and an output layer, followed by a sigmoid activation function, making it a two-layer network.

Finally, the loss function is defined as the Mean Squared Error (MSE).
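
To make the role of each size concrete, the following sketch passes a batch of random data through such a network and counts its parameters. The sizes (10, 5, 1), the batch of 4 samples, and the names `two_layer_model`, `batch`, and `n_params` are assumptions chosen for illustration:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not values from the text)
input_units, hidden_units, output_units = 10, 5, 1

two_layer_model = nn.Sequential(nn.Linear(input_units, hidden_units),
                                nn.ReLU(),
                                nn.Linear(hidden_units, output_units),
                                nn.Sigmoid())

# A batch of 4 random samples with 10 features each
batch = torch.randn(4, input_units)
output = two_layer_model(batch)

# One prediction per sample, squashed into [0, 1] by the sigmoid
print(output.shape)

# Weights and biases of both layers: (10*5 + 5) + (5*1 + 1)
n_params = sum(p.numel() for p in two_layer_model.parameters())
print(n_params)
```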

Note: The most popular loss functions for different data problems will be explained throughout this course. To create models that do not follow a sequence of existing modules, custom nn modules are used. We'll introduce these later in this course.

In [None]:
input_units = 10
output_units = 1

model = nn.Sequential(nn.Linear(input_units, output_units),
                      nn.Sigmoid())

print(model)

In [None]:
#Define the loss function as the MSE and store it in a variable named loss_funct
loss_funct = nn.MSELoss()
print(loss_funct)

The cells above create a single-layer network architecture and its loss function.

**The PyTorch optim Package**

The optim package is used to define the optimizer that will be used to update the parameters in each iteration (which will be further explained in the following chapters) using the gradients calculated by the autograd module. Here, it is possible to choose from different optimization algorithms, such as Adam, Stochastic Gradient Descent (SGD), and Root Mean Square Propagation (RMSprop), among others.

In [None]:
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Here, the model.parameters() argument refers to the weights and biases from the model that were previously created, while lr refers to the learning rate, which was set to 0.01.

Weights are the values that determine how much importance each input has for a given neuron. This means that every input to a neuron has an accompanying weight. Moreover, the bias is similar to the intercept that’s added to a linear function and is used to shift the output of the neuron's computation.
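
The weights and bias of a layer can be inspected directly. The sketch below uses a small nn.Linear layer with illustrative sizes (3 inputs, 2 neurons) and reproduces its output by hand; the names `layer`, `sample`, and `manual` are assumptions for this example only:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)  # 3 inputs, 2 neurons (illustrative sizes)

print(layer.weight.shape)  # one weight per input for each neuron
print(layer.bias.shape)    # one bias per neuron

sample = torch.randn(1, 3)

# The layer computes sample @ W^T + b
manual = sample @ layer.weight.T + layer.bias
print(torch.allclose(layer(sample), manual, atol=1e-6))
```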

The learning rate is a hyperparameter that’s used in optimization processes to determine the size of the steps taken toward minimizing the loss function.
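
For plain SGD (without momentum or weight decay), a single update amounts to subtracting the gradient scaled by the learning rate from each parameter. The following sketch compares that manual computation with one call to the optimizer; the tensor `w`, the toy loss, and the names `sgd` and `manual` are assumptions for illustration:

```python
import torch

lr = 0.01  # same learning rate value as used with the optimizer above

w = torch.tensor([1.0, -2.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()
loss.backward()  # the gradient of this toy loss is 2 * w

# What a plain SGD step computes: w - lr * grad
manual = (w - lr * w.grad).detach().clone()

# Let the optimizer perform the same update in place
sgd.step()
print(torch.allclose(w.detach(), manual))
```

A larger learning rate would move w further in a single step, while a smaller one would move it less.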

Next, the process of running the optimization for 100 iterations is shown below. As you can see, it uses the model created by the nn module and the gradients calculated by the autograd library.

In [None]:
#Dummy data: 20 samples with 10 features each and binary targets
x = torch.randn(20, 10)
y = torch.randint(0, 2, (20, 1)).type(torch.FloatTensor)

for i in range(100):
  #Call the model to perform a prediction
  y_pred = model(x)

  #Calculate the loss based on y_pred and y
  loss = loss_funct(y_pred, y)

  #Zero the gradients so that the previous ones do not accumulate
  optimizer.zero_grad()

  #Calculate the gradients of the loss function
  loss.backward()

  #Call the optimizer to update the parameters
  optimizer.step()

For each iteration, the model is called to obtain a prediction (y_pred). This prediction and the ground truth values (y) are fed to the loss function in order to measure how well the model approximates the ground truth.

Next, the gradients are zeroed, and the gradients of the loss function are calculated using the backward() function.

Finally, the step() function is called to update the weights and biases based on the optimization algorithm and the gradients calculated previously.
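
The reason for zeroing the gradients is that backward() adds to any gradient already stored in a tensor instead of overwriting it. A minimal sketch of this behavior, using an illustrative tensor `w` and a toy function rather than a real model:

```python
import torch

w = torch.tensor([3.0], requires_grad=True)

# First backward pass: d(w^2)/dw = 2 * w = 6
(w ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the gradients add up
(w ** 2).sum().backward()
second = w.grad.clone()

# Reset the stored gradient by hand, then run backward again
w.grad = None
(w ** 2).sum().backward()
third = w.grad.clone()

print(first, second, third)
```

In a training loop, optimizer.zero_grad() performs this reset for all of the parameters handled by the optimizer.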

Next, we will learn how to train the single-layer network from the previous exercise, using PyTorch's optim package. Considering that we will use dummy data as input, training the network won't solve a data problem, but it will be performed for learning purposes.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

#Let's create dummy data, x being the input and y being the output
x = torch.randn(20, 10)
y = torch.randint(0, 2, (20, 1)).type(torch.FloatTensor)

input_units = 10
output_units = 1

model = nn.Sequential(nn.Linear(input_units, output_units),
                      nn.Sigmoid())


#Define the loss function as the MSE
loss_funct = nn.MSELoss()

#We are going to use the Adam optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)


losses = []
for i in range(20):
  y_pred = model(x)

  loss = loss_funct(y_pred, y)

  losses.append(loss.item())

  optimizer.zero_grad()

  loss.backward()

  optimizer.step()

  if i%5 == 0:
    print(i, loss.item())


In [None]:
%matplotlib inline
plt.plot(range(0, 20), losses)
plt.show()