# 1. PyTorch AI Summer

In this article, we give a brief introduction to the PyTorch framework and discuss its two main features: `torch.Tensor` and `autograd`.

# 1.1 Introduction to PyTorch


PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.
Torch is an open-source machine learning library, a scientific computing framework, and a scripting language based on the Lua programming language. It provides a wide range of algorithms for deep learning, and uses the scripting language LuaJIT, and an underlying C implementation. As of 2018, Torch is no longer in active development. PyTorch is primarily developed by Facebook's AI Research lab (FAIR). It is free and open-source software released under the Modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.

Many deep learning libraries are built on top of PyTorch, including Uber's Pyro, HuggingFace's Transformers, and Catalyst.

PyTorch provides two high-level features:

  - Tensor computing (like NumPy) with strong acceleration via graphics processing units (GPUs)
  - Deep neural networks built on an automatic differentiation system


PyTorch and Torch use the same high-performance C libraries (TH, THC, THNN, THCUNN), and they will continue to share these libraries.

In other words, PyTorch and Torch share the same low-level backend and differ only in the language of the front end built on top of it.
You can reuse your favourite Python packages such as NumPy, SciPy and Cython to extend PyTorch when needed.

- [GitHub](https://github.com/pytorch/pytorch)
- [WebPage](https://pytorch.org/)

<!---
## 1.1.1 Compare PyTorch and Tensorflow
There is no such thing as what's the best framework, but only which is more suitable. [This article](https://zhuanlan.zhihu.com/p/28636490) has a simple comparison, so I won’t go into details here.
And the technology is developed, and the comparison is not absolute.
For example, Tensorflow introduced the Eager Execution mechanism to implement dynamic graphs in version 1.5, PyTorch visualization, windows support, and tensor flips along the dimension have all been issues. Not a problem.
-->

- PyTorch is a simple, elegant, efficient and fast framework.
- Its design favors minimal wrapping and tries to avoid reinventing the wheel.
- It has arguably the most elegant object-oriented design of any framework, and the design closely matches how people think. It lets users focus on implementing their own ideas as much as possible.
- Strong backing: much as Google backs TensorFlow, FAIR's support is enough to ensure PyTorch keeps receiving development updates.
- Good documentation (compared to other FB projects, PyTorch's documentation is almost perfect), and a forum maintained by the PyTorch authors themselves where users can communicate and ask questions.
- Easy to get started with machine / deep learning.

So if any of the above appeals to you, be sure to read this article to the end.

# PyTorch Tensor operations
<a href="https://colab.research.google.com/github/iliasprc/pytorch-tutorials/blob/master/chapter1/2_autograd_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



Getting Started
---------------
Now we will learn how to create Tensors as well as the main functions and operations.

In [1]:
from __future__ import print_function
%matplotlib inline
import torch

## Tensors basic operations
Tensors are very similar to NumPy's ``ndarray`` objects, except that they can also be placed on a GPU device to accelerate computing.
The following figure visualizes tensors with different numbers of dimensions.

![tensor](images/Pytorch_book_chapter1_tensor.jpg)


To construct an uninitialized 5x3 matrix, use the ``torch.empty(5, 3)`` function. Its values are whatever happened to be in the allocated memory, so they are not guaranteed to be zero.

To find a tensor's dimensions, use ``tensor.shape`` or ``tensor.size()``; both return ``torch.Size([5, 3])`` (``torch.Size`` is in fact a tuple, so it supports all tuple operations).
Use ``tensor.dtype`` to find the tensor's data type, e.g. ``torch.float32`` or ``torch.int64``.



In [2]:
x = torch.empty(5, 3)
print("{}\n Tensor shape = {}\n Data type = {} ".format(x,x.shape,x.dtype))

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
 Tensor shape = torch.Size([5, 3])
 Data type = torch.float32 
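
Because ``torch.Size`` behaves like a tuple, the usual tuple idioms apply to it. The snippet below is a small illustrative sketch (the variable names `rows` and `cols` are ours, not part of the original notebook) that inspects only the metadata of an uninitialized tensor:

```python
import torch

x = torch.empty(5, 3)            # values are uninitialized, so we only inspect metadata

rows, cols = x.shape             # torch.Size unpacks like a tuple
print(rows, cols)                # 5 3
print(x.size(0), x.shape[-1])    # 5 3
print(x.numel())                 # 15 elements in total
print(x.dtype)                   # torch.float32, assuming the default dtype was not changed
print(tuple(x.shape) == (5, 3))  # True
```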


To create a 2D matrix filled with random values drawn uniformly from [0, 1), use ``torch.rand()``:



In [3]:
x = torch.rand(5, 3)

print(x)

tensor([[0.4214, 0.1018, 0.3309],
        [0.6035, 0.1967, 0.5448],
        [0.1855, 0.5156, 0.5147],
        [0.7176, 0.1810, 0.7158],
        [0.4004, 0.2005, 0.9869]])


You can also construct a matrix filled with zeros:



In [4]:
x = torch.zeros(5, 3)
print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


If you have a list of numbers, you can also convert it directly to a tensor as follows:

In [5]:
x = torch.tensor([5.5, 3])
print(x)

tensor([5.5000, 3.0000])


Now we'll see how to create a tensor based on an existing tensor. These methods
reuse properties of the input tensor, e.g. `dtype`, unless
new values are provided by the user:


In [6]:
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)
x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)               # result has the same size



tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 1.0079,  1.8776, -0.3918],
        [ 1.4096, -1.5037, -2.3139],
        [ 0.8209, -1.0360,  0.8332],
        [ 1.8210, -2.3557,  2.0603],
        [-0.5113, -0.0757,  1.6113]])



## Operations
Tensors support arithmetic and logical operations similar to NumPy's.
There are multiple syntaxes for those operations. In the following
example, we will take a look at the addition operation.

The first syntax for matrix addition is `x + y`:

![](images/chapter_1_1_tensors_addition.png)


In [7]:
y = torch.rand(5, 3)
c = x + y
print("{}\n Tensor shape = {}\n Data type = {} ".format(c,c.shape,c.dtype))

tensor([[ 1.0510,  2.4556,  0.0803],
        [ 1.6055, -1.4762, -2.1654],
        [ 1.7307, -0.0573,  1.5550],
        [ 2.7760, -1.5332,  2.8199],
        [-0.3677,  0.1496,  1.7654]])
 Tensor shape = torch.Size([5, 3])
 Data type = torch.float32 


An alternative command is `torch.add(x,y)`:

In [8]:
c = torch.add(x, y)
print(c)

tensor([[ 1.0510,  2.4556,  0.0803],
        [ 1.6055, -1.4762, -2.1654],
        [ 1.7307, -0.0573,  1.5550],
        [ 2.7760, -1.5332,  2.8199],
        [-0.3677,  0.1496,  1.7654]])


or you can provide an output tensor as an argument:

In [9]:
c = torch.empty(5, 3)
torch.add(x, y, out=c)

print(c)

tensor([[ 1.0510,  2.4556,  0.0803],
        [ 1.6055, -1.4762, -2.1654],
        [ 1.7307, -0.0573,  1.5550],
        [ 2.7760, -1.5332,  2.8199],
        [-0.3677,  0.1496,  1.7654]])


In addition, one can do in-place operations.



In [10]:
# adds x to y
y.add_(x)
print(y)

tensor([[ 1.0510,  2.4556,  0.0803],
        [ 1.6055, -1.4762, -2.1654],
        [ 1.7307, -0.0573,  1.5550],
        [ 2.7760, -1.5332,  2.8199],
        [-0.3677,  0.1496,  1.7654]])
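
To make the difference between out-of-place and in-place addition concrete, here is a minimal sketch with small, fixed tensors (the names `z` and `same` are illustrative only):

```python
import torch

x = torch.ones(2, 2)
y = torch.full((2, 2), 3.0)

z = y.add(x)        # out-of-place: returns a new tensor, y is unchanged
print(y[0, 0].item(), z[0, 0].item())   # 3.0 4.0

same = y.add_(x)    # in-place: y itself is modified and returned
print(y[0, 0].item())                   # 4.0
print(same is y)                        # True
```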


Note: Any operation that mutates a tensor in-place has a trailing ``_`` in its name.
For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.

Now, let's see how to retrieve elements of a Tensor. It's very simple: you can use standard NumPy-like indexing to access any element or slice of the tensor.



In [11]:
print(x[:, 1])
print(x[0,0])
print(x[0:2,:-1])

tensor([ 1.8776, -1.5037, -1.0360, -2.3557, -0.0757])
tensor(1.0079)
tensor([[ 1.0079,  1.8776],
        [ 1.4096, -1.5037]])
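
The random values above make the slices hard to check by eye. The following sketch repeats the same kind of indexing on a small tensor with known contents, and also shows that basic indexing returns a view onto the original data (the names `t`, `last_col`, `big` and `row` are illustrative):

```python
import torch

t = torch.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

print(t[1])              # second row: tensor([4, 5, 6, 7])
last_col = t[:, -1]      # last column
print(last_col)          # tensor([ 3,  7, 11])
big = t[t > 8]           # boolean mask selects matching elements (as a copy)
print(big)               # tensor([ 9, 10, 11])

row = t[0]               # basic indexing returns a view, not a copy
row[0] = 100             # so writing through it changes t as well
print(t[0, 0])           # tensor(100)
```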


If you have a one-element tensor, you can use ``.item()`` to get the value as a
Python number.



In [12]:
x = torch.randn(1)
x.item()

0.3687810003757477

Create a tensor holding the integers from 0 to N-1 using `torch.arange(N)`.

![](images/chapter_1_1_tensors_range.png)

In [13]:
x = torch.arange(10)
x

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
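
`torch.arange` also accepts explicit `start`, `end` and `step` arguments, with `end` always excluded. A short sketch (variable names are ours):

```python
import torch

r1 = torch.arange(5)            # 0 up to, but not including, 5
r2 = torch.arange(2, 10, 3)     # start=2, end=10, step=3
r3 = torch.arange(0, 1, 0.25)   # a floating-point step gives a float tensor

print(r1)         # tensor([0, 1, 2, 3, 4])
print(r2)         # tensor([2, 5, 8])
print(r3)         # tensor([0.0000, 0.2500, 0.5000, 0.7500])
print(r1.dtype)   # torch.int64 for integer arguments
```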

Create an identity matrix with `torch.eye`:

![](images/chapter_1_1_tensors_identity.png)

In [14]:
x = torch.eye(3, 3)
x

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])



NumPy Conversion
------------

Converting a Torch Tensor to a NumPy array and vice versa is very easy.

A CPU Torch Tensor and the NumPy array created from it share their underlying memory
locations, so changing one will change the other.

Converting a Torch Tensor to a NumPy array:




In [15]:
a = torch.ones(5)
print(a)
b = a.numpy()
print(b)

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]


See how the NumPy array ``b`` changes its values after an in-place change to ``a``:



In [16]:
a.add_(1)
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


Converting NumPy arrays to Torch Tensors:

See how changing the NumPy array changes the Torch Tensor automatically.



In [17]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)


All tensors on the CPU except a ``CharTensor`` support converting to
NumPy and back.
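
Shared memory is convenient, but sometimes you want an independent copy. The sketch below contrasts the sharing conversions with two ways of copying, `clone()` before `.numpy()` and `torch.tensor(...)` instead of `torch.from_numpy(...)`; the variable names are illustrative:

```python
import numpy as np
import torch

a = torch.ones(3)
shared = a.numpy()              # shares memory with a
copied = a.clone().numpy()      # clone first to get an independent copy

a.add_(1)
print(shared)                   # [2. 2. 2.]  (follows a)
print(copied)                   # [1. 1. 1.]  (unaffected)

n = np.zeros(3)
t_shared = torch.from_numpy(n)  # shares memory with n
t_copied = torch.tensor(n)      # torch.tensor copies its input
n += 5
print(t_shared)                 # tensor([5., 5., 5.], dtype=torch.float64)
print(t_copied)                 # tensor([0., 0., 0.], dtype=torch.float64)
```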

CUDA Tensors
------------

Tensors can be moved onto any device using the ``.to()`` method.

Let's run the following cell to check whether a CUDA-capable device is available.
Then we will use ``torch.device`` objects to move tensors in and out of the GPU.

In [18]:

if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to()`` can also change dtype together!

tensor([[2., 1., 1.],
        [1., 2., 1.],
        [1., 1., 2.]], device='cuda:0')
tensor([[2., 1., 1.],
        [1., 2., 1.],
        [1., 1., 2.]], dtype=torch.float64)


You can also use ``.cpu()`` and ``.cuda()`` to transfer tensors between CPU and GPU memory.
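
A common way to write code that runs both with and without a GPU is to pick the device once and pass it around. This is a minimal sketch of that pattern, not the only way to do it; it falls back to the CPU when CUDA is not available:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.eye(3, device=device)   # created directly on the chosen device
y = torch.ones(3, 3).to(device)   # created on the CPU, then moved
z = x + y                         # both operands must live on the same device

print(z.device.type)              # "cuda" or "cpu", depending on the machine
result = z.cpu().numpy()          # NumPy conversion requires a CPU tensor
print(result)
```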


That was the first tutorial on the basic functions of Tensors.
In the next section, we will learn about
PyTorch's automatic differentiation package, `autograd`.

For more Tensor operations, including transposing, indexing, slicing,
mathematical operations, linear algebra, random numbers, etc.,
check the following link: https://pytorch.org/docs/torch


# Autograd: Automatic Differentiation

The main advantage of the PyTorch framework is the ``autograd`` package.
Let's briefly describe it here; we will learn how to train our
first neural network in the following articles.
The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that backpropagation is
defined by how your code is run, and that every single iteration can be
different.
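
As a small illustration of define-by-run, the sketch below uses ordinary Python control flow inside a hypothetical function `f`; which branch runs decides which operations are recorded, and therefore which derivative autograd computes:

```python
import torch

def f(x):
    # Ordinary Python control flow decides which operations are recorded.
    if x.item() > 0:
        return x * x      # derivative: 2x
    return -3 * x         # derivative: -3

grads = []
for value in (2.0, -2.0):
    x = torch.tensor(value, requires_grad=True)
    f(x).backward()       # the graph is rebuilt on every call
    grads.append(x.grad.item())

print(grads)              # [4.0, -3.0]
```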

Let's look at this in more detail with some examples.

Tensor
--------

``torch.Tensor`` is the main class of the package. If you set its attribute
``.requires_grad`` to ``True``, autograd starts to track all operations on it. When
you finish your computation you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor will be
accumulated into its ``.grad`` attribute.
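
The word "accumulated" matters: each call to ``.backward()`` adds to the stored gradient instead of overwriting it. A minimal sketch with a scalar `w` (an illustrative name) shows the effect and how to reset the gradient:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
history = []

(2 * w).backward()
history.append(w.grad.item())   # 2.0

(2 * w).backward()              # a second backward pass adds to the stored gradient
history.append(w.grad.item())   # 4.0

w.grad.zero_()                  # reset before the next, independent computation
(2 * w).backward()
history.append(w.grad.item())   # 2.0

print(history)                  # [2.0, 4.0, 2.0]
```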

To stop a tensor from tracking its history, you can call ``.detach()`` to detach
it from the computation history and to prevent future computation from being
tracked. To prevent history tracking (and save memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model, because the model may have trainable parameters with `requires_grad=True`
whose gradients we don't need.

There's one more class that is very important for the autograd
implementation: ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has
a ``.grad_fn`` attribute that references the ``Function`` that created
the ``Tensor`` (except for Tensors created by the user, whose
``grad_fn is None``).

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don't need to specify any arguments to ``backward()``.
However, if the tensor has more elements, you need to specify a ``gradient``
argument that is a tensor of the same shape as the tensor you call ``backward()`` on.



In [19]:
import torch
torch.manual_seed(0)

<torch._C.Generator at 0x7f4ca9cb7930>

Then, create a tensor and set `requires_grad=True` to track computation history.



In [20]:
x = torch.ones(2, 2, requires_grad=True)
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Now, let's do an operation with the created tensor:

In [21]:

y = 2 * x
y

tensor([[2., 2.],
        [2., 2.]], grad_fn=<MulBackward0>)

``y`` was created as the result of a multiplication, so it has a ``grad_fn`` attribute that references the function that produced it.
Here ``grad_fn`` is a ``MulBackward0`` object, which confirms that the operation was a multiplication:


In [22]:
y.grad_fn

<MulBackward0 at 0x7f4c18c94828>

You can do more operations on ``y`` and the tensors will still track the history
of those operations:


In [23]:
z = 3 * y * y
out  = z.mean()
out

tensor(12., grad_fn=<MeanBackward0>)

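``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A tensor created without ``requires_grad=True`` has the flag set to ``False``;
calling ``.requires_grad_()`` without an argument sets it to ``True``. We use a fresh tensor ``a``
here so that ``x`` keeps tracking history for the backward pass below.


In [24]:
a = torch.ones(2, 2)
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)

False
True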

Gradients
---------
Let's backpropagate now.
Because ``out`` contains a single scalar, ``out.backward()`` is
equivalent to ``out.backward(torch.tensor(1.))``.



In [25]:
out.backward()

Now let's print the gradients $\frac{d(out)}{dx}$


In [26]:
x.grad

You should get a matrix filled with ``6``. Let's call the ``out``
*Tensor* "$o$".
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(2x_i)^2 = 12x_i^2$ and $z_i\bigr\rvert_{x_i=1} = 12$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{1}{4} \cdot 24x_i = 6x_i$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = 6$.
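
The derivation above can be checked with a short, self-contained script. This sketch rebuilds the same computation from scratch in a single expression, so it does not depend on the state of the earlier cells:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (2 * x) ** 2).mean()   # o = (1/4) * sum_i 3 * (2 * x_i)^2
out.backward()

print(out.item())   # 12.0
print(x.grad)       # every entry equals 6 * x_i = 6
```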


You can also supply your own upstream gradients by passing them to `backward()`, which is required when the output is not a scalar:

In [27]:
x = torch.randn(3, requires_grad=True)

y = x * 2

y

gradients = torch.randn([3], dtype=torch.float)
y.backward(gradients)
x.grad

tensor([ 1.1369, -2.1690, -2.7972])
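
With random inputs it is hard to see what the ``gradient`` argument does. In the sketch below the values are fixed: since `y = 2 * x`, the Jacobian is 2 times the identity, so the resulting `x.grad` is simply twice the vector `v` passed to `backward()` (the name `v` is ours):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2                            # Jacobian dy/dx is 2 * identity

v = torch.tensor([0.1, 1.0, 10.0])   # one weight per component of y
y.backward(v)                        # computes the vector-Jacobian product v^T J

print(x.grad)                        # equals 2 * v
print(torch.allclose(x.grad, 2 * v)) # True
```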

If you no longer want autograd to track the history of operations on tensors
with ``requires_grad=True``, wrap the code block in
``with torch.no_grad():`` or use ``.detach()`` to get a tensor with the same content that is removed from the computation graph:



In [28]:
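x = torch.randn(3, requires_grad=True)
x.requires_grad               # True
(x ** 2).requires_grad        # True: the result tracks history

with torch.no_grad():
    (x ** 2).requires_grad    # False inside the no_grad block

x = x.detach()
(x ** 2).requires_grad        # False: the detached x no longer tracks history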

False
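
A notebook cell only displays its last expression, so the cell above shows a single ``False``. The following sketch prints every case explicitly; the names `tracked`, `untracked` and `detached` are illustrative:

```python
import torch

x = torch.ones(3, requires_grad=True)

tracked = x ** 2
with torch.no_grad():
    untracked = x ** 2           # same values, but no graph is recorded

detached = x.detach()            # shares data with x, never tracks history

print(tracked.requires_grad)     # True
print(untracked.requires_grad)   # False
print(untracked.grad_fn)         # None
print(detached.requires_grad)    # False
print(x.requires_grad)           # True: x itself is unchanged
```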

Next, we will calculate the gradients of the equations shown in the following figure.
All partial derivatives have been calculated using the chain rule and are also illustrated in the figure.

![derivatives](images/chapter1_autograd.png)

Now let's check with ``autograd`` whether we calculated all the derivatives correctly.

In [29]:

a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
c = torch.tensor([3.], requires_grad=True)

y = b*c
u = y+a
J = (u*u).sum()

J.backward()

for i in [y,u,a,b,c]:
    print(i.grad)


None
None
tensor([14.])
tensor([42.])
tensor([28.])
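
The two ``None`` values appear because ``y`` and ``u`` are intermediate (non-leaf) tensors, and autograd does not keep their gradients by default. If you need them, one option is to call ``retain_grad()`` before the backward pass, as in this sketch of the same computation:

```python
import torch

a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
c = torch.tensor([3.], requires_grad=True)

y = b * c
u = y + a
y.retain_grad()     # ask autograd to keep the gradients of these
u.retain_grad()     # intermediate (non-leaf) tensors
J = (u * u).sum()
J.backward()

print(u.grad)       # dJ/du = 2u = tensor([14.])
print(y.grad)       # dJ/dy = dJ/du * du/dy = tensor([14.])
```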




Let's look at the computational graph in more detail with the following code:

In [30]:
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x * y
z.backward()
# Display the autograd-related attributes of each tensor
for i, name in zip([x, y, z], "xyz"):
    print(f"{name}\ndata: {i.data}\nrequires_grad: {i.requires_grad}\n\
grad: {i.grad}\ngrad_fn: {i.grad_fn}\nis_leaf: {i.is_leaf}\n")

x
data: 1.0
requires_grad: True
grad: 2.0
grad_fn: None
is_leaf: True

y
data: 2.0
requires_grad: True
grad: 1.0
grad_fn: None
is_leaf: True

z
data: 2.0
requires_grad: True
grad: None
grad_fn: <MulBackward0 object at 0x7f4c18c82cf8>
is_leaf: False



  


## Graph Visualization with Tensorboard


Let's visualize the graph of a simple linear model.
We used TensorBoard to produce the computation graph shown in the following figure.

![](images/autograd_chapter2_linear_graph.png)


In [31]:
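from torch.utils.tensorboard import SummaryWriter

class Y(torch.nn.Module):
    def __init__(self):
        super(Y, self).__init__()
        self.y = torch.nn.Linear(3, 3)   # a single fully connected layer

    def forward(self, x):
        out = self.y(x)
        return out

x = torch.randn(3, requires_grad=True)

m = Y()
writer = SummaryWriter()    # writes event files under ./runs/ by default
writer.add_graph(m, x)      # trace the model using x as an example input
writer.close()              # flush the event file to disk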


In this section, we described the main advantages of PyTorch's ``autograd``.

**Read Later:**
For more information, read the documentation of ``autograd`` and ``Function`` at:
- https://pytorch.org/docs/autograd
- https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/


In the next article, we will build and train our first neural network with PyTorch.