# PyTorch part 1 - tensor and PyTorch tensor
> This notebook covers a fundamental building block of deep learning, the tensor, and gives an overview of PyTorch as well.
- toc: true 
- badges: true
- comments: true
- categories: [self-taught]
- image: images/pytorch_ava.png

## Section 1: Introducing Pytorch

PyTorch is a `deep learning framework` and a `scientific computing package`.   
The scientific computing aspect of PyTorch is primarily a result of PyTorch’s tensor library and associated tensor operations.

PyTorch tensors and their associated operations are very similar to NumPy n-dimensional arrays. In fact, a tensor is an n-dimensional array. PyTorch `torch.Tensor` objects created from NumPy ndarray objects with `torch.from_numpy()` or `torch.as_tensor()` can even share memory with them. This makes the transition between PyTorch and NumPy very cheap from a performance perspective.

With PyTorch tensors, GPU support is built-in. It’s very easy with PyTorch to move tensors to and from a GPU if we have one installed on our system.
Tensors are super important for deep learning and neural networks because they are the data structure that we ultimately use for building and training our neural networks.

The initial release of PyTorch was in October 2016. Before PyTorch was created, there was, and still is, another machine learning framework called Torch, which is based on the Lua programming language. The connection between PyTorch and this Lua version exists because many of the developers who maintain Torch are the individuals who created PyTorch.

> Note: Facebook created PyTorch.


These are the primary PyTorch components we’ll be learning about and using as we build neural networks in this series.

| Package             | Description |
|:--------------------|:------------|
| torch               | The top-level PyTorch package and tensor library. |
| torch.nn            | A subpackage that contains modules and extensible classes for building neural networks. |
| torch.autograd      | A subpackage that supports all the differentiable tensor operations in PyTorch. |
| torch.nn.functional | A functional interface that contains typical operations used for building neural networks, like loss functions, activation functions, and convolution operations. |
| torch.optim         | A subpackage that contains standard optimization algorithms like SGD and Adam. |
| torch.utils         | A subpackage that contains utility classes like datasets and data loaders that make data preprocessing easier. |
| torchvision         | A package that provides access to popular datasets, model architectures, and image transformations for computer vision. |
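
To show how these packages fit together, here is a minimal sketch that fits a toy linear model `y = 2x + 1`. It is not taken from the original series. The data, layer size, learning rate, and number of epochs are illustrative assumptions. `torchvision` is left out so the snippet only needs `torch`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)  # reproducible shuffling and initial weights

# torch: plain tensors holding a toy regression problem, y = 2x + 1
x = torch.linspace(-1, 1, 64).unsqueeze(1)  # shape [64, 1]
y = 2 * x + 1

# torch.utils: wrap the tensors in a dataset and serve them in mini-batches
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

# torch.nn: a single fully connected layer with one input and one output
model = nn.Linear(1, 1)

# torch.optim: stochastic gradient descent over the layer's parameters
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    for xb, yb in loader:
        loss = F.mse_loss(model(xb), yb)  # torch.nn.functional: loss function
        optimizer.zero_grad()             # clear gradients from the previous step
        loss.backward()                   # torch.autograd: compute gradients
        optimizer.step()                  # update the weight and bias

print(model.weight.item(), model.bias.item())  # close to 2.0 and 1.0
```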

### Why Use PyTorch For Deep Learning?
- It is a thin framework, which makes it more likely that PyTorch will be able to adapt to the rapidly evolving deep learning environment as things change over time.
- It stays out of the way, so we can focus on neural networks rather than on the framework itself. When we build neural networks with PyTorch, we are super close to programming neural networks from scratch. When we write PyTorch code, we are just writing and extending standard Python classes, and when we debug PyTorch code, we are using the standard Python debugger.

PyTorch's development is guided by the following list:
- Stay out of the way
- Cater to the impatient
- Promote linear code-flow
- Full interop with the Python ecosystem
- Be as fast as anything else


PyTorch’s design is modern, Pythonic, and thin. The source code is easy to read for Python developers because it’s written mostly in Python, and only drops into C++ and CUDA code for operations that are performance bottlenecks.

Overall, PyTorch is a great tool for deepening our understanding of deep learning and neural networks.

### Why PyTorch is great for deep learning research

PyTorch is well suited to research because it uses a dynamic computational graph to calculate derivatives. TensorFlow 1.x, by contrast, builds a static computational graph by default.

Computational graphs are used to graph the function operations that occur on tensors inside neural networks.
These graphs are then used to compute the derivatives needed to optimize the neural network. A dynamic computational graph is generated on the fly as the operations are executed. A static graph, in contrast, is fully determined before the actual operations occur.

It just so happens that many of the cutting-edge research topics in deep learning require or benefit greatly from dynamic graphs.
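
Here is a minimal sketch of what "dynamic" means in practice. The function `f` below is a hypothetical example, not from the original. It uses an ordinary Python `if` whose branch depends on the data, and autograd still computes the gradient of whichever branch actually ran.

```python
import torch

def f(x):
    # The graph is recorded as these operations run, so plain Python
    # control flow decides which operations end up in it.
    if x.sum() > 0:
        return (x ** 2).sum()  # derivative: 2x
    return (x ** 3).sum()      # derivative: 3x^2

pos = torch.tensor([1.0, 2.0], requires_grad=True)
f(pos).backward()
print(pos.grad)   # tensor([2., 4.])

neg = torch.tensor([-1.0, -2.0], requires_grad=True)
f(neg).backward()
print(neg.grad)   # tensor([ 3., 12.])
```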

### Installing PyTorch 
The recommended option is to use the Anaconda Python package manager. With Anaconda, it's easy to get and manage Python, Jupyter Notebook, and other commonly used packages for scientific computing and data science, like PyTorch!

Let’s go over the steps:

- Download and install [Anaconda](https://www.anaconda.com/distribution/) (choose the latest Python version).
- Go to [PyTorch's site](https://pytorch.org/) and find the get started locally section.
- Specify the appropriate configuration options for your particular environment.
- Run the presented command in the terminal to install PyTorch.

For example: 
`conda install pytorch torchvision cudatoolkit=10.0 -c pytorch`

Notice that we are installing both PyTorch and torchvision. Also, there is no need to install the CUDA toolkit separately. The needed CUDA software comes installed with PyTorch if a CUDA version is selected when specifying the configuration options. All we need to do is select a version of CUDA if we have a supported Nvidia GPU on our system.


In [69]:
!conda list torch

# packages in environment at /Users/phucnsp/anaconda3/envs/fastai2:
#
# Name                    Version                   Build  Channel
pytorch                   1.4.0                   py3.7_0    pytorch
torchsummary              1.5.1                    pypi_0    pypi
torchvision               0.5.0                  py37_cpu    pytorch


In [70]:
import torch
torch.__version__ # to verify pytorch version

'1.4.0'

In [71]:
torch.cuda.is_available() # to verify our GPU capabilities

False

### CUDA - Why Deep Learning Uses GPUs
To understand CUDA, we need to have a working knowledge of graphics processing units (GPUs). A GPU is a processor that is good at handling specialized computations. This is in contrast to a central processing unit (CPU), which is a processor that is good at handling general computations. CPUs are the processors that power most of the typical computations on our electronic devices.

A GPU can be much faster at computing than a CPU. However, this is not always the case. The speed of a GPU relative to a CPU depends on the type of computation being performed. The type of computation most suitable for a GPU is a computation that can be done in parallel.

Parallel computing is a type of computation whereby a particular computation is broken into independent smaller computations that can be carried out simultaneously. The resulting computations are then recombined, or synchronized, to form the result of the original larger computation.

The number of smaller computations that can be carried out at the same time depends on the number of cores contained in a particular piece of hardware. Cores are the units that actually do the computation within a given processor. CPUs typically have four, eight, or sixteen cores, while GPUs have potentially thousands.

So why does deep learning use GPUs? Because `neural networks are embarrassingly parallel`.
Embarrassingly parallel tasks are ones where it’s easy to see that the smaller tasks are independent of each other. Many of the computations that we do with neural networks can easily be broken into smaller computations that do not depend on one another. One such example is a convolution.
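
To make "independent" concrete, here is a small sketch showing that a 1D convolution is just a set of dot products, one per window of the input. None of these dot products depends on another, and that independence is what lets a GPU compute them all at once. The signal length and kernel size are arbitrary assumptions. Like most deep learning libraries, PyTorch's `conv1d` actually computes a cross-correlation, meaning the kernel is not flipped.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
signal = torch.randn(10)
kernel = torch.randn(3)

# Each output position is an independent dot product between the kernel
# and one window of the signal
windows = signal.unfold(0, 3, 1)   # shape [8, 3]: one row per window
manual = windows @ kernel          # all 8 dot products at once

# PyTorch's conv1d expects (batch, channels, length) inputs and agrees
builtin = F.conv1d(signal.view(1, 1, -1), kernel.view(1, 1, -1)).view(-1)

print(torch.allclose(manual, builtin, atol=1e-6))  # True
```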

### Nvidia Hardware (GPU) And Software (CUDA)
Nvidia is a technology company that designs GPUs. They created CUDA as a software platform that pairs with their GPU hardware, making it easier for developers to build software that accelerates computations using the parallel processing power of Nvidia GPUs.
Developers use CUDA by downloading the `CUDA toolkit`. With the toolkit comes specialized libraries like `cuDNN` - the CUDA Deep Neural Network library.

With PyTorch, CUDA comes baked in from the start, so no separate CUDA download is required. All we need is a supported Nvidia GPU with a compatible driver, and we can leverage CUDA through PyTorch. We don’t need to know how to use the CUDA API directly.

Under the hood, PyTorch is written in a mix of Python, C++, and CUDA.

Suppose we have the following code:

In [72]:
t = torch.tensor([1,2,3])

The tensor object created in this way is on the CPU by default. As a result, any operations that we do using this tensor object will be carried out on the CPU.
Now, to move the tensor onto the GPU, we just write:

In [75]:
t = t.cuda() # requires a CUDA-capable GPU; raises an error otherwise

This ability makes PyTorch very versatile because computations can be selectively carried out either on the CPU or on the GPU.
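
`t.cuda()` fails on a machine without a CUDA-capable GPU, like the one used in this notebook, where `torch.cuda.is_available()` returned `False`. A common pattern is therefore to choose the device once and move tensors with `.to()`. Here is a minimal sketch of that pattern:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.tensor([1, 2, 3])
print(t.device)          # cpu: new tensors live on the CPU by default

t = t.to(device)         # returns the same tensor if it is already there
print(t.device)

# Move back to the CPU, e.g. before converting to a NumPy array
t_cpu = t.cpu()
print(t_cpu.numpy())     # [1 2 3]
```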



> Note: A GPU can be slower than a CPU, because a GPU is only faster for particular (specialized) tasks. For example, moving data from the CPU to the GPU is costly, so if the computation itself is simple, the overall performance might be worse than on the CPU.
Moving relatively small computational tasks to the GPU won’t speed us up very much and may even slow us down. Remember, the GPU works well for tasks that can be broken into many smaller tasks, and if a compute task is already small, we won’t gain much by moving it to the GPU.
For this reason, it’s often acceptable to simply use a CPU when just starting out, and to use the GPU more heavily as we tackle larger, more complicated problems.




## Section 2: Introducing Tensors
In this section, we'll talk all about tensors.

### Tensors - Data Structures of Deep Learning

A tensor is the primary data structure used by neural networks. The inputs, outputs, and transformations within neural networks are all represented using tensors, and as a result, neural network programming utilizes tensors heavily.

The concepts below, which we met in math and computer science, are all referred to as `tensors` in deep learning.

| indexes required |  math  | computer science |
|:----------------:|:------:|------------------|
| 0                | scalar | number           |
| 1                | vector | array            |
| 2                | matrix | 2d-array         |

The relationship within each of these pairs is that both elements require the same number of indexes to refer to a specific element within the data structure.

In [59]:
# an array or a vector requires 1 index to access its element
a = [1,2,3,4]
a[3]

4

In [61]:
# a matrix or 2d-array requires 2 indexes to access its elements
a = [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]
a[0][2]

3

When more than two indexes are required to access a specific element, we stop giving specific names to the structures, and we begin using more general language.

In mathematics, we stop using words like scalar, vector, and matrix, and we start using the word `tensor` or `nd-tensor`. The n tells us the number of indexes required to access a specific element within the structure.

In computer science, we stop using words like, number, array, 2d-array, and start using the word `multidimensional array` or `nd-array`. The n tells us the number of indexes required to access a specific element within the structure.

The reason we say a tensor is a generalization is because we use the word tensor for all values of n like so:
- A scalar is a 0 dimensional tensor
- A vector is a 1 dimensional tensor
- A matrix is a 2 dimensional tensor
- An nd-array is an n dimensional tensor
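
In PyTorch, `Tensor.dim()` is a quick way to check this. It returns the number of indexes (dimensions) a tensor needs. The example values below are arbitrary:

```python
import torch

scalar = torch.tensor(5)                  # 0 indexes needed
vector = torch.tensor([1, 2, 3])          # 1 index
matrix = torch.tensor([[1, 2], [3, 4]])   # 2 indexes
cube = torch.zeros(2, 3, 4)               # 3 indexes: a 3d-tensor

for name, t in [('scalar', scalar), ('vector', vector),
                ('matrix', matrix), ('cube', cube)]:
    print(name, t.dim())   # scalar 0, vector 1, matrix 2, cube 3

print(matrix[1, 0])        # one index per dimension: tensor(3)
print(cube[1, 2, 3])       # tensor(0.)
```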

> Note: Tensors and nd-arrays are the same thing!


### Rank, Axes, and Shape - fundamental tensor attributes for deep learning 

These concepts build on one another starting with rank, then axes, and building up to shape.

The `rank` of a tensor refers to the number of dimensions present within the tensor. A rank-2 tensor means all of the following:
- a matrix
- a 2d-array
- a 2d-tensor

> Note: A tensor's rank tells us how many indexes are needed to refer to a specific element within the tensor.


An `axis` of a tensor is a specific dimension of a tensor.  
If we say that a tensor is a rank 2 tensor, we mean that the tensor has 2 dimensions, or equivalently, the tensor has two axes.

The `length of each axis` tells us how many indexes are available along each axis.

In [62]:
dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

Each element along the first axis is an array:



In [64]:
dd[0], dd[1], dd[2]

([1, 2, 3], [4, 5, 6], [7, 8, 9])

Each element along the second axis is a number:

In [65]:
dd[0][0], dd[1][0], dd[2][0]

(1, 4, 7)

> Note: with tensors, the elements of the last axis are always numbers. Every other axis will contain n-dimensional arrays.

The `shape` of a tensor gives us the length of each axis of the tensor. 
> Note: The shape of a tensor is important because it encodes all of the relevant information about axes, rank, and therefore indexes. Additionally, one of the types of operations we must perform frequently when we are programming our neural networks is called `reshaping`.

 






In [68]:
t = torch.tensor([
    [1,2,3],
    [5,6,7]
], dtype=torch.float)
t.shape

torch.Size([2, 3])

The shape of 2 x 3 tells us that this rank-2 tensor has two axes. The first axis has a length of 2, so two indexes (the rows) are available along it. The second axis has a length of 3, so three indexes (the columns) are available along it.

> Note: size and shape of a tensor are the same thing.
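
The sketch below reuses the 2 x 3 tensor from above. It shows how the rank, the axis lengths, and the total number of elements can all be read off the shape:

```python
import torch

t = torch.tensor([
    [1, 2, 3],
    [5, 6, 7]
], dtype=torch.float)

print(t.shape == t.size())  # True: shape and size() are the same thing
print(len(t.shape))         # 2: the rank is the number of axes
print(t.shape[0])           # 2: length of the first axis (rows)
print(t.shape[1])           # 3: length of the second axis (columns)
print(t.numel())            # 6: the product of the axis lengths
```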

## Section 3: Pytorch Tensors

PyTorch tensors are the data structures we'll be using when programming neural networks in PyTorch.
When programming neural networks, data preprocessing is often one of the first steps in the overall process, and one goal of data preprocessing is to transform the raw input data into tensor form.

### torch.Tensor class and its attributes
PyTorch tensors are instances of the `torch.Tensor` Python class.   
First, let’s look at a few tensor attributes. Every `torch.Tensor` has these attributes:
- torch.dtype
- torch.device
- torch.layout

In [33]:
t = torch.Tensor()
print(t.dtype)
print(t.device)
print(t.layout)

torch.float32
cpu
torch.strided


The `dtype` specifies the type of the data that is contained within the tensor.
> Note:
> - Each type has a CPU and GPU version.
> - Tensor operations between tensors should happen between tensors with the same type of data. Older PyTorch versions require this strictly; recent versions automatically promote types for many operations.
> - Tensors contain uniform (of the same type) numerical data with one of these types:
>
> ![pytorch datatype](data/pytorch_dtype.png)
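
Here is a small sketch of how the dtype is inferred from the data, and how `.to()` converts it explicitly so that both operands of an operation match:

```python
import torch

ints = torch.tensor([1, 2, 3])        # inferred from Python ints
floats = torch.tensor([1., 2., 3.])   # inferred from Python floats
print(ints.dtype, floats.dtype)       # torch.int64 torch.float32

# Convert explicitly so both operands share the same dtype
summed = ints.to(torch.float32) + floats
print(summed.dtype, summed)           # torch.float32 tensor([2., 4., 6.])
```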

The `device` specifies the device (`CPU` or `GPU`) where the tensor's data is allocated. This determines where tensor computations for the given tensor will be performed.  
PyTorch supports the use of multiple devices, and they are specified using an index like so:

In [32]:
device = torch.device('cuda:0')
device

device(type='cuda', index=0)

If we have a device like above, we can create a tensor on the device by passing the device to the tensor’s constructor. 
> Note: Tensor operations between tensors must happen between tensors that exist on the same device.
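
Here is a minimal sketch of passing a device to the `torch.tensor()` factory function. It falls back to the CPU when no GPU is present, which is an assumption made so the snippet runs anywhere. The mixed-device failure can only be shown on a machine with a GPU, so that part is guarded.

```python
import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Pass the device so the data is allocated there directly
t1 = torch.tensor([1, 2, 3], device=device)
t2 = torch.tensor([4, 5, 6], device=device)
print((t1 + t2).device)   # fine: both operands live on the same device

if torch.cuda.is_available():
    cpu_t = torch.tensor([1, 2, 3])
    try:
        cpu_t + t1        # mixing a CPU tensor with a CUDA tensor
    except RuntimeError as err:
        print('device mismatch:', err)
```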


The `layout` specifies how the tensor is stored in memory. The default layout, `torch.strided`, locates elements using strides. To learn more about strides, check [here](https://en.wikipedia.org/wiki/Stride_of_an_array). For now, this is all we need to know.
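
The strides of a strided tensor can be inspected directly. In this small sketch the values are arbitrary. A stride is the number of elements to skip in memory to move one step along an axis. Transposing only swaps the strides, so the transposed tensor is no longer contiguous.

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(t.layout)            # torch.strided
print(t.stride())          # (3, 1): 3 elements to the next row, 1 to the next column

tt = t.t()                 # transpose: same memory, swapped strides
print(tt.stride())         # (1, 3)
print(tt.is_contiguous())  # False
```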



### Create a new tensor using data

These are the primary ways of creating tensor objects (instances of the `torch.Tensor` class) with data (array-like) in PyTorch:
1. torch.Tensor(data): the constructor of the `torch.Tensor` class
2. torch.tensor(data): the `factory function` that constructs `torch.Tensor` objects. Factory functions are a software design pattern for creating objects. If you want to read more about it, check [here]
3. torch.as_tensor(data)
4. torch.from_numpy(data)

Let’s look at each of these. They all accept some form of data and give us an instance of the `torch.Tensor` class. Sometimes when there are multiple ways to achieve the same result, things can get confusing, so let’s break this down.

In [56]:
import numpy as np
data = np.array([1,2,3])

o1 = torch.Tensor(data)
o2 = torch.tensor(data)
o3 = torch.as_tensor(data)
o4 = torch.from_numpy(data)

print(o1)
print(o2)
print(o3)
print(o4)

tensor([1., 2., 3.])
tensor([1, 2, 3])
tensor([1, 2, 3])
tensor([1, 2, 3])


In [57]:
print(torch.get_default_dtype())
o1.dtype == torch.get_default_dtype()

torch.float32


True

The table below compares the four options and suggests which one to use:

| method | which one to use | dtype | data in memory |
|:-------|:-----------------|:------|:--------------:|
| torch.Tensor(data) | prefer `torch.tensor(data)` instead | taken from the default dtype (`torch.get_default_dtype()`); a `dtype` cannot be passed to the constructor | copy |
| **torch.tensor(data)** | best option to go with: better documentation and more configuration options than `torch.Tensor` | inferred from data or set explicitly | copy |
| *torch.as_tensor(data)* | go-to option when tuning for performance; better than `torch.from_numpy` because it accepts a wide variety of array-like objects, including other PyTorch tensors | inferred from data or set explicitly | share (when possible) |
| torch.from_numpy(data) | only accepts `numpy.ndarray` | inferred from data (no `dtype` argument) | share |

Sharing data memory means that the actual data exists in a single place in memory. As a result, any changes that occur in the underlying data will be reflected in both objects, the `torch.Tensor` and the `numpy.ndarray`.  
Sharing data is more efficient and uses less memory than copying data because the data is not written to two locations in memory.
However, there are some things to keep in mind about memory sharing:



> Note:
> - Since `numpy.ndarray` objects are allocated on the CPU, the `as_tensor()` function must copy the data from the CPU to the GPU when a GPU is being used.
> - The memory sharing of `as_tensor()` doesn’t work with built-in Python data structures like lists.
> - The `as_tensor()` call requires developer knowledge of the sharing feature. This is necessary so we don’t inadvertently make an unwanted change in the underlying data without realizing the change impacts multiple objects.
> - The `as_tensor()` performance improvement will be greater if there are a lot of back-and-forth operations between `numpy.ndarray` objects and tensor objects. However, if there is just a single load operation, there shouldn’t be much impact from a performance perspective.
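
Here is a minimal sketch of the difference between copying and sharing. After the NumPy array is changed in place, only the tensors that share its memory see the change. The sketch assumes CPU tensors with the array's own dtype, which is the case for which `as_tensor()` can share.

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

copied = torch.tensor(data)        # copies the data
shared = torch.as_tensor(data)     # shares memory (same dtype, CPU)
shared2 = torch.from_numpy(data)   # always shares memory

data[0] = 100                      # modify the underlying NumPy array in place

print(copied.tolist())    # [1, 2, 3]    unaffected
print(shared.tolist())    # [100, 2, 3]  sees the change
print(shared2.tolist())   # [100, 2, 3]
```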

### Create a new tensor without data
Here are some other creation options that are available.




In [41]:
# create identity matrix
torch.eye(2)

tensor([[1., 0.],
        [0., 1.]])

In [42]:
# create a tensor of zeros with the specified shape
torch.zeros([2,2])

tensor([[0., 0.],
        [0., 0.]])

In [43]:
# create a tensor of ones with the specified shape
torch.ones([2,2])

tensor([[1., 1.],
        [1., 1.]])

In [44]:
# create a tensor of random values, drawn uniformly from [0, 1), with the specified shape
torch.rand([2,2])

tensor([[0.3088, 0.4226],
        [0.8102, 0.9129]])

This is a small subset of the available creation functions that don’t require data. Check with the [PyTorch documentation](https://pytorch.org/docs/stable/index.html) for the full list.
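
Here are a few more creation functions from the same family, with arbitrary sizes and fill values. Seeding the random number generator first makes the random tensors reproducible.

```python
import torch

torch.manual_seed(0)            # reproducible random values

a = torch.arange(0, 6)          # tensor([0, 1, 2, 3, 4, 5])
f = torch.full((2, 3), 7.0)     # 2 x 3 tensor filled with 7.0
z = torch.zeros_like(f)         # zeros with the same shape and dtype as f
n = torch.randn(2, 2)           # samples from a standard normal distribution

print(a, f, z, n, sep='\n')
```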



## Section 4: Tensor Operations
We have the following high-level categories of tensor operations:
- Reshaping operations: give us the ability to position our elements along particular axes. 
- Element-wise operations: allow us to perform operations on elements between two tensors.
- Reduction operations: allow us to perform operations on elements within a single tensor.
- Access operations: allow us to access data within a tensor.

There are a lot of individual operations out there, so much so that it can sometimes be intimidating when you're just beginning, but grouping similar operations into categories based on their likeness can help make learning about tensor operations more manageable.



### Broadcasting Tensors

To understand this concept, let's take a look at an example. Suppose we have the following tensors.


In [35]:
t1 = torch.tensor([
    [1,1],
    [1,1]
], dtype=torch.float32)

t2 = torch.tensor([2,4], dtype=torch.float32)
t1.shape, t2.shape


(torch.Size([2, 2]), torch.Size([2]))

What will be the result of the element-wise addition `t1 + t2`?

Even though these two tensors have different shapes, the element-wise operation is possible, and broadcasting is what makes it possible. The lower rank tensor t2 will be transformed via broadcasting to match the shape of the higher rank tensor t1, and the element-wise operation will then be performed as usual.

We can check the broadcast transformation using NumPy's `broadcast_to()` function.

In [37]:
import numpy as np
np.broadcast_to(t2.numpy(), t1.shape)

array([[2., 4.],
       [2., 4.]], dtype=float32)

In [38]:
t1 + t2

tensor([[3., 5.],
        [3., 5.]])

After broadcasting, the addition operation between these two tensors is a regular element-wise operation between tensors of the same shape.
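
PyTorch follows the same broadcasting rule as NumPy. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1 (a missing dimension counts as 1). Here is a sketch with a column vector and a row vector (values chosen arbitrarily), plus a pair of shapes that cannot be broadcast:

```python
import torch

col = torch.tensor([[1.], [2.]])      # shape [2, 1]
row = torch.tensor([10., 20., 30.])   # shape [3]

# Both tensors are stretched along their length-1 (or missing) axes
out = col + row
print(out.shape)   # torch.Size([2, 3])
print(out)         # [[11., 21., 31.], [12., 22., 32.]]

# torch.broadcast_tensors returns the expanded operands explicitly
a, b = torch.broadcast_tensors(col, row)
print(a.shape, b.shape)   # torch.Size([2, 3]) torch.Size([2, 3])

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
try:
    torch.ones(2, 3) + torch.ones(2)
    broadcast_failed = False
except RuntimeError as err:
    broadcast_failed = True
    print('cannot broadcast:', err)
```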



###  Tensor reshape operation

Reshaping operations are perhaps the most important type of tensor operations because the shape of a tensor gives us something concrete we can use to shape an intuition for our tensors.
> Note: Reshaping changes the shape but not the underlying data elements.


In [2]:
import torch
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

In [3]:
t.reshape([3,4])

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

Several ways to flatten a tensor into a 1d-tensor:

In [24]:
t.reshape(1,-1)[0]

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

In [25]:
t.reshape(-1)

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

In [26]:
t.view(t.numel())

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

Using the `reshape()` function, we can specify the `row x column` shape that we are seeking. Notice that the product of the shape's components has to be equal to the number of elements in the original tensor.

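PyTorch has another function called `view()` that does much the same thing as `reshape()`. The difference is that `view()` always returns a tensor that shares the original data and raises an error when the tensor's memory layout doesn't allow that. `reshape()` returns a view when possible and a copy otherwise.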
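
The practical difference between `view()` and `reshape()` appears with non-contiguous tensors, such as a transpose. Here is a sketch with arbitrary values:

```python
import torch

t = torch.arange(12).reshape(3, 4)
tt = t.t()                        # transposed: not contiguous in memory

print(tt.reshape(-1).shape)       # torch.Size([12]): reshape copies when it must

try:
    tt.view(-1)                   # view cannot reinterpret this memory layout
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print('view failed:', err)

print(tt.contiguous().view(-1).shape)  # works on a contiguous copy

# A successful view shares memory with the original tensor
v = t.view(-1)
v[0] = 100
print(t[0, 0])                    # tensor(100)
```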


Shaping and reshaping tensors is a frequent task for neural network programmers.  
Our networks operate on tensors, after all, and this is why understanding a tensor’s shape and the available reshaping operations is super important.

The next way we can change the shape of our tensors is by squeezing and unsqueezing them.
- Squeezing a tensor removes the dimensions or axes that have a length of one.
- Unsqueezing a tensor adds a dimension with a length of one.

In [5]:
t.reshape([1,12]).shape

torch.Size([1, 12])

In [7]:
t.reshape([1,12]).squeeze().shape

torch.Size([12])

In [9]:
t.reshape([1,12]).unsqueeze(dim=0).shape

torch.Size([1, 1, 12])
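
Both functions also accept an axis argument. A common use of `unsqueeze()` is adding a batch axis to a single sample. In this sketch the 3 x 28 x 28 "image" is a hypothetical input made of random values:

```python
import torch

image = torch.rand(3, 28, 28)     # hypothetical image: channels x height x width
batch = image.unsqueeze(0)        # add a length-1 batch axis at the front
print(batch.shape)                # torch.Size([1, 3, 28, 28])

x = torch.zeros(1, 5, 1)
print(x.squeeze().shape)          # torch.Size([5]): removes every length-1 axis
print(x.squeeze(0).shape)         # torch.Size([5, 1]): removes only axis 0
```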

Let’s look at a common use case for reshaping a tensor - `flatten a tensor`.  
A flatten operation on a tensor reshapes the tensor to have a shape that is equal to the number of elements contained in the tensor. This is the same thing as a 1d-array of elements.

In [10]:
t.flatten()

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

flatten operation = reshape operation + squeeze operation

In [12]:
def flatten_ex(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t
flatten_ex(t)

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

It is possible to flatten only specific parts of a tensor.

In [22]:
t = torch.tensor([[
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
]], dtype=torch.float32)
t.shape

torch.Size([1, 3, 4])

In [23]:
t.flatten(start_dim=1).shape

torch.Size([1, 12])

A tensor flatten operation is a common operation inside convolutional neural networks. This is because convolutional layer outputs that are passed to fully connected layers must be flattened before the fully connected layer will accept the input.
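
Here is a sketch of where this shows up in a CNN. The layer sizes, the 28 x 28 single-channel input, and the 10-class output are hypothetical choices. `start_dim=1` keeps the batch axis intact while flattening each image's feature maps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)  # 28x28 -> 26x26
fc = nn.Linear(4 * 26 * 26, 10)       # hypothetical 10-class output layer

x = torch.rand(8, 1, 28, 28)          # a batch of 8 single-channel images
features = conv(x)                    # shape [8, 4, 26, 26]
flat = features.flatten(start_dim=1)  # shape [8, 2704]: one row per image
logits = fc(flat)                     # shape [8, 10]
print(features.shape, flat.shape, logits.shape)
```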



### Tensor element-wise operation

An element-wise operation is an operation between two tensors that operates on corresponding elements within the respective tensors. Two tensors must have the same shape, or shapes that can be broadcast to a common shape, in order to perform element-wise operations on them.

Some common element-wise operations:

In [39]:
t1 = torch.tensor([[1,2],
                   [3,4]], dtype=torch.float32)
t2 = torch.tensor([[9,8],
                   [7,6]], dtype=torch.float32)

In [40]:
t1 + t2 # equivalent to t1.add(t2)

tensor([[10., 10.],
        [10., 10.]])

In [41]:
t1 + 2 # equivalent to t1.add(2)

tensor([[3., 4.],
        [5., 6.]])

In [42]:
t1 - 2 # equivalent to t1.sub(2)

tensor([[-1.,  0.],
        [ 1.,  2.]])

In [43]:
t1 * 2 # equivalent to t1.mul(2)

tensor([[2., 4.],
        [6., 8.]])

In [44]:
t1 / 2 # equivalent to t1.div(2)

tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])

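Comparison operations are element-wise as well. The cells below use this tensor:

In [45]:
t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)
t.eq(0)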

tensor([[ True, False,  True],
        [False,  True, False],
        [ True, False,  True]])

In [46]:
t.gt(0)

tensor([[False,  True, False],
        [ True, False,  True],
        [False,  True, False]])

In [47]:
t.lt(0)

tensor([[False, False, False],
        [False, False, False],
        [False, False, False]])

With element-wise operations that are functions, it’s fine to assume that the function is applied to each element of the tensor.

In [48]:
t.abs()

tensor([[0., 1., 0.],
        [2., 0., 2.],
        [0., 3., 0.]])

In [49]:
t.sqrt()

tensor([[0.0000, 1.0000, 0.0000],
        [1.4142, 0.0000, 1.4142],
        [0.0000, 1.7321, 0.0000]])

In [50]:
t.neg()

tensor([[-0., -1., -0.],
        [-2., -0., -2.],
        [-0., -3., -0.]])

In [51]:
t.neg().abs()

tensor([[0., 1., 0.],
        [2., 0., 2.],
        [0., 3., 0.]])

> Note: Element-wise = Component-wise = Point-wise

### Tensor reduction operations

A reduction operation on a tensor is an operation that reduces the number of elements contained within the tensor.
Reductions let us summarize the data in a tensor: the tensor can be reduced to a single scalar value or reduced along an axis.

Let’s look at common tensor reduction operations:

In [52]:
import torch

t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)

Reducing to a tensor with a single element

In [53]:
t.sum(), t.prod(), t.mean(), t.std()

(tensor(8.), tensor(0.), tensor(0.8889), tensor(1.1667))

Reducing tensors By Axes

In [54]:
t.sum(dim=0)

tensor([2., 4., 2.])
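
Reducing along the other axis works the same way, and `keepdim=True` keeps the reduced axis with length 1. The sketch below redefines the same 3 x 3 tensor so that it runs on its own. `keepdim` is useful when the result must broadcast back against the original tensor, as in the row normalization shown here.

```python
import torch

t = torch.tensor([
    [0, 1, 0],
    [2, 0, 2],
    [0, 3, 0]
], dtype=torch.float32)

row_sums = t.sum(dim=1)              # one value per row: tensor([1., 4., 3.])
kept = t.sum(dim=1, keepdim=True)    # shape [3, 1] instead of [3]
print(row_sums, kept.shape)

# The [3, 1] shape broadcasts against t, so each row is divided by its sum
normalized = t / kept
print(normalized.sum(dim=1))         # tensor([1., 1., 1.])
```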

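The `argmax` tensor reduction operation is very common in neural networks.  
This operation returns the index location of the maximum value inside a tensor.
In practice, we often use the `argmax()` function on a network’s output prediction tensor to determine which category has the highest prediction value. When several elements share the maximum value, as in the second row below, the returned index can differ between PyTorch versions. Recent versions document that the index of the first occurrence is returned.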



In [55]:
t.max(dim=1)

torch.return_types.max(
values=tensor([1., 2., 3.]),
indices=tensor([1, 2, 1]))

In [56]:
t.argmax(dim=1)

tensor([1, 2, 1])
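
Here is a sketch of the typical use on a batch of predictions. The prediction scores and labels below are made-up values, not real network outputs. `argmax(dim=1)` picks the highest-scoring category for each sample, and comparing the picks with the labels gives the accuracy.

```python
import torch

# Hypothetical network outputs: 4 samples, 3 categories
preds = torch.tensor([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
labels = torch.tensor([1, 0, 2, 0])   # hypothetical ground-truth categories

predicted = preds.argmax(dim=1)       # highest-scoring category per sample
print(predicted)                      # tensor([1, 0, 2, 1])

correct = predicted.eq(labels).sum().item()
accuracy = correct / len(labels)
print(accuracy)                       # 0.75
```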

### Tensor access operation

Access operations provide the ability to access data within a tensor.   
Common tensor access operations are `item()`, `tolist()`, and `numpy()`.

In [57]:
t.mean().item()

0.8888888955116272

In [58]:
t.mean(dim=0).tolist()

[0.6666666865348816, 1.3333333730697632, 0.6666666865348816]

In [59]:
t.mean(dim=0).numpy()

array([0.6666667, 1.3333334, 0.6666667], dtype=float32)
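
Two details are worth knowing about these operations. First, `item()` only works on one-element tensors. Second, `numpy()` on a CPU tensor returns an array that shares the tensor's memory. In the sketch below, the exact exception type raised by `item()` on a multi-element tensor is assumed to vary between versions, so both `RuntimeError` and `ValueError` are caught.

```python
import torch

t = torch.tensor([[0., 1., 0.],
                  [2., 0., 2.],
                  [0., 3., 0.]])

value = t[1, 2].item()       # a one-element tensor becomes a Python number
print(value)                 # 2.0

try:
    t.item()                 # a 9-element tensor cannot become one number
    item_failed = False
except (RuntimeError, ValueError) as err:  # error type varies by version
    item_failed = True
    print('item() failed:', err)

arr = t.numpy()              # CPU tensor: the array shares the tensor's memory
arr[0, 0] = 9.
print(t[0, 0])               # tensor(9.)
```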

## References
Some good sources:
- deeplizard: https://deeplizard.com/learn/video/v5cngxo4mIg
- Effective PyTorch (vahidk): https://github.com/vahidk/EffectivePyTorch?fbclid=IwAR1MhsjnjccWy6dIVtibFOCZbWhLtAj5pSTobnkUDxw_gHgfEswnVzqrKQ0#torchscript
- Walk with PyTorch (recommended): https://forums.fast.ai/t/getting-comfortable-with-pytorch-projects/28371
- Official tutorials: https://pytorch.org/tutorials/
- Deep Learning (with PyTorch): https://github.com/Atcold/pytorch-Deep-Learning
- PyTorch project template: https://github.com/moemen95/PyTorch-Project-Template
- NLP tutorial with PyTorch: https://github.com/graykode/nlp-tutorial
- Udacity course: https://www.udacity.com/course/deep-learning-pytorch--ud188
- Awesome PyTorch list: https://github.com/bharathgs/Awesome-pytorch-list
- Deep Learning with PyTorch: https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf
- PyTorch Zero to All: https://github.com/hunkim/PyTorchZeroToAll
- Others:
    - https://medium.com/pytorch/get-started-with-pytorch-cloud-tpus-and-colab-a24757b8f7fc
    - Grokking book