# Intro to PyTorch
PyTorch is a popular Python framework used by many deep learning
practitioners in both industry and academia. PyTorch provides the
tools needed to load data, train a model, and accelerate matrix
computations on GPUs without requiring detailed knowledge of how
the underlying algorithms are implemented. PyTorch was designed with ease of use
in mind (like Python), so it is a good first deep learning framework to work with.

To use PyTorch in your Python code, import it as follows.

In [1]:
import torch
torch.manual_seed(0)

<torch._C.Generator at 0x7fe1d40f3c50>

The core data structure in PyTorch is the **tensor**. For those unfamiliar, tensors are essentially matrices generalized to an arbitrary number of dimensions. Tensors can be difficult to visualize because they extend beyond the two or three dimensions we can easily draw. Take a look at the following diagram to get a better picture of tensors.


![alt text](https://www.cc.gatech.edu/~san37/img/dl/tensor.png)

Indexing into a tensor along its first dimension yields a tensor with one fewer dimension. For example, each entry of a 6-D tensor is a 5-D tensor, and indexing into that 5-D tensor gives a 4-D tensor.

In [2]:
# Make a 6-D tensor with random entries
t = torch.randn(6, 5, 4, 3, 3, 2)

# Indexing into 6-D gives a 5-D tensor
print(t[0].shape)

# Indexing into 5-D tensor gives 4-D tensor
print(t[0][0].shape)


torch.Size([5, 4, 3, 3, 2])
torch.Size([4, 3, 3, 2])
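
To make the "one fewer dimension per index" idea concrete, here is a minimal sketch that keeps taking the first entry until nothing is left to index. The names `sub` and `dims` are only illustrative.

```python
import torch

t = torch.randn(6, 5, 4, 3, 3, 2)

# Repeatedly take the first entry; each step drops the leading dimension
sub = t
dims = [sub.ndim]
while sub.ndim > 0:
    sub = sub[0]
    dims.append(sub.ndim)

print(dims)          # [6, 5, 4, 3, 2, 1, 0]
print(sub.shape)     # torch.Size([]): a 0-D tensor holding a single number

# Chained indexing and comma-separated indexing select the same data
print(torch.equal(t[0][0], t[0, 0]))
```

The comma-separated form `t[0, 0]` is usually preferred because it does the selection in a single indexing call.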


## Familiarizing yourself with PyTorch
Let's learn how to initialize PyTorch tensors in different ways before we get into the more complicated parts of PyTorch.

In [3]:
# Create an uninitialized tensor with dimensions 5x3 (5 rows, 3 columns);
# its values are whatever happened to be in memory
x = torch.empty(5, 3)
print(x.dtype)
print(x.shape)
print(x)

# Create a tensor of size 5x3 with entries drawn from the normal distribution
y = torch.randn(5, 3)
print(y.dtype)
print(y.shape)
print(y)

# Create a tensor of size 10x3x4 with floats that are zeros
z = torch.zeros(10, 3, 4, dtype=torch.float)
print(z.dtype)
print(z.shape)
print(z)

# Create a tensor from a Python list
a = torch.tensor([[3, 4], [5, 7]])
print(a.dtype)
print(a)

# Create a tensor with the same size as another tensor
b = torch.randn_like(z)
print(b.dtype)
print(b)

torch.float32
torch.Size([5, 3])
tensor([[1.4013e-45, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 2.3822e-44, 0.0000e+00],
        [2.3822e-44, 7.0065e-45, 1.4013e-45],
        [0.0000e+00, 3.2422e-36, 3.0728e-41],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])
torch.float32
torch.Size([5, 3])
tensor([[ 0.5781, -0.2402, -0.2009],
        [-0.4334, -2.2371, -1.2284],
        [ 0.6461, -0.6719, -0.7497],
        [ 0.5004, -0.8521,  0.2625],
        [ 1.1861,  0.2846, -0.4051]])
torch.float32
torch.Size([10, 3, 4])
tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0

## Basic Operations on Tensors
The main purpose of PyTorch is to do fast tensor operations. The following lines of code demonstrate how basic operations work with tensors.

In [4]:
x = torch.randn(10, 5)
y = torch.randn(10, 5)

# Different ways to add
print(x + y)
# Add and store result in new tensor
result = torch.empty(10, 5)
torch.add(x, y, out=result)
print(result)

# Add and store result in y; a trailing _ marks an "in place" operation
y.add_(x)
print(y)

tensor([[-0.8586,  1.4652, -3.4238, -1.2892, -0.5180],
        [ 0.2684,  0.3453,  0.3403,  1.2260, -0.2613],
        [ 2.5284,  0.2674,  0.3439,  0.5047,  2.9751],
        [ 0.2573,  0.3008,  2.2651, -2.4236,  0.1482],
        [ 2.4482, -0.5288, -0.4339, -0.8041, -0.8903],
        [-2.5760, -0.5083,  1.0593, -1.7562, -0.8925],
        [-1.8932, -0.4112,  1.2690, -0.4528, -1.9837],
        [ 0.5995,  0.7161, -0.5758, -0.4285, -0.7273],
        [ 2.8721, -0.8013, -0.8027,  2.7529,  1.7555],
        [-0.4017,  0.6079,  1.2634,  0.4352, -1.9197]])
tensor([[-0.8586,  1.4652, -3.4238, -1.2892, -0.5180],
        [ 0.2684,  0.3453,  0.3403,  1.2260, -0.2613],
        [ 2.5284,  0.2674,  0.3439,  0.5047,  2.9751],
        [ 0.2573,  0.3008,  2.2651, -2.4236,  0.1482],
        [ 2.4482, -0.5288, -0.4339, -0.8041, -0.8903],
        [-2.5760, -0.5083,  1.0593, -1.7562, -0.8925],
        [-1.8932, -0.4112,  1.2690, -0.4528, -1.9837],
        [ 0.5995,  0.7161, -0.5758, -0.4285, -0.7273],
        [

## Tensor Indexing
Tensor indexing can be fairly complicated if you are unfamiliar with it. This section goes through some standard ways to index, slice, and interact with the data in a tensor. We will use the example of RGB images to demonstrate the utility of these operations. The most important thing is understanding how the colon operator works: on its own, a colon selects every index along a dimension, and $\texttt{start:stop}$ selects every index from $\texttt{start}$ up to, but not including, $\texttt{stop}$.

In [5]:
# Create a random RGB image
rand_im = (torch.randn(32, 36, 3) + 1) * 128 

# Print the first entry in the red channel
print(rand_im[0, 0, 0]) 

# Get the first column for each channel (dim 0 is rows, dim 1 is columns)
print(rand_im[:, 0].shape)

# Get the columns at indices 4, 5, and 6 for each channel
print(rand_im[:, 4:7].shape)

# Get the first column of the red channel
print(rand_im[:, 0, 0].shape)

# Split the image into individual color channels
red = rand_im[:, :, 0]
green = rand_im[:, :, 1]
blue = rand_im[:, :, 2]

tensor(187.5496)
torch.Size([32, 3])
torch.Size([32, 3, 3])
torch.Size([32])


This was a very brief overview of how to slice and index tensors using the colon operator. I highly recommend checking out this more [in-depth tutorial](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html). While the tutorial is for NumPy, the PyTorch developers have ported the same operations over to torch tensors.
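
As a quick reference, here is a small sketch of the most common slicing patterns on a tiny tensor whose values are easy to check by hand. It assumes a (height, width, channels) layout, and the variable names are only illustrative.

```python
import torch

# A small 4x5x3 "image" (height, width, channels) filled with 0..59
im = torch.arange(4 * 5 * 3).reshape(4, 5, 3)

first_row = im[0]           # (5, 3): every column and channel of row 0
first_col = im[:, 0]        # (4, 3): every row at column 0
cols_1_to_2 = im[:, 1:3]    # (4, 2, 3): the stop index 3 is exclusive
red = im[..., 0]            # (4, 5): "..." stands for all leading dimensions
every_other_row = im[::2]   # (2, 5, 3): rows 0 and 2 (step of 2)

for name, piece in [("first_row", first_row), ("first_col", first_col),
                    ("cols_1_to_2", cols_1_to_2), ("red", red),
                    ("every_other_row", every_other_row)]:
    print(name, tuple(piece.shape))
```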

## Artificially Modifying Tensor Shapes/Indexing
There are many operations in PyTorch that can be used to modify the shape of tensors without actually changing the underlying data. This means that you can change tensor shapes in PyTorch cheaply, because no data is moved in memory.

### View
The $\texttt{view()}$ function allows one to artificially change the shape of a tensor without moving any data. This is convenient whenever you need to quickly change the shape of a tensor, for example to match an expected input size.

In [6]:
x = torch.randn(5, 4)
a = x.view(20)
b = x.view(10, 2)
c = x.view(-1, 10) # infer the dimensions of the new view with -1
print(a.shape, b.shape, c.shape)

torch.Size([20]) torch.Size([10, 2]) torch.Size([2, 10])
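
Because a view shares memory with the tensor it came from, writing through one is visible through the other. The following minimal sketch illustrates this; the tensor names are only illustrative.

```python
import torch

x = torch.zeros(2, 3)
v = x.view(6)       # same underlying memory, different shape

v[0] = 42.0         # write through the view
print(x[0, 0])      # the original tensor sees the change

# Both tensors start at the same memory address
print(x.data_ptr() == v.data_ptr())
```

If you need an independent copy instead, call `x.clone()` before modifying it.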


### Permute
Permute is used to reorder the axes of a tensor. Again, none of the data is actually moved; only the way the axes are indexed changes. Say, for example, you are using a convolutional layer that expects $\texttt{NCHW}$ and your current tensor has a shape of $\texttt{NHWC}$. You would then want to move $C$ (the channel dimension) from the last axis to the second axis.

In [7]:
x = torch.randn(32, 28, 28, 3)

print(x.shape)
# NHWC -> NCHW
x = x.permute(0, 3, 1, 2)

print(x.shape)

torch.Size([32, 28, 28, 3])
torch.Size([32, 3, 28, 28])
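
One way to see that permute moves no data is to look at the strides, which say how many elements to skip in memory when stepping along each axis. This is a small sketch using the same shapes as above.

```python
import torch

x = torch.randn(32, 28, 28, 3)   # NHWC
y = x.permute(0, 3, 1, 2)        # NCHW, still backed by the same memory

print(x.stride())                    # (2352, 84, 3, 1)
print(y.stride())                    # (2352, 1, 84, 3): same numbers, reordered
print(x.data_ptr() == y.data_ptr())  # True: no data was copied
print(y.is_contiguous())             # False: memory order no longer matches the shape
```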


## Explicit Memory Control
While PyTorch doesn't expose a large amount of control over the underlying memory of a tensor, it does give the user some power over this matter. These operations are important to know as they can affect the network's ability to train. It is important that your memory layout stays valid for the operations you use.

### Contiguous
Often, after modifying a tensor with a combination of $\texttt{view}$, $\texttt{permute}$, and other operations, its elements are no longer laid out in memory in the order the new indexing implies. Some operations, such as $\texttt{view}$, will then fail. In this case, you will want to explicitly reorder the underlying data so that it is arranged contiguously for the indexing you've specified. When the tensor is not already contiguous, $\texttt{contiguous}$ copies the data to a new tensor, which can be expensive, so try to use it only when necessary.


In [8]:
x = torch.arange(200)
print(x[0])
x = x.view(5, 5, 2, 2, 2)
x = x.permute(3, 4, 1, 2, 0)
x = x.view(4, 5, 2, 5)
x = x.permute(2, 3, 0, 1)

# Fail to call contiguous
try:
    x = x.view(200)
except Exception as e:
    print("Failed view() call!")
    print(str(e))

# Call contiguous() first so the tensor can be viewed as a flat vector again
x = x.contiguous()
x = x.view(200)
print(x[0])

tensor(0)
Failed view() call!
view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
tensor(0)
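
You can check whether a tensor needs this treatment with `is_contiguous()`. The sketch below, using a small transposed tensor as an illustrative example, also shows that `contiguous()` only copies when it has to.

```python
import torch

x = torch.arange(6).view(2, 3)
t = x.t()                        # transpose: a non-contiguous view of x

print(x.is_contiguous(), t.is_contiguous())   # True False

same = x.contiguous()            # already contiguous: no copy is made
copy = t.contiguous()            # not contiguous: data is copied into a new layout

print(same.data_ptr() == x.data_ptr())   # True
print(copy.data_ptr() == t.data_ptr())   # False
print(copy.view(6))                      # tensor([0, 3, 1, 4, 2, 5])
```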


### Reshape
Reshape is similar to view: you can use it to change the shape of a tensor, but not to swap its axes. When the memory layout allows it, reshape returns a view of the same data; otherwise it copies the data into a new contiguous tensor. You can think of reshape as wrapping the contiguous-then-view pattern into one function call. This can be more convenient and readable, but performance can suffer if reshape ends up copying a large tensor many times.

In [9]:
x = torch.arange(200)
print(x[0])
x = x.reshape(2, 5, 4, 5)
x = x.reshape(200)
print(x[0])


tensor(0)
tensor(0)
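
The following sketch shows both behaviors of reshape by comparing memory addresses. Treat it as an illustration rather than a guarantee: the PyTorch docs advise against relying on whether a particular reshape call copies.

```python
import torch

x = torch.arange(6).view(2, 3)

a = x.reshape(3, 2)      # contiguous input: reshape can return a view
b = x.t().reshape(6)     # transposed (non-contiguous) input: reshape has to copy

print(a.data_ptr() == x.data_ptr())   # True
print(b.data_ptr() == x.data_ptr())   # False
print(b)                              # tensor([0, 3, 1, 4, 2, 5])
```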


## Combining Tensors
There are many situations where you will need to concatenate tensors together as part of an algorithm (concatenation is a very common operation for tensors in deep learning). Below you will learn the main ways in which you can combine tensors.

### Stacking/Concatenating
Concatenating two tensors will combine them along a specific dimension. Stacking on the other hand will combine the tensors across a new dimension. The example below illustrates the differences.

In [10]:
a = torch.randn(3, 5)
a_cat = torch.cat((a, a), dim=0)
a_stack = torch.stack((a, a), dim=0)
print(a_cat.shape)
print(a_stack.shape)

torch.Size([6, 5])
torch.Size([2, 3, 5])
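
Another way to think about the difference: stacking is the same as giving every tensor a new axis of size 1 and then concatenating along that axis. A minimal sketch with two illustrative tensors:

```python
import torch

a = torch.randn(3, 5)
b = torch.randn(3, 5)

# Stack along a new axis at position 1
stacked = torch.stack((a, b), dim=1)

# The same result built by hand: add the axis first, then concatenate along it
via_cat = torch.cat((a.unsqueeze(1), b.unsqueeze(1)), dim=1)

print(stacked.shape)                      # torch.Size([3, 2, 5])
print(torch.equal(stacked, via_cat))      # True
print(torch.cat((a, b), dim=1).shape)     # torch.Size([3, 10]): no new axis
```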


### Squeeze/Unsqueeze
These functions are used to either add a dimension of size 1 at a certain axis, or remove a dimension of size 1. Again, they are useful for making sure tensor shapes match what an operation expects.

In [11]:
a = torch.randn(3, 4, 3, 1)

# Remove the last dimension
print(a.shape)
a = a.squeeze(3)
print(a.shape)

# Insert a new dimension of size 1 at index 1
a = a.unsqueeze(1)
print(a.shape)

torch.Size([3, 4, 3, 1])
torch.Size([3, 4, 3])
torch.Size([3, 1, 4, 3])
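
A few variations are worth knowing. The sketch below uses an illustrative tensor with two size-1 dimensions to show them.

```python
import torch

a = torch.randn(1, 3, 1, 4)

print(a.squeeze().shape)       # torch.Size([3, 4]): no argument removes every size-1 dim
print(a.squeeze(0).shape)      # torch.Size([3, 1, 4]): only dim 0 is removed
print(a.squeeze(1).shape)      # torch.Size([1, 3, 1, 4]): dim 1 has size 3, so nothing changes
print(a.unsqueeze(-1).shape)   # torch.Size([1, 3, 1, 4, 1]): negative indices count from the end
```

Calling `squeeze()` with no argument can be risky when a batch dimension happens to have size 1, so passing the dimension explicitly is usually the safer habit.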


## GPU Accelerating Torch Operations
PyTorch is so popular not just because it provides easy-to-use tools to develop deep learning models, but also because it provides GPU acceleration for tensor operations. Thanks to support from different sources, we do have access to GPUs in order to train our models. The following code snippet shows how you can use CUDA (NVIDIA's parallel computing platform and programming model for GPUs) to accelerate your PyTorch operations.

In [12]:
# Use the GPU if one is available, otherwise fall back to the CPU
# so that `device` is always defined
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a tensor and port it to GPU
x = torch.randn(5, 5)
x = x.to(device)

# Create a tensor directly on the GPU
y = torch.randn(5, 5, device=device)

z = x + y

# Operating on two GPU tensors keeps all computation on the GPU
print(z.device)

cuda:0


You will likely have very little experience programming GPUs unless you have taken a graphics course or one of the specialized GPGPU courses. There are some important rules to remember when developing on GPUs.


*   Minimize memory transfers from CPU to GPU (create tensors directly on GPU)
*   Write device-agnostic code (code should run on GPU if available, otherwise use CPU)


You don't need to worry about knowing everything about a GPU when using PyTorch. However, you should at the very least follow these two rules, as they will significantly improve your quality of life when developing.
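
As a rough sketch of both rules in practice: new tensors can take their device from an existing tensor, so the code never hard-codes a device and never builds data on the CPU only to copy it over. The tensor names here are only illustrative.

```python
import torch

# Pick the device once; everything below works on either CPU or GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(5, 5, device=device)      # created directly on the device

noise = torch.randn_like(x)               # inherits x's device and dtype
bias = torch.zeros(5, device=x.device)    # or pass the device along explicitly

z = x + noise + bias                      # stays on the same device throughout
print(z.device)

# Move back to the CPU only when needed, e.g. for printing or NumPy
result = z.cpu()
print(result.device)
```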

## Conclusion
These are just some of the operations available to you in PyTorch. I have provided a cursory overview of each function for your convenience; however, I would encourage you to take a look at the [official PyTorch docs](https://pytorch.org/docs/stable/index.html) if anything is unclear. There are also plenty of other functions that you can read about and take advantage of.