# Chapter 1
## Creating Tensors
First, we define a helper function, `describe(x)`, that will summarize various properties of a tensor
x, such as the type of the tensor, the dimensions of the tensor, and the contents of the tensor:

In [1]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

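**PyTorch** allows us to create tensors in many different ways using the `torch` package. One way to
create a tensor is to specify only its dimensions. `torch.Tensor(2, 2)` allocates the tensor without
initializing it, so its contents are arbitrary, whereas `torch.randn(3, 3)` samples each value from a
standard normal distribution, as shown in these examples: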

In [2]:
import torch
describe(torch.Tensor(2, 2))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[3.5720e-31, 3.0957e-41],
        [3.6103e-31, 3.0957e-41]])


In [3]:
describe(torch.randn(3, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[ 0.3579, -0.8369,  0.7518],
        [-0.0588, -0.2430,  0.8397],
        [-1.8349,  1.5984,  0.6687]])
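
The difference between an uninitialized tensor and a randomly initialized one can be checked directly. The following is a minimal sketch, assuming a recent PyTorch release; the seed value and the variable names are illustrative and not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

# torch.empty allocates memory without initializing it, so the values are arbitrary
uninitialized = torch.empty(2, 2)

# torch.rand samples uniformly from [0, 1)
uniform = torch.rand(2, 2)

# torch.randn samples from the standard normal distribution (mean 0, std 1)
normal = torch.randn(2, 2)

print(uninitialized.shape, uniform.shape, normal.shape)
print(bool(((uniform >= 0) & (uniform < 1)).all()))
```

Only the random constructors give values with a defined distribution, so `torch.empty` (or `torch.Tensor(2, 2)`) is best reserved for tensors that will be overwritten before they are read.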


We can also create tensors filled with the same scalar. For creating a tensor of zeros or ones, we have built-in functions, and for filling it with a specific value, we can use the `fill_()` method. Any **PyTorch** method whose name ends with an underscore (`_`) is an in-place operation; that is, it modifies the
content in place without creating a new object, as shown in this example:

In [4]:
x = torch.zeros(2, 2)

describe(x)

x.fill_(5)

describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[0., 0.],
        [0., 0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[5., 5.],
        [5., 5.]])
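
To make the in-place convention concrete, the sketch below contrasts `add()` with `add_()` and also shows `torch.full`, which builds a constant tensor in one call. The variable names are illustrative assumptions.

```python
import torch

x = torch.zeros(2, 2)

# Out-of-place: add() returns a new tensor and leaves x unchanged
y = x.add(1)

# In-place: add_() modifies the storage of x itself
z = x.add_(3)

print(x)                              # x now holds 3s
print(y)                              # y holds 1s
print(z.data_ptr() == x.data_ptr())   # z refers to the same memory as x

# torch.full builds a constant tensor directly, without a separate fill_ call
fives = torch.full((2, 2), 5.0)
print(fives)
```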


The values can also come from a list, as in the first example below, or from a NumPy array, as in the
second. And, of course, we can always go from a PyTorch tensor to a NumPy array as well. Notice that
the type of the tensor created from NumPy is `DoubleTensor` instead of the default `FloatTensor`. This
corresponds to the data type of the NumPy random matrix, `float64`, as presented in these examples:

In [5]:
x = torch.Tensor([[2, 3], [4, 5]])
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[2., 3.],
        [4., 5.]])


In [6]:
import numpy as np

describe(torch.from_numpy(np.random.randn(3, 3)))

Type: torch.DoubleTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[-0.5334,  0.9101, -1.1317],
        [ 0.3157,  0.7759,  1.6905],
        [-0.9728,  0.2949,  0.2085]], dtype=torch.float64)


We can observe that there are different types of tensors: `torch.FloatTensor` and `torch.DoubleTensor`. They correspond to different floating-point representations: 32-bit (`torch.float32`) and 64-bit (`torch.float64`), respectively.
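
The round trip between NumPy and PyTorch can be sketched as follows. This is an illustrative example rather than part of the original notebook; it relies on the fact that `torch.from_numpy` shares memory with the source array.

```python
import numpy as np
import torch

array = np.zeros((2, 2))              # NumPy defaults to float64
shared = torch.from_numpy(array)      # shares memory with `array`

array[0, 0] = 1.0                     # the change is visible through the tensor
print(shared[0, 0].item())

as_float = shared.float()             # a copy converted to float32
back = as_float.numpy()               # CPU tensor back to a NumPy array

print(shared.dtype, as_float.dtype, back.dtype)
```

Converting with `.float()` is a common way to bring NumPy data in line with the default `FloatTensor` type.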

## Operations

In [7]:

x = torch.randn(2, 2)

x

tensor([[-0.4638,  0.7159],
        [-0.3933,  0.1810]])

Sum

In [8]:
x + x

tensor([[-0.9276,  1.4317],
        [-0.7866,  0.3620]])

Elementwise product

In [9]:
x * x

tensor([[0.2151, 0.5124],
        [0.1547, 0.0328]])

Matrix multiplication

In [10]:
x @ x

tensor([[-0.0665, -0.2024],
        [ 0.1112, -0.2488]])

Resizing


In [24]:
x = torch.arange(10)

x.view(2, 5)

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
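
A few properties of these operations are easy to miss: `view` shares memory with the original tensor, reductions accept a `dim` argument, and `view` requires a compatible memory layout. The sketch below illustrates them under the assumption of a small integer tensor; the variable names are illustrative.

```python
import torch

x = torch.arange(6)
grid = x.view(2, 3)          # same storage, new shape

grid[0, 0] = 100             # writes through to x because view shares memory
print(x[0].item())

# Reductions accept a dim argument: dim=0 sums over rows, dim=1 sums over columns
print(grid.sum(dim=0))
print(grid.sum(dim=1))

# A transposed tensor is non-contiguous, so view(-1) would raise an error here;
# reshape() falls back to copying when a view is not possible
flat = grid.transpose(0, 1).reshape(-1)
print(flat)
```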

## Tensors and Computational Graphs

The PyTorch tensor class encapsulates the data (the tensor itself) and a range of operations, such as
algebraic operations, indexing, and reshaping operations. However, as shown in the following cells, when
the `requires_grad` Boolean flag is set to `True` on a tensor, bookkeeping operations are enabled
that can track the gradient at the tensor as well as the gradient function, both of which are needed to
facilitate the gradient-based learning discussed in “The Supervised Learning Paradigm”.

In [25]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
True


In [26]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[21., 21.],
        [21., 21.]], grad_fn=<AddBackward0>)
True


In [27]:
z = y.mean()
describe(z)
print(x.grad is None)  # still True: backward() has not run yet
z.backward()
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([])
Values: 
21.0
True
False


In [30]:
x.grad

tensor([[2.2500, 2.2500],
        [2.2500, 2.2500]])

When you create a tensor with `requires_grad=True`, you are requiring PyTorch to manage the bookkeeping information needed to compute gradients. First, PyTorch keeps track of the values of the forward pass. Then, at the end of the computations, a single scalar is used to compute a backward pass. The backward pass is initiated by calling the `backward()` method on a tensor resulting from the evaluation of a loss function. The backward pass computes a gradient value for each tensor object that participated in the forward pass and has `requires_grad=True`.

In general, the gradient is a value that represents the slope of a function output with respect to the function input. In the computational graph setting, gradients exist for each parameter in the model and can be thought of as the parameter’s contribution to the error signal. In PyTorch, you can access the gradients for the nodes in the computational graph by using the .grad member variable. Optimizers use the .grad variable to update the values of the parameters.
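
The value 2.25 shown above can be verified by hand. The sketch below repeats the computation and compares `x.grad` against the analytic derivative; the name `expected` is an illustrative assumption.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
z = ((x + 2) * (x + 5) + 3).mean()
z.backward()

# By hand: d/dx [(x + 2)(x + 5) + 3] = 2x + 7, and the mean over 4 elements
# scales each partial derivative by 1/4, so every entry is (2 * 1 + 7) / 4 = 2.25
expected = (2 * x.detach() + 7) / 4

print(x.grad)
print(torch.allclose(x.grad, expected))

# Gradients accumulate across backward() calls, so reset them before reusing x
x.grad.zero_()
```

Because gradients accumulate, training loops typically clear them once per step before the next backward pass.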

## CUDA

So far, we have been allocating our tensors in CPU memory. When doing linear algebra operations, it might make sense to utilize a GPU, if you have one. To use a GPU, you need to first allocate the tensor in the GPU’s memory. Access to the GPUs is via a specialized API called CUDA. The CUDA API was created by NVIDIA and is limited to use on only NVIDIA GPUs. PyTorch offers CUDA tensor objects that are indistinguishable in use from the regular CPU-bound tensors except for the way they are allocated internally.

PyTorch makes it very easy to create these CUDA tensors, transferring the tensor from the CPU to the GPU while maintaining its underlying type. The preferred method in PyTorch is to be device agnostic and write code that works whether it’s on the GPU or the CPU. In the following cells, we first check whether a GPU is available by using `torch.cuda.is_available()`, and build the corresponding device object with `torch.device()`. Then, all future tensors are instantiated and moved to the target device by using the `.to(device)` method:

In [16]:
import torch
print(torch.cuda.is_available())

False


In [17]:
# preferred method: device agnostic tensor instantiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


In [18]:
x = torch.rand(3, 3).to(device)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.9138, 0.1688, 0.0581],
        [0.2904, 0.2981, 0.2249],
        [0.7045, 0.6757, 0.7511]])
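
A slightly fuller device-agnostic sketch follows. It assumes nothing beyond the calls used above plus the `device=` keyword accepted by the tensor factory functions; the variable names are illustrative.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Passing device= at creation time avoids a separate CPU allocation and copy
a = torch.rand(3, 3, device=device)
b = torch.rand(3, 3).to(device)

# Operands must live on the same device; mixing CPU and CUDA tensors raises an error
c = a + b
print(c.device)

# Move results back to the CPU before converting to NumPy
c_np = c.cpu().numpy()
print(c_np.shape)
```

Transfers between CPU and GPU memory are relatively expensive, so it usually pays to keep a computation on one device and move only the final result.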


## Exercises

The best way to master a topic is to solve problems. Here are some warm-up exercises. Many of the problems will require going through the official documentation and finding helpful functions.

1. Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0.

In [19]:
x = torch.randn(2, 2)

x.unsqueeze(0)

tensor([[[ 0.4768, -0.7062],
         [-0.2757,  0.5638]]])

2. Remove the extra dimension you just added to the previous tensor.

In [20]:

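# unsqueeze() returned a new tensor and left x unchanged, so add the dimension again and remove it
x.unsqueeze(0).squeeze(0)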

tensor([[ 0.4768, -0.7062],
        [-0.2757,  0.5638]])
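
Note that `squeeze()` without arguments removes every dimension of size 1, which is not always what is wanted. The sketch below uses an illustrative tensor of shape (2, 1) to show the difference.

```python
import torch

x = torch.randn(2, 1)
with_batch = x.unsqueeze(0)            # shape (1, 2, 1)

print(with_batch.squeeze().shape)      # drops every size-1 dimension
print(with_batch.squeeze(0).shape)     # drops only dimension 0
```

Passing the dimension explicitly keeps the operation an exact inverse of `unsqueeze(0)`.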


3. Create a random tensor of shape 5x3 in the interval [3, 7).

In [21]:
torch.randint(low=3, high=7, size=(5, 3))

tensor([[3, 5, 3],
        [5, 5, 6],
        [4, 5, 6],
        [3, 5, 5],
        [4, 6, 6]])
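
`torch.randint` returns integers. If the exercise is read as asking for continuous values, one possible approach is to scale and shift the output of `torch.rand`, as in this sketch (the names `low` and `high` are illustrative):

```python
import torch

low, high = 3.0, 7.0

# torch.rand samples from [0, 1); scale by the width and shift by the lower bound.
# Floating-point rounding can in rare cases land exactly on the upper bound.
x = low + (high - low) * torch.rand(5, 3)

print(x.shape)
print(bool((x >= low).all()), bool((x <= high).all()))
```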

4. Create a tensor with values from a normal distribution (mean=0, std=1).

In [22]:
torch.randn(5, 5)

tensor([[ 9.4958e-01,  1.9422e+00,  1.6860e+00, -1.1151e+00,  4.7616e-01],
        [-7.2875e-01, -1.1347e+00,  9.1374e-01, -1.4614e+00, -1.0223e+00],
        [-7.2578e-01, -8.3391e-01,  1.6807e-01,  8.7654e-01,  1.4070e-01],
        [ 6.6295e-01,  6.7922e-01,  1.1490e+00,  9.5238e-01,  1.4845e+00],
        [ 4.1253e-01,  2.8259e-01,  1.3092e-03,  1.0865e-01, -1.2121e+00]])

5. Retrieve the indexes of all the nonzero elements in the tensor `torch.Tensor([1, 1, 1, 0, 1])`.

In [23]:
x = torch.Tensor([1, 1, 1, 0, 1])

# x != 0 only produces a mask; torch.nonzero returns the indexes themselves
torch.nonzero(x)

tensor([[0],
        [1],
        [2],
        [4]])

6. Create a random tensor of size (3,1) and then horizontally stack four copies together.

In [24]:
x = torch.randn(3, 1)

torch.stack([x, x, x, x], dim=1).squeeze()

tensor([[0.6299, 0.6299, 0.6299, 0.6299],
        [0.3769, 0.3769, 0.3769, 0.3769],
        [0.4385, 0.4385, 0.4385, 0.4385]])
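
The same result can be reached without the extra `squeeze()`. The sketch below compares the approach above with `torch.cat` and `expand`; it is an illustration of alternatives, not a replacement for the solution.

```python
import torch

x = torch.randn(3, 1)

stacked = torch.stack([x, x, x, x], dim=1).squeeze()  # approach used above
concatenated = torch.cat([x, x, x, x], dim=1)          # joins along the existing column dimension
expanded = x.expand(3, 4)                              # broadcast view, no data copied

print(concatenated.shape)
print(torch.equal(stacked, concatenated), torch.equal(concatenated, expanded))
```

`torch.stack` creates a new dimension, which is why the original solution needs `squeeze()`, whereas `torch.cat` joins along a dimension that already exists.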

7. Return the batch matrix-matrix product of two three-dimensional tensors (a=torch.rand(3,4,5), b=torch.rand(3,5,3)).


In [28]:
a = torch.rand(3, 4, 5)

b = torch.rand(3, 5, 3)

torch.bmm(a, b)

tensor([[[1.7783, 1.9196, 1.8574],
         [0.9789, 1.2092, 1.1393],
         [1.2980, 1.3825, 1.5667],
         [1.0263, 1.5045, 1.6290]],

        [[1.0673, 1.0863, 1.4547],
         [0.4108, 0.7216, 0.8166],
         [1.0102, 0.8803, 0.8434],
         [0.5331, 0.9789, 1.7401]],

        [[0.8372, 1.3008, 1.1223],
         [0.8817, 1.9978, 1.6211],
         [0.7308, 1.3091, 0.9169],
         [0.6679, 1.6554, 1.3777]]])
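
To see what `torch.bmm` computes, the sketch below compares it with an explicit loop over the batch dimension. The name `per_batch` is an illustrative assumption.

```python
import torch

a = torch.rand(3, 4, 5)   # batch of three 4x5 matrices
b = torch.rand(3, 5, 3)   # batch of three 5x3 matrices

product = torch.bmm(a, b)
print(product.shape)

# Each slice of the result is the ordinary matrix product of the matching slices
per_batch = torch.stack([a[i] @ b[i] for i in range(a.size(0))])
print(torch.allclose(product, per_batch, atol=1e-6))
```

The batch sizes must match and the inner dimensions must agree, which gives a result of shape (3, 4, 3) here.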

8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

In [37]:
a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# Repeat the matrix three times
c = b.unsqueeze(0).expand(a.size(0), *b.size())
torch.bmm(a, c)

tensor([[[0.6720, 0.7214, 0.8585, 0.9530],
         [0.8340, 0.8605, 1.3643, 1.3807],
         [1.5578, 1.4078, 1.9595, 2.1811],
         [0.8740, 0.8643, 1.3476, 1.4359]],

        [[1.6613, 1.3231, 1.9216, 2.1836],
         [1.8110, 1.9145, 2.2036, 2.5980],
         [0.9673, 0.4213, 0.5828, 0.8675],
         [0.9585, 0.9538, 1.2384, 1.4309]],

        [[1.4444, 1.3392, 1.8127, 1.9752],
         [0.5553, 1.1492, 1.3973, 1.4606],
         [1.7935, 2.0232, 2.5799, 2.8471],
         [0.8759, 0.8783, 1.3491, 1.3394]]])
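
As an alternative to expanding `b` by hand, `torch.matmul` broadcasts a 2D operand across the batch dimension. The sketch below checks that both routes agree; the variable names are illustrative.

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# Explicit expansion followed by bmm, as in the solution above
explicit = torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))

# matmul broadcasts b across the batch dimension of a
broadcast = torch.matmul(a, b)

print(broadcast.shape)
print(torch.allclose(explicit, broadcast, atol=1e-6))
```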