In [81]:
import numpy as np
import matplotlib.pyplot as plt

# auto differentiation library
import torch
import torch.nn as nn  # for defining Neural Networks
import torch.nn.functional as F  # usual functions
import torch.optim as optim

# Pytorch's Tensors

Pytorch provides a `Tensor` class, a wrapper for arrays, matrices and higher-dimensional tensors. Numpy already handles the usual linear algebra and coordinate-wise operations; what Pytorch brings to the table is __auto-differentiation__.
That means a Tensor contains everything necessary to backpropagate gradients through the operation which created it.

In [26]:
# defining a Tensor from a numpy array
x = np.array([3, 1, 4, 1, 5, 9, 2])
x = torch.Tensor(x)
print("Tensor type `x.type()` gives more insight on the data structure of tensor coordinates: {}".format(x.type()))
print("The tensor's shape, alternatively `x.size()` or `x.shape`: {}".format(x.shape))
print(x)
print("You can get a numpy array back with `x.numpy()`: {}".format(x.numpy()))

Tensor type `x.type()` gives more insight on the data structure of tensor coordinates: torch.FloatTensor
The tensor's shape, alternatively `x.size()` or `x.shape`: torch.Size([7])
tensor([3., 1., 4., 1., 5., 9., 2.])
You can get a numpy array back with `x.numpy()`: [3. 1. 4. 1. 5. 9. 2.]
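
As a side note on the conversion above, here is a minimal sketch of three common ways to go from Numpy to torch. The names `x_np`, `x_float`, `x_same` and `x_view` are illustrative only, and the float result assumes the default dtype has not been changed.

```python
import numpy as np
import torch

x_np = np.array([3, 1, 4, 1, 5, 9, 2])  # integer Numpy array

x_float = torch.Tensor(x_np)     # legacy constructor: converts to the default float type
x_same = torch.tensor(x_np)      # copies the data and keeps the Numpy dtype (integers here)
x_view = torch.from_numpy(x_np)  # no copy: shares memory with the Numpy array

print(x_float.dtype, x_same.dtype, x_view.dtype)

# Writing through the Numpy array is visible in the shared tensor only.
x_np[0] = 7
print(x_view[0].item(), x_same[0].item())
```

If the data must be floats for later differentiation, `torch.tensor(x_np, dtype=torch.float32)` makes the conversion explicit.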


In [20]:
# linear algebra operation
A = np.random.randn(3, x.shape[0])
A = torch.Tensor(A)

print("`torch.mv` for matrix-vector multiplication: {}".format(torch.mv(A, x)))
print("Multiplication `*` is often reserved for coordinate wise multiplication.")
print("In Python, matrix multiplication is done using `@`: {}".format(A @ x))

`torch.mv` for matrix-vector multiplication: tensor([19.5277, -8.5126, -7.0295])
Multiplication `*` is often reserved for coordinate wise multiplication.
In Python, matrix multiplication is done using `@`: tensor([19.5277, -8.5126, -7.0295])
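
To make the difference between `*` and `@` concrete, here is a small sketch on hand-picked values (the matrix `A_small` and the vector `v` are made up for the illustration):

```python
import torch

A_small = torch.tensor([[1., 2.],
                        [3., 4.]])
v = torch.tensor([10., 100.])

# `*` multiplies coordinate by coordinate; v is broadcast over the rows of A_small.
elementwise = A_small * v
# `@` is the matrix-vector product, same as torch.mv(A_small, v).
product = A_small @ v

print(elementwise)  # tensor([[ 10., 200.], [ 30., 400.]])
print(product)      # tensor([210., 430.])
```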


So far, this is the same as Numpy. Note that $x$ has a placeholder for a gradient, which will later contain $\frac{\partial \mathcal L}{\partial x}$ where $\mathcal L$ is any real value computed from $x$.

In [23]:
print(x.grad)

None


# Pytorch's Backpropagation
Pytorch automatically records the operations applied to tensors. The `backward` method called on a scalar (a torch Tensor with a single element) backpropagates through the whole graph of dependencies: calling `loss.backward()` computes, for every tensor that requires a gradient and was used to compute `loss`, the gradient of `loss` with respect to that tensor.

For memory efficiency, intermediate gradients are deleted during the backpropagation, and unwanted gradients are never computed. If you explicitly want a gradient to be computed and kept, create the tensor with the argument `requires_grad=True`.

In [36]:
x = torch.randn(8, requires_grad=True)
display(x)

print("Gradient at start: {}".format(x.grad))

l1 = torch.sum(torch.abs(x))
print("L1 norm: {}".format(l1))

l1.backward()
print("Gradient after `l1.backward()`: {}".format(x.grad))

tensor([-0.0568, -1.2728,  0.8023,  1.9957, -1.2047,  0.0181,  1.0877,  0.2828],
       requires_grad=True)

Gradient at start: None
L1 norm: 6.7208251953125
Gradient after `l1.backward()`: tensor([-1., -1.,  1.,  1., -1.,  1.,  1.,  1.])
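
The gradient printed above is the sign of each coordinate, which is the closed-form derivative of the absolute value away from zero. A quick sanity check, as a sketch with a fixed seed (the seed and the name `matches` are arbitrary choices for this example):

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, requires_grad=True)

l1 = torch.sum(torch.abs(x))
l1.backward()

# d|x_i|/dx_i = sign(x_i), so autograd should agree with torch.sign.
matches = torch.equal(x.grad, torch.sign(x.detach()))
print(matches)
```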


Note that gradients accumulate: each call to `backward` adds to `.grad`. If you want to compute a new gradient from the same vector, you need to reset its gradient first.

In [42]:
x.grad = torch.ones_like(x)  # artificial gradient from previous computation

print("Without clean up:")
print(x.grad)

l1 = torch.sum(torch.abs(x))
l1.backward()

print(x.grad)

Without clean up:
tensor([1., 1., 1., 1., 1., 1., 1., 1.])
tensor([0., 0., 2., 2., 0., 2., 2., 2.])


In [43]:
x.grad = torch.ones_like(x)  # artificial gradient from previous computation

print("With clean up:")
print(x.grad)

x.grad.zero_()

l1 = torch.sum(torch.abs(x))
l1.backward()

print(x.grad)

With clean up:
tensor([1., 1., 1., 1., 1., 1., 1., 1.])
tensor([-1., -1.,  1.,  1., -1.,  1.,  1.,  1.])


In [48]:
x = torch.randn(8, requires_grad=True)

y = torch.abs(x)  # modulus coordinate wise
z = x ** 2  # square coordinate wise

l1 = torch.sum(y)
l2 = torch.sqrt(torch.sum(z))

s = l2 / l1  # some real value depending on x. It turns out this is a good measure for sparsity

# Backpropagate gradients:
s.backward()

print("Only leaf tensors with `requires_grad=True` keep their gradients.")
print(x.grad)
print(y.grad)
print(z.grad)

Only leaf tensors with `requires_grad=True` keep their gradients.
tensor([ 0.0432,  0.0064, -0.0102, -0.0361, -0.0344,  0.0391, -0.0567, -0.0050])
None
None
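
If the gradient of an intermediate tensor is needed, one option is `retain_grad()`, called on that tensor before the backward pass. A minimal sketch, reusing the L1 norm from the cells above (the seed is arbitrary):

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, requires_grad=True)

y = torch.abs(x)  # intermediate (non-leaf) tensor
y.retain_grad()   # ask autograd to keep its gradient after backward

l1 = torch.sum(y)
l1.backward()

print(x.is_leaf, y.is_leaf)
print(y.grad)  # gradient of sum(y) with respect to y: all ones
```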


### An alternative way to get gradients is to use the function `grad` provided in the `autograd` package:

In [58]:
from torch.autograd import grad

x = torch.randn(8, requires_grad=True)
l1 = torch.sum(torch.abs(x))

# grad's input is a list of losses and a list of tensors with respect to which we want the gradient.
gx = grad([l1], [x])
print(gx)

(tensor([ 1., -1., -1., -1.,  1.,  1., -1., -1.]),)
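
One difference with `backward` worth keeping in mind: `grad` returns the gradients as a tuple and does not write anything into `.grad`. A short sketch of this behaviour (variable names are illustrative):

```python
import torch
from torch.autograd import grad

x = torch.randn(8, requires_grad=True)
l1 = torch.sum(torch.abs(x))

# A single tensor can be passed instead of a list; the result is still a tuple.
(gx,) = grad(l1, x)

print(gx)
print(x.grad)  # still None: nothing was accumulated on x
```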


# Pytorch's Modules

You can write functions to manipulate tensors. However, the final goal is to define a __parameterized__ function. In Pytorch, this is the purpose of `nn.Module`, from which modules should inherit.

A new module should define its initialization process (in particular it must initialize its parameters), and the forward function. The backward function is derived automatically and does not need to be written by hand.

In [87]:
# Defining the architecture of the network

class LogisticRegression(nn.Module):
    def __init__(self, p):
        """
        A shallow network which performs logistic regression.
        
        Arguments:
        ----------
        p: dimension of the input
        """
        
        super().__init__()  # run the initialization code of the inherited class nn.Module
        
        self.linear = nn.Linear(p, 1)  # parameterized function defined by pytorch, x -> Ax + b

    def forward(self, x):
        # Note: the last dimension of x must be of size p, e.g. shape (batch_size, p)
        xw = self.linear(x)
        y = torch.sigmoid(xw)
        
        return y

In [88]:
# Initializing an instance of the network with randomly chosen parameters
p = 8
network = LogisticRegression(p)
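
The parameters created by a module can be listed with `named_parameters()`. The sketch below uses a bare `nn.Linear(p, 1)`, the same layer as `self.linear` above, so that it runs on its own:

```python
import torch
import torch.nn as nn

p = 8
linear = nn.Linear(p, 1)  # same layer type as `self.linear` in LogisticRegression

# Each parameter is a tensor with requires_grad=True, registered under a name.
for name, param in linear.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

n_params = sum(param.numel() for param in linear.parameters())
print("number of scalar parameters:", n_params)
```

For this layer the weight has shape `(1, p)` and the bias has shape `(1,)`, hence `p + 1` scalar parameters.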

In [89]:
# Calling the forward function can be done either by treating the network as a function,
# or by explicitly calling the forward method. The first option is cleaner.
x = torch.randn(1, p)

y_pred = network(x)
print(y_pred)

tensor([[0.5915]], grad_fn=<SigmoidBackward>)


In [96]:
# Defining a loss is done outside the network

# artificial label. x.detach() is used to detach this tensor from previously defined tensors.
y_true = .5 * (1 + torch.sign(x.detach()[:, 0:1]))

bce_loss = F.binary_cross_entropy(y_pred, y_true)  # binary cross-entropy, not a mean squared error

In [93]:
# calling backward
bce_loss.backward()

Now, we could do the update by hand, but we will want to use other optimization schemes. Pytorch provides optimizers, which implement the descent and can be used after `.backward()` with a simple call of the method `.step()`.
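
For reference, an update by hand could look like the sketch below. It is plain gradient descent with an assumed step size `lr`, and it uses `nn.Sequential(nn.Linear(p, 1), nn.Sigmoid())` as a stand-in for `LogisticRegression(p)` so that the block is self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
p = 8
model = nn.Sequential(nn.Linear(p, 1), nn.Sigmoid())  # stand-in for LogisticRegression(p)

x = torch.randn(4, p)
y_true = .5 * (1 + torch.sign(x[:, 0:1]))

lr = 1e-2  # assumed step size
loss = F.binary_cross_entropy(model(x), y_true)
loss.backward()

# Keep a copy of the parameters to compare before and after the step.
before = [param.detach().clone() for param in model.parameters()]

with torch.no_grad():  # the update itself must not be recorded by autograd
    for param in model.parameters():
        param -= lr * param.grad  # gradient descent step
```

The gradients are left in place here; they would have to be reset before the next backward pass.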

In [94]:
# define optimizer
optimizer = optim.SGD(network.parameters(), lr=1e-2)

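Optimizers also provide a convenient way to reset the gradients of all the parameters they manage, `zero_grad()`. Call it before each new backward pass so that stale gradients do not accumulate into the next step.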

In [95]:
# Set gradients to zero
optimizer.zero_grad()

# Batches
You may have noticed that we added a seemingly useless first dimension to $x$: it is the batch dimension. In practice, performing gradient steps on the whole training set is too costly, and doing it on a single data point is too unstable.
As a trade-off, we use a subset of the data at each step, not too many and not too few. These subsets are called batches.
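
The same layer handles any batch size, since only the last dimension is fixed. A small sketch with arbitrary batch sizes:

```python
import torch
import torch.nn as nn

p = 8
linear = nn.Linear(p, 1)

# The layer maps (batch_size, p) to (batch_size, 1), whatever batch_size is.
output_shapes = {}
for batch_size in (1, 4, 32):
    x = torch.randn(batch_size, p)
    output_shapes[batch_size] = tuple(linear(x).shape)

print(output_shapes)
```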

# Pytorch Tensor dimension
Pytorch has a convention for the dimension of network inputs:
- first the batch dimension: x[i, ...] is the i-th sample in the batch.
- second, the channels: x[i, c, ...] is the c-th channel. For example, images may have 3 channels: red, green and blue.
- last, 0, 1, 2 or 3 spatial dimensions depending on the data: x[..., x, y, z] is the voxel (x, y, z) of a 3D volume.
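
As an illustration of this convention, a 2D convolution expects inputs of shape (batch, channels, height, width). The sizes below are made up for the example:

```python
import torch
import torch.nn as nn

# A batch of 16 RGB images of 32x32 pixels (hypothetical sizes).
images = torch.randn(16, 3, 32, 32)

# 3 input channels, 8 output channels; padding=1 keeps the spatial size with a 3x3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(images)

print(tuple(out.shape))  # (16, 8, 32, 32)
```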

## Exercise: with a loop, perform gradient steps and train the network.
Here, unlimited data is available with `get_data_point(batch_size)`.

In [99]:
def get_data_point(batch_size):
    x = torch.randn(batch_size, p)
    y = .5 * (1 + torch.sign(x.detach()[:, 0:1]))
    return x, y
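
One possible solution sketch is given below. The batch size, learning rate and number of steps are arbitrary choices, and `nn.Sequential(nn.Linear(p, 1), nn.Sigmoid())` stands in for `LogisticRegression(p)` so that the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
p = 8

def get_data_point(batch_size):
    x = torch.randn(batch_size, p)
    y = .5 * (1 + torch.sign(x[:, 0:1]))
    return x, y

network = nn.Sequential(nn.Linear(p, 1), nn.Sigmoid())  # stand-in for LogisticRegression(p)
optimizer = optim.SGD(network.parameters(), lr=1e-1)    # assumed learning rate

losses = []
for step in range(500):
    x, y_true = get_data_point(32)  # fresh batch at every step

    optimizer.zero_grad()           # reset gradients from the previous step
    loss = F.binary_cross_entropy(network(x), y_true)
    loss.backward()                 # compute gradients
    optimizer.step()                # update the parameters

    losses.append(loss.item())

print("first loss: {:.3f}, last loss: {:.3f}".format(losses[0], losses[-1]))
```

Plotting `losses` with `plt.plot` is a simple way to check that training makes progress.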