# Introduction to PyTorch
> In this first chapter, we introduce basic concepts of neural networks and deep learning using the PyTorch library.

- toc: true 
- badges: true
- comments: true
- author: Lucas Nunes
- categories: [Datacamp]
- image: images/datacamp/___

> Note: This is a summary of the course's chapter 1 exercises "Introduction to Deep Learning with PyTorch" at datacamp. <br>[Github repo](https://github.com/lnunesAI/Datacamp/) / [Course link](https://www.datacamp.com/tracks/deep-learning-in-python)

## Introduction to PyTorch

### Creating tensors in PyTorch

<div class=""><p>Random tensors are very important in neural networks: the parameters of a neural network are typically initialized with random weights (random tensors).</p>
<p>Let us start practicing building tensors in the PyTorch library. As you know, tensors are arrays with an arbitrary number of dimensions, corresponding to NumPy's ndarrays. You are going to create a random tensor of size 3 by 3 and assign it to the variable <code>your_first_tensor</code>. Then, you will print it. Finally, you will compute its size, store it in the variable <code>tensor_size</code>, and print its value.</p>
</div>

Instructions
<ul>
<li>Import the main PyTorch library.</li>
<li>Create the variable <code>your_first_tensor</code> and set it to a random torch tensor of size 3 by 3.</li>
<li>Calculate its shape (dimension sizes) and set it to variable <code>tensor_size</code>.</li>
<li>Print the values of <code>your_first_tensor</code> and <code>tensor_size</code>.</li>
</ul>

```python
# Import torch
import torch

# Create random tensor of size 3 by 3
your_first_tensor = torch.rand(3, 3)

# Calculate the shape of the tensor
tensor_size = your_first_tensor.shape

# Print the values of the tensor and its shape
print(your_first_tensor)
print(tensor_size)
```

```
tensor([[0.0141, 0.0859, 0.3999],
        [0.6517, 0.0857, 0.3699],
        [0.0851, 0.9952, 0.5838]])
torch.Size([3, 3])
```
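A minimal sketch of making random initialization reproducible: `torch.manual_seed` fixes the random generator, so two calls to `torch.rand` after the same seed return identical tensors. The seed value `0` and the variable names `a` and `b` are arbitrary choices for this illustration.

```python
import torch

# Fix the seed so the "random" tensor is reproducible across runs (0 is arbitrary)
torch.manual_seed(0)
a = torch.rand(3, 3)

# Re-seeding with the same value replays the same random numbers
torch.manual_seed(0)
b = torch.rand(3, 3)
same_values = torch.equal(a, b)

# torch.rand samples uniformly from the interval [0, 1)
in_unit_interval = bool(((a >= 0) & (a < 1)).all())

# .shape and .size() report the same torch.Size
print(a.shape, a.size())  # torch.Size([3, 3]) torch.Size([3, 3])
```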


### Matrix multiplication

<div class=""><p>Several types of matrices are important in neural networks. Two of them are the matrix of ones (where every entry is 1) and the identity matrix (where the diagonal entries are 1 and all other entries are 0). The identity matrix is very important in linear algebra: any matrix multiplied by an identity matrix of compatible size is simply the original matrix.</p>
<p>Let us experiment with these two types of matrices. You are going to build a 3 by 3 matrix of ones called <code>tensor_of_ones</code> and an identity matrix of the same shape called <code>identity_tensor</code>. We will see what happens when we multiply these two matrices, and what happens when we multiply them element-wise.</p></div>

Instructions
<ul>
<li>Create a matrix of ones with shape 3 by 3, and assign it to the variable <code>tensor_of_ones</code>.</li>
<li>Create an identity matrix with shape 3 by 3, and assign it to the variable <code>identity_tensor</code>.</li>
<li>Do a matrix multiplication of <code>tensor_of_ones</code> with <code>identity_tensor</code> and print its value.</li>
<li>Do an element-wise multiplication of <code>tensor_of_ones</code> with <code>identity_tensor</code> and print its value.</li>
</ul>

```python
# Create a matrix of ones with shape 3 by 3
tensor_of_ones = torch.ones(3, 3)

# Create an identity matrix with shape 3 by 3
identity_tensor = torch.eye(3)

# Do a matrix multiplication of tensor_of_ones with identity_tensor
matrices_multiplied = torch.matmul(tensor_of_ones, identity_tensor)
print(matrices_multiplied)

# Do an element-wise multiplication of tensor_of_ones with identity_tensor
element_multiplication = tensor_of_ones * identity_tensor
print(element_multiplication)
```

```
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
```


**matrices_multiplied is the same as tensor_of_ones (the identity matrix is the neutral element of matrix multiplication, so multiplying any matrix by it returns the original matrix), while element_multiplication is the same as identity_tensor (element-wise, each entry of ones is multiplied by 1 on the diagonal and by 0 everywhere else).**
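The identity property also holds for non-square matrices, as long as the identity has the matching size on each side. A short sketch, assuming an arbitrary 2 by 3 example matrix `a`; it also uses the `@` operator and `torch.mm`, which for 2-D tensors compute the same matrix product as `torch.matmul`.

```python
import torch

# A non-square 2 x 3 matrix with arbitrary known entries
a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

# The identity must match the dimension it multiplies against
left = torch.eye(2) @ a    # (2 x 2) @ (2 x 3) -> 2 x 3
right = a @ torch.eye(3)   # (2 x 3) @ (3 x 3) -> 2 x 3

# torch.mm is the strictly 2-D matrix product
via_mm = torch.mm(a, torch.eye(3))

print(torch.equal(left, a), torch.equal(right, a), torch.equal(via_mm, a))  # True True True
```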

## Forward propagation

### Forward pass

<div class=""><p>Let's build something that looks more like a neural network. The computational graph is given below. You are going to initialize 3 large random tensors and then perform the operations shown in the computational graph. The final operation is the mean of the tensor, given by <code>torch.mean(your_tensor)</code>.</p>
<p><img src="https://assets.datacamp.com/production/repositories/4094/datasets/ab707279d7be2835c17787a38c6e2e54f6d89409/graph_exercise.jpg" alt=""></p></div>

Instructions
<ul>
<li>Initialize random tensors <code>x</code>, <code>y</code> and <code>z</code>, each having shape <code>(1000, 1000)</code>.</li>
<li>Do a matrix multiplication of <code>x</code> with <code>y</code>, putting the result in tensor <code>q</code>.</li>
<li>Do an elementwise multiplication of tensor <code>z</code> with tensor <code>q</code>, putting the result in <code>f</code>.</li>
</ul>

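```python
# Initialize tensors x, y and z
x = torch.rand(1000, 1000)
y = torch.rand(1000, 1000)
z = torch.rand(1000, 1000)

# Matrix-multiply x with y (as in the graph, and in the gradient exercise below)
q = torch.matmul(x, y)

# Multiply elementwise z with q
f = q * z

# The inputs are random, so the exact value changes between runs; it should land
# close to 125 (each entry of q averages 1000 * 0.25 = 250, and z averages 0.5)
mean_f = torch.mean(f)
print(mean_f)
```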


**You just built a nice computational graph containing 5,000,001 values: 1,000,000 in each of x, y, z, q and f, plus the scalar mean.**
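To see the difference between the two multiplications and where the value count comes from, here is the same graph on hypothetical 2 by 2 tensors with hand-picked values, small enough to check on paper. With n by n inputs the graph holds five n by n tensors plus one scalar, so n = 1000 gives 5,000,001 values and n = 2 gives 21.

```python
import torch

# Tiny 2 x 2 stand-ins for the 1000 x 1000 tensors, with hand-picked values
x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.tensor([[5., 6.], [7., 8.]])
z = torch.tensor([[1., 0.], [0., 2.]])

q = torch.matmul(x, y)   # matrix product: [[19, 22], [43, 50]]
f = z * q                # element-wise:   [[19, 0], [0, 100]]
mean_f = torch.mean(f)   # (19 + 0 + 0 + 100) / 4 = 29.75

# Count the values held by the graph: five n x n tensors plus one scalar
n_values = sum(t.numel() for t in (x, y, z, q, f)) + mean_f.numel()
print(mean_f.item(), n_values)  # 29.75 21
```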

## Backpropagation by auto-differentiation

### Backpropagation by hand

<div class=""><p><img src="https://assets.datacamp.com/production/repositories/4094/datasets/b483da1f7b9a03a3669973dc6faa0e8899e399fa/der_example.jpg" alt=""></p>
<p>Given the computational graph above, we want to calculate the derivatives of f with respect to the leaf nodes (x, y and z). To get you started, we have already calculated the results of the forward pass (in red), as well as the derivatives of f and q.</p>
<p>The rules for derivative computations have been given in the table below:</p>
<table>
<thead>
<tr>
<th>Interaction</th>
<th>Overall Change</th>
</tr>
</thead>
<tbody>
<tr>
<td>Addition</td>
<td>$(f + g)' = f' + g'$</td>
</tr>
<tr>
<td>Multiplication</td>
<td>$(f \cdot g)' = f \cdot dg + g \cdot df$</td>
</tr>
<tr>
<td>Powers</td>
<td>$(x^n)' = \frac{d}{dx} x^n = n x^{n-1}$</td>
</tr>
<tr>
<td>Inverse</td>
<td>$\left(\frac{1}{x}\right)' = -\frac{1}{x^2}$</td>
</tr>
<tr>
<td>Division</td>
<td>$\left(\frac{f}{g}\right)' = \left(df \cdot \frac{1}{g}\right) + \left(\frac{-1}{g^2} dg \cdot f\right)$</td>
</tr>
</tbody>
</table></div>
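The rules in the table can be spot-checked numerically with PyTorch's autograd. The sketch below uses a hypothetical helper `derivative_at` and arbitrarily chosen functions f(x) = x², g(x) = x + 1 and evaluation points; each comment works out the expected value from the corresponding rule.

```python
import torch

def derivative_at(fn, x0):
    """Hypothetical helper: derivative of fn at x0, computed by autograd."""
    x = torch.tensor(x0, requires_grad=True)
    fn(x).backward()
    return x.grad.item()

# Product rule, f = x^2 and g = x + 1 at x = 3: f*dg + g*df = 9*1 + 4*6 = 33
product = derivative_at(lambda x: x**2 * (x + 1), 3.0)

# Power rule, n = 3 at x = 3: n * x^(n-1) = 3 * 9 = 27
power = derivative_at(lambda x: x**3, 3.0)

# Inverse rule at x = 2: -1 / x^2 = -0.25
inverse = derivative_at(lambda x: 1 / x, 2.0)

# Division rule, f = x^2 and g = x + 1 at x = 3:
# df * 1/g + (-1/g^2) * dg * f = 6/4 - 9/16 = 0.9375
division = derivative_at(lambda x: x**2 / (x + 1), 3.0)

print(product, power, inverse, division)
```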

<pre>
Possible Answers
<b>The derivative of x is 5, the derivative of y is 5, the derivative of z is 1.</b>
The derivative of x is 5, the derivative of y is 5, the derivative of z is 5.
The derivative of x is 8, the derivative of y is -3, the derivative of z is 0.
Derivatives are lame, integrals are cool.
</pre>
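The highlighted answer can be checked by hand with the chain rule. A worked derivation, using the values that the next exercise assigns (x = 4, y = -3, z = 5) and its graph q = x + y, f = q · z:

$$
\begin{aligned}
q &= x + y = 4 + (-3) = 1, \qquad f = q \cdot z = 1 \cdot 5 = 5 \\
\frac{\partial f}{\partial z} &= q = 1, \qquad \frac{\partial f}{\partial q} = z = 5 \\
\frac{\partial f}{\partial x} &= \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial x} = 5 \cdot 1 = 5 \\
\frac{\partial f}{\partial y} &= \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y} = 5 \cdot 1 = 5
\end{aligned}
$$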

### Backpropagation using PyTorch

<p>Here, you are going to use PyTorch's automatic differentiation to compute the derivatives of <code>f</code> with respect to <code>x</code>, <code>y</code> and <code>z</code> from the previous exercise.</p>

Instructions
<ul>
<li>Initialize tensors <code>x</code>, <code>y</code> and <code>z</code> to values 4, -3 and 5.</li>
<li>Put the sum of tensors <code>x</code> and <code>y</code> in <code>q</code>, put the product of <code>q</code> and <code>z</code> in <code>f</code>.</li>
<li>Calculate the derivatives of the computational graph.</li>
<li>Print the gradients of the <code>x</code>, <code>y</code> and <code>z</code> tensors.</li>
</ul>

```python
# Initialize x, y and z to values 4, -3 and 5
x = torch.tensor(4., requires_grad=True)
y = torch.tensor(-3., requires_grad=True)
z = torch.tensor(5., requires_grad=True)

# Set q to sum of x and y, set f to product of q with z
q = x + y
f = q * z

# Compute the derivatives
f.backward()

# Print the gradients
print("Gradient of x is: " + str(x.grad))
print("Gradient of y is: " + str(y.grad))
print("Gradient of z is: " + str(z.grad))
```

```
Gradient of x is: tensor(5.)
Gradient of y is: tensor(5.)
Gradient of z is: tensor(1.)
```


**No surprise here, the results are the same as when you calculated them by hand!**
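One detail worth knowing before calling `backward()` repeatedly: PyTorch adds new gradients into `.grad` instead of replacing them. The sketch below rebuilds the same graph to show this accumulation, resets it with `zero_()`, and, as an alternative, uses `torch.autograd.grad`, which returns the gradients without storing them in `.grad`. The names `first`, `second` and `f2` are illustrative only.

```python
import torch

x = torch.tensor(4., requires_grad=True)
y = torch.tensor(-3., requires_grad=True)
z = torch.tensor(5., requires_grad=True)

f = (x + y) * z

# backward() adds into .grad rather than overwriting it;
# retain_graph=True keeps the graph alive for a second backward pass
f.backward(retain_graph=True)
first = x.grad.item()    # 5.0
f.backward()
second = x.grad.item()   # 10.0: the two passes accumulated

# Reset the stored gradient before the next computation
x.grad.zero_()

# torch.autograd.grad returns the gradients directly and leaves .grad untouched
f2 = (x + y) * z
gx, gy, gz = torch.autograd.grad(f2, (x, y, z))
print(first, second, gx.item(), gy.item(), gz.item())  # 5.0 10.0 5.0 5.0 1.0
```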

### Calculating gradients in PyTorch

<div class=""><p>Remember the forward pass exercise? Now that you know how to calculate derivatives, let's go one step further and calculate the gradients (derivatives of tensors) of the computational graph you built back then. We have already initialized three random tensors of shape <code>(1000, 1000)</code> for you, called <code>x</code>, <code>y</code> and <code>z</code>. First, we matrix-multiply tensors <code>x</code> and <code>y</code>; then we multiply their product element-wise with tensor <code>z</code> and compute the <code>mean</code> of the result. Finally, we compute the derivatives.</p>
<p>The main difference from the previous exercise is the scale of the tensors: before, tensors <code>x</code>, <code>y</code> and <code>z</code> each held a single number; now each holds 1 million numbers.</p>
<p><img src="https://assets.datacamp.com/production/repositories/4094/datasets/ab707279d7be2835c17787a38c6e2e54f6d89409/graph_exercise.jpg" alt=""></p></div>

```python
x = torch.rand(1000, 1000, requires_grad=True)
y = torch.rand(1000, 1000, requires_grad=True)
z = torch.rand(1000, 1000, requires_grad=True)
```

Instructions
<ul>
<li>Multiply tensors <code>x</code> and <code>y</code>, put the product in tensor <code>q</code>.</li>
<li>Do an elementwise multiplication of tensors <code>z</code> with <code>q</code>.</li>
<li>Calculate the gradients.</li>
</ul>

```python
# Multiply tensors x and y
q = torch.matmul(x, y)

# Elementwise multiply tensors z with q
f = z * q

mean_f = torch.mean(f)

# Calculate the gradients
mean_f.backward()
```

**In general, calculating gradients of large tensors in PyTorch is as easy as calculating derivatives of single numbers. If the tensors are very large (billions of values), however, the calculation might take some time.**
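To check what `backward()` computed, the gradients of `mean(z * (x @ y))` can also be derived by hand with the chain rule and compared. This sketch assumes small, arbitrary shapes (3 by 4, 4 by 2 and 3 by 2) in place of the 1000 by 1000 tensors so it runs instantly; the names `grad_x`, `grad_y` and `grad_z` are introduced only for the comparison.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility

# Small stand-ins for the (1000, 1000) tensors; shapes chosen so z matches x @ y
x = torch.rand(3, 4, requires_grad=True)
y = torch.rand(4, 2, requires_grad=True)
z = torch.rand(3, 2, requires_grad=True)

q = torch.matmul(x, y)
mean_f = torch.mean(z * q)
mean_f.backward()

# Hand-derived gradients, where n is the number of entries being averaged
with torch.no_grad():
    n = q.numel()
    grad_z = q / n               # d mean / d z = q / n
    grad_q = z / n               # d mean / d q = z / n
    grad_x = grad_q @ y.T        # chain rule back through the matrix product
    grad_y = x.T @ grad_q

print(torch.allclose(x.grad, grad_x),
      torch.allclose(y.grad, grad_y),
      torch.allclose(z.grad, grad_z))
```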

## Introduction to Neural Networks

### Your first neural network

<p>You are going to build a neural network in PyTorch the hard way. Your input will be images of size <code>(28, 28)</code>, that is, images containing <code>784</code> pixels. Your network will contain an <code>input_layer</code> (already created for you), a hidden layer with <code>200</code> units, and an output layer with <code>10</code> classes. You are going to create the weights and then use matrix multiplications to get the network's output.</p>

```python
input_layer = torch.rand(784)
```

Instructions
<ul>
<li>Initialize two weight matrices with random numbers, called <code>weight_1</code> and <code>weight_2</code>.</li>
<li>Set the result of <code>input_layer</code> times <code>weight_1</code> to <code>hidden_1</code>. Set the result of <code>hidden_1</code> times <code>weight_2</code> to <code>output_layer</code>.</li>
</ul>

```python
# Initialize the weights of the neural network
weight_1 = torch.rand(784, 200)
weight_2 = torch.rand(200, 10)

# Multiply input_layer with weight_1
hidden_1 = torch.matmul(input_layer, weight_1)

# Multiply hidden_1 with weight_2
output_layer = torch.matmul(hidden_1, weight_2)
print(output_layer)
```

```
tensor([20436.0547, 20528.6582, 20435.9531, 19162.4219, 20058.9766, 18554.9512,
        21562.4766, 20638.4668, 20152.7910, 19314.1621])
```


**For the most part, neural networks are just matrix (tensor) multiplications. This is why we have put so much emphasis on matrices and tensors!**
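Because the network is just matrix multiplications, the same code handles several inputs at once if they are stacked as the rows of a matrix. A minimal sketch, assuming a hypothetical batch of 5 random inputs and an arbitrary fixed seed:

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility

weight_1 = torch.rand(784, 200)
weight_2 = torch.rand(200, 10)

# A hypothetical batch of 5 flattened 28 x 28 images, one per row
batch = torch.rand(5, 784)

# The same two matrix multiplications now process all 5 inputs at once
batch_output = torch.matmul(torch.matmul(batch, weight_1), weight_2)
print(batch_output.shape)  # torch.Size([5, 10])

# Row 0 should match running the first image through the network on its own
# (a loose tolerance absorbs float32 rounding differences between kernels)
single_output = torch.matmul(torch.matmul(batch[0], weight_1), weight_2)
print(torch.allclose(batch_output[0], single_output, rtol=1e-4))
```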

### Your first PyTorch neural network

<p>You are going to build the same neural network you built in the previous exercise, but now using the PyTorch way. As a reminder, you have 784 units in the input layer, 200 hidden units and 10 units for the output layer.</p>

```python
import torch.nn as nn
```

Instructions
<ul>
<li>Instantiate two linear layers, naming them <code>self.fc1</code> and <code>self.fc2</code>, and determine their correct dimensions.</li>
<li>Implement the <code>.forward()</code> method, using the two layers you defined and returning <code>x</code>.</li>
</ul>

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # Instantiate the two linear layers: 784 -> 200 and 200 -> 10
        self.fc1 = nn.Linear(784, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        # Pass the input through both layers and return the result
        x = self.fc1(x)
        x = self.fc2(x)
        return x
```
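A usage sketch for the class above (repeated here so the block runs on its own), assuming a hypothetical batch of 5 random inputs. It also inspects the layers: `nn.Linear` stores its weight with shape `(out_features, in_features)` and, unlike the hand-built version, includes a bias vector and uses its own weight initialization instead of `torch.rand`.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        return self.fc2(self.fc1(x))

net = Net()

# A hypothetical batch of 5 flattened 28 x 28 images
out = net(torch.rand(5, 784))
print(out.shape)  # torch.Size([5, 10])

# Weight is stored as (out_features, in_features); each layer also has a bias
print(net.fc1.weight.shape, net.fc1.bias.shape)  # torch.Size([200, 784]) torch.Size([200])

# Total learnable parameters: 784*200 + 200 + 200*10 + 10
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 159010
```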