<a href="https://colab.research.google.com/github/jonkrohn/ML-foundations/blob/master/notebooks/4-calculus-ii.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Calculus II: Partial Derivatives & Integrals

This class, *Calculus II: Partial Derivatives & Integrals*, builds on single-variable derivative calculus to introduce gradients and integral calculus. Gradients, which are computed with partial-derivative calculus, are the basis of training most machine learning algorithms with data, typically via stochastic gradient descent (SGD). The chain rule (also covered in this class) underlies the backpropagation algorithm, which supplies the gradients that SGD uses to train deep neural networks.

Integral calculus, meanwhile, comes in handy for myriad tasks associated with machine learning, such as finding the area under the so-called “ROC curve” -- a prevailing metric for evaluating classification models. The content covered in this class is itself foundational for several other classes in the *Machine Learning Foundations* series, especially *Probability & Information Theory* and *Optimization*.

Over the course of studying this topic, you'll: 

* Develop an understanding of what’s going on beneath the hood of machine learning algorithms, including those used for deep learning. 
* Be able to grasp the details of the partial-derivative, multivariate calculus that is common in machine learning papers, as well as in many other subjects that underlie ML, including information theory and optimization algorithms.
* Use integral calculus to determine the area under any given curve, a recurring task in ML applied, for example, to evaluate model performance by calculating the ROC AUC metric.

**Note that this Jupyter notebook is not intended to stand alone. It is the companion code to a lecture or to videos from Jon Krohn's [Machine Learning Foundations](https://github.com/jonkrohn/ML-foundations) series, which offer detail on the following:**

*Segment 1: Review of Introductory Calculus*
* Differentiation with Rules
* Cost (or Loss) Functions
* Calculating the Derivative of a Cost Function
* AutoDiff: Automatic Differentiation

*Segment 2: Gradients Applied to Machine Learning*
* Partial Derivatives of Multivariate Functions
* Gradients
* Stochastic Gradient Descent 
* The Chain Rule
* Backpropagation 

*Segment 3: Integrals*
* The Confusion Matrix
* The Receiver-Operating Characteristic (ROC) Curve 
* Calculating Integrals
* Finding the Area Under the ROC Curve
* Resources for Further Study of Calculus


## Segment 1: Review of Introductory Calculus

Refer to the slides and the [*Regression in PyTorch* notebook](https://github.com/jonkrohn/ML-foundations/blob/master/notebooks/regression-in-pytorch.ipynb).

## Segment 2: Gradients Applied to Machine Learning

In [1]:
import math
import torch

In [2]:
r = torch.tensor(3.).requires_grad_()
r

tensor(3., requires_grad=True)

In [3]:
l = torch.tensor(5.).requires_grad_()
l

tensor(5., requires_grad=True)

The volume of a cylinder is described by $v = \pi r^2 l$ where: 

* $r$ is the radius of the cylinder
* $l$ is its length

In [4]:
def cylinder_vol(my_r, my_l):
    return math.pi * my_r**2 * my_l

In [5]:
v = cylinder_vol(r, l)
v

tensor(141.3717, grad_fn=<MulBackward0>)

In [6]:
v.backward()

In [7]:
l.grad

tensor(28.2743)

As derived on the slides: $$\frac{\partial v}{\partial l} = \pi r^2$$
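For reference, a brief sketch of that derivation (the slides may present it differently): when differentiating with respect to $l$, $r$ is held constant, so $\pi r^2$ acts as a constant coefficient:

$$\frac{\partial v}{\partial l} = \frac{\partial}{\partial l}\left(\pi r^2 l\right) = \pi r^2 \frac{\partial l}{\partial l} = \pi r^2$$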

In [8]:
math.pi * 3**2

28.274333882308138

With $r = 3$, a change in $l$ by one unit corresponds to a change in $v$ of 28.27 cubic units:

In [9]:
cylinder_vol(3, 6) - cylinder_vol(3, 5)

28.274333882308127

In [10]:
cylinder_vol(3, 7) - cylinder_vol(3, 6)

28.274333882308156
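The two differences above agree (up to floating-point rounding) because $v$ is linear in $l$: for fixed $r$, every unit step in $l$ adds exactly $\pi r^2$ to the volume, whatever the starting length. A minimal sketch that checks this for a few starting lengths; the values in `starts` are arbitrary illustrations, and `cylinder_vol` is redefined only so the block runs on its own:

```python
import math

def cylinder_vol(my_r, my_l):
    return math.pi * my_r**2 * my_l

slope = math.pi * 3**2  # dv/dl at r = 3

# Illustrative starting lengths (assumed values; any others would do)
starts = [0, 5, 10, 100]
unit_steps = [cylinder_vol(3, l0 + 1) - cylinder_vol(3, l0) for l0 in starts]

# Each unit step should match the slope up to floating-point rounding
all_match = all(math.isclose(step, slope) for step in unit_steps)
print(all_match)
```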

And: $$\frac{\partial v}{\partial r} = 2 \pi r l$$
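A brief sketch of why (the slides may present it differently): holding $l$ constant, $\pi l$ is a constant coefficient and the power rule applies to $r^2$:

$$\frac{\partial v}{\partial r} = \pi l \, \frac{\partial}{\partial r} r^2 = \pi l \cdot 2r = 2 \pi r l$$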

In [12]:
r.grad

tensor(94.2478)

In [13]:
2 * math.pi * 3 * 5

94.24777960769379
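Taken together, the two partial derivatives form the gradient of $v$ with respect to $(r, l)$. As a sketch of an alternative to calling `backward()` and reading `.grad`, `torch.autograd.grad` returns the partials directly as a tuple. The fresh tensors `r2` and `l2` are names chosen for illustration so the cells above are left untouched:

```python
import math
import torch

def cylinder_vol(my_r, my_l):
    return math.pi * my_r**2 * my_l

# Fresh leaf tensors (hypothetical names) that track gradients
r2 = torch.tensor(3.).requires_grad_()
l2 = torch.tensor(5.).requires_grad_()

v2 = cylinder_vol(r2, l2)

# One partial derivative per input, returned in the order the inputs are given
dv_dr, dv_dl = torch.autograd.grad(v2, (r2, l2))

print(dv_dr, dv_dl)
```

Unlike `backward()`, this call does not accumulate into `r2.grad` or `l2.grad`, which can be convenient when only the values are needed.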

In [14]:
# Forward-difference estimate of dv/dr at r=3, l=5 (multiplying by 1e6 divides by the 1e-6 step)
(cylinder_vol(3 + 1e-6, 5) - cylinder_vol(3, 5)) * 1e6

94.24779531741478
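The estimate above nudges $r$ by $10^{-6}$ in one direction only, so it carries a small truncation error (roughly $1.6 \times 10^{-5}$ here) relative to $2 \pi r l$. As a sketch of a common refinement, not necessarily the approach taken in the lecture, a central difference evaluates the function on both sides of the point. The helper name `central_diff` and the step size `h` are assumptions made for illustration:

```python
import math

def cylinder_vol(my_r, my_l):
    return math.pi * my_r**2 * my_l

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) as (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Partial derivatives at r = 3, l = 5, varying one input at a time
dv_dr = central_diff(lambda my_r: cylinder_vol(my_r, 5), 3)
dv_dl = central_diff(lambda my_l: cylinder_vol(3, my_l), 5)

# Numerical estimate next to the analytic value
print(dv_dr, 2 * math.pi * 3 * 5)
print(dv_dl, math.pi * 3**2)
```

Because $v$ is quadratic in $r$ and linear in $l$, the central difference has no truncation error for this function, so any remaining discrepancy is floating-point rounding.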