# Neural Ordinary Differential Equations

## Introduction and Motivation

### What is an ODE?

* Vertical Projectile motion:


$$\sum{F} = \frac{dp}{dt} $$ 

$$- (F_{drag} + F_{grav}) = \frac{dp}{dt} $$ 

$$ -(k \frac{dy}{dt} + m g) = m \frac{d^2y}{dt^2} $$ 
    
Integrating both sides from $t_0 = 0$ to $t$, and taking $y(0) = 0$ and $\frac{dy}{dt}(0) = 0$ so that the boundary terms vanish:
        
$$ -(k y + m g t) = m \frac{dy}{dt} $$ 

$$ \frac{dy}{dt} = \frac{-(k y + m g t)}{m} \tag{1.1}$$ 
$$ \frac{dy}{dt} = f(y, t) \tag{1.2}$$ 

One way to solve $(1.2)$ numerically is to integrate in $t$ using discrete steps of size $\Delta t$:

$$ \frac{dy}{dt} = f(y, t) \tag{1.2}$$ 

$$y(t_0 + \Delta t) - y(t_0) = \Delta t \, f(y(t_0), t_0)$$

$$y(t_0 + 2\Delta t) - y(t_0 + \Delta t) = \Delta t \, f(y(t_0 + \Delta t), t_0 + \Delta t)$$

$$y(t_0 + 3\Delta t) - y(t_0 + 2\Delta t) = \Delta t \, f(y(t_0 + 2\Delta t), t_0 + 2\Delta t)$$


$$...$$


$$ y(t_n + \Delta t) = y(t_n) + \Delta t \, f(y(t_n), t_n) \tag{2}$$


* The previous steps are known as `Euler's method`.

* It has been known for some time that $(2)$ closely resembles the characteristic equation of `ResNets`, where $f(y_n, t_n)$ represents the output of layer $n$ given $y_n$ as input.
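
The update rule $(2)$ can be sketched in a few lines of Python. This is a minimal illustration, not a production solver: the helper name `euler_solve` and the constants `m`, `k` and `g` are assumptions chosen for the example, and the right-hand side is the one from $(1.1)$.

```python
def euler_solve(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(y, t) from t0 to t1 with fixed-step Euler updates."""
    dt = (t1 - t0) / n_steps
    y, t = y0, t0
    for _ in range(n_steps):
        # y(t_n + dt) = y(t_n) + dt * f(y(t_n), t_n)
        y = y + dt * f(y, t)
        t = t + dt
    return y


# Illustrative constants (assumed, not taken from the text):
# mass [kg], drag coefficient, gravitational acceleration [m/s^2].
m, k, g = 1.0, 0.1, 9.81


def projectile_rhs(y, t):
    """Right-hand side of equation (1.1)."""
    return -(k * y + m * g * t) / m


# A coarse and a fine discretisation of the same interval.
y_coarse = euler_solve(projectile_rhs, y0=0.0, t0=0.0, t1=1.0, n_steps=10)
y_fine = euler_solve(projectile_rhs, y0=0.0, t0=0.0, t1=1.0, n_steps=10_000)
print(y_coarse, y_fine)
```

Shrinking $\Delta t$ (more steps) brings the result closer to the exact solution, at the price of more evaluations of $f$.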

### The classic ResNet example

- ResNets won the ImageNet (ILSVRC) classification competition in 2015.
- ResNets use skip connections between layers, so the effective "depth" of the network becomes a feature to be learnt.
- They can have more than 100 layers (the paper reports a 152-layer network on ImageNet) while mitigating vanishing gradients.
<img src="assets/1.png">
src: https://arxiv.org/abs/1512.03385

### How does it work?

* Instead of only feeding the output of the previous layer into the next layer:
$$
f(z(t-1),\ \theta(t-1)) = z(t) $$

$$
f(z(t),\ \theta(t)) = z(t+1)
$$

* We add the input of each layer to its output as well:
$$
f(z(t-1),\ \theta(t-1)) + z(t-1)= z(t) $$

$$
f(z(t),\ \theta(t)) + z(t)= z(t+1)
$$

* Thus, if the network with parameters $\theta$ is trained on a set of measurements $\{(z_0, t_0),(z_1, t_1),...,(z_M, t_M)\}$, it will learn an approximation $f(z, \theta)$ of the underlying dynamics.

* Given the noisy measurements:
$$\{(z_0, t_0),(z_1, t_1),...,(z_M, t_M)\}$$

* One wants to find an approximation $\hat{f}(z, t, \theta)$ of the dynamics $f(z, t)$


* Given $(z_0, t_0)$ and $(z_1, t_1)$, the system evolves from $(z_0, t_0)$ until it reaches $(z_1, t_1)$.



* An estimate $\hat{z}(t_1)$ of the state at $t_1$ is obtained by integrating the learned dynamics from $t_0$ to $t_1$, starting at $z(t_0)$:


$$\hat{z}(t_1) = z(t_0) + \int_{t_0}^{t_1} f(z(t), t, \theta)\,dt$$


* When an optimizer is used to fit $\hat{z}(t)$ to the measurements, a cost function is defined:
$$
L(z(t_1)) = L \Big( z(t_0) + \int_{t_0}^{t_1} f(z(t), t, \theta)dt \Big) = L \big( \text{ODESolve}(z(t_0), f, t_0, t_1, \theta) \big) \tag{3}
$$
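
As a rough sketch of how such a cost function can be evaluated and minimised, the example below uses a fixed-step Euler loop as the ODE solver and a squared error as $L$. Everything specific here is an assumption made for illustration: the linear-decay dynamics `f_hat`, the "true" rate `0.7`, the synthetic measurement, and the finite-difference gradient, which merely stands in for a gradient obtained by automatic differentiation or an adjoint method.

```python
def ode_solve(z0, f, t0, t1, theta, n_steps=1000):
    """Approximate z(t1) = z(t0) + integral of f(z, t, theta) dt with Euler steps."""
    dt = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        z = z + dt * f(z, t, theta)
        t = t + dt
    return z


def f_hat(z, t, theta):
    """Hypothetical learnable dynamics: linear decay with rate theta."""
    return -theta * z


def loss(theta, z0, t0, t1, z1_measured):
    """Squared error between the predicted and the measured state at t1."""
    z1_pred = ode_solve(z0, f_hat, t0, t1, theta)
    return (z1_pred - z1_measured) ** 2


# Synthetic measurement generated with an assumed "true" rate of 0.7.
true_theta = 0.7
z0, t0, t1 = 2.0, 0.0, 1.0
z1_measured = ode_solve(z0, f_hat, t0, t1, true_theta)

# Plain gradient descent on theta, using a central finite difference
# for dL/dtheta (a stand-in for an autodiff or adjoint gradient).
theta, lr, eps = 0.1, 0.2, 1e-5
for _ in range(200):
    grad = (loss(theta + eps, z0, t0, t1, z1_measured)
            - loss(theta - eps, z0, t0, t1, z1_measured)) / (2 * eps)
    theta -= lr * grad
print(theta)  # approaches true_theta
```

The loss only ever touches the dynamics through the solver call, so the solver can be swapped for a higher-order or adaptive scheme without changing the training loop.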

Thus, a function $f(z(t), t, \theta)$ with parameters $\theta$ that approximates 