# Explicit layers in deep learning

This notebook is based on [Chapter 1: Introduction - Explicit layers in deep learning](https://implicit-layers-tutorial.org/introduction/) of the NeurIPS 2020 tutorial created by [Zico Kolter](http://zicokolter.com/), [David Duvenaud](http://www.cs.toronto.edu/~duvenaud/), and [Matt Johnson](http://people.csail.mit.edu/mattjj/). These are my own study notes, and I use both Vietnamese and English because that is more comfortable for me. If you come across this notebook or repository, you may need to work through some parts on your own.

## What is an implicit function?

A function written in the form
\begin{equation}
        y = f(x), \text{ e.g., } y = 2x^3
\end{equation}
is called an **explicit function**.

When the relationship is instead given in the form
\begin{equation}
    y - f(x) = 0, \text{ e.g., } y - 2x^3 = 0
\end{equation}
it is called an **implicit function**.

In general, an **implicit function** is written as
\begin{equation}
    F(y, x) = 0
\end{equation}

*Note that* an explicit function can always be rewritten as an implicit one (by moving $f(x)$ to the other side of the equality), but the reverse is not always possible.
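
The note above can be checked numerically. The sketch below is only an illustration: the names `f`, `F`, and `circle` are hypothetical, and the unit circle $x^2 + y^2 - 1 = 0$ is an assumed example that does not come from the tutorial. It shows that the explicit form always satisfies its implicit counterpart, while an implicit relation may admit more than one $y$ for the same $x$ and therefore has no single explicit form.

```python
def f(x):
    """Explicit form: y = 2 x^3."""
    return 2 * x**3


def F(y, x):
    """Implicit form of the same relation: F(y, x) = y - 2 x^3."""
    return y - f(x)


def circle(y, x):
    """Assumed example of an implicit relation: x^2 + y^2 - 1 = 0."""
    return x**2 + y**2 - 1


# Any explicit output satisfies the implicit equation by construction.
x = 1.5
print(F(f(x), x) == 0)

# For the circle, two different y values satisfy the relation at the same x,
# so no single function y = f(x) reproduces it.
for y in (0.8, -0.8):
    print(abs(circle(y, 0.6)) < 1e-12)
```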

In machine learning and deep learning, deep neural networks (DNNs) are traditionally built by stacking many layers, such as convolutional, self-attention, or fully connected layers. These layers are *explicit*: each one computes its output by applying an exact sequence of operations to its input.
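
As a minimal sketch of what "an exact sequence of operations" means, the snippet below implements a fully connected layer with a ReLU activation in plain Python and stacks two of them. The helper names `dense_relu` and `network`, the choice of ReLU, and the weights are all assumptions made for illustration.

```python
def dense_relu(x, W, b):
    """Explicit layer: z = relu(W x + b), computed step by step."""
    # Affine map: one dot product plus bias per output unit.
    pre = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
           for row, b_i in zip(W, b)]
    # Elementwise nonlinearity.
    return [max(0.0, p) for p in pre]


def network(x, layers):
    """Stack explicit layers: the output of one is the input of the next."""
    for W, b in layers:
        x = dense_relu(x, W, b)
    return x


# Hypothetical weights for a 2 -> 2 -> 1 network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[2.0, 1.0]], [0.5]),
]
print(network([1.0, 2.0], layers))
```

Here the forward pass is the definition of the layer: there is nothing to solve, only operations to execute in order.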

The key idea behind implicit layers is to *specify the conditions that we want the layer’s output to satisfy* instead of specifying how to compute the layer’s output from the input.

Consider an explicit layer given by a function $f: \mathcal{X} \rightarrow \mathcal{Z}$ with input $x \in \mathcal{X}$ and output $z \in \mathcal{Z}$:

$$
z = f(x)
$$

An implicit layer is instead defined via a function $g : \mathcal{X} \times \mathcal{Z} \rightarrow \mathbb{R}^n$ (a joint function of both $x$ and $z$), and the output $z$ of the layer is required to *satisfy some constraint*, e.g., to be a root of the equation

$$
\text{Find } z \quad\text{ s.t. }\quad g(x, z) = 0
$$
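
The definition above says nothing about how $z$ is found, so any root-finding method can serve as the forward pass. The sketch below is one possible illustration, under these assumptions: a scalar layer with $g(x, z) = z - \tanh(wz + x)$, a hypothetical weight $w = 0.5$, and bisection as the solver. Since $\tanh$ takes values in $(-1, 1)$, the root lies in $[-1, 1]$ for moderate inputs, which gives a valid bracket.

```python
import math


def g(x, z, w=0.5):
    """Assumed constraint: g(x, z) = z - tanh(w z + x). The layer output is a root in z."""
    return z - math.tanh(w * z + x)


def implicit_layer(x, lo=-1.0, hi=1.0, tol=1e-12):
    """Return z in [lo, hi] with g(x, z) = 0, found by bisection."""
    g_lo = g(x, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(x, mid)
        # Keep the half-interval on which g changes sign.
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


z = implicit_layer(1.0)
print(z, g(1.0, z))  # the residual g(x, z) should be close to zero
```

Swapping bisection for another solver, such as Newton's method, would change how the output is computed but not what the layer is.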

In practice, depending on what $g(x, z)$ represents, this formalism leads to several families of models:
- If $g(x, z)$ captures algebraic equations and fixed points $\rightarrow$ *recurrent backprop models* or *deep equilibrium models*.
- If $g(x, z)$ captures differential equations $\rightarrow$ *neural ordinary differential equations*.
- If $g(x, z)$ captures the optimality conditions of an optimization problem $\rightarrow$ *differentiable optimization approaches*.
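
The three cases listed above can each be sketched with a scalar toy problem. Everything specific below is an assumption chosen for illustration and is not taken from the tutorial: the function names, the fixed-point map $z = \tanh(wz + x)$, the ODE $dz/dt = -z$ with $z(0) = x$, and the objective $\tfrac{1}{2}(z - x)^2 + \tfrac{1}{2}z^2$. The solvers (plain iteration, forward Euler, gradient descent) are the simplest options, not what these model families use in practice.

```python
import math


def fixed_point_layer(x, w=0.5, tol=1e-10, max_iter=1000):
    """Fixed-point case: return z with z = tanh(w z + x)."""
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def ode_layer(x, t_end=1.0, steps=1000):
    """Differential-equation case: z(t_end) for dz/dt = -z, z(0) = x (forward Euler)."""
    z = x
    dt = t_end / steps
    for _ in range(steps):
        z = z + dt * (-z)
    return z


def argmin_layer(x, lr=0.1, steps=500):
    """Optimization case: minimize 0.5 (z - x)^2 + 0.5 z^2 over z.

    The optimality condition is (z - x) + z = 0, so the exact answer is x / 2.
    """
    z = 0.0
    for _ in range(steps):
        grad = (z - x) + z
        z -= lr * grad
    return z


x = 1.0
print(fixed_point_layer(x))  # satisfies z = tanh(0.5 z + 1) up to tolerance
print(ode_layer(x))          # approximates exp(-1)
print(argmin_layer(x))       # approximates x / 2
```

In all three functions the output is characterized by a condition (a fixed point, the end state of a trajectory, a stationary point) rather than by a fixed formula.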
