# Bezier Loss Function

This notebook demonstrates the Bezier loss function, which is a regression loss function. In a traditional neural network, the outputs are interpreted directly as points in the output space. Here, the network outputs are instead points in the output space that are used to construct a Bezier curve over some predefined parameter interval.

## Bezier Curves

[Bezier Curves](https://en.wikipedia.org/wiki/B%C3%A9zier_curve) are polynomial curves of degree $d$ constructed from linear interpolations between successive points in a list of points.

$$p(t) = \sum_{k=0}^d {d \choose k} (1-t)^{d-k} t^k P_k$$

Note that the points $P_k$ may live in a space of arbitrary dimension, but the parameter $t$ is real with $t\in[0,\ 1]$. Bezier curves also have useful properties, such as easy calculation of the derivative $\dot{p}$, embedding of lower-degree polynomials in higher-degree Bezier curves, and the ability to enforce levels of continuity.
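As a minimal sketch of the formula above, the following pure-Python snippet evaluates a Bezier curve both with the Bernstein sum and by repeated linear interpolation between successive points (de Casteljau's algorithm), and builds the control points of $\dot{p}$. The function names and the example control points are illustrative assumptions, not part of this notebook's implementation.

```python
from math import comb


def bezier_bernstein(points, t):
    """Evaluate p(t) with the Bernstein sum; points is a list of equal-length tuples."""
    d = len(points) - 1
    dim = len(points[0])
    weights = [comb(d, k) * (1 - t) ** (d - k) * t ** k for k in range(d + 1)]
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))


def bezier_de_casteljau(points, t):
    """Evaluate p(t) by repeated linear interpolation between successive points."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]


def bezier_derivative_points(points):
    """Control points of dp/dt, itself a Bezier curve of degree d - 1."""
    d = len(points) - 1
    return [tuple(d * (b - a) for a, b in zip(p, q))
            for p, q in zip(points, points[1:])]


# Hypothetical cubic (d = 3) in the plane.
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
midpoint = bezier_bernstein(control, 0.5)
velocity_at_start = bezier_bernstein(bezier_derivative_points(control), 0.0)
```

Both evaluation routes should agree up to floating-point error. The derivative at $t=0$ is $d\,(P_1 - P_0)$, which is why the derivative is cheap to compute from the control points alone.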

## Theoretical difference from standard sequential neural networks

Normally, a neural network is trained so that its output approximates a function $f:\mathit{X}\mapsto\mathit{Y}$ from a finite number of samples of the mapping. The network is then $\hat{f}:\mathit{X}\mapsto\hat{\mathit{Y}}$ with $space(\hat{\mathit{Y}}) \equiv space(\mathit{Y})$ (e.g. $\mathbf{R}^{n}$), and it is trained by minimizing a loss function $\ell:\mathit{Y}\times\hat{\mathit{Y}}\mapsto \mathbf{R}^+$.

Here, however, the network is treated as a constructor of a Bezier curve, so each training sample needs $m$ curve samples of $f$ over the desired interval, each with a given value of $t$. The desired network output for training in TensorFlow therefore lies in $\mathbf{N}\times\mathbf{R}\times space(\mathit{Y})$ (a point index, a parameter value $t$, and a point of $\mathit{Y}$), and the predicted network output lies in $\mathbf{N}^d\times space(\mathit{Y})$, where $\mathbf{N}^d$ is the index set $\lbrace 0,\ 1,\cdots,\ d\rbrace$. Since the network output is interpreted as a Bezier curve, there is a further mapping $B: \mathbf{N}^d \times space(\mathit{Y})\mapsto \mathbf{P}^d_{space(\mathit{Y})}$, where $\mathbf{P}^d_{space(\mathit{Y})}$ is the set of all polynomials of degree at most $d$ in $space(\mathit{Y})$. With the composition $B\circ\hat{f}: \mathit{X}\mapsto \mathbf{P}^d_{space(\mathit{Y})}$, the Bezier loss function is defined as $\ell_B:(\mathbf{R}\times space(\mathit{Y}))\times \mathbf{P}^d_{space(\mathit{Y})}\mapsto\mathbf{R}$.
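To make the mapping $\ell_B$ concrete, here is a minimal NumPy sketch for a single training sample. It assumes the loss is the mean squared Euclidean distance between the $m$ curve samples and the Bezier curve evaluated at the matching values of $t$. That choice of distance, and the names `bernstein_matrix` and `bezier_loss`, are assumptions made for illustration; the text above only fixes the domain and codomain of $\ell_B$.

```python
from math import comb

import numpy as np


def bernstein_matrix(t, d):
    """Return the (m, d + 1) matrix with entries C(d, k) (1 - t_j)^(d - k) t_j^k."""
    t = np.asarray(t, dtype=float)[:, None]   # (m, 1)
    k = np.arange(d + 1)[None, :]             # (1, d + 1)
    binom = np.array([comb(d, i) for i in range(d + 1)], dtype=float)
    return binom * (1.0 - t) ** (d - k) * t ** k


def bezier_loss(t, y_true, control_points):
    """Assumed loss: mean squared distance between curve samples and the curve.

    t:              (m,)       parameter value of each curve sample
    y_true:         (m, n)     sampled values of f
    control_points: (d + 1, n) network output, read as Bezier control points
    """
    control_points = np.asarray(control_points, dtype=float)
    d = control_points.shape[0] - 1
    curve = bernstein_matrix(t, d) @ control_points          # B applied, then evaluated at t
    residual = curve - np.asarray(y_true, dtype=float)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))


# Hypothetical example: a straight line (d = 1) in the plane.
t = np.array([0.0, 0.5, 1.0])
line = np.array([[0.0, 0.0], [1.0, 1.0]])
on_curve = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
loss_exact = bezier_loss(t, on_curve, line)
loss_shifted = bezier_loss(t, on_curve + np.array([1.0, 0.0]), line)
```

Under this assumed distance, samples that lie exactly on the curve give zero loss, and shifting every sample by a unit vector gives a loss of one.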

## TensorFlow implementation

The spaces are handled as three-dimensional tensors in TensorFlow, with the following axis ordering:

1. Sample index
2. Point index
3. Dimension index
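The axis ordering above can be illustrated with a small NumPy sketch of a batched loss. NumPy is used here only to keep the example lightweight. The packing of $t$ into the first entry of the dimension axis of the true output, the squared-distance loss, and the name `batched_bezier_loss` are all assumptions made for illustration, not this notebook's confirmed layout.

```python
from math import comb

import numpy as np


def batched_bezier_loss(y_true, y_pred):
    """Per-sample loss with axes ordered (sample, point, dimension).

    y_true: (samples, m, 1 + n)  assumed packing [t, y_1, ..., y_n] per curve sample
    y_pred: (samples, d + 1, n)  control points predicted by the network
    """
    t = y_true[..., 0]                        # (samples, m)
    y = y_true[..., 1:]                       # (samples, m, n)
    d = y_pred.shape[1] - 1
    k = np.arange(d + 1)
    binom = np.array([comb(d, i) for i in range(d + 1)], dtype=float)
    # Bernstein basis per sample and curve point: (samples, m, d + 1)
    basis = binom * (1.0 - t[..., None]) ** (d - k) * t[..., None] ** k
    # Contract the control-point axis to get curve values: (samples, m, n)
    curve = np.einsum("smk,skn->smn", basis, y_pred)
    return np.mean(np.sum((curve - y) ** 2, axis=-1), axis=-1)


# Two hypothetical samples, degree d = 1, output dimension n = 2, m = 3 curve points.
y_pred = np.array([[[0.0, 0.0], [1.0, 1.0]],
                   [[0.0, 0.0], [2.0, 0.0]]])
y_true = np.array([[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]],
                   [[0.0, 0.0, 1.0], [0.5, 1.0, 1.0], [1.0, 2.0, 1.0]]])
losses = batched_bezier_loss(y_true, y_pred)
```

In this example the first sample lies on its curve and the second is offset by one unit in the second coordinate, so the per-sample losses differ accordingly.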

For both true and estimated outputs, the 