# Implementation of KANs in PyTorch

Here we will explore how to implement the relatively new KAN (Kolmogorov-Arnold Network) architecture in PyTorch, and then look at the "official" PyKan package.

We won't go into much detail on the underlying theory, but the paper can be found [here](https://arxiv.org/pdf/2404.19756).

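In contrast to MLPs, which have learnable weights on edges and fixed activation functions on nodes, KANs have learnable activation functions on edges and only sum operations on nodes. KANs are motivated by an important result from approximation theory: the Kolmogorov-Arnold representation theorem.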

$$\text{Kolmogorov-Arnold Representation Theorem}$$
$$F(x_1, x_2, \dots, x_n) = \sum_{j=1}^{2n+1} \psi_j \bigg( \sum_{i=1}^n \phi_{ji} (x_i) \bigg)$$

In essence, this asserts that any continuous function $F : [0,1]^n \to \mathbb{R}$ can be written exactly as a finite composition of continuous univariate functions and addition. This is a striking result: the only genuinely multivariate operation needed is the sum. The theorem has long been studied in approximation theory, and KANs carry the idea over to machine learning by making the univariate functions learnable.
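To make the structure concrete, here is a hand-picked illustration (not the construction from the theorem's proof). Multiplication of two reals already has the "outer function of a sum of inner functions" shape using only two outer functions, with the remaining outer functions taken to be zero: $xy = \psi_1(x + y) + \psi_2(x - y)$, where $\psi_1(s) = s^2/4$, $\psi_2(s) = -s^2/4$, and every inner function is either the identity or negation. The function names below are hypothetical and chosen only for this example.

```python
import random

# Inner functions: each acts on a single input variable.
def identity(t: float) -> float:
    return t

def negate(t: float) -> float:
    return -t

# Outer functions: each acts on a single summed value.
def psi_plus(s: float) -> float:
    return s * s / 4.0

def psi_minus(s: float) -> float:
    return -s * s / 4.0

def product_via_ka(x: float, y: float) -> float:
    """Compute x * y using only univariate functions and addition."""
    inner_1 = identity(x) + identity(y)   # x + y
    inner_2 = identity(x) + negate(y)     # x - y
    # ((x + y)^2 - (x - y)^2) / 4 = x * y
    return psi_plus(inner_1) + psi_minus(inner_2)

if __name__ == "__main__":
    for _ in range(5):
        x, y = random.uniform(0, 1), random.uniform(0, 1)
        print(f"{x:.4f} * {y:.4f} = {x * y:.6f}  vs  {product_via_ka(x, y):.6f}")
```

The only multivariate step is the addition inside each inner sum, which is exactly what a KAN node does.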

So what does this look like in the context of neural networks? A KAN layer takes an input $x \in \mathbb{R}^n$ and produces an output in $\mathbb{R}^m$ by applying $\phi^1$, an $m \times n$ array of learnable univariate functions. $\phi^1$ is not a linear operator, since its entries are nonlinear functions, but it is convenient to write it in matrix form:

$$
\phi^1 = 
\begin{bmatrix}
    \phi_{11} & \phi_{12} & \dots & \phi_{1n}\\
    \phi_{21} & \phi_{22} & \dots & \phi_{2n}\\
    \vdots & \vdots & \ddots & \vdots\\
    \phi_{m1} & \phi_{m2} & \dots & \phi_{mn}
\end{bmatrix}
$$

Note that each entry $\phi_{ji}$ of $\phi^1$ is a univariate function of the single input $x_i$, so the $j$-th output of the layer is $\sum_{i=1}^n \phi_{ji}(x_i)$.
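A minimal PyTorch sketch of one such layer follows, under stated assumptions. Each edge function is parameterised as $\phi_{ji}(t) = w_{ji}\,\mathrm{silu}(t) + \sum_k c_{jik} \exp\big(-((t - g_k)/h)^2\big)$, i.e. a learnable SiLU residual term plus a learnable combination of Gaussian radial basis functions on a fixed uniform grid. The Gaussian bases are swapped in for the paper's B-splines purely to keep the code short. The class name `RBFKANLayer` and the `num_basis` and `grid_range` parameters are hypothetical and are not part of the PyKan API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RBFKANLayer(nn.Module):
    """Hypothetical KAN layer: an (out_features x in_features) grid of learnable
    univariate functions phi_ji, with output_j = sum_i phi_ji(x_i)."""

    def __init__(self, in_features: int, out_features: int,
                 num_basis: int = 8, grid_range: tuple = (-1.0, 1.0)):
        super().__init__()
        # Fixed, evenly spaced RBF centres g_k shared by every edge.
        grid = torch.linspace(grid_range[0], grid_range[1], num_basis)
        self.register_buffer("grid", grid)
        self.width = (grid_range[1] - grid_range[0]) / (num_basis - 1)
        # Learnable basis coefficients c_jik: one set per edge (j, i).
        self.coef = nn.Parameter(0.1 * torch.randn(out_features, in_features, num_basis))
        # Learnable weight w_ji on the SiLU residual path, one per edge.
        self.base_weight = nn.Parameter(
            torch.randn(out_features, in_features) / in_features ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features)
        # Evaluate every basis function at every input: (batch, in_features, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.grid) / self.width) ** 2)
        # Basis part: sum over i and k of c_jik * B_k(x_i) -> (batch, out_features)
        basis_out = torch.einsum("bik,oik->bo", basis, self.coef)
        # Residual part: sum over i of w_ji * silu(x_i) -> (batch, out_features)
        base_out = F.silu(x) @ self.base_weight.T
        return basis_out + base_out


if __name__ == "__main__":
    torch.manual_seed(0)
    # A [2, 5, 1] KAN: 2 inputs, 5 hidden nodes (2n + 1 for n = 2), 1 output.
    model = nn.Sequential(RBFKANLayer(2, 5), RBFKANLayer(5, 1))
    x = torch.rand(64, 2) * 2 - 1        # inputs in [-1, 1]^2
    y = x[:, :1] * x[:, 1:]              # toy target: F(x1, x2) = x1 * x2
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(200):
        opt.zero_grad()
        loss = F.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(f"final MSE: {loss.item():.5f}")
```

Because each output is a plain sum of per-input functions, the layer has no interaction terms on its own; interactions such as $x_1 x_2$ only appear once layers are stacked. The SiLU path keeps the layer well-behaved for inputs that fall outside the RBF grid, where the Gaussian bases decay to zero.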