
# Why use KeOps?

## Scalable kernel operations

KeOps can be used on a broad class of problems (:ref:`part.formula`). But at heart, the main motivation behind this library is the need for fast and scalable Gaussian convolutions (RBF kernel products). For very large values of M and N, given:

• a target point cloud (x_i)_{i=1}^M \in \mathbb R^{M \times D},
• a source point cloud (y_j)_{j=1}^N \in \mathbb R^{N \times D},
• a signal (b_j)_{j=1}^N \in \mathbb R^{N} attached to the y_j's,

KeOps allows you to compute efficiently the array (a_i)_{i=1}^M \in \mathbb R^{M} given by

a_i =  \sum_j K(x_i,y_j) b_j,  \qquad i=1,\cdots,M,


where K(x_i,y_j) = \exp(-\|x_i - y_j\|^2 / 2 \sigma^2). On top of this, thanks to KeOps' automatic differentiation module, you can also get access to the gradient of the a_i's with respect to the x_i's:

a_i' =  \sum_j \partial_x K(x_i,y_j) b_j,  \qquad i=1,\cdots,M,


without having to code the formula \partial_x K(x_i,y_j) = -\tfrac{1}{\sigma^2}(x_i - y_j) \exp(-\|x_i - y_j\|^2 / 2 \sigma^2) yourself!
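To make the two formulas above concrete, here is a plain-NumPy reference sketch (not the KeOps API, and with none of its memory savings) that spells out both the kernel product and its hand-derived gradient on small inputs; the function names are illustrative:

```python
import numpy as np

def gaussian_conv(x, y, b, sigma):
    """Dense reference: a_i = sum_j exp(-|x_i - y_j|^2 / (2 sigma^2)) * b_j."""
    # (M, N) matrix of squared distances, built via broadcasting:
    D2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2 * sigma ** 2))   # (M, N) kernel matrix
    return K @ b                         # (M,) result

def gaussian_conv_grad(x, y, b, sigma):
    """Gradient of a_i w.r.t. x_i:
    sum_j -(x_i - y_j)/sigma^2 * K(x_i, y_j) * b_j, an (M, D) array."""
    diff = x[:, None, :] - y[None, :, :]                 # (M, N, D)
    K = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))  # (M, N)
    return (-(diff / sigma ** 2) * (K * b)[:, :, None]).sum(axis=1)
```

A finite-difference check on `gaussian_conv` recovers `gaussian_conv_grad` entry by entry, which is exactly the bookkeeping KeOps' automatic differentiation spares you.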

## High performance

In recent years, Deep Learning frameworks such as PyTorch or TensorFlow have evolved into fully-fledged applied math libraries: with negligible overhead, they bring automatic differentiation and seamless GPU support to research communities that were used to Matlab, NumPy and other tensor-centric frameworks.

Unfortunately, though, no magic is involved: optimised CUDA codes still have to be written for every single operation provided to end-users. Supporting all the standard mathematical routines thus comes at a huge engineering cost for the developers of the main frameworks. As of 2019, this effort has been mostly restricted to the operations needed to implement Convolutional Neural Networks: linear algebra routines and convolutions on grids, images and volumes.

Consequently, in array-centric frameworks, the standard way of computing Gaussian convolutions is to create and store in memory the full M-by-N kernel matrix K_{i,j}=K(x_i,y_j), before computing (a_i) = (K_{i,j}) (b_j) as a matrix product. But for large datasets (say, M,N \geqslant 10,000), this is not realistic: large matrices just don't fit in GPU memories.

KeOps is all about letting researchers break through this memory bottleneck. Relying on online map-reduce schemes, we provide CUDA routines that "sum" the coefficients K_{i,j}\cdot b_j as they are computed, without ever storing the full matrix K in memory.
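The idea behind such an online scheme can be sketched in a few lines of plain NumPy (a toy CPU illustration only; the actual KeOps routines are generated CUDA kernels, and the function name here is hypothetical). Instead of materializing the full M-by-N matrix, we accumulate the result one block of y's at a time:

```python
import numpy as np

def gaussian_conv_tiled(x, y, b, sigma, tile=1024):
    """Accumulate a_i = sum_j K(x_i, y_j) * b_j one block of y's at a time,
    so only an (M, tile) slice of the kernel matrix ever exists in memory."""
    a = np.zeros(len(x))
    for start in range(0, len(y), tile):
        y_blk = y[start:start + tile]
        b_blk = b[start:start + tile]
        # Squared distances between all x_i's and this block of y_j's:
        D2 = ((x[:, None, :] - y_blk[None, :, :]) ** 2).sum(-1)
        # Partial sum over the block; the slice is then discarded.
        a += np.exp(-D2 / (2 * sigma ** 2)) @ b_blk
    return a
```

The peak memory footprint drops from O(MN) to O(M * tile), which is what lets the same reduction scale to M, N in the millions on a GPU.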

## A generic framework that suits your needs

KeOps supports generic operations, way beyond the simple case of kernel convolutions. Let's say that you have at hand:

• a collection p^1, p^2, ..., p^P of vectors.
• a collection x^1_i, x^2_i, ..., x^X_i of vector sequences, indexed by an integer i ranging from 1 to M.
• a collection y^1_j, y^2_j, ..., y^Y_j of vector sequences, indexed by an integer j ranging from 1 to N.
• a vector-valued function f(p^1, p^2,..., x^1_i, x^2_i,..., y^1_j, y^2_j, ...) on these input vectors.

Then, referring to the p's as parameters, the x's as x-variables and the y's as y-variables, the KeOps library allows you to compute efficiently any expression a_i of the form

a_i = \operatorname*{Reduction}_{j=1,\cdots,N} \big[ f(p^1, p^2,..., x^1_i, x^2_i,..., y^1_j, y^2_j, ...)  \big], \qquad i=1,\cdots,M


alongside its derivatives with respect to all the variables and parameters.
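The Reduction need not be a sum. As an example of a different choice, here is a min/argmin-reduction (nearest-neighbor search), again as a blockwise plain-NumPy sketch rather than KeOps code, with an illustrative function name:

```python
import numpy as np

def nn_search_tiled(x, y, tile=1024):
    """Argmin-reduction example: for each x_i, the index of its nearest y_j,
    computed blockwise so the full (M, N) distance matrix is never stored."""
    best_d = np.full(len(x), np.inf)        # running minimum per x_i
    best_j = np.zeros(len(x), dtype=int)    # running argmin per x_i
    for start in range(0, len(y), tile):
        y_blk = y[start:start + tile]
        D2 = ((x[:, None, :] - y_blk[None, :, :]) ** 2).sum(-1)
        j = D2.argmin(axis=1)                      # best index within the block
        d = D2[np.arange(len(x)), j]               # corresponding distance
        upd = d < best_d                           # rows improved by this block
        best_d[upd] = d[upd]
        best_j[upd] = start + j[upd]
    return best_j
```

Here f(x_i, y_j) = \|x_i - y_j\|^2 and the Reduction is an argmin over j; swapping in other formulas f and other reductions (sum, max, log-sum-exp, ...) follows the same pattern.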

## Features
