
# Formulas and syntax

KeOps lets you define any reduction operation of the form

\alpha_i = \operatorname{Reduction}_j \big[ F(x^0_{\iota_0}, \ldots, x^{n-1}_{\iota_{n-1}}) \big]


or

\beta_j = \operatorname{Reduction}_i \big[ F(x^0_{\iota_0}, \ldots, x^{n-1}_{\iota_{n-1}}) \big]


where F is a symbolic formula, the x^k_{\iota_k}'s are vector variables and Reduction is the Sum, the LogSumExp or any other standard operation (see :ref:`part.reduction` for the full list of supported reductions).

We now describe the symbolic syntax that can be used through KeOps' bindings.

## Variables: category, index and dimension

At a low level, every variable x^k_{\iota_k} is specified by its category \iota_k\in\{i,j,\emptyset\} (meaning that the variable is indexed by i, indexed by j, or is a parameter shared across all indices), its positional index k and its dimension d_k.

In practice, the category \iota_k is given through a keyword

| keyword | meaning               |
|---------|-----------------------|
| `Vi`    | variable indexed by i |
| `Vj`    | variable indexed by j |
| `Pm`    | parameter             |

followed by a (k,d_k) or (index,dimension) pair of integers. For instance, Vi(2,4) specifies a variable indexed by i, given as the third (k=2) input in the function call, and representing a vector of dimension d_k=4.

N.B.: Using the same index k for two variables with different dimensions or categories is not allowed and will be rejected by the compiler.
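
For instance, these placeholders are enough to encode a Gaussian kernel product, using operations documented in the next section. The snippet below is an illustration only; the inputs (a scalar bandwidth, two point clouds and a vector signal) are hypothetical:

```python
# Illustration only: a Gaussian kernel product written with raw placeholders.
# Pm(0,1): scalar bandwidth, input 0, dimension 1
# Vi(1,3): points x_i,       input 1, dimension 3
# Vj(2,3): points y_j,       input 2, dimension 3
# Vj(3,2): signal b_j,       input 3, dimension 2
formula = "Exp(-Pm(0,1) * SqDist(Vi(1,3), Vj(2,3))) * Vj(3,2)"
```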

## Math operators

To define formulas with KeOps, you can use simple arithmetic operations:

| code       | meaning                                                                                       |
|------------|-----------------------------------------------------------------------------------------------|
| `f * g`    | scalar-vector multiplication (if `f` is scalar) or element-wise vector-vector multiplication  |
| `f + g`    | addition of two vectors                                                                       |
| `f - g`    | difference between two vectors, or unary minus                                                |
| `f / g`    | element-wise division                                                                         |
| `(f \| g)` | scalar product between vectors                                                                |
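
Combined with the placeholders above, these operators already express useful formulas; for instance (an illustration, with hypothetical inputs x = Vi(0,3), y = Vj(1,3) and s = Pm(2,1)):

```python
# The scalar product of x_i and y_j, rescaled by the parameter s:
formula = "(Vi(0,3) | Vj(1,3)) * Pm(2,1)"
```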

Elementary functions:

| code         | meaning                                                      |
|--------------|--------------------------------------------------------------|
| `Inv(f)`     | element-wise inverse (1 ./ f)                                |
| `Exp(f)`     | element-wise exponential function                            |
| `Log(f)`     | element-wise natural logarithm                               |
| `Sin(f)`     | element-wise sine function                                   |
| `Cos(f)`     | element-wise cosine function                                 |
| `Pow(f, P)`  | P-th power of f (element-wise), where P is a fixed integer   |
| `Powf(f, g)` | power operation, alias for `Exp(g*Log(f))`                   |
| `Square(f)`  | element-wise square, faster than `Pow(f,2)`                  |
| `Sqrt(f)`    | element-wise square root, faster than `Powf(f,.5)`           |
| `Rsqrt(f)`   | element-wise inverse square root, faster than `Powf(f,-.5)`  |
| `Sign(f)`    | element-wise sign function (-1 if f<0, 0 if f=0, 1 if f>0)   |
| `Step(f)`    | element-wise step function (0 if f<0, 1 if f>=0)             |
| `ReLU(f)`    | element-wise ReLU function (0 if f<0, f if f>=0)             |
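
To make the element-wise semantics concrete, here is what the last three operators compute on a plain NumPy vector (a reference sketch of the expected values, not the KeOps implementation):

```python
import numpy as np

f = np.array([-2.0, 0.0, 3.0])
sign = np.sign(f)                # Sign(f) -> [-1., 0., 1.]
step = (f >= 0).astype(f.dtype)  # Step(f) -> [ 0., 1., 1.]
relu = np.maximum(f, 0.0)        # ReLU(f) -> [ 0., 0., 3.]
```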

Simple vector operations:

| code           | meaning                                             |
|----------------|-----------------------------------------------------|
| `SqNorm2(f)`   | squared L2 norm, same as `(f\|f)`                   |
| `Norm2(f)`     | L2 norm, same as `Sqrt((f\|f))`                     |
| `Normalize(f)` | normalized vector, same as `Rsqrt(SqNorm2(f)) * f`  |
| `SqDist(f,g)`  | squared L2 distance, same as `SqNorm2(f-g)`         |
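
The equivalences given in this table can be checked on a plain NumPy vector (a reference sketch):

```python
import numpy as np

f = np.array([3.0, 4.0])
g = np.array([1.0, 1.0])
sqnorm2 = np.dot(f, f)         # SqNorm2(f)   -> 25.0, i.e. (f|f)
norm2 = np.sqrt(sqnorm2)       # Norm2(f)     -> 5.0
unit = f / norm2               # Normalize(f) -> [0.6, 0.8]
sqdist = np.dot(f - g, f - g)  # SqDist(f,g)  -> 13.0
```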

KeOps also provides generic squared Euclidean norms, with support for scalar, diagonal and full (symmetric) weight matrices. If f is a vector of size N, then depending on the size of s, WeightedSqNorm(s,f) refers to:

• a weighted L2 norm s[0]\cdot\sum_{0\leqslant k < N} f[k]^2 if s is a vector of size 1;
• a separable norm \sum_{0\leqslant k < N} s[k]\cdot f[k]^2 if s is a vector of size N;
• a full anisotropic norm \sum_{0\leqslant k,l < N} s[Nk+l]\cdot f[k]\,f[l] if s is a vector of size N*N such that s[Nk+l]=s[Nl+k].

| code                    | meaning                                                 |
|-------------------------|---------------------------------------------------------|
| `WeightedSqNorm(s,f)`   | generic squared Euclidean norm                          |
| `WeightedSqDist(s,f,g)` | generic squared distance, same as `WeightedSqNorm(s,f-g)` |
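
The three conventions can be spelled out in NumPy as follows (a reference sketch for N = 2; the weight values are arbitrary):

```python
import numpy as np

f = np.array([1.0, 2.0])              # N = 2
s1 = np.array([0.5])                  # scalar weight, size 1
sN = np.array([0.5, 2.0])             # diagonal weights, size N
sNN = np.array([1.0, 0.3, 0.3, 2.0])  # full symmetric matrix, size N*N

w1 = s1[0] * np.sum(f ** 2)           # weighted L2 norm -> 2.5
wN = np.sum(sN * f ** 2)              # separable norm   -> 8.5
S = sNN.reshape(2, 2)                 # S[k, l] = sNN[N*k + l]
wNN = f @ S @ f                       # anisotropic norm -> 10.2
```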

Constants and elementary vector manipulations:

| code               | meaning                                                                                     |
|--------------------|---------------------------------------------------------------------------------------------|
| `IntCst(N)`        | integer constant N                                                                          |
| `IntInv(N)`        | alias for `Inv(IntCst(N))`: 1/N                                                             |
| `Zero(N)`          | vector of zeros of size N                                                                   |
| `Elem(f, M)`       | extract the M-th element of vector f                                                        |
| `ElemT(f, N, M)`   | insert the scalar value f at position M in a vector of zeros of length N                    |
| `Extract(f, M, D)` | extract a sub-vector from vector f (M is the starting index, D is the dimension of the sub-vector) |
| `ExtractT(f, M, D)`| insert vector f in a larger vector of zeros (M is the starting index, D is the dimension of the output) |
| `Concat(f, g)`     | concatenation of vectors f and g                                                            |
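
Again, the expected semantics are easy to state with NumPy (a reference sketch):

```python
import numpy as np

f = np.array([10.0, 20.0, 30.0, 40.0])
elem = f[2]                   # Elem(f, 2)        -> 30.0
sub = f[1:1 + 2]              # Extract(f, 1, 2)  -> [20., 30.]
padded = np.zeros(6)
padded[1:1 + 4] = f           # ExtractT(f, 1, 6) -> [0., 10., 20., 30., 40., 0.]
cat = np.concatenate([f, f])  # Concat(f, f), size 8
```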

Elementary dot products:

| code               | meaning                                                                                          |
|--------------------|--------------------------------------------------------------------------------------------------|
| `MatVecMult(f, g)` | matrix-vector product f × g: f is a vector interpreted as a matrix (column-major), g is a vector |
| `VecMatMult(f, g)` | vector-matrix product f × g: f is a vector, g is a vector interpreted as a matrix (column-major) |
| `TensorProd(f, g)` | tensor product f × g^T: f and g are vectors of sizes M and N, the output is of size M*N          |
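
Following the column-major convention stated in the table, a vector f of size 6 fed to MatVecMult together with a vector g of size 3 is treated as a 2×3 matrix. A NumPy sketch of this interpretation (sizes are illustrative):

```python
import numpy as np

# Column-major interpretation (as stated above): A[i, j] = f[i + 2*j].
f = np.arange(6.0)     # [0., 1., 2., 3., 4., 5.]
A = f.reshape(3, 2).T  # [[0., 2., 4.], [1., 3., 5.]]
g = np.ones(3)
matvec = A @ g         # MatVecMult(f, g) -> [6., 9.]
```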

KeOps also provides symbolic differentiation operators:

| code               | meaning                                                                                    |
|--------------------|--------------------------------------------------------------------------------------------|
| `Grad(f, x, e)`    | gradient of f with respect to the variable x, with e as the "grad_input" to backpropagate |
| `GradMatrix(f, v)` | gradient matrix of f with respect to v (i.e. the transpose of the Jacobian matrix)        |
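
For instance, the gradient of a squared distance with respect to x can be requested directly inside the formula string. In this sketch, e = Vi(2,1) is a hypothetical "grad_input" variable; it must have the dimension of f (here 1), while the output has the dimension of x (here 3):

```python
# d/dx [ |x_i - y_j|^2 ], backpropagating e_i: the result is 2*(x_i - y_j)*e_i.
formula = "Grad(SqDist(Vi(0,3), Vj(1,3)), Vi(0,3), Vi(2,1))"
```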

## Reductions

The operations that can be used to reduce an array are described in the following table.

| code name | arguments | mathematical expression (reduction over j) | remarks |
|-----------|-----------|---------------------------------------------|---------|
| `Sum` | `f` | \sum_j f_{ij} | |
| `Max_SumShiftExp` | `f` (scalar) | (m_i, s_i) with m_i = \max_j f_{ij} and s_i = \sum_j \exp(f_{ij} - m_i) | core KeOps reduction for LogSumExp |
| `LogSumExp` | `f` (scalar) | \log\left(\sum_j \exp(f_{ij})\right) | only in Python bindings |
| `Max_SumShiftExpWeight` | `f` (scalar), `g` | (m_i, s_i) with m_i = \max_j f_{ij} and s_i = \sum_j \exp(f_{ij} - m_i)\, g_{ij} | core KeOps reduction for LogSumExpWeight and SumSoftMaxWeight |
| `LogSumExpWeight` | `f` (scalar), `g` | \log\left(\sum_j \exp(f_{ij})\, g_{ij}\right) | only in Python bindings |
| `SumSoftMaxWeight` | `f` (scalar), `g` | \left(\sum_j \exp(f_{ij})\, g_{ij}\right) / \left(\sum_j \exp(f_{ij})\right) | only in Python bindings |
| `Min` | `f` | \min_j f_{ij} | no gradient |
| `ArgMin` | `f` | \operatorname{argmin}_j f_{ij} | gradient returns zeros |
| `Min_ArgMin` | `f` | \left(\min_j f_{ij}, \operatorname{argmin}_j f_{ij}\right) | no gradient |
| `Max` | `f` | \max_j f_{ij} | no gradient |
| `ArgMax` | `f` | \operatorname{argmax}_j f_{ij} | gradient returns zeros |
| `Max_ArgMax` | `f` | \left(\max_j f_{ij}, \operatorname{argmax}_j f_{ij}\right) | no gradient |
| `KMin` | `f`, `K` (int) | \left[\min_j f_{ij}, \ldots, \min^{(K)}_j f_{ij}\right], where \min^{(k)} denotes the k-th smallest value | no gradient |
| `ArgKMin` | `f`, `K` (int) | \left[\operatorname{argmin}_j f_{ij}, \ldots, \operatorname{argmin}^{(K)}_j f_{ij}\right] | gradient returns zeros |
| `KMin_ArgKMin` | `f`, `K` (int) | \left(\left[\min^{(1\ldots K)}_j f_{ij}\right], \left[\operatorname{argmin}^{(1\ldots K)}_j f_{ij}\right]\right) | no gradient |

N.B.: All these reductions, except Max_SumShiftExp and LogSumExp, are vectorized: whenever the input f or g is vector-valued, the output will be vector-valued, with the corresponding reduction applied element-wise to each component.

N.B.: All reductions accept an additional optional argument that specifies whether the reduction is performed over the j or the i index (see :ref:`part.cppapi` and :ref:`part.genred`).
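
In the Python bindings, for instance, the reduction and its axis are chosen when the formula is compiled. The sketch below uses pykeops' `Genred` interface; the aliases and array sizes are illustrative:

```python
import numpy as np
from pykeops.numpy import Genred

# log( sum_j exp( <x_i, y_j> ) ), computed for every index i:
routine = Genred(
    "(x | y)",                   # scalar formula f_ij
    ["x = Vi(3)", "y = Vj(3)"],  # aliases: x_i and y_j live in R^3
    reduction_op="LogSumExp",
    axis=1,                      # reduce over j; axis=0 would reduce over i
)
x = np.random.randn(1000, 3)
y = np.random.randn(2000, 3)
res = routine(x, y)              # shape (1000, 1)
```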

## An example

Assume we want to compute the sum

F(p,x,y,a)_i = \left( \sum_{j=1}^N (p - a_j)^2 \exp(x_i^u + y_j^u) \right)_{i=1,\ldots,M,\ u=1,2,3} \in \mathbb{R}^{M\times 3}


where:

• p \in \mathbb R is a parameter,
• x \in \mathbb R^{M\times 3} is an x-variable indexed by i,
• y \in \mathbb R^{N\times 3} is a y-variable indexed by j,
• a \in \mathbb R^N is a y-variable indexed by j.

Using the variable placeholders presented above and the mathematical operations listed in :ref:`part.mathOperation`, we can define F as a symbolic string:

F = "Sum_Reduction( Square( Pm(0,1) - Vj(3,1) )  *  Exp( Vi(1,3) + Vj(2,3) ), 1 )"

in which + and - denote the usual addition of vectors, Exp is the (element-wise) exponential function and * denotes scalar-vector multiplication. The second argument 1 of the Sum_Reduction operator indicates that the summation is performed with respect to the j index: a 0 would have been associated with an i-reduction.

Note that in all bindings, variables can be defined through aliases. In this example, we may write p=Pm(0,1), x=Vi(1,3), y=Vj(2,3), a=Vj(3,1) and thus give F through a much friendlier expression:

F = "Sum_Reduction( Square(p - a) * Exp(x + y), 1 )"