Vectorized operations for Yorick

The vops.i plug-in for Yorick provides a number of optimized functions to perform basic linear algebra operations on arrays (as if they are vectors). These functions can serve as fast building blocks for more advanced algorithms (e.g., the linear conjugate gradient method).

Usage

Functions and subroutines provided by vops.i are:

vops_norm1(x)                          -->  sum(abs(x))

yields the L1-norm of the real-valued array x;

vops_norm2(x)                          -->  sqrt(sum(x*x))

yields the L2-norm (Euclidean norm) of the real-valued array x;

vops_norminf(x)                        -->  max(abs(x))

yields the infinite-norm of the real-valued array x;

vops_inner(x,y)                        -->  sum(x*y)
vops_inner(w,x,y)                      -->  sum(w*x*y)

yields the inner product or "triple" innner product of the real-valued arrays w, x, and y;

vops_scale(alpha, x)                   -->  alpha*x
vops_scale(x, alpha)                   -->  alpha*x
vops_scale, x, alpha;                  -->  x *= alpha

yields the real-valued array x scaled by the factor alpha (if alpha is zero, the result is filled by zeros whatever the values in x, hence x may contain NaN's in that case); if called as a subroutine, vops_scale overwrites the contents of x;

vops_update, y, alpha, x;              -->  y += alpha*x

computes y += alpha*x for arrays x and y and scalar factor alpha, overwriting the contents of y;

vops_combine(alpha, x, beta, y)        -->  alpha*x + beta*y
vops_combine, dst, alpha, x, beta, y;  -->  dst = alpha*x + beta*y

computes alpha*x + beta*y efficiently for arrays x and y and scalar factors alpha and beta efficiently; if called with 5 arguments, vops_combine automatically redefines or re-uses the contents of dst.

Benefits

One of the benefits of using vops.i is the acceleration of computations (see "Performances"). Note that specific values of the factors alpha and beta (i.e., 0, 1 and -1) are handled specifically to reduce the numerical complexity. Another advantage is that single-precision computations can be performed if all arrays are of type float (whatever the types of the factors alpha and beta). Finally, operations involving array results may be performed in-place sparing allocating memory.

Performances

The following timings have been obtained for double precision floating-point (double) variables:

Description	Code	Power	Complexity
L1-norm	`sum(abs(x))`	1.7 Gflops	`2⋅n`
	`vops_norm1(x)`	21.5 Gflops
L2-norm	`sqrt(sum(x*x))`	1.9 Gflops	`2⋅n`
	`vops_norm2(x)`	24.8 Gflops
Inf-norm	`max(abs(x))`	2.7 Gflops	`2⋅n`
	`vops_norminf(x)`	20.5 Gflops
Inner product	`sum(x*y)`	1.9 Gflops	`2⋅n`
	`vops_inner(x,y)`	15.6 Gflops
Triple product	`sum(wxy)`	2.1 Gflops	`3⋅n`
	`vops_inner(x,x,y)`	16.0 Gflops
Scale	`x *= alpha`	1.6 Gflops	`n`
	`vops_scale,x,alpha`	3.4 Gflops
Update	`y += alpha*x`	2.1 Gflops	`2⋅n`
	`vops_update,y,alpha,x`	10.6 Gflops
Combine	`z = alphax + betay`	1.9 Gflops	`3⋅n`
	`z = vops_combine(alpha,x,beta,y)`	8.4 Gflops
	`vops_combine,z,alpha,x,beta,y`	13.5 Gflops

The "Power" is the number of floating-point operations per second which is computed by dividing the CPU time by the number of floating-point operations (given in column "Complexity" with n the number of elements). The code for benchmarking is in file vops-tests.i.

Speed-up may be up to a factor 13 thanks to SIMD instructions on a single core. Note the advantage of re-using existing storage in the last example of vops_combine (arrays all have 10,000 elements in the benchmark).

These results have been obtained on an AMD Ryzen Threadripper 2950X 16-core processor at 4.1 GHz with a plug-in compiled by Clang (version 10.0) with loop vectorization. Compiler and optimization flags have been configured by (see "Installation"):

./configure cc=clang copt='-O3 -mavx2 -mfma -ffast-math'

Installation

In short, building and installing the plug-in can be as quick as:

cd $BUILD_DIR
$SRC_DIR/configure
make
make install

where $BUILD_DIR is the build directory (at your convenience) and $SRC_DIR is the source directory of the plug-in code. The build and source directories can be the same in which case, call ./configure to configure for building.

If the plug-in has been properly installed, it is sufficient to use any of its functions to automatically load the plug-in. You may force the loading of the plug-in by something like:

#include "vops.i"

or

require, "vops.i";

in your code.

More detailled installation explanations are given below.

You must have Yorick installed on your machine.
Unpack the plug-in code somewhere.
Configure for compilation. There are two possibilities:

For an in-place build, go to the source directory of the plug-in code and run the configuration script:
```
cd SRC_DIR
./configure
```
To see the configuration options, type:
```
./configure --help
```
To compile in a different build directory, say $BUILD_DIR, create the build directory, go to the build directory, and run the configuration script:
```
mkdir -p $BUILD_DIR
cd $BUILD_DIR
$SRC_DIR/configure
```
where $SRC_DIR is the path to the source directory of the plug-in code. To see the configuration options, type:
```
$SRC_DIR/configure --help
```
Compile the code:
```
make clean
make
```
Install the plug-in in Yorick directories:
```
make install
```

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.gitignore		.gitignore
LICENSE.md		LICENSE.md
Makefile.in		Makefile.in
README.md		README.md
configure		configure
vops-start.i		vops-start.i
vops-tests.i		vops-tests.i
vops.i		vops.i
yor_vops.c		yor_vops.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vectorized operations for Yorick

Usage

Benefits

Performances

Installation

About

Releases

Packages

Languages

License

emmt/yor-vops

Folders and files

Latest commit

History

Repository files navigation

Vectorized operations for Yorick

Usage

Benefits

Performances

Installation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages