engine-cuda / engine-opencl
engine-cuda is an engine for the popular OpenSSL cryptography framework. It enables the execution of different block ciphers on a GPU using CUDA and OpenCL. Currently, the following block ciphers are supported:
AES-128, AES-192, AES-256 (ECB, CBC decryption)
Blowfish (ECB, CBC decryption)
CAST5 (ECB, CBC decryption)
Camellia-128 (ECB, CBC decryption)
DES (ECB, CBC decryption)
IDEA (ECB, CBC decryption)
In order to build and use engine-cuda you will need
CUDA toolkit installed
patched OpenSSL (see
In order to generate a Makefile you have to call configure first.
You might have to supply configure with the path to your CUDA toolkit:
Some helpful autoconf flags:
--with-gpuarch=sm_xx (to make nvcc compile for a certain compute capability) --with-buffersize=<bytes> (to set the size of the staging buffer between CPU and GPU) --with-maxrregcount=<regs> (to set the maximum registers per kernel thread) --disable-libopencl (to disable building engine-opencl)
After that simply do
make make install
For any further information please refer to http://code.google.com/p/engine-cuda/
Further helpful autoconf switches for debugging and developing engine-cuda:
--enable-debug (enables some verbose output) --enable-timing (necessary to get kernel execution timing output) --enable-verbose (enables verbose compilation output including register use) --disable-aes-coarse (use fine-grained implementation of AES)
Benchmarks of engine-cuda, engine-opencl and the individual kernels can be performed using the files in the test/ subdirectory. For testing the whole engine, call
./test-speed.sh <avg_runs> <GPUONLY/CPUONLY> <ciphers/"all"> <engine>
in test/plots subdir. Plots using gnuplot will automatically generated. Tests with 5 iterations and for all block ciphers will take some time.
To test the raw kernel execution time, have a look at the
./generate-files.sh <device> (/dev/zero or /dev/urandom) ./test-kernels.sh <key> <ciphers>
scripts in test/. This will generate files using the device and run them through the kernels.
During development, use
./test-correctness.sh <cipher/"all"> <file> <key> <bufsize>
to compare the output of engine-cuda / engine_opencl with the CPU output in order to ensure functionality of the algorithm.
There are many (very strange) problems which can arise when using and developing CUDA and OpenCL. We have tried to identify these within the documentation/source, but you may stumble on other problems as well.
For CUDA, using the compute-exclusive mode removes the initial lag:
nvidia-smi -pm 1 nvidia-smi -c 1
to set persistence-mode and compute-exclusive mode
Since a few people asked me, you can donate here: http://pledgie.com/campaigns/18855