Insanely fast CUDA LATCH: fully scale- and rotation-invariant 512-bit binary descriptor for computer vision
Kareem Omar
Latest commit 3a6d52d (Jan 16, 2017): Slight performance boost via intermediate float stage.

The fastest implementation of the fully scale- and rotation-invariant LATCH 512-bit binary feature descriptor, as described in the 2015 paper by Levi and Hassner:

"LATCH: Learned Arrangements of Three Patch Codes"
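Here is a rough illustration of what a single LATCH bit encodes. The test takes a triplet of small patches: an anchor and two companions. It records which companion is more similar to the anchor under the sum of squared differences, which is the squared Frobenius norm of the patch difference. 512 such tests give the 512-bit descriptor.

The C++ sketch below is not CLATCH's CUDA kernel. It rests on several assumptions:

- hypothetical `Image`, `Keypoint` and `Triplet` types
- nearest-neighbour sampling
- a bit convention in which the bit is 1 when the second companion is the closer match

It gets scale and rotation invariance in the generic way, by scaling and rotating every sampling offset with the keypoint's size and orientation. The real descriptor uses a learned arrangement of triplets. Here that arrangement is simply passed in by the caller.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image, row-major.
struct Image {
    int width, height;
    std::vector<uint8_t> data;
    // Clamp to the border so samples near the image edge stay valid.
    uint8_t at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Hypothetical keypoint: position, scale factor, orientation in radians.
struct Keypoint { float x, y, scale, angle; };

// Hypothetical triplet: centre offsets of the anchor and two companion patches,
// expressed in the upright, unit-scale keypoint frame.
struct Triplet { float ax, ay, c1x, c1y, c2x, c2y; };

// Scale and rotate a pattern offset (dx, dy) with the keypoint, then translate to its position.
static void toImage(const Keypoint& kp, float dx, float dy, float& ox, float& oy) {
    const float c = std::cos(kp.angle), s = std::sin(kp.angle);
    ox = kp.x + kp.scale * (c * dx - s * dy);
    oy = kp.y + kp.scale * (s * dx + c * dy);
}

// Sum of squared differences between two (2*half+1)^2 patches centred at offsets a and b.
// The per-pixel grid is transformed too, so the comparison follows the keypoint's
// scale and orientation. Nearest-neighbour sampling keeps the sketch short.
static float patchSSD(const Image& img, const Keypoint& kp,
                      float ax, float ay, float bx, float by, int half) {
    float ssd = 0.0f;
    for (int v = -half; v <= half; ++v) {
        for (int u = -half; u <= half; ++u) {
            float x1, y1, x2, y2;
            toImage(kp, ax + u, ay + v, x1, y1);
            toImage(kp, bx + u, by + v, x2, y2);
            const float p1 = img.at(static_cast<int>(std::lround(x1)), static_cast<int>(std::lround(y1)));
            const float p2 = img.at(static_cast<int>(std::lround(x2)), static_cast<int>(std::lround(y2)));
            const float d = p1 - p2;
            ssd += d * d;
        }
    }
    return ssd;
}

// One LATCH-style bit (assumed convention): 1 if companion 2 is more similar to the anchor.
static bool latchBit(const Image& img, const Keypoint& kp, const Triplet& t, int half) {
    return patchSSD(img, kp, t.ax, t.ay, t.c1x, t.c1y, half) >
           patchSSD(img, kp, t.ax, t.ay, t.c2x, t.c2y, half);
}

// Pack up to 512 triplet tests into eight 64-bit words (bit i -> word i/64, position i%64).
static std::array<uint64_t, 8> describe(const Image& img, const Keypoint& kp,
                                        const std::vector<Triplet>& triplets, int half) {
    assert(half >= 0);
    std::array<uint64_t, 8> desc{};
    for (std::size_t i = 0; i < triplets.size() && i < 512; ++i)
        if (latchBit(img, kp, triplets[i], half))
            desc[i / 64] |= uint64_t(1) << (i % 64);
    return desc;
}
```

Because every offset, including the per-pixel grid inside each patch, passes through `toImage`, rotating the image and the keypoint's `angle` together leaves the sampled intensities, and hence the bits, approximately unchanged.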

See also the ECCV 2016 Descriptor Workshop paper, of which I am a coauthor:

"The CUDA LATCH Binary Descriptor"

And the original LATCH project's website:

This implementation is insanely fast. It matches or beats the speed of the much simpler ORB descriptor despite outputting twice as many bits (512 vs. ORB's 256) and being a superior descriptor.
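LATCH, like ORB, is a binary descriptor, so two descriptors are compared with the Hamming distance: XOR them and count the set bits. At 512 bits (64 bytes), a LATCH descriptor is twice the size of an ORB descriptor, so each comparison touches twice as many words.

Below is a minimal C++ sketch. It assumes each descriptor is stored as eight 64-bit words. `Descriptor512`, `hamming512` and `nearest` are hypothetical names, not CLATCH's API.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumed layout: a 512-bit binary descriptor as eight 64-bit words.
using Descriptor512 = std::array<std::uint64_t, 8>;

// Hamming distance: XOR the words and count the bits that differ.
inline int hamming512(const Descriptor512& a, const Descriptor512& b) {
    int dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        dist += static_cast<int>(std::bitset<64>(a[i] ^ b[i]).count());
    return dist;
}

// Brute-force matching: index of the train descriptor closest to `query`,
// or -1 if `train` is empty. Optionally reports the best distance.
inline int nearest(const Descriptor512& query,
                   const std::vector<Descriptor512>& train,
                   int* bestDist = nullptr) {
    int best = -1;
    int bestD = std::numeric_limits<int>::max();
    for (std::size_t j = 0; j < train.size(); ++j) {
        const int d = hamming512(query, train[j]);
        if (d < bestD) { bestD = d; best = static_cast<int>(j); }
    }
    if (bestDist) *bestDist = bestD;
    return best;
}
```

`std::bitset<64>::count()` keeps the sketch portable. With C++20, `std::popcount` from `<bit>` does the same job, as does a compiler builtin such as GCC/Clang's `__builtin_popcountll`. Both typically compile to a single instruction where the hardware supports it.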

Christopher Parker is responsible for a key insight behind much of the performance of this laboriously crafted CUDA kernel, and I am extremely grateful to him.

A CUDA GPU of compute capability (CC) 3.0 or higher is required.

All functionality is contained in the CLATCH library files (such as CLATCH.h). main.cpp is simply a sample test harness with example usage and performance testing.