Fastest CPU implementation of the LATCH 512-bit binary feature descriptor; fully scale- and rotation-invariant

Fastest implementation of the fully scale- and rotation-invariant LATCH 512-bit binary feature descriptor as described in the 2015 paper by Levi and Hassner:

"LATCH: Learned Arrangements of Three Patch Codes"

See also the ECCV 2016 Descriptor Workshop paper, of which I am a coauthor:

"The CUDA LATCH Binary Descriptor"

And the original LATCH project's website:

See my GitHub for the CUDA version, which is extremely fast.

My implementation uses multithreading, SSE2/3/4/4.1, AVX, AVX2, and many careful optimizations to implement the algorithm as described in the paper, but at great speed. It outperforms the reference implementation by 800% single-threaded or 3200% multi-threaded (!) while exactly matching the reference implementation's output and capabilities.

If you do not have AVX2, uncomment the '#define NO_AVX_PLEASE' line in LATCH.h to route the code through SSE instructions only. NOTE THAT THIS IS ABOUT 50% SLOWER. A processor with full AVX2 support is highly recommended.
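
For readers unfamiliar with this kind of compile-time switch, here is a minimal sketch of how a macro like NO_AVX_PLEASE can route a routine through either an AVX2 or an SSE code path. This is not the code in LATCH.h: the function name sad32, the SKETCH_PATH macro, and the choice of a sum-of-absolute-differences kernel are all hypothetical and chosen only to illustrate the pattern. It also assumes the compiler defines __AVX2__ and __SSE2__ when those instruction sets are enabled (as GCC and Clang do), and falls back to scalar code otherwise.

```cpp
#include <cstdint>

// Hypothetical illustration only; this is NOT the code in LATCH.h.
// Uncommenting the next line forces the SSE path even when AVX2 is available,
// mirroring the switch described above.
// #define NO_AVX_PLEASE

#if defined(__AVX2__) && !defined(NO_AVX_PLEASE)
  #include <immintrin.h>
  #define SKETCH_PATH "AVX2"
#elif defined(__SSE2__)
  #include <emmintrin.h>
  #define SKETCH_PATH "SSE2"
#else
  #define SKETCH_PATH "scalar"
#endif

// Sum of absolute differences over two 32-byte buffers.
// The arithmetic is identical on every path; only the instructions differ.
static inline uint32_t sad32(const uint8_t* a, const uint8_t* b) {
#if defined(__AVX2__) && !defined(NO_AVX_PLEASE)
    // One 256-bit load per input covers all 32 bytes.
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    __m256i s  = _mm256_sad_epu8(va, vb);            // four 64-bit partial sums
    __m128i t  = _mm_add_epi64(_mm256_castsi256_si128(s),
                               _mm256_extracti128_si256(s, 1));
    t = _mm_add_epi64(t, _mm_srli_si128(t, 8));      // fold high half into low
    return static_cast<uint32_t>(_mm_cvtsi128_si32(t));
#elif defined(__SSE2__)
    // Two 128-bit iterations do the work of the single AVX2 pass.
    __m128i s = _mm_setzero_si128();
    for (int i = 0; i < 32; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        s = _mm_add_epi64(s, _mm_sad_epu8(va, vb));  // two 64-bit partial sums
    }
    s = _mm_add_epi64(s, _mm_srli_si128(s, 8));      // fold high half into low
    return static_cast<uint32_t>(_mm_cvtsi128_si32(s));
#else
    // Portable fallback for targets without SSE2.
    uint32_t sum = 0;
    for (int i = 0; i < 32; ++i)
        sum += (a[i] > b[i]) ? (a[i] - b[i]) : (b[i] - a[i]);
    return sum;
#endif
}

// Reports which path the preprocessor selected at compile time.
static inline const char* sketch_selected_path() { return SKETCH_PATH; }
```

With GCC or Clang, building this sketch with -mavx2 selects the AVX2 branch unless NO_AVX_PLEASE is defined; the SSE branch needs twice as many vector operations for the same data, which gives a rough intuition for why a narrower path is slower.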

All functionality is contained in the file LATCH.h. This file is simply a sample test harness with example usage and performance testing.