Fastest CPU implementation of the LATCH 512-bit binary feature descriptor for computer vision (upright)

Fastest CPU implementation of an UPRIGHT (no rotation) LATCH 512-bit binary feature descriptor as described in the 2015 paper by Levi and Hassner:

"LATCH: Learned Arrangements of Three Patch Codes" http://arxiv.org/abs/1501.03719

See also the ECCV 2016 Descriptor Workshop paper, of which I am a coauthor:

"The CUDA LATCH Binary Descriptor" http://arxiv.org/abs/1609.03986

And the original LATCH project's website: http://www.openu.ac.il/home/hassner/projects/LATCH/

The CUDA version, which is extremely fast, is available on my GitHub.

Note once again that this is an UPRIGHT LATCH, a.k.a. ULATCH. A fast rotation- and scale-invariant version is also available on my GitHub.

My implementation uses multithreading, SSE2/3/4/4.1, AVX, AVX2, and many careful optimizations to implement the algorithm as described in the paper at great speed. It outperforms the reference implementation by 800% single-threaded or 3200% multithreaded (!) while exactly matching the reference implementation's output and capabilities in upright mode.
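
For readers new to LATCH, here is a minimal scalar sketch of how a single descriptor bit is formed, as I understand the paper: an anchor patch is compared against two companion patches, and the bit records which companion is closer in sum of squared differences. This is not the code in ULATCH.h. The names `Point`, `patchSsd`, and `latchBit` are hypothetical, the bit polarity is an assumption, and the learned triplet arrangement and all SSE/AVX work are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Top-left corner of a square patch (hypothetical helper type).
struct Point { int x, y; };

// Sum of squared differences between two patchSize x patchSize patches
// of a row-major 8-bit grayscale image. The caller must keep both
// patches inside the image.
long long patchSsd(const std::vector<std::uint8_t>& image, int width,
                   Point a, Point b, int patchSize) {
    assert(width > 0 && patchSize > 0);
    long long sum = 0;
    for (int dy = 0; dy < patchSize; ++dy) {
        for (int dx = 0; dx < patchSize; ++dx) {
            const int pa = image[(a.y + dy) * width + (a.x + dx)];
            const int pb = image[(b.y + dy) * width + (b.x + dx)];
            const int diff = pa - pb;
            sum += static_cast<long long>(diff) * diff;
        }
    }
    return sum;
}

// One descriptor bit from one patch triplet: true when the anchor is
// farther from companion1 than from companion2 (assumed polarity).
bool latchBit(const std::vector<std::uint8_t>& image, int width,
              Point anchor, Point companion1, Point companion2,
              int patchSize) {
    return patchSsd(image, width, anchor, companion1, patchSize) >
           patchSsd(image, width, anchor, companion2, patchSize);
}
```

A full 512-bit descriptor would repeat this comparison for 512 triplets around each keypoint. The inner loop over patch pixels is the kind of work that vector instructions can process several pixels at a time.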

All functionality is contained in the file ULATCH.h. main.cpp is simply a sample test harness with example usage and performance testing.
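
Binary descriptors like this one are typically compared by Hamming distance. The sketch below shows one way to do that for 512-bit descriptors. The storage layout (eight 64-bit words) and the names `Descriptor512` and `hammingDistance` are assumptions for illustration only and are not taken from ULATCH.h.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// A 512-bit descriptor held as eight 64-bit words (assumed layout).
using Descriptor512 = std::array<std::uint64_t, 8>;

// Number of bit positions in which the two descriptors differ.
inline int hammingDistance(const Descriptor512& a, const Descriptor512& b) {
    int distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 wherever the words disagree; count those bits.
        distance += static_cast<int>(std::bitset<64>(a[i] ^ b[i]).count());
    }
    return distance;
}
```

A smaller distance means a better match. With optimization enabled, compilers commonly turn `std::bitset::count` into a hardware popcount instruction where the target supports it.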