
Move features into tree for performance #3

Closed · vadixidav opened this issue Apr 20, 2019 · 4 comments

Comments

@vadixidav
Member

The current performance is unacceptable. After profiling the code, it is clear that there are issues with cache locality. Here is the closure taking up 71.85% of all the time:

// Hamming distance: fetch the stored feature via `lookup`, XOR it with
// the query feature, then count the set bits.
let lookup_distance = |index| (lookup(index) ^ feature).count_ones();

This performs a single trivial operation, so it looks innocuous, but the way hwt allows the user to store features outside of the tree is actually working against it: every distance computation has to chase an index into user-owned memory, which makes the tree much slower than linear search (except perhaps on huge datasets of more than a billion features).

The API should be changed to give back features rather than indices. The user can then use the returned features to look up their data afterwards, which should have a negligible performance impact, since only a few lookups are needed (k lookups for k-NN).

This has two negative impacts:

  • This could make the tree slightly larger.
  • This will make it harder for the user to store the data however they want.

Seeing as performance is the most important thing here, that takes precedence over these issues.
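
In terms of usage, the new API might look something like the sketch below. The `Tree` type, its `nearest` method, and the linear scan inside it are hypothetical stand-ins for illustration, not the crate's actual implementation:

use std::collections::HashMap;

// Hypothetical sketch: features live inline in the tree, and searches
// return the features themselves rather than indices.
struct Tree {
    features: Vec<u128>,
}

impl Tree {
    // Returns the k nearest features to `query` by Hamming distance.
    // (A linear scan stands in for the real tree search here.)
    fn nearest(&self, query: u128, k: usize) -> Vec<u128> {
        let mut sorted = self.features.clone();
        sorted.sort_by_key(|f| (f ^ query).count_ones());
        sorted.truncate(k);
        sorted
    }
}

fn main() {
    let tree = Tree { features: vec![0b1011, 0b0011, 0b1111] };
    // The user keeps their own feature -> payload map and performs only
    // k lookups after the search, which is negligible for k-NN.
    let mut payloads: HashMap<u128, &str> = HashMap::new();
    payloads.insert(0b1011, "image A");
    payloads.insert(0b0011, "image B");
    payloads.insert(0b1111, "image C");
    for feature in tree.nearest(0b1010, 2) {
        println!("{:04b} -> {:?}", feature, payloads.get(&feature));
    }
}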

@abonander

What about identical features that refer to distinct items?

@vadixidav
Member Author

@abonander My bad for getting back a bit late; I was visiting family for Easter. Currently, I plan to have the user disambiguate that themselves. It isn't clear what one should do when two features collide. For instance, you might just throw one of them out if you don't expect overlap (like descriptors over a set of completely unique features). In other cases the user might map one feature to multiple entries, as in photogrammetry applications where you might want geometric verification filtering via RANSAC to match a repeating pattern. There would be a problem if two features collided and you were computing the Lowe's ratio from the two features coming out of hwt, in which case you might miss the duplicate, but that could technically be checked when you get the feature back if it has duplicates (although at that point maybe you would just remove it from the tree).

Before, it was possible to do this in the tree by using the u32 identifiers, but it won't be possible anymore, since that approach really wasn't good for cache locality. Alternatively, I could add a helper wrapper over the tree that maps features back into a vector of values associated with them. For the time being, I am just trying to do everything I can to boost the performance, and I have learned that what I was doing did not appease the cache on my Intel processor. It seems obvious in retrospect, but I didn't realize that cache locality would destroy the performance like that. I am still working on further improvements to branch prediction and cache locality, after which I will make another release.
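
A sketch of what such a helper wrapper could look like; `FeatureMap` is a hypothetical name for illustration and not part of hwt:

use std::collections::HashMap;

// Hypothetical wrapper that maps each feature back to every value
// associated with it, so identical features can refer to distinct items.
struct FeatureMap<V> {
    entries: HashMap<u128, Vec<V>>,
}

impl<V> FeatureMap<V> {
    fn new() -> Self {
        FeatureMap { entries: HashMap::new() }
    }

    // Associates another value with `feature`; duplicates accumulate.
    fn insert(&mut self, feature: u128, value: V) {
        self.entries.entry(feature).or_insert_with(Vec::new).push(value);
    }

    // All values stored under exactly this feature.
    fn get(&self, feature: u128) -> &[V] {
        self.entries.get(&feature).map(Vec::as_slice).unwrap_or(&[])
    }
}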

@vadixidav
Member Author

Actually, the way it is being coded now will still allow the Lowe's ratio to be computed, because a feature can show up twice in the tree, so it can be returned twice.
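
For reference, a minimal sketch of that ratio test over the two nearest distances; the 0.8 threshold is the conventional value from Lowe's paper, assumed here rather than taken from hwt:

// Lowe's ratio test over the two nearest Hamming distances: accept the
// match only if the best distance is clearly smaller than the second best.
fn passes_lowe_ratio(best: u32, second_best: u32, ratio: f32) -> bool {
    (best as f32) < ratio * (second_best as f32)
}

fn main() {
    assert!(passes_lowe_ratio(10, 40, 0.8));  // unambiguous match: accept
    assert!(!passes_lowe_ratio(30, 32, 0.8)); // ambiguous match: reject
    // If the same feature appears twice in the tree, it comes back twice,
    // so the ratio is computed over the true two nearest matches.
}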

@vadixidav
Member Author

This has been finished (along with other optimizations). Benchmarks are on the README.md now.

Unfortunately, after adding more tests I found that the tree broke down at greater depths, and so the fixed tree only beats linear search once you reach over 1 million features. Keep in mind that the original paper was working with a linear search that took ~2.5 seconds, while I am finding that linear search on modern CPUs takes a tiny ~80 milliseconds for the same dataset size. My profiling doesn't leave much room for optimization, but SIMD (using packed_simd) is definitely an option moving forward, and it would speed up the relevant critical sections.
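
As a rough illustration of where SIMD could help, here is a sketch of a Hamming distance over a 256-bit feature using packed_simd; the [u64; 4] feature layout is an assumption, and packed_simd requires nightly Rust:

use packed_simd::u64x4;

// XOR the two features lane by lane, popcount each lane, then sum the
// four lane counts horizontally.
fn hamming_simd(a: [u64; 4], b: [u64; 4]) -> u32 {
    let va = u64x4::from_slice_unaligned(&a);
    let vb = u64x4::from_slice_unaligned(&b);
    (va ^ vb).count_ones().wrapping_sum() as u32
}

fn main() {
    let a = [0xFFFF_0000_FFFF_0000, 0, 0, 0];
    let b = [0x0000_0000_FFFF_0000, 0, 0, 0];
    assert_eq!(hamming_simd(a, b), 16);
}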

This issue has been addressed, but the performance is still lacking, so I intend to create a tracking issue for improving performance as a follow-up to this.
