This repository has been archived by the owner on Oct 30, 2021. It is now read-only.

Performance: gridcache limit on coalesce #60

Merged
merged 5 commits into master from top40-hits on Dec 14, 2016

Conversation

yhahn
Member

@yhahn yhahn commented Dec 14, 2016

Limits individual gridcache gets to

  • 500,000 entries, and
  • 100,000 entries on coalesceSingle without proximity/bbox

This optimization prevents extremely large gridcache shards (usually those for early degens, e.g. a) from having an outsized impact on performance. Since gridcache shards are presorted when stored, this optimization largely excludes low-scoring, low-relevance features from consideration during coalesce.
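To make the idea concrete, here's a rough TypeScript sketch, not the actual implementation; the names (`Cover`, `getGrids`, `MAX_GRIDS`, `MAX_GRIDS_COALESCE_SINGLE`) are illustrative assumptions rather than carmen's real API:

```typescript
// Illustrative sketch only: types and names are assumptions, not carmen's API.

interface Cover {
    id: number;
    relev: number; // relevance; shards are stored sorted by relev/score
    score: number; // feature score, used as a tiebreaker
    x: number;
    y: number;
}

const MAX_GRIDS = 500000;                 // cap for an individual gridcache get
const MAX_GRIDS_COALESCE_SINGLE = 100000; // tighter cap for coalesceSingle
                                          // without proximity/bbox

function getGrids(
    shard: Cover[],             // presorted by relev/score at store time
    coalesceSingle: boolean,
    hasProximityOrBbox: boolean
): Cover[] {
    const limit = coalesceSingle && !hasProximityOrBbox
        ? MAX_GRIDS_COALESCE_SINGLE
        : MAX_GRIDS;
    // Because the shard is presorted, truncating keeps the highest
    // relev/score covers and drops only the low-ranking tail that the
    // final step of coalesce would discard anyway.
    return shard.length > limit ? shard.slice(0, limit) : shard;
}
```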

These covers would have been thrown out by the final step of coalesce anyway, so this change keeps the handling of the spatialmatch process functionally equivalent.

Note: These thresholds were tested with real data but are definitely heuristics. A theoretical problem case: a gridcache shard with 11 textually matching features of 10k covers each (e.g. 11 Siberias) would push past the 100k limit, and one of the 11 features could end up excluded.
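To spell out that arithmetic (the numbers are the hypothetical case above, not real data):

```typescript
// Hypothetical "11 Siberias" case from above; not measured data.
const featureCount = 11;
const coversPerFeature = 10000;
const total = featureCount * coversPerFeature; // 110000 covers in the shard

const limit = 100000;
const dropped = total - limit; // 10000 covers cut by the cap

// If one feature's covers all sort to the tail of the shard, that entire
// feature falls outside the limit and is never seen by coalesce.
console.log({ total, limit, dropped, wholeFeatureAtRisk: dropped >= coversPerFeature });
```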

cc @mapbox/geocoding-gang

@yhahn yhahn merged commit 97eab81 into master Dec 14, 2016
@yhahn yhahn deleted the top40-hits branch December 14, 2016 04:09
@apendleton
Contributor

I'm not sure how this plays with degenless... we should discuss. If we're scanning multiple keys that all share a prefix, the grids will be sorted by relev/score within each key, but not overall, so getting the top-scoring 500k grids is a complicated operation.

I'm probably gonna ignore this for now on the rocksdb branch and come back to it. Doing it right will probably involve some sort of n-way merge, though.
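For illustration, a minimal n-way merge over per-key sorted grid lists might look like the sketch below; types and names are assumptions, not actual carmen or rocksdb-branch code:

```typescript
// Sketch: grids are sorted by relev/score within each key's list but not
// across keys, so the global top-N is taken by repeatedly picking the best
// head among all lists. Illustrative types only.

interface Grid {
    relev: number;
    score: number;
}

// Higher relev wins; score breaks ties.
function better(a: Grid, b: Grid): boolean {
    return a.relev !== b.relev ? a.relev > b.relev : a.score > b.score;
}

function mergeTopN(lists: Grid[][], n: number): Grid[] {
    const heads = lists.map(() => 0); // cursor into each sorted list
    const out: Grid[] = [];
    while (out.length < n) {
        let best = -1;
        for (let i = 0; i < lists.length; i++) {
            if (heads[i] >= lists[i].length) continue;
            if (best === -1 || better(lists[i][heads[i]], lists[best][heads[best]])) {
                best = i;
            }
        }
        if (best === -1) break; // all lists exhausted
        out.push(lists[best][heads[best]++]);
    }
    return out;
}
```

With many keys sharing a prefix, a priority queue over the list heads would avoid the linear scan on each pop.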
