This repository has been archived by the owner on Oct 30, 2021. It is now read-only.

Performance: gridcache limit on coalesce #60

Merged
merged 5 commits into master from top40-hits on Dec 14, 2016

Conversation

yhahn
Member

@yhahn yhahn commented Dec 14, 2016

Limits individual gridcache gets to

  • 500,000 entries, and
  • 100,000 entries on coalesceSingle without proximity/bbox

This optimization prevents extremely large gridcache shards (usually those for early degens, e.g. a) from having an outsized impact on performance. Since gridcache shards are presorted when stored, this optimization largely excludes low-scoring, low-relevance features from consideration during coalesce.
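To make the idea concrete, here's a rough TypeScript sketch, not the actual implementation; the names (`Cover`, `getGrids`, `MAX_GRIDS`, `MAX_GRIDS_COALESCE_SINGLE`) are illustrative assumptions rather than carmen's real API:

```typescript
// Illustrative sketch only: types and names are assumptions, not carmen's API.

interface Cover {
    id: number;
    relev: number; // relevance; shards are stored sorted by relev/score
    score: number; // feature score, used as a tiebreaker
    x: number;
    y: number;
}

const MAX_GRIDS = 500000;                 // cap for an individual gridcache get
const MAX_GRIDS_COALESCE_SINGLE = 100000; // tighter cap for coalesceSingle
                                          // without proximity/bbox

function getGrids(
    shard: Cover[],             // presorted by relev/score at store time
    coalesceSingle: boolean,
    hasProximityOrBbox: boolean
): Cover[] {
    const limit = coalesceSingle && !hasProximityOrBbox
        ? MAX_GRIDS_COALESCE_SINGLE
        : MAX_GRIDS;
    // Because the shard is presorted, truncating keeps the highest
    // relev/score covers and drops only the low-ranking tail that the
    // final step of coalesce would discard anyway.
    return shard.length > limit ? shard.slice(0, limit) : shard;
}
```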

These covers would have been thrown out by the final step of coalesce anyway, so this change keeps the handling of the spatialmatch process functionally equivalent.

Note: These thresholds were tested with real data but are definitely heuristics. A theoretical problem case: a gridcache shard with 11 textually matching features of 10k covers each (e.g. 11 Siberias) would push past the 100k limit, and one of the 11 features could end up excluded.
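To spell out that arithmetic (the numbers are the hypothetical case above, not real data):

```typescript
// Hypothetical "11 Siberias" case from above; not measured data.
const featureCount = 11;
const coversPerFeature = 10000;
const total = featureCount * coversPerFeature; // 110000 covers in the shard

const limit = 100000;
const dropped = total - limit; // 10000 covers cut by the cap

// If one feature's covers all sort to the tail of the shard, that entire
// feature falls outside the limit and is never seen by coalesce.
console.log({ total, limit, dropped, wholeFeatureAtRisk: dropped >= coversPerFeature });
```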

cc @mapbox/geocoding-gang

@yhahn yhahn merged commit 97eab81 into master Dec 14, 2016
@yhahn yhahn deleted the top40-hits branch December 14, 2016 04:09
@apendleton
Contributor

I'm not sure how this plays with degenless... we should discuss. If we're scanning multiple keys that all share a prefix, the grids will be sorted by relev/score within each key, but not overall, so getting the top-scoring 500k grids is a complicated operation.

I'm probably gonna ignore this for now on the rocksdb branch and come back to it. Doing it right will probably involve some sort of n-way merge, though.
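For illustration, a minimal n-way merge over per-key sorted grid lists might look like the sketch below; types and names are assumptions, not actual carmen or rocksdb-branch code:

```typescript
// Sketch: grids are sorted by relev/score within each key's list but not
// across keys, so the global top-N is taken by repeatedly picking the best
// head among all lists. Illustrative types only.

interface Grid {
    relev: number;
    score: number;
}

// Higher relev wins; score breaks ties.
function better(a: Grid, b: Grid): boolean {
    return a.relev !== b.relev ? a.relev > b.relev : a.score > b.score;
}

function mergeTopN(lists: Grid[][], n: number): Grid[] {
    const heads = lists.map(() => 0); // cursor into each sorted list
    const out: Grid[] = [];
    while (out.length < n) {
        let best = -1;
        for (let i = 0; i < lists.length; i++) {
            if (heads[i] >= lists[i].length) continue;
            if (best === -1 || better(lists[i][heads[i]], lists[best][heads[best]])) {
                best = i;
            }
        }
        if (best === -1) break; // all lists exhausted
        out.push(lists[best][heads[best]++]);
    }
    return out;
}
```

With many keys sharing a prefix, a priority queue over the list heads would avoid the linear scan on each pop.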
