Ingest geoip processor cache 'no results' from the database #104092
Merged
For my own future reference, here's some history of the geoip cache: it was originally added in #22231, rewritten for better performance in #33029, and tweaked for better performance in #80737 and #92372.
The maxmind `DatabaseReader`'s `tryCity`/`tryCountry`/`tryAsn` methods return an empty `Optional` if you ask about an ip address for which the database doesn't have a record, and we very quickly convert that into a null value (see `DatabaseReaderLazyLoader#getResponse`).

Externally, the cache has two-way logic: it either returns a response if there was one, or it returns null if there's nothing in the database for that ip address. That was true before this PR, and it remains true after this PR.
Internally, the cache used to have two-way logic: it would either A.) return a result from the cache if there was one, OR B.) hit the underlying database if it had never seen the ip address before or if the ip address wasn't in the database.
After this pull request, that changes to three-way logic: it will A.) return a result from the cache if there is one, OR B.) return null 'from the cache' if there isn't a record of the ip address in the database, OR C.) hit the underlying database if it hasn't seen the ip address before.
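
To make the three cases concrete, here's a minimal, single-threaded sketch of the idea. It is not the code from this PR: the class name `NoResultCacheSketch`, the `NO_RESULT` sentinel, the `lookup` method, the hit counter, and the string-valued fake "database" are all made up for illustration, and the real cache is more involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, not the actual Elasticsearch geoip cache.
public class NoResultCacheSketch {
    // Sentinel stored in the cache to mean "the database has no record for this ip".
    private static final Object NO_RESULT = new Object();

    private final Map<String, Object> cache = new HashMap<>();
    // Stand-in for the underlying database: an empty Optional means "no record".
    private final Function<String, Optional<String>> database;
    private int databaseHits = 0;

    public NoResultCacheSketch(Function<String, Optional<String>> database) {
        this.database = database;
    }

    // Returns the response for the ip, or null if the database has no record of it.
    public String lookup(String ip) {
        Object cached = cache.get(ip);
        if (cached == null) {
            // C.) never seen this ip before: hit the database and remember the outcome,
            // including the 'no result' outcome
            databaseHits++;
            Optional<String> response = database.apply(ip);
            cached = response.isPresent() ? response.get() : NO_RESULT;
            cache.put(ip, cached);
        }
        // A.) a real response, or B.) a cached 'no result' turned back into null
        return cached == NO_RESULT ? null : (String) cached;
    }

    public int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        // Made-up record; only this one address is "in the database".
        Map<String, String> records = Map.of("8.8.8.8", "example-response");
        NoResultCacheSketch geo = new NoResultCacheSketch(ip -> Optional.ofNullable(records.get(ip)));

        System.out.println(geo.lookup("10.0.0.1")); // private address, no record
        System.out.println(geo.lookup("10.0.0.1")); // answered from the cache this time
        System.out.println(geo.lookup("8.8.8.8"));
        System.out.println("database hits: " + geo.databaseHits());
    }
}
```

The caller still sees the same external behavior (a response or null); the only difference is that a repeated lookup of an address with no record no longer reaches the database.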
Generally speaking, I don't expect the databases to have records for "private internets" (https://www.ietf.org/rfc/rfc1918.txt), so if you're feeding a lot of those sorts of ip addresses into the `geoip` processor, then this change will result in significantly better performance for you (since we'll cache 'no-result' rather than hitting the underlying database each time).

In some ad-hoc testing on my machine, a cache hit is about an order of magnitude cheaper than hitting the actual database, so the more we can exercise the cache, the faster we'll be.