Terms filter lookup caching should cache values, not filters. #9027
The terms filter lookup mechanism today caches filters. Because of this, the cache values depend on two things: the values that can be found in the lookup index AND the mapping of the local index, since changing the mapping can change the way that the filter is parsed. We should make the cache depend solely on the content of the lookup index.

For instance the issue I was seeing was due to the following scenario:
- create index1 with _id indexed
- run terms filter with lookup, the parsed filter looks like `_id: 1 OR _id: 2`
- remove index1
- create index1 with _id not indexed
- run terms filter without lookup, the parsed filter is `_uid: type#1 OR _uid: type#2` (the _id field mapper knows how to use the _uid field when _id is not indexed)
- run terms filter with lookup, the filter is fetched from the cache: `_id: 1 OR _id: 2` but does not match anything since `_id` is not indexed.
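The scenario above can be reproduced with a small toy model. This is only a sketch in plain Python, not Elasticsearch code: `parse_terms_filter`, `FilterCache`, `ValueCache` and `run_scenario` are hypothetical names invented for illustration, and a "filter" is modelled as a tuple of `(field, value)` clauses. It contrasts caching the parsed filter with caching only the looked-up values.

```python
def parse_terms_filter(ids, id_indexed, doc_type="type"):
    """Toy stand-in for filter parsing: the mapping decides which field is used."""
    if id_indexed:
        return tuple(("_id", str(i)) for i in ids)
    # When _id is not indexed, fall back to the _uid field (type#id).
    return tuple(("_uid", f"{doc_type}#{i}") for i in ids)


def matches(doc_fields, clauses):
    """A document matches if any clause equals one of its indexed (field, value) pairs."""
    return any(clause in doc_fields for clause in clauses)


class FilterCache:
    """Caches the parsed filter, so the entry depends on the mapping at first use."""

    def __init__(self):
        self._cache = {}

    def get(self, key, load_ids, id_indexed):
        if key not in self._cache:
            self._cache[key] = parse_terms_filter(load_ids(), id_indexed)
        return self._cache[key]


class ValueCache:
    """Caches only the looked-up values; parsing uses the current mapping every time."""

    def __init__(self):
        self._cache = {}

    def get(self, key, load_ids, id_indexed):
        if key not in self._cache:
            self._cache[key] = tuple(load_ids())
        return parse_terms_filter(self._cache[key], id_indexed)


def run_scenario(cache):
    load_ids = lambda: [1, 2]  # content of the lookup document
    # index1 exists with _id indexed; the first lookup populates the cache.
    cache.get("lookup", load_ids, id_indexed=True)
    # index1 is removed and recreated with _id not indexed:
    # documents now only expose the _uid field.
    doc = {("_uid", "type#1")}
    return matches(doc, cache.get("lookup", load_ids, id_indexed=False))


print(run_scenario(FilterCache()))  # stale `_id` clauses are served from the cache
print(run_scenario(ValueCache()))   # values are re-parsed against the new mapping
```

With the filter cache the stale `_id` clauses no longer match the recreated index, while the value cache re-parses the same values against the current mapping and still matches.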
LGTM
@jpountz Probably too late to this conversation, but wouldn't it be possible to make this more flexible by letting users choose what to cache? In my use case I rely on lookup quite heavily, and the lists of terms are quite big. Even though this is slow, it's only slow once. This change will both increase memory usage on my side (keeping the list of terms) and slow things down in general. I use a _cache_key that is hour-based, so I only pay the full price of lookups once every hour, which is quite OK. I can also see the relationship of this and: but I still wonder whether it is possible to keep the option of caching the resulting filter of a terms lookup filter. Thanks :)
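For readers unfamiliar with the hour-based `_cache_key` approach described in this comment, here is a rough sketch of how such a request body might be built. It assumes the Elasticsearch 1.x terms lookup syntax (`index`, `type`, `id`, `path` plus `_cache_key`); the helper names and the `users`/`followers` values are hypothetical and not taken from the commenter's setup.

```python
from datetime import datetime, timezone


def hourly_cache_key(prefix, now=None):
    """Build a cache key that changes once per hour (UTC)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now:%Y%m%d%H}"


def terms_lookup_filter(field, index, doc_type, doc_id, path, cache_key):
    """Assemble a terms filter that fetches its terms from a lookup document."""
    return {
        "terms": {
            field: {
                "index": index,    # lookup index (hypothetical name below)
                "type": doc_type,  # type of the lookup document
                "id": doc_id,      # id of the lookup document
                "path": path,      # field holding the list of terms
            },
            "_cache_key": cache_key,
        }
    }


key = hourly_cache_key("followers_of_2")
body = terms_lookup_filter("user", "users", "user", "2", "followers", key)
print(body)
```

Because the key only changes when the hour rolls over, requests within the same hour share one cache entry and the lookup cost is paid once per hour.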