Terms filter lookup caching should cache values, not filters. #9027

Closed
wants to merge 1 commit into from

Conversation

@jpountz
Contributor

commented Dec 22, 2014

The terms filter lookup mechanism today caches filters. Because of this, the
cache values depend on two things: the values that can be found in the lookup
index AND the mapping of the local index, since changing the mapping can change
the way that the filter is parsed. We should make the cache depend solely on
the content of the lookup index.

For instance the issue I was seeing was due to the following scenario:

  • create index1 with _id indexed
  • run terms filter with caching, the parsed filter looks like _id: 1 OR _id: 2
  • remove index1
  • create index1 with _id not indexed
  • run terms filter without caching, the parsed filter is _uid: type#1 OR _uid: type#2 (the _id field mapper knows how to use the _uid field when _id is not indexed)
  • run terms filter with caching, the filter is fetched from the cache: _id: 1 OR _id: 2 but does not match anything since _id is not indexed.
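To make the scenario above concrete, here is a minimal sketch in Python rather than the actual Elasticsearch code. Everything in it is hypothetical: `parse_terms_filter`, `FilterCache`, and `ValueCache` are illustrative names, a "mapping" is reduced to a single flag saying whether `_id` is indexed, and a "filter" is a plain list of (field, term) clauses. It only aims to show why caching the parsed filter ties the cache to the mapping, while caching the looked-up values does not.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a filter is a list of (field, term) clauses OR-ed together.
Filter = List[Tuple[str, str]]


def parse_terms_filter(values: List[str], id_indexed: bool,
                       doc_type: str = "type") -> Filter:
    """Build the filter for the *current* mapping.

    Mimics the behaviour described above: when _id is not indexed,
    the clauses are expressed against _uid instead.
    """
    if id_indexed:
        return [("_id", v) for v in values]
    return [("_uid", f"{doc_type}#{v}") for v in values]


class FilterCache:
    """Caches the parsed filter, so entries silently depend on the mapping."""

    def __init__(self) -> None:
        self._cache: Dict[str, Filter] = {}

    def get(self, key: str, fetch_values: Callable[[], List[str]],
            id_indexed: bool) -> Filter:
        if key not in self._cache:
            self._cache[key] = parse_terms_filter(fetch_values(), id_indexed)
        return self._cache[key]  # may be stale if the mapping changed


class ValueCache:
    """Caches only the looked-up values and parses them on every request."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[str]] = {}

    def get(self, key: str, fetch_values: Callable[[], List[str]],
            id_indexed: bool) -> Filter:
        if key not in self._cache:
            self._cache[key] = list(fetch_values())
        return parse_terms_filter(self._cache[key], id_indexed)


if __name__ == "__main__":
    fetch = lambda: ["1", "2"]
    old, new = FilterCache(), ValueCache()
    # First request: index1 has _id indexed.
    old.get("lookup", fetch, id_indexed=True)
    new.get("lookup", fetch, id_indexed=True)
    # index1 is recreated with _id not indexed.
    print(old.get("lookup", fetch, id_indexed=False))
    print(new.get("lookup", fetch, id_indexed=False))
```

In this sketch the first cache keeps returning `_id` clauses after the mapping change, whereas the second one re-parses the cached values and produces `_uid` clauses.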
Core: Terms filter lookup caching should cache values, not filters.
The terms filter lookup mechanism today caches filters. Because of this, the
cache values depend on two things: the values that can be found in the lookup
index AND the mapping of the local index, since changing the mapping can change
the way that the filter is parsed. We should make the cache depend solely on
the content of the lookup index.

For instance the issue I was seeing was due to the following scenario:
 - create index1 with _id indexed
 - run terms filter with lookup, the parsed filter looks like `_id: 1 OR _id: 2`
 - remove index1
 - create index1 with _id not indexed
 - run terms filter without lookup, the parsed filter is `_uid: type#1 OR _uid: type#2` (the _id field mapper knows how to use the _uid field when _id is not indexed)
 - run terms filter with lookup, the filter is fetched from the cache: `_id: 1 OR _id: 2` but does not match anything since `_id` is not indexed.
@martijnvg

Member

commented Dec 22, 2014

LGTM

@jpountz jpountz closed this in 67eba23 Dec 24, 2014

@jpountz jpountz deleted the jpountz:fix/terms_lookup_caching branch Dec 24, 2014

jpountz added a commit that referenced this pull request Dec 24, 2014
Core: Terms filter lookup caching should cache values, not filters.
Close #9027
@lmenezes

Contributor

commented Jan 22, 2015

@jpountz I'm probably too late to this conversation... but wouldn't it be possible to make this more flexible by letting users choose what to cache?

My particular use case is that I use lookup quite heavily, and the lists of terms can be quite big. Even though this is slow, it's only slow once...

This change will both increase memory usage on my side (keeping the list of terms) and slow things down in general. I use a _cache_key which is hour based... So I only pay the full price of lookups once every hour, which is quite OK.
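For illustration, an hour-based `_cache_key` could be built roughly like this. This is a sketch under assumptions, not my actual code: the helper names, the index/type/path values, and the key format are all made up, and the request body is only meant to resemble a terms lookup filter with a `_cache_key`.

```python
from datetime import datetime, timezone


def hourly_cache_key(prefix: str, now: datetime) -> str:
    """Return a key that changes once per hour (hypothetical format)."""
    return f"{prefix}_{now.strftime('%Y%m%d%H')}"


def terms_lookup_filter(field: str, index: str, doc_type: str,
                        doc_id: str, path: str, cache_key: str) -> dict:
    """Build a terms lookup filter body as a plain dict.

    The index, type, id, and path values are placeholders chosen by the caller.
    """
    return {
        "terms": {
            field: {
                "index": index,
                "type": doc_type,
                "id": doc_id,
                "path": path,
            },
            "_cache_key": cache_key,
        }
    }


if __name__ == "__main__":
    key = hourly_cache_key("user_2_followers", datetime.now(timezone.utc))
    print(terms_lookup_filter("user", "users", "user", "2", "followers", key))
```

All requests within the same hour share one key, so the expensive lookup is only paid when the hour rolls over.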

I can also see the relationship between this and:
#8573
#9176
#9056

but I still wonder whether it would be possible to keep the option of caching the resulting filter of a terms lookup filter.

thanks :)

@clintongormley clintongormley changed the title Core: Terms filter lookup caching should cache values, not filters. Terms filter lookup caching should cache values, not filters. Jun 7, 2015

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
Core: Terms filter lookup caching should cache values, not filters.
Close elastic#9027