Use doc values skipper for _tsid in synthetic _id postings #138568
Relates ES-13604
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Hi @tlrx, I've created a changelog YAML for you.
server/src/main/java/org/elasticsearch/index/codec/tsdb/TSDBSyntheticIdFieldsProducer.java
// _id terms over tombstones also work as if a regular _id field was present.
document.add(SortedDocValuesField.indexedField(TimeSeriesIdFieldMapper.NAME, extractTimeSeriesIdFromSyntheticId(uid)));
document.add(SortedNumericDocValuesField.indexedField("@timestamp", extractTimestampFromSyntheticId(uid)));
document.add(new SortedDocValuesField(TimeSeriesRoutingHashFieldMapper.NAME, extractRoutingHashBytesFromSyntheticId(uid)));
Does this need to be gated by an IndexVersion?
I don't think it needs to be, but it's better to be safe indeed. So I reverted the change that uses the USE_DOC_VALUES_SKIPPER index setting.
See 9ff33a7
}
skipper.advance(maxDocID + 1);
}
return skipper.minDocID(0);
Just for my understanding: if we don't find the tsIdOrd at level 0, will this return NO_MORE_DOCS? I think I might be missing something here.
If the ordinal is not found in the first level 0, then it looks at the next levels to skip as many docs as possible, and repeats until it finds an interval that includes the ordinal or exhausts the iterator, in which case the Javadoc indicates that minDocID returns NO_MORE_DOCS.
My understanding is that a DocValuesSkipper is a kind of skip list on top of doc values blocks of data.
If that helps, here is a representation of such skipper levels:
minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31], docCount: 32, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
minValue: 1, maxValue: 1, [minDocID: 32, maxDocID: 178], docCount: 147, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
minValue: 2, maxValue: 2, [minDocID: 179, maxDocID: 269], docCount: 91, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
...
minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765], docCount: 47, level: 0/3
minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440], docCount: 722, level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733, level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0, level: 3/3
...
For example, when looking for tsIdOrd == 9, the advance(min, max) method executes:
- the first level 0 is minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31], which has max value 0 below 9, so we can skip to maxDocID + 1 = 32
  - while there, we check if we can skip even more docs, so we look up the next level 1, which is minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718] and also has a max value 7 < 9, so we can in fact skip to maxDocID + 1 = 718 + 1 = 719
  - the next level 2 has a max value of 64, so we cannot skip more
  - we advance the iterator to 719
- our new level 0 is now minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765]; with a max value of 8, we can skip all docs until 765 + 1
  - while there, we check if we can skip more in the next level 1, which is minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440] and has a max value of 15, so tsIdOrd == 9 is between doc IDs [766, 1440]
- the while loop ends with minDocID(0) == 766
I hope it helps. It took me some time to understand all of this 🫠
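To make the walkthrough concrete, here is a minimal Java sketch of a single skip step, replaying the two positions from the dump above for tsIdOrd == 9. It is not Lucene's DocValuesSkipper code: the `SkipStep` class, the `Level` record (holding exactly what each dump line prints) and `skipTarget` are hypothetical, and treating the empty level (`maxDocID: -1`) as absent is an assumption of the sketch.

```java
import java.util.List;

class SkipStep {
    /** One skipper level, holding exactly what each line of the dump prints. */
    record Level(long minValue, long maxValue, int minDocID, int maxDocID) {
        boolean intersects(long min, long max) {
            return minValue <= max && maxValue >= min;
        }

        // Assumption of this sketch: a level with no docs (maxDocID -1) is treated as absent.
        boolean isEmpty() {
            return maxDocID < 0;
        }
    }

    /**
     * Given the intervals of every level at the current position (level 0 first), returns
     * the doc ID to advance to, or -1 if the level-0 interval already intersects [min, max].
     */
    static int skipTarget(List<Level> levels, long min, long max) {
        if (levels.get(0).intersects(min, max)) {
            return -1; // the current level-0 interval may contain a match: stop skipping
        }
        int maxDocID = levels.get(0).maxDocID();
        int next = 1;
        // Widen the skip while the next level still does not intersect [min, max].
        while (next < levels.size()
                && !levels.get(next).isEmpty()
                && !levels.get(next).intersects(min, max)) {
            maxDocID = levels.get(next).maxDocID();
            next++;
        }
        return maxDocID + 1;
    }

    public static void main(String[] args) {
        long tsIdOrd = 9;
        // Position at doc 0: the first block of the dump.
        List<Level> atDoc0 = List.of(
                new Level(0, 0, 0, 31),
                new Level(0, 7, 0, 718),
                new Level(0, 64, 0, 4732),
                new Level(0, 0, -1, -1));
        // Position at doc 719: the block after the first "...".
        List<Level> atDoc719 = List.of(
                new Level(8, 8, 719, 765),
                new Level(8, 15, 719, 1440),
                new Level(0, 64, 0, 4732),
                new Level(0, 0, -1, -1));
        System.out.println(skipTarget(atDoc0, tsIdOrd, tsIdOrd));   // 719
        System.out.println(skipTarget(atDoc719, tsIdOrd, tsIdOrd)); // 766
    }
}
```

A driver around this step would call `advance(target)` with the returned doc ID and repeat until the level-0 interval intersects the range or the docs are exhausted, which matches the 719 and 766 jumps in the walkthrough.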
Thanks for the detailed explanation, this makes sense 👍
assert skipper != null;

if (skipper.minValue() >= tsIdOrd) {
skipper.advance(0);
If the skipper's minValue is greater than the requested tsid, then the tsid isn't present in the segment, so shouldn't this also return NO_MORE_DOCS? Or maybe trigger an assertion?
Right, I pushed b3fc3b9
fcofdez left a comment
LGTM
Thanks Francisco & Alan!
Instead of scanning all documents to find the first document that has a _tsid less than, or equal to, a given ordinal, we can use a doc values skipper to skip as many documents as possible, and only then scan the remaining docs.

When seeking a synthetic _id, we look up the _tsid ordinal, then use the DV skipper to find a starting doc ID, then scan each doc to find the first doc ID matching the exact _tsid ordinal. Finally, we scan the remaining docs to find the one matching the timestamp.

I wonder if it also makes sense to use the DV skipper for the timestamp?
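The seek described above can be sketched end to end as a toy model. This is not the Elasticsearch implementation: `SyntheticIdSeek`, `skipTo` and `seek` are hypothetical names, and the sketch assumes docs sorted by _tsid ordinal plus a skipper whose level-0 intervals hold a fixed number of docs, with each higher level covering a fixed number of lower-level intervals.

```java
/**
 * Toy model of the synthetic _id seek (hypothetical names, not the Elasticsearch code).
 * Assumes docs are sorted by _tsid ordinal and a skipper whose level-0 intervals hold
 * intervalSize docs, each level-k interval covering levelShift level-(k-1) intervals.
 */
class SyntheticIdSeek {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final long[] tsidOrds;   // _tsid ordinal of each doc, ascending
    private final long[] timestamps; // @timestamp of each doc
    private final int intervalSize;
    private final int levelShift;
    private final int numLevels;

    SyntheticIdSeek(long[] tsidOrds, long[] timestamps, int intervalSize, int levelShift, int numLevels) {
        this.tsidOrds = tsidOrds;
        this.timestamps = timestamps;
        this.intervalSize = intervalSize;
        this.levelShift = levelShift;
        this.numLevels = numLevels;
    }

    private int span(int level) {
        int span = intervalSize;
        for (int i = 0; i < level; i++) {
            span *= levelShift;
        }
        return span;
    }

    private int minDoc(int doc, int level) {
        return (doc / span(level)) * span(level);
    }

    private int maxDoc(int doc, int level) {
        return Math.min(minDoc(doc, level) + span(level), tsidOrds.length) - 1;
    }

    // Ordinals are sorted, so an interval's [min, max] is given by its first and last doc.
    private boolean intersects(int doc, int level, long ord) {
        return tsidOrds[minDoc(doc, level)] <= ord && tsidOrds[maxDoc(doc, level)] >= ord;
    }

    /** Returns the first doc of the first level-0 interval that may contain ord. */
    int skipTo(long ord) {
        int doc = 0;
        while (doc < tsidOrds.length && !intersects(doc, 0, ord)) {
            int maxDocID = maxDoc(doc, 0);
            // Use higher levels to skip as many docs as possible.
            for (int level = 1; level < numLevels && !intersects(doc, level, ord); level++) {
                maxDocID = maxDoc(doc, level);
            }
            doc = maxDocID + 1;
        }
        return doc < tsidOrds.length ? doc : NO_MORE_DOCS;
    }

    /** Returns the doc matching (tsIdOrd, timestamp), or NO_MORE_DOCS if there is none. */
    int seek(long tsIdOrd, long timestamp) {
        if (tsidOrds.length == 0 || tsidOrds[0] > tsIdOrd) {
            return NO_MORE_DOCS; // ordinal below the segment minimum: not present
        }
        int doc = skipTo(tsIdOrd);
        // Scan to the first doc with the exact _tsid ordinal.
        while (doc < tsidOrds.length && tsidOrds[doc] < tsIdOrd) {
            doc++;
        }
        // Scan that _tsid's docs for the matching timestamp.
        for (; doc < tsidOrds.length && tsidOrds[doc] == tsIdOrd; doc++) {
            if (timestamps[doc] == timestamp) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        long[] ords = {0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2};
        long[] ts = {50, 40, 30, 20, 10, 35, 25, 15, 90, 80, 70, 60};
        SyntheticIdSeek seeker = new SyntheticIdSeek(ords, ts, 2, 2, 3);
        System.out.println(seeker.skipTo(1));                   // 4
        System.out.println(seeker.seek(1, 25));                 // 6
        System.out.println(seeker.seek(2, 60));                 // 11
        System.out.println(seeker.seek(1, 99) == NO_MORE_DOCS); // true
    }
}
```

In this sample the skipper jumps from doc 0 straight to doc 4, so only doc 4 is scanned before reaching the first doc with ordinal 1 (doc 5). The timestamp step stays a linear scan, as in the description.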