Skip to content

Conversation

@tlrx
Copy link
Member

@tlrx tlrx commented Nov 25, 2025

Instead of scanning all documents to find the first document that has a _tsid less than, or equal to, a given ordinal we can use a doc values skipper to skip as much as possible documents, and only then scan. the remaining docs.

When seeking a synthetic _id, we look up the _tsid ordinal, then use DV skipper to find a starting doc ID, then scan each doc to find the first doc ID matching the exact _tsid ordinal. Then we finally scan remaining docs to find the one matching the timestamp.

I wonder if that also makes sense to use DV skipper for the timestamp too?

Relates ES-13604

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @tlrx, I've created a changelog YAML for you.

@kkrik-es kkrik-es requested a review from romseygeek November 25, 2025 11:47
// _id terms over tombstones also work as if a regular _id field was present.
document.add(SortedDocValuesField.indexedField(TimeSeriesIdFieldMapper.NAME, extractTimeSeriesIdFromSyntheticId(uid)));
document.add(SortedNumericDocValuesField.indexedField("@timestamp", extractTimestampFromSyntheticId(uid)));
document.add(new SortedDocValuesField(TimeSeriesRoutingHashFieldMapper.NAME, extractRoutingHashBytesFromSyntheticId(uid)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be gated by an IndexVersion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it needs but it's better to be safe indeed. So I reverted the change which uses the USE_DOC_VALUES_SKIPPER index setting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 9ff33a7

}
skipper.advance(maxDocID + 1);
}
return skipper.minDocID(0);
Copy link
Contributor

@fcofdez fcofdez Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, if we don't find the tsIdOrd at level 0, this will return NO_MORE_DOCS? I think that I might be missing something here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ordinal is not found in the first level 0, then it skips to the next levels until it finds a level that includes the ordinal or exhaust the iterator, in which case the Javadoc indicates that minDocs returns NO_MORE_DOCS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that a DocValuesSkipper is kind of a skip list on top of docs values blocks of data.

If that helps, here is a representation of such skipper levels:

minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31], docCount: 32, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
minValue: 1, maxValue: 1, [minDocID: 32, maxDocID: 178], docCount: 147, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
minValue: 2, maxValue: 2, [minDocID: 179, maxDocID: 269], docCount: 91, level: 0/3
minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718], docCount: 719,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
...
minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765], docCount: 47, level: 0/3
minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440], docCount: 722,  level: 1/3
minValue: 0, maxValue: 64, [minDocID: 0, maxDocID: 4732], docCount: 4733,  level: 2/3
minValue: 0, maxValue: 0, [minDocID: -1, maxDocID: -1], docCount: 0,  level: 3/3
...

For example, when looking for tsIdOrd == 9 the advance(min, max) method executes:

  • the first level minValue: 0, maxValue: 0, [minDocID: 0, maxDocID: 31] has max value 0 below 9 so we can skip to maxDocID + 1 = 32
  • while there we can check if we can skip even more docs so we look up the next level 1 which is minValue: 0, maxValue: 7, [minDocID: 0, maxDocID: 718] which also has a max value 7 < 9 so we can in fact skip to maxDocID + 1 = 718 + 1 = 719
  • next level 2 has a max value of 64 so we cannot skip more
  • we advance the iterator to 719
  • our new level 0 is now minValue: 8, maxValue: 8, [minDocID: 719, maxDocID: 765], with max value of 8 we can skip all docs until 765 +1
  • while there we check if we can skip more in the next level 1, which is minValue: 8, maxValue: 15, [minDocID: 719, maxDocID: 1440] and has max value of 15, so tsIdOrd == 9 is between docs ids [766, 1440]
  • the while loop ends with minDocs(0) == 766

I hope it helps. It took me some time to understand all of this 🫠

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation, this makes sense 👍

@tlrx tlrx requested review from fcofdez and romseygeek November 26, 2025 14:40
assert skipper != null;

if (skipper.minValue() >= tsIdOrd) {
skipper.advance(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the skipper minValue is greater than your requested tsid then that means the tsid isn't present in the segment, so this should probably also return NO_MORE_DOCS? Or maybe trigger an assertion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I pushed b3fc3b9

Copy link
Contributor

@fcofdez fcofdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tlrx tlrx merged commit 699eb2b into elastic:main Nov 28, 2025
34 checks passed
@tlrx tlrx deleted the 2025/11/25/ES-13604 branch November 28, 2025 08:47
@tlrx
Copy link
Member Author

tlrx commented Nov 28, 2025

Thanks Francisco & Alan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants