-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Improve bulk loading of binary doc values #137995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve bulk loading of binary doc values #137995
Conversation
|
Hi @parkertimmins, I've created a changelog YAML for you. |
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
martijnvg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left minor comments and questions, LGTM otherwise.
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
Outdated
Show resolved
Hide resolved
...ql/compute/src/main/java/org/elasticsearch/compute/lucene/read/SingletonBytesRefBuilder.java
Outdated
Show resolved
Hide resolved
|
Hi @parkertimmins, I've updated the changelog YAML for you. |
...er/src/test/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesFormatTests.java
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martijnvg Could I get another review over these test changes. The structure is maybe a bit weird. The numeric dv test right above are sparse queries over sparse documents, but only where the set of docs in the query matches the set of docs in the underlying doc values.
This differs from the test I added below for binary doc values. It is only for sparse queries over dense docIds, since that is the only kind of sparcity which currently supports bulk loading for binary doc values.
martijnvg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
* main: (135 commits)
Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=1} elastic#138130
Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=2} elastic#138129
Mute org.elasticsearch.search.basic.SearchWithRandomDisconnectsIT testSearchWithRandomDisconnects elastic#138128
[DiskBBQ] avoid EsAcceptDocs bug by calling cost before building iterator (elastic#138127)
Log NOT_PREFERRED shard movements (elastic#138069)
Improve bulk loading of binary doc values (elastic#137995)
Add internal action for getting inference fields and inference results for those fields (elastic#137680)
Address issue with DateFieldMapper#isFieldWithinQuery(...) (elastic#138032)
WriteLoadConstraintDecider: Have separate rate limiting for canRemain and canAllocate decisions (elastic#138067)
Adding NodeContext to TransportBroadcastByNodeAction (elastic#138057)
Mute org.elasticsearch.simdvec.ESVectorUtilTests testSoarDistanceBulk elastic#138117
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#137909
Backport batched_response_might_include_reduction_failure version to 8.19 (elastic#138046)
Add summary metrics for tdigest fields (elastic#137982)
Add gp-llm-v2 model ID and inference endpoint (elastic#138045)
Various tracing fixes (elastic#137908)
[ML] Fixing KDE evaluate() to return correct ValueAndMagnitude object (elastic#128602)
Mute org.elasticsearch.xpack.shutdown.NodeShutdownIT testStalledShardMigrationProperlyDetected elastic#115697
[ML] Fix Flaky Audit Message Assertion in testWithDatastream for RegressionIT and ClassificationIT (elastic#138065)
[ML] Fix Non-Deterministic Training Set Selection in RegressionIT testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (elastic#138063)
...
# Conflicts:
# rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.vectors/200_dense_vector_docvalue_fields.yml
Speed up bulk loading for bytes ref doc values. If doc values has dense docIds, and the queried docs are dense, copy the bytes for the adjacent values and use directly in the block loader.