Conversation

@parkertimmins
Contributor

@parkertimmins parkertimmins commented Nov 13, 2025

Speed up bulk loading for bytes ref doc values. If the doc values have dense docIds and the queried docs are also dense, copy the bytes for the adjacent values in one go and use them directly in the block loader.
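To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from this PR: the class `DenseBytesBulkCopy`, the record `BytesBlock`, and the flat `data` plus `offsets` layout are all hypothetical stand-ins, assumed only for illustration. When the requested docs form one contiguous range, the bytes for all values are copied with a single range copy and the offsets are rebased; otherwise the sketch falls back to copying value by value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; names and layout are assumptions, not Elasticsearch APIs.
final class DenseBytesBulkCopy {

    /** A block of values stored as one shared byte array plus per-value offsets. */
    record BytesBlock(byte[] bytes, int[] offsets) {
        int count() {
            return offsets.length - 1;
        }

        String value(int i) {
            return new String(bytes, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
        }
    }

    /** True if the sorted, unique doc ids form one contiguous range. */
    static boolean isDense(int[] docs) {
        return docs.length > 0 && docs[docs.length - 1] - docs[0] == docs.length - 1;
    }

    /**
     * Loads values for the given docs, assuming every doc has exactly one value
     * and value i occupies data[offsets[i] .. offsets[i + 1]).
     */
    static BytesBlock load(byte[] data, int[] offsets, int[] docs) {
        int[] outOffsets = new int[docs.length + 1];
        if (isDense(docs)) {
            // Fast path: adjacent values are adjacent in the data, so copy them as one range.
            int first = docs[0];
            int start = offsets[first];
            int end = offsets[first + docs.length];
            for (int i = 0; i <= docs.length; i++) {
                outOffsets[i] = offsets[first + i] - start;
            }
            return new BytesBlock(Arrays.copyOfRange(data, start, end), outOffsets);
        }
        // Fallback: compute the output offsets, then copy each value separately.
        int total = 0;
        for (int i = 0; i < docs.length; i++) {
            total += offsets[docs[i] + 1] - offsets[docs[i]];
            outOffsets[i + 1] = total;
        }
        byte[] out = new byte[total];
        for (int i = 0; i < docs.length; i++) {
            System.arraycopy(data, offsets[docs[i]], out, outOffsets[i], outOffsets[i + 1] - outOffsets[i]);
        }
        return new BytesBlock(out, outOffsets);
    }
}
```

With values `"aa"`, `"bbb"`, `"c"`, `"dddd"` stored back to back, calling `load` for docs `{1, 2}` takes the single-copy path, while docs `{0, 3}` take the per-value fallback.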

@parkertimmins parkertimmins added the >feature and :StorageEngine/Mapping labels Nov 13, 2025
@elasticsearchmachine
Collaborator

Hi @parkertimmins, I've created a changelog YAML for you.

@parkertimmins parkertimmins marked this pull request as ready for review November 13, 2025 23:47
@parkertimmins parkertimmins requested a review from a team as a code owner November 13, 2025 23:47
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Member

@martijnvg martijnvg left a comment

Left minor comments and questions, LGTM otherwise.

@elasticsearchmachine
Collaborator

Hi @parkertimmins, I've updated the changelog YAML for you.

Contributor Author

@martijnvg Could I get another review of these test changes? The structure is maybe a bit odd. The numeric doc values tests right above run sparse queries over sparse documents, but only where the set of docs in the query matches the set of docs in the underlying doc values.

This differs from the test I added below for binary doc values. That test covers only sparse queries over dense docIds, since that is the only kind of sparsity that currently supports bulk loading for binary doc values.

Member

@martijnvg martijnvg left a comment

LGTM2

@parkertimmins parkertimmins merged commit e0fcab7 into elastic:main Nov 14, 2025
34 checks passed
@parkertimmins parkertimmins deleted the parker/bulk-load-binary-dv-uncompressed-singleton branch November 14, 2025 22:34
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 16, 2025
* main: (135 commits)
  Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=1} elastic#138130
  Mute org.elasticsearch.upgrades.IndexSortUpgradeIT testIndexSortForNumericTypes {upgradedNodes=2} elastic#138129
  Mute org.elasticsearch.search.basic.SearchWithRandomDisconnectsIT testSearchWithRandomDisconnects elastic#138128
  [DiskBBQ] avoid EsAcceptDocs bug by calling cost before building iterator (elastic#138127)
  Log NOT_PREFERRED shard movements (elastic#138069)
  Improve bulk loading of binary doc values (elastic#137995)
  Add internal action for getting inference fields and inference results for those fields (elastic#137680)
  Address issue with DateFieldMapper#isFieldWithinQuery(...) (elastic#138032)
  WriteLoadConstraintDecider: Have separate rate limiting for canRemain and canAllocate decisions (elastic#138067)
  Adding NodeContext to TransportBroadcastByNodeAction (elastic#138057)
  Mute org.elasticsearch.simdvec.ESVectorUtilTests testSoarDistanceBulk elastic#138117
  Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeIT test elastic#137909
  Backport batched_response_might_include_reduction_failure version to 8.19 (elastic#138046)
  Add summary metrics for tdigest fields (elastic#137982)
  Add gp-llm-v2 model ID and inference endpoint (elastic#138045)
  Various tracing fixes (elastic#137908)
  [ML] Fixing KDE evaluate() to return correct ValueAndMagnitude object (elastic#128602)
  Mute org.elasticsearch.xpack.shutdown.NodeShutdownIT testStalledShardMigrationProperlyDetected elastic#115697
  [ML] Fix Flaky Audit Message Assertion in testWithDatastream for RegressionIT and ClassificationIT (elastic#138065)
  [ML] Fix Non-Deterministic Training Set Selection in RegressionIT testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (elastic#138063)
  ...

# Conflicts:
#	rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search.vectors/200_dense_vector_docvalue_fields.yml
3 participants