ESQL: tighten parsed-footer cache after review feedback #149045
Follow-up to elastic#149018 incorporating pragmatic review feedback. Three behavioural/structural changes plus two doc improvements.

ParsedFooterCache

- Lower DEFAULT_MAX_ENTRIES from 64 to 32. The worst case is now bounded closer to a few hundred MiB instead of multiple GiB when extremely wide files dominate the working set; typical workloads still sit at a few MiB total.
- Replace the 'intentionally conservative' wording with an explicit worst-case heap budget (column-chunk math, typical-vs-extreme split, weigher follow-up note) so the operational tradeoff is visible.
- Extract rethrowStructural(ExecutionException) as a shared helper that rethrows Error/IOException/CircuitBreakingException/ElasticsearchException causes as their original types. Format-specific RuntimeException wrapping stays at the call site since the two readers differ there.

ParquetFormatReader

- Use ParsedFooterCache.rethrowStructural in loadFooter; the format-specific newInvalidParquetFileException now wraps only what rethrowStructural could not rethrow.
- Document why loadFooter is non-static (the captured 'this' threads the per-instance CircuitBreakerByteBufferAllocator into the loader) and why that is harmless under the current shared-parent-breaker layout.

OrcFormatReader

- Use ParsedFooterCache.rethrowStructural in loadTail and rethrow RuntimeException directly; only non-structural causes get wrapped as IOException with a format-tagged message.
- Document why the cold path deliberately parses the tail twice (the single-parse alternative would require re-implementing ORC's tail fetch protocol outside the library, which is fragile across versions).
- Restrict setFileContext to the cache-miss branch in readRange, matching the Parquet style. No functional change, but it removes a redundant self-write on the hot per-producer path.

No SPI changes. No behavioural change to the cache hit path.

Developed using AI-assisted tooling
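For readers who have not opened the diff, the split between the shared helper and the format-specific call sites might look roughly like the sketch below. It is not the code from this PR. The two exception classes are local stand-ins so the example compiles without Elasticsearch on the classpath, the `Throwable` return type is a guess, and `wrapOrcStyle` and its `[orc]` message are hypothetical names used only to illustrate the OrcFormatReader behaviour described above.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

final class RethrowStructuralSketch {

    // Hypothetical stand-in for the Elasticsearch base exception.
    static class ElasticsearchException extends RuntimeException {
        ElasticsearchException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in; assumed here to be a subclass of ElasticsearchException.
    static class CircuitBreakingException extends ElasticsearchException {
        CircuitBreakingException(String message) {
            super(message);
        }
    }

    /**
     * Rethrows the cause of a cache-loader failure when it has a "structural" type
     * (Error, IOException, or an ElasticsearchException such as CircuitBreakingException).
     * Anything else is returned so the caller can apply format-specific wrapping.
     */
    static Throwable rethrowStructural(ExecutionException e) throws IOException {
        Throwable cause = e.getCause();
        if (cause == null) {
            return e; // nothing to unwrap; let the caller decide
        }
        if (cause instanceof Error) {
            throw (Error) cause;
        }
        if (cause instanceof IOException) {
            throw (IOException) cause;
        }
        if (cause instanceof ElasticsearchException) {
            throw (ElasticsearchException) cause;
        }
        return cause;
    }

    /**
     * ORC-style call site (hypothetical): RuntimeException is rethrown directly,
     * every remaining cause is wrapped as an IOException with a format-tagged message.
     */
    static IOException wrapOrcStyle(ExecutionException e) throws IOException {
        Throwable rest = rethrowStructural(e);
        if (rest instanceof RuntimeException) {
            throw (RuntimeException) rest;
        }
        return new IOException("[orc] failed to load file tail", rest);
    }

    public static void main(String[] args) throws IOException {
        // A checked, non-IO cause is not structural, so it ends up wrapped.
        IOException wrapped = wrapOrcStyle(new ExecutionException(new Exception("bad tail")));
        System.out.println(wrapped.getMessage());
    }
}
```

Returning the leftover cause rather than wrapping it inside the helper keeps the format-specific decision at the call site, which matches the description's note that the two readers differ in how they treat RuntimeException.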
Pinging @elastic/es-analytical-engine (Team:Analytics)
Hi @costin, I've created a changelog YAML for you.