feat: Lazy per-column I/O for complex columns in Nimble (#677) by prashantgolash · Pull Request #677 · facebookincubator/nimble

prashantgolash · 2026-04-27T05:30:21Z

Summary:
X-link: facebookincubator/velox#17350

Today, the Nimble selective reader loads all column streams upfront during stripe init — including columns wrapped in LazyVectors. The lazy contract only defers decoding; the underlying I/O is still eager. When a high-selectivity remaining filter eliminates most rows, the eagerly-loaded data for output-only columns is never decoded — but the I/O cost was already paid.

This diff extends laziness from decoding to I/O. Complex lazy columns (MAP/ARRAY/ROW) without pushed-down filters get their streams enqueued into a per-column cloned BufferedInput, loaded only on first downstream access. If the filter eliminates all rows in a stripe, the deferred column's load() is never called — zero I/O for that column in that stripe.
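
The eligibility rule in the paragraph above can be read as a small predicate. The following is a minimal sketch only: `TypeKind`, `ColumnInfo`, `isComplex`, and `shouldDeferIo` are hypothetical names invented for illustration and are not taken from the Nimble or Velox sources.

```cpp
// Hypothetical stand-ins for illustration; not the Nimble/Velox types.
enum class TypeKind { kBigint, kVarchar, kMap, kArray, kRow };

struct ColumnInfo {
  TypeKind kind;
  bool isLazy;               // column is wrapped in a LazyVector
  bool hasPushedDownFilter;  // a filter is evaluated inside the reader
};

// MAP, ARRAY and ROW are the complex kinds named in the summary.
inline bool isComplex(TypeKind kind) {
  return kind == TypeKind::kMap || kind == TypeKind::kArray ||
      kind == TypeKind::kRow;
}

// True when the column's streams would be enqueued into a per-column
// cloned input instead of the shared one (assumed reading of the rule).
inline bool shouldDeferIo(const ColumnInfo& column, bool lazyColumnIoEnabled) {
  return lazyColumnIoEnabled && column.isLazy && isComplex(column.kind) &&
      !column.hasPushedDownFilter;
}
```

With the gating flag off the predicate is always false, which matches the "default off" behavior described in this summary.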

How it works:

- During column reader construction, qualifying columns have their streams enqueued into a cloned BufferedInput instead of the shared one.
- The shared input is loaded during stripe init (eager columns only).
- Each deferred column's clone is loaded independently via ColumnLoader when the LazyVector is first accessed.
- Batch size estimation uses totalStreamBytes (compressed stream sizes from tablet metadata) for deferred columns since their decoders are not yet loaded.
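
The enqueue-now, load-on-first-access behavior in the steps above can be sketched as follows. This is an illustrative model under stated assumptions, not the actual implementation: `DeferredColumnInput`, `enqueue`, and `ensureLoaded` are hypothetical names, and the I/O itself is represented by a caller-supplied callback rather than the real BufferedInput API.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical model of a per-column cloned input. The callback stands in
// for the batched read that the real reader would issue.
class DeferredColumnInput {
 public:
  explicit DeferredColumnInput(std::function<void()> issueIo)
      : issueIo_(std::move(issueIo)) {}

  // Records a stream to be read later; no I/O happens here.
  void enqueue(uint64_t streamBytes) {
    pendingBytes_ += streamBytes;
  }

  // Called when the lazy column is first accessed. The read is issued at
  // most once; later calls are no-ops.
  void ensureLoaded() {
    if (!loaded_) {
      issueIo_();
      loaded_ = true;
    }
  }

  bool loaded() const {
    return loaded_;
  }

  uint64_t pendingBytes() const {
    return pendingBytes_;
  }

 private:
  std::function<void()> issueIo_;
  uint64_t pendingBytes_{0};
  bool loaded_{false};
};
```

If nothing downstream ever calls `ensureLoaded()` for a stripe, the callback never runs, which corresponds to the zero-I/O case described in the summary.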

Gated behind the lazy_column_io session property (default off).

Detailed analysis (naming changes, per-column vs shared clone tradeoff, code flow, shadow data): P2302893230

Reviewed By: HuamengJiang

Differential Revision: D100277342

meta-codesync · 2026-04-27T05:30:29Z

@prashantgolash has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100277342.

…ebookincubator#17350)

Summary:
X-link: facebookincubator/nimble#677

FlatMap columns (e.g. sparse_features) store each map key as separate streams, often hundreds of streams totaling GBs per stripe. Today, all streams are loaded eagerly during stripe setup, even for columns wrapped in LazyVectors. When a high-selectivity filter on a sibling column (e.g. element_at(pipeline_labels, key) IS NOT NULL with 99.98% selectivity) eliminates most rows, the FlatMap data is loaded but never used.

This diff implements per-column deferred I/O, gated behind the defer_flatmap_io session property (default off).

## How it works

**Before (eager):** All streams are loaded in one batched I/O during stripe setup. FlatMap data sits in memory even if the filter eliminates every row.

**After (deferred):** Each qualifying FlatMap column gets its own cloned BufferedInput. Its streams are enqueued but not loaded during stripe setup. On first lazy access, DeferredInput::load() issues a single batched I/O for all of that column's streams. If the filter eliminates all rows in a stripe, the load is never triggered: zero I/O for that column.

## What qualifies for deferral

A column is deferred when all of these are true:

- defer_flatmap_io session property is enabled
- Column is a top-level child of the root struct (eligible for LazyVector)
- At least one sibling has a pushed-down filter
- The column itself has no filter and is projected
- The column is a complex type (MAP, ARRAY, or ROW)

## Batch size estimation

Deferred columns' decoders are not loaded, so estimateMaterializedSize() cannot query them. Without handling this, the estimate fails and falls back to 1MB per row (tiny batches, massive overhead). The fix: skip deferred children and use their totalStreamBytes (compressed stream sizes from tablet metadata) as an approximation. When file-level vectorized stats exist, this code path is never reached; stats-based estimation (Tier 1) wins outright.
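
The batch size estimation fallback described above can be illustrated with a short sketch. It is an assumption-laden model, not the code in this diff: `ChildColumn` and `estimateRowBytes` are hypothetical names, and spreading `totalStreamBytes` evenly over the stripe's rows is one plausible way to turn a compressed stream size into a per-row figure; the commit message does not spell out the exact arithmetic.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-child summary used only for this sketch.
struct ChildColumn {
  bool deferred;                // streams not loaded yet
  uint64_t decoderBytesPerRow;  // decoder-based estimate; valid when !deferred
  uint64_t totalStreamBytes;    // compressed stream size from tablet metadata
};

// Estimated bytes per row for one stripe. Loaded children contribute their
// decoder-based estimate; deferred children contribute their compressed
// stream size spread over the rows, rounded up (assumed approximation).
// Returns 0 when the stripe has no rows.
inline uint64_t estimateRowBytes(
    const std::vector<ChildColumn>& children,
    uint64_t stripeRows) {
  if (stripeRows == 0) {
    return 0;
  }
  uint64_t total = 0;
  for (const auto& child : children) {
    if (child.deferred) {
      total += (child.totalStreamBytes + stripeRows - 1) / stripeRows;
    } else {
      total += child.decoderBytesPerRow;
    }
  }
  return total;
}
```

The point of the sketch is only that a deferred child no longer makes the whole estimate fail; it contributes a metadata-derived figure instead of forcing the 1MB-per-row fallback.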
## Why per-column clones (not a shared clone)

Each deferred column gets its own cloned BufferedInput rather than sharing one clone across all deferred columns. A shared clone would preserve cross-column coalescing but has a critical flaw: when the remaining filter accesses one deferred column (e.g. pipeline_labels for element_at), the shared load() triggers I/O for ALL deferred columns, including output-only columns (e.g. sparse_features) that may never be needed if the remaining filter eliminates all rows.

Per-column clones load each column independently at the right time:

- pipeline_labels loads when the remaining filter accesses it
- sparse_features loads only when serialization needs it (after the remaining filter)
- If the remaining filter eliminates all rows, sparse_features is never loaded

Production validation confirmed: shared clone showed no I/O reduction (46TB vs 46TB), while per-column clones reduced storageRead from 46TB to 6TB (7.5x reduction).

## Usage

SET SESSION hive.native_defer_flatmap_io = true;

Differential Revision: D100277342
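
The shared-versus-per-column tradeoff described above can be made concrete with a toy accounting model. This sketch is illustrative only: `DeferredColumn`, `bytesReadSharedClone`, and `bytesReadPerColumnClones` are hypothetical names, and the byte counts in the usage note are made-up inputs, not measurements from this PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one deferred column within a stripe.
struct DeferredColumn {
  uint64_t streamBytes;  // size of the column's streams
  bool accessed;         // whether anything downstream touched the column
};

// Shared clone: the first access to any deferred column loads all of them.
inline uint64_t bytesReadSharedClone(const std::vector<DeferredColumn>& cols) {
  bool anyAccessed = false;
  uint64_t total = 0;
  for (const auto& col : cols) {
    anyAccessed = anyAccessed || col.accessed;
    total += col.streamBytes;
  }
  return anyAccessed ? total : 0;
}

// Per-column clones: only the columns that were actually accessed are read.
inline uint64_t bytesReadPerColumnClones(
    const std::vector<DeferredColumn>& cols) {
  uint64_t total = 0;
  for (const auto& col : cols) {
    if (col.accessed) {
      total += col.streamBytes;
    }
  }
  return total;
}
```

For a stripe where a small filter column (10 bytes, accessed) sits next to a large output-only column (1000 bytes, never accessed), the shared model reads 1010 bytes and the per-column model reads 10, which is the pattern the commit message attributes to the remaining-filter case.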



meta-cla Bot added the CLA Signed label Apr 27, 2026

meta-codesync Bot added fb-exported meta-exported labels Apr 27, 2026

prashantgolash changed the title ~~Deferred per-column I/O for lazy FlatMap columns in Nimble~~ [nimble]feat: Deferred per-column I/O for lazy FlatMap columns in Nimble Apr 27, 2026

prashantgolash mentioned this pull request Apr 27, 2026

feat: Lazy per-column I/O for complex columns in Nimble (#17350) facebookincubator/velox#17350

Open

prashantgolash changed the title ~~[nimble]feat: Deferred per-column I/O for lazy FlatMap columns in Nimble~~ feat: Deferred per-column I/O for lazy FlatMap columns in Nimble Apr 27, 2026

prashantgolash force-pushed the export-D100277342 branch from 681e874 to 143d35b April 27, 2026 06:09

meta-codesync Bot changed the title ~~feat: Deferred per-column I/O for lazy FlatMap columns in Nimble~~ Deferred per-column I/O for lazy FlatMap columns in Nimble Apr 27, 2026

prashantgolash force-pushed the export-D100277342 branch from 143d35b to 5a29802 April 27, 2026 06:15

prashantgolash changed the title ~~Deferred per-column I/O for lazy FlatMap columns in Nimble~~ feat: Deferred per-column I/O for lazy FlatMap columns in Nimble Apr 27, 2026

prashantgolash force-pushed the export-D100277342 branch from 5a29802 to 0025f13 April 28, 2026 06:40

meta-codesync Bot changed the title ~~feat: Deferred per-column I/O for lazy FlatMap columns in Nimble~~ feat: Deferred per-column I/O for lazy FlatMap columns in Nimble (#677) Apr 28, 2026

prashantgolash force-pushed the export-D100277342 branch from 0025f13 to a43fb66 April 28, 2026 15:05

prashantgolash force-pushed the export-D100277342 branch from a43fb66 to 42cbb2f April 29, 2026 06:27

prashantgolash force-pushed the export-D100277342 branch from 42cbb2f to d08b7b0 April 29, 2026 18:30

meta-codesync Bot changed the title ~~feat: Deferred per-column I/O for lazy FlatMap columns in Nimble (#677)~~ Lazy per-column I/O for complex columns in Nimble May 1, 2026

prashantgolash force-pushed the export-D100277342 branch from d08b7b0 to 164366f May 1, 2026 05:16

prashantgolash changed the title ~~Lazy per-column I/O for complex columns in Nimble~~ feat: Lazy per-column I/O for complex columns in Nimble May 1, 2026

meta-codesync Bot changed the title ~~feat: Lazy per-column I/O for complex columns in Nimble~~ feat: Lazy per-column I/O for complex columns in Nimble (#677) May 4, 2026

prashantgolash force-pushed the export-D100277342 branch from 164366f to 4fed353 May 4, 2026 00:14

prashantgolash force-pushed the export-D100277342 branch from 4fed353 to 478ca3e May 8, 2026 05:46

prashantgolash force-pushed the export-D100277342 branch from 478ca3e to f6d53b5 May 12, 2026 21:33

prashantgolash wants to merge 1 commit into facebookincubator:main from prashantgolash:export-D100277342
