perf: fast-path column decoders for non-null data + execute/2 for DDL #154
Closed
hugobarauna wants to merge 2 commits into
When all values in a column are valid (no nulls), skip the per-element bitmap_valid? check and decoder closure call. DuckDB always sends validity bitmaps even for non-null data, so detect all-valid via binary comparison. Fast paths for: s8-s64, u8-u64, f16-f64, string/binary (32- and 64-bit offsets), date32.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two changes to support Dux's view-based compute optimization:

1. Add Adbc.Connection.execute/2: command dispatch (no stream) for DDL statements that return no data. Faster than query/2 for CREATE VIEW.
2. Fix the delete_on_gc handler to DROP VIEW for __dux_v_-prefixed names. Previously all GC cleanup used DROP TABLE, which is a no-op on views.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Thank you! I pushed each of those as individual commits. For the validity one, I opted to check null_count on the C side instead, which should be considerably more efficient (assuming it is populated correctly).
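In Arrow's C Data Interface, each ArrowArray carries an int64 null_count, which may be -1 when the producer has not computed it. The decision the contributor describes can be sketched in Python as follows; this is an illustration only (the real check presumably lives in the C NIF), has_no_nulls is a hypothetical name, and array offsets are ignored for simplicity.

```python
from typing import Optional


def has_no_nulls(null_count: int, validity: Optional[bytes], length: int) -> bool:
    """Decide whether a column can take the no-nulls fast path.

    Follows the ArrowArray.null_count convention: 0 means no nulls,
    a positive value is the exact null count, and -1 means "not computed".
    Array offsets are ignored in this sketch.
    """
    if null_count == 0:
        return True  # producer reports no nulls: the bitmap is never read
    if null_count > 0:
        return False  # at least one null: use the per-element path
    # null_count == -1: fall back to scanning the validity bitmap (LSB-first).
    if validity is None:
        return True  # no bitmap allocated means every slot is valid
    return all((validity[i >> 3] >> (i & 7)) & 1 for i in range(length))
```

When null_count is populated, the check costs a single integer comparison per column, which is why it should beat scanning the bitmap.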
Context
This PR is the result of an AI-driven performance optimization experiment on Dux (a DuckDB dataframe library) using Claude Code. The code was not human-reviewed beyond passing Dux's test suite. The intent is to show potential optimization approaches — feel free to close this PR or cherry-pick individual ideas.
Companion Dux PR: elixir-dux/dux#44
Changes
1. Fast-path column decoders when no nulls are present
DuckDB always sends validity bitmaps, even when all values are valid (every byte is 0xFF). The current decoders check bitmap_valid? and call through a decoder closure for every element; for 400k rows × 5 columns, that's 2M unnecessary checks and function calls.
This adds an all_valid?/1 check per column (a single binary comparison over the bitmap instead of one check per element) and dispatches to plain decoders that skip both the bitmap check and the closure call.
Fast paths added for: s8-s64, u8-u64, f16-f64, string/binary (32- and 64-bit offsets), and date32.
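The all-valid detection and fast-path dispatch can be sketched in Python as below. This is a language-neutral illustration, not the PR's Elixir code (which is not reproduced on this page); all_valid and decode_int32_column are hypothetical names, and the bitmap is assumed to use Arrow's LSB-first bit order with no array offset. Unlike a plain comparison against all-0xFF bytes, this variant masks the final partial byte, since bits past the column length are not guaranteed to be set.

```python
import struct


def all_valid(validity: bytes, length: int) -> bool:
    """True if the first `length` bits of an LSB-first validity bitmap are all set."""
    full_bytes, rem = divmod(length, 8)
    # Compare whole bytes in one shot instead of testing each element.
    if validity[:full_bytes] != b"\xff" * full_bytes:
        return False
    if rem:
        # Only the low `rem` bits of the last byte belong to real elements.
        mask = (1 << rem) - 1
        return (validity[full_bytes] & mask) == mask
    return True


def decode_int32_column(data: bytes, validity: bytes, length: int) -> list:
    """Decode little-endian int32 values, using None for null slots."""
    values = struct.unpack(f"<{length}i", data[: 4 * length])
    if all_valid(validity, length):
        # Fast path: no per-element bitmap test and no per-element decoder call.
        return list(values)
    # Slow path: consult the validity bit for every element.
    return [
        v if (validity[i >> 3] >> (i & 7)) & 1 else None
        for i, v in enumerate(values)
    ]
```

The per-column check is paid once, so the slow path only runs for columns that actually contain nulls.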
Impact: ~15-20% improvement on Adbc.Result.to_map for typical non-null data.
2. Adbc.Connection.execute/2 for DDL statements
Adds a public function that uses command dispatch (no stream setup/teardown) for statements that return no data (CREATE VIEW, DROP TABLE, SET, etc.).
This avoids the stream → stream_results → unlock cycle that query/2 goes through, saving ~50-100μs per call.
3. View cleanup in GC handler (⚠️ Dux-specific, open to alternatives)
Modified handle_command({:delete_on_gc, table_name}) to check for a __dux_v_ name prefix and use DROP VIEW IF EXISTS instead of DROP TABLE IF EXISTS. This is needed because Dux's performance optimization creates temp views instead of temp tables, but the existing GC mechanism only drops tables.
I'm open to alternative approaches here. Some options:
- adbc_delete_view_on_gc_new NIF function
- adbc_delete_on_gc_new specifying the drop command
- adbc_delete_on_gc_new/3 that accepts a custom SQL template
The prefix-based approach is the least invasive but couples ADBC to a Dux naming convention.
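The trade-off between the prefix-based approach and a caller-supplied template can be seen in a small Python sketch that only builds the SQL string. The helper names (quote_ident, drop_by_prefix, drop_by_template) are hypothetical and not part of ADBC, and the identifier quoting shown here is an assumption; the actual handler may not quote names this way.

```python
DUX_VIEW_PREFIX = "__dux_v_"  # prefix named in this PR


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_by_prefix(name: str) -> str:
    """Prefix-based option: infer the object kind from the name."""
    kind = "VIEW" if name.startswith(DUX_VIEW_PREFIX) else "TABLE"
    return f"DROP {kind} IF EXISTS {quote_ident(name)}"


def drop_by_template(name: str, template: str = "DROP TABLE IF EXISTS {ident}") -> str:
    """Template option: the caller chooses the drop statement when registering cleanup."""
    return template.format(ident=quote_ident(name))
```

With the template option the naming convention stays in Dux: it would register "DROP VIEW IF EXISTS {ident}" for its views, and ADBC never inspects the name.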
4. Single-batch fast path for to_map/1
Added a pattern match for the common single-batch case that skips Enum.zip_with + Enum.flat_map.
Benchmark impact (measured in Dux context)
These ADBC changes contributed to closing the gap between Dux and Explorer on to_rows().
🤖 Generated with Claude Code
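The single-batch fast path from change 4 can be illustrated with a Python sketch in which each batch is a dict mapping column names to lists of values. This is only an analogy for the Elixir pattern match; to_map here is a hypothetical function, not Adbc.Result.to_map itself, and the general path stands in for the Enum.zip_with + Enum.flat_map merge.

```python
from typing import Dict, List

Batch = Dict[str, list]


def to_map(batches: List[Batch]) -> Batch:
    """Merge record batches column-wise into one column map."""
    if not batches:
        return {}
    if len(batches) == 1:
        # Fast path: a single batch needs no merging, so return it untouched.
        return batches[0]
    # General path: concatenate each column across all batches, in order.
    names = list(batches[0])
    return {name: [v for batch in batches for v in batch[name]] for name in names}
```

Because the fast path returns the existing batch, it avoids building any intermediate lists for the common single-batch result.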