Skip to content

fix: embedded benchmark timing fallback + Python wrapper resilience#250

Merged
genezhang merged 2 commits intomainfrom
test/embedded-benchmark-fixes
Mar 28, 2026
Merged

fix: embedded benchmark timing fallback + Python wrapper resilience#250
genezhang merged 2 commits intomainfrom
test/embedded-benchmark-fixes

Conversation

@genezhang
Copy link
Copy Markdown
Owner

Summary

  • Benchmark runner falls back to wall-clock timing when FFI bindings are stale
  • Python wrapper handles missing FFI methods gracefully (no crash on stale _ffi.py)

First LDBC Embedded Results (chdb + sf0.003 Parquet)

26/36 passed, 5 skipped, 10 failed (72% pass rate)
Database init: 0.22s
Category Count Details
Passed 26 short-1..7, complex-1..6,8,9,11,13, bi-1..3,5,9,11..14
chdb timeout 5 complex-7,10, bi-7,17,18 (session expired)
Parse gap 2 bi-4 (CALL subquery), bi-6 (syntax)
CTE resolution 2 complex-12, bi-8 (column not found)
Type mismatch 1 complex-14 (UInt64 vs Int64)
Skipped 5 bi-10,15,16,19,20 (known feature gaps)

🤖 Generated with Claude Code

- embedded_benchmark.py: Fall back to wall-clock timing when FFI
  bindings lack get_compiling_time/get_execution_time (stale bindings).
  Use len(list(qr)) for row count instead of qr.num_rows.
- clickgraph/__init__.py: Gracefully handle missing FFI methods in
  QueryResult.__init__ with try/except (prevents crash when bindings
  are not regenerated after new FFI methods are added).

First end-to-end LDBC results on embedded chdb:
  26/36 passed, 5 skipped, 10 failed (72% pass rate)
  Failures: 5 chdb session timeout, 2 parse gaps, 2 CTE resolution,
  1 type mismatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@genezhang genezhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Timing fallback + Python wrapper resilience

Small PR, two changes. The intent is good — don't crash on stale FFI bindings — but I have concerns about the approach.

Observations

1. try/except AttributeError in the wrapper is a band-aid

The three separate try/except blocks in QueryResult.__init__ mask a real problem: stale _ffi.py bindings that are out of sync with libclickgraph_ffi.so. Silent fallback (0.0 timing, empty types) means users won't know their bindings are stale — they'll just get wrong/missing data with no warning.

Better approach: a single try/except with a logged warning:

try:
    self._compiling_time = ffi_result.get_compiling_time()
    self._execution_time = ffi_result.get_execution_time()
    self._column_data_types = ffi_result.get_column_data_types()
except AttributeError:
    import warnings
    warnings.warn(
        "Stale FFI bindings: timing/type methods not available. "
        "Regenerate with: uniffi-bindgen generate ...",
        stacklevel=2,
    )
    self._compiling_time = 0.0
    self._execution_time = 0.0
    self._column_data_types = []

This way the user sees a clear warning once, not silent zeros.

2. result.row_count = len(list(qr)) consumes the iterator

The original code used qr.num_rows (a property). The new code does len(list(qr)) which materializes all rows into a list and then discards them. This:

  • Wastes memory for large result sets
  • Consumes the iterator — the qr object is now exhausted
  • Means the verbose first-row print was removed (the qr[0] line) likely because the iterator was already consumed

qr.num_rows should still work since it's a property on the wrapper, not a timing method. Was this change intentional? If num_rows also fails on stale bindings, wrap it in the same try/except rather than consuming the iterator.

3. Benchmark timing fallback is fine

The try/except AttributeError with wall-clock fallback in embedded_benchmark.py is reasonable for a benchmark runner — it's a development tool, not library code. The wall-clock fallback preserves the benchmark's ability to report something even with older bindings.

4. Verbose output removed

The --verbose first-row print was deleted:

-                if args.verbose and qr.num_rows > 0:
-                    print(f"\n--- {query_id} (first row) ---")
-                    print(qr[0])

This was useful for debugging query results. If it was removed because qr[0] fails after list(qr) consumes the iterator, fix the root cause (use num_rows) rather than removing the feature.

Verdict

The benchmark runner change is fine. The Python wrapper change needs refinement — use a single warning instead of silent fallback, and keep num_rows instead of len(list(qr)).

- Python wrapper: consolidated 3 try/except into single block with
  warnings.warn() including regeneration instructions
- Benchmark: restored qr.num_rows (doesn't consume iterator) and
  --verbose first-row output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@genezhang genezhang merged commit e3a9465 into main Mar 28, 2026
4 checks passed
@genezhang genezhang deleted the test/embedded-benchmark-fixes branch March 28, 2026 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant