
Ddb117 DynamoDB JSON conversion benchmark#135

Merged
olpa merged 23 commits into master from ddb117-benchmarks
Dec 2, 2025

olpa commented Dec 2, 2025

  • Create a DynamoDB benchmark
  • Roundtrips ddb-normal-ddb and normal-ddb-normal
  • Vibe coded ddb converters: Rust serde, Python no boto, Python boto3
  • Tool "json-eq.sh" for semantic JSON equality
  • Fix scan_json example to use buffered I/O
  • Document the result, including a transcript and a final plot

close #117

olpa and others added 23 commits November 26, 2025 05:20
Add BufReader and BufWriter to all file and stdio operations,
and increase rjiter buffer from 4KB to 64KB. This dramatically
reduces syscall overhead.

Performance improvements on 100k records (80MB input):
- Time: 10.8s → 0.84s (12.9x faster)
- Write syscalls: 5.7M → 9.8K (99.83% reduction)
- Read syscalls: 22K → 1.3K (94.3% reduction)

The previous implementation was performing unbuffered I/O,
averaging only ~7.7 bytes per write syscall. With standard
64KB buffering, I/O overhead dropped from 34s to 0.035s.
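
The effect of buffering on the number of write calls can be sketched in a few lines. The commit itself concerns Rust's `BufReader` and `BufWriter`; the Python analogue below is only an illustration. `CountingSink` and `emit_tokens` are hypothetical names, and each `write()` on the raw sink stands in for one write syscall.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical raw stream: discards data and counts write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        # Each call here stands in for one write syscall.
        self.calls += 1
        self.bytes_written += len(data)
        return len(data)


def emit_tokens(stream, count):
    """Write many tiny chunks, as a token-by-token JSON writer would."""
    for _ in range(count):
        stream.write(b'"value",')  # 8 bytes per write
    stream.flush()


# Unbuffered: every tiny write reaches the sink directly.
unbuffered = CountingSink()
emit_tokens(unbuffered, 100_000)

# Buffered: writes are collected in a 64KB buffer before reaching the sink.
raw = CountingSink()
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
emit_tokens(buffered, 100_000)

print("unbuffered write calls:", unbuffered.calls)
print("buffered write calls:  ", raw.calls)
```

Both sinks receive the same bytes; only the number of calls differs, which is the same mechanism behind the syscall reduction reported above.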

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Refactored format detection to avoid reopening input files. The tool
now reads the first line once for detection (checking .jsonl extension
first as a fast path), then preserves that line for processing. This
eliminates redundant I/O operations while maintaining efficient
streaming processing for JSONL files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Applied the same optimizations from the Rust version (commit cb6a945):
- Refactored format detection to read first line once and pass it through
  to processing functions, eliminating file reopening
- Added explicit buffered I/O wrappers for file streams
- Modified process_jsonl() and process_json() to accept first_line parameter
- Replaced enumerate() with manual line counting for better control
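
A minimal sketch of the read-once pattern described above, not the code from this PR. `process_jsonl()` and `process_json()` are named in the commit message, but their signatures here are assumptions, as are `detect_format()`, `convert_stream()`, and the detection heuristic. The actual format conversion is left out; records are only re-serialized.

```python
import io
import itertools
import json


def detect_format(path, first_line):
    """Return 'jsonl' or 'json'. The heuristic is an assumption."""
    if path.endswith(".jsonl"):  # fast path: trust the extension
        return "jsonl"
    try:
        json.loads(first_line)  # a complete value on line one suggests JSONL
        return "jsonl"
    except json.JSONDecodeError:
        return "json"


def process_jsonl(first_line, rest, out):
    """Stream records one line at a time, starting with the preserved line."""
    count = 0  # manual counting instead of enumerate()
    for line in itertools.chain([first_line], rest):
        if not line.strip():
            continue
        record = json.loads(line)
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


def process_json(first_line, rest, out):
    """Parse a single multi-line JSON document, reusing the preserved line."""
    document = json.loads(first_line + rest.read())
    out.write(json.dumps(document) + "\n")
    return 1


def convert_stream(path, src, out):
    """Read the first line once, detect the format, and pass the line on."""
    first_line = src.readline()
    fmt = detect_format(path, first_line)
    handler = process_jsonl if fmt == "jsonl" else process_json
    return fmt, handler(first_line, src, out)


src = io.StringIO('{"id": 1}\n{"id": 2}\n')
out = io.StringIO()
fmt, count = convert_stream("records.dat", src, out)
print(fmt, count)  # jsonl 2
```

Because the first line is handed to the processing function, the input is never reopened or rewound, so the same code works for stdin and other non-seekable streams.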

Performance after the change:
- Python: ~1.0s for 10k records (~10k records/sec)
- Rust: ~0.13s for 10k records (~77k records/sec)

The 7.7x performance gap is expected due to boto3 overhead and Python's
interpreted nature. No critical performance issues found in the Python
implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Clone of ddb_convert_rust implementation in Python without boto3 library.
Features:
- Convert between DynamoDB JSON and normal JSON formats
- Support for JSONL and single JSON files with auto-detection
- Streaming processing with buffered I/O
- All DynamoDB types supported (S, N, BOOL, NULL, M, L, SS, NS, BS, B)
- CLI with same interface as Rust version
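
For illustration, a converter covering the listed type tags can be written without boto3 in a few dozen lines. This is a sketch under stated assumptions, not the implementation added in this PR: the function names are hypothetical, sets are emitted as plain lists, and binary values (`B`, `BS`) are passed through as their base64 strings.

```python
def _parse_number(text):
    """DynamoDB stores numbers as strings; prefer int, fall back to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def from_ddb(attr):
    """Convert one DynamoDB attribute value, e.g. {"S": "x"}, to plain JSON."""
    (tag, value), = attr.items()  # exactly one type tag per attribute
    if tag in ("S", "BOOL", "B"):
        return value
    if tag == "N":
        return _parse_number(value)
    if tag == "NULL":
        return None
    if tag == "M":
        return {key: from_ddb(item) for key, item in value.items()}
    if tag == "L":
        return [from_ddb(item) for item in value]
    if tag in ("SS", "BS"):
        return list(value)  # assumption: sets become JSON arrays
    if tag == "NS":
        return [_parse_number(item) for item in value]
    raise ValueError(f"unknown DynamoDB type tag: {tag}")


def to_ddb(value):
    """Convert a plain JSON value to a DynamoDB attribute value."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must precede the int check
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_ddb(item) for item in value]}
    if isinstance(value, dict):
        return {"M": {key: to_ddb(item) for key, item in value.items()}}
    raise TypeError(f"unsupported JSON value: {value!r}")


def item_from_ddb(item):
    """Convert a whole item (attribute name -> typed value)."""
    return {name: from_ddb(attr) for name, attr in item.items()}


def item_to_ddb(item):
    """Convert a plain JSON object to a DynamoDB item."""
    return {name: to_ddb(value) for name, value in item.items()}


ddb_item = {"id": {"S": "u1"}, "age": {"N": "42"}, "tags": {"SS": ["a", "b"]}}
print(item_from_ddb(ddb_item))
```

With this mapping the normal-ddb-normal roundtrip is lossless for plain JSON values, while ddb-normal-ddb is not: a set comes back as `L` and a binary value as `S`, because plain JSON has no way to mark them.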

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
olpa merged commit 7c3587c into master Dec 2, 2025

Development

Successfully merging this pull request may close these issues.

dynamodb example: basic docs and fixture