Ddb117 DynamoDB JSON conversion benchmark by olpa · Pull Request #135 · olpa/streaming_json

olpa · 2025-12-02T04:27:07Z

- Create a DynamoDB benchmark
- Roundtrips ddb-normal-ddb and normal-ddb-normal
- Vibe-coded DynamoDB converters: Rust (serde), Python without boto3, Python with boto3
- Tool `json-eq.sh` for semantic JSON equality
- Fix the scan_json example to use buffered I/O
- Document the results, including a transcript and a final plot

close #117
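
`json-eq.sh` itself is not shown on this page. As a rough illustration of what "semantic JSON equality" usually means for roundtrip checks, the Rust sketch below assumes two documents are equal when they match up to object key order (whitespace no longer matters once parsed). The `Value` type and `json_eq` function are hypothetical, and parsing is left out.

```rust
/// Minimal JSON tree; parsing is out of scope for this sketch.
#[derive(Debug, Clone)]
enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Value>),
    // Key/value pairs in document order; keys are assumed to be unique.
    Obj(Vec<(String, Value)>),
}

/// Semantic equality: arrays compare element by element in order,
/// objects compare as sets of key/value pairs regardless of key order.
fn json_eq(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Null, Value::Null) => true,
        (Value::Bool(x), Value::Bool(y)) => x == y,
        // Compare parsed numbers, so 1 and 1.0 count as equal.
        (Value::Num(x), Value::Num(y)) => x == y,
        (Value::Str(x), Value::Str(y)) => x == y,
        (Value::Arr(xs), Value::Arr(ys)) => {
            xs.len() == ys.len() && xs.iter().zip(ys).all(|(x, y)| json_eq(x, y))
        }
        (Value::Obj(xs), Value::Obj(ys)) => {
            // Same key count, and every key in `xs` has an equal value under the same key in `ys`.
            xs.len() == ys.len()
                && xs
                    .iter()
                    .all(|(k, v)| ys.iter().any(|(k2, v2)| k == k2 && json_eq(v, v2)))
        }
        // Different JSON types are never equal.
        _ => false,
    }
}
```

A roundtrip check would parse both the original input and the output of ddb-normal-ddb or normal-ddb-normal, then call `json_eq` on the two trees.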

Add `BufReader` and `BufWriter` to all file and stdio operations, and increase the rjiter buffer from 4KB to 64KB. This dramatically reduces syscall overhead.

Performance improvements on 100k records (80MB input):
- Time: 10.8s → 0.84s (12.9x faster)
- Write syscalls: 5.7M → 9.8K (99.83% reduction)
- Read syscalls: 22K → 1.3K (94.3% reduction)

The previous implementation performed unbuffered I/O, averaging only ~7.7 bytes per write syscall. With standard 64KB buffering, I/O overhead dropped from 34s to 0.035s.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
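
A minimal sketch of the buffering pattern described above, using only the Rust standard library: both ends are wrapped in 64 KiB buffers, so many tiny writes are coalesced into a few large syscalls. The function name `copy_lines_buffered` is hypothetical, and a real converter would transform each record rather than copy it.

```rust
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};

/// 64 KiB, matching the buffer size mentioned in the commit message.
const BUF_SIZE: usize = 64 * 1024;

/// Copy line-delimited records from `input` to `output` through 64 KiB buffers.
/// Each small `write_all` lands in memory; the OS only sees full-buffer writes.
/// Returns the number of lines copied.
fn copy_lines_buffered<R: Read, W: Write>(input: R, output: W) -> io::Result<usize> {
    let reader = BufReader::with_capacity(BUF_SIZE, input);
    let mut writer = BufWriter::with_capacity(BUF_SIZE, output);
    let mut count = 0;
    for line in reader.lines() {
        let line = line?;
        // Two small writes per record; BufWriter coalesces them.
        writer.write_all(line.as_bytes())?;
        writer.write_all(b"\n")?;
        count += 1;
    }
    // Flush explicitly so a final write error is reported instead of ignored on drop.
    writer.flush()?;
    Ok(count)
}
```

For stdio, the same function can be called as `copy_lines_buffered(io::stdin().lock(), io::stdout().lock())`. Rust's `Stdout` is line-buffered, so without the outer `BufWriter` every write containing a newline is passed through to the OS immediately, which for JSONL output means roughly one syscall per record.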

Refactored format detection to avoid reopening input files. The tool now reads the first line once for detection (checking the `.jsonl` extension first as a fast path), then keeps that line for processing. This eliminates redundant I/O while keeping JSONL processing streaming.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
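
The single-read detection can be sketched as follows. The extension fast path matches the description; the content heuristic (a first line that is a complete `{...}` object means JSONL) is an assumption, as are the names `Format` and `detect_format`. Because the reader is borrowed with `&mut`, the caller handles `first_line` and then continues with the same reader, so the file is never reopened.

```rust
use std::io::{self, BufRead};
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Format {
    Jsonl,
    Json,
}

/// Detect the input format while consuming exactly one line, and return that
/// line so the caller can process it instead of re-reading the input.
fn detect_format<R: BufRead>(path: &Path, reader: &mut R) -> io::Result<(Format, String)> {
    // Fast path: the extension alone decides.
    let is_jsonl_ext = path.extension().and_then(|ext| ext.to_str()) == Some("jsonl");

    // Read the first line once; it is handed back to the caller either way.
    let mut first_line = String::new();
    reader.read_line(&mut first_line)?;
    if is_jsonl_ext {
        return Ok((Format::Jsonl, first_line));
    }

    // Hypothetical heuristic: a complete object on the first line means JSONL.
    let trimmed = first_line.trim();
    let format = if trimmed.starts_with('{') && trimmed.ends_with('}') {
        Format::Jsonl
    } else {
        Format::Json
    };
    Ok((format, first_line))
}
```

With this heuristic, a one-line JSON object in a `.json` file is classified as JSONL, which is harmless because both readings yield the same single record.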

Applied the same optimizations as in the Rust version (commit cb6a945):
- Refactored format detection to read the first line once and pass it through to the processing functions, eliminating file reopening
- Added explicit buffered I/O wrappers for file streams
- Modified `process_jsonl()` and `process_json()` to accept a `first_line` parameter
- Replaced `enumerate()` with manual line counting for better control

Performance after these changes:
- Python: ~1.0s for 10k records (~10k records/sec)
- Rust: ~0.13s for 10k records (~77k records/sec)

The 7.7x performance gap is expected, given boto3 overhead and Python's interpreted execution. No critical performance issues were found in the Python implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Clone of the ddb_convert_rust implementation in Python, without the boto3 library. Features:
- Convert between DynamoDB JSON and normal JSON formats
- Support for JSONL and single JSON files with auto-detection
- Streaming processing with buffered I/O
- All DynamoDB types supported (S, N, BOOL, NULL, M, L, SS, NS, BS, B)
- CLI with the same interface as the Rust version

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
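
The type mapping between the two formats can be illustrated with a small, dependency-free Rust model. This is a sketch, not the converters' actual code: `Json`, `Attr`, `to_ddb` and `from_ddb` are hypothetical names, numbers are kept as strings (as DynamoDB's `N` does), and binary values (`B`, `BS`) are assumed to stay base64-encoded strings. Following the commit "Don't create sets for ddb, create generic lists.", arrays always map to `L`, and sets read from DynamoDB decode to plain arrays.

```rust
/// Normal JSON; numbers keep their textual form.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(String),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// DynamoDB attribute value, one variant per type tag.
#[derive(Debug, Clone, PartialEq)]
enum Attr {
    S(String),
    N(String),
    Bool(bool),
    Null,
    M(Vec<(String, Attr)>),
    L(Vec<Attr>),
    SS(Vec<String>),
    NS(Vec<String>),
    B(String),       // assumed: base64 text
    BS(Vec<String>), // assumed: base64 texts
}

/// Normal JSON -> DynamoDB JSON. Arrays become generic `L` lists, never sets.
fn to_ddb(v: &Json) -> Attr {
    match v {
        Json::Null => Attr::Null,
        Json::Bool(b) => Attr::Bool(*b),
        Json::Num(n) => Attr::N(n.clone()),
        Json::Str(s) => Attr::S(s.clone()),
        Json::Arr(items) => Attr::L(items.iter().map(to_ddb).collect()),
        Json::Obj(fields) => Attr::M(fields.iter().map(|(k, v)| (k.clone(), to_ddb(v))).collect()),
    }
}

/// DynamoDB JSON -> normal JSON. Sets and binary values lose their type tag.
fn from_ddb(a: &Attr) -> Json {
    match a {
        Attr::Null => Json::Null,
        Attr::Bool(b) => Json::Bool(*b),
        Attr::N(n) => Json::Num(n.clone()),
        Attr::S(s) | Attr::B(s) => Json::Str(s.clone()),
        Attr::L(items) => Json::Arr(items.iter().map(from_ddb).collect()),
        Attr::M(fields) => Json::Obj(fields.iter().map(|(k, v)| (k.clone(), from_ddb(v))).collect()),
        Attr::SS(xs) | Attr::BS(xs) => Json::Arr(xs.iter().map(|s| Json::Str(s.clone())).collect()),
        Attr::NS(xs) => Json::Arr(xs.iter().map(|n| Json::Num(n.clone())).collect()),
    }
}
```

In this sketch, normal-ddb-normal is lossless, while ddb-normal-ddb turns `SS`/`NS`/`BS` into `L` and `B` into `S`, because normal JSON has no way to record those type tags.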

olpa and others added 23 commits November 26, 2025 05:20

Copy ddb examples from example to bench.

89488c1

Roundtrip for ddb.

0725491

Add "json-eq.sh" for semantic json equality.

0947775

Update fixture to use only supported attributes.

41a3684

Several conversion tools.

087fd9a

Don't create sets for ddb, create generic lists.

a7bc111

Rust ddb conv: optimize conversion.

23d0ef3

Fixture for normal-ddb-normal roundtrip.

d1e0f80

Makefile for normal-to-ddb-to-normal.

6701085

Correct fixture to be jsonl.

28ccf0d

add noboto converter to make files

1a59a63

Parse logs to "stats.json".

2791528

Collect file stats.

6113652

Parse stats, generate some image.

e78c45a

Tune the presentation.

035fdc5

Performance plot.

44e3362

Describe the content of the package.

7f25b0a

Proofread README.

5a5efb6

Add benchmark to the main README.

5d64486

olpa merged commit 7c3587c into master Dec 2, 2025

olpa merged 23 commits into master from ddb117-benchmarks
