Ddb117 DynamoDB JSON conversion benchmark #135
Merged
Add BufReader and BufWriter to all file and stdio operations, and increase rjiter buffer from 4KB to 64KB. This dramatically reduces syscall overhead.

Performance improvements on 100k records (80MB input):
- Time: 10.8s → 0.84s (12.9x faster)
- Write syscalls: 5.7M → 9.8K (99.83% reduction)
- Read syscalls: 22K → 1.3K (94.3% reduction)

The previous implementation was performing unbuffered I/O, averaging only ~7.7 bytes per write syscall. With standard 64KB buffering, I/O overhead dropped from 34s to 0.035s.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
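The effect of buffering on the number of write calls can be illustrated with a small Python analogue of Rust's `BufWriter`. This is only a sketch, not the code from this commit: `CountingSink` and `write_records` are hypothetical names, and the sink counts calls to its `write` method as a stand-in for write syscalls.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical raw stream that counts write calls (stand-in for a file descriptor)."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, b):
        # Each call here corresponds to one write reaching the "OS".
        self.calls += 1
        self.bytes_written += len(b)
        return len(b)


def write_records(stream, n):
    """Write n small JSON lines, one stream.write() call per record."""
    for i in range(n):
        stream.write(b'{"id":%d}\n' % i)


# Unbuffered: every record goes straight to the sink.
unbuffered = CountingSink()
write_records(unbuffered, 10_000)

# Buffered: records accumulate in a 64KB buffer before reaching the sink.
raw = CountingSink()
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
write_records(buffered, 10_000)
buffered.flush()

print("unbuffered write calls:", unbuffered.calls)
print("buffered write calls:  ", raw.calls)
```

Both sinks receive the same bytes; only the number of calls differs, which is the same mechanism behind the syscall reduction reported above.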
Refactored format detection to avoid reopening input files. The tool now reads the first line once for detection (checking .jsonl extension first as a fast path), then preserves that line for processing. This eliminates redundant I/O operations while maintaining efficient streaming processing for JSONL files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
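A minimal Python sketch of this read-once detection follows. The names `detect_format` and `iter_lines` are hypothetical, and the content heuristic (a first line that parses as a complete JSON value means JSONL) is an assumption; the actual detection rule in the tool may differ.

```python
import io
import json


def detect_format(path, stream):
    """Return (format, first_line) without reopening the input.

    `stream` is an already-open text stream positioned at the start.
    """
    first_line = stream.readline()
    # Fast path: trust the .jsonl extension and skip content inspection.
    if path.lower().endswith(".jsonl"):
        return "jsonl", first_line
    # Assumed heuristic: a first line that is a complete JSON value on its
    # own suggests one record per line; otherwise treat it as a single
    # (possibly pretty-printed) JSON document.
    try:
        json.loads(first_line)
    except json.JSONDecodeError:
        return "json", first_line
    return "jsonl", first_line


def iter_lines(first_line, stream):
    """Yield the preserved first line, then the rest of the stream."""
    if first_line:
        yield first_line
    yield from stream


source = io.StringIO('{"a": 1}\n{"a": 2}\n')
fmt, first = detect_format("records.txt", source)
print(fmt, sum(1 for _ in iter_lines(first, source)))
```

Passing the first line along with the stream means nothing is read twice and the input can also be a non-seekable source such as stdin.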
Applied the same optimizations from the Rust version (commit cb6a945):
- Refactored format detection to read first line once and pass it through to processing functions, eliminating file reopening
- Added explicit buffered I/O wrappers for file streams
- Modified process_jsonl() and process_json() to accept first_line parameter
- Replaced enumerate() with manual line counting for better control

Performance improvements:
- Python: ~1.0s for 10k records (~10k records/sec)
- Rust: ~0.13s for 10k records (~77k records/sec)

The 7.7x performance gap is expected due to boto3 overhead and Python's interpreted nature. No critical performance issues found in the Python implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
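The `first_line` parameter and manual line counting might look roughly like the sketch below. Only the function name `process_jsonl` and the `first_line` parameter come from the commit message; the remaining parameters (`reader`, `writer`, `convert`), the return value, and the error format are assumptions made for illustration.

```python
import io
import json


def process_jsonl(first_line, reader, writer, convert):
    """Convert one JSON record per line, starting with the preserved first line.

    Returns the number of records written. `convert` is a hypothetical
    callable that maps one parsed record to its converted form.
    """
    line_no = 0  # counted manually because the first line was read elsewhere
    written = 0
    line = first_line
    while line:
        line_no += 1
        stripped = line.strip()
        if stripped:  # skip blank lines but still count them
            try:
                record = json.loads(stripped)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no}: invalid JSON: {exc}") from exc
            writer.write(json.dumps(convert(record)) + "\n")
            written += 1
        line = reader.readline()
    return written


source = io.StringIO('{"a": 1}\n\n{"b": 2}\n')
sink = io.StringIO()
count = process_jsonl(source.readline(), source, sink, lambda record: record)
print(count, "records written")
```

With the first line consumed before the loop, a manual counter keeps reported line numbers aligned with the file, which `enumerate()` over the remaining stream would not do without an offset.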
Clone of ddb_convert_rust implementation in Python without boto3 library.

Features:
- Convert between DynamoDB JSON and normal JSON formats
- Support for JSONL and single JSON files with auto-detection
- Streaming processing with buffered I/O
- All DynamoDB types supported (S, N, BOOL, NULL, M, L, SS, NS, BS, B)
- CLI with same interface as Rust version

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
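One direction of the conversion (DynamoDB JSON to normal JSON) can be sketched with the standard library alone. The names `from_ddb` and `item_from_ddb` are hypothetical, and several choices here are assumptions rather than the tool's documented behavior: numbers become `int` or `float`, sets become lists, and binary values stay as their base64 strings.

```python
def _number(text):
    """Parse a DynamoDB number string as int when possible, else float (assumption)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def from_ddb(value):
    """Convert one typed DynamoDB value, e.g. {"S": "x"}, to a plain Python value."""
    if not isinstance(value, dict) or len(value) != 1:
        raise ValueError(f"expected a single-key type wrapper, got {value!r}")
    (tag, inner), = value.items()
    if tag in ("S", "BOOL", "B"):
        return inner  # B is left as its base64 string (assumption)
    if tag == "N":
        return _number(inner)
    if tag == "NULL":
        return None
    if tag == "M":
        return {key: from_ddb(item) for key, item in inner.items()}
    if tag == "L":
        return [from_ddb(item) for item in inner]
    if tag in ("SS", "BS"):
        return list(inner)  # sets become lists (assumption)
    if tag == "NS":
        return [_number(item) for item in inner]
    raise ValueError(f"unknown DynamoDB type tag: {tag!r}")


def item_from_ddb(item):
    """Convert a top-level item (attribute name -> typed value)."""
    return {key: from_ddb(value) for key, value in item.items()}


example = {"id": {"N": "42"}, "tags": {"SS": ["a", "b"]}, "meta": {"M": {"ok": {"BOOL": True}}}}
print(item_from_ddb(example))
```

Converting numbers to `float` can lose precision for large or high-precision values; a stricter variant could use `decimal.Decimal` instead.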
scan_json example to use buffered io; close #117