A high-performance two-stage JSON fragment scanner written in Rust. Extracts complete JSON objects and arrays from documents containing mixed content (log files, JSON Lines, etc.).
- Two-stage pipeline: SIMD character classification + fragment extraction
- SIMD-accelerated: AVX2/SSE4.2 with automatic scalar fallback
- Zero-copy API: Buffer reuse via StagedScanner eliminates repeated allocations
- Fragment detection: Identifies JSON objects ({}) and arrays ([])
- Error reporting: Detailed error information for incomplete/invalid fragments
- Position tracking: Absolute byte offsets for each fragment
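To make the two-stage idea concrete, here is a minimal scalar sketch of such a pipeline. It is not the crate's implementation: the function names `classify` and `extract` are hypothetical, stage 1 uses a plain byte loop where the real scanner would use SIMD, and this version reports only complete fragments as `(start, end)` byte offsets.

```rust
/// Stage 1 (scalar stand-in for a SIMD classifier): record the offsets of
/// brackets and of quotes that are not escaped by a preceding backslash run.
/// Hypothetical helper, not part of the json-extractor API.
fn classify(input: &[u8]) -> Vec<usize> {
    let mut structural = Vec::new();
    let mut backslash_run = 0usize;
    for (i, &b) in input.iter().enumerate() {
        if b == b'\\' {
            backslash_run += 1;
            continue;
        }
        match b {
            // A quote after an odd number of backslashes is escaped.
            b'"' if backslash_run % 2 == 0 => structural.push(i),
            b'{' | b'}' | b'[' | b']' => structural.push(i),
            _ => {}
        }
        backslash_run = 0;
    }
    structural
}

/// Stage 2: walk the structural offsets with a bracket stack and return the
/// absolute `(start, end)` byte range of every complete fragment.
fn extract(input: &[u8], structural: &[usize]) -> Vec<(usize, usize)> {
    let mut fragments = Vec::new();
    let mut stack: Vec<u8> = Vec::new();
    let mut start = 0;
    let mut in_string = false;
    for &pos in structural {
        let b = input[pos];
        if b == b'"' {
            // Quotes only matter inside a fragment; stray quotes in the
            // surrounding text are ignored.
            if !stack.is_empty() {
                in_string = !in_string;
            }
            continue;
        }
        if in_string {
            continue; // brackets inside a JSON string are not structural
        }
        if b == b'{' || b == b'[' {
            if stack.is_empty() {
                start = pos;
            }
            stack.push(b);
        } else {
            let expected = if b == b'}' { b'{' } else { b'[' };
            match stack.last() {
                Some(&open) if open == expected => {
                    stack.pop();
                    if stack.is_empty() {
                        fragments.push((start, pos + 1));
                    }
                }
                // Mismatched closer: this sketch simply drops the fragment.
                Some(_) => stack.clear(),
                // Stray closer outside any fragment.
                None => {}
            }
        }
    }
    fragments
}

fn main() {
    let data = br#"some prefix {"name": "Alice"} garbage {"age": 30} more text"#;
    let structural = classify(data);
    for (start, end) in extract(data, &structural) {
        println!("{start}..{end}: {}", String::from_utf8_lossy(&data[start..end]));
    }
}
```

Splitting the work this way keeps stage 1 free of data-dependent state beyond the backslash run, which is what makes it a candidate for vectorization, while all nesting logic lives in stage 2.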
Add this to your Cargo.toml:
[dependencies]
json-extractor = "0.1.0"

Extract the first JSON fragment from a string:
use json_extractor::extract_first;
let input = r#"some log prefix {"name": "Alice"} tail"#;
assert_eq!(extract_first(input), Some(r#"{"name": "Alice"}"#));

Use StagedScanner for full control and buffer reuse across repeated scans:
use json_extractor::StagedScanner;
let mut scanner = StagedScanner::new();
let data = br#"some prefix {"name": "Alice"} garbage {"age": 30} more text"#;
let fragments = scanner.scan_fragments(data);
assert_eq!(fragments.len(), 2);
assert!(fragments[0].is_complete());
assert_eq!(&data[fragments[0].start..fragments[0].end()], br#"{"name": "Alice"}"#);

Inspect the status of a fragment to handle incomplete input:
use json_extractor::{StagedScanner, FragmentStatus, ErrorKind};
let mut scanner = StagedScanner::new();
let data = br#"{"unterminated": "value"#;
let fragments = scanner.scan_fragments(data);
match &fragments[0].status {
    FragmentStatus::Incomplete(err) => {
        println!("Error: {err}");
    }
    FragmentStatus::Complete => {}
}

Benchmarked on x86_64 with AVX2:
| Workload | Throughput |
|---|---|
| Long strings (1KB) | 14.9 GiB/s |
| Large arrays (10k) | 3.44 GiB/s |
| Mixed log files | 1.63 GiB/s |
| Simple objects | 1.21 GiB/s |
| Deep nesting (50) | 1.10 GiB/s |
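For readers unfamiliar with the unit, the sketch below shows how a GiB/s figure is derived from a byte count and an elapsed time. It is a hand-rolled illustration, not the crate's benchmark harness: `gib_per_sec` and `count_braces` are hypothetical helpers, and `count_braces` is only a stand-in workload.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Convert a processed byte count and elapsed time into GiB/s
/// (1 GiB = 2^30 bytes). Hypothetical helper for illustration.
fn gib_per_sec(bytes: usize, elapsed: Duration) -> f64 {
    (bytes as f64 / (1u64 << 30) as f64) / elapsed.as_secs_f64()
}

/// Stand-in workload: count opening braces in the input.
fn count_braces(input: &[u8]) -> usize {
    input.iter().filter(|&&b| b == b'{').count()
}

fn main() {
    // Build a synthetic input by repeating a small JSON object.
    let input = br#"{"name": "Alice"} "#.repeat(1_000_000);

    let start = Instant::now();
    // black_box keeps the optimizer from discarding the work.
    let braces = count_braces(black_box(&input));
    let elapsed = start.elapsed();

    println!(
        "{} braces in {} bytes: {:.2} GiB/s",
        black_box(braces),
        input.len(),
        gib_per_sec(input.len(), elapsed)
    );
}
```

A single timed pass like this is noisy; a benchmark harness repeats the measurement many times and reports a statistical summary.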
Run benchmarks:
cargo bench --bench scanner_bench 2>/dev/null

- extract_first: Extract the first complete JSON fragment from a &str. Simplest entry point.
- StagedScanner: Stateful scanner with buffer reuse. Best for repeated scans or when you need all fragments.
- JsonFragmentScanner: Convenience stateless wrapper (allocates per call).
- Fragment: Extracted fragment with start, length, status, end(), is_complete().
- FragmentStatus: Complete or Incomplete(ErrorKind).
- ErrorKind: Detailed error variants (unterminated strings, mismatched brackets, etc.).
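As a rough mental model of how the result types listed above might fit together, here is a self-contained sketch. These are local stand-in definitions, not the crate's source: the `ErrorKind` variant names are assumptions based on the description "unterminated strings, mismatched brackets", and the real types may carry more fields or variants.

```rust
use std::fmt;

/// Hypothetical error variants; the crate's real `ErrorKind` may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum ErrorKind {
    UnterminatedString,
    MismatchedBracket,
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::UnterminatedString => write!(f, "unterminated string"),
            ErrorKind::MismatchedBracket => write!(f, "mismatched bracket"),
        }
    }
}

/// Either a complete fragment or an incomplete one with a reason.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FragmentStatus {
    Complete,
    Incomplete(ErrorKind),
}

/// A fragment located by absolute byte offset and length.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fragment {
    start: usize,
    length: usize,
    status: FragmentStatus,
}

impl Fragment {
    /// Exclusive end offset, so `data[start..end()]` is the fragment.
    fn end(&self) -> usize {
        self.start + self.length
    }

    fn is_complete(&self) -> bool {
        matches!(self.status, FragmentStatus::Complete)
    }
}

fn main() {
    let data = br#"prefix {"age": 30} tail"#;
    // Offsets written by hand here; a scanner would compute them.
    let fragment = Fragment { start: 7, length: 11, status: FragmentStatus::Complete };
    if fragment.is_complete() {
        let bytes = &data[fragment.start..fragment.end()];
        println!("{}", String::from_utf8_lossy(bytes));
    }

    let broken = Fragment {
        start: 0,
        length: 5,
        status: FragmentStatus::Incomplete(ErrorKind::UnterminatedString),
    };
    if let FragmentStatus::Incomplete(err) = &broken.status {
        println!("Error: {err}");
    }
}
```

Storing `start` and `length` as plain offsets, with no borrowed slice inside the fragment, is one way a scanner can hand out results while still reusing its internal buffers between scans.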
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributions are welcome!
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.