Initial implementation #1
Conversation
/gemini review
Walkthrough

This change introduces the initial implementation of a zero-allocation, no_std-friendly JSON pull parser library and its supporting tokenizer. It adds parser and tokenizer crates, detailed configuration and design documentation, flexible number and string handling, escape processing, multiple parser implementations, comprehensive examples, and extensive tests. Workspace and development configuration files are also included.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Parser
    participant Tokenizer
    participant Buffer
    participant ScratchBuffer
    User->>Parser: Create PullParser(input, scratch_buffer)
    loop Event Iteration
        Parser->>Tokenizer: next_token()
        Tokenizer-->>Parser: EventToken (e.g., String, Number, StartObject)
        alt String Event
            Parser->>Buffer: Get string slice
            alt No escapes
                Buffer-->>Parser: Borrowed slice
            else Escapes found
                Parser->>ScratchBuffer: Unescape and copy
                ScratchBuffer-->>Parser: Unescaped slice
            end
            Parser-->>User: Event::String(Borrowed/Unescaped)
        else Number Event
            Parser->>Buffer: Get number slice
            Parser->>Parser: Parse number (int/float/overflow)
            Parser-->>User: Event::Number(JsonNumber)
        else Structural Event
            Parser-->>User: Event (StartObject, EndArray, etc.)
        end
    end
    Parser-->>User: Event::EndDocument
```
Summary of Changes
Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents the initial implementation of a highly configurable, zero-allocation JSON pull parser designed for `no_std` environments, particularly embedded systems. It establishes a multi-crate workspace, lays down core architectural principles, and provides robust parsing capabilities with a focus on memory efficiency and compile-time configurability.
Highlights
- Initial Project Setup: This pull request establishes the foundational structure for a new Rust-based JSON pull parser, including workspace configuration, Git attributes, and pre-commit hooks for consistent code quality and formatting.
- Core JSON Parser Implementation: The `stax` crate introduces a `no_std`, zero-allocation JSON pull parser with flexible configuration options for embedded environments, including configurable integer widths (`i32`/`i64`) and various float handling behaviors (full support, error on float, truncate float, or raw string preservation).
- Streaming and Slice-based Parsers: Two primary parser implementations are provided: a `DirectParser` for streaming input from a `Reader` trait with a single internal buffer, and a `PullParser` (FlexParser) for parsing from a slice with a copy-on-escape optimization for string unescaping.
- Modular Tokenizer: A separate `ujson` tokenizer crate is introduced, featuring a non-recursive design and a generic `BitStack` implementation (including `ArrayBitStack`) for efficient tracking of arbitrary nesting depths, crucial for embedded systems.
- Comprehensive Documentation and Examples: Detailed `DESIGN.md` and `CONFIGURATION.md` documents outline the architectural decisions and configurable aspects of the parser. A `demos` crate provides multiple examples showcasing different parser configurations and usage patterns.
Code Review
This pull request introduces the initial implementation of the `stax` JSON parser and its underlying tokenizer. The project is well-structured with a workspace, examples, and documentation. The code demonstrates a strong focus on `no_std` compatibility and performance, particularly with the copy-on-escape mechanism and configurable features.

I've identified several areas for improvement, including a critical bug in the `ArrayBitStack` implementation, a potential stack overflow in the recursive parser, and several places where errors are not propagated correctly. I've also made suggestions to improve dependency management. Overall, this is a very solid foundation for a high-performance JSON parser. Addressing the issues raised will significantly improve its robustness and correctness.
Code Review
The code changes introduce a high-level, zero-allocation JSON pull parser with flexible configuration for integer and float handling, suitable for embedded and no_std environments. I've found a few issues, including a critical correctness bug in the `ArrayBitStack` implementation, and some dependency improvements. Addressing these will significantly improve the robustness and correctness of the library.
Actionable comments posted: 12
🔭 Outside diff range comments (1)
tokenizer/Cargo.toml (1)

`1-16`: **Align crate metadata & naming with folder, add missing fields**

The crate lives under `tokenizer/` but is published as `ujson`, which is confusing. Pick one or document the rationale. Also add standard metadata and tighten feature gating:

```diff
 [package]
-name = "ujson"
-version = "0.1.0"
+name = "tokenizer"  # or rename the folder; keep consistent
+version = "0.1.0"
 edition = "2021"
+license = "MIT OR Apache-2.0"
+repository = "https://github.com/kaidokert/picojson-rs"
+description = "Zero-allocation JSON tokenizer used by the stax pull parser"
+
+# Prevent accidental publish until 1.0
+publish = false

 [dependencies]
 clap = { version = "4.5.31", optional = true, features = ["derive"] }
 log = "0.4.26"

 [features]
-clap = ["dep:clap"]
+default = []        # keep no_std friendly default
+cli = ["dep:clap"]  # clearer than repeating the crate name
```

Benefits:

- Avoids name/path mismatch confusion.
- Provides license/compliance data.
- Ensures examples don't leak to crates.io unintentionally.
- Clearer feature naming.
🧹 Nitpick comments (25)

.pre-commit-config.yaml (1)

`1-24`: **Add Rust-specific quality hooks for stronger gatekeeping**

`rustfmt` is covered, but you can cheaply catch many issues before CI by adding `cargo clippy --all-targets --all-features -- -D warnings` and a plain `cargo check`. They are available in the same `pre-commit-rust` repo.

```diff
 hooks:
   - id: fmt
     name: Fmt
+  - id: clippy
+    name: Clippy (deny warnings)
+    args: ["--all-targets", "--all-features", "--", "-D", "warnings"]
+  - id: cargo-check
+    name: Cargo Check
```

demos/src/lib.rs (1)
`1-2`: **Empty library file – clarify intent**

Currently the crate compiles to an empty `rlib`. If the `demos` package is meant only for binaries/examples, consider:

- Removing `src/lib.rs` (and the `lib` section in Cargo.toml) and keeping just `src/bin/*`.
- Or adding a minimal doc-comment so `cargo doc` doesn't emit "crate has no documentation".

```diff
-
+//! Helper crate that aggregates demonstration binaries for `stax`.
```

tokenizer/README.md (1)
`1-3`: **Minor wording tweak for conciseness**

```diff
-Non-recursive JSON stream tokenizer.
-
-It uses 1 bit per nesting depth to track whether the level represents an array [] or an object {}
+Non-recursive JSON stream tokenizer using one bit per nesting level to distinguish arrays (`[]`) from objects (`{}`).
```

demos/Cargo.toml (1)
`1-4`: **Add metadata & mark crate as non-publishable**

`demos` is only for examples; consider preventing accidental publication and improving discoverability by adding standard fields:

```diff
 [package]
 name = "demos"
 version = "0.0.1"
 edition = "2021"
+publish = false  # avoid pushing examples to crates.io
+license = "MIT OR Apache-2.0"
+description = "Demonstration binaries for the stax / tokenizer workspace"
+repository = "https://github.com/kaidokert/picojson-rs"
+keywords = ["json", "examples"]
```

Cargo.toml (1)
`1-3`: **Minor readability nit: add spacing & newline**

The members list is valid but a bit cramped; adding spaces (or line-per-item) makes diffs cleaner:

```diff
-members = [ "stax","tokenizer", "demos"]
+members = [
+    "stax",
+    "tokenizer",
+    "demos",
+]
```

tokenizer/src/tokenizer/testdata/i_structure_500_nested_arrays.json (1)
`1-1`: **Consider generating this deep-nest test at runtime**

Checking very deep nesting is great, but a 500-level literal inflates repo size and hurts diff readability. You could:

- Generate it in the test with a loop.
- Store it gzip-compressed and expand on test run.

No functional change required, just a maintenance tip.
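For the first option, the fixture can be a few lines in the test itself. A minimal sketch (plain `std` for brevity; the helper name is illustrative, not part of the crate):

```rust
// Build an N-level nested-array document ("[[[...]]]") at test time,
// instead of checking a 500-level literal into testdata/.
fn nested_arrays(depth: usize) -> String {
    let mut s = String::with_capacity(depth * 2);
    for _ in 0..depth {
        s.push('[');
    }
    for _ in 0..depth {
        s.push(']');
    }
    s
}

fn main() {
    let json = nested_arrays(500);
    assert_eq!(json.len(), 1000);
    assert!(json.starts_with('[') && json.ends_with(']'));
}
```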
tokenizer/src/bin/main.rs (3)
`4-4`: **Remove unused commented import.**

The commented import is not needed since `std::process::exit` is used with the full path.

```diff
-//use std::process;
```

`7-7`: **Remove debug print statement.**

This appears to be leftover debug code that should be removed from the final implementation.

```diff
-    println!("Hello, world!");
```

`15-21`: **Consider memory usage for large files.**

Reading the entire file into memory could be problematic for very large JSON files. Consider if streaming parsing would be more appropriate for this use case.
stax/Cargo.toml (2)
`22-25`: **Address TODO comments.**

These TODO comments indicate unfinished configuration decisions that should be resolved. The comments suggest:

- Lines 22-23: Decision needed on log vs defmt integration
- Lines 24-25: the `test-env-log` dependency may not be needed

Should these be cleaned up or converted to GitHub issues for tracking?

`10-17`: **Consider feature flag organization.**

The mutually exclusive features (int32/int64 and float behavior options) might benefit from better organization or documentation to prevent invalid combinations. Consider adding feature conflict validation or documenting the mutual exclusivity more clearly in the manifest comments.
30-39
: Simplify Deref implementation.The
Deref
implementation duplicates the pattern matching logic fromas_str()
. Consider delegating to the existing method for consistency and maintainability.impl Deref for String<'_, '_> { type Target = str; fn deref(&self) -> &Self::Target { - match self { - String::Borrowed(s) => s, - String::Unescaped(s) => s, - } + self.as_str() } }stax/README.md (3)
`45-45`: **Add missing comma for better readability.**

Static analysis correctly identified the missing comma after "By default".

```diff
-By default this is a 32-bit int, but can be changed
+By default, this is a 32-bit int, but can be changed
```

`50-55`: **Fix markdown list indentation.**

The unordered list items should start at column 0 for proper markdown formatting.

```diff
- * int64 ( default ) - numbers are returned in int64 values
- * int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
- * float - full float support is included.
- * float-error - Any floating point input will yield an error, to reduce float math dependency
- * float-skip - Float values are skipped.
- * float-truncate - float values are truncated to integers. Scientific notation will generate an error
+* int64 (default) - numbers are returned as int64 values
+* int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
+* float - full float support is included
+* float-error - Any floating-point input will yield an error, to reduce float math dependency
+* float-skip - Float values are skipped
+* float-truncate - float values are truncated to integers. Scientific notation will generate an error
```

`59-59`: **Add missing comma for consistency.**

Same grammar issue as line 45.

```diff
-By default full float and int64 support is enabled.
+By default, full float and int64 support is enabled.
```

tokenizer/src/lib.rs (1)
`10-30`: **Consider if the BitStackCore trait adds value.**

The `BitStackCore` trait aggregates multiple trait bounds, but it's only used as a re-export. Consider whether this abstraction is necessary or if the individual bounds could be used directly where needed.

CONFIGURATION.md (1)
`10-10`: **Minor grammar improvement for compound adjective.**

Consider using a hyphen for the compound adjective "full-range" that modifies "integer support".

```diff
-- **`int64`** (default): Use `i64` for full range integer support
+- **`int64`** (default): Use `i64` for full-range integer support
```
stax/examples/direct_parser_demo.rs (1)

`63-68`: **Consider documenting the rationale for buffer and chunk sizes.**

While these sizes work well for the demonstration, consider adding a comment explaining why 256 bytes for the buffer and 8 bytes for chunks were chosen. This would help users understand the trade-offs when adapting the example.

```diff
 // Create ArrayReader that reads in small chunks (simulates network streaming)
+// Using 8-byte chunks to demonstrate multiple read calls for the 74-byte input
 let reader = ArrayReader::new(json, 8); // Read 8 bytes at a time

 // Create DirectParser with a reasonably sized buffer
+// 256 bytes is sufficient for this example while keeping memory usage low
 let mut buffer = [0u8; 256];
```

stax/examples/array_bitstack_demo.rs (1)
`93-99`: **Consider propagating the error instead of silently breaking**

The error is caught but not propagated. Since the function returns `Result<(), stax::ParseError>`, you could propagate the error for better error reporting.

```diff
-            Some(Err(_)) => {
-                println!(
-                    "  ! Parse error encountered at depth {}, continuing...",
-                    depth
-                );
-                break;
-            }
+            Some(Err(e)) => {
+                println!("  ! Parse error encountered at depth {}: {:?}", depth, e);
+                return Err(e);
+            }
```

DESIGN.md (2)
`152-152`: **Fix grammatical error: "In addition of" → "In addition to"**

```diff
-More: In addition of taking just slice [u8] as input, we should accept an `impl Reader` of some sort.
+More: In addition to taking just slice [u8] as input, we should accept an `impl Reader` of some sort.
```

`155-155`: **Fix typo: "implent" → "implement"**

```diff
-make our own, and auto-implent it for arrays and slices or for anything that looks like AsRef<[u8]>
+make our own, and auto-implement it for arrays and slices or for anything that looks like AsRef<[u8]>
```

stax/src/json_number.rs (1)
`178-196`: **Consider adding more robust scientific notation handling**

The float-truncate implementation rejects all scientific notation, but simple cases like "1e3" (= 1000) could be handled without float math by parsing the exponent and shifting. Would you like me to provide an implementation that handles simple scientific notation cases without using float operations?
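For reference, such a shift-based truncation might look like the following sketch (illustrative only; `truncate_simple_exponent` is not the crate's API, and negative or oversized exponents are rejected rather than handled):

```rust
// Truncate simple scientific notation like "1e3" or "12E2" to an integer
// without float math, by multiplying the mantissa with a power of ten.
// Decimal points and negative exponents fall through to None.
fn truncate_simple_exponent(s: &str) -> Option<i64> {
    let (mantissa, exp) = s.split_once(['e', 'E'])?;
    let m: i64 = mantissa.parse().ok()?;
    let e: u32 = exp.parse().ok()?; // "-2" fails to parse as u32 -> None
    m.checked_mul(10i64.checked_pow(e)?) // overflow -> None
}

fn main() {
    assert_eq!(truncate_simple_exponent("1e3"), Some(1000));
    assert_eq!(truncate_simple_exponent("12E2"), Some(1200));
    assert_eq!(truncate_simple_exponent("1e-2"), None); // negative exponent rejected
    assert_eq!(truncate_simple_exponent("9e99"), None); // overflow rejected
}
```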
stax/src/copy_on_escape.rs (1)
`162-163`: **Simplify error conversion.**

The error conversion reads better as a single chained expression:

```diff
-        let unescaped_str =
-            core::str::from_utf8(unescaped_slice).map_err(ParseError::InvalidUtf8)?;
+        let unescaped_str = core::str::from_utf8(unescaped_slice)
+            .map_err(ParseError::InvalidUtf8)?;
```

tokenizer/src/tokenizer/mod.rs (2)
`183-184`: **Document test-only variant purpose.**

The `Uninitialized` variant should have documentation explaining its test-only purpose.

```diff
     ArrayEnd,
+    /// Test-only variant used as a placeholder in test arrays
     #[cfg(test)]
     Uninitialized,
```

`1542-1543`: **Track TODO for array BitStack support.**

The comment indicates that array BitStack support needs a custom implementation. Would you like me to create an issue to track implementing BitStack support for arrays?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
- .gitattributes (1 hunks)
- .gitignore (1 hunks)
- .pre-commit-config.yaml (1 hunks)
- CONFIGURATION.md (1 hunks)
- Cargo.toml (1 hunks)
- DESIGN.md (1 hunks)
- TODO.md (1 hunks)
- demos/Cargo.toml (1 hunks)
- demos/src/lib.rs (1 hunks)
- stax/Cargo.toml (1 hunks)
- stax/README.md (1 hunks)
- stax/examples/advanced_bitstack_demo.rs (1 hunks)
- stax/examples/array_bitstack_demo.rs (1 hunks)
- stax/examples/direct_parser_demo.rs (1 hunks)
- stax/examples/no_float_demo.rs (1 hunks)
- stax/examples/simple_api_demo.rs (1 hunks)
- stax/src/copy_on_escape.rs (1 hunks)
- stax/src/direct_buffer.rs (1 hunks)
- stax/src/direct_parser.rs (1 hunks)
- stax/src/escape_processor.rs (1 hunks)
- stax/src/flex_parser.rs (1 hunks)
- stax/src/json_number.rs (1 hunks)
- stax/src/json_string.rs (1 hunks)
- stax/src/lib.rs (1 hunks)
- stax/src/number_parser.rs (1 hunks)
- stax/src/shared.rs (1 hunks)
- stax/src/slice_input_buffer.rs (1 hunks)
- stax/tests/api_test.rs (1 hunks)
- stax/tests/configurable_numbers.rs (1 hunks)
- tokenizer/Cargo.toml (1 hunks)
- tokenizer/README.md (1 hunks)
- tokenizer/src/bin/main.rs (1 hunks)
- tokenizer/src/bitstack/mod.rs (1 hunks)
- tokenizer/src/lib.rs (1 hunks)
- tokenizer/src/tokenizer/mod.rs (1 hunks)
- tokenizer/src/tokenizer/testdata/i_structure_500_nested_arrays.json (1 hunks)
- tokenizer/tests/array_bitstack_test.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
tokenizer/tests/array_bitstack_test.rs (1)
- tokenizer/src/bitstack/mod.rs (9): `default` (5-5, 24-26, 57-59), `top` (11-11, 37-39, 93-96), `pop` (9-9, 31-35, 76-91)

stax/src/json_string.rs (2)
- stax/src/json_number.rs (4): `as_str` (81-86), `as_ref` (110-112), `deref` (118-120), `fmt` (124-141)
- tokenizer/src/tokenizer/mod.rs (1): `fmt` (232-238)

stax/src/json_number.rs (2)
- stax/src/json_string.rs (4): `as_str` (16-21), `as_ref` (25-27), `deref` (33-38), `fmt` (42-44)
- tokenizer/src/tokenizer/mod.rs (1): `fmt` (232-238)

stax/src/slice_input_buffer.rs (2)
- stax/src/number_parser.rs (7): `new` (95-101), `get_number_slice` (15-15, 105-112), `current_position` (18-18, 114-116), `is_empty` (21-21, 118-120)
- stax/src/flex_parser.rs (1): `new` (44-57)
🪛 LanguageTool
stax/README.md
- ~45: Did you mean "By default,"? Context: "...state, one bit for every nesting level. By default this is a 32-bit int, but can be change..." (BY_DEFAULT_COMMA)
- ~50: The preposition 'as' seems more likely in this position. Context: "...nt64 ( default ) - numbers are returned in int64 values * int32 - integers are re..." (AI_HYDRA_LEO_REPLACE_IN_AS)
- ~53: If this is a compound adjective that modifies the following noun, use a hyphen. Context: "...pport is included. * float-error - Any floating point input will yield an error, to reduce fl..." (EN_COMPOUND_ADJECTIVE_INTERNAL)
- ~59: Did you mean "By default,"? Context: "... Please see examples/no_float_demo.rs By default full float and int64 support is enabled..." (BY_DEFAULT_COMMA)

CONFIGURATION.md
- ~10: If this is a compound adjective that modifies the following noun, use a hyphen. Context: "...- `int64` (default): Use `i64` for full range integer support - `int32`: Use `i32..." (EN_COMPOUND_ADJECTIVE_INTERNAL)
- ~16: Loose punctuation mark. Context: "...l float parsing behavior: - `float`: Enable full f64 parsing support - **No ..." (UNLIKELY_OPENING_PUNCTUATION)

DESIGN.md
- ~152: The preposition 'to' seems more likely in this position. Context: "...input IMPORTANT!!! More: In addition of taking just slice [u8] as input, we sho..." (AI_HYDRA_LEO_REPLACE_OF_TO)
- ~162: It appears that an article is missing. Context: "...like println! is convenient and easy. Same should be done with Number, but it's a ..." (MISSING_ARTICLE)
- ~169: A determiner appears to be missing. Consider inserting it. Context: "...ting should do sensible default things. Most tricky is number formatting. The object..." (AI_EN_LECTOR_MISSING_DETERMINER)
🪛 markdownlint-cli2 (0.17.2)
stax/README.md
- Lines 50-55: Unordered list indentation. Expected: 0; Actual: 1 (MD007, ul-indent)
🔇 Additional comments (27)
.gitignore (1)

`22-22`: **Re-consider ignoring `Cargo.lock` at workspace root**

For Rust binary projects or mixed workspaces (binaries + libs), keeping `Cargo.lock` under VCS is the recommended practice – it guarantees reproducible CI and demo builds. Unless every crate in the workspace is intended to be a pure library published on crates.io, tracking the lock-file is safer.

```diff
-Cargo.lock
+# If all crates are libraries published on crates.io you may ignore the lock file,
+# otherwise keep it under version control for reproducibility.
+# Cargo.lock
```

TODO.md (1)
`1-12`: **LGTM – plaintext TODO list**

The task list is clear and scoped; no review remarks.

.gitattributes (1)

`1-1`: **Consistent line endings enforced – looks good**

No issues spotted.

stax/examples/simple_api_demo.rs (1)
`5-40`: **Well-structured example with good practices.**

This example effectively demonstrates the PullParser API with:

- Proper error handling using a `Result` return type
- Clear pattern matching for different event types
- Good use of the iterator pattern
- Informative output for learning purposes

The code follows Rust best practices and provides a clear introduction to the API.
stax/src/json_string.rs (2)
`6-22`: **Well-designed copy-on-escape string type.**

The `String` enum provides an elegant solution for handling escaped vs unescaped strings without heap allocation. The design aligns well with the zero-allocation goals and follows similar patterns used in `JsonNumber` (as seen in the relevant code snippets).

`47-63`: **Good test coverage for core functionality.**

The tests verify the key `Deref` behavior and string compatibility, which are essential for the ergonomic API.
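The idea behind the type can be sketched as follows (a simplified, hypothetical mirror of the crate's `String<'a, 'b>`; it uses `std` and different names purely for illustration):

```rust
use core::ops::Deref;

// Borrow the input slice when no escapes occur; fall back to a slice of a
// caller-provided scratch buffer only when unescaping was required.
enum JsonStr<'input, 'scratch> {
    Borrowed(&'input str),    // zero-copy: no escapes in the source
    Unescaped(&'scratch str), // unescaped bytes written to the scratch buffer
}

impl JsonStr<'_, '_> {
    fn as_str(&self) -> &str {
        match self {
            JsonStr::Borrowed(s) => s,
            JsonStr::Unescaped(s) => s,
        }
    }
}

impl Deref for JsonStr<'_, '_> {
    type Target = str;
    fn deref(&self) -> &Self::Target {
        self.as_str() // delegate, as the review above suggests
    }
}

fn main() {
    let s = JsonStr::Borrowed("hello");
    assert_eq!(&*s, "hello");
    assert_eq!(s.len(), 5); // str methods available through Deref
}
```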
1-60
: Comprehensive and well-structured documentation.The README effectively explains the library's purpose, provides clear usage examples, and documents configuration options. The content clearly differentiates when to use this library vs alternatives like serde-json.
tokenizer/tests/array_bitstack_test.rs (1)
1-57
: Excellent test coverage for ArrayBitStack.The test suite comprehensively validates the ArrayBitStack implementation with well-chosen scenarios:
- Basic LIFO operations and non-mutating
top()
behavior- Multi-element handling with larger capacities
- Cross-element boundary operations when exceeding single element capacity
The tests use appropriate type parameters to demonstrate flexibility and cover edge cases effectively.
tokenizer/src/lib.rs (1)
`1-8`: **Well-structured crate entry point.**

The no_std configuration and module structure are appropriate for a tokenizer library. The public API re-exports provide a clean interface.
stax/src/lib.rs (2)
`1-30`: **Well-organized crate structure with comprehensive module coverage.**

The module organization effectively separates concerns (parsing strategies, buffer management, JSON types, utilities) and the public API re-exports provide a clean interface for users.

`31-37`: **Error conversion implementation is correct.**

The mapping from `slice_input_buffer::Error::ReachedEnd` to `ParseError::EndOfData` is straightforward and appropriate.
1-142
: Excellent comprehensive configuration documentation.This documentation provides thorough coverage of the parser's configurability with:
- Clear feature flag explanations
- Practical dependency configuration examples
- Detailed API usage patterns with conditional compilation
- Testing instructions and edge case handling
The content effectively guides users in choosing appropriate configurations for their use cases.
stax/examples/advanced_bitstack_demo.rs (1)
`1-90`: **Excellent demonstration of BitStack configuration flexibility.**

This example effectively showcases the library's adaptability with three well-chosen scenarios:

- Standard configuration for typical use cases
- Memory-efficient setup for resource-constrained environments
- Deep-nesting configuration for complex JSON structures

The depth tracking logic is correct, error handling is appropriate, and the output formatting makes the differences clear to users.
stax/examples/direct_parser_demo.rs (1)
`7-49`: **Good implementation of the Reader trait for demonstration purposes!**

The ArrayReader implementation effectively simulates streaming behavior with configurable chunk sizes. Good use of `saturating_sub` to prevent underflow.

stax/examples/no_float_demo.rs (1)

`1-166`: **Excellent demonstration of configurable number parsing!**

The example effectively showcases different number parsing behaviors based on compile-time features. The configuration detection, test case selection, and summary output provide clear insights into how the parser behaves under various settings.
stax/tests/api_test.rs (1)
`1-131`: **Well-structured and comprehensive API tests!**

The test suite provides good coverage of the PullParser API, including:

- Parsing without escapes
- Error handling for escapes without a buffer
- Successful escape handling with a buffer
- Mixed data types
- Borrowed vs unescaped string handling
stax/tests/configurable_numbers.rs (1)
`1-248`: **Excellent test coverage for configurable number parsing!**

The test suite comprehensively covers all number parsing configurations, including:

- Integer overflow detection for i32/i64
- Float error behavior
- Float truncation modes
- Scientific notation handling
- Default float-disabled behavior
- Embedded-friendly configurations

The use of conditional compilation ensures tests run only for their intended configurations.

stax/src/number_parser.rs (1)
stax/src/number_parser.rs (1)
1-82
: Well-designed number parsing abstractionThe
NumberExtractor
trait and parsing functions provide a clean, reusable abstraction for extracting and parsing numbers from different buffer implementations. The delimiter handling logic is particularly well thought out.stax/src/slice_input_buffer.rs (1)
`1-79`: **Clean buffer implementation with proper trait support**

The `SliceInputBuffer` implementation is well-structured with clear separation of concerns. The `NumberExtractor` implementation correctly handles position adjustments for parser compatibility.

stax/src/copy_on_escape.rs (2)
`1-27`: **Well-designed struct with clear documentation.**

The `CopyOnEscape` struct is well-organized with appropriate lifetime parameters and comprehensive documentation for each field. The separation between global buffer state and per-string state is clean.

`175-337`: **Excellent test coverage.**

The tests comprehensively cover all major scenarios, including:

- Zero-copy optimization for strings without escapes
- Escape handling with buffer switching
- Buffer reuse across multiple strings
- Unicode escape processing
- Error conditions

The test structure is clear and well-organized.
tokenizer/src/tokenizer/mod.rs (1)

`395-1048`: **Well-implemented JSON tokenizer state machine.**

The parsing logic is comprehensive and correctly implements the JSON grammar:

- Proper handling of all number formats (integers, decimals, exponents)
- Complete escape sequence processing for strings
- Correct structural validation for objects and arrays
- Good error reporting with specific error kinds

The state machine design makes the parser easy to understand and maintain.
stax/src/escape_processor.rs (2)

`149-210`: **Well-designed Unicode escape collector.**

The `UnicodeEscapeCollector` provides a clean interface for incremental hex digit collection with:

- Proper state management and reset capability
- Good bounds checking to prevent buffer overflow
- Clear completion tracking
- Appropriate error handling

`276-280`: **Note: Unicode surrogate pair handling limitation.**

The comment indicates that the test doesn't fully handle Unicode surrogate pairs. JSON allows encoding characters outside the BMP using surrogate pairs (e.g., `\uD83D\uDE0A` for 😊). Consider whether the parser needs to handle surrogate pairs for full JSON compliance. If so, this would require detecting high surrogates (U+D800-U+DBFF) and combining them with low surrogates (U+DC00-U+DFFF).
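For reference, combining a surrogate pair into a scalar value follows the standard UTF-16 formula. A sketch (a hypothetical helper, not the crate's API):

```rust
// Combine a JSON surrogate pair (e.g. \uD83D \uDE0A) into a char,
// per the UTF-16 decoding algorithm. Returns None for invalid pairs.
fn combine_surrogates(high: u16, low: u16) -> Option<char> {
    if !(0xD800..=0xDBFF).contains(&high) || !(0xDC00..=0xDFFF).contains(&low) {
        return None;
    }
    let code = 0x10000u32
        + (((high as u32) - 0xD800) << 10)
        + ((low as u32) - 0xDC00);
    char::from_u32(code)
}

fn main() {
    // \uD83D\uDE0A => U+1F60A (😊)
    assert_eq!(combine_surrogates(0xD83D, 0xDE0A), Some('\u{1F60A}'));
    assert_eq!(combine_surrogates(0x0041, 0xDE0A), None); // not a high surrogate
}
```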
stax/src/direct_parser.rs (1)

`7-7`: **Remove the bare `log` import if unused.**

This was initially flagged as incorrect import syntax, but `use log;` is in fact valid. However, if you're not using any specific items from the `log` crate, you can remove this line entirely, since the `log!` macros are available when the crate is included as a dependency.

```diff
-use log;
```

stax/src/shared.rs (1)
`1-270`: **Well-structured shared components**

The shared module is well-designed with:

- Clear event and error type definitions
- Consistent error handling patterns through `ParserErrorHandler`
- Useful position calculation utilities in `ContentRange`
- Good separation of concerns
stax/src/direct_buffer.rs (1)
`1-437`: **Excellent buffer management implementation**

The `DirectBuffer` implementation is well-designed with:

- Clear documentation of design principles
- Proper error handling and validation
- Safe handling of overlapping ranges using `copy_within`
- Comprehensive test coverage including edge cases
- Good integration with the `NumberExtractor` trait
Actionable comments posted: 0
♻️ Duplicate comments (5)
stax/src/flex_parser.rs (3)
399-411
: Address TODO comment about escape sequence end events.The TODO comment indicates uncertainty about whether these event handlers are needed.
Would you like me to help investigate whether these End events for escape sequences are used elsewhere in the codebase, or create an issue to track this technical debt?
183-187
: Preserve tokenizer error information.

The specific error from the tokenizer is being discarded and replaced with a generic error message.

```diff
-        if res.is_err() {
-            return Err(ParseError::UnexpectedState(
-                "Failed to pull tokenizer events",
-            ));
-        }
+        res.map_err(|_| ParseError::TokenizerError)?;
```
415-418
: Replace recursion with loop continuation.

The recursive call could cause stack overflow with malformed JSON input.

Since you already have a `while` loop at line 205, continue the loop instead:

```diff
-        } else {
-            // No event was produced, need to call next_event recursively
-            return self.next_event();
-        }
+        }
+        // If no event was produced, the while loop will continue
```

stax/src/direct_parser.rs (2)
494-505
: Preserve reader error information.

The error from `self.reader.read()` is being discarded and mapped to `ParseError::EndOfData`. The reader could fail for reasons other than reaching EOF (e.g., I/O errors, network issues).

Consider adding a new `ParseError` variant to wrap reader errors:

```diff
-            .map_err(|_| ParseError::EndOfData)?;
+            .map_err(|_| ParseError::ReaderError)?;
```

This would require adding a `ReaderError` variant to the `ParseError` enum to preserve the underlying error details.
596-604
: Handle escape buffer errors properly.

The error from `append_byte_to_escape_buffer` is being silently ignored, which could lead to data corruption.

```diff
-        if let Ok(unescaped_char) = EscapeProcessor::process_escape_token(escape_token) {
-            if let Err(_) = self.append_byte_to_escape_buffer(unescaped_char) {
-                // Handle error - for now just continue
-            }
-        }
+        let unescaped_char = EscapeProcessor::process_escape_token(escape_token)
+            .map_err(|_| ParseError::InvalidEscape)?;
+        self.append_byte_to_escape_buffer(unescaped_char)?;
```
🧹 Nitpick comments (2)
tokenizer/src/bin/main.rs (2)
4-4
: Remove commented import.

The commented `std::process` import should be removed as it's not used.

```diff
-//use std::process;
```
7-7
: Remove or replace generic greeting.

The "Hello, world!" message doesn't provide value for a JSON parser utility. Consider removing it or replacing it with a more informative message.

Either remove the line:

```diff
-    println!("Hello, world!");
```

Or replace with something more informative:

```diff
-    println!("Hello, world!");
+    println!("JSON Parser - validating JSON file...");
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- stax/Cargo.toml (1 hunks)
- stax/examples/array_bitstack_demo.rs (1 hunks)
- stax/src/direct_parser.rs (1 hunks)
- stax/src/flex_parser.rs (1 hunks)
- tokenizer/src/bin/main.rs (1 hunks)
- tokenizer/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- stax/Cargo.toml
- stax/examples/array_bitstack_demo.rs
- tokenizer/src/lib.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
stax/src/direct_parser.rs (7)
stax/examples/direct_parser_demo.rs (2)
- read (28-48)
- new (16-22)
stax/src/json_string.rs (2)
- fmt (42-44)
- as_str (16-21)
stax/src/direct_buffer.rs (1)
- new (46-57)
stax/src/flex_parser.rs (3)
- new (44-57)
- next (191-196)
- next_event (200-425)
stax/src/escape_processor.rs (3)
- new (161-166)
- hex_count (207-209)
- process_escape_token (51-55)
stax/src/number_parser.rs (2)
- new (95-101)
- parse_number_event (31-59)
stax/src/shared.rs (8)
- new (83-88)
- quote_position_from_current (153-155)
- number_start_from_current (141-143)
- state_mismatch (220-229)
- bytes_to_utf8_str (197-199)
- string_content_bounds (111-115)
- string_content_bounds_before_escape (125-132)
- invalid_unicode_escape (257-259)
🔇 Additional comments (1)
stax/src/direct_parser.rs (1)
693-1383
: Excellent test coverage!

The comprehensive test suite covers edge cases, error conditions, escape sequences, Unicode handling, and parser state transitions. The tests are well-organized and provide good validation of the implementation.
Actionable comments posted: 3
♻️ Duplicate comments (2)
tokenizer/src/bitstack/mod.rs (2)
31-35
: Critical: `pop()` method violates trait contract by never returning `None`.

This issue was flagged in previous reviews and remains unresolved. The trait documentation clearly states that `pop()` should return `None` when the stack is empty, but this implementation always returns `Some(bit)`.
78-95
: Critical: `ArrayBitStack::pop()` method violates trait contract by never returning `None`.

This implementation also always returns `Some(bit)` and never returns `None`, violating the trait contract that requires `None` when the stack is empty.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tokenizer/src/bitstack/mod.rs
(1 hunks)
🔇 Additional comments (1)
tokenizer/src/bitstack/mod.rs (1)
66-67
: Well done: Dynamic bit width calculation correctly addresses previous hardcoded shift issue.

The implementation now properly calculates the bit width based on the generic type size, fixing the previous critical issue with hardcoded 7-bit shifts that only worked for `u8`.
@sourcery-ai review
Reviewer's Guide

This PR introduces the first end-to-end implementation of a zero-allocation, no-std JSON pull-parsing toolkit, built on top of the low-level tokenizer.

Sequence diagram for streaming JSON parsing with DirectParser

sequenceDiagram
participant App as Application
participant Reader
participant DirectParser
participant Tokenizer
participant DirectBuffer
App->>DirectParser: next_event()
loop While not EndDocument
DirectParser->>DirectBuffer: fill_buffer_from_reader()
DirectParser->>Reader: read(buf)
Reader-->>DirectParser: bytes_read
DirectParser->>Tokenizer: parse_chunk([byte], callback)
Tokenizer-->>DirectParser: Event
alt Event is Begin/End
DirectParser->>DirectBuffer: handle event (e.g. extract string/number)
end
DirectParser-->>App: Event
end
Class diagram for the new JSON Tokenizer and DirectParser

classDiagram
class Tokenizer {
- State state
- usize total_consumed
- ParseContext context
+ new()
+ parse_full(data, callback)
+ parse_chunk(data, callback)
+ finish(callback)
}
class ParseContext {
- depth
- stack
- after_comma
+ new()
+ enter_object()
+ exit_object()
+ enter_array()
+ exit_array()
+ is_object()
+ is_array()
}
class DirectParser {
- Tokenizer tokenizer
- ParserState parser_state
- Reader reader
- DirectBuffer direct_buffer
- ProcessingState processing_state
- Option<PendingContainerEnd> pending_container_end
- UnicodeEscapeCollector unicode_escape_collector
+ new(reader, buffer)
+ next()
+ next_event()
}
class Reader {
<<trait>>
+ read(buf) Result<usize, Error>
}
class DirectBuffer {
+ new(buffer)
+ is_empty()
+ current_byte()
+ advance()
+ get_fill_slice()
+ mark_filled(bytes_read)
+ has_unescaped_content()
+ append_unescaped_byte(byte)
+ clear_unescaped()
}
class Event {
<<enum>>
+ StartObject
+ EndObject
+ StartArray
+ EndArray
+ Key
+ String
+ Number
+ Bool
+ Null
+ EndDocument
}
class EventToken {
<<enum>>
+ True
+ False
+ Null
+ String
+ Key
+ Number
+ NumberAndArray
+ NumberAndObject
+ UnicodeEscape
+ EscapeSequence
+ EscapeQuote
+ EscapeBackslash
+ EscapeSlash
+ EscapeBackspace
+ EscapeFormFeed
+ EscapeNewline
+ EscapeCarriageReturn
+ EscapeTab
}
Tokenizer --> ParseContext
DirectParser --> Tokenizer
DirectParser --> DirectBuffer
DirectParser --> Reader
DirectParser --> Event
DirectParser --> EventToken
DirectParser --> UnicodeEscapeCollector
DirectBuffer --> DirectParser
EventToken --> Event
Class diagram for number and string abstractions

classDiagram
class NumberResult {
<<enum>>
+ Integer(i64 or i32)
+ IntegerOverflow
+ Float(f64)
+ FloatTruncated(i64)
+ FloatDisabled
}
class String {
<<enum>>
+ Borrowed(&str)
+ Unescaped(&str)
+ as_str()
}
class Event {
+ Number(NumberResult)
+ String(String)
+ Key(String)
}
Event --> NumberResult
Event --> String
Class diagram for BitStack and configurable nesting

classDiagram
class BitStack {
<<trait>>
+ push(bool)
+ pop()
+ top() Option<bool>
+ default()
}
class BitStackCore {
<<trait>>
+ from(u8)
+ not()
}
ParseContext --> BitStack
ParseContext --> BitStackCore
Class diagram for error handling and event emission

classDiagram
class Error {
+ kind: ErrKind
+ character: u8
+ position: usize
+ new(kind, character, position)
}
class ErrKind {
<<enum>>
+ EmptyStream
+ UnfinishedStream
+ InvalidRoot
+ InvalidToken
+ UnescapedControlCharacter
+ TrailingComma
+ ContentEnded
+ UnopenedArray
+ UnopenedObject
+ MaxDepthReached
+ InvalidNumber
+ InvalidUnicodeEscape
+ InvalidStringEscape
+ ExpectedObjectKey
+ ExpectedObjectValue
+ ExpectedColon
+ ExpectedArrayItem
}
class Event {
+ Begin(EventToken)
+ End(EventToken)
+ ObjectStart
+ ObjectEnd
+ ArrayStart
+ ArrayEnd
}
Error --> ErrKind
Tokenizer --> Error
Tokenizer --> Event
File-Level Changes
Hey @kaidokert - I've reviewed your changes - here's some feedback:
- The enormous `parse_chunk_inner` state machine should be broken into smaller per-state handler methods to improve readability and maintainability.
- DirectParser and PullParserFlex duplicate much of the event-dispatch and termination logic; extract their shared patterns into a common utility to reduce divergence.
- Your custom `String` enum shadows `std::string::String`; consider renaming it (e.g. to `JsonString`) to avoid confusion with the standard type.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The enormous `parse_chunk_inner` state machine should be broken into smaller per-state handler methods to improve readability and maintainability.
- DirectParser and PullParserFlex duplicate much of the event-dispatch and termination logic—extract their shared patterns into a common utility to reduce divergence.
- Your custom `String` enum shadows `std::string::String`—consider renaming it (e.g. to `JsonString`) to avoid confusion with the standard type.
## Individual Comments
### Comment 1
<location> `stax/src/direct_parser.rs:251` </location>
<code_context>
+ }
+
+ /// Process event and update state, but defer complex processing
+ fn process_tokenizer_event(&mut self, event: ujson::Event) -> Result<EventResult, ParseError> {
+ Ok(match event {
+ // Container events
</code_context>
<issue_to_address>
Consider handling unknown or unexpected events explicitly.
Matching all known event variants explicitly and handling unknown events with an error or unreachable!() will help prevent silent failures if new event types are introduced.
Suggested implementation:
```rust
Ok(match event {
// Container events
ujson::Event::ObjectStart => EventResult::Complete(Event::StartObject),
ujson::Event::ObjectEnd => {
// Check if we're in the middle of parsing a number - if so, extract it first
if matches!(self.parser_state.state, crate::shared::State::Number(_)) {
log::debug!(
"DirectParser: ObjectEnd while in Number state - extracting number first"
);
// Extract the number first, then we'll emit EndObject on the next call
self.pending_container_end = Some(PendingContainerEnd::ObjectEnd);
EventResult::ExtractNumberFromContainer
} else {
EventResult::Complete(Event::EndObject)
}
}
ujson::Event::ArrayStart => EventResult::Complete(Event::StartArray),
ujson::Event::ArrayEnd => {
if matches!(self.parser_state.state, crate::shared::State::Number(_)) {
log::debug!(
"DirectParser: ArrayEnd while in Number state - extracting number first"
);
self.pending_container_end = Some(PendingContainerEnd::ArrayEnd);
EventResult::ExtractNumberFromContainer
} else {
EventResult::Complete(Event::EndArray)
}
}
ujson::Event::StringValue(s) => EventResult::Complete(Event::String(s)),
ujson::Event::NumberValue(n) => EventResult::Complete(Event::Number(n)),
ujson::Event::BoolValue(b) => EventResult::Complete(Event::Bool(b)),
ujson::Event::NullValue => EventResult::Complete(Event::Null),
ujson::Event::Key(k) => EventResult::Complete(Event::Key(k)),
// Explicitly handle unknown or future events
_ => return Err(ParseError::UnknownEvent),
```
- You must define `ParseError::UnknownEvent` in your `ParseError` enum if it does not already exist.
- If you want to use `unreachable!()` instead of returning an error, replace `return Err(ParseError::UnknownEvent),` with `unreachable!("Unknown ujson::Event variant encountered")`.
- Make sure all current `ujson::Event` variants are matched explicitly above the `_` arm.
</issue_to_address>
### Comment 2
<location> `stax/src/direct_parser.rs:494` </location>
<code_context>
+ crate::number_parser::parse_number_event(&self.direct_buffer, start_pos, from_container_end)
+ }
+ /// Fill buffer from reader
+ fn fill_buffer_from_reader(&mut self) -> Result<(), ParseError> {
+ if let Some(fill_slice) = self.direct_buffer.get_fill_slice() {
+ let bytes_read = self
</code_context>
<issue_to_address>
Potential for infinite loop if reader returns Ok(0) but not at EOF.
Add a safeguard to handle cases where Ok(0) is returned but EOF is not reached, or clarify the Reader trait contract to ensure Ok(0) always indicates EOF.
</issue_to_address>
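One possible shape for that safeguard, sketched here with a closure standing in for the crate's `Reader` trait (the helper name and the retry limit are assumptions, not the actual API), is to bound consecutive zero-length reads before treating the stream as ended:

```rust
/// Sketch: wrap a read call so that repeated Ok(0) results cannot spin forever.
/// After MAX_ZERO_READS consecutive zero-length reads, report EOF instead of looping.
fn read_with_zero_guard<R>(mut read: R, buf: &mut [u8]) -> Result<usize, &'static str>
where
    R: FnMut(&mut [u8]) -> Result<usize, &'static str>,
{
    const MAX_ZERO_READS: u32 = 3;
    let mut zero_reads = 0;
    loop {
        match read(buf)? {
            0 => {
                zero_reads += 1;
                if zero_reads >= MAX_ZERO_READS {
                    return Ok(0); // give up and report EOF instead of spinning
                }
            }
            n => return Ok(n),
        }
    }
}

fn main() {
    let mut buf = [0u8; 8];
    // A misbehaving reader that always reports zero bytes without erroring.
    let mut calls = 0;
    let result = read_with_zero_guard(
        |_| {
            calls += 1;
            Ok(0)
        },
        &mut buf,
    );
    assert_eq!(result, Ok(0));
    assert_eq!(calls, 3);
    println!("ok");
}
```

The alternative, as the comment notes, is to document the trait so that `Ok(0)` is defined to mean EOF, which is the convention `std::io::Read` uses.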
### Comment 3
<location> `stax/src/direct_buffer.rs:103` </location>
<code_context>
+
+ /// Start unescaping and copy existing content from a range in the buffer
+ /// This handles the common case of starting escape processing partway through a string
+ pub fn start_unescaping_with_copy(
+ &mut self,
+ max_escaped_len: usize,
</code_context>
<issue_to_address>
Possible silent truncation when copying spans to scratch buffer.
Consider returning an error if the span exceeds the buffer size, rather than silently truncating the data.
</issue_to_address>
### Comment 4
<location> `stax/src/direct_buffer.rs:162` </location>
<code_context>
+ }
+
+ /// Append a single byte to the unescaped content
+ pub fn append_unescaped_byte(&mut self, byte: u8) -> Result<(), DirectBufferError> {
+ if self.unescaped_len >= self.buffer.len() {
+ return Err(DirectBufferError::BufferFull);
</code_context>
<issue_to_address>
No check for reserved escape space when appending unescaped bytes.
Currently, the method only checks the total buffer length and not the reserved escape space. Please ensure the `escape_reserve` constraint is enforced to prevent overwriting reserved space.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
pub fn append_unescaped_byte(&mut self, byte: u8) -> Result<(), DirectBufferError> {
if self.unescaped_len >= self.buffer.len() {
return Err(DirectBufferError::BufferFull);
=======
pub fn append_unescaped_byte(&mut self, byte: u8) -> Result<(), DirectBufferError> {
if self.unescaped_len >= self.buffer.len() - self.escape_reserve {
return Err(DirectBufferError::BufferFull);
>>>>>>> REPLACE
</suggested_fix>
### Comment 5
<location> `stax/src/json_number.rs:104` </location>
<code_context>
+ }
+
+ /// Check if this number would be a float (has decimal point or exponent).
+ pub fn is_float(&self) -> bool {
+ !self.is_integer()
+ }
</code_context>
<issue_to_address>
is_float() may return true for unsupported float types.
In cases where float support is disabled, clarify or document that `is_float()` only means the value is not an integer, not that a float is actually supported.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
/// Check if this number would be a float (has decimal point or exponent).
pub fn is_float(&self) -> bool {
!self.is_integer()
}
=======
/// Returns true if this number is not an integer (i.e., has a decimal point or exponent).
///
/// Note: This does not guarantee that float values are supported or enabled in this build.
/// It only indicates that the number is not an integer, regardless of float support.
pub fn is_float(&self) -> bool {
!self.is_integer()
}
>>>>>>> REPLACE
</suggested_fix>
### Comment 6
<location> `stax/src/number_parser.rs:39` </location>
<code_context>
+ let current_pos = extractor.current_position();
+
+ // Determine if we should exclude a delimiter from the number
+ let number_end = if from_container_end || (!extractor.is_empty()) {
+ // Came from container end OR not at EOF - number was terminated by delimiter, exclude it
+ current_pos.saturating_sub(1)
</code_context>
<issue_to_address>
Delimiter exclusion logic may not handle all edge cases.
Please review the conditions for `number_end` to ensure correct handling when the buffer is empty but a delimiter is present, particularly at EOF and after delimiters.
Suggested implementation:
```rust
// Determine if we should exclude a delimiter from the number
let number_end = if from_container_end {
// Came from container end - number was terminated by delimiter, exclude it
current_pos.saturating_sub(1)
} else if extractor.is_empty() {
// Buffer is empty, check if the last character was a delimiter
if extractor.last_char_was_delimiter() {
// At EOF and last char was delimiter - exclude it
current_pos.saturating_sub(1)
} else {
// At EOF and last char was not delimiter - use full span
current_pos
}
} else {
// Not at EOF - number was terminated by delimiter, exclude it
current_pos.saturating_sub(1)
};
```
- This change assumes the existence of a method `last_char_was_delimiter()` on the `NumberExtractor` trait or struct. If it does not exist, you will need to implement it to check if the last character read was a delimiter.
- If your delimiter logic is more complex, you may need to adjust the check accordingly.
</issue_to_address>
### Comment 7
<location> `tokenizer/src/bitstack/mod.rs:31` </location>
<code_context>
+ *self = (self.clone() << 1u8) | T::from(bit as u8);
+ }
+
+ fn pop(&mut self) -> Option<bool> {
+ let bit = (self.clone() & T::from(1)) != T::from(0);
+ *self = self.clone() >> 1u8;
</code_context>
<issue_to_address>
BitStack::pop() always returns Some, even if the stack is empty.
Currently, `pop` does not indicate when the stack is empty, which may cause bugs. Please update the implementation to track stack depth and return `None` when the stack is empty.
</issue_to_address>
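A minimal sketch of the depth-tracking fix, using a standalone toy type rather than the crate's generic `BitStack` trait implementation:

```rust
/// Toy bit stack that tracks its depth so pop() can honor the trait contract
/// of returning None when the stack is empty.
struct CountedBitStack {
    bits: u32,
    depth: u8,
}

impl CountedBitStack {
    fn new() -> Self {
        Self { bits: 0, depth: 0 }
    }

    fn push(&mut self, bit: bool) {
        self.bits = (self.bits << 1) | bit as u32;
        self.depth += 1;
    }

    fn pop(&mut self) -> Option<bool> {
        if self.depth == 0 {
            return None; // empty stack: signal underflow instead of fabricating a bit
        }
        let bit = self.bits & 1 != 0;
        self.bits >>= 1;
        self.depth -= 1;
        Some(bit)
    }
}

fn main() {
    let mut stack = CountedBitStack::new();
    stack.push(true);
    stack.push(false);
    assert_eq!(stack.pop(), Some(false));
    assert_eq!(stack.pop(), Some(true));
    assert_eq!(stack.pop(), None); // without the depth counter this would return bits forever
    println!("ok");
}
```

For the generic trait version, the counter would live alongside the bit storage (or in `ParseContext`, which already tracks `depth`), since the raw integer alone cannot distinguish an empty stack from one holding only zero bits.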
### Comment 8
<location> `stax/src/slice_input_buffer.rs:22` </location>
<code_context>
+}
+
+impl<'a> InputBuffer for SliceInputBuffer<'a> {
+ fn is_past_end(&self) -> bool {
+ self.pos > self.data.len()
+ }
</code_context>
<issue_to_address>
is_past_end() may not behave as intended when pos == data.len().
Consider updating the condition to self.pos >= self.data.len() to align with typical buffer behavior and prevent off-by-one errors.
</issue_to_address>
### Comment 9
<location> `stax/tests/api_test.rs:5` </location>
<code_context>
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_coe2_no_escapes() {
+ let input = b"hello world";
</code_context>
<issue_to_address>
No tests for invalid JSON or error propagation in the new API.
Please add tests for malformed JSON inputs to ensure the parser correctly handles and reports errors for invalid cases.
</issue_to_address>
### Comment 10
<location> `tokenizer/tests/array_bitstack_test.rs:3` </location>
<code_context>
+mod tests {
+ use super::*;
+
+ #[test]
+ fn test_coe2_no_escapes() {
+ let input = b"hello world";
</code_context>
<issue_to_address>
No tests for popping from an empty ArrayBitStack.
Please add a test to confirm that popping from an empty ArrayBitStack returns None without panicking, ensuring proper handling of underflow.
</issue_to_address>
### Comment 11
<location> `stax/README.md:9` </location>
<code_context>
+built in memory, please use serde-json with no_std.
+
+However - pull parsing is useful when you need to process large streams within
+constained memory, without building the entire document, and just picking
+elements from the dataset that the application needs.
+
</code_context>
<issue_to_address>
Typo: 'constained' should be 'constrained'.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
constained memory, without building the entire document, and just picking
=======
constrained memory, without building the entire document, and just picking
>>>>>>> REPLACE
</suggested_fix>
### Comment 12
<location> `DESIGN.md:156` </location>
<code_context>
+So that the input can come no-copy from any source with low buffering
+
+Note std::io has Read trait, but unfortunately that's not available in core::, so probably have to
+make our own, and auto-implent it for arrays and slices or for anything that looks like AsRef<[u8]>
+
+## 7. TODO: Working with returned values
</code_context>
<issue_to_address>
Typo: 'auto-implent' should be 'auto-implement'.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
make our own, and auto-implent it for arrays and slices or for anything that looks like AsRef<[u8]>
=======
make our own, and auto-implement it for arrays and slices or for anything that looks like AsRef<[u8]>
>>>>>>> REPLACE
</suggested_fix>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Actionable comments posted: 0
♻️ Duplicate comments (6)
stax/README.md (2)
9-9
: Typo: 'constained' should be 'constrained'.
59-59
: Documentation inconsistency with default features.

The documentation states that "full float and int64 support is enabled" by default, but the actual default features in Cargo.toml only include int64 support, not float support.
DESIGN.md (1)
156-156
: Typo: 'auto-implent' should be 'auto-implement'.

stax/src/flex_parser.rs (2)
415-415
: Potential stack overflow from unbounded recursion.

The recursive call could cause stack overflow with deeply nested or malformed JSON input. This issue is already tracked in GitHub issue #2.
183-185
: Preserve tokenizer error information instead of discarding it.

The original tokenizer error is being discarded and replaced with a generic `ParseError::TokenizerError`. This loses valuable debugging information.

Consider mapping the tokenizer error properly:

```diff
-        if let Err(_tokenizer_error) = res {
-            return Err(ParseError::TokenizerError);
+        if let Err(tokenizer_error) = res {
+            return Err(ParseError::Tokenizer(tokenizer_error));
         }
```

stax/src/direct_parser.rs (1)
407-407
: Consider explicit handling of unknown events

The catch-all pattern `_ => EventResult::Continue` could mask unexpected or future event types. Consider matching all known event variants explicitly and handling unknown events with an error or unreachable macro for better debugging.
🧹 Nitpick comments (3)
stax/src/flex_parser.rs (1)
196-423
: Consider refactoring the large `next_event` method for better maintainability.

The `next_event` method is over 200 lines long and handles many different token types. This makes it difficult to understand and maintain.

Consider extracting the token handling logic into separate methods:
- `handle_container_tokens` for object/array start/end
- `handle_string_key_tokens` for string and key processing
- `handle_number_tokens` for number parsing
- `handle_value_tokens` for bool/null values
- `handle_escape_tokens` for escape sequences

This would make the main method more readable and each handler easier to test independently.
stax/src/escape_processor.rs (1)
276-280
: Consider documenting surrogate pair handling for full Unicode support.

The comment correctly notes that emoji like U+1F60A require surrogate pairs in JSON. While the current implementation handles Basic Multilingual Plane (BMP) characters correctly, characters outside the BMP need surrogate pair decoding.
Would you like me to create an issue to track adding full Unicode support with surrogate pair handling? This would involve:
- Detecting high surrogate (U+D800-U+DBFF) and low surrogate (U+DC00-U+DFFF) ranges
- Combining surrogate pairs into the final codepoint
- Adding tests for emoji and other non-BMP characters
stax/src/direct_parser.rs (1)
500-503
: Consider preserving reader error details

Mapping all reader errors to `ParseError::ReaderError` loses important error information that could be valuable for debugging I/O issues. Consider adding a variant that wraps the original error or provides more specific error context.

```diff
-            .map_err(|_| ParseError::ReaderError)?;
+            .map_err(ParseError::ReaderError)?;
```

This would require updating `ParseError::ReaderError` to wrap the original error type.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- DESIGN.md (1 hunks)
- stax/README.md (1 hunks)
- stax/src/direct_buffer.rs (1 hunks)
- stax/src/direct_parser.rs (1 hunks)
- stax/src/escape_processor.rs (1 hunks)
- stax/src/flex_parser.rs (1 hunks)
- stax/src/shared.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- stax/src/shared.rs
- stax/src/direct_buffer.rs
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
DESIGN.md (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/escape_processor.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/direct_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/flex_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/README.md (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
🧬 Code Graph Analysis (1)
stax/src/flex_parser.rs (7)
stax/src/json_number.rs (5)
- fmt (124-141)
- as_str (81-86)
- as_int (57-65)
- as_f64 (70-77)
- is_float (104-106)
tokenizer/src/tokenizer/mod.rs (4)
- fmt (232-238)
- new (22-28)
- new (222-228)
- new (248-254)
stax/src/shared.rs (6)
- new (87-92)
- unicode_escape_bounds (169-174)
- invalid_unicode_length (266-268)
- end_position_excluding_delimiter (184-186)
- state_mismatch (224-233)
- number_start_from_current (145-147)
stax/src/escape_processor.rs (2)
- new (161-166)
- process_escape_token (51-55)
stax/src/slice_input_buffer.rs (2)
- new (40-42)
- current_pos (36-38)
stax/src/copy_on_escape.rs (1)
- new (35-46)
stax/src/number_parser.rs (2)
- new (95-101)
- parse_number_event_simple (63-81)
🪛 LanguageTool
DESIGN.md
[uncategorized] ~152-~152: The preposition ‘to’ seems more likely in this position.
Context: ... input IMPORTANT!!! More: In addition of taking just slice [u8] as input, we sho...
(AI_HYDRA_LEO_REPLACE_OF_TO)
[grammar] ~162-~162: It appears that an article is missing.
Context: ... like println! is convenient and easy. Same should be done with Number, but it's a ...
(MISSING_ARTICLE)
[uncategorized] ~169-~169: A determiner appears to be missing. Consider inserting it.
Context: ...ting should do sensible default things. Most tricky is number formatting. The object...
(AI_EN_LECTOR_MISSING_DETERMINER)
stax/README.md
[uncategorized] ~45-~45: Did you mean: “By default,”?
Context: ...state, one bit for every nesting level. By default this is a 32-bit int, but can be change...
(BY_DEFAULT_COMMA)
[uncategorized] ~50-~50: The preposition ‘as’ seems more likely in this position.
Context: ...nt64 ( default ) - numbers are returned in int64 values * int32 - integers are re...
(AI_HYDRA_LEO_REPLACE_IN_AS)
[uncategorized] ~53-~53: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pport is included. * float-error - Any floating point input will yield an error, to reduce fl...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~59-~59: Did you mean: “By default,”?
Context: ... Please see examples/no_float_demo.rs By default full float and int64 support is enabled...
(BY_DEFAULT_COMMA)
🔇 Additional comments (7)
stax/src/escape_processor.rs (1)
1-488
: Well-structured escape processing module with comprehensive test coverage.

The module provides a clean separation of concerns with:
- Pure functions for escape processing that can be reused across components
- Clear documentation with usage examples
- Comprehensive test coverage including edge cases
- Proper error handling with appropriate error types
stax/src/direct_parser.rs (6)
9-82
: Well-designed trait and type definitions

The `Reader` trait contract is clearly documented with strong EOF semantics, and the internal enums provide good type safety for state management. The `DirectParser` struct layout is comprehensive and well-organized.
115-252
: Complex but well-structured main parsing loop

The `next_event()` method handles multiple concerns effectively: pending events, buffer management, EOF detection, and tokenizer integration. The state transitions appear correct and the Unicode escape handling shows attention to detail.
411-496
: Well-structured content extraction with proper error handling

The extraction methods properly handle both unescaped and borrowed content scenarios, with consistent patterns across string, key, and number types. The delegation to unified number parsing logic is a good design choice.
542-544
: Verify error handling strategy for Unicode escapes

The code ignores errors from `unicode_escape_collector.add_hex_digit()` with the assumption that "error will be handled by tokenizer". Ensure this approach doesn't mask important validation failures that could lead to inconsistent state.
557-699
: Well-implemented escape processing and state management

The escape processing methods show good separation of concerns and proper integration with the DirectBuffer. The state management for unescaped content resets is handled correctly with the ProcessingState enum.
701-1396
: Comprehensive and well-structured test suite

The test suite provides excellent coverage of core functionality, edge cases, and error scenarios. The comparison test between DirectParser and FlexParser ensures consistency across parser implementations. The tests are well-organized and use appropriate assertions.
Actionable comments posted: 0
♻️ Duplicate comments (3)
stax/README.md (1)
59-59: Documentation inconsistency with default features. The documentation states "full float and int64 support is enabled" by default, but this conflicts with the actual default features configuration, which only includes `int64` without the `float` feature.
stax/src/direct_parser.rs (2)
500-503: Reader error information is lost. The Reader error is being discarded and mapped to a generic `ParseError::ReaderError`, losing important error details that could help with debugging I/O issues.
407-407: Consider explicit handling of unknown events. The catch-all pattern `_ => EventResult::Continue` could mask unexpected events. Consider matching all known event variants explicitly to improve error detection.
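As an illustration of the suggestion (using a hypothetical event enum, not the crate's real `ujson::Event`), listing every variant turns "a new event was silently continued" into a compile error:

```rust
/// Hypothetical tokenizer events for illustration.
#[derive(Debug)]
pub enum Event {
    ObjectStart,
    ObjectEnd,
    Key,
    Value,
}

#[derive(Debug, PartialEq)]
pub enum EventResult {
    Emit(&'static str),
    Continue,
}

/// No `_` arm: adding a variant to `Event` fails to compile here,
/// forcing an explicit decision instead of an accidental `Continue`.
pub fn dispatch(event: Event) -> EventResult {
    match event {
        Event::ObjectStart => EventResult::Emit("start"),
        Event::ObjectEnd => EventResult::Emit("end"),
        Event::Key => EventResult::Continue,
        Event::Value => EventResult::Continue,
    }
}
```

The trade-off is verbosity; an explicit `Event::Key | Event::Value => EventResult::Continue` arm keeps the exhaustiveness check while grouping the benign cases.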
🧹 Nitpick comments (2)
stax/README.md (2)
50-55: Fix markdown list indentation. The bullet points are incorrectly indented according to markdownlint. Lists should start at column 0 without leading spaces.

```diff
- * int64 ( default ) - numbers are returned in int64 values
- * int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
- * float - full float support is included.
- * float-error - Any floating point input will yield an error, to reduce float math dependency
- * float-skip - Float values are skipped.
- * float-truncate - float values are truncated to integers. Scientific notation will generate an error
+* int64 ( default ) - numbers are returned in int64 values
+* int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
+* float - full float support is included.
+* float-error - Any floating point input will yield an error, to reduce float math dependency
+* float-skip - Float values are skipped.
+* float-truncate - float values are truncated to integers. Scientific notation will generate an error
```
45-45: Add comma for better readability. Consider adding a comma after "By default" for improved grammatical flow.

```diff
-By default this is a 32-bit int, but can be changed
+By default, this is a 32-bit int, but can be changed
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
stax/Cargo.toml
(1 hunks)stax/README.md
(1 hunks)stax/src/direct_buffer.rs
(1 hunks)stax/src/direct_parser.rs
(1 hunks)stax/src/json_number.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- stax/Cargo.toml
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/direct_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/direct_buffer.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/README.md (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
🪛 LanguageTool
stax/README.md
[uncategorized] ~45-~45: Did you mean: “By default,”?
Context: ...state, one bit for every nesting level. By default this is a 32-bit int, but can be change...
(BY_DEFAULT_COMMA)
[uncategorized] ~50-~50: The preposition ‘as’ seems more likely in this position.
Context: ...nt64 ( default ) - numbers are returned in int64 values * int32 - integers are re...
(AI_HYDRA_LEO_REPLACE_IN_AS)
[uncategorized] ~53-~53: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pport is included. * float-error - Any floating point input will yield an error, to reduce fl...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.17.2)
stax/README.md
50-50: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
51-51: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
52-52: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
53-53: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
54-54: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
55-55: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
🔇 Additional comments (5)
stax/src/json_number.rs (1)
1-347
: Excellent implementation of configurable number parsingThis module demonstrates excellent design with comprehensive feature flag support, proper error handling, and extensive test coverage. The conditional compilation logic correctly handles different number parsing strategies based on build configuration.
stax/src/direct_parser.rs (1)
84-103: Well-structured constructor with clear state initialization. The `new` method properly initializes all state components and provides clear documentation of the state machine setup.
stax/src/direct_buffer.rs (3)
121-124: Proper error handling for buffer overflow. Good improvement from past reviews: the code now returns a `BufferFull` error instead of silently truncating data when the span exceeds buffer capacity.
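The "fail instead of truncate" pattern can be sketched as follows (illustrative names, not the crate's DirectBuffer API): check capacity before copying and refuse the whole span when it does not fit.

```rust
#[derive(Debug, PartialEq)]
pub enum BufferError {
    BufferFull,
}

/// A fixed-capacity byte buffer over caller-provided storage.
pub struct FixedBuffer<'a> {
    storage: &'a mut [u8],
    len: usize,
}

impl<'a> FixedBuffer<'a> {
    pub fn new(storage: &'a mut [u8]) -> Self {
        Self { storage, len: 0 }
    }

    /// Copies `span` into the buffer, or returns `BufferFull` without
    /// writing anything: a partial copy would corrupt the token.
    pub fn append_span(&mut self, span: &[u8]) -> Result<(), BufferError> {
        let end = self
            .len
            .checked_add(span.len())
            .ok_or(BufferError::BufferFull)?;
        if end > self.storage.len() {
            return Err(BufferError::BufferFull);
        }
        self.storage[self.len..end].copy_from_slice(span);
        self.len = end;
        Ok(())
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.storage[..self.len]
    }
}
```

Note that a failed append leaves previously written bytes intact, so the caller can surface the error without losing already-parsed content.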
167-176: Escape reserve constraint properly enforced. The `append_unescaped_byte` method correctly checks against `available_space`, which accounts for the escape reserve, addressing the previous concern about not enforcing the escape reserve constraint.
514-531: Clean trait implementation for number extraction. The `NumberExtractor` trait implementation provides a clean abstraction for number parsing integration with appropriate error mapping.
Actionable comments posted: 1
♻️ Duplicate comments (2)
tokenizer/src/tokenizer/mod.rs (1)
859-865: Trailing comma check for objects has been properly implemented. The past review comment about missing trailing comma checks for objects has been addressed. The implementation correctly detects and reports trailing commas in objects.
stax/src/direct_parser.rs (1)
524-526: Error handling improvements properly implemented. The past review comments about error handling have been addressed:
- Reader errors are now properly wrapped in `ParseError::ReaderError`
- Errors from `append_byte_to_escape_buffer` are correctly propagated

These changes prevent silent failures and improve error transparency.
Also applies to: 684-687
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
stax/src/direct_parser.rs
(1 hunks)stax/src/slice_input_buffer.rs
(1 hunks)stax/tests/api_test_errors.rs
(1 hunks)stax/tests/configurable_numbers.rs
(1 hunks)stax/tests/debug_root_numbers.rs
(1 hunks)tokenizer/src/bitstack/mod.rs
(1 hunks)tokenizer/src/tokenizer/mod.rs
(1 hunks)tokenizer/tests/array_bitstack_test.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- tokenizer/tests/array_bitstack_test.rs
- stax/tests/configurable_numbers.rs
- tokenizer/src/bitstack/mod.rs
- stax/src/slice_input_buffer.rs
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/tests/debug_root_numbers.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/tests/api_test_errors.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
tokenizer/src/tokenizer/mod.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/direct_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
🧬 Code Graph Analysis (1)
stax/tests/debug_root_numbers.rs (1)
stax/src/flex_parser.rs (1)
new_with_buffer
(74-85)
🔇 Additional comments (6)
stax/tests/debug_root_numbers.rs (3)
1-2: LGTM! Clear imports and purpose. The file comment clearly describes its purpose for debugging root-level number parsing, and the imports are appropriate for the testing functionality.
4-32: Well-implemented debug helper with good safety mechanisms. The `test_json` helper function is well-designed with:
- Clear debug output showing input and parsed events
- Proper error handling for parsing failures
- Safety mechanism to prevent infinite loops (10-event limit)
- Appropriate scratch buffer allocation for test purposes
The implementation follows good practices for debug/test utilities.
34-48: Comprehensive test coverage for JSON primitives and structures. The test function covers a good range of JSON inputs:
- All primitive types (numbers, strings, booleans, null)
- Both simple and complex structured data (objects and arrays)
- Comparison between root-level and nested values
This provides valuable debugging coverage for the JSON parser's handling of different value types.
stax/src/direct_parser.rs (3)
9-22: Well-designed Reader trait with clear contract. The Reader trait has a well-documented contract that clearly specifies the EOF behavior, addressing previous concerns about potential infinite loops. The documentation explicitly states that returning 0 must indicate true end of stream.
49-61: Excellent state machine design with enforced invariants. The `ProcessingState` enum provides a robust state machine that enforces logical invariants. The separation between `Active` and `Finished` states prevents invalid state transitions and makes the code more maintainable.
814-880: Comprehensive test coverage with excellent edge case handling. The test suite thoroughly covers critical edge cases:
- Pending state handling for numbers at container boundaries
- Rapid consecutive container endings
- Parser consistency verification between implementations
- Various escape sequence scenarios
These tests ensure robust parser behavior in complex scenarios.
Also applies to: 904-969, 999-1042
Actionable comments posted: 4
♻️ Duplicate comments (3)
stax/src/number_parser.rs (1)
40-47: Improved delimiter exclusion logic addresses previous concerns. The delimiter handling logic now properly considers both container end context and EOF conditions, which addresses the previous feedback about edge case handling. The logic correctly excludes delimiters when terminated by container end or when not at EOF.
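The termination rule being praised can be captured in a few lines. This is a simplified stand-in for the crate's logic, with hypothetical names: the tokenizer's position sits one past the byte that stopped the number, and that byte belongs to the number only when the token ran to true end-of-input.

```rust
/// Returns the byte range of a number token.
/// `pos` is one past the byte that terminated the number.
/// The terminating delimiter (`,`, `]`, `}`) is excluded unless the
/// number was terminated by reaching the end of the input.
pub fn number_slice<'a>(
    data: &'a [u8],
    start: usize,
    pos: usize,
    from_container_end: bool,
    at_eof: bool,
) -> &'a [u8] {
    let end = if from_container_end || !at_eof {
        // A delimiter byte was consumed; back up over it.
        pos.saturating_sub(1)
    } else {
        // True EOF: every consumed byte is part of the number.
        pos
    };
    &data[start..end]
}
```

For example, in `[123]` the `]` both ends the number and the array, while a bare `123` at the document root ends only at EOF, so the two cases need different end positions.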
stax/src/flex_parser.rs (1)
391-411: EscapeSequence event handlers serve an important architectural purpose. Based on the codebase learning, these handlers exist to avoid catch-all patterns in match statements, even though this parser variant doesn't process EscapeSequence events. The other parser variant (direct parser) does need them, making this a sound architectural decision.
tokenizer/src/tokenizer/mod.rs (1)
861-867: The object trailing comma check is properly implemented. The past review comment about missing trailing comma check for objects is no longer applicable. The code correctly detects and reports trailing commas in objects.
🧹 Nitpick comments (4)
stax/src/config_check.rs (1)
24-29: Remove redundant validation check. This check for all three float behavior features being enabled simultaneously is redundant, since the pairwise checks above already prevent this scenario. If any two features are mutually exclusive, then having all three enabled is impossible.
Consider removing this redundant check:

```diff
-#[cfg(all(
-    feature = "float-skip",
-    feature = "float-error",
-    feature = "float-truncate"
-))]
-compile_error!("Cannot enable multiple float behavior features: choose only one of 'float-skip', 'float-error', or 'float-truncate'");
```

tokenizer/src/bitstack/mod.rs (1)
105-185: Consider adding empty stack behavior tests. While the current tests are comprehensive, they don't verify edge-case behavior when the stack is empty, such as what happens when `pop()` or `top()` are called on a freshly created stack.
Add tests for empty stack edge cases:

```diff
+    #[test]
+    fn test_empty_stack_behavior() {
+        let mut bitstack = 0u8;
+        // Test behavior on empty stack
+        let first_pop = bitstack.pop();
+        let first_top = bitstack.top();
+
+        let mut array_bitstack: ArrayBitStack<2, u8> = ArrayBitStack::default();
+        let array_pop = array_bitstack.pop();
+        let array_top = array_bitstack.top();
+
+        // Document the expected behavior for empty stacks
+        // (currently returns LSB of zero-initialized data)
+    }
```

tokenizer/src/tokenizer/mod.rs (1)
397-1050: Consider refactoring the parse_chunk_inner method for better maintainability. The `parse_chunk_inner` method is over 650 lines long with a massive match statement. While the implementation appears correct, this complexity could make maintenance and debugging more difficult.
Consider extracting logical groups of match arms into separate helper methods, such as:

```rust
impl<T: BitStack + core::fmt::Debug, D: BitStackCore> Tokenizer<T, D> {
    fn handle_number_state(&mut self, state: &Num, data: u8, pos: usize, callback: &mut dyn FnMut(Event, usize)) -> Result<State, Error> {
        // Extract number parsing logic here
    }

    fn handle_string_state(&mut self, state: &String, key: bool, data: u8, pos: usize, callback: &mut dyn FnMut(Event, usize)) -> Result<State, Error> {
        // Extract string parsing logic here
    }

    fn handle_structural_tokens(&mut self, data: u8, pos: usize, callback: &mut dyn FnMut(Event, usize)) -> Result<State, Error> {
        // Extract object/array start logic here
    }
}
```
52-55: Remove unused method or add usage. The `slice_to_current` method is never used in the codebase.
Either remove this method if it's not needed, or ensure it's used where appropriate:

```diff
-    /// Gets a slice from start position to current position - 1.
-    /// Useful for extracting tokens that end at the current position.
-    pub fn slice_to_current(&self, start: usize) -> &'a [u8] {
-        &self.data[start..self.pos.saturating_sub(1)]
-    }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
.github/workflows/build.yaml
(1 hunks)stax/src/config_check.rs
(1 hunks)stax/src/copy_on_escape.rs
(1 hunks)stax/src/direct_buffer.rs
(1 hunks)stax/src/direct_parser.rs
(1 hunks)stax/src/escape_processor.rs
(1 hunks)stax/src/flex_parser.rs
(1 hunks)stax/src/json_number.rs
(1 hunks)stax/src/json_string.rs
(1 hunks)stax/src/lib.rs
(1 hunks)stax/src/number_parser.rs
(1 hunks)stax/src/shared.rs
(1 hunks)stax/src/slice_input_buffer.rs
(1 hunks)tokenizer/src/bin/main.rs
(1 hunks)tokenizer/src/bitstack/mod.rs
(1 hunks)tokenizer/src/lib.rs
(1 hunks)tokenizer/src/tokenizer/mod.rs
(1 hunks)tokenizer/tests/array_bitstack_test.rs
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- stax/src/json_string.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- tokenizer/src/bin/main.rs
- tokenizer/src/lib.rs
- tokenizer/tests/array_bitstack_test.rs
- stax/src/lib.rs
- stax/src/direct_buffer.rs
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/config_check.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.845Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
stax/src/number_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/flex_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/copy_on_escape.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/escape_processor.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
tokenizer/src/tokenizer/mod.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/direct_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/shared.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/slice_input_buffer.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
🪛 actionlint (1.7.7)
.github/workflows/build.yaml
13-13: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
16-16: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
23-23: the runner of "actions-rs/cargo@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
39-39: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
42-42: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 GitHub Check: Check
stax/src/escape_processor.rs
[warning] 142-142: associated function `parse_unicode_escape_from_str` is never used
stax/src/shared.rs
[warning] 246-246: associated function `validate_buffer_bounds` is never used
stax/src/slice_input_buffer.rs
[warning] 53-53: method `slice_to_current` is never used
🔇 Additional comments (23)
stax/src/config_check.rs (3)
1-12: LGTM! Well-structured compile-time validation. The module documentation is clear, and the integer width validation correctly prevents conflicting feature combinations with an informative error message.
14-23: LGTM! Comprehensive pairwise validation. The pairwise validation of float behavior features is thorough and covers all possible conflicting combinations with clear error messages.
31-39: LGTM! Logical feature conflict prevention. The validation correctly prevents the main 'float' feature from being combined with float-behavior features, with clear explanations that the behavior features are only intended for when float parsing is disabled.
stax/src/shared.rs (2)
103-189: Excellent use of saturating arithmetic for safety. The `ContentRange` utility struct demonstrates good safety practices by using `saturating_sub()` to prevent underflow in position calculations. This prevents potential panics when dealing with boundary conditions in JSON parsing.
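A tiny illustration of the pattern (hypothetical helper, not the crate's `ContentRange` itself): computing "current position minus token length" can underflow at the very start of input, and `saturating_sub` clamps to 0 instead of panicking in debug builds or wrapping in release builds.

```rust
/// Computes where a token started, given the current position and the
/// token's length. Clamps to 0 when the subtraction would underflow,
/// e.g. for a token that begins at the start of the input.
pub fn token_start(current_pos: usize, token_len: usize) -> usize {
    current_pos.saturating_sub(token_len)
}
```

With plain `-`, `token_start(2, 5)` would panic in debug builds and wrap to a huge index in release builds; the saturating form simply yields 0, which the caller can then bounds-check.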
191-276: Well-designed error handling utilities. The `ParserErrorHandler` provides consistent error creation patterns and UTF-8 validation. The predefined error messages in `state_mismatch` work well for no_std environments where dynamic formatting isn't available.
63-78: Good resolution of hardcoded bit shift issues. The implementation now correctly calculates bit width dynamically using `core::mem::size_of::<T>() * 8`, which addresses the previous critical issues with hardcoded shifts that only worked for u8 types.
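The width-generic idea can be sketched like this (a simplified stand-in, not the crate's actual BitStack code): derive the top-bit shift from the storage type so the same push logic is correct for u8, u16, u32, and so on.

```rust
/// Shift amount that isolates the most-significant bit of `T`,
/// computed from the type's size instead of a hardcoded constant.
pub fn top_bit_shift<T>() -> u32 {
    (core::mem::size_of::<T>() * 8 - 1) as u32
}

/// Pushes a bit into the least-significant position of a u32 word,
/// reporting the bit that spilled out of the most-significant position.
pub fn push_bit(word: u32, bit: bool) -> (u32, bool) {
    let spilled = (word >> top_bit_shift::<u32>()) & 1 == 1;
    ((word << 1) | bit as u32, spilled)
}
```

A hardcoded `>> 7` would only be right for a u8 backing store; `top_bit_shift::<u32>()` evaluates to 31, so the same expression stays correct when the storage type changes.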
6-24: Well-designed abstraction for unified number parsing. The `NumberExtractor` trait provides a clean abstraction that allows the same parsing logic to work with different buffer implementations (FlexParser and DirectParser), promoting code reuse and consistency.
85-166: Comprehensive test coverage with realistic scenarios. The tests effectively cover different parsing scenarios including simple parsing, container end delimiter exclusion, and EOF handling. The mock implementation provides good isolation for testing the parsing logic.
stax/src/flex_parser.rs (3)
30-87: Excellent dual constructor design for different use cases. The two constructors (`new` and `new_with_buffer`) provide a clean API that optimizes for the common case (no escapes) while supporting the complex case (with escapes). The zero-length internal buffer is a clever optimization.
200-425: Robust event processing with comprehensive state management. The `next_event` method demonstrates solid parser design with proper state transitions, error handling, and delegation to specialized components like `CopyOnEscape` and `UnicodeEscapeCollector`. The pattern matching is comprehensive and handles all tokenizer events appropriately.
428-770: Thorough test coverage validates parser behavior. The test suite comprehensively covers different JSON constructs (objects, arrays, numbers, booleans, nulls), string handling (both borrowed and unescaped), escape sequences, and Unicode escapes. The tests effectively validate both the fast path (zero-copy) and slow path (unescaping) optimizations.
stax/src/copy_on_escape.rs (1)
10-175: Well-implemented copy-on-escape mechanism with proper buffer management. The implementation correctly handles:
- Zero-copy optimization for strings without escapes
- Proper bounds checking to prevent buffer overflows
- UTF-8 validation for safety
- Clever global scratch position tracking to enable buffer reuse across multiple strings
The comprehensive test coverage demonstrates the robustness of the implementation.
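The fast-path/slow-path split can be condensed into a short sketch. This is illustrative only, not the crate's real `CopyOnEscape` type: it uses a `Vec` scratch buffer for brevity where the crate uses a caller-provided fixed buffer, and it handles just a subset of JSON escapes.

```rust
/// Result of string parsing: either a zero-copy borrow of the input,
/// or unescaped content copied into the scratch buffer.
pub enum Parsed<'a, 'b> {
    Borrowed(&'a str),
    Unescaped(&'b str),
}

/// Fast path: if no backslash appears, borrow the input unchanged.
/// Slow path: copy into `scratch`, resolving escapes along the way.
pub fn parse_string<'a, 'b>(input: &'a str, scratch: &'b mut Vec<u8>) -> Parsed<'a, 'b> {
    if !input.contains('\\') {
        return Parsed::Borrowed(input);
    }
    scratch.clear();
    let mut bytes = input.bytes();
    while let Some(b) = bytes.next() {
        if b == b'\\' {
            match bytes.next() {
                Some(b'n') => scratch.push(b'\n'),
                Some(b't') => scratch.push(b'\t'),
                // \" \\ \/ map to themselves; \u is omitted in this sketch.
                Some(other) => scratch.push(other),
                None => break,
            }
        } else {
            scratch.push(b);
        }
    }
    Parsed::Unescaped(core::str::from_utf8(scratch).unwrap_or(""))
}
```

The point of the design is that the scratch buffer is touched only when an escape actually occurs, so typical escape-free JSON strings incur no copies at all.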
stax/src/direct_parser.rs (5)
11-84: Well-designed trait and type definitions. The `Reader` trait contract is clearly documented, and the enum types provide comprehensive state management for the streaming parser. The `ProcessingState` enum effectively enforces logical invariants.
117-254: Solid implementation of the main parsing state machine. The `next_event()` method properly handles complex scenarios including pending container ends, EOF detection, and tokenizer integration. The state transitions are logical and the event processing delegation is well-structured.
257-433: Comprehensive event processing with proper separation of concerns. The method handles all known `ujson::Event` variants explicitly (addressing previous feedback about catch-all patterns) and correctly implements the complex logic for container ends that require pending number extraction. Escape sequence handling is properly delegated to specialized methods.
435-520: Well-implemented content extraction with proper state validation. The extraction methods correctly validate parser state, handle both borrowed and unescaped content variants, and use unified number parsing logic for consistency. Error handling provides meaningful context through `ParserErrorHandler`.
580-1422: Excellent escape processing implementation and comprehensive test coverage. The escape processing correctly handles both simple and Unicode escape sequences using shared components. The state management ensures proper cleanup of unescaped content. The test suite is exemplary, covering edge cases, error recovery, and parser consistency validation.
stax/src/escape_processor.rs (3)
142-142: False positive: function is used in tests. The static analysis warning about `parse_unicode_escape_from_str` being unused is incorrect: it's used in the test module at lines 312 and 315.
10-149: Well-designed escape processing utilities. The `EscapeProcessor` implementation is excellent:
- Comprehensive handling of all JSON escape sequences
- Proper validation and error handling
- Modular design with static methods for reusability
- The previous error type issue has been properly addressed
151-212: Robust Unicode escape collector implementation. The `UnicodeEscapeCollector` provides a safe, stateful way to collect Unicode escape sequences:
- Proper bounds checking prevents buffer overflow
- Clear API with reset functionality
- Good separation of concerns between collection and processing
stax/src/json_number.rs (3)
14-112: Excellent configurable number representation design. The `NumberResult` and `JsonNumber` enums provide a well-thought-out abstraction:
- Clear handling of different number types based on feature flags
- Zero-allocation design with borrowed/copied string variants
- The suggested documentation for `is_float()` has been properly implemented
- Comprehensive API for accessing parsed values and raw strings
171-210: Robust float handling for embedded scenarios. The float parsing implementation without float support is well-designed:
- Correctly errors on scientific notation in truncate mode (avoiding float math)
- Safe integer extraction from decimal numbers
- Clear behavior for each configuration mode
- The previous undefined variable issue has been fixed
128-147: Smart Display implementation preserves precision. The Display implementation uses an intelligent strategy:
- Shows parsed values when they fit
- Falls back to raw strings for overflow/unsupported cases
- This ensures no loss of precision for large numbers
Actionable comments posted: 1
♻️ Duplicate comments (2)
.github/workflows/build.yaml (2)
33-36: `matrix` key is misspelled → job will receive no feature flag. The first include entry sets `fatures` instead of `features`, so `${{ matrix.features }}` expands to an empty string → the build runs only the "No features" variant and silently skips the default-feature build.

```diff
-      - name: Default
-        fatures: ""
+      - name: Default
+        features: ""
```
48-50: Step name says "test" but only builds; tests never run. The CI never executes `cargo test`, defeating the purpose of the "Tests" job.

```diff
-      - name: Run cargo test
-        working-directory: stax
-        run: cargo build ${{ matrix.features }}
+      - name: Run cargo tests
+        working-directory: stax
+        run: cargo test ${{ matrix.features }}
```
🧹 Nitpick comments (1)
.github/workflows/build.yaml (1)
22-26: Opportunity: cache Cargo registry & target for faster builds

A simple `actions/cache` step can cut CI times dramatically:

```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/build.yaml (1 hunks)
- stax/src/flex_parser.rs (1 hunks)
- stax/src/shared.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
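The bool-returning design this learning describes can be sketched as follows; the `U32Stack` type and the exact method set are illustrative assumptions, not the crate's real trait:

```rust
// Illustrative only: a 32-level-deep BitStack where pop()/top() on an
// empty stack return false instead of Option<bool>::None.
trait BitStack {
    fn push(&mut self, bit: bool);
    fn pop(&mut self) -> bool;
    fn top(&self) -> bool;
}

struct U32Stack {
    bits: u32,
    depth: u8,
}

impl BitStack for U32Stack {
    fn push(&mut self, bit: bool) {
        self.bits = (self.bits << 1) | bit as u32;
        self.depth += 1;
    }
    fn pop(&mut self) -> bool {
        if self.depth == 0 {
            return false; // empty stack: false, not None
        }
        let bit = self.bits & 1 == 1;
        self.bits >>= 1;
        self.depth -= 1;
        bit
    }
    fn top(&self) -> bool {
        self.depth > 0 && self.bits & 1 == 1
    }
}
```

The caller never touches `Option`, which is the API simplification the learning points to.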
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/shared.rs (3)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.845Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
stax/src/flex_parser.rs (2)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
🪛 actionlint (1.7.7)
.github/workflows/build.yaml
16-16: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
39-39: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
42-42: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🔇 Additional comments (9)
stax/src/flex_parser.rs (5)
44-73: Well-designed constructor with clear constraints.

The `new` method properly handles the zero-allocation case with good documentation about when to use it versus `new_with_buffer`. The mutable reference to an empty slice for the internal buffer is a clever approach.
177-203: Appropriate error handling for tokenizer integration.

The error handling properly converts tokenizer errors to `ParseError::TokenizerError` and handles end-of-data scenarios correctly. The callback-based event collection is well-implemented.
214-412: Complex but well-structured main parsing loop.

The `next_event` method effectively handles all JSON event types with proper state transitions. The use of the `EventResult` enum for control flow is clean, and the match expressions are comprehensive.

The loop-based structure properly avoids the stack overflow issues that could occur with recursive implementations.
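The non-recursive approach praised here can be illustrated with a toy depth tracker that walks the input in a flat loop, so nesting depth never consumes call-stack frames. None of these names come from the crate, and the sketch ignores brackets inside strings:

```rust
// Toy version of the flat-loop idea: track container depth iteratively
// instead of recursing per nesting level. Illustration only.
fn max_depth(json: &str) -> usize {
    let (mut depth, mut max) = (0usize, 0usize);
    for b in json.bytes() {
        match b {
            b'{' | b'[' => {
                depth += 1;
                max = max.max(depth);
            }
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    max
}
```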
334-341: Proper handling of EscapeSequence events.

These handlers correctly implement the design pattern of avoiding catch-all matches while not processing events that aren't needed by this parser variant, consistent with the architectural decision explained in the retrieved learnings.
415-757: Comprehensive test coverage.

The test suite thoroughly covers the parser functionality including edge cases, escape handling, Unicode processing, and the copy-on-escape optimization. The tests demonstrate both the zero-copy and unescaping paths effectively.
stax/src/shared.rs (4)
6-29: Well-designed Event enum with proper lifetime management.

The Event enum comprehensively covers all JSON parsing events with appropriate lifetime parameters that support both borrowed and owned string data through the String and JsonNumber types.
31-65: Comprehensive error handling with conditional compilation.

The ParseError enum covers all necessary error cases including tokenizer errors, buffer constraints, UTF-8 validation, and specialized cases like Unicode handling. The conditional compilation for float features is properly implemented.
107-189: Safe and well-documented utility functions.

The ContentRange utility functions use `saturating_sub()` consistently to prevent underflow and are well-documented with clear parameter descriptions. These functions provide a clean abstraction for boundary calculations across different parsing contexts.
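The saturating-arithmetic pattern looks roughly like this; the function names mirror the ones that appear in this PR's code graph, but the bodies are illustrative guesses:

```rust
// Illustrative boundary helpers; real signatures may differ.
fn quote_position_from_current(current: usize) -> usize {
    // One byte back to the quote; never underflows at position 0.
    current.saturating_sub(1)
}

fn string_content_bounds(quote_start: usize, current: usize) -> (usize, usize) {
    // Content sits between the quotes, exclusive of both.
    (quote_start + 1, current.saturating_sub(1))
}
```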
195-251: Consistent error handling utilities.

The ParserErrorHandler provides good abstractions for common error creation patterns, including UTF-8 validation and state mismatch handling with predefined messages for no_std compatibility.
/gemini review
Code Review
This is an impressive initial implementation, establishing a solid foundation for a streaming JSON parser. The code is well-structured, extensively documented, and includes a comprehensive set of features and tests. The design choices, such as zero-allocation, configurable number handling, and the "copy-on-escape" strategy, are well-thought-out and clearly explained in the design documents.
My review has identified a few areas for improvement, including a critical bug related to infinite recursion and some opportunities to enhance code clarity and maintainability. Please see the detailed comments below.
Overall, this is excellent work. I'm looking forward to seeing this project evolve.
Actionable comments posted: 0
🧹 Nitpick comments (1)
stax/src/direct_parser.rs (1)
25-38: Address the misleading 'static lifetimes in EventResult.

The `Complete` variant uses `'static` lifetimes, which is misleading since `Event` is generic over lifetimes `'a` and `'b`. While this works because the variant is only used for lifetime-independent events (containers, booleans, null), it obscures the actual lifetime relationships and could cause confusion during maintenance.

Consider making `EventResult` generic over the buffer lifetime to improve type safety and clarity:

```diff
-enum EventResult {
+enum EventResult<'b> {
     /// Event processing complete, return this event
-    Complete(Event<'static, 'static>),
+    Complete(Event<'b, 'b>),
     /// Continue processing, no event to return yet
     Continue,
     // ... rest unchanged
 }
```

This change would require updating the method signatures but would make the lifetime relationships explicit and prevent potential issues if the usage patterns change in the future.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- stax/src/direct_buffer.rs (1 hunks)
- stax/src/direct_parser.rs (1 hunks)
- stax/src/flex_parser.rs (1 hunks)
- stax/tests/configurable_numbers.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- stax/tests/configurable_numbers.rs
- stax/src/flex_parser.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/direct_parser.rs (2)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
stax/src/direct_buffer.rs (2)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
🧬 Code Graph Analysis (1)
stax/src/direct_parser.rs (9)
stax/examples/direct_parser_demo.rs (2)
- read (28-48)
- new (16-22)

stax/src/json_number.rs (2)
- fmt (129-146)
- as_str (83-88)

stax/src/json_string.rs (2)
- fmt (44-46)
- as_str (18-23)

stax/src/direct_buffer.rs (1)
- new (48-59)

stax/src/slice_input_buffer.rs (2)
- new (44-46)
- current_pos (40-42)

stax/src/flex_parser.rs (3)
- new (60-73)
- next (200-205)
- next_event (209-386)

stax/src/shared.rs (8)
- new (89-94)
- quote_position_from_current (159-161)
- number_start_from_current (147-149)
- state_mismatch (226-235)
- bytes_to_utf8_str (203-205)
- string_content_bounds (117-121)
- string_content_bounds_before_escape (131-138)
- invalid_unicode_escape (238-240)

stax/src/escape_processor.rs (3)
- new (152-157)
- hex_count (198-200)
- process_escape_token (53-57)

stax/src/number_parser.rs (2)
- new (97-103)
- parse_number_event (33-61)
🔇 Additional comments (13)
stax/src/direct_parser.rs (7)
132-242: Excellent main parsing loop implementation.

The core parsing logic is well-architected with clear separation of concerns:
- Proper buffer filling and end-of-data handling
- Clean integration with tokenizer and DirectBuffer
- Appropriate error propagation
- Logical flow control with deferred event extraction
The state management and event processing patterns are consistent with the overall parser design.
194-202: Well-implemented Unicode escape sequence handling.

The Unicode escape processing correctly handles the tokenizer events:
- Proper initialization and hex digit collection on Begin events
- Correct completion and processing on End events
- Good integration with the shared UnicodeEscapeCollector
The special case handling in the main loop ensures hex digits are captured at the right tokenizer state transitions.
Also applies to: 351-357
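A hedged sketch of what collecting four hex digits into a character involves; `decode_unicode_escape` is an illustrative helper, not the crate's `UnicodeEscapeCollector` API, and it deliberately skips surrogate-pair handling:

```rust
// Illustrative helper: fold four ASCII hex digits into a scalar value,
// then convert to char (None for invalid digits or invalid code points).
fn decode_unicode_escape(hex: &[u8; 4]) -> Option<char> {
    let mut cp: u32 = 0;
    for &b in hex {
        cp = cp * 16 + (b as char).to_digit(16)?;
    }
    char::from_u32(cp)
}
```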
529-630: Robust escape sequence processing implementation.

The escape handling methods demonstrate excellent architecture:
- Proper state management through ProcessingState enum
- Clean integration with DirectBuffer for unescaping
- Unified use of EscapeProcessor for simple escapes
- Comprehensive Unicode escape handling with UnicodeEscapeCollector
- Appropriate error handling and validation
The separation of concerns between different escape types and the deferred processing approach aligns well with the parser's overall design.
459-471: Good integration with unified number parsing.

The number extraction correctly delegates to the shared `number_parser` module, ensuring consistent behavior across parser implementations. The `from_container_end` context parameter properly handles delimiter interpretation for numbers terminated by container ends.
674-1437: Comprehensive and well-structured test suite.

The test coverage is excellent with:
- Complete coverage of JSON constructs (objects, arrays, strings, numbers, booleans, nulls)
- Proper edge case testing (nested containers, empty containers, error conditions)
- Appropriate feature flag handling for different build configurations
- Valuable parser comparison tests ensuring consistency with FlexParser
- Thorough escape sequence and Unicode testing
The conditional compilation based on feature flags ensures tests work correctly across all supported configurations.
473-486: Excellent error handling and propagation.

The error handling throughout the parser is well-designed:
- Proper error mapping from Reader to ParseError::ReaderError
- Consistent error propagation from DirectBuffer operations
- Meaningful error messages for state validation
- Defensive error handling for unexpected tokenizer states
The error handling patterns ensure robustness without sacrificing performance.
Also applies to: 238-240
1-672: Excellent overall architecture and implementation.

The DirectParser demonstrates outstanding software engineering with:
Architecture:
- Clean separation between tokenization, buffer management, and event processing
- Elegant state machine design with ProcessingState enum
- Proper abstraction layers and interfaces
Integration:
- Excellent reuse of shared utilities (EscapeProcessor, UnicodeEscapeCollector, number_parser)
- Consistent patterns with other parser implementations
- Good integration with ujson tokenizer
Performance:
- Zero-copy string extraction when possible
- Efficient single-buffer design with DirectBuffer
- Minimal allocations and smart escape processing
Robustness:
- Comprehensive error handling and edge case coverage
- Defensive programming against invalid states
- Thorough test coverage with feature flag awareness
This implementation successfully provides a streaming JSON parser that complements the existing PullParserFlex while maintaining consistency and code quality standards.
stax/src/direct_buffer.rs (6)
33-59: Well-designed buffer structure and initialization.

The DirectBuffer design demonstrates excellent architectural decisions:
- Clear field layout with comprehensive documentation
- Smart escape reserve calculation (10% or 64 bytes minimum)
- Good separation of concerns between tokenization position and data boundaries
- Zero-copy design principles clearly documented
The constructor properly initializes all fields and calculates appropriate escape reserve for different buffer sizes.
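The "10% or 64 bytes minimum" rule mentioned above can be sketched in one line; the final clamp to the buffer length is an added assumption for very small buffers, not something confirmed by the source:

```rust
// Sketch of the escape-reserve rule: 10% of the buffer, at least 64 bytes,
// and (assumed) never more than the buffer itself.
fn escape_reserve(buffer_len: usize) -> usize {
    (buffer_len / 10).max(64).min(buffer_len)
}
```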
62-81: Safe and efficient buffer reading operations.

The buffer reading methods demonstrate good safety practices:
- Proper bounds checking before accessing buffer data
- Appropriate error handling with meaningful error types
- Safe arithmetic using saturating_sub to prevent underflow
- Clear and simple interfaces for position management
85-101: Robust buffer filling operations for Reader integration.

The buffer filling methods provide safe integration with streaming input:
- get_fill_slice properly returns available space or None when full
- mark_filled includes validation to prevent buffer overruns
- Clean interface that works well with the Reader trait contract
- Appropriate error handling for invalid operations
105-178: Excellent escape processing support with proper safety guarantees.

The escape processing implementation demonstrates careful attention to safety:
- start_unescaping_with_copy properly handles content copying with bounds checking
- append_unescaped_byte correctly enforces escape reserve constraints
- Proper error handling prevents buffer overruns and invalid states
- Clean lifecycle management with clear_unescaped for cleanup
The implementation addresses past review concerns about escape reserve enforcement and silent truncation.
516-533: Correct NumberExtractor implementation.

The NumberExtractor trait is now implemented correctly:
- get_number_slice correctly delegates to get_string_slice with appropriate error mapping
- current_position returns the tokenize_pos field directly (no recursion)
- is_empty properly compares tokenize_pos >= data_end (no recursion)
This addresses the infinite recursion issue mentioned in past reviews and provides the correct interface for unified number parsing.
216-514: Comprehensive test coverage with excellent edge case handling.

The test suite demonstrates thorough validation:
- Complete coverage of buffer operations (filling, advancing, reading)
- Proper testing of error conditions and boundary cases
- Excellent edge case testing (minimal buffers, buffer full scenarios)
- Thorough validation of escape reserve functionality
- Good testing of the escape processing lifecycle
The tests include important scenarios like tiny buffers and escape reserve constraints that ensure robustness in real-world usage.
Summary by Sourcery
Initialize the workspace with core JSON streaming components and high-level pull parsers. Establish the `ujson` tokenizer crate for event-based non-recursive tokenization, and build the `stax` crate on top of it with both slice-based and reader-based pull parser APIs. Provide zero-allocation string unescaping, configurable number parsing policies, arbitrary nesting depth via BitStack abstractions, and ship extensive tests, examples and documentation.

New Features:
Enhancements:
Build:
CI:
Documentation:
Tests:
Chores:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores