Conversation

kaidokert
Owner

@kaidokert kaidokert commented Jun 28, 2025

Summary by Sourcery

Initialize the workspace with core JSON streaming components and high-level pull parsers. Establish the ujson tokenizer crate for event-based, non-recursive tokenization, and build the stax crate on top of it with both slice-based and reader-based pull parser APIs. Provide zero-allocation string unescaping, configurable number parsing policies, and arbitrary nesting depth via BitStack abstractions, and ship extensive tests, examples, and documentation.

New Features:

  • Add ujson tokenizer crate for streaming emission of JSON tokens and structural events
  • Introduce stax crate with PullParserFlex (slice-based) and DirectParser (Reader-based) pull parser APIs
  • Implement zero-allocation string unescaping via CopyOnEscape and DirectBuffer
  • Support configurable number parsing with compile-time int32/int64 and float behavior options
  • Provide arbitrary nesting depth tracking via BitStack trait and ArrayBitStack implementation
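The one-bit-per-level scheme behind the BitStack bullet above can be sketched as follows. This is an illustrative model only; `NestingStack` is a hypothetical type, not the crate's actual `BitStack` trait or `ArrayBitStack` implementation:

```rust
/// Minimal sketch of one-bit-per-level nesting tracking: each push shifts the
/// word left and records 1 for an object, 0 for an array. A fixed-width word
/// bounds the depth; the real crate generalizes this over wider storage.
struct NestingStack {
    bits: u32, // one bit per level: 1 = object, 0 = array
    depth: u8, // current nesting depth
}

impl NestingStack {
    fn new() -> Self {
        NestingStack { bits: 0, depth: 0 }
    }

    /// Enter a container. Returns false when the fixed capacity is exhausted.
    fn push(&mut self, is_object: bool) -> bool {
        if self.depth as u32 >= u32::BITS {
            return false;
        }
        self.bits = (self.bits << 1) | (is_object as u32);
        self.depth += 1;
        true
    }

    /// Leave the current container, reporting whether it was an object.
    fn pop(&mut self) -> Option<bool> {
        if self.depth == 0 {
            return None;
        }
        let was_object = self.bits & 1 == 1;
        self.bits >>= 1;
        self.depth -= 1;
        Some(was_object)
    }
}

fn main() {
    let mut stack = NestingStack::new();
    stack.push(true); // {
    stack.push(false); // [
    assert_eq!(stack.pop(), Some(false)); // ] closes the array
    assert_eq!(stack.pop(), Some(true)); // } closes the object
    assert_eq!(stack.pop(), None); // back at document root
}
```

With one bit per level, a `u32` tracks 32 levels of nesting with no heap use, which is why the design suits constrained targets.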

Enhancements:

  • Set up a unified workspace combining tokenizer and stax crates
  • Unify number parsing logic in a shared number_parser module
  • Modularize escape processing into a standalone EscapeProcessor and UnicodeEscapeCollector

Build:

  • Initialize Cargo workspace with stax and tokenizer crates and configure feature flags in Cargo.toml

CI:

  • Add pre-commit configuration for linting, formatting and file checks

Documentation:

  • Add DESIGN.md and CONFIGURATION.md with design notes, API guidance and feature explanations
  • Include README files and examples illustrating pull parser usage and configurations

Tests:

  • Include comprehensive unit tests for tokenizer, BitStack, parsers, escape processing and number parsing
  • Provide integration tests and examples covering different parser modes, features and edge cases

Chores:

  • Add TODO.md with future work items

Summary by CodeRabbit

  • New Features

    • Introduced a streaming JSON parser supporting incremental input with direct escape processing.
    • Added flexible BitStack implementations for configurable JSON nesting depth.
    • Provided multiple new example programs demonstrating parser usage, configurable nesting, and number parsing modes.
    • Added a unified number parsing abstraction supporting various integer and float configurations.
    • Included a JSON string type supporting zero-copy and unescaped variants.
    • Introduced a comprehensive JSON tokenizer with detailed event emission and error reporting.
    • Added utilities for processing JSON escape sequences and Unicode escapes.
    • Implemented buffer management modules for efficient tokenization and parsing.
    • Provided a new simple API for the JSON pull parser with ergonomic event iteration.
  • Bug Fixes

    • Ensured consistent line endings across platforms via repository configuration.
    • Updated ignore rules to exclude unnecessary files from version control.
  • Documentation

    • Added detailed design and configuration documents explaining parser architecture and feature flags.
    • Included README files describing usage and configuration for parser and tokenizer crates.
  • Tests

    • Added extensive tests covering parser APIs, configurable number parsing, tokenizer correctness, and BitStack implementations.
    • Included example-driven tests demonstrating parser behavior under various configurations.
    • Added comprehensive error handling tests for the JSON parsing API.
  • Chores

    • Added repository configuration files for pre-commit hooks and workspace management.
    • Configured Cargo workspace and package manifests for modular development.
    • Added compile-time feature validation to prevent incompatible configurations.

@kaidokert
Owner Author

/gemini review


coderabbitai bot commented Jun 28, 2025

Walkthrough

This change introduces the initial implementation of a zero-allocation, no_std-friendly JSON pull parser library and its supporting tokenizer. It adds parser and tokenizer crates, detailed configuration and design documentation, flexible number and string handling, escape processing, multiple parser implementations, comprehensive examples, and extensive tests. Workspace and development configuration files are also included.

Changes

File(s) / Path(s) Change Summary
.gitattributes, .gitignore, .pre-commit-config.yaml Added/updated repository configuration for line endings, ignored files, and pre-commit hooks.
Cargo.toml Added workspace manifest listing member crates.
CONFIGURATION.md, DESIGN.md, TODO.md Added documentation for configuration, design, and project tasks.
demos/Cargo.toml, demos/src/lib.rs Added minimal demo crate setup.
stax/Cargo.toml, stax/README.md Added crate manifest and introductory documentation for the JSON parser.
stax/examples/advanced_bitstack_demo.rs, stax/examples/array_bitstack_demo.rs,
stax/examples/direct_parser_demo.rs, stax/examples/no_float_demo.rs,
stax/examples/simple_api_demo.rs
Added multiple example binaries demonstrating parser features, configuration, and usage.
stax/src/copy_on_escape.rs, stax/src/direct_buffer.rs, stax/src/direct_parser.rs,
stax/src/escape_processor.rs, stax/src/flex_parser.rs, stax/src/json_number.rs,
stax/src/json_string.rs, stax/src/lib.rs, stax/src/number_parser.rs,
stax/src/shared.rs, stax/src/slice_input_buffer.rs
Introduced core parser modules, buffer management, number/string handling, escape processing, and shared utilities.
stax/src/config_check.rs Added compile-time feature flag validation to prevent incompatible configurations.
stax/tests/api_test.rs, stax/tests/api_test_errors.rs, stax/tests/configurable_numbers.rs, stax/tests/debug_root_numbers.rs Added test suites for API behavior, error handling, configurable number parsing, and debugging root-level numbers.
tokenizer/Cargo.toml, tokenizer/README.md Added tokenizer crate manifest and documentation.
tokenizer/src/bin/main.rs Added tokenizer binary for file-based parsing.
tokenizer/src/bitstack/mod.rs, tokenizer/src/lib.rs, tokenizer/src/tokenizer/mod.rs Implemented generic bitstack, tokenizer core, and public API.
tokenizer/src/tokenizer/testdata/i_structure_500_nested_arrays.json Added test data for deeply nested arrays.
tokenizer/tests/array_bitstack_test.rs Added unit tests for ArrayBitStack.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Parser
    participant Tokenizer
    participant Buffer
    participant ScratchBuffer

    User->>Parser: Create PullParser(input, scratch_buffer)
    loop Event Iteration
        Parser->>Tokenizer: next_token()
        Tokenizer-->>Parser: EventToken (e.g., String, Number, StartObject)
        alt String Event
            Parser->>Buffer: Get string slice
            alt No escapes
                Buffer-->>Parser: Borrowed slice
            else Escapes found
                Parser->>ScratchBuffer: Unescape and copy
                ScratchBuffer-->>Parser: Unescaped slice
            end
            Parser-->>User: Event::String(Borrowed/Unescaped)
        else Number Event
            Parser->>Buffer: Get number slice
            Parser->>Parser: Parse number (int/float/overflow)
            Parser-->>User: Event::Number(JsonNumber)
        else Structural Event
            Parser-->>User: Event (StartObject, EndArray, etc.)
        end
    end
    Parser-->>User: Event::EndDocument
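The event loop in the diagram can be sketched from the consumer side. The `Event` enum and the hardcoded event slice below are illustrative stand-ins for the parser's real event stream, not the crate's actual types:

```rust
/// Consumer side of the pull loop from the sequence diagram: match on each
/// event until the terminal EndDocument. These types are stand-ins for the
/// parser's real event API.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    StartObject,
    Key(&'a str),
    Str(&'a str),
    Number(i64),
    EndObject,
    EndDocument,
}

/// Count string-value events, stopping at EndDocument like the diagram's loop.
fn count_strings(events: &[Event]) -> usize {
    let mut strings = 0;
    for event in events {
        match event {
            Event::Str(_) => strings += 1, // string value event
            Event::EndDocument => break,   // diagram's terminal event
            _ => {}                        // structural and number events
        }
    }
    strings
}

fn main() {
    // Events a parser would emit for {"name":"pull","count":2}
    let events = [
        Event::StartObject,
        Event::Key("name"),
        Event::Str("pull"),
        Event::Key("count"),
        Event::Number(2),
        Event::EndObject,
        Event::EndDocument,
    ];
    assert_eq!(count_strings(&events), 1);
}
```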

Poem

🐇
In a warren of code, JSON hops through,
With stacks of bits and buffers too.
Strings unescape, numbers flex and bend,
Parsers pull events from start to end.
Docs and tests in cozy array,
This rabbit says: "The parser’s ready—hooray!"


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents the initial implementation of a highly configurable, zero-allocation JSON pull parser designed for no_std environments, particularly embedded systems. It establishes a multi-crate workspace, lays down core architectural principles, and provides robust parsing capabilities with a focus on memory efficiency and compile-time configurability.

Highlights

  • Initial Project Setup: This pull request establishes the foundational structure for a new Rust-based JSON pull parser, including workspace configuration, Git attributes, and pre-commit hooks for consistent code quality and formatting.
  • Core JSON Parser Implementation: The stax crate introduces a no_std, zero-allocation JSON pull parser with flexible configuration options for embedded environments, including configurable integer widths (i32/i64) and various float handling behaviors (full support, error on float, truncate float, or raw string preservation).
  • Streaming and Slice-based Parsers: Two primary parser implementations are provided: a DirectParser for streaming input from a Reader trait with a single internal buffer, and a PullParser (FlexParser) for parsing from a slice with a copy-on-escape optimization for string unescaping.
  • Modular Tokenizer: A separate ujson tokenizer crate is introduced, featuring a non-recursive design and a generic BitStack implementation (including ArrayBitStack) for efficient tracking of arbitrary nesting depths, crucial for embedded systems.
  • Comprehensive Documentation and Examples: Detailed DESIGN.md and CONFIGURATION.md documents outline the architectural decisions and configurable aspects of the parser. A demos crate provides multiple examples showcasing different parser configurations and usage patterns.
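The copy-on-escape optimization highlighted above can be sketched like this. It is a simplified model: it uses `Cow` and a heap `String` for brevity and handles only a few escapes, whereas the real parser writes into a caller-provided scratch buffer and covers the full JSON escape set:

```rust
use std::borrow::Cow;

/// Simplified copy-on-escape: return a borrowed slice when the string body
/// contains no escapes, and copy only once the first `\` is seen.
fn unescape(body: &str) -> Cow<'_, str> {
    match body.find('\\') {
        None => Cow::Borrowed(body), // fast path: zero copies
        Some(first) => {
            let mut out = String::with_capacity(body.len());
            out.push_str(&body[..first]); // copy the clean prefix once
            let mut chars = body[first..].chars();
            while let Some(c) = chars.next() {
                if c == '\\' {
                    match chars.next() {
                        Some('"') => out.push('"'),
                        Some('\\') => out.push('\\'),
                        Some('n') => out.push('\n'),
                        other => panic!("unsupported escape: {:?}", other),
                    }
                } else {
                    out.push(c);
                }
            }
            Cow::Owned(out)
        }
    }
}

fn main() {
    // Escape-free strings stay borrowed; escaped ones are unescaped into a copy.
    assert!(matches!(unescape("plain"), Cow::Borrowed("plain")));
    assert_eq!(unescape(r#"a\nb"#), "a\nb");
}
```

The payoff is that the common case (no escapes) never touches the scratch buffer at all, which is what makes zero-allocation borrowing possible.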

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the initial implementation of the stax JSON parser and its underlying tokenizer. The project is well-structured with a workspace, examples, and documentation. The code demonstrates a strong focus on no_std compatibility and performance, particularly with the copy-on-escape mechanism and configurable features.

I've identified several areas for improvement, including a critical bug in the ArrayBitStack implementation, a potential stack overflow in the recursive parser, and several places where errors are not propagated correctly. I've also made suggestions to improve dependency management. Overall, this is a very solid foundation for a high-performance JSON parser. Addressing the issues raised will significantly improve its robustness and correctness.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The code changes introduce a high-level, zero-allocation JSON pull parser with flexible configuration for integer and float handling, suitable for embedded and no_std environments. I've found a few issues, including a critical correctness bug in the ArrayBitStack implementation, and some dependency improvements. Addressing these will significantly improve the robustness and correctness of the library.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🔭 Outside diff range comments (1)
tokenizer/Cargo.toml (1)

1-16: Align crate metadata & naming with folder, add missing fields

The crate lives under tokenizer/ but is published as ujson, which is confusing. Pick one or document the rationale. Also add standard metadata and tighten feature gating:

 [package]
-name = "ujson"
-version = "0.1.0"
+name = "tokenizer"           # or rename the folder; keep consistent
+version = "0.1.0"
 edition = "2021"
+license = "MIT OR Apache-2.0"
+repository = "https://github.com/kaidokert/picojson-rs"
+description = "Zero-allocation JSON tokenizer used by the stax pull parser"
+
+# Prevent accidental publish until 1.0
+publish = false

 [dependencies]
 clap = { version = "4.5.31", optional = true, features = ["derive"] }
 log = "0.4.26"

 [features]
-clap = ["dep:clap"]
+default = []                 # keep no_std friendly default
+cli = ["dep:clap"]           # clearer than repeating the crate name

Benefits:
• Avoids name/path mismatch confusion.
• Provides license/compliance data.
• Ensures examples don’t leak to crates.io unintentionally.
• Clearer feature naming.

🧹 Nitpick comments (25)
.pre-commit-config.yaml (1)

1-24: Add Rust-specific quality hooks for stronger gatekeeping

rustfmt is covered, but you can cheaply catch many issues before CI by adding cargo clippy --all-targets --all-features -- -D warnings and a plain cargo check.
They are available in the same pre-commit-rust repo.

     hooks:
     -   id: fmt
         name: Fmt
+    -   id: clippy
+        name: Clippy (deny warnings)
+        args: ["--all-targets", "--all-features", "--", "-D", "warnings"]
+    -   id: cargo-check
+        name: Cargo Check
demos/src/lib.rs (1)

1-2: Empty library file – clarify intent

Currently the crate compiles to an empty rlib.
If the demos package is meant only for binaries/examples, consider:

  1. Removing src/lib.rs (and lib section in Cargo.toml) and keeping just src/bin/*.
  2. Or adding a minimal doc-comment so cargo doc doesn’t emit “crate has no documentation”.
- 
+//! Helper crate that aggregates demonstration binaries for `stax`.
tokenizer/README.md (1)

1-3: Minor wording tweak for conciseness

-Non-recursive JSON stream tokenizer.
-
-It uses 1 bit per nesting depth to track whether the level represents an array [] or an object {}
+Non-recursive JSON stream tokenizer using one bit per nesting level to distinguish arrays (`[]`) from objects (`{}`).
demos/Cargo.toml (1)

1-4: Add metadata & mark crate as non-publishable

demos is only for examples; consider preventing accidental publication and improving discoverability by adding standard fields:

 [package]
 name = "demos"
 version = "0.0.1"
 edition = "2021"
+publish = false            # avoid pushing examples to crates.io
+license = "MIT OR Apache-2.0"
+description = "Demonstration binaries for the stax / tokenizer workspace"
+repository = "https://github.com/kaidokert/picojson-rs"
+keywords = ["json", "examples"]
Cargo.toml (1)

1-3: Minor readability nit: add spacing & newline

The members list is valid but a bit cramped; adding spaces (or line-per-item) makes diffs cleaner:

-members = [ "stax","tokenizer", "demos"]
+members = [
+  "stax",
+  "tokenizer",
+  "demos",
+]
tokenizer/src/tokenizer/testdata/i_structure_500_nested_arrays.json (1)

1-1: Consider generating this deep-nest test at runtime

Checking very deep nesting is great, but a 500-level literal inflates repo size and hurts diff readability. You could:

  1. Generate it in the test with a loop.
  2. Store it gzip-compressed and expand on test run.

No functional change required, just a maintenance tip.

tokenizer/src/bin/main.rs (3)

4-4: Remove unused commented import.

The commented import is not needed since std::process::exit is used with the full path.

-//use std::process;

7-7: Remove debug print statement.

This appears to be leftover debug code that should be removed from the final implementation.

-    println!("Hello, world!");
-

15-21: Consider memory usage for large files.

Reading the entire file into memory could be problematic for very large JSON files. Consider if streaming parsing would be more appropriate for this use case.
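The streaming alternative suggested here can be sketched with a plain `io::Read` loop that feeds fixed-size chunks instead of loading the whole file. The `feed` callback is a hypothetical stand-in for handing bytes to a tokenizer:

```rust
use std::io::Read;

/// Read an input in fixed-size chunks, invoking `feed` per chunk, so memory
/// use stays bounded regardless of input size. Returns total bytes consumed.
fn stream_chunks<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut feed: impl FnMut(&[u8]),
) -> std::io::Result<usize> {
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        feed(&buf[..n]);
        total += n;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // A Cursor simulates a file; small chunks force multiple read calls.
    let json = br#"{"nested":[1,2,3],"ok":true}"#;
    let mut calls = 0;
    let total = stream_chunks(std::io::Cursor::new(&json[..]), 8, |_chunk| calls += 1)?;
    assert_eq!(total, json.len());
    assert!(calls > 1); // input was delivered in multiple reads
    Ok(())
}
```

The same loop works with `File::open(...)?` in place of the cursor, keeping peak memory at `chunk_size` bytes instead of the file size.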

stax/Cargo.toml (2)

22-25: Address TODO comments.

These TODO comments indicate unfinished configuration decisions that should be resolved.

The comments suggest:

  • Line 22-23: Decision needed on log vs defmt integration
  • Line 24-25: test-env-log dependency may not be needed

Should these be cleaned up or converted to GitHub issues for tracking?


10-17: Consider feature flag organization.

The mutually exclusive features (int32/int64 and float behavior options) might benefit from better organization or documentation to prevent invalid combinations.

Consider adding feature conflict validation or documenting the mutual exclusivity more clearly in the manifest comments.

stax/src/json_string.rs (1)

30-39: Simplify Deref implementation.

The Deref implementation duplicates the pattern matching logic from as_str(). Consider delegating to the existing method for consistency and maintainability.

 impl Deref for String<'_, '_> {
     type Target = str;
 
     fn deref(&self) -> &Self::Target {
-        match self {
-            String::Borrowed(s) => s,
-            String::Unescaped(s) => s,
-        }
+        self.as_str()
     }
 }
stax/README.md (3)

45-45: Add missing comma for better readability.

Static analysis correctly identified missing comma after "By default".

-By default this is a 32-bit int, but can be changed
+By default, this is a 32-bit int, but can be changed

50-55: Fix markdown list indentation.

The unordered list items should start at column 0 for proper markdown formatting.

- * int64 ( default ) - numbers are returned in int64 values
- * int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
- * float - full float support is included.
- * float-error - Any floating point input will yield an error, to reduce float math dependency
- * float-skip - Float values are skipped.
- * float-truncate - float values are truncated to integers. Scientific notation will generate an error
+* int64 (default) - numbers are returned as int64 values
+* int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
+* float - full float support is included
+* float-error - Any floating-point input will yield an error, to reduce float math dependency
+* float-skip - Float values are skipped
+* float-truncate - float values are truncated to integers. Scientific notation will generate an error

59-59: Add missing comma for consistency.

Same grammar issue as line 45.

-By default full float and int64 support is enabled.
+By default, full float and int64 support is enabled.
tokenizer/src/lib.rs (1)

10-30: Consider if the BitStackCore trait adds value.

The BitStackCore trait aggregates multiple trait bounds, but it's only used as a re-export. Consider whether this abstraction is necessary or if the individual bounds could be used directly where needed.

CONFIGURATION.md (1)

10-10: Minor grammar improvement for compound adjective.

Consider using a hyphen for the compound adjective "full-range" that modifies "integer support".

-- **`int64`** (default): Use `i64` for full range integer support
+- **`int64`** (default): Use `i64` for full-range integer support
stax/examples/direct_parser_demo.rs (1)

63-68: Consider documenting the rationale for buffer and chunk sizes.

While these sizes work well for the demonstration, consider adding a comment explaining why 256 bytes for buffer and 8 bytes for chunks were chosen. This would help users understand the trade-offs when adapting the example.

     // Create ArrayReader that reads in small chunks (simulates network streaming)
+    // Using 8-byte chunks to demonstrate multiple read calls for the 74-byte input
     let reader = ArrayReader::new(json, 8); // Read 8 bytes at a time
 
     // Create DirectParser with a reasonably sized buffer
+    // 256 bytes is sufficient for this example while keeping memory usage low
     let mut buffer = [0u8; 256];
stax/examples/array_bitstack_demo.rs (1)

93-99: Consider propagating the error instead of silently breaking

The error is caught but not propagated. Since the function returns Result<(), stax::ParseError>, you could propagate the error for better error reporting.

-            Some(Err(_)) => {
-                println!(
-                    "  ! Parse error encountered at depth {}, continuing...",
-                    depth
-                );
-                break;
-            }
+            Some(Err(e)) => {
+                println!("  ! Parse error encountered at depth {}: {:?}", depth, e);
+                return Err(e);
+            }
DESIGN.md (2)

152-152: Fix grammatical error: "In addition of" → "In addition to"

-More: In addition of taking just slice [u8] as input, we should accept an `impl Reader` of some sort.
+More: In addition to taking just slice [u8] as input, we should accept an `impl Reader` of some sort.

155-155: Fix typo: "implent" → "implement"

-make our own, and auto-implent it for arrays and slices or for anything that looks like AsRef<[u8]>
+make our own, and auto-implement it for arrays and slices or for anything that looks like AsRef<[u8]>
stax/src/json_number.rs (1)

178-196: Consider adding more robust scientific notation handling

The float-truncate implementation rejects all scientific notation, but simple cases like "1e3" (= 1000) could be handled without float math by parsing the exponent and shifting.

Would you like me to provide an implementation that handles simple scientific notation cases without using float operations?
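One way the suggested integer-only handling could look, sketched with a hypothetical helper that is not part of the reviewed code: parse the mantissa and exponent separately, then apply the exponent with checked multiplications so overflow surfaces as an error rather than a wrap.

```rust
/// Truncate a simple scientific-notation literal like "1e3" to an integer
/// without float math. Negative exponents and fractional mantissas are
/// rejected, matching a conservative float-truncate policy.
fn truncate_simple_exponent(s: &str) -> Option<i64> {
    let (mantissa_str, exp_str) = s.split_once(&['e', 'E'][..])?;
    let mantissa: i64 = mantissa_str.parse().ok()?; // "1.5" fails here
    let exponent: u32 = exp_str.parse().ok()?; // negative exponents fail here
    let mut value = mantissa;
    for _ in 0..exponent {
        value = value.checked_mul(10)?; // overflow yields None, not a wrap
    }
    Some(value)
}

fn main() {
    assert_eq!(truncate_simple_exponent("1e3"), Some(1000));
    assert_eq!(truncate_simple_exponent("-2E4"), Some(-20000));
    assert_eq!(truncate_simple_exponent("1e-3"), None); // would truncate toward 0; rejected
    assert_eq!(truncate_simple_exponent("9e30"), None); // overflows i64
}
```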

stax/src/copy_on_escape.rs (1)

162-163: Simplify error conversion.

The error conversion can be simplified using the ? operator directly.

-            let unescaped_str =
-                core::str::from_utf8(unescaped_slice).map_err(ParseError::InvalidUtf8)?;
+            let unescaped_str = core::str::from_utf8(unescaped_slice)
+                .map_err(ParseError::InvalidUtf8)?;
tokenizer/src/tokenizer/mod.rs (2)

183-184: Document test-only variant purpose.

The Uninitialized variant should have documentation explaining its test-only purpose.

     ArrayEnd,
+    /// Test-only variant used as a placeholder in test arrays
     #[cfg(test)]
     Uninitialized,

1542-1543: Track TODO for array BitStack support.

The comment indicates that array BitStack support needs custom implementation.

Would you like me to create an issue to track implementing BitStack support for arrays?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d642b7 and 7fc8294.

📒 Files selected for processing (37)
  • .gitattributes (1 hunks)
  • .gitignore (1 hunks)
  • .pre-commit-config.yaml (1 hunks)
  • CONFIGURATION.md (1 hunks)
  • Cargo.toml (1 hunks)
  • DESIGN.md (1 hunks)
  • TODO.md (1 hunks)
  • demos/Cargo.toml (1 hunks)
  • demos/src/lib.rs (1 hunks)
  • stax/Cargo.toml (1 hunks)
  • stax/README.md (1 hunks)
  • stax/examples/advanced_bitstack_demo.rs (1 hunks)
  • stax/examples/array_bitstack_demo.rs (1 hunks)
  • stax/examples/direct_parser_demo.rs (1 hunks)
  • stax/examples/no_float_demo.rs (1 hunks)
  • stax/examples/simple_api_demo.rs (1 hunks)
  • stax/src/copy_on_escape.rs (1 hunks)
  • stax/src/direct_buffer.rs (1 hunks)
  • stax/src/direct_parser.rs (1 hunks)
  • stax/src/escape_processor.rs (1 hunks)
  • stax/src/flex_parser.rs (1 hunks)
  • stax/src/json_number.rs (1 hunks)
  • stax/src/json_string.rs (1 hunks)
  • stax/src/lib.rs (1 hunks)
  • stax/src/number_parser.rs (1 hunks)
  • stax/src/shared.rs (1 hunks)
  • stax/src/slice_input_buffer.rs (1 hunks)
  • stax/tests/api_test.rs (1 hunks)
  • stax/tests/configurable_numbers.rs (1 hunks)
  • tokenizer/Cargo.toml (1 hunks)
  • tokenizer/README.md (1 hunks)
  • tokenizer/src/bin/main.rs (1 hunks)
  • tokenizer/src/bitstack/mod.rs (1 hunks)
  • tokenizer/src/lib.rs (1 hunks)
  • tokenizer/src/tokenizer/mod.rs (1 hunks)
  • tokenizer/src/tokenizer/testdata/i_structure_500_nested_arrays.json (1 hunks)
  • tokenizer/tests/array_bitstack_test.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
tokenizer/tests/array_bitstack_test.rs (1)
tokenizer/src/bitstack/mod.rs (9)
  • default (5-5)
  • default (24-26)
  • default (57-59)
  • top (11-11)
  • top (37-39)
  • top (93-96)
  • pop (9-9)
  • pop (31-35)
  • pop (76-91)
stax/src/json_string.rs (2)
stax/src/json_number.rs (4)
  • as_str (81-86)
  • as_ref (110-112)
  • deref (118-120)
  • fmt (124-141)
tokenizer/src/tokenizer/mod.rs (1)
  • fmt (232-238)
stax/src/json_number.rs (2)
stax/src/json_string.rs (4)
  • as_str (16-21)
  • as_ref (25-27)
  • deref (33-38)
  • fmt (42-44)
tokenizer/src/tokenizer/mod.rs (1)
  • fmt (232-238)
stax/src/slice_input_buffer.rs (2)
stax/src/number_parser.rs (7)
  • new (95-101)
  • get_number_slice (15-15)
  • get_number_slice (105-112)
  • current_position (18-18)
  • current_position (114-116)
  • is_empty (21-21)
  • is_empty (118-120)
stax/src/flex_parser.rs (1)
  • new (44-57)
🪛 LanguageTool
stax/README.md

[uncategorized] ~45-~45: Did you mean: “By default,”?
Context: ...state, one bit for every nesting level. By default this is a 32-bit int, but can be change...

(BY_DEFAULT_COMMA)


[uncategorized] ~50-~50: The preposition ‘as’ seems more likely in this position.
Context: ...nt64 ( default ) - numbers are returned in int64 values * int32 - integers are re...

(AI_HYDRA_LEO_REPLACE_IN_AS)


[uncategorized] ~53-~53: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pport is included. * float-error - Any floating point input will yield an error, to reduce fl...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~59-~59: Did you mean: “By default,”?
Context: ... Please see examples/no_float_demo.rs By default full float and int64 support is enabled...

(BY_DEFAULT_COMMA)

CONFIGURATION.md

[uncategorized] ~10-~10: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... - int64 (default): Use i64 for full range integer support - int32: Use `i32...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~16-~16: Loose punctuation mark.
Context: ...l float parsing behavior: - float: Enable full f64 parsing support - **No ...

(UNLIKELY_OPENING_PUNCTUATION)

DESIGN.md

[uncategorized] ~152-~152: The preposition ‘to’ seems more likely in this position.
Context: ... input IMPORTANT!!! More: In addition of taking just slice [u8] as input, we sho...

(AI_HYDRA_LEO_REPLACE_OF_TO)


[grammar] ~162-~162: It appears that an article is missing.
Context: ... like println! is convenient and easy. Same should be done with Number, but it's a ...

(MISSING_ARTICLE)


[uncategorized] ~169-~169: A determiner appears to be missing. Consider inserting it.
Context: ...ting should do sensible default things. Most tricky is number formatting. The object...

(AI_EN_LECTOR_MISSING_DETERMINER)

🪛 markdownlint-cli2 (0.17.2)
stax/README.md

50-50: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


51-51: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


52-52: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


53-53: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


54-54: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


55-55: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

🔇 Additional comments (27)
.gitignore (1)

22-22: Re-consider ignoring Cargo.lock at workspace root

For Rust binary projects or mixed workspaces (binaries + libs) keeping Cargo.lock under VCS is the recommended practice – it guarantees reproducible CI and demo builds.
Unless every crate in the workspace is intended to be a pure library published on crates.io, tracking the lock-file is safer.

-Cargo.lock
+# If all crates are libraries published on crates.io you may ignore the lock file,
+# otherwise keep it under version control for reproducibility.
+# Cargo.lock
TODO.md (1)

1-12: LGTM – plaintext TODO list

The task list is clear and scoped; no review remarks.

.gitattributes (1)

1-1: Consistent line endings enforced – looks good
No issues spotted.

stax/examples/simple_api_demo.rs (1)

5-40: Well-structured example with good practices.

This example effectively demonstrates the PullParser API with:

  • Proper error handling using Result return type
  • Clear pattern matching for different event types
  • Good use of the iterator pattern
  • Informative output for learning purposes

The code follows Rust best practices and provides a clear introduction to the API.

stax/src/json_string.rs (2)

6-22: Well-designed copy-on-escape string type.

The String enum provides an elegant solution for handling escaped vs unescaped strings without heap allocation. The design aligns well with the zero-allocation goals and follows similar patterns used in JsonNumber (as seen in the relevant code snippets).


47-63: Good test coverage for core functionality.

The tests verify the key Deref behavior and string compatibility, which are essential for the ergonomic API.

stax/README.md (1)

1-60: Comprehensive and well-structured documentation.

The README effectively explains the library's purpose, provides clear usage examples, and documents configuration options. The content clearly differentiates when to use this library vs alternatives like serde-json.

tokenizer/tests/array_bitstack_test.rs (1)

1-57: Excellent test coverage for ArrayBitStack.

The test suite comprehensively validates the ArrayBitStack implementation with well-chosen scenarios:

  • Basic LIFO operations and non-mutating top() behavior
  • Multi-element handling with larger capacities
  • Cross-element boundary operations when exceeding single element capacity

The tests use appropriate type parameters to demonstrate flexibility and cover edge cases effectively.

tokenizer/src/lib.rs (1)

1-8: Well-structured crate entry point.

The no_std configuration and module structure are appropriate for a tokenizer library. The public API re-exports provide a clean interface.

stax/src/lib.rs (2)

1-30: Well-organized crate structure with comprehensive module coverage.

The module organization effectively separates concerns (parsing strategies, buffer management, JSON types, utilities) and the public API re-exports provide a clean interface for users.


31-37: Error conversion implementation is correct.

The mapping from slice_input_buffer::Error::ReachedEnd to ParseError::EndOfData is straightforward and appropriate.

CONFIGURATION.md (1)

1-142: Excellent comprehensive configuration documentation.

This documentation provides thorough coverage of the parser's configurability with:

  • Clear feature flag explanations
  • Practical dependency configuration examples
  • Detailed API usage patterns with conditional compilation
  • Testing instructions and edge case handling

The content effectively guides users in choosing appropriate configurations for their use cases.

stax/examples/advanced_bitstack_demo.rs (1)

1-90: Excellent demonstration of BitStack configuration flexibility.

This example effectively showcases the library's adaptability with three well-chosen scenarios:

  • Standard configuration for typical use cases
  • Memory-efficient setup for resource-constrained environments
  • Deep-nesting configuration for complex JSON structures

The depth tracking logic is correct, error handling is appropriate, and the output formatting makes the differences clear to users.

stax/examples/direct_parser_demo.rs (1)

7-49: Good implementation of Reader trait for demonstration purposes!

The ArrayReader implementation effectively simulates streaming behavior with configurable chunk sizes. Good use of saturating_sub to prevent underflow.

stax/examples/no_float_demo.rs (1)

1-166: Excellent demonstration of configurable number parsing!

The example effectively showcases different number parsing behaviors based on compile-time features. The configuration detection, test case selection, and summary output provide clear insights into how the parser behaves under various settings.

stax/tests/api_test.rs (1)

1-131: Well-structured and comprehensive API tests!

The test suite provides good coverage of the PullParser API, including:

  • Parsing without escapes
  • Error handling for escapes without buffer
  • Successful escape handling with buffer
  • Mixed data types
  • Borrowed vs unescaped string handling
stax/tests/configurable_numbers.rs (1)

1-248: Excellent test coverage for configurable number parsing!

The test suite comprehensively covers all number parsing configurations including:

  • Integer overflow detection for i32/i64
  • Float error behavior
  • Float truncation modes
  • Scientific notation handling
  • Default float-disabled behavior
  • Embedded-friendly configurations

The use of conditional compilation ensures tests run only for their intended configurations.

stax/src/number_parser.rs (1)

1-82: Well-designed number parsing abstraction

The NumberExtractor trait and parsing functions provide a clean, reusable abstraction for extracting and parsing numbers from different buffer implementations. The delimiter handling logic is particularly well thought out.

stax/src/slice_input_buffer.rs (1)

1-79: Clean buffer implementation with proper trait support

The SliceInputBuffer implementation is well-structured with clear separation of concerns. The NumberExtractor implementation correctly handles position adjustments for parser compatibility.

stax/src/copy_on_escape.rs (2)

1-27: Well-designed struct with clear documentation.

The CopyOnEscape struct is well-organized with appropriate lifetime parameters and comprehensive documentation for each field. The separation between global buffer state and per-string state is clean.


175-337: Excellent test coverage.

The tests comprehensively cover all major scenarios including:

  • Zero-copy optimization for strings without escapes
  • Escape handling with buffer switching
  • Buffer reuse across multiple strings
  • Unicode escape processing
  • Error conditions

The test structure is clear and well-organized.

tokenizer/src/tokenizer/mod.rs (1)

395-1048: Well-implemented JSON tokenizer state machine.

The parsing logic is comprehensive and correctly implements the JSON grammar:

  • Proper handling of all number formats (integers, decimals, exponents)
  • Complete escape sequence processing for strings
  • Correct structural validation for objects and arrays
  • Good error reporting with specific error kinds

The state machine design makes the parser easy to understand and maintain.

stax/src/escape_processor.rs (2)

149-210: Well-designed Unicode escape collector.

The UnicodeEscapeCollector provides a clean interface for incremental hex digit collection with:

  • Proper state management and reset capability
  • Good bounds checking to prevent buffer overflow
  • Clear completion tracking
  • Appropriate error handling

276-280: Note: Unicode surrogate pair handling limitation.

The comment indicates that the test doesn't fully handle Unicode surrogate pairs. JSON allows encoding characters outside the BMP using surrogate pairs (e.g., \uD83D\uDE0A for 😊).

Consider whether the parser needs to handle surrogate pairs for full JSON compliance. If so, this would require detecting high surrogates (U+D800-U+DBFF) and combining them with low surrogates (U+DC00-U+DFFF).

stax/src/direct_parser.rs (1)

7-7: Fix incorrect import syntax

The import statement is missing the use keyword.

-use log;
+use log;

Actually, this appears to be correct syntax. However, if you're not using any specific items from the log crate, you can remove this line entirely since the log! macros are available globally when the crate is included as a dependency.

stax/src/shared.rs (1)

1-270: Well-structured shared components

The shared module is well-designed with:

  • Clear event and error type definitions
  • Consistent error handling patterns through ParserErrorHandler
  • Useful position calculation utilities in ContentRange
  • Good separation of concerns
stax/src/direct_buffer.rs (1)

1-437: Excellent buffer management implementation

The DirectBuffer implementation is well-designed with:

  • Clear documentation of design principles
  • Proper error handling and validation
  • Safe handling of overlapping ranges using copy_within
  • Comprehensive test coverage including edge cases
  • Good integration with the NumberExtractor trait

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
stax/src/flex_parser.rs (3)

399-411: Address TODO comment about escape sequence end events.

The TODO comment indicates uncertainty about whether these event handlers are needed.

Would you like me to help investigate whether these End events for escape sequences are used elsewhere in the codebase, or create an issue to track this technical debt?


183-187: Preserve tokenizer error information.

The specific error from the tokenizer is being discarded and replaced with a generic error message.

-        if res.is_err() {
-            return Err(ParseError::UnexpectedState(
-                "Failed to pull tokenizer events",
-            ));
-        }
+        res.map_err(|_| ParseError::TokenizerError)?;

415-418: Replace recursion with loop continuation.

The recursive call could cause stack overflow with malformed JSON input.

Since you already have a while loop at line 205, continue the loop instead:

-            } else {
-                // No event was produced, need to call next_event recursively
-                return self.next_event();
-            }
+            }
+            // If no event was produced, the while loop will continue
stax/src/direct_parser.rs (2)

494-505: Preserve reader error information.

The error from self.reader.read() is being discarded and mapped to ParseError::EndOfData. The reader could fail for reasons other than reaching EOF (e.g., I/O errors, network issues).

Consider adding a new ParseError variant to wrap reader errors:

-                .map_err(|_| ParseError::EndOfData)?;
+                .map_err(|_| ParseError::ReaderError)?;

This would require adding a ReaderError variant to the ParseError enum to preserve the underlying error details.


596-604: Handle escape buffer errors properly.

The error from append_byte_to_escape_buffer is being silently ignored, which could lead to data corruption.

-        if let Ok(unescaped_char) = EscapeProcessor::process_escape_token(escape_token) {
-            if let Err(_) = self.append_byte_to_escape_buffer(unescaped_char) {
-                // Handle error - for now just continue
-            }
-        }
+        let unescaped_char = EscapeProcessor::process_escape_token(escape_token)
+            .map_err(|e| ParseError::InvalidEscape)?;
+        self.append_byte_to_escape_buffer(unescaped_char)?;
🧹 Nitpick comments (2)
tokenizer/src/bin/main.rs (2)

4-4: Remove commented import.

The commented std::process import should be removed as it's not used.

-//use std::process;

7-7: Remove or replace generic greeting.

The "Hello, world!" message doesn't provide value for a JSON parser utility. Consider removing it or replacing it with a more informative message.

Either remove the line:

-    println!("Hello, world!");

Or replace with something more informative:

-    println!("Hello, world!");
+    println!("JSON Parser - validating JSON file...");
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fc8294 and 6964625.

📒 Files selected for processing (6)
  • stax/Cargo.toml (1 hunks)
  • stax/examples/array_bitstack_demo.rs (1 hunks)
  • stax/src/direct_parser.rs (1 hunks)
  • stax/src/flex_parser.rs (1 hunks)
  • tokenizer/src/bin/main.rs (1 hunks)
  • tokenizer/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • stax/Cargo.toml
  • stax/examples/array_bitstack_demo.rs
  • tokenizer/src/lib.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
stax/src/direct_parser.rs (7)
stax/examples/direct_parser_demo.rs (2)
  • read (28-48)
  • new (16-22)
stax/src/json_string.rs (2)
  • fmt (42-44)
  • as_str (16-21)
stax/src/direct_buffer.rs (1)
  • new (46-57)
stax/src/flex_parser.rs (3)
  • new (44-57)
  • next (191-196)
  • next_event (200-425)
stax/src/escape_processor.rs (3)
  • new (161-166)
  • hex_count (207-209)
  • process_escape_token (51-55)
stax/src/number_parser.rs (2)
  • new (95-101)
  • parse_number_event (31-59)
stax/src/shared.rs (8)
  • new (83-88)
  • quote_position_from_current (153-155)
  • number_start_from_current (141-143)
  • state_mismatch (220-229)
  • bytes_to_utf8_str (197-199)
  • string_content_bounds (111-115)
  • string_content_bounds_before_escape (125-132)
  • invalid_unicode_escape (257-259)
🔇 Additional comments (1)
stax/src/direct_parser.rs (1)

693-1383: Excellent test coverage!

The comprehensive test suite covers edge cases, error conditions, escape sequences, Unicode handling, and parser state transitions. The tests are well-organized and provide good validation of the implementation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
tokenizer/src/bitstack/mod.rs (2)

31-35: Critical: pop() method violates trait contract by never returning None.

This issue was flagged in previous reviews and remains unresolved. The trait documentation clearly states that pop() should return None when the stack is empty, but this implementation always returns Some(bit).


78-95: Critical: ArrayBitStack::pop() method violates trait contract by never returning None.

This implementation also always returns Some(bit) and never returns None, violating the trait contract that requires None when the stack is empty.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6964625 and 30b6e50.

📒 Files selected for processing (1)
  • tokenizer/src/bitstack/mod.rs (1 hunks)
🔇 Additional comments (1)
tokenizer/src/bitstack/mod.rs (1)

66-67: Well done: Dynamic bit width calculation correctly addresses previous hardcoded shift issue.

The implementation now properly calculates the bit width based on the generic type size, fixing the previous critical issue with hardcoded 7-bit shifts that only worked for u8.

Also applies to: 84-85

@kaidokert
Copy link
Owner Author

@sourcery-ai review

Copy link

sourcery-ai bot commented Jun 28, 2025

Reviewer's Guide

This PR introduces the first end-to-end implementation of a zero-allocation, no-std JSON pull-parsing toolkit, built on top of the low-level ujson tokenizer. It adds two parsing front-ends (a streaming DirectParser and a flexible PullParserFlex), unified abstractions for numbers and strings (supporting zero-copy or unescaped variants), configurable nesting depth via generic BitStack/ArrayBitStack, robust escape and Unicode processing, buffer managers, extensive tests, examples, and documentation for driving different embedded or full-featured configurations.

Sequence diagram for streaming JSON parsing with DirectParser

sequenceDiagram
    participant App as Application
    participant Reader
    participant DirectParser
    participant Tokenizer
    participant DirectBuffer
    App->>DirectParser: next_event()
    loop While not EndDocument
        DirectParser->>DirectBuffer: fill_buffer_from_reader()
        DirectParser->>Reader: read(buf)
        Reader-->>DirectParser: bytes_read
        DirectParser->>Tokenizer: parse_chunk([byte], callback)
        Tokenizer-->>DirectParser: Event
        alt Event is Begin/End
            DirectParser->>DirectBuffer: handle event (e.g. extract string/number)
        end
        DirectParser-->>App: Event
    end
Loading

Class diagram for the new JSON Tokenizer and DirectParser

classDiagram
    class Tokenizer {
        - State state
        - usize total_consumed
        - ParseContext context
        + new()
        + parse_full(data, callback)
        + parse_chunk(data, callback)
        + finish(callback)
    }
    class ParseContext {
        - depth
        - stack
        - after_comma
        + new()
        + enter_object()
        + exit_object()
        + enter_array()
        + exit_array()
        + is_object()
        + is_array()
    }
    class DirectParser {
        - Tokenizer tokenizer
        - ParserState parser_state
        - Reader reader
        - DirectBuffer direct_buffer
        - ProcessingState processing_state
        - Option<PendingContainerEnd> pending_container_end
        - UnicodeEscapeCollector unicode_escape_collector
        + new(reader, buffer)
        + next()
        + next_event()
    }
    class Reader {
        <<trait>>
        + read(buf) Result<usize, Error>
    }
    class DirectBuffer {
        + new(buffer)
        + is_empty()
        + current_byte()
        + advance()
        + get_fill_slice()
        + mark_filled(bytes_read)
        + has_unescaped_content()
        + append_unescaped_byte(byte)
        + clear_unescaped()
    }
    class Event {
        <<enum>>
        + StartObject
        + EndObject
        + StartArray
        + EndArray
        + Key
        + String
        + Number
        + Bool
        + Null
        + EndDocument
    }
    class EventToken {
        <<enum>>
        + True
        + False
        + Null
        + String
        + Key
        + Number
        + NumberAndArray
        + NumberAndObject
        + UnicodeEscape
        + EscapeSequence
        + EscapeQuote
        + EscapeBackslash
        + EscapeSlash
        + EscapeBackspace
        + EscapeFormFeed
        + EscapeNewline
        + EscapeCarriageReturn
        + EscapeTab
    }
    Tokenizer --> ParseContext
    DirectParser --> Tokenizer
    DirectParser --> DirectBuffer
    DirectParser --> Reader
    DirectParser --> Event
    DirectParser --> EventToken
    DirectParser --> UnicodeEscapeCollector
    DirectBuffer --> DirectParser
    EventToken --> Event
Loading

Class diagram for number and string abstractions

classDiagram
    class NumberResult {
        <<enum>>
        + Integer(i64 or i32)
        + IntegerOverflow
        + Float(f64)
        + FloatTruncated(i64)
        + FloatDisabled
    }
    class String {
        <<enum>>
        + Borrowed(&str)
        + Unescaped(&str)
        + as_str()
    }
    class Event {
        + Number(NumberResult)
        + String(String)
        + Key(String)
    }
    Event --> NumberResult
    Event --> String
Loading

Class diagram for BitStack and configurable nesting

classDiagram
    class BitStack {
        <<trait>>
        + push(bool)
        + pop()
        + top() Option<bool>
        + default()
    }
    class BitStackCore {
        <<trait>>
        + from(u8)
        + not()
    }
    ParseContext --> BitStack
    ParseContext --> BitStackCore
Loading

Class diagram for error handling and event emission

classDiagram
    class Error {
        + kind: ErrKind
        + character: u8
        + position: usize
        + new(kind, character, position)
    }
    class ErrKind {
        <<enum>>
        + EmptyStream
        + UnfinishedStream
        + InvalidRoot
        + InvalidToken
        + UnescapedControlCharacter
        + TrailingComma
        + ContentEnded
        + UnopenedArray
        + UnopenedObject
        + MaxDepthReached
        + InvalidNumber
        + InvalidUnicodeEscape
        + InvalidStringEscape
        + ExpectedObjectKey
        + ExpectedObjectValue
        + ExpectedColon
        + ExpectedArrayItem
    }
    class Event {
        + Begin(EventToken)
        + End(EventToken)
        + ObjectStart
        + ObjectEnd
        + ArrayStart
        + ArrayEnd
    }
    Error --> ErrKind
    Tokenizer --> Error
    Tokenizer --> Event
Loading

File-Level Changes

Change Details Files
Streaming pull parser (DirectParser) with single-buffer management
  • Define a generic Reader trait for streaming input
  • Implement DirectParser state machine over ujson::Tokenizer events
  • Integrate DirectBuffer to feed and manage unescaped content in one buffer
  • Handle escape sequences and Unicode via EscapeProcessor and UnicodeEscapeCollector
  • Manage pending container-end events and ergonomic next_event() iteration
stax/src/direct_parser.rs
stax/src/direct_buffer.rs
stax/src/escape_processor.rs
Flexible pull parser (PullParserFlex) and zero-copy/escape-aware strings
  • Add PullParserFlex with SliceInputBuffer for non-streaming input
  • Embed CopyOnEscape scratch-buffer strategy for on-demand string unescaping
  • Share UnicodeEscapeCollector logic between parsers
  • Expose ergonomic new/new_with_buffer and implement Iterator<Item=Result<Event,ParseError>>
stax/src/flex_parser.rs
stax/src/copy_on_escape.rs
stax/src/slice_input_buffer.rs
Unified JSON string type and escape utilities
  • Introduce String<'a,'b> enum for borrowed vs unescaped variants
  • Implement pure-function EscapeProcessor and UnicodeEscapeCollector
  • Integrate escape routines into both parsers and add unit tests
stax/src/json_string.rs
stax/src/shared.rs
stax/src/escape_processor.rs
Unified number parsing abstraction
  • Define JsonNumber and NumberResult to preserve raw texts and parsed values
  • Implement parse_number_from_str and NumberExtractor trait in number_parser.rs
  • Integrate number extraction into both parsers with container-end context
  • Configure integer and float behaviors via Cargo features
stax/src/json_number.rs
stax/src/number_parser.rs
stax/src/flex_parser.rs
stax/src/direct_parser.rs
Enhanced streaming JSON tokenizer
  • Extend Tokenizer<T,D> with a full JSON state machine for strings, numbers, tokens
  • Emit rich EventToken and ErrorKind for precise error reporting
  • Support incremental parse_chunk and robust finish logic
  • Add generic BitStack depth checks and consistent cross-platform settings
tokenizer/src/tokenizer/mod.rs
tokenizer/src/lib.rs
.gitignore
Configurable nesting depth via BitStack and ArrayBitStack
  • Implement BitStack trait for integer types
  • Add ArrayBitStack<const N,T> to scale nesting capacity
  • Test multi-element push/pop and overflow behavior
tokenizer/src/bitstack/mod.rs
Buffer management modules
  • Implement DirectBuffer for single-buffer streaming and unescaped content
  • Provide DirectBufferStats for diagnostics
  • Test buffer fill, advance, unescaped capture, and error conditions
stax/src/direct_buffer.rs
Examples, tests and documentation
  • Add multiple examples/ demonstrating parser usage, nesting, streaming and number configs
  • Provide extensive unit and integration tests under stax/tests and tokenizer/tests
  • Publish design docs (DESIGN.md, CONFIGURATION.md, README.md), workspace and pre-commit configurations
stax/examples/*
stax/tests/*
DESIGN.md
CONFIGURATION.md
stax/README.md
.pre-commit-config.yaml
Cargo.toml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kaidokert - I've reviewed your changes - here's some feedback:

  • The enormous parse_chunk_inner state machine should be broken into smaller per-state handler methods to improve readability and maintainability.
  • DirectParser and PullParserFlex duplicate much of the event-dispatch and termination logic—extract their shared patterns into a common utility to reduce divergence.
  • Your custom String enum shadows std::string::String—consider renaming it (e.g. to JsonString) to avoid confusion with the standard type.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The enormous `parse_chunk_inner` state machine should be broken into smaller per-state handler methods to improve readability and maintainability.
- DirectParser and PullParserFlex duplicate much of the event-dispatch and termination logic—extract their shared patterns into a common utility to reduce divergence.
- Your custom `String` enum shadows `std::string::String`—consider renaming it (e.g. to `JsonString`) to avoid confusion with the standard type.

## Individual Comments

### Comment 1
<location> `stax/src/direct_parser.rs:251` </location>
<code_context>
+    }
+
+    /// Process event and update state, but defer complex processing
+    fn process_tokenizer_event(&mut self, event: ujson::Event) -> Result<EventResult, ParseError> {
+        Ok(match event {
+            // Container events
</code_context>

<issue_to_address>
Consider handling unknown or unexpected events explicitly.

Matching all known event variants explicitly and handling unknown events with an error or unreachable!() will help prevent silent failures if new event types are introduced.

Suggested implementation:

```rust
        Ok(match event {
            // Container events
            ujson::Event::ObjectStart => EventResult::Complete(Event::StartObject),
            ujson::Event::ObjectEnd => {
                // Check if we're in the middle of parsing a number - if so, extract it first
                if matches!(self.parser_state.state, crate::shared::State::Number(_)) {
                    log::debug!(
                        "DirectParser: ObjectEnd while in Number state - extracting number first"
                    );
                    // Extract the number first, then we'll emit EndObject on the next call
                    self.pending_container_end = Some(PendingContainerEnd::ObjectEnd);
                    EventResult::ExtractNumberFromContainer
                } else {
                    EventResult::Complete(Event::EndObject)
                }
            }
            ujson::Event::ArrayStart => EventResult::Complete(Event::StartArray),
            ujson::Event::ArrayEnd => {
                if matches!(self.parser_state.state, crate::shared::State::Number(_)) {
                    log::debug!(
                        "DirectParser: ArrayEnd while in Number state - extracting number first"
                    );
                    self.pending_container_end = Some(PendingContainerEnd::ArrayEnd);
                    EventResult::ExtractNumberFromContainer
                } else {
                    EventResult::Complete(Event::EndArray)
                }
            }
            ujson::Event::StringValue(s) => EventResult::Complete(Event::String(s)),
            ujson::Event::NumberValue(n) => EventResult::Complete(Event::Number(n)),
            ujson::Event::BoolValue(b) => EventResult::Complete(Event::Bool(b)),
            ujson::Event::NullValue => EventResult::Complete(Event::Null),
            ujson::Event::Key(k) => EventResult::Complete(Event::Key(k)),
            // Explicitly handle unknown or future events
            _ => return Err(ParseError::UnknownEvent),

```

- You must define `ParseError::UnknownEvent` in your `ParseError` enum if it does not already exist.
- If you want to use `unreachable!()` instead of returning an error, replace `return Err(ParseError::UnknownEvent),` with `unreachable!("Unknown ujson::Event variant encountered")`.
- Make sure all current `ujson::Event` variants are matched explicitly above the `_` arm.
</issue_to_address>

### Comment 2
<location> `stax/src/direct_parser.rs:494` </location>
<code_context>
+        crate::number_parser::parse_number_event(&self.direct_buffer, start_pos, from_container_end)
+    }
+    /// Fill buffer from reader
+    fn fill_buffer_from_reader(&mut self) -> Result<(), ParseError> {
+        if let Some(fill_slice) = self.direct_buffer.get_fill_slice() {
+            let bytes_read = self
</code_context>

<issue_to_address>
Potential for infinite loop if reader returns Ok(0) but not at EOF.

Add a safeguard to handle cases where Ok(0) is returned but EOF is not reached, or clarify the Reader trait contract to ensure Ok(0) always indicates EOF.
</issue_to_address>

### Comment 3
<location> `stax/src/direct_buffer.rs:103` </location>
<code_context>
+
+    /// Start unescaping and copy existing content from a range in the buffer
+    /// This handles the common case of starting escape processing partway through a string
+    pub fn start_unescaping_with_copy(
+        &mut self,
+        max_escaped_len: usize,
</code_context>

<issue_to_address>
Possible silent truncation when copying spans to scratch buffer.

Consider returning an error if the span exceeds the buffer size, rather than silently truncating the data.
</issue_to_address>
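A hedged sketch of the error-returning shape, with `DirectBufferError` and `copy_span_checked` as stand-in names: check the span length before copying and refuse the whole operation rather than silently dropping the tail.

```rust
#[derive(Debug, PartialEq)]
enum DirectBufferError {
    BufferFull,
}

/// Copy `src` into the front of `dst`, erroring instead of truncating
/// when the span does not fit.
fn copy_span_checked(dst: &mut [u8], src: &[u8]) -> Result<usize, DirectBufferError> {
    if src.len() > dst.len() {
        // Surface the overflow; a silent partial copy would corrupt
        // the unescaped string downstream.
        return Err(DirectBufferError::BufferFull);
    }
    dst[..src.len()].copy_from_slice(src);
    Ok(src.len())
}

fn main() {
    let mut scratch = [0u8; 4];
    assert_eq!(copy_span_checked(&mut scratch, b"abc"), Ok(3));
    assert_eq!(&scratch[..3], b"abc");
    assert_eq!(
        copy_span_checked(&mut scratch, b"too long"),
        Err(DirectBufferError::BufferFull)
    );
}
```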

### Comment 4
<location> `stax/src/direct_buffer.rs:162` </location>
<code_context>
+    }
+
+    /// Append a single byte to the unescaped content
+    pub fn append_unescaped_byte(&mut self, byte: u8) -> Result<(), DirectBufferError> {
+        if self.unescaped_len >= self.buffer.len() {
+            return Err(DirectBufferError::BufferFull);
</code_context>

<issue_to_address>
No check for reserved escape space when appending unescaped bytes.

Currently, the method only checks the total buffer length and not the reserved escape space. Please ensure the `escape_reserve` constraint is enforced to prevent overwriting reserved space.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    pub fn append_unescaped_byte(&mut self, byte: u8) -> Result<(), DirectBufferError> {
        if self.unescaped_len >= self.buffer.len() {
            return Err(DirectBufferError::BufferFull);
=======
    pub fn append_unescaped_byte(&mut self, byte: u8) -> Result<(), DirectBufferError> {
        if self.unescaped_len >= self.buffer.len() - self.escape_reserve {
            return Err(DirectBufferError::BufferFull);
>>>>>>> REPLACE

</suggested_fix>

### Comment 5
<location> `stax/src/json_number.rs:104` </location>
<code_context>
+    }
+
+    /// Check if this number would be a float (has decimal point or exponent).
+    pub fn is_float(&self) -> bool {
+        !self.is_integer()
+    }
</code_context>

<issue_to_address>
is_float() may return true for unsupported float types.

In cases where float support is disabled, clarify or document that `is_float()` only means the value is not an integer, not that a float is actually supported.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    /// Check if this number would be a float (has decimal point or exponent).
    pub fn is_float(&self) -> bool {
        !self.is_integer()
    }
=======
    /// Returns true if this number is not an integer (i.e., has a decimal point or exponent).
    ///
    /// Note: This does not guarantee that float values are supported or enabled in this build.
    /// It only indicates that the number is not an integer, regardless of float support.
    pub fn is_float(&self) -> bool {
        !self.is_integer()
    }
>>>>>>> REPLACE

</suggested_fix>

### Comment 6
<location> `stax/src/number_parser.rs:39` </location>
<code_context>
+    let current_pos = extractor.current_position();
+
+    // Determine if we should exclude a delimiter from the number
+    let number_end = if from_container_end || (!extractor.is_empty()) {
+        // Came from container end OR not at EOF - number was terminated by delimiter, exclude it
+        current_pos.saturating_sub(1)
</code_context>

<issue_to_address>
Delimiter exclusion logic may not handle all edge cases.

Please review the conditions for `number_end` to ensure correct handling when the buffer is empty but a delimiter is present, particularly at EOF and after delimiters.

Suggested implementation:

```rust
    // Determine if we should exclude a delimiter from the number
    let number_end = if from_container_end {
        // Came from container end - number was terminated by delimiter, exclude it
        current_pos.saturating_sub(1)
    } else if extractor.is_empty() {
        // Buffer is empty, check if the last character was a delimiter
        if extractor.last_char_was_delimiter() {
            // At EOF and last char was delimiter - exclude it
            current_pos.saturating_sub(1)
        } else {
            // At EOF and last char was not delimiter - use full span
            current_pos
        }
    } else {
        // Not at EOF - number was terminated by delimiter, exclude it
        current_pos.saturating_sub(1)
    };

```

- This change assumes the existence of a method `last_char_was_delimiter()` on the `NumberExtractor` trait or struct. If it does not exist, you will need to implement it to check if the last character read was a delimiter.
- If your delimiter logic is more complex, you may need to adjust the check accordingly.
</issue_to_address>

### Comment 7
<location> `tokenizer/src/bitstack/mod.rs:31` </location>
<code_context>
+        *self = (self.clone() << 1u8) | T::from(bit as u8);
+    }
+
+    fn pop(&mut self) -> Option<bool> {
+        let bit = (self.clone() & T::from(1)) != T::from(0);
+        *self = self.clone() >> 1u8;
</code_context>

<issue_to_address>
BitStack::pop() always returns Some, even if the stack is empty.

Currently, `pop` does not indicate when the stack is empty, which may cause bugs. Please update the implementation to track stack depth and return `None` when the stack is empty.
</issue_to_address>
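One way to do this is to pair the bit storage with an explicit depth counter. The sketch below is a concrete `u32` version for illustration (the real `BitStack` is generic over the storage integer, and `TrackedBitStack` is a hypothetical name):

```rust
/// Bit stack that tracks its depth so pop() can report underflow.
struct TrackedBitStack {
    bits: u32,
    depth: usize,
}

impl TrackedBitStack {
    fn new() -> Self {
        Self { bits: 0, depth: 0 }
    }

    fn push(&mut self, bit: bool) {
        self.bits = (self.bits << 1) | bit as u32;
        self.depth += 1;
    }

    /// Returns None once every pushed bit has been popped, instead of
    /// reading zeros forever from an empty stack.
    fn pop(&mut self) -> Option<bool> {
        if self.depth == 0 {
            return None;
        }
        let bit = self.bits & 1 != 0;
        self.bits >>= 1;
        self.depth -= 1;
        Some(bit)
    }
}

fn main() {
    let mut s = TrackedBitStack::new();
    s.push(true);
    s.push(false);
    assert_eq!(s.pop(), Some(false));
    assert_eq!(s.pop(), Some(true));
    assert_eq!(s.pop(), None); // underflow is now observable
}
```

For the generic trait, the counter could live alongside the `BitStack` in the tokenizer state rather than inside the trait itself, which keeps the trait's storage type unchanged.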

### Comment 8
<location> `stax/src/slice_input_buffer.rs:22` </location>
<code_context>
+}
+
+impl<'a> InputBuffer for SliceInputBuffer<'a> {
+    fn is_past_end(&self) -> bool {
+        self.pos > self.data.len()
+    }
</code_context>

<issue_to_address>
is_past_end() may not behave as intended when pos == data.len().

Consider updating the condition to self.pos >= self.data.len() to align with typical buffer behavior and prevent off-by-one errors.
</issue_to_address>
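The off-by-one hinges on what `pos == data.len()` should mean: the cursor sitting exactly at the end (all bytes consumed, nothing overrun) versus strictly past it. The two predicates disagree only at that boundary, as this illustrative sketch (names are not the crate's) shows:

```rust
/// True when there are no more bytes to read (pos at or beyond the end).
fn is_at_or_past_end(pos: usize, len: usize) -> bool {
    pos >= len
}

/// True only when the cursor has overrun the buffer by at least one byte.
fn is_strictly_past_end(pos: usize, len: usize) -> bool {
    pos > len
}

fn main() {
    let len = 3;
    assert!(!is_at_or_past_end(2, len) && !is_strictly_past_end(2, len));
    // The boundary case where the two predicates differ:
    assert!(is_at_or_past_end(3, len) && !is_strictly_past_end(3, len));
    assert!(is_at_or_past_end(4, len) && is_strictly_past_end(4, len));
}
```

If the tokenizer deliberately uses one look-past position for EOF detection, the strict `>` form may be intentional; in that case a doc comment stating so would resolve the review concern without a code change.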

### Comment 9
<location> `stax/tests/api_test.rs:5` </location>
<code_context>
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_coe2_no_escapes() {
+        let input = b"hello world";
</code_context>

<issue_to_address>
No tests for invalid JSON or error propagation in the new API.

Please add tests for malformed JSON inputs to ensure the parser correctly handles and reports errors for invalid cases.
</issue_to_address>

### Comment 10
<location> `tokenizer/tests/array_bitstack_test.rs:3` </location>
<code_context>
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_coe2_no_escapes() {
+        let input = b"hello world";
</code_context>

<issue_to_address>
No tests for popping from an empty ArrayBitStack.

Please add a test to confirm that popping from an empty ArrayBitStack returns None without panicking, ensuring proper handling of underflow.
</issue_to_address>

### Comment 11
<location> `stax/README.md:9` </location>
<code_context>
+built in memory, please use serde-json with no_std.
+
+However - pull parsing is useful when you need to process large streams within
+constained memory, without building the entire document, and just picking
+elements from the dataset that the application needs.
+
</code_context>

<issue_to_address>
Typo: 'constained' should be 'constrained'.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
constained memory, without building the entire document, and just picking
=======
constrained memory, without building the entire document, and just picking
>>>>>>> REPLACE

</suggested_fix>

### Comment 12
<location> `DESIGN.md:156` </location>
<code_context>
+So that the input can come no-copy from any source with low buffering
+
+Note std::io has Read trait, but unfortunately that's not available in core::, so probably have to
+make our own, and auto-implent it for arrays and slices or for anything that looks like AsRef<[u8]>
+
+## 7. TODO: Working with returned values
</code_context>

<issue_to_address>
Typo: 'auto-implent' should be 'auto-implement'.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
make our own, and auto-implent it for arrays and slices or for anything that looks like AsRef<[u8]>
=======
make our own, and auto-implement it for arrays and slices or for anything that looks like AsRef<[u8]>
>>>>>>> REPLACE

</suggested_fix>



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
stax/README.md (2)

9-9: Typo: 'constained' should be 'constrained'.


59-59: Documentation inconsistency with default features.

The documentation states that "full float and int64 support is enabled" by default, but the actual default features in Cargo.toml only include int64 support, not float support.

DESIGN.md (1)

156-156: Typo: 'auto-implent' should be 'auto-implement'.

stax/src/flex_parser.rs (2)

415-415: Potential stack overflow from unbounded recursion.

The recursive call could cause stack overflow with deeply nested or malformed JSON input. This issue is already tracked in GitHub issue #2.


183-185: Preserve tokenizer error information instead of discarding it.

The original tokenizer error is being discarded and replaced with a generic ParseError::TokenizerError. This loses valuable debugging information.

Consider mapping the tokenizer error properly:

-        if let Err(_tokenizer_error) = res {
-            return Err(ParseError::TokenizerError);
+        if let Err(tokenizer_error) = res {
+            return Err(ParseError::Tokenizer(tokenizer_error));
         }
stax/src/direct_parser.rs (1)

407-407: Consider explicit handling of unknown events

The catch-all pattern _ => EventResult::Continue could mask unexpected or future event types. Consider matching all known event variants explicitly and handling unknown events with an error or unreachable macro for better debugging.

🧹 Nitpick comments (3)
stax/src/flex_parser.rs (1)

196-423: Consider refactoring the large next_event method for better maintainability.

The next_event method is over 200 lines long and handles many different token types. This makes it difficult to understand and maintain.

Consider extracting the token handling logic into separate methods:

  • handle_container_tokens for object/array start/end
  • handle_string_key_tokens for string and key processing
  • handle_number_tokens for number parsing
  • handle_value_tokens for bool/null values
  • handle_escape_tokens for escape sequences

This would make the main method more readable and each handler easier to test independently.

stax/src/escape_processor.rs (1)

276-280: Consider documenting surrogate pair handling for full Unicode support.

The comment correctly notes that emoji like U+1F60A require surrogate pairs in JSON. While the current implementation handles Basic Multilingual Plane (BMP) characters correctly, characters outside the BMP need surrogate pair decoding.

Would you like me to create an issue to track adding full Unicode support with surrogate pair handling? This would involve:

  • Detecting high surrogate (U+D800-U+DBFF) and low surrogate (U+DC00-U+DFFF) ranges
  • Combining surrogate pairs into the final codepoint
  • Adding tests for emoji and other non-BMP characters
stax/src/direct_parser.rs (1)

500-503: Consider preserving reader error details

Mapping all reader errors to ParseError::ReaderError loses important error information that could be valuable for debugging I/O issues. Consider adding a variant that wraps the original error or provides more specific error context.

-                .map_err(|_| ParseError::ReaderError)?;
+                .map_err(ParseError::ReaderError)?;

This would require updating ParseError::ReaderError to wrap the original error type.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 480285b and 49379ab.

📒 Files selected for processing (7)
  • DESIGN.md (1 hunks)
  • stax/README.md (1 hunks)
  • stax/src/direct_buffer.rs (1 hunks)
  • stax/src/direct_parser.rs (1 hunks)
  • stax/src/escape_processor.rs (1 hunks)
  • stax/src/flex_parser.rs (1 hunks)
  • stax/src/shared.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • stax/src/shared.rs
  • stax/src/direct_buffer.rs
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
🧬 Code Graph Analysis (1)
stax/src/flex_parser.rs (7)
stax/src/json_number.rs (5)
  • fmt (124-141)
  • as_str (81-86)
  • as_int (57-65)
  • as_f64 (70-77)
  • is_float (104-106)
tokenizer/src/tokenizer/mod.rs (4)
  • fmt (232-238)
  • new (22-28)
  • new (222-228)
  • new (248-254)
stax/src/shared.rs (6)
  • new (87-92)
  • unicode_escape_bounds (169-174)
  • invalid_unicode_length (266-268)
  • end_position_excluding_delimiter (184-186)
  • state_mismatch (224-233)
  • number_start_from_current (145-147)
stax/src/escape_processor.rs (2)
  • new (161-166)
  • process_escape_token (51-55)
stax/src/slice_input_buffer.rs (2)
  • new (40-42)
  • current_pos (36-38)
stax/src/copy_on_escape.rs (1)
  • new (35-46)
stax/src/number_parser.rs (2)
  • new (95-101)
  • parse_number_event_simple (63-81)
🪛 LanguageTool
DESIGN.md

[uncategorized] ~152-~152: The preposition ‘to’ seems more likely in this position.
Context: ... input IMPORTANT!!! More: In addition of taking just slice [u8] as input, we sho...

(AI_HYDRA_LEO_REPLACE_OF_TO)


[grammar] ~162-~162: It appears that an article is missing.
Context: ... like println! is convenient and easy. Same should be done with Number, but it's a ...

(MISSING_ARTICLE)


[uncategorized] ~169-~169: A determiner appears to be missing. Consider inserting it.
Context: ...ting should do sensible default things. Most tricky is number formatting. The object...

(AI_EN_LECTOR_MISSING_DETERMINER)

stax/README.md

[uncategorized] ~45-~45: Did you mean: “By default,”?
Context: ...state, one bit for every nesting level. By default this is a 32-bit int, but can be change...

(BY_DEFAULT_COMMA)


[uncategorized] ~50-~50: The preposition ‘as’ seems more likely in this position.
Context: ...nt64 ( default ) - numbers are returned in int64 values * int32 - integers are re...

(AI_HYDRA_LEO_REPLACE_IN_AS)


[uncategorized] ~53-~53: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pport is included. * float-error - Any floating point input will yield an error, to reduce fl...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~59-~59: Did you mean: “By default,”?
Context: ... Please see examples/no_float_demo.rs By default full float and int64 support is enabled...

(BY_DEFAULT_COMMA)

🔇 Additional comments (7)
stax/src/escape_processor.rs (1)

1-488: Well-structured escape processing module with comprehensive test coverage.

The module provides a clean separation of concerns with:

  • Pure functions for escape processing that can be reused across components
  • Clear documentation with usage examples
  • Comprehensive test coverage including edge cases
  • Proper error handling with appropriate error types
stax/src/direct_parser.rs (6)

9-82: Well-designed trait and type definitions

The Reader trait contract is clearly documented with strong EOF semantics, and the internal enums provide good type safety for state management. The DirectParser struct layout is comprehensive and well-organized.


115-252: Complex but well-structured main parsing loop

The next_event() method handles multiple concerns effectively: pending events, buffer management, EOF detection, and tokenizer integration. The state transitions appear correct and the Unicode escape handling shows attention to detail.


411-496: Well-structured content extraction with proper error handling

The extraction methods properly handle both unescaped and borrowed content scenarios, with consistent patterns across string, key, and number types. The delegation to unified number parsing logic is a good design choice.


542-544: Verify error handling strategy for Unicode escapes

The code ignores errors from unicode_escape_collector.add_hex_digit() with the assumption that "error will be handled by tokenizer". Ensure this approach doesn't mask important validation failures that could lead to inconsistent state.


557-699: Well-implemented escape processing and state management

The escape processing methods show good separation of concerns and proper integration with the DirectBuffer. The state management for unescaped content resets is handled correctly with the ProcessingState enum.


701-1396: Comprehensive and well-structured test suite

The test suite provides excellent coverage of core functionality, edge cases, and error scenarios. The comparison test between DirectParser and FlexParser ensures consistency across parser implementations. The tests are well-organized and use appropriate assertions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
stax/README.md (1)

59-59: Documentation inconsistency with default features

The documentation states "full float and int64 support is enabled" by default, but this conflicts with the actual default features configuration which only includes int64 without the float feature.

stax/src/direct_parser.rs (2)

500-503: Reader error information is lost

The Reader error is being discarded and mapped to a generic ParseError::ReaderError, losing important error details that could help with debugging I/O issues.


407-407: Consider explicit handling of unknown events

The catch-all pattern _ => EventResult::Continue could mask unexpected events. Consider matching all known event variants explicitly to improve error detection.

🧹 Nitpick comments (2)
stax/README.md (2)

50-55: Fix markdown list indentation

The bullet points are incorrectly indented according to markdownlint. Lists should start at column 0 without leading spaces.

- * int64 ( default ) - numbers are returned in int64 values
- * int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
- * float - full float support is included.
- * float-error - Any floating point input will yield an error, to reduce float math dependency
- * float-skip - Float values are skipped.
- * float-truncate - float values are truncated to integers. Scientific notation will generate an error
+* int64 ( default ) - numbers are returned in int64 values
+* int32 - integers are returned as int32, to avoid 64-bit math on constrained targets, e.g. Cortex-M0
+* float - full float support is included.
+* float-error - Any floating point input will yield an error, to reduce float math dependency
+* float-skip - Float values are skipped.
+* float-truncate - float values are truncated to integers. Scientific notation will generate an error

45-45: Add comma for better readability

Consider adding a comma after "By default" for improved grammatical flow.

-By default this is a 32-bit int, but can be changed
+By default, this is a 32-bit int, but can be changed
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49379ab and c503369.

📒 Files selected for processing (5)
  • stax/Cargo.toml (1 hunks)
  • stax/README.md (1 hunks)
  • stax/src/direct_buffer.rs (1 hunks)
  • stax/src/direct_parser.rs (1 hunks)
  • stax/src/json_number.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • stax/Cargo.toml
🧰 Additional context used
🪛 LanguageTool
stax/README.md

[uncategorized] ~45-~45: Did you mean: “By default,”?
Context: ...state, one bit for every nesting level. By default this is a 32-bit int, but can be change...

(BY_DEFAULT_COMMA)


[uncategorized] ~50-~50: The preposition ‘as’ seems more likely in this position.
Context: ...nt64 ( default ) - numbers are returned in int64 values * int32 - integers are re...

(AI_HYDRA_LEO_REPLACE_IN_AS)


[uncategorized] ~53-~53: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...pport is included. * float-error - Any floating point input will yield an error, to reduce fl...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.17.2)
stax/README.md

50-50: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


51-51: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


52-52: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


53-53: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


54-54: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


55-55: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

🔇 Additional comments (5)
stax/src/json_number.rs (1)

1-347: Excellent implementation of configurable number parsing

This module demonstrates excellent design with comprehensive feature flag support, proper error handling, and extensive test coverage. The conditional compilation logic correctly handles different number parsing strategies based on build configuration.

stax/src/direct_parser.rs (1)

84-103: Well-structured constructor with clear state initialization

The new method properly initializes all state components and provides clear documentation of the state machine setup.

stax/src/direct_buffer.rs (3)

121-124: Proper error handling for buffer overflow

Good improvement from past reviews - the code now returns BufferFull error instead of silently truncating data when the span exceeds buffer capacity.


167-176: Escape reserve constraint properly enforced

The append_unescaped_byte method correctly checks against available_space which accounts for the escape reserve, addressing the previous concern about not enforcing the escape reserve constraint.


514-531: Clean trait implementation for number extraction

The NumberExtractor trait implementation provides a clean abstraction for number parsing integration with appropriate error mapping.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
tokenizer/src/tokenizer/mod.rs (1)

859-865: Trailing comma check for objects has been properly implemented

The past review comment about missing trailing comma checks for objects has been addressed. The implementation correctly detects and reports trailing commas in objects.

stax/src/direct_parser.rs (1)

524-526: Error handling improvements properly implemented

The past review comments about error handling have been addressed:

  • Reader errors are now properly wrapped in ParseError::ReaderError
  • Errors from append_byte_to_escape_buffer are correctly propagated

These changes prevent silent failures and improve error transparency.

Also applies to: 684-687

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c503369 and 54313c9.

📒 Files selected for processing (8)
  • stax/src/direct_parser.rs (1 hunks)
  • stax/src/slice_input_buffer.rs (1 hunks)
  • stax/tests/api_test_errors.rs (1 hunks)
  • stax/tests/configurable_numbers.rs (1 hunks)
  • stax/tests/debug_root_numbers.rs (1 hunks)
  • tokenizer/src/bitstack/mod.rs (1 hunks)
  • tokenizer/src/tokenizer/mod.rs (1 hunks)
  • tokenizer/tests/array_bitstack_test.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tokenizer/tests/array_bitstack_test.rs
  • stax/tests/configurable_numbers.rs
  • tokenizer/src/bitstack/mod.rs
  • stax/src/slice_input_buffer.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
stax/tests/debug_root_numbers.rs (1)
stax/src/flex_parser.rs (1)
  • new_with_buffer (74-85)
🔇 Additional comments (6)
stax/tests/debug_root_numbers.rs (3)

1-2: LGTM! Clear imports and purpose.

The file comment clearly describes its purpose for debugging root-level number parsing, and the imports are appropriate for the testing functionality.


4-32: Well-implemented debug helper with good safety mechanisms.

The test_json helper function is well-designed with:

  • Clear debug output showing input and parsed events
  • Proper error handling for parsing failures
  • Safety mechanism to prevent infinite loops (10-event limit)
  • Appropriate scratch buffer allocation for test purposes

The implementation follows good practices for debug/test utilities.


34-48: Comprehensive test coverage for JSON primitives and structures.

The test function covers a good range of JSON inputs:

  • All primitive types (numbers, strings, booleans, null)
  • Both simple and complex structured data (objects and arrays)
  • Comparison between root-level and nested values

This provides valuable debugging coverage for the JSON parser's handling of different value types.

stax/src/direct_parser.rs (3)

9-22: Well-designed Reader trait with clear contract

The Reader trait has a well-documented contract that clearly specifies the EOF behavior, addressing previous concerns about potential infinite loops. The documentation explicitly states that returning 0 must indicate true end of stream.


49-61: Excellent state machine design with enforced invariants

The ProcessingState enum provides a robust state machine that enforces logical invariants. The separation between Active and Finished states prevents invalid state transitions and makes the code more maintainable.
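The invariant-enforcing pattern looks roughly like this sketch (variant and method names are illustrative, not the crate's exact definitions):

```rust
// A two-state machine where the Active -> Finished transition is one-way,
// so "read after finish" is unrepresentable by construction.
#[derive(Debug, PartialEq)]
enum ProcessingState {
    /// Still pulling bytes from the reader and feeding the tokenizer.
    Active,
    /// Input exhausted; only pending events may still be drained.
    Finished,
}

struct Machine {
    state: ProcessingState,
}

impl Machine {
    fn finish(&mut self) {
        // One-way transition: once Finished, never Active again.
        self.state = ProcessingState::Finished;
    }
    fn can_read(&self) -> bool {
        self.state == ProcessingState::Active
    }
}

fn main() {
    let mut m = Machine { state: ProcessingState::Active };
    assert!(m.can_read());
    m.finish();
    assert!(!m.can_read());
}
```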


814-880: Comprehensive test coverage with excellent edge case handling

The test suite thoroughly covers critical edge cases:

  • Pending state handling for numbers at container boundaries
  • Rapid consecutive container endings
  • Parser consistency verification between implementations
  • Various escape sequence scenarios

These tests ensure robust parser behavior in complex scenarios.

Also applies to: 904-969, 999-1042

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (3)
stax/src/number_parser.rs (1)

40-47: Improved delimiter exclusion logic addresses previous concerns.

The delimiter handling logic now properly considers both container end context and EOF conditions, which addresses the previous feedback about edge case handling. The logic correctly excludes delimiters when terminated by container end or when not at EOF.

stax/src/flex_parser.rs (1)

391-411: EscapeSequence event handlers serve an important architectural purpose.

Based on the codebase learning, these handlers exist to avoid catch-all patterns in match statements, even though this parser variant doesn't process EscapeSequence events. The other parser variant (direct parser) does need them, making this a sound architectural decision.
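The benefit of avoiding catch-all arms can be shown with a simplified stand-in for `ujson::Event`: with an exhaustive match, adding a new variant is a compile error at every match site, so no parser variant silently ignores it.

```rust
// Simplified stand-in for ujson::Event, illustrating exhaustive matching.
enum Event {
    ObjectStart,
    ObjectEnd,
    EscapeSequence(u8),
}

fn depth_delta(event: &Event) -> i32 {
    match event {
        Event::ObjectStart => 1,
        Event::ObjectEnd => -1,
        // Explicitly handled even though this parser variant ignores it;
        // a `_ => 0` arm would also swallow any future variants unnoticed.
        Event::EscapeSequence(_) => 0,
    }
}

fn main() {
    assert_eq!(depth_delta(&Event::ObjectStart), 1);
    assert_eq!(depth_delta(&Event::EscapeSequence(b'n')), 0);
}
```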

tokenizer/src/tokenizer/mod.rs (1)

861-867: The object trailing comma check is properly implemented.

The past review comment about missing trailing comma check for objects is no longer applicable. The code correctly detects and reports trailing commas in objects.

🧹 Nitpick comments (4)
stax/src/config_check.rs (1)

24-29: Remove redundant validation check.

This check for all three float behavior features being enabled simultaneously is redundant: the pairwise checks above already reject every two-feature combination, so enabling all three necessarily trips one of those errors first.

Consider removing this redundant check:

-#[cfg(all(
-    feature = "float-skip",
-    feature = "float-error",
-    feature = "float-truncate"
-))]
-compile_error!("Cannot enable multiple float behavior features: choose only one of 'float-skip', 'float-error', or 'float-truncate'");
tokenizer/src/bitstack/mod.rs (1)

105-185: Consider adding empty stack behavior tests.

While the current tests are comprehensive, they don't verify edge case behavior when the stack is empty, such as what happens when pop() or top() are called on a freshly created stack.

Add tests for empty stack edge cases:

+    #[test]
+    fn test_empty_stack_behavior() {
+        let mut bitstack = 0u8;
+        // Capture behavior on an empty stack (currently the LSB of
+        // zero-initialized data); prefix with _ to avoid unused warnings.
+        let _first_pop = bitstack.pop();
+        let _first_top = bitstack.top();
+
+        let mut array_bitstack: ArrayBitStack<2, u8> = ArrayBitStack::default();
+        let _array_pop = array_bitstack.pop();
+        let _array_top = array_bitstack.top();
+    }
tokenizer/src/tokenizer/mod.rs (1)

397-1050: Consider refactoring the parse_chunk_inner method for better maintainability.

The parse_chunk_inner method is over 650 lines long with a massive match statement. While the implementation appears correct, this complexity could make maintenance and debugging more difficult.

Consider extracting logical groups of match arms into separate helper methods, such as:

impl<T: BitStack + core::fmt::Debug, D: BitStackCore> Tokenizer<T, D> {
    fn handle_number_state(&mut self, state: &Num, data: u8, pos: usize, callback: &mut dyn FnMut(Event, usize)) -> Result<State, Error> {
        // Extract number parsing logic here
    }
    
    fn handle_string_state(&mut self, state: &String, key: bool, data: u8, pos: usize, callback: &mut dyn FnMut(Event, usize)) -> Result<State, Error> {
        // Extract string parsing logic here
    }
    
    fn handle_structural_tokens(&mut self, data: u8, pos: usize, callback: &mut dyn FnMut(Event, usize)) -> Result<State, Error> {
        // Extract object/array start logic here
    }
}
stax/src/slice_input_buffer.rs (1)

52-55: Remove unused method or add usage.

The slice_to_current method is never used in the codebase.

Either remove this method if it's not needed, or ensure it's used where appropriate:

-    /// Gets a slice from start position to current position - 1.
-    /// Useful for extracting tokens that end at the current position.
-    pub fn slice_to_current(&self, start: usize) -> &'a [u8] {
-        &self.data[start..self.pos.saturating_sub(1)]
-    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54313c9 and 2161020.

📒 Files selected for processing (18)
  • .github/workflows/build.yaml (1 hunks)
  • stax/src/config_check.rs (1 hunks)
  • stax/src/copy_on_escape.rs (1 hunks)
  • stax/src/direct_buffer.rs (1 hunks)
  • stax/src/direct_parser.rs (1 hunks)
  • stax/src/escape_processor.rs (1 hunks)
  • stax/src/flex_parser.rs (1 hunks)
  • stax/src/json_number.rs (1 hunks)
  • stax/src/json_string.rs (1 hunks)
  • stax/src/lib.rs (1 hunks)
  • stax/src/number_parser.rs (1 hunks)
  • stax/src/shared.rs (1 hunks)
  • stax/src/slice_input_buffer.rs (1 hunks)
  • tokenizer/src/bin/main.rs (1 hunks)
  • tokenizer/src/bitstack/mod.rs (1 hunks)
  • tokenizer/src/lib.rs (1 hunks)
  • tokenizer/src/tokenizer/mod.rs (1 hunks)
  • tokenizer/tests/array_bitstack_test.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • stax/src/json_string.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • tokenizer/src/bin/main.rs
  • tokenizer/src/lib.rs
  • tokenizer/tests/array_bitstack_test.rs
  • stax/src/lib.rs
  • stax/src/direct_buffer.rs
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/config_check.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.845Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
stax/src/number_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/flex_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/copy_on_escape.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/escape_processor.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
tokenizer/src/tokenizer/mod.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/direct_parser.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/shared.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/slice_input_buffer.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
🪛 actionlint (1.7.7)
.github/workflows/build.yaml

13-13: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


16-16: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


23-23: the runner of "actions-rs/cargo@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


39-39: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


42-42: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 GitHub Check: Check
stax/src/escape_processor.rs

[warning] 142-142:
associated function parse_unicode_escape_from_str is never used

stax/src/shared.rs

[warning] 246-246:
associated function validate_buffer_bounds is never used

stax/src/slice_input_buffer.rs

[warning] 53-53:
method slice_to_current is never used

🔇 Additional comments (23)
stax/src/config_check.rs (3)

1-12: LGTM! Well-structured compile-time validation.

The module documentation is clear and the integer width validation correctly prevents conflicting feature combinations with an informative error message.


14-23: LGTM! Comprehensive pairwise validation.

The pairwise validation of float behavior features is thorough and covers all possible conflicting combinations with clear error messages.


31-39: LGTM! Logical feature conflict prevention.

The validation correctly prevents the main 'float' feature from being combined with float-behavior features, with clear explanations that the behavior features are only intended for when float parsing is disabled.

stax/src/shared.rs (2)

103-189: Excellent use of saturating arithmetic for safety.

The ContentRange utility struct demonstrates good safety practices by using saturating_sub() to prevent underflow in position calculations. This prevents potential panics when dealing with boundary conditions in JSON parsing.
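The failure mode being prevented can be sketched with a simplified stand-in for one such helper (the real `ContentRange` methods differ in detail):

```rust
// A naive `pos - 1` underflows usize (and panics in debug builds) when a
// token starts at position 0; saturating_sub clamps at 0 instead.
fn quote_position_from_current(current_pos: usize) -> usize {
    current_pos.saturating_sub(1)
}

fn main() {
    assert_eq!(quote_position_from_current(5), 4);
    assert_eq!(quote_position_from_current(0), 0); // no underflow panic
}
```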


191-276: Well-designed error handling utilities.

The ParserErrorHandler provides consistent error creation patterns and UTF-8 validation. The predefined error messages in state_mismatch work well for no_std environments where dynamic formatting isn't available.

tokenizer/src/bitstack/mod.rs (1)

63-78: Good resolution of hardcoded bit shift issues.

The implementation now correctly calculates bit width dynamically using core::mem::size_of::<T>() * 8, which addresses the previous critical issues with hardcoded shifts that only worked for u8 types.
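The fix boils down to deriving the shift amount from the storage type rather than hardcoding it, roughly:

```rust
// Shift needed to reach the top bit of any unsigned storage type:
// hardcoding 7 only works for u8; this generalizes to u16, u32, etc.
fn top_bit_shift<T>() -> u32 {
    (core::mem::size_of::<T>() * 8 - 1) as u32
}

fn main() {
    assert_eq!(top_bit_shift::<u8>(), 7);
    assert_eq!(top_bit_shift::<u16>(), 15);
    assert_eq!(top_bit_shift::<u32>(), 31);
}
```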

stax/src/number_parser.rs (2)

6-24: Well-designed abstraction for unified number parsing.

The NumberExtractor trait provides a clean abstraction that allows the same parsing logic to work with different buffer implementations (FlexParser and DirectParser), promoting code reuse and consistency.
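The shape of that abstraction, sketched with illustrative names (not the crate's exact trait), is: write the parsing routine once against a trait, and let each buffer type supply the raw bytes.

```rust
// Illustrative sketch: shared number parsing over any byte-exposing buffer.
trait NumberExtractor {
    fn number_bytes(&self, start: usize, end: usize) -> &[u8];
}

struct SliceBuffer<'a>(&'a [u8]);

impl<'a> NumberExtractor for SliceBuffer<'a> {
    fn number_bytes(&self, start: usize, end: usize) -> &[u8] {
        &self.0[start..end]
    }
}

// The shared logic is written once against the trait, so FlexParser- and
// DirectParser-style buffers reuse it identically.
fn parse_i64<E: NumberExtractor>(buf: &E, start: usize, end: usize) -> Option<i64> {
    core::str::from_utf8(buf.number_bytes(start, end))
        .ok()?
        .parse()
        .ok()
}

fn main() {
    let buf = SliceBuffer(b"[-42]");
    assert_eq!(parse_i64(&buf, 1, 4), Some(-42));
}
```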


85-166: Comprehensive test coverage with realistic scenarios.

The tests effectively cover different parsing scenarios including simple parsing, container end delimiter exclusion, and EOF handling. The mock implementation provides good isolation for testing the parsing logic.

stax/src/flex_parser.rs (3)

30-87: Excellent dual constructor design for different use cases.

The two constructors (new and new_with_buffer) provide a clean API that optimizes for the common case (no escapes) while supporting the complex case (with escapes). The zero-length internal buffer is a clever optimization.


200-425: Robust event processing with comprehensive state management.

The next_event method demonstrates solid parser design with proper state transitions, error handling, and delegation to specialized components like CopyOnEscape and UnicodeEscapeCollector. The pattern matching is comprehensive and handles all tokenizer events appropriately.


428-770: Thorough test coverage validates parser behavior.

The test suite comprehensively covers different JSON constructs (objects, arrays, numbers, booleans, nulls), string handling (both borrowed and unescaped), escape sequences, and Unicode escapes. The tests effectively validate both the fast path (zero-copy) and slow path (unescaping) optimizations.

stax/src/copy_on_escape.rs (1)

10-175: Well-implemented copy-on-escape mechanism with proper buffer management.

The implementation correctly handles:

  • Zero-copy optimization for strings without escapes
  • Proper bounds checking to prevent buffer overflows
  • UTF-8 validation for safety
  • Clever global scratch position tracking to enable buffer reuse across multiple strings

The comprehensive test coverage demonstrates the robustness of the implementation.
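The core idea can be modeled in a few lines with `Cow`: borrow from the input until the first backslash, and only then start copying into a scratch buffer. This is a simplified model of the strategy, not the crate's implementation (which works on a caller-provided scratch slice rather than `String`).

```rust
use std::borrow::Cow;

// Copy-on-escape: zero-copy fast path for escape-free strings, single
// copy into owned storage once the first `\` is seen.
fn unescape(input: &str) -> Cow<'_, str> {
    match input.find('\\') {
        None => Cow::Borrowed(input), // fast path: zero copies
        Some(i) => {
            let mut out = String::with_capacity(input.len());
            out.push_str(&input[..i]); // everything before the first escape
            let mut chars = input[i..].chars();
            while let Some(c) = chars.next() {
                if c == '\\' {
                    match chars.next() {
                        Some('n') => out.push('\n'),
                        Some('t') => out.push('\t'),
                        Some(other) => out.push(other), // e.g. \" or \\
                        None => {}
                    }
                } else {
                    out.push(c);
                }
            }
            Cow::Owned(out)
        }
    }
}

fn main() {
    assert!(matches!(unescape("plain"), Cow::Borrowed(_)));
    assert_eq!(unescape("a\\nb"), "a\nb");
}
```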

stax/src/direct_parser.rs (5)

11-84: Well-designed trait and type definitions.

The Reader trait contract is clearly documented, and the enum types provide comprehensive state management for the streaming parser. The ProcessingState enum effectively enforces logical invariants.


117-254: Solid implementation of the main parsing state machine.

The next_event() method properly handles complex scenarios including pending container ends, EOF detection, and tokenizer integration. The state transitions are logical and the event processing delegation is well-structured.


257-433: Comprehensive event processing with proper separation of concerns.

The method handles all known ujson::Event variants explicitly (addressing previous feedback about catch-all patterns) and correctly implements the complex logic for container ends that require pending number extraction. Escape sequence handling is properly delegated to specialized methods.


435-520: Well-implemented content extraction with proper state validation.

The extraction methods correctly validate parser state, handle both borrowed and unescaped content variants, and use unified number parsing logic for consistency. Error handling provides meaningful context through ParserErrorHandler.


580-1422: Excellent escape processing implementation and comprehensive test coverage.

The escape processing correctly handles both simple and Unicode escape sequences using shared components. The state management ensures proper cleanup of unescaped content. The test suite is exemplary, covering edge cases, error recovery, and parser consistency validation.

stax/src/escape_processor.rs (3)

142-142: False positive: Function is used in tests

The static analysis warning about parse_unicode_escape_from_str being unused is incorrect - it's used in the test module at lines 312 and 315.


10-149: Well-designed escape processing utilities

The EscapeProcessor implementation is excellent:

  • Comprehensive handling of all JSON escape sequences
  • Proper validation and error handling
  • Modular design with static methods for reusability
  • The previous error type issue has been properly addressed

151-212: Robust Unicode escape collector implementation

The UnicodeEscapeCollector provides a safe, stateful way to collect Unicode escape sequences:

  • Proper bounds checking prevents buffer overflow
  • Clear API with reset functionality
  • Good separation of concerns between collection and processing
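A bounds-checked collector of this kind can be sketched as follows (names are illustrative; the crate's API may differ): it accepts exactly four hex digits for a `\uXXXX` escape, rejects overflow, and yields the decoded code point.

```rust
// Stateful collector for the four hex digits of a \uXXXX escape.
struct HexCollector {
    digits: [u8; 4],
    len: usize,
}

impl HexCollector {
    fn new() -> Self {
        HexCollector { digits: [0; 4], len: 0 }
    }
    fn push(&mut self, b: u8) -> Result<(), &'static str> {
        if self.len == 4 {
            return Err("too many hex digits"); // bounds check, no overflow
        }
        let value = match b {
            b'0'..=b'9' => b - b'0',
            b'a'..=b'f' => b - b'a' + 10,
            b'A'..=b'F' => b - b'A' + 10,
            _ => return Err("invalid hex digit"),
        };
        self.digits[self.len] = value;
        self.len += 1;
        Ok(())
    }
    fn code_point(&self) -> Option<u32> {
        if self.len != 4 {
            return None; // incomplete escape, nothing to decode yet
        }
        Some(self.digits.iter().fold(0u32, |acc, &d| acc * 16 + d as u32))
    }
}

fn main() {
    let mut c = HexCollector::new();
    for &b in b"0041" {
        c.push(b).unwrap();
    }
    assert_eq!(c.code_point(), Some(0x41)); // 'A'
}
```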
stax/src/json_number.rs (3)

14-112: Excellent configurable number representation design

The NumberResult and JsonNumber enums provide a well-thought-out abstraction:

  • Clear handling of different number types based on feature flags
  • Zero-allocation design with borrowed/copied string variants
  • The suggested documentation for is_float() has been properly implemented
  • Comprehensive API for accessing parsed values and raw strings

171-210: Robust float handling for embedded scenarios

The float parsing implementation without float support is well-designed:

  • Correctly errors on scientific notation in truncate mode (avoiding float math)
  • Safe integer extraction from decimal numbers
  • Clear behavior for each configuration mode
  • The previous undefined variable issue has been fixed

128-147: Smart Display implementation preserves precision

The Display implementation uses an intelligent strategy:

  • Shows parsed values when they fit
  • Falls back to raw strings for overflow/unsupported cases
  • This ensures no loss of precision for large numbers

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
.github/workflows/build.yaml (2)

33-36: matrix key is misspelled → job will receive no feature flag

The first include entry sets fatures instead of features, so
${{ matrix.features }} expands to an empty string → the build runs only the “No features” variant and silently skips the default-feature build.

-          - name: Default
-            fatures: ""
+          - name: Default
+            features: ""

48-50: Step name says “test” but only builds; tests never run

The CI never executes cargo test, defeating the purpose of the “Tests” job.

-      - name: Run cargo test
-        working-directory: stax
-        run: cargo build ${{ matrix.features }}
+      - name: Run cargo tests
+        working-directory: stax
+        run: cargo test ${{ matrix.features }}
🧹 Nitpick comments (1)
.github/workflows/build.yaml (1)

22-26: Opportunity: cache Cargo registry & target for faster builds

A simple actions/cache step can cut CI times dramatically:

      - uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2161020 and a6adcac.

📒 Files selected for processing (3)
  • .github/workflows/build.yaml (1 hunks)
  • stax/src/flex_parser.rs (1 hunks)
  • stax/src/shared.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
stax/src/shared.rs (3)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.845Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
stax/src/flex_parser.rs (2)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
🪛 actionlint (1.7.7)
.github/workflows/build.yaml

16-16: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


39-39: the runner of "actions/checkout@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


42-42: the runner of "actions-rs/toolchain@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🔇 Additional comments (9)
stax/src/flex_parser.rs (5)

44-73: Well-designed constructor with clear constraints.

The new method properly handles the zero-allocation case with good documentation about when to use it versus new_with_buffer. The mutable reference to an empty slice for the internal buffer is a clever approach.


177-203: Appropriate error handling for tokenizer integration.

The error handling properly converts tokenizer errors to ParseError::TokenizerError and handles end-of-data scenarios correctly. The callback-based event collection is well-implemented.


214-412: Complex but well-structured main parsing loop.

The next_event method effectively handles all JSON event types with proper state transitions. The use of EventResult enum for control flow is clean, and the match expressions are comprehensive.

The loop-based structure properly avoids the stack overflow issues that could occur with recursive implementations.


334-341: Proper handling of EscapeSequence events.

These handlers correctly implement the design pattern of avoiding catch-all matches while not processing events that aren't needed by this parser variant, consistent with the architectural decision explained in the retrieved learnings.


415-757: Comprehensive test coverage.

The test suite thoroughly covers the parser functionality including edge cases, escape handling, Unicode processing, and the copy-on-escape optimization. The tests demonstrate both the zero-copy and unescaping paths effectively.

stax/src/shared.rs (4)

6-29: Well-designed Event enum with proper lifetime management.

The Event enum comprehensively covers all JSON parsing events with appropriate lifetime parameters that support both borrowed and owned string data through the String and JsonNumber types.


31-65: Comprehensive error handling with conditional compilation.

The ParseError enum covers all necessary error cases including tokenizer errors, buffer constraints, UTF-8 validation, and specialized cases like Unicode handling. The conditional compilation for float features is properly implemented.


107-189: Safe and well-documented utility functions.

The ContentRange utility functions use saturating_sub() consistently to prevent underflow and are well-documented with clear parameter descriptions. These functions provide a clean abstraction for boundary calculations across different parsing contexts.


195-251: Consistent error handling utilities.

The ParserErrorHandler provides good abstractions for common error creation patterns, including UTF-8 validation and state mismatch handling with predefined messages for no_std compatibility.

@kaidokert
Owner Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is an impressive initial implementation, establishing a solid foundation for a streaming JSON parser. The code is well-structured, extensively documented, and includes a comprehensive set of features and tests. The design choices, such as zero-allocation, configurable number handling, and the "copy-on-escape" strategy, are well-thought-out and clearly explained in the design documents.

My review has identified a few areas for improvement, including a critical bug related to infinite recursion and some opportunities to enhance code clarity and maintainability. Please see the detailed comments below.

Overall, this is excellent work. I'm looking forward to seeing this project evolve.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
stax/src/direct_parser.rs (1)

25-38: Address the misleading 'static lifetimes in EventResult.

The Complete variant uses 'static lifetimes which is misleading since Event is generic over lifetimes 'a and 'b. While this works because the variant is only used for lifetime-independent events (containers, booleans, null), it obscures the actual lifetime relationships and could cause confusion during maintenance.

Consider making EventResult generic over the buffer lifetime to improve type safety and clarity:

-enum EventResult {
+enum EventResult<'b> {
     /// Event processing complete, return this event
-    Complete(Event<'static, 'static>),
+    Complete(Event<'b, 'b>),
     /// Continue processing, no event to return yet
     Continue,
     // ... rest unchanged
 }

This change would require updating the method signatures but would make the lifetime relationships explicit and prevent potential issues if the usage patterns change in the future.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eae1257 and 3eee892.

📒 Files selected for processing (4)
  • stax/src/direct_buffer.rs (1 hunks)
  • stax/src/direct_parser.rs (1 hunks)
  • stax/src/flex_parser.rs (1 hunks)
  • stax/tests/configurable_numbers.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • stax/tests/configurable_numbers.rs
  • stax/src/flex_parser.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.754Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:29.968Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
🧬 Code Graph Analysis (1)
stax/src/direct_parser.rs (9)
stax/examples/direct_parser_demo.rs (2)
  • read (28-48)
  • new (16-22)
stax/src/json_number.rs (2)
  • fmt (129-146)
  • as_str (83-88)
stax/src/json_string.rs (2)
  • fmt (44-46)
  • as_str (18-23)
stax/src/direct_buffer.rs (1)
  • new (48-59)
stax/src/slice_input_buffer.rs (2)
  • new (44-46)
  • current_pos (40-42)
stax/src/flex_parser.rs (3)
  • new (60-73)
  • next (200-205)
  • next_event (209-386)
stax/src/shared.rs (8)
  • new (89-94)
  • quote_position_from_current (159-161)
  • number_start_from_current (147-149)
  • state_mismatch (226-235)
  • bytes_to_utf8_str (203-205)
  • string_content_bounds (117-121)
  • string_content_bounds_before_escape (131-138)
  • invalid_unicode_escape (238-240)
stax/src/escape_processor.rs (3)
  • new (152-157)
  • hex_count (198-200)
  • process_escape_token (53-57)
stax/src/number_parser.rs (2)
  • new (97-103)
  • parse_number_event (33-61)
🔇 Additional comments (13)
stax/src/direct_parser.rs (7)

132-242: Excellent main parsing loop implementation.

The core parsing logic is well-architected with clear separation of concerns:

  • Proper buffer filling and end-of-data handling
  • Clean integration with tokenizer and DirectBuffer
  • Appropriate error propagation
  • Logical flow control with deferred event extraction

The state management and event processing patterns are consistent with the overall parser design.


194-202: Well-implemented Unicode escape sequence handling.

The Unicode escape processing correctly handles the tokenizer events:

  • Proper initialization and hex digit collection on Begin events
  • Correct completion and processing on End events
  • Good integration with the shared UnicodeEscapeCollector

The special case handling in the main loop ensures hex digits are captured at the right tokenizer state transitions.

Also applies to: 351-357
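The core of what a `\uXXXX` collector does between its Begin and End events is accumulating four hex digits and decoding them; a standalone sketch (the real UnicodeEscapeCollector API may differ):

```rust
// Decode the four hex digits of a \uXXXX escape into a char,
// as a collector would after its End event.
fn decode_unicode_escape(hex: &[u8; 4]) -> Option<char> {
    let mut code: u32 = 0;
    for &b in hex {
        let digit = (b as char).to_digit(16)?; // reject non-hex input
        code = code * 16 + digit;
    }
    // Lone surrogates (0xD800..=0xDFFF) are not valid scalar values,
    // so from_u32 returns None for them.
    char::from_u32(code)
}

fn main() {
    assert_eq!(decode_unicode_escape(b"0041"), Some('A'));
    assert_eq!(decode_unicode_escape(b"D800"), None);
    assert_eq!(decode_unicode_escape(b"zzzz"), None);
}
```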


529-630: Robust escape sequence processing implementation.

The escape handling methods demonstrate excellent architecture:

  • Proper state management through ProcessingState enum
  • Clean integration with DirectBuffer for unescaping
  • Unified use of EscapeProcessor for simple escapes
  • Comprehensive Unicode escape handling with UnicodeEscapeCollector
  • Appropriate error handling and validation

The separation of concerns between different escape types and the deferred processing approach aligns well with the parser's overall design.
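The simple (non-Unicode) escapes that a unified processor handles amount to a small byte mapping; a hedged sketch of that mapping, not the crate's actual `EscapeProcessor`:

```rust
// Single-byte escape mapping a JSON unescaper needs: the byte
// after a backslash maps to its unescaped form, or None if it
// is not a simple escape.
fn unescape_byte(escaped: u8) -> Option<u8> {
    match escaped {
        b'"' => Some(b'"'),
        b'\\' => Some(b'\\'),
        b'/' => Some(b'/'),
        b'b' => Some(0x08), // backspace
        b'f' => Some(0x0C), // form feed
        b'n' => Some(b'\n'),
        b'r' => Some(b'\r'),
        b't' => Some(b'\t'),
        _ => None, // \uXXXX and invalid escapes handled elsewhere
    }
}

fn main() {
    assert_eq!(unescape_byte(b'n'), Some(b'\n'));
    assert_eq!(unescape_byte(b't'), Some(b'\t'));
    assert_eq!(unescape_byte(b'x'), None);
}
```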


459-471: Good integration with unified number parsing.

The number extraction correctly delegates to the shared number_parser module, ensuring consistent behavior across parser implementations. The from_container_end context parameter properly handles delimiter interpretation for numbers terminated by container ends.
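The delimiter question that the `from_container_end` context addresses can be seen in a standalone sketch: a number token ends at the first delimiter, and when that delimiter is a container end it must not be consumed as part of the number (illustrative only; the crate's `number_parser` API is not reproduced here):

```rust
// Find where a number token ends within a byte slice: at a comma,
// container end, or whitespace, or at the end of input.
fn number_end(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .position(|&b| matches!(b, b',' | b']' | b'}' | b' ' | b'\t' | b'\n' | b'\r'))
        .unwrap_or(bytes.len())
}

fn main() {
    // Number terminated by a container end: ']' stays outside the token.
    assert_eq!(&b"42]"[..number_end(b"42]")], b"42");
    // Number terminated by end of input.
    assert_eq!(number_end(b"3.14"), 4);
}
```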


674-1437: Comprehensive and well-structured test suite.

The test coverage is excellent with:

  • Complete coverage of JSON constructs (objects, arrays, strings, numbers, booleans, nulls)
  • Proper edge case testing (nested containers, empty containers, error conditions)
  • Appropriate feature flag handling for different build configurations
  • Valuable parser comparison tests ensuring consistency with FlexParser
  • Thorough escape sequence and Unicode testing

The conditional compilation based on feature flags ensures tests work correctly across all supported configurations.


473-486: Excellent error handling and propagation.

The error handling throughout the parser is well-designed:

  • Proper error mapping from Reader to ParseError::ReaderError
  • Consistent error propagation from DirectBuffer operations
  • Meaningful error messages for state validation
  • Defensive error handling for unexpected tokenizer states

The error handling patterns ensure robustness without sacrificing performance.

Also applies to: 238-240
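The `Reader`-to-`ParseError::ReaderError` mapping follows a common Rust pattern of wrapping a source error in a parser-level variant; a minimal sketch with a hypothetical error enum, not the crate's actual types:

```rust
// Wrap an underlying reader error into a parser error variant via map_err.
#[derive(Debug, PartialEq)]
enum ParseError<E> {
    ReaderError(E), // carries the Reader's own error type
}

fn fill<E>(read: impl FnOnce() -> Result<usize, E>) -> Result<usize, ParseError<E>> {
    // Tuple-variant constructors work as functions, so map_err stays terse.
    read().map_err(ParseError::ReaderError)
}

fn main() {
    let ok: Result<usize, ParseError<&str>> = fill(|| Ok(4));
    assert_eq!(ok, Ok(4));
    let err: Result<usize, ParseError<&str>> = fill(|| Err("io"));
    assert_eq!(err, Err(ParseError::ReaderError("io")));
}
```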


1-672: Excellent overall architecture and implementation.

The DirectParser demonstrates outstanding software engineering with:

Architecture:

  • Clean separation between tokenization, buffer management, and event processing
  • Elegant state machine design with ProcessingState enum
  • Proper abstraction layers and interfaces

Integration:

  • Excellent reuse of shared utilities (EscapeProcessor, UnicodeEscapeCollector, number_parser)
  • Consistent patterns with other parser implementations
  • Good integration with ujson tokenizer

Performance:

  • Zero-copy string extraction when possible
  • Efficient single-buffer design with DirectBuffer
  • Minimal allocations and smart escape processing

Robustness:

  • Comprehensive error handling and edge case coverage
  • Defensive programming against invalid states
  • Thorough test coverage with feature flag awareness

This implementation successfully provides a streaming JSON parser that complements the existing PullParserFlex while maintaining consistency and code quality standards.

stax/src/direct_buffer.rs (6)

33-59: Well-designed buffer structure and initialization.

The DirectBuffer design demonstrates excellent architectural decisions:

  • Clear field layout with comprehensive documentation
  • Smart escape reserve calculation (10% or 64 bytes minimum)
  • Good separation of concerns between tokenization position and data boundaries
  • Zero-copy design principles clearly documented

The constructor properly initializes all fields and calculates appropriate escape reserve for different buffer sizes.
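The reserve rule stated above (10% of the buffer, with a 64-byte floor) reduces to a one-liner; this sketch additionally clamps the reserve to the buffer size for tiny buffers, which is an assumption about the constructor, not confirmed code:

```rust
// Escape-reserve sizing: 10% of capacity, at least 64 bytes,
// never more than the buffer itself.
fn escape_reserve(capacity: usize) -> usize {
    (capacity / 10).max(64).min(capacity)
}

fn main() {
    assert_eq!(escape_reserve(1024), 102); // 10% dominates
    assert_eq!(escape_reserve(100), 64);   // 64-byte floor dominates
    assert_eq!(escape_reserve(32), 32);    // tiny buffer: clamped
}
```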


62-81: Safe and efficient buffer reading operations.

The buffer reading methods demonstrate good safety practices:

  • Proper bounds checking before accessing buffer data
  • Appropriate error handling with meaningful error types
  • Safe arithmetic using saturating_sub to prevent underflow
  • Clear and simple interfaces for position management
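The `saturating_sub` point is worth seeing concretely: unlike plain `-`, which panics on `usize` underflow in debug builds, it clamps at zero, so position arithmetic stays safe near buffer boundaries:

```rust
fn main() {
    let tokenize_pos: usize = 3;
    let offset: usize = 10;
    // Clamps at zero instead of underflowing when offset > position.
    assert_eq!(tokenize_pos.saturating_sub(offset), 0);
    // Normal subtraction when the result is non-negative.
    assert_eq!(offset.saturating_sub(tokenize_pos), 7);
}
```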

85-101: Robust buffer filling operations for Reader integration.

The buffer filling methods provide safe integration with streaming input:

  • get_fill_slice properly returns available space or None when full
  • mark_filled includes validation to prevent buffer overruns
  • Clean interface that works well with the Reader trait contract
  • Appropriate error handling for invalid operations
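That contract (hand out free space, let the reader write into it, then mark the bytes as filled) pairs naturally with a fill loop; a standalone sketch where the method names mirror the review but the struct is illustrative:

```rust
// Sketch of the fill cycle: the buffer exposes its free tail,
// the caller copies reader output into it, and mark_filled
// advances the data end with an overrun check.
struct Buffer {
    data: Vec<u8>,
    data_end: usize,
}

impl Buffer {
    fn get_fill_slice(&mut self) -> Option<&mut [u8]> {
        if self.data_end == self.data.len() {
            None // buffer full
        } else {
            Some(&mut self.data[self.data_end..])
        }
    }
    fn mark_filled(&mut self, n: usize) {
        // A real implementation would return an error here instead.
        assert!(self.data_end + n <= self.data.len());
        self.data_end += n;
    }
}

fn main() {
    let mut buf = Buffer { data: vec![0; 8], data_end: 0 };
    let input = b"hello";
    if let Some(slice) = buf.get_fill_slice() {
        let n = input.len().min(slice.len());
        slice[..n].copy_from_slice(&input[..n]);
        buf.mark_filled(n);
    }
    assert_eq!(&buf.data[..buf.data_end], b"hello");
}
```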

105-178: Excellent escape processing support with proper safety guarantees.

The escape processing implementation demonstrates careful attention to safety:

  • start_unescaping_with_copy properly handles content copying with bounds checking
  • append_unescaped_byte correctly enforces escape reserve constraints
  • Proper error handling prevents buffer overruns and invalid states
  • Clean lifecycle management with clear_unescaped for cleanup

The implementation addresses past review concerns about escape reserve enforcement and silent truncation.


516-533: Correct NumberExtractor implementation.

The NumberExtractor trait implementation is now properly implemented:

  • get_number_slice correctly delegates to get_string_slice with appropriate error mapping
  • current_position returns the tokenize_pos field directly (no recursion)
  • is_empty properly compares tokenize_pos >= data_end (no recursion)

This addresses the infinite recursion issue mentioned in past reviews and provides the correct interface for unified number parsing.


216-514: Comprehensive test coverage with excellent edge case handling.

The test suite demonstrates thorough validation:

  • Complete coverage of buffer operations (filling, advancing, reading)
  • Proper testing of error conditions and boundary cases
  • Excellent edge case testing (minimal buffers, buffer full scenarios)
  • Thorough validation of escape reserve functionality
  • Good testing of the escape processing lifecycle

The tests include important scenarios like tiny buffers and escape reserve constraints that ensure robustness in real-world usage.

@kaidokert kaidokert merged commit 5c6d386 into main Jun 29, 2025
22 checks passed
@kaidokert kaidokert deleted the pr_one branch June 29, 2025 01:53
This was referenced Jul 15, 2025
@sourcery-ai sourcery-ai bot mentioned this pull request Jul 26, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jul 26, 2025
@sourcery-ai sourcery-ai bot mentioned this pull request Jul 27, 2025
@sourcery-ai sourcery-ai bot mentioned this pull request Aug 3, 2025