
Conversation

kaidokert
Owner

@kaidokert kaidokert commented Jul 26, 2025

Summary by Sourcery

Introduce a new no_std SAX-style push parser along with extensive conformance, stress, and edge-case tests, and enhance error handling in the core library.

New Features:

  • Add a SAX-style PushParser implementation with handler-based API and associated PushParseError type
  • Export PushParser, PushParseError, PushParserHandler, ChunkReader and DefaultConfig in the public API

Enhancements:

  • Extend ParseError with Utf8 and InputBufferFull variants and add a From<ujson::Error> conversion

Tests:

  • Add comprehensive conformance tests for PushParser and StreamParser against the JSON Checker suite
  • Add stress tests for PushParser covering various buffer sizes, chunk patterns, and unicode edge cases
  • Add minimal reproduction tests for InvalidSliceBounds issue
  • Add copy-on-escape optimization tests for borrowed vs unescaped string lifetimes
  • Add dedicated tests for escape sequence and unicode processing in PushParser

Summary by CodeRabbit

  • New Features

    • Introduced a new incremental JSON push parser with event-driven parsing and efficient memory management.
    • Added enhanced error types for UTF-8 errors and input buffer capacity issues.
    • Extended the public API with new parser interfaces and handler traits for customizable event processing.
  • Bug Fixes

    • Improved robustness in handling buffer boundaries and Unicode escape sequences during JSON parsing.
  • Tests

    • Added comprehensive conformance, unit, edge case, and stress tests for the push parser, including escape sequence handling, buffer isolation, and Unicode processing.
    • Developed new test utilities and handlers to support thorough parser validation across diverse scenarios.


coderabbitai bot commented Jul 26, 2025

"""

Walkthrough

A new push-based, event-driven JSON parser (PushParser) is introduced to the picojson crate, along with its handler trait and error types. The public API is extended to include these components. The parser supports incremental, no_std parsing, robust escape sequence handling, and emits events to user-defined handlers. Extensive tests and conformance suites are added for validation, including stress, escape, copy-on-escape, and buffer boundary tests. Minor updates are made to error handling and a test float assertion.

Changes

File(s) Change Summary
picojson/src/lib.rs Added new module push_parser and re-exported PushParser, PushParserHandler, and PushParseError.
picojson/src/parse_error.rs Modified ParseError enum: removed InvalidUtf8, added Utf8 and InputBufferFull variants; implemented From<ujson::Error>.
picojson/src/push_parser.rs New file: Implements PushParser, handler trait, error types, event emission, escape and buffer management, and incremental parsing.
picojson/src/stream_parser.rs Modified test to assert float equality against core::f64::consts::PI.
picojson/tests/json_checker_tests.rs Extended conformance tests for PushParser and StreamParser; added handlers, test functions, and failure test macros.
picojson/tests/push_parser.rs New file: Comprehensive unit tests for PushParser covering events, escapes, keys, numbers, and buffer/lifetime edge cases.
picojson/tests/push_parser_copy_on_escape.rs New file: Tests for copy-on-escape optimization, buffer isolation, and borrowed vs. unescaped string handling in PushParser.
picojson/tests/push_parser_escapes.rs New file: Tests for escape sequence handling in strings and keys for PushParser (and comparison with SliceParser).
picojson/tests/push_parser_invalidslicebounds_repro.rs New file: Diagnostic tests reproducing buffer boundary and Unicode escape issues in PushParser.
picojson/tests/push_parser_stress_test.rs New file: Stress tests for PushParser with varying buffer sizes, chunking, Unicode escapes, and document validation scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PushParser
    participant Handler

    User->>PushParser: new(handler, buffer)
    loop For each data chunk
        User->>PushParser: write(data)
        PushParser->>Handler: handle_event(Event)
        Handler-->>PushParser: Result
    end
    User->>PushParser: finish()
    PushParser->>Handler: handle_event(Event::End)
    Handler-->>PushParser: Result
    User->>PushParser: destroy()
    PushParser-->>User: handler

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90 minutes

Poem

In the warren where JSON hops,
A pushy parser now never stops.
With escapes and buffers, it leaps with glee,
Handling chunks as swift as can be.
Tests abound like carrots in spring,
Ensuring each byte does the right thing.
🥕✨ Hooray for parsing—let the rabbits sing!
"""



sourcery-ai bot commented Jul 26, 2025

Reviewer's Guide

This PR integrates a new SAX-style PushParser into picojson alongside enhanced error reporting, new public exports, and a greatly expanded test suite covering PushParser and StreamParser conformance, stress, and edge-case scenarios. It updates error variants, exposes the new push-based API, and adds comprehensive tests covering Unicode escapes, buffer boundaries, and copy-on-escape behavior.

Class diagram for new PushParser and related types

classDiagram
    class PushParserHandler {
        <<trait>>
        +handle_event(event: Event) Result<(), E>
    }
    class PushParser {
        -handler: H
        -stream_buffer: StreamBuffer
        -core: ParserCore
        -unicode_escape_collector: UnicodeEscapeCollector
        -state: ParserState
        -escape_state: EscapeState
        -position_offset: usize
        -current_position: usize
        -token_start_pos: usize
        -using_unescaped_buffer: bool
        +new(handler: H, buffer: &mut [u8])
        +write(data: &[u8])
        +finish()
        +destroy() -> H
    }
    class PushParseError {
        Parse(ParseError)
        Handler(E)
    }
    class ParserState {
        Idle
        ParsingString
        ParsingKey
        ParsingNumber
    }
    class EscapeState {
        None
        InEscapeSequence
        InUnicodeEscape
    }
    PushParser o-- PushParserHandler : handler
    PushParser o-- StreamBuffer
    PushParser o-- ParserCore
    PushParser o-- UnicodeEscapeCollector
    PushParser --> ParserState
    PushParser --> EscapeState
    PushParserHandler <|.. PushParser : uses
    PushParser ..> PushParseError : errors
    PushParseError ..> ParseError
    PushParseError ..> Handler : generic

Class diagram for updated ParseError enum

classDiagram
    class ParseError {
        TokenizerError(ujson::Error)
        Utf8(core::str::Utf8Error)
        ScratchBufferFull
        InputBufferFull
        NumberParseError
        UnexpectedState
    }
    ParseError <|-- PushParseError : used in

File-Level Changes

Change Details Files
Extend ParseError to cover UTF-8 and buffer overflow cases and map underlying tokenizer errors
  • Add Utf8 and InputBufferFull error variants
  • Refine ScratchBufferFull documentation
  • Implement From<ujson::Error> for ParseError
src/parse_error.rs
Expose PushParser API in library root
  • Import and re-export PushParser, PushParseError, PushParserHandler
  • Add new push_parser module declaration
src/lib.rs
Implement new no_std SAX-style PushParser
  • Define PushParserHandler trait and internal ParserState/EscapeState machines
  • Implement write/finish/destroy methods with HRTB support
  • Handle escape sequences, Unicode escapes, and switch to an unescaped buffer
  • Map and propagate handler and parse errors
src/push_parser.rs
Augment existing json_checker tests with PushParser and StreamParser conformance
  • Import ChunkReader, DefaultConfig, PushParser and related types
  • Define run_push_parser_test and run_stream_parser_test helpers
  • Add pass1/pass2/pass3 tests, failure cases, known deviations and unicode trace tests
tests/json_checker_tests.rs
Add comprehensive new test modules for PushParser edge cases and stress testing
  • Create push_parser.rs with unit tests for HRTB, borrow vs unescaped, escape handling, debug event snapshots
  • Add push_parser_stress_test.rs covering buffer sizes, chunk patterns, unicode edge cases, document validation
  • Include minimal repro and copy-on-escape isolation tests to pinpoint InvalidSliceBounds issues
  • Add dedicated escape and unicode processing comparison tests with SliceParser reference
tests/push_parser.rs
tests/push_parser_stress_test.rs
tests/push_parser_invalidslicebounds_repro.rs
tests/push_parser_copy_on_escape.rs
tests/push_parser_escapes.rs
Minor adjustments in StreamParser tests
  • Allow clippy::approx_constant in numeric comparison test
src/stream_parser.rs


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new PushParser to the picojson library, providing a SAX-style, event-driven JSON parsing mechanism optimized for no_std environments. This new parser is designed to handle JSON input incrementally, making it suitable for streaming data, and includes robust handling for string escapes and Unicode characters.

Highlights

  • New Push-Based JSON Parser: Implemented PushParser and PushParserHandler to offer a SAX-style, event-driven interface for parsing JSON, specifically designed for no_std compatibility and incremental data processing.
  • Improved Error Reporting: Expanded the ParseError enum with new variants (Utf8, InputBufferFull) and refined existing ones to provide more precise error information during parsing.
  • Optimized String Handling: Introduced a 'copy-on-escape' mechanism within the PushParser to efficiently manage string content, allowing for zero-copy String::Borrowed where no escapes are present and allocating only when String::Unescaped content needs to be processed.
  • Extensive Test Coverage: Added a suite of new test files to thoroughly validate the PushParser's correctness, robustness, and performance across various JSON structures, escape sequences, and input chunking patterns.

@sourcery-ai sourcery-ai bot left a comment


Hey @kaidokert - I've reviewed your changes - here's some feedback:

  • The new push-parser tests are very comprehensive but extremely repetitive—consider factoring common setup and assertions into helper macros or parameterized test functions to improve readability and maintainability.
  • Currently handler errors in run_push_parser_test are all mapped to ParseError::InvalidNumber, which could obscure the real cause—consider introducing a dedicated error variant or preserving the original handler error message.
  • The push_parser.rs file is quite large; you might break it into smaller modules (e.g. escape handling, state machine, buffer management) to make the implementation easier to navigate and review.
## Individual Comments

### Comment 1
<location> `picojson/src/push_parser.rs:72` </location>
<code_context>
+    }
+
+    /// Processes a chunk of input data.
+    pub fn write<'input, E>(&mut self, data: &'input [u8]) -> Result<(), PushParseError<E>>
+    where
+        H: for<'a, 'b> PushParserHandler<'a, 'b, E>,
</code_context>

<issue_to_address>
Event storage size is fixed and may not be sufficient for all tokenizer outputs.

Since event_storage is limited to two events, additional events may be lost if the tokenizer emits more. Consider using a Vec or ring buffer, or clearly document the two-event assumption.
</issue_to_address>

### Comment 2
<location> `picojson/src/push_parser.rs:629` </location>
<code_context>
+    }
+}
+
+fn create_tokenizer_callback(
+    event_storage: &mut [Option<(ujson::Event, usize)>; 2],
+    position_offset: usize,
</code_context>

<issue_to_address>
Callback ignores the actual event position from the tokenizer.

The callback uses position_offset instead of the _pos argument from the tokenizer, which may cause incorrect event position tracking. Please use _pos to ensure accurate event positions.
</issue_to_address>

### Comment 3
<location> `picojson/src/push_parser.rs:174` </location>
<code_context>
+    }
+
+    /// Finishes parsing and flushes any remaining events.
+    pub fn finish<E>(&mut self) -> Result<(), PushParseError<E>>
+    where
+        H: for<'a, 'b> PushParserHandler<'a, 'b, E>,
</code_context>

<issue_to_address>
No check for incomplete string or key parsing at finish.

Currently, finish only checks for unfinished numbers. Please add checks to ensure the parser is not left in ParsingString or ParsingKey states to prevent incomplete JSON from being accepted.
</issue_to_address>

### Comment 4
<location> `picojson/tests/push_parser.rs:17` </location>
<code_context>
         }
+
+        // Debug test for tracing SliceParser unicode escape processing
+        #[test]
+        fn test_slice_parser_unicode_trace() {
+            // Test the problematic sequence from pass1.json line 45
</code_context>

<issue_to_address>
Missing tests for invalid or malformed Unicode escape sequences.

Please add tests for cases like missing digits, invalid hex characters, or truncated escapes to verify the parser correctly returns errors.
</issue_to_address>

### Comment 5
<location> `picojson/tests/push_parser_copy_on_escape.rs:7` </location>
<code_context>
         }
+
+        // Debug test for tracing SliceParser unicode escape processing
+        #[test]
+        fn test_slice_parser_unicode_trace() {
+            // Test the problematic sequence from pass1.json line 45
</code_context>

<issue_to_address>
Missing test for mixed borrowed and unescaped strings in the same object.

Please add a test case where an object includes both borrowed and unescaped strings to verify correct parser behavior in mixed scenarios.
</issue_to_address>

### Comment 6
<location> `picojson/tests/push_parser_escapes.rs:35` </location>
<code_context>
         }
+
+        // Debug test for tracing SliceParser unicode escape processing
+        #[test]
+        fn test_slice_parser_unicode_trace() {
+            // Test the problematic sequence from pass1.json line 45
</code_context>

<issue_to_address>
Missing test for invalid escape sequences in keys.

Please add tests for cases like incomplete \u escapes or unsupported escape characters to verify the parser correctly returns errors.
</issue_to_address>


}
}

fn create_tokenizer_callback(

issue (bug_risk): Callback ignores the actual event position from the tokenizer.

The callback uses position_offset instead of the _pos argument from the tokenizer, which may cause incorrect event position tracking. Please use _pos to ensure accurate event positions.

}

/// Finishes parsing and flushes any remaining events.
pub fn finish<E>(&mut self) -> Result<(), PushParseError<E>>

issue (bug_risk): No check for incomplete string or key parsing at finish.

Currently, finish only checks for unfinished numbers. Please add checks to ensure the parser is not left in ParsingString or ParsingKey states to prevent incomplete JSON from being accepted.

Comment on lines +17 to +26
#[test]
fn test_clean_push_parser_compiles() {
    let mut buffer = [0u8; 256];
    let handler = SimpleHandler;
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    // This should compile without lifetime issues using HRTB + tokenizer + event array
    parser.write(b"true").unwrap(); // Valid JSON
    parser.finish::<()>().unwrap();
    let _handler = parser.destroy();

issue (testing): Missing tests for invalid or malformed Unicode escape sequences.

Please add tests for cases like missing digits, invalid hex characters, or truncated escapes to verify the parser correctly returns errors.

Comment on lines +7 to +8
#[test]
fn test_borrowed_vs_unescaped_simple() {

suggestion (testing): Missing test for mixed borrowed and unescaped strings in the same object.

Please add a test case where an object includes both borrowed and unescaped strings to verify correct parser behavior in mixed scenarios.

Comment on lines +35 to +44
#[test]
fn test_string_with_actual_escapes() {
    // Test that escape sequences in strings are properly processed
    let json_string = r#"{"message": "Hello\nWorld\t!"}"#;
    let json = json_string.as_bytes();

    let handler = EventCollector::new();
    let mut buffer = [0u8; 256];
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);


issue (testing): Missing test for invalid escape sequences in keys.

Please add tests for cases like incomplete \u escapes or unsupported escape characters to verify the parser correctly returns errors.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new PushParser, a SAX-style JSON parser, along with extensive tests. The implementation is comprehensive but has a few critical issues. The most significant is an inefficient byte-by-byte processing loop in the write method, which should be refactored to handle chunks for better performance. Additionally, several error conditions, particularly in Unicode escape handling, are silently ignored, which could lead to incorrect parsing or hide errors in the input. The new tests are a great addition, but many of them print results instead of using assertions, and some test helpers use patterns that could make debugging difficult. My review focuses on improving correctness, performance, and test quality.

Comment on lines +78 to +88
for (local_pos, &byte) in data.iter().enumerate() {
// Update current position to absolute position
self.current_position = self.position_offset + local_pos;
let mut append_byte = true;

// Tokenizer generates events that drive state transitions.
{
let mut callback =
create_tokenizer_callback(&mut event_storage, self.current_position);
self.core.tokenizer.parse_chunk(&[byte], &mut callback)?;
}


critical

The write method processes the input data slice byte-by-byte, calling self.core.tokenizer.parse_chunk(&[byte], ...) in a loop. This is highly inefficient as parse_chunk is optimized for larger slices. This implementation will have significant performance overhead due to repeated function calls and loop iterations for every single byte of input.

The method should be refactored to process the entire data chunk at once. This will require adjusting the event handling logic to work with offsets within the chunk rather than iterating byte by byte.

}
}
}
Err(_e) => {}


high

The Result returned by self.unicode_escape_collector.add_hex_digit(byte) is matched, but the Err case is ignored. If an invalid hexadecimal digit is encountered within a \uXXXX escape sequence, the parser will silently ignore it and continue, potentially producing incorrect output or masking an error in the input JSON. This should be treated as a parsing error.

                            Err(e) => return Err(e.into()),

}
}
}
Err(_e) => {}


high

The Result from self.unicode_escape_collector.process_to_utf8(...) is ignored in the Err case. If the collected hex digits form an invalid Unicode scalar value (e.g., a lone surrogate), the error will be silently dropped. This should be propagated as a ParseError to ensure correctness.

                                                Err(e) => return Err(e.into()),

}
}
}
Err(_e) => {}


high

The Result returned by self.unicode_escape_collector.add_hex_digit(byte) is matched, but the Err case is ignored. If an invalid hexadecimal digit is encountered within a \uXXXX escape sequence, the parser will silently ignore it and continue, potentially producing incorrect output or masking an error in the input JSON. This should be treated as a parsing error.

                                    Err(e) => return Err(e.into()),

}
}
}
Err(_e) => {}


high

The Result from self.unicode_escape_collector.process_to_utf8(...) is ignored in the Err case. If the collected hex digits form an invalid Unicode scalar value (e.g., a lone surrogate), the error will be silently dropped. This should be propagated as a ParseError.

                                        Err(e) => return Err(e.into()),

Comment on lines 586 to 619
fn test_unicode_escapes_not_yet_implemented() {
    // Debug handler that captures strings and keys to test Unicode escape processing
    struct UnicodeEscapeTestHandler {
        events: Vec<std::string::String>,
    }

    impl<'a, 'b> PushParserHandler<'a, 'b, ()> for UnicodeEscapeTestHandler {
        fn handle_event(&mut self, event: Event<'a, 'b>) -> Result<(), ()> {
            let event_desc = match event {
                Event::StartObject => "StartObject".to_string(),
                Event::EndObject => "EndObject".to_string(),
                Event::Key(k) => format!("Key({})", k.as_ref()),
                Event::String(s) => format!("String({})", s.as_ref()),
                Event::EndDocument => "EndDocument".to_string(),
                _ => "Other".to_string(),
            };
            self.events.push(event_desc);
            Ok(())
        }
    }

    let mut buffer = [0u8; 256];
    let handler = UnicodeEscapeTestHandler { events: Vec::new() };
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    // Test string with Unicode escape sequence (\u0041 should become 'A')
    parser.write(br#"{"key": "\u0041"}"#).unwrap();
    parser.finish::<()>().unwrap();
    let handler = parser.destroy();

    println!("Unicode escape events: {:?}", handler.events);

    // This test verifies that Unicode escapes are working
}


medium

The test test_unicode_escapes_not_yet_implemented has a confusing name, as it appears to test functionality that is implemented. Furthermore, the test body is empty except for a comment. This test should either be completed with assertions to verify the behavior or be removed to avoid confusion.

Comment on lines +90 to +102
fn test_slice_parser_comparison() {
// Test the same JSON with SliceParser to see how it handles escapes
let json_string = r#"{"message": "Hello\nWorld\t!"}"#;
let mut scratch = [0u8; 256];
let mut parser = picojson::SliceParser::with_buffer(json_string, &mut scratch);

println!("SliceParser results:");
while let Ok(event) = parser.next_event() {
match event {
picojson::Event::EndDocument => break,
_ => println!(" {:?}", event),
}
}

medium

The test test_slice_parser_comparison only prints the output from SliceParser and does not contain any assertions. Tests should programmatically verify behavior rather than relying on manual inspection of printed output. Please add assertions to validate the expected events.

Comment on lines +165 to +319
fn test_debug_unicode_processing() {
use picojson::{Event, PullParser, SliceParser};

// Debug the exact unicode processing for \u0041 -> A
let json = r#"{"k":"\u0041"}"#;
println!("Testing unicode: {} (should produce 'A')", json);

// First test with SliceParser (known working)
println!("\n=== SliceParser (working reference) ===");
let mut buffer = [0u8; 64];
let mut parser = SliceParser::with_buffer(json, &mut buffer);

loop {
match parser.next_event() {
Ok(Event::EndDocument) => break,
Ok(event) => println!("SliceParser event: {:?}", event),
Err(e) => {
println!("SliceParser error: {:?}", e);
break;
}
}
}

// Now test with PushParser (broken)
println!("\n=== PushParser (broken) ===");
let mut buffer = [0u8; 64];
let handler = ReproHandler::new();
let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

let result = parser.write(json.as_bytes());
match result {
Ok(()) => {
let finish_result = parser.finish();
match finish_result {
Ok(()) => {
let handler = parser.destroy();
println!("PushParser events: {:?}", handler.events);

// Check if we got the expected "A" character
let expected_char = "A";
let found_char = handler
.events
.iter()
.find(|event| event.starts_with("String("))
.map(|event| &event[7..event.len() - 1]); // Extract content between String( and )

if let Some(actual) = found_char {
if actual == expected_char {
println!(
"✅ Unicode correctly processed: {} -> {}",
"\\u0041", actual
);
} else {
println!(
"❌ Unicode incorrectly processed: {} -> {} (expected {})",
"\\u0041", actual, expected_char
);
}
}
}
Err(e) => {
println!("❌ FINISH ERROR: {:?}", e);
}
}
}
Err(e) => {
println!("❌ WRITE ERROR: {:?}", e);
}
}
}

#[test]
fn test_reproduce_invalidslicebounds_minimal_case() {
// Try the most minimal case that still has unescaped content
let test_cases = vec![
(r#"{"k":""}"#, "empty string"),
(r#"{"k":"a"}"#, "single char"),
(r#"{"k":"ab"}"#, "two chars"),
(r#"{"k":"\n"}"#, "simple escape"),
(r#"{"k":"\u0041"}"#, "single unicode"),
(r#"{"k":"\u0123"}"#, "single unicode hex"),
(r#"{"k":"\u0123\u4567"}"#, "double unicode"),
];

for (json, description) in test_cases {
println!("Testing: {} - {}", description, json);
let json_bytes = json.as_bytes();

// Use a small buffer to trigger the issue
let mut buffer = [0u8; 32];
let handler = ReproHandler::new();
let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

let result = parser.write(json_bytes);
match result {
Ok(()) => {
let finish_result = parser.finish();
match finish_result {
Ok(()) => {
let handler = parser.destroy();
println!(
"✅ SUCCESS: {} - Events = {:?}",
description, handler.events
);
}
Err(e) => {
println!("❌ FINISH ERROR: {} - {:?}", description, e);
}
}
}
Err(e) => {
println!("❌ WRITE ERROR: {} - {:?}", description, e);
return; // Stop at first error to identify the minimal trigger
}
}
println!();
}
}

#[test]
fn test_reproduce_invalidslicebounds_progressive_size() {
// Test with progressively smaller buffer sizes to find the exact trigger point
let json_content = br#"{"key": "\u0123\u4567"}"#;

for buffer_size in (8..=64).rev() {
println!("Trying buffer size: {}", buffer_size);
let mut buffer = vec![0u8; buffer_size];
let handler = ReproHandler::new();
let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

let result = parser.write(json_content);
match result {
Ok(()) => {
let finish_result = parser.finish();
match finish_result {
Ok(()) => {
let handler = parser.destroy();
println!(
"SUCCESS at buffer size {}: Events = {:?}",
buffer_size, handler.events
);
}
Err(e) => {
println!("FINISH ERROR at buffer size {}: {:?}", buffer_size, e);
break;
}
}
}
Err(e) => {
println!("WRITE ERROR at buffer size {}: {:?}", buffer_size, e);
break;
}
}
}
}

medium

Several tests in this file (test_debug_unicode_processing, test_reproduce_invalidslicebounds_minimal_case, test_reproduce_invalidslicebounds_progressive_size) print their results to the console instead of using assertions. While useful for debugging, tests committed to the repository should programmatically verify behavior using assert!, assert_eq!, or panic! on failure. This ensures they can be run automatically in CI. Please convert these println!-based checks into assertions.
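Converting the sweeps to assertions only requires deciding, per input, which outcome is expected. A standalone sketch of the pattern, with a toy `try_parse` standing in for the write/finish sequence and an assumed `MIN_BUFFER` threshold (both hypothetical):

```rust
// Sketch: turn a println!-based buffer-size sweep into real assertions.
// `try_parse` models "succeeds iff the buffer is at least MIN_BUFFER bytes";
// the real test would call PushParser::write/finish instead.
const MIN_BUFFER: usize = 16;

fn try_parse(buffer_size: usize) -> Result<(), String> {
    if buffer_size >= MIN_BUFFER {
        Ok(())
    } else {
        Err(format!("buffer too small: {}", buffer_size))
    }
}

fn assert_progressive_sizes() {
    for buffer_size in (8..=64).rev() {
        let result = try_parse(buffer_size);
        // Instead of printing SUCCESS/ERROR, fail the test on any
        // unexpected outcome so CI catches regressions automatically.
        if buffer_size >= MIN_BUFFER {
            assert!(result.is_ok(), "unexpected failure at size {}", buffer_size);
        } else {
            assert!(result.is_err(), "unexpected success at size {}", buffer_size);
        }
    }
}
```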

Comment on lines 65 to 68
Event::Key(k) => Event::Key(picojson::String::Borrowed(k.as_ref().to_string().leak())),
Event::String(s) => {
Event::String(picojson::String::Borrowed(s.as_ref().to_string().leak()))
}

medium

The StressTestHandler uses to_string().leak() to give the event data a 'static lifetime. While this is a common and often acceptable pattern for simplifying test code, it does leak memory. This could become an issue if the test suite grows very large or is run under memory-constrained conditions (like with valgrind).

A comment explaining why leak() is used here would be helpful for future maintainers and to prevent this pattern from being accidentally copied into production code.
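For reference, the behavior in question can be shown in isolation: `String::leak` hands back a `&'static str` by deliberately never freeing the allocation, which is exactly the trade-off the comment should document.

```rust
// Demonstrates what `to_string().leak()` does: it trades a deliberate,
// never-reclaimed heap allocation for a `&'static str`. Acceptable in
// short-lived test binaries; each call leaks, so avoid in production code.
fn leak_to_static(s: &str) -> &'static str {
    // NOTE: this memory is intentionally never freed -- test-only shortcut.
    s.to_string().leak()
}
```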

Comment on lines 361 to 558
fn test_push_parser_stress_chunk_patterns() {
println!("=== PushParser Chunk Pattern Stress Test ===");
println!(
"NOTE: Multi-call scenarios may fail - Phase 4 (numbers, streaming, robustness) is pending"
);
let scenarios = get_push_parser_test_scenarios();

// Test patterns: Large chunks should work, tiny chunks may fail due to Phase 4 limitations
let chunk_patterns: &[&[usize]] = &[
&[50], // Large chunks - should work with copy-on-escape
&[10], // Medium chunks - may work
// NOTE: Byte-by-byte patterns disabled due to Phase 4 limitations
// &[1], // Byte-by-byte - Phase 4 not complete
// &[2], // Two bytes at a time - Phase 4 not complete
// &[3, 1, 2], // Variable small chunks - Phase 4 not complete
// &[1, 5, 1], // Mixed tiny and small - Phase 4 not complete
// &[7, 1, 1, 10], // Irregular pattern - Phase 4 not complete
];

for scenario in &scenarios {
println!("--- Testing Scenario: {} ---", scenario.name);
let buffer_size = scenario.min_buffer_size + 10; // Adequate buffer

for &pattern in chunk_patterns {
let result = test_push_parsing_with_config(scenario, buffer_size, pattern);

match result {
Ok(()) => {
println!("✅ [P={:?}] SUCCESS", pattern);
}
Err(e) => {
// For now, just log failures instead of panicking
// TODO: Remove this when Phase 4 is complete
println!(
"⚠️ [P={:?}] EXPECTED FAILURE for scenario '{}' - {} (Phase 4 pending)",
pattern, scenario.name, e
);
}
}
}
}
}

#[test]
fn test_push_parser_stress_critical_matrix() {
println!("=== PushParser Critical Size Matrix Test ===");
println!(
"NOTE: Multi-call chunking may fail - Phase 4 (numbers, streaming, robustness) is pending"
);
let scenarios = get_push_parser_test_scenarios();

for scenario in &scenarios {
println!("--- Testing Scenario: {} ---", scenario.name);
let critical_buffer_sizes: Vec<usize> =
(scenario.min_buffer_size.saturating_sub(2)..=scenario.min_buffer_size + 5).collect();
// Use only larger chunk patterns that are more likely to work
let chunk_patterns: &[&[usize]] = &[&[50]]; // Only large chunks for now

for &buffer_size in &critical_buffer_sizes {
for &pattern in chunk_patterns {
let result = test_push_parsing_with_config(scenario, buffer_size, pattern);
let expected_success =
should_succeed_push_parser(buffer_size, scenario.min_buffer_size);

match (result.is_ok(), expected_success) {
(true, true) => {
println!("✅ [B={}, P={:?}] SUCCESS (expected)", buffer_size, pattern);
}
(false, false) => {
println!("✅ [B={}, P={:?}] FAIL (expected)", buffer_size, pattern);
}
(true, false) => {
// With copy-on-escape optimization, we might succeed with smaller buffers
println!("✅ [B={}, P={:?}] Unexpected SUCCESS - copy-on-escape working better than expected", buffer_size, pattern);
}
(false, true) => {
// For now, log instead of panic for Phase 4 limitations
println!("⚠️ [B={}, P={:?}] EXPECTED FAILURE for scenario '{}' - {} (Phase 4 pending)",
buffer_size, pattern, scenario.name, result.unwrap_err());
}
}
}
}
}
}

#[test]
fn test_push_parser_stress_unicode_edge_cases() {
println!("=== PushParser Unicode Edge Cases Stress Test ===");

let unicode_scenarios = vec![
TestScenario {
name: "Consecutive Unicode",
json: br#"["\u0123\u4567\u89AB\uCDEF"]"#,
expected_events: vec![
Event::StartArray,
Event::String(picojson::String::Unescaped("ģ䕧覫췯")), // Consecutive unicode
Event::EndArray,
Event::EndDocument,
],
min_buffer_size: 25, // Unicode processing buffer for consecutive escapes
},
TestScenario {
name: "Unicode at Chunk Boundary",
json: br#"["\u0041XYZ"]"#,
expected_events: vec![
Event::StartArray,
Event::String(picojson::String::Unescaped("AXYZ")), // \u0041 = A
Event::EndArray,
Event::EndDocument,
],
min_buffer_size: 15, // Unicode + normal text processing
},
TestScenario {
name: "Empty Key with Unicode Value",
json: br#"{"": "\u2603"}"#, // Snowman character
expected_events: vec![
Event::StartObject,
Event::Key("".into()),
Event::String(picojson::String::Unescaped("☃")), // \u2603 = ☃
Event::EndObject,
Event::EndDocument,
],
min_buffer_size: 12, // Empty key + unicode value processing
},
];

for scenario in &unicode_scenarios {
println!("--- Testing Unicode Scenario: {} ---", scenario.name);

// Test specifically challenging chunk patterns for unicode
let unicode_chunk_patterns: &[&[usize]] = &[
&[1], // Byte-by-byte (challenges unicode boundaries)
&[6, 1], // Split unicode escapes
&[3, 2, 1], // Irregular splits
];

let buffer_size = scenario.min_buffer_size + 5;

for &pattern in unicode_chunk_patterns {
let result = test_push_parsing_with_config(scenario, buffer_size, pattern);

match result {
Ok(()) => {
println!("✅ [P={:?}] Unicode SUCCESS", pattern);
}
Err(e) => {
panic!(
"❌ [P={:?}] Unicode FAILURE for scenario '{}' - {}",
pattern, scenario.name, e
);
}
}
}
}
}

#[test]
fn test_push_parser_stress_document_validation() {
println!("=== PushParser Document Validation Stress Test ===");

// Test incomplete documents that should fail
let invalid_scenarios: Vec<(&str, &[u8], &str)> = vec![
("Unclosed Array", b"[\"hello\"", "array not closed"),
(
"Unclosed Object",
b"{\"key\": \"value\"",
"object not closed",
),
("Extra Comma", b"{\"key\": \"value\",}", "trailing comma"),
("Missing Value", b"{\"key\":}", "missing value"),
];

for (name, json, _description) in &invalid_scenarios {
println!("--- Testing Invalid: {} ---", name);

let buffer_size = 50; // Adequate buffer
let chunk_patterns: &[&[usize]] = &[&[1], &[3], &[10]];

for &pattern in chunk_patterns {
let mut buffer = vec![0u8; buffer_size];
let handler = StressTestHandler::new(&[]);
let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);
let mut writer = ChunkedWriter::new(json, pattern);

let result = writer.write_to_parser(&mut parser);

if result.is_ok() {
panic!(
"❌ [P={:?}] Expected FAILURE for '{}', but got SUCCESS",
pattern, name
);
} else {
println!("✅ [P={:?}] Correctly FAILED for '{}'", pattern, name);
}
}
}
}

medium

The stress tests in this file have inconsistent failure handling. Some tests panic! on failure (e.g., test_push_parser_stress_unicode_edge_cases), while others print a ⚠️ warning and continue (e.g., test_push_parser_stress_chunk_patterns). For CI integration and clarity, all tests should fail explicitly on unexpected results, usually by panicking. Please unify the behavior to panic! on unexpected failures. The comments about 'Phase 4 pending' can be kept to explain why a test might be expected to fail for now.
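A unified version can enumerate all four `(actual, expected)` combinations and panic only on the unexpected ones. A standalone sketch (the helper name and labels are illustrative):

```rust
// Sketch of the unified pattern: every (actual, expected) combination is
// handled explicitly, and only unexpected outcomes panic. `actual_ok`
// models `result.is_ok()` from the stress-test helper.
fn check_outcome(actual_ok: bool, expected_ok: bool, label: &str) {
    match (actual_ok, expected_ok) {
        (true, true) => {}   // expected success
        (false, false) => {} // expected failure (e.g. "Phase 4 pending")
        (true, false) => {
            // Unexpected success: surface it so expectations get updated.
            panic!("[{}] succeeded but was expected to fail", label);
        }
        (false, true) => {
            // Unexpected failure: fail CI instead of printing a warning.
            panic!("[{}] failed but was expected to succeed", label);
        }
    }
}
```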

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (8)
picojson/tests/push_parser_escapes.rs (1)

89-103: Consider adding assertions to the comparison test.

While the debug output is useful for manual verification, adding assertions to compare the PushParser and SliceParser outputs would make this test more robust and catch regressions automatically.

 #[test]
 fn test_slice_parser_comparison() {
     // Test the same JSON with SliceParser to see how it handles escapes
     let json_string = r#"{"message": "Hello\nWorld\t!"}"#;
     let mut scratch = [0u8; 256];
     let mut parser = picojson::SliceParser::with_buffer(json_string, &mut scratch);

+    let mut slice_events = Vec::new();
     println!("SliceParser results:");
     while let Ok(event) = parser.next_event() {
         match event {
             picojson::Event::EndDocument => break,
-            _ => println!("  {:?}", event),
+            event => {
+                println!("  {:?}", event);
+                slice_events.push(format!("{:?}", event));
+            }
         }
     }
+    
+    // Could add assertions here to compare with PushParser output
+    assert!(!slice_events.is_empty(), "SliceParser should produce events");
 }
picojson/tests/json_checker_tests.rs (1)

89-105: Use underscore prefix for intentionally unused variable

         loop {
             match parser.next_event() {
                 Ok(Event::EndDocument) => break,
-                Ok(event) => {
+                Ok(_event) => {
                     event_count += 1;
                 }
                 Err(e) => return Err(e),
             }
         }
picojson/tests/push_parser.rs (4)

344-412: Debug tests could validate parsed content

These debug tests only verify that parsing completes without errors but don't validate the actual parsed content. Consider capturing and asserting on the events for more thorough testing.


485-492: Remove or implement the incomplete logging code

The event counting handler has an incomplete logging implementation.

             fn handle_event(&mut self, _event: Event<'input, 'scratch>) -> Result<(), ()> {
                 self.event_count += 1;
-                if self.event_count % 10 == 0 {
-                    // Log every 10 events
-                }
                 Ok(())
             }

586-619: Remove duplicate unicode escape test

This test duplicates the functionality of test_unicode_escapes (lines 250-294) with a confusing name suggesting it's not implemented.

Consider removing this duplicate test or clearly differentiating its purpose from the earlier unicode escape test.


677-679: Remove redundant imports

These imports are already present at the module level.

     #[test]
     fn test_single_slash_escape() {
-        use picojson::{DefaultConfig, Event, PushParser, PushParserHandler};
-
         struct Handler {
picojson/tests/push_parser_invalidslicebounds_repro.rs (1)

205-210: Improve event content extraction

The manual string extraction from event description is fragile and could break if the format changes.

-                    let found_char = handler
-                        .events
-                        .iter()
-                        .find(|event| event.starts_with("String("))
-                        .map(|event| &event[7..event.len() - 1]); // Extract content between String( and )
+                    let found_char = handler
+                        .events
+                        .iter()
+                        .find_map(|event| {
+                            if let Some(content) = event.strip_prefix("String(") {
+                                content.strip_suffix(")")
+                            } else {
+                                None
+                            }
+                        });
picojson/tests/push_parser_stress_test.rs (1)

369-378: Re-enable chunk pattern tests when Phase 4 is complete

Most chunk patterns are disabled due to Phase 4 limitations, reducing test coverage. These should be re-enabled when the parser robustness improvements are complete.

Would you like me to create an issue to track the Phase 4 parser robustness improvements and the re-enabling of these tests?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1213716 and 5de540e.

📒 Files selected for processing (10)
  • picojson/src/lib.rs (1 hunks)
  • picojson/src/parse_error.rs (2 hunks)
  • picojson/src/push_parser.rs (1 hunks)
  • picojson/src/stream_parser.rs (1 hunks)
  • picojson/tests/json_checker_tests.rs (5 hunks)
  • picojson/tests/push_parser.rs (1 hunks)
  • picojson/tests/push_parser_copy_on_escape.rs (1 hunks)
  • picojson/tests/push_parser_escapes.rs (1 hunks)
  • picojson/tests/push_parser_invalidslicebounds_repro.rs (1 hunks)
  • picojson/tests/push_parser_stress_test.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/src/stream_parser.rs (4)

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

picojson/src/lib.rs (6)

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Learnt from: kaidokert
PR: #1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via use statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the crate:: path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.

Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

picojson/src/parse_error.rs (3)

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

picojson/tests/push_parser_escapes.rs (7)

Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

picojson/tests/push_parser_copy_on_escape.rs (7)

Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
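The two access styles this learning describes can be sketched with stand-in types (`MiniParser`, `Event`, and `ParseError` here are illustrative, not the actual picojson API):

```rust
// Minimal sketch of the two access styles: next_event() always returns a
// Result (including EndDocument), while next() is iterator-like and
// returns None once EndDocument has been seen.
#[derive(Debug, PartialEq, Clone)]
enum Event { Value(i32), EndDocument }

#[derive(Debug, PartialEq)]
struct ParseError;

struct MiniParser { events: Vec<Event>, pos: usize }

impl MiniParser {
    // Direct style: every call yields a Result, EndDocument included.
    fn next_event(&mut self) -> Result<Event, ParseError> {
        let ev = self.events.get(self.pos).cloned().ok_or(ParseError)?;
        self.pos += 1;
        Ok(ev)
    }
    // Iterator style: EndDocument maps to None, everything else to Some.
    fn next(&mut self) -> Option<Result<Event, ParseError>> {
        match self.next_event() {
            Ok(Event::EndDocument) => None,
            other => Some(other),
        }
    }
}

fn count_values(parser: &mut MiniParser) -> usize {
    let mut n = 0;
    while let Some(Ok(_)) = parser.next() {
        n += 1;
    }
    n
}

fn main() {
    let mut p = MiniParser {
        events: vec![Event::Value(1), Event::Value(2), Event::EndDocument],
        pos: 0,
    };
    println!("{}", count_values(&mut p)); // prints 2
}
```

Which style to use depends on whether loop-until-None iteration (next) or explicit Result handling (next_event) fits the call site better.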


🔇 Additional comments (26)
picojson/src/stream_parser.rs (1)

759-759: LGTM! Appropriate Clippy lint suppression.

The #[allow(clippy::approx_constant)] attribute correctly suppresses the warning for this test case. The value 3.14 here represents a literal JSON number being parsed, not a mathematical approximation of π, so using core::f64::consts::PI would be semantically incorrect.
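The distinction the comment draws can be shown in a minimal sketch (the function name is hypothetical; only the attribute and the literal mirror the pattern under review):

```rust
// Clippy's approx_constant lint flags literals close to known constants,
// but a number written verbatim in a JSON test document is exactly that
// literal, so suppressing the lint is the semantically correct choice.
#[allow(clippy::approx_constant)]
fn expected_json_number() -> f64 {
    // The number 3.14 as it appears in the input, not an approximation
    // of core::f64::consts::PI.
    3.14
}

fn main() {
    assert_eq!(expected_json_number(), 3.14);
    println!("ok");
}
```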

picojson/src/lib.rs (1)

101-102: LGTM! Clean API extension for the new push parser.

The module declaration and public re-exports follow consistent patterns with the existing codebase. The new PushParser, PushParserHandler, and PushParseError types provide a complete push-based parsing API that complements the existing pull parsers.

picojson/src/parse_error.rs (3)

14-15: LGTM! Useful UTF-8 error variant addition.

The new Utf8(core::str::Utf8Error) variant provides detailed UTF-8 validation error information, which is particularly valuable for incremental parsing scenarios where UTF-8 validation may occur across buffer boundaries.


18-19: LGTM! Complementary buffer error variant.

The InputBufferFull variant nicely complements the existing ScratchBufferFull, providing clear distinction between different types of buffer capacity issues in the parsing pipeline.


80-84: LGTM! Clean error conversion implementation.

The From<ujson::Error> implementation provides a clean conversion path from the tokenizer layer, following standard Rust error handling patterns and enabling seamless error propagation.
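A hedged sketch of the conversion pattern this enables: a parser-level error enum gains From impls so lower-layer errors propagate with `?`. `TokenizerError` stands in for ujson::Error, and the variant names are illustrative, not the crate's actual definitions:

```rust
// Illustrative model of the error-conversion layering described above.
#[derive(Debug, PartialEq)]
enum TokenizerError { UnexpectedByte }

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ParseError {
    Utf8(core::str::Utf8Error),
    InputBufferFull,
    Tokenizer(TokenizerError),
}

impl From<TokenizerError> for ParseError {
    fn from(e: TokenizerError) -> Self {
        ParseError::Tokenizer(e)
    }
}

impl From<core::str::Utf8Error> for ParseError {
    fn from(e: core::str::Utf8Error) -> Self {
        ParseError::Utf8(e)
    }
}

// With the From impls in place, `?` converts the lower-layer error
// into the parser-level error automatically.
fn decode(bytes: &[u8]) -> Result<&str, ParseError> {
    Ok(core::str::from_utf8(bytes)?)
}

fn main() {
    assert!(decode(b"hi").is_ok());
    assert!(matches!(decode(&[0xFF]), Err(ParseError::Utf8(_))));
    println!("ok");
}
```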

picojson/tests/push_parser_copy_on_escape.rs (3)

1-53: LGTM! Well-structured test for zero-copy optimization.

This test effectively verifies that the push parser correctly performs zero-copy string borrowing when no escape sequences are present. The handler implementation properly checks the String enum variants, and the assertions clearly validate the expected optimization behavior.


55-101: LGTM! Comprehensive escape sequence handling test.

This test properly validates that strings containing escape sequences are correctly identified as requiring unescaping and copying. The test covers both keys and values, ensuring consistent behavior across different string contexts.


103-170: LGTM! Important buffer isolation verification.

This test addresses a critical concern in incremental parsing - ensuring that buffer content doesn't accumulate or leak between different string events. The byte-level content verification provides strong assurance that the parser maintains proper buffer boundaries.

picojson/tests/push_parser_escapes.rs (3)

5-33: LGTM! Clean event collection handler implementation.

The EventCollector provides a clear way to capture and verify parser events. The handler implementation properly converts events to readable debug strings, making test failures easy to diagnose.


35-60: LGTM! Comprehensive multi-escape sequence test.

This test effectively validates that multiple escape sequences (\n, \t) within a single string are correctly processed and converted to their actual character representations.


105-158: LGTM! Thorough key escape sequence testing.

These tests properly validate that escape sequences work correctly in JSON keys, not just values. Testing both newline and quote escapes in keys ensures comprehensive coverage of the escape processing logic.

picojson/tests/json_checker_tests.rs (3)

17-20: LGTM!

The imports are well-organized and all imported types are utilized in the implementation.


324-349: LGTM!

The macro generates comprehensive failure tests for PushParser, maintaining consistency with SliceParser tests.


375-394: LGTM!

The known deviation tests ensure PushParser aligns with SliceParser behavior for modern JSON standards (RFC 7159) and deep nesting support.

picojson/tests/push_parser.rs (1)

9-15: LGTM!

Simple handler implementation appropriate for basic compilation tests.

picojson/tests/push_parser_invalidslicebounds_repro.rs (2)

9-38: LGTM!

Well-structured handler for diagnostic purposes that properly handles event lifetime by converting to owned strings.


40-162: LGTM!

Comprehensive test scenarios for reproducing buffer boundary issues with clear error reporting.

picojson/src/push_parser.rs (7)

1-11: LGTM!

Clear module documentation and appropriate imports.


12-27: LGTM!

Well-defined state enums that clearly represent the parser's state machine.


98-156: LGTM!

Well-structured escape sequence handling with proper UTF-8 conversion for Unicode escapes.


348-375: LGTM!

Robust number parsing that handles various termination scenarios and validates slice bounds.


489-543: LGTM!

Sophisticated Unicode escape handling that correctly processes the final hex digit and manages buffer positions to exclude the escape sequence from the output.


596-627: LGTM!

Well-designed error handling with clear separation between parser and handler errors, and comprehensive error type conversions.


629-647: Intentional fixed-size event storage for embedded environments

This implementation uses a fixed-size array of 2 events and silently drops events when full, which aligns with the design for embedded/constrained environments requiring zero-allocation guarantee and deterministic memory usage.
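A minimal model of the fixed-capacity design described here, assuming a 2-slot array that drops events when full (illustrative only, not the picojson implementation):

```rust
// Fixed-capacity event storage: no allocation, no panic, deterministic
// memory footprint. A full queue drops the event instead of growing.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event { ObjectStart, ObjectEnd }

struct EventQueue {
    slots: [Option<Event>; 2],
    len: usize,
}

impl EventQueue {
    fn new() -> Self {
        EventQueue { slots: [None, None], len: 0 }
    }
    // Returns false when the queue is full; in the real design the
    // caller simply ignores that case, silently dropping the event.
    fn push(&mut self, ev: Event) -> bool {
        if self.len < self.slots.len() {
            self.slots[self.len] = Some(ev);
            self.len += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut q = EventQueue::new();
    assert!(q.push(Event::ObjectStart));
    assert!(q.push(Event::ObjectEnd));
    assert!(!q.push(Event::ObjectStart)); // third event is dropped
    println!("{}", q.len); // prints 2
}
```

Two slots suffice because the tokenizer emits at most two events per byte when the queue is drained in a tight loop.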

picojson/tests/push_parser_stress_test.rs (2)

228-237: Unicode test expects incorrect output

The test expects raw Unicode escape sequences ("\\u0041\\u0042\\u0043") instead of the parsed result ("ABC"). The comment suggests Unicode processing still has issues.

This should be tracked as a known issue or the test should be updated when Unicode processing is fixed.
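For reference, the expected behavior is that each \uXXXX escape decodes to its Unicode scalar value, so the three escapes in the test should yield "ABC". A small sketch of that decoding step (hypothetical helper, not the parser's code):

```rust
// Decode the four hex digits of a \uXXXX escape into a char, when the
// code point is a valid Unicode scalar value.
fn decode_u_escape(hex: &str) -> Option<char> {
    u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
}

fn main() {
    let decoded: String = ["0041", "0042", "0043"]
        .iter()
        .filter_map(|h| decode_u_escape(h))
        .collect();
    assert_eq!(decoded, "ABC");
    println!("{}", decoded); // prints ABC
}
```

(Surrogate pairs for code points above U+FFFF need an extra combining step, which this sketch omits.)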


518-558: LGTM!

Comprehensive validation tests for invalid JSON documents with good coverage of common syntax errors.

Comment on lines 52 to 87
fn run_push_parser_test(json_content: &str) -> Result<usize, ParseError> {
let mut buffer = [0u8; 2048]; // Larger buffer for pass1.json
let handler = ConformanceTestHandler {
event_count: 0,
error: None,
};
let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

match parser.write(json_content.as_bytes()) {
Ok(()) => {}
Err(e) => {
return Err(match e {
PushParseError::Parse(parse_err) => parse_err,
PushParseError::Handler(_handler_err) => {
// Convert handler error to a parse error for consistency
ParseError::InvalidNumber // Use any available ParseError variant
}
});
}
}

match parser.finish::<String>() {
Ok(()) => {}
Err(e) => {
return Err(match e {
PushParseError::Parse(parse_err) => parse_err,
PushParseError::Handler(_handler_err) => {
ParseError::InvalidNumber // Use any available ParseError variant
}
});
}
}

let handler = parser.destroy();
Ok(handler.event_count)
}
🛠️ Refactor suggestion

⚠️ Potential issue

Improve error handling and remove unused field initialization

1. The error conversion uses ParseError::InvalidNumber as a placeholder for handler errors, which is misleading.
2. The handler initialization includes the unused error field.
 fn run_push_parser_test(json_content: &str) -> Result<usize, ParseError> {
     let mut buffer = [0u8; 2048]; // Larger buffer for pass1.json
     let handler = ConformanceTestHandler {
         event_count: 0,
-        error: None,
     };
     let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

     match parser.write(json_content.as_bytes()) {
         Ok(()) => {}
         Err(e) => {
             return Err(match e {
                 PushParseError::Parse(parse_err) => parse_err,
                 PushParseError::Handler(_handler_err) => {
-                    // Convert handler error to a parse error for consistency
-                    ParseError::InvalidNumber // Use any available ParseError variant
+                    // Handler errors shouldn't occur in conformance tests
+                    unreachable!("ConformanceTestHandler never returns errors")
                 }
             });
         }
     }

Consider adding a more appropriate ParseError variant for handler errors if they need to be propagated.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In picojson/tests/json_checker_tests.rs around lines 52 to 87, the error
handling converts handler errors to the misleading ParseError::InvalidNumber
variant, and the handler struct initializes an unused error field. Remove the
unused error field from the ConformanceTestHandler initialization and update the
error conversion logic to use a more appropriate or custom ParseError variant
for handler errors to accurately represent the error source.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (7)
picojson/tests/json_checker_tests.rs (4)

40-42: Remove unused error field

The error field in ConformanceTestHandler is never used and should be removed.


63-66: Improve error handling for handler errors

Converting PushParseError::Handler to an arbitrary ParseError::InvalidNumber is misleading when debugging test failures.


76-79: Improve error handling for handler errors during finish

Converting PushParseError::Handler from finish() to an arbitrary ParseError::InvalidNumber is misleading when debugging test failures.


158-177: Enhance test to validate parsed content

The test only verifies that parsing doesn't panic but doesn't validate the actual parsed string content.

picojson/src/push_parser.rs (1)

78-88: Inefficient byte-by-byte processing

The write method processes input data byte-by-byte, which is highly inefficient as parse_chunk is optimized for larger slices.

picojson/tests/push_parser_stress_test.rs (1)

399-406: Inconsistent failure handling in stress tests

This test prints a warning instead of panicking on failure, which is inconsistent with other tests. For CI integration, tests should fail explicitly.

picojson/tests/push_parser.rs (1)

342-583: Comprehensive debugging and robustness tests for complex JSON scenarios.

These tests provide excellent coverage of real-world JSON parsing scenarios and help ensure parser stability with complex inputs. The investigative nature and minimal assertions are appropriate for robustness testing.

However, based on previous review feedback, there's still missing coverage for invalid or malformed Unicode escape sequences.

Please add tests for cases like missing digits, invalid hex characters, or truncated escapes to verify the parser correctly returns errors, as mentioned in previous reviews.

🧹 Nitpick comments (1)
picojson/tests/push_parser.rs (1)

250-294: Good Unicode escape sequence validation with minor comment inconsistency.

The test correctly validates Unicode escape processing (\u0041 → 'A'). However, the comment on line 289 shows "\\u0041" (a double backslash) when it should show "\u0041" to match the actual escape sequence being tested.

-                "String(A)".to_string(), // \\u0041 should be converted to 'A'
+                "String(A)".to_string(), // \u0041 should be converted to 'A'
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5de540e and 2344e76.

📒 Files selected for processing (6)
  • picojson/src/parse_error.rs (4 hunks)
  • picojson/src/push_parser.rs (1 hunks)
  • picojson/src/stream_parser.rs (1 hunks)
  • picojson/tests/json_checker_tests.rs (5 hunks)
  • picojson/tests/push_parser.rs (1 hunks)
  • picojson/tests/push_parser_stress_test.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • picojson/src/stream_parser.rs
  • picojson/src/parse_error.rs
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
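The next()/next_event() learning above can be sketched as a default-method pattern. This is a minimal, self-contained reconstruction for illustration only; the types and names mirror the learning but are not the crate's real code.

```rust
// Hypothetical minimal reconstruction of the PullParser convenience pattern:
// next_event() always returns a Result, while next() is an iterator-style
// wrapper that yields None once EndDocument is reached.
#[derive(Debug, PartialEq)]
enum Event {
    StartObject,
    EndObject,
    EndDocument,
}

#[derive(Debug)]
struct ParseError;

trait PullParser {
    /// Core method: returns a result directly.
    fn next_event(&mut self) -> Result<Event, ParseError>;

    /// Iterator-style convenience: None once EndDocument is reached.
    fn next(&mut self) -> Option<Result<Event, ParseError>> {
        match self.next_event() {
            Ok(Event::EndDocument) => None,
            other => Some(other),
        }
    }
}

// A fixed event source standing in for a real parser.
struct Fixed {
    events: std::vec::IntoIter<Event>,
}

impl PullParser for Fixed {
    fn next_event(&mut self) -> Result<Event, ParseError> {
        Ok(self.events.next().unwrap_or(Event::EndDocument))
    }
}

fn main() {
    let mut p = Fixed {
        events: vec![Event::StartObject, Event::EndObject].into_iter(),
    };
    let mut count = 0;
    while let Some(ev) = p.next() {
        ev.unwrap();
        count += 1;
    }
    // EndDocument terminates the while-let loop without being counted.
    assert_eq!(count, 2);
    println!("{}", count);
}
```

Either style is valid; next() suits while-let loops, next_event() suits explicit error handling.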
picojson/tests/json_checker_tests.rs (7), picojson/src/push_parser.rs (7), picojson/tests/push_parser.rs (7), picojson/tests/push_parser_stress_test.rs (7)

For each of these files, the same seven learnings listed above under Common learnings apply.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (12)
picojson/src/push_parser.rs (2)

188-193: Good error handling for incomplete parsing states

The finish method now properly checks for incomplete string and key parsing states, ensuring incomplete JSON is rejected.
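The check described above can be illustrated with a small state machine. This is a hedged sketch under assumed names (`State`, `UnexpectedEndOfInput` are illustrative, not the crate's actual types): if input ends while the parser is mid-string or mid-key, finish() must fail.

```rust
// Illustrative finish-time validation: a parser still inside a string or
// key when input ends represents an incomplete document.
#[derive(Debug, PartialEq)]
enum State {
    Idle,
    InString,
    InKey,
}

#[derive(Debug, PartialEq)]
struct UnexpectedEndOfInput;

fn finish(state: &State) -> Result<(), UnexpectedEndOfInput> {
    match state {
        State::InString | State::InKey => Err(UnexpectedEndOfInput),
        State::Idle => Ok(()),
    }
}

fn main() {
    assert_eq!(finish(&State::Idle), Ok(()));
    assert_eq!(finish(&State::InString), Err(UnexpectedEndOfInput));
    assert_eq!(finish(&State::InKey), Err(UnexpectedEndOfInput));
    println!("ok");
}
```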


507-545: Proper error handling for unicode escape sequences

The unicode escape processing now correctly propagates errors for invalid hex digits and invalid unicode scalar values, addressing previous concerns.
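The two error paths mentioned above can be sketched in isolation. This is an assumed, self-contained version (the error enum and function name are illustrative): invalid hex digits and invalid scalar values each map to a distinct error instead of being silently dropped.

```rust
// Sketch of \uXXXX handling with explicit error propagation.
#[derive(Debug, PartialEq)]
enum EscErr {
    InvalidHex,
    InvalidScalar,
}

fn parse_unicode_escape(hex: &[u8; 4]) -> Result<char, EscErr> {
    let mut cp: u32 = 0;
    for &b in hex {
        let digit = (b as char).to_digit(16).ok_or(EscErr::InvalidHex)?;
        cp = cp * 16 + digit;
    }
    // char::from_u32 rejects surrogates (0xD800..=0xDFFF) and values
    // above 0x10FFFF, so invalid scalars surface as errors here.
    char::from_u32(cp).ok_or(EscErr::InvalidScalar)
}

fn main() {
    assert_eq!(parse_unicode_escape(b"CAFE"), Ok('\u{CAFE}'));
    assert_eq!(parse_unicode_escape(b"ZZZZ"), Err(EscErr::InvalidHex));
    assert_eq!(parse_unicode_escape(b"D800"), Err(EscErr::InvalidScalar));
    println!("ok");
}
```

Note that a full JSON parser would additionally pair a high surrogate with a following \uXXXX low surrogate before rejecting it; this sketch treats lone surrogates as invalid scalars.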

picojson/tests/push_parser.rs (10)

1-6: Well-structured test module with appropriate imports.

The module structure is clean with proper licensing, test attributes, and comprehensive imports from the picojson crate.


8-27: Effective minimal test handler for basic functionality verification.

The SimpleHandler provides a clean baseline test that verifies parser compilation, basic operations (write, finish, destroy), and lifecycle management without event processing complexity.


29-64: Excellent HRTB pattern validation test.

This test effectively validates the Higher-Ranked Trait Bounds pattern which is crucial for lifetime management in the push parser. The test correctly verifies both string processing and event counting, ensuring proper integration between the tokenizer and HRTB system.
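The HRTB pattern referenced above can be shown with a minimal standalone example, assuming nothing about the crate's actual trait: a callback that receives parser-owned string slices must be callable for any borrow lifetime, which the `for<'a>` bound expresses.

```rust
// Minimal HRTB illustration: `drive` hands `f` a slice that lives only
// for the duration of the call, so `F` must satisfy the bound for every
// possible lifetime, not one fixed lifetime.
fn drive<F>(mut f: F, scratch: &mut String)
where
    F: for<'a> FnMut(&'a str),
{
    scratch.push_str("hello");
    f(scratch.as_str()); // short-lived borrow of the parser's buffer
}

fn main() {
    let mut seen: Vec<String> = Vec::new();
    let mut scratch = String::new();
    // The handler copies out of the borrowed slice, as a push-parser
    // handler would when it needs to retain data past the callback.
    drive(|s| seen.push(s.to_owned()), &mut scratch);
    assert_eq!(seen, vec!["hello".to_string()]);
    println!("{}", seen.len());
}
```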


66-104: Well-designed test for string extraction and zero-copy parsing.

The test effectively validates string content extraction while properly handling lifetimes by using owned strings in the test handler. This approach allows testing the underlying borrowed string mechanism without introducing lifetime complications in the test code.


106-155: Comprehensive debug handler for event stream visibility.

The DebugHandler provides excellent coverage of event types and serves as a valuable debugging tool. The relaxed assertions are appropriate for this type of exploratory test that focuses on event generation rather than strict content validation.


157-202: Thorough validation of object key-value parsing.

The test effectively validates the complete event sequence for object parsing, ensuring keys and values are properly distinguished and emitted in the correct order. The strict assertion provides good coverage of the parser's object handling capabilities.


204-248: Solid validation of basic escape sequence processing.

The test correctly validates that escape sequences like \n are properly converted to their corresponding characters (actual newline). This is fundamental JSON parsing behavior and the test implementation is sound.


296-341: Excellent coverage of mixed escape sequence processing.

This test provides comprehensive validation of complex escape scenarios including Unicode escapes, control characters, and special characters. The expected output correctly demonstrates proper conversion of all escape types, making this a valuable test for parser robustness.


586-640: Thorough validation of number parsing across different formats.

The test effectively validates number parsing for integers, floats, and negative numbers, ensuring proper Number event emission with correct string representation. Good coverage of various number formats in JSON.


642-673: Clean and focused test for forward slash escape sequence.

The test correctly validates that the \/ escape sequence is properly converted to /, which is standard JSON behavior. The focused approach and clear assertion make this an effective unit test.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
picojson/tests/json_checker_tests.rs (1)

88-104: Fix unused variable warning

The event variable on line 97 is unused and should be prefixed with an underscore to suppress the warning.

-                Ok(event) => {
+                Ok(_event) => {
picojson/src/push_parser.rs (1)

72-176: Performance concern documented but implementation is correct

The byte-by-byte processing approach is acknowledged as suboptimal in the TODO comment on lines 81-82. While this impacts performance for large inputs, the implementation correctly handles the complex state transitions and escape sequence processing.

Consider implementing chunk-based processing in a future iteration to improve performance for large JSON inputs.
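One shape the suggested chunk-based processing could take (purely a sketch; the function name and byte set are assumptions, not the crate's design) is scanning ahead for the next structurally significant byte and copying plain runs wholesale instead of dispatching per byte:

```rust
// Copy a run of plain string bytes in one step, stopping at the first
// byte that needs special handling ('"' or '\\'). Returns how many
// bytes were consumed, so the caller can resume at the special byte.
fn copy_plain_run(input: &[u8], out: &mut Vec<u8>) -> usize {
    let end = input
        .iter()
        .position(|&b| b == b'"' || b == b'\\')
        .unwrap_or(input.len());
    out.extend_from_slice(&input[..end]);
    end
}

fn main() {
    let mut out = Vec::new();
    // "hello world" is 11 plain bytes; the backslash at index 11 stops the run.
    let consumed = copy_plain_run(b"hello world\\n rest", &mut out);
    assert_eq!(consumed, 11);
    assert_eq!(&out, b"hello world");
    println!("{}", consumed);
}
```

In a no_std context `Vec` would be replaced by the parser's fixed scratch buffer, with a capacity check before the copy.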

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2344e76 and 6375d19.

📒 Files selected for processing (3)
  • picojson/src/push_parser.rs (1 hunks)
  • picojson/tests/json_checker_tests.rs (5 hunks)
  • picojson/tests/push_parser.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/tests/json_checker_tests.rs (7)

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

picojson/tests/push_parser.rs (7)

Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

picojson/src/push_parser.rs (7)

Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson; alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

🪛 GitHub Actions: Build and test
picojson/tests/json_checker_tests.rs

[error] 172-172: Test failure in 'test_slice_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.


[error] 276-276: Test failure in 'test_stream_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.


[warning] 97-97: Unused variable 'event'. Consider prefixing it with an underscore if intentional.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (11)
picojson/tests/json_checker_tests.rs (3)

39-86: LGTM: Clean handler implementation and appropriate error handling

The ConformanceTestHandler is simple and effective for counting events. The error conversion from PushParseError::Handler to ParseError::ReaderError is appropriate since it represents external (handler) errors.


363-388: Excellent macro-based test generation

The generate_push_parser_fail_tests macro ensures consistent test coverage between SliceParser and PushParser for all expected failure cases. This systematic approach reduces the risk of missing test cases.


414-433: Good consistency validation between parsers

The PushParser known deviation tests ensure that the new parser maintains the same behavior as SliceParser for edge cases like scalar root values and deep nesting. This is crucial for API compatibility.

picojson/tests/push_parser.rs (5)

17-64: Solid foundation tests for HRTB and lifetime management

These tests effectively validate that the complex Higher Ranked Trait Bounds (HRTB) pattern works correctly with the PushParser. The progression from simple compilation test to event capturing demonstrates proper lifetime management.


212-348: Comprehensive escape sequence testing

The tests cover simple escapes (\n), Unicode escapes (\u0041), and consecutive mixed escapes effectively. The progression from basic to complex escape scenarios provides good validation coverage.


681-825: Excellent error case validation

The tests for invalid Unicode escapes (incomplete sequences, invalid hex characters) and invalid escape sequences in keys provide crucial validation of parser error handling. This addresses the robustness requirements for a production parser.


753-801: Smart validation of copy-on-escape optimization

This test effectively validates the parser's optimization strategy of using String::Borrowed for simple strings and String::Unescaped only when escape processing is required. This is crucial for performance in real-world usage.
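The strategy being validated can be sketched roughly as follows. The real parser is no_std and unescapes into a caller-provided scratch buffer; here an owned String stands in, and only the \n escape is handled, purely for illustration.

```rust
enum JsonString<'a> {
    Borrowed(&'a str),  // zero-copy: slice of the original input
    Unescaped(String),  // copied: escape processing was required
}

fn parse_string(raw: &str) -> JsonString<'_> {
    if raw.contains('\\') {
        // Escape seen: copy into a new buffer while unescaping.
        JsonString::Unescaped(raw.replace("\\n", "\n"))
    } else {
        // No escapes: borrow the input directly, no allocation.
        JsonString::Borrowed(raw)
    }
}

fn main() {
    assert!(matches!(parse_string("plain"), JsonString::Borrowed(_)));
    assert!(matches!(parse_string("line\\nbreak"), JsonString::Unescaped(_)));
}
```

Deferring the copy until the first backslash is what keeps the common case (escape-free strings) zero-copy.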


350-508: Systematic debugging approach for complex inputs

The progressive testing approach from specific problematic lines to larger sections of pass1.json demonstrates good debugging methodology. This systematic isolation helps identify the root cause of parsing issues.

picojson/src/push_parser.rs (3)

119-156: Proper error handling for Unicode escape processing

The Unicode escape processing now correctly handles and propagates errors from both add_hex_digit() and process_to_utf8() calls. This addresses the error handling concerns from previous reviews.


178-210: Comprehensive state validation in finish method

The finish method properly handles all parser states, including appropriate error handling for incomplete strings and keys (lines 193-198). This ensures incomplete JSON documents are properly rejected.


643-658: Correct position tracking in tokenizer callback

The callback correctly converts relative positions to absolute positions using base_position + relative_pos. This ensures accurate event position tracking throughout the parsing process.
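The rebasing arithmetic amounts to the following (a minimal illustration; the chunk contents and variable names are made up for the example):

```rust
fn main() {
    let chunks: [&[u8]; 2] = [b"{\"key\":", b" 42}"];
    let mut base_position = 0usize;
    let mut absolute_positions = Vec::new();
    for chunk in chunks {
        // Pretend the tokenizer flagged the first byte of each chunk.
        let relative_pos = 0usize;
        // Rebase the chunk-relative offset onto the document position.
        absolute_positions.push(base_position + relative_pos);
        base_position += chunk.len(); // advance past the consumed chunk
    }
    assert_eq!(absolute_positions, vec![0, 7]);
}
```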

Comment on lines +159 to +198
// Debug test for tracing SliceParser unicode escape processing
#[test]
fn test_slice_parser_unicode_trace() {
// Test the problematic sequence from pass1.json line 45
let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;

let mut buffer = [0u8; 1024];
let mut parser = SliceParser::with_buffer(unicode_sequence, &mut buffer);

match parser.next_event() {
Ok(Event::String(s)) => {
let parsed_content = s.as_ref();
// Verify that unicode escapes are properly decoded
assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
// Verify that other escape sequences are also processed
assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
assert!(parsed_content.contains('"'), "Should contain \" from \\\" escape");
assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
}
Ok(other) => {
panic!("Expected String event, got: {:?}", other);
}
Err(e) => {
panic!(
"SliceParser should handle this sequence, got error: {:?}",
e
);
}
}
}


⚠️ Potential issue

Fix double-escaping in Unicode test string

The test is failing because the Unicode escape sequences are double-escaped. The string contains "\\uCAFE\\uBABE..." which represents literal \uCAFE text, not Unicode escape sequences.

For the test to work correctly, use single backslashes:

-            let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
+            let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;

This will allow the parser to properly decode the Unicode escape sequences into their corresponding characters.
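The distinction can be demonstrated directly: in a Rust raw string no escapes are processed, so `\\u` reaches the JSON parser as backslash-backslash-u, which JSON decodes to a literal backslash followed by plain `uCAFE` text rather than U+CAFE.

```rust
fn main() {
    let doubled = r"\\uCAFE"; // 7 chars: \ \ u C A F E
    let single = r"\uCAFE";   // 6 chars: \ u C A F E
    assert_eq!(doubled.len(), 7);
    assert_eq!(single.len(), 6);
    assert!(doubled.starts_with("\\\\")); // JSON sees an escaped backslash
    assert!(single.starts_with('\\'));    // JSON sees the \u escape prefix
}
```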

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Debug test for tracing SliceParser unicode escape processing
#[test]
fn test_slice_parser_unicode_trace() {
// Test the problematic sequence from pass1.json line 45
let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
let mut buffer = [0u8; 1024];
let mut parser = SliceParser::with_buffer(unicode_sequence, &mut buffer);
match parser.next_event() {
Ok(Event::String(s)) => {
let parsed_content = s.as_ref();
// Verify that unicode escapes are properly decoded
assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
// Verify that other escape sequences are also processed
assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
assert!(parsed_content.contains('"'), "Should contain \" from \\\" escape");
assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
}
Ok(other) => {
panic!("Expected String event, got: {:?}", other);
}
Err(e) => {
panic!(
"SliceParser should handle this sequence, got error: {:?}",
e
);
}
}
}
// Debug test for tracing SliceParser unicode escape processing
#[test]
fn test_slice_parser_unicode_trace() {
// Test the problematic sequence from pass1.json line 45
let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
let mut buffer = [0u8; 1024];
let mut parser = SliceParser::with_buffer(unicode_sequence, &mut buffer);
match parser.next_event() {
Ok(Event::String(s)) => {
let parsed_content = s.as_ref();
// Verify that unicode escapes are properly decoded
assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
// Verify that other escape sequences are also processed
assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
assert!(parsed_content.contains('"'), "Should contain \" from \\\" escape");
assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
}
Ok(other) => {
panic!("Expected String event, got: {:?}", other);
}
Err(e) => {
panic!(
"SliceParser should handle this sequence, got error: {:?}",
e
);
}
}
}
🧰 Tools
🪛 GitHub Actions: Build and test

[error] 172-172: Test failure in 'test_slice_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.

🤖 Prompt for AI Agents
In picojson/tests/json_checker_tests.rs around lines 159 to 198, the test string
for Unicode escape sequences is double-escaped, causing the parser to interpret
them as literal text rather than actual Unicode characters. Fix this by
replacing the double backslashes before the Unicode sequences with single
backslashes, so the parser correctly decodes the Unicode escapes into their
corresponding characters.

Comment on lines +262 to +302
// Debug test for tracing StreamParser unicode escape processing
#[test]
fn test_stream_parser_unicode_trace() {
// Test the problematic sequence from pass1.json line 45
let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;

let reader = ChunkReader::full_slice(unicode_sequence.as_bytes());
let mut buffer = [0u8; 1024];
let mut parser = StreamParser::<_, DefaultConfig>::new(reader, &mut buffer);

match parser.next_event() {
Ok(Event::String(s)) => {
let parsed_content = s.as_ref();
// Verify that unicode escapes are properly decoded
assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
// Verify that other escape sequences are also processed
assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
assert!(parsed_content.contains('"'), "Should contain \" from \\\" escape");
assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
}
Ok(other) => {
panic!("Expected String event, got: {:?}", other);
}
Err(e) => {
panic!(
"StreamParser should handle this sequence, got error: {:?}",
e
);
}
}
}

⚠️ Potential issue

Fix double-escaping in StreamParser Unicode test string

Same issue as the SliceParser test - the Unicode escape sequences are double-escaped, preventing proper decoding.

-            let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
+            let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Debug test for tracing StreamParser unicode escape processing
#[test]
fn test_stream_parser_unicode_trace() {
// Test the problematic sequence from pass1.json line 45
let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
let reader = ChunkReader::full_slice(unicode_sequence.as_bytes());
let mut buffer = [0u8; 1024];
let mut parser = StreamParser::<_, DefaultConfig>::new(reader, &mut buffer);
match parser.next_event() {
Ok(Event::String(s)) => {
let parsed_content = s.as_ref();
// Verify that unicode escapes are properly decoded
assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
// Verify that other escape sequences are also processed
assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
assert!(parsed_content.contains('"'), "Should contain \" from \\\" escape");
assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
}
Ok(other) => {
panic!("Expected String event, got: {:?}", other);
}
Err(e) => {
panic!(
"StreamParser should handle this sequence, got error: {:?}",
e
);
}
}
}
// Debug test for tracing StreamParser unicode escape processing
#[test]
fn test_stream_parser_unicode_trace() {
// Test the problematic sequence from pass1.json line 45
let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
let reader = ChunkReader::full_slice(unicode_sequence.as_bytes());
let mut buffer = [0u8; 1024];
let mut parser = StreamParser::<_, DefaultConfig>::new(reader, &mut buffer);
match parser.next_event() {
Ok(Event::String(s)) => {
let parsed_content = s.as_ref();
// Verify that unicode escapes are properly decoded
assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
// Verify that other escape sequences are also processed
assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
assert!(parsed_content.contains('\"'), "Should contain \" from \\\" escape");
assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
}
Ok(other) => {
panic!("Expected String event, got: {:?}", other);
}
Err(e) => {
panic!(
"StreamParser should handle this sequence, got error: {:?}",
e
);
}
}
}
🧰 Tools
🪛 GitHub Actions: Build and test

[error] 276-276: Test failure in 'test_stream_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.

🤖 Prompt for AI Agents
In picojson/tests/json_checker_tests.rs around lines 262 to 302, the test string
for StreamParser unicode escape sequences is double-escaped, causing improper
decoding. Fix this by removing the extra escaping so that the Unicode escape
sequences are correctly represented as single escapes in the raw string literal,
allowing the parser to decode them properly.

@kaidokert kaidokert closed this Jul 26, 2025
@kaidokert kaidokert deleted the push3 branch August 3, 2025 23:20