Taking shape #67
Conversation
""" WalkthroughA new push-based, event-driven JSON parser ( Changes
Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PushParser
    participant Handler
    User->>PushParser: new(handler, buffer)
    loop For each data chunk
        User->>PushParser: write(data)
        PushParser->>Handler: handle_event(Event)
        Handler-->>PushParser: Result
    end
    User->>PushParser: finish()
    PushParser->>Handler: handle_event(Event::End)
    Handler-->>PushParser: Result
    User->>PushParser: destroy()
    PushParser-->>User: handler
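For orientation, here is a minimal sketch of that lifecycle in code, modeled on the test code later in this PR; the `CountingHandler` type and the example JSON are illustrative, not part of the library.

```rust
use picojson::{DefaultConfig, Event, PushParser, PushParserHandler};

// Illustrative handler: counts events and never reports an error.
struct CountingHandler {
    count: usize,
}

impl<'a, 'b> PushParserHandler<'a, 'b, ()> for CountingHandler {
    fn handle_event(&mut self, _event: Event<'a, 'b>) -> Result<(), ()> {
        self.count += 1;
        Ok(())
    }
}

fn main() {
    let mut buffer = [0u8; 256];
    let handler = CountingHandler { count: 0 };
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    // write() may be called repeatedly with incoming chunks; each call can
    // emit events to the handler.
    parser.write(br#"{"key": "value"}"#).unwrap();

    // finish() flushes any pending events; destroy() returns the handler.
    parser.finish::<()>().unwrap();
    let handler = parser.destroy();
    assert!(handler.count > 0);
}
```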
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~90 minutes
Reviewer's Guide

This PR integrates a new SAX-style PushParser into picojson alongside enhanced error reporting, exports, and a greatly expanded test suite for both PushParser and StreamParser conformance, stress, and edge-case scenarios. It updates error variants, exposes the new push-based API, and adds comprehensive tests covering unicode escapes, buffer boundaries, and copy-on-escape behavior.

Class diagram for new PushParser and related types

classDiagram
    class PushParserHandler {
        <<trait>>
        +handle_event(event: Event) Result<(), E>
    }
    class PushParser {
        -handler: H
        -stream_buffer: StreamBuffer
        -core: ParserCore
        -unicode_escape_collector: UnicodeEscapeCollector
        -state: ParserState
        -escape_state: EscapeState
        -position_offset: usize
        -current_position: usize
        -token_start_pos: usize
        -using_unescaped_buffer: bool
        +new(handler: H, buffer: &mut [u8])
        +write(data: &[u8])
        +finish()
        +destroy() -> H
    }
    class PushParseError {
        Parse(ParseError)
        Handler(E)
    }
    class ParserState {
        Idle
        ParsingString
        ParsingKey
        ParsingNumber
    }
    class EscapeState {
        None
        InEscapeSequence
        InUnicodeEscape
    }
    PushParser o-- PushParserHandler : handler
    PushParser o-- StreamBuffer
    PushParser o-- ParserCore
    PushParser o-- UnicodeEscapeCollector
    PushParser --> ParserState
    PushParser --> EscapeState
    PushParserHandler <|.. PushParser : uses
    PushParser ..> PushParseError : errors
    PushParseError ..> ParseError
    PushParseError ..> Handler : generic
Class diagram for updated ParseError enum

classDiagram
    class ParseError {
        TokenizerError(ujson::Error)
        Utf8(core::str::Utf8Error)
        ScratchBufferFull
        InputBufferFull
        InvalidUtf8(core::str::Utf8Error)
        NumberParseError
        UnexpectedState
    }
    ParseError <|-- PushParseError : used in
Summary of Changes
Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new `PushParser` to the `picojson` library, providing a SAX-style, event-driven JSON parsing mechanism optimized for `no_std` environments. This new parser is designed to handle JSON input incrementally, making it suitable for streaming data, and includes robust handling for string escapes and unicode characters.

Highlights

- New Push-Based JSON Parser: Implemented `PushParser` and `PushParserHandler` to offer a SAX-style, event-driven interface for parsing JSON, specifically designed for `no_std` compatibility and incremental data processing.
- Improved Error Reporting: Expanded the `ParseError` enum with new variants (`Utf8`, `InputBufferFull`) and refined existing ones to provide more precise error information during parsing.
- Optimized String Handling: Introduced a 'copy-on-escape' mechanism within the `PushParser` to efficiently manage string content, allowing for zero-copy `String::Borrowed` where no escapes are present and allocating only when `String::Unescaped` content needs to be processed.
- Extensive Test Coverage: Added a suite of new test files to thoroughly validate the `PushParser`'s correctness, robustness, and performance across various JSON structures, escape sequences, and input chunking patterns.
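To make the copy-on-escape point concrete, here is a sketch of a handler that observes which string variant it receives; `CopyOnEscapeProbe` is illustrative, and the variant names follow the test expectations in this PR.

```rust
use picojson::{Event, PushParserHandler, String as JsonString};

// Illustrative handler: counts how often copy-on-escape had to materialize an
// unescaped copy versus borrowing the input slice directly.
struct CopyOnEscapeProbe {
    borrowed: usize,
    unescaped: usize,
}

impl<'a, 'b> PushParserHandler<'a, 'b, ()> for CopyOnEscapeProbe {
    fn handle_event(&mut self, event: Event<'a, 'b>) -> Result<(), ()> {
        if let Event::String(s) = event {
            match s {
                // No escapes in the string: zero-copy slice of the input.
                JsonString::Borrowed(_) => self.borrowed += 1,
                // Escapes were present: content was unescaped into the buffer.
                JsonString::Unescaped(_) => self.unescaped += 1,
                // Guard against any other variants the enum may carry.
                _ => {}
            }
        }
        Ok(())
    }
}
```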
Hey @kaidokert - I've reviewed your changes - here's some feedback:
- The new push-parser tests are very comprehensive but extremely repetitive—consider factoring common setup and assertions into helper macros or parameterized test functions to improve readability and maintainability.
- Currently handler errors in run_push_parser_test are all mapped to `ParseError::InvalidNumber`, which could obscure the real cause—consider introducing a dedicated error variant or preserving the original handler error message.
- The `push_parser.rs` file is quite large; you might break it into smaller modules (e.g. escape handling, state machine, buffer management) to make the implementation easier to navigate and review.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new push-parser tests are very comprehensive but extremely repetitive—consider factoring common setup and assertions into helper macros or parameterized test functions to improve readability and maintainability.
- Currently handler errors in run_push_parser_test are all mapped to `ParseError::InvalidNumber`, which could obscure the real cause—consider introducing a dedicated error variant or preserving the original handler error message.
- The `push_parser.rs` file is quite large; you might break it into smaller modules (e.g. escape handling, state machine, buffer management) to make the implementation easier to navigate and review.
## Individual Comments
### Comment 1
<location> `picojson/src/push_parser.rs:72` </location>
<code_context>
+ }
+
+ /// Processes a chunk of input data.
+ pub fn write<'input, E>(&mut self, data: &'input [u8]) -> Result<(), PushParseError<E>>
+ where
+ H: for<'a, 'b> PushParserHandler<'a, 'b, E>,
</code_context>
<issue_to_address>
Event storage size is fixed and may not be sufficient for all tokenizer outputs.
Since event_storage is limited to two events, additional events may be lost if the tokenizer emits more. Consider using a Vec or ring buffer, or clearly document the two-event assumption.
</issue_to_address>
### Comment 2
<location> `picojson/src/push_parser.rs:629` </location>
<code_context>
+ }
+}
+
+fn create_tokenizer_callback(
+ event_storage: &mut [Option<(ujson::Event, usize)>; 2],
+ position_offset: usize,
</code_context>
<issue_to_address>
Callback ignores the actual event position from the tokenizer.
The callback uses position_offset instead of the _pos argument from the tokenizer, which may cause incorrect event position tracking. Please use _pos to ensure accurate event positions.
</issue_to_address>
### Comment 3
<location> `picojson/src/push_parser.rs:174` </location>
<code_context>
+ }
+
+ /// Finishes parsing and flushes any remaining events.
+ pub fn finish<E>(&mut self) -> Result<(), PushParseError<E>>
+ where
+ H: for<'a, 'b> PushParserHandler<'a, 'b, E>,
</code_context>
<issue_to_address>
No check for incomplete string or key parsing at finish.
Currently, finish only checks for unfinished numbers. Please add checks to ensure the parser is not left in ParsingString or ParsingKey states to prevent incomplete JSON from being accepted.
</issue_to_address>
### Comment 4
<location> `picojson/tests/push_parser.rs:17` </location>
<code_context>
}
+
+ // Debug test for tracing SliceParser unicode escape processing
+ #[test]
+ fn test_slice_parser_unicode_trace() {
+ // Test the problematic sequence from pass1.json line 45
</code_context>
<issue_to_address>
Missing tests for invalid or malformed Unicode escape sequences.
Please add tests for cases like missing digits, invalid hex characters, or truncated escapes to verify the parser correctly returns errors.
</issue_to_address>
### Comment 5
<location> `picojson/tests/push_parser_copy_on_escape.rs:7` </location>
<code_context>
}
+
+ // Debug test for tracing SliceParser unicode escape processing
+ #[test]
+ fn test_slice_parser_unicode_trace() {
+ // Test the problematic sequence from pass1.json line 45
</code_context>
<issue_to_address>
Missing test for mixed borrowed and unescaped strings in the same object.
Please add a test case where an object includes both borrowed and unescaped strings to verify correct parser behavior in mixed scenarios.
</issue_to_address>
### Comment 6
<location> `picojson/tests/push_parser_escapes.rs:35` </location>
<code_context>
}
+
+ // Debug test for tracing SliceParser unicode escape processing
+ #[test]
+ fn test_slice_parser_unicode_trace() {
+ // Test the problematic sequence from pass1.json line 45
</code_context>
<issue_to_address>
Missing test for invalid escape sequences in keys.
Please add tests for cases like incomplete \u escapes or unsupported escape characters to verify the parser correctly returns errors.
</issue_to_address>
fn create_tokenizer_callback(
issue (bug_risk): Callback ignores the actual event position from the tokenizer.
The callback uses position_offset instead of the _pos argument from the tokenizer, which may cause incorrect event position tracking. Please use _pos to ensure accurate event positions.
/// Finishes parsing and flushes any remaining events.
pub fn finish<E>(&mut self) -> Result<(), PushParseError<E>>
issue (bug_risk): No check for incomplete string or key parsing at finish.
Currently, finish only checks for unfinished numbers. Please add checks to ensure the parser is not left in ParsingString or ParsingKey states to prevent incomplete JSON from being accepted.
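A standalone sketch of the check being requested; the enum below is a local stand-in mirroring the `ParserState` variants from the class diagram earlier in this thread, not the crate's actual type, and the real `finish()` would consult the parser's own state field.

```rust
// Local stand-in for the parser's state; the real PushParser keeps this as a
// private field and would perform the equivalent check inside finish().
#[derive(Debug)]
enum ParserState {
    Idle,
    ParsingString,
    ParsingKey,
    ParsingNumber,
}

fn check_state_at_finish(state: &ParserState) -> Result<(), &'static str> {
    match state {
        // An unterminated string or key means the document is incomplete.
        ParserState::ParsingString | ParserState::ParsingKey => {
            Err("incomplete string or key at finish")
        }
        // Unfinished numbers are already flushed by the existing finish() logic.
        ParserState::ParsingNumber | ParserState::Idle => Ok(()),
    }
}

fn main() {
    assert!(check_state_at_finish(&ParserState::Idle).is_ok());
    assert!(check_state_at_finish(&ParserState::ParsingString).is_err());
    assert!(check_state_at_finish(&ParserState::ParsingKey).is_err());
}
```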
#[test]
fn test_clean_push_parser_compiles() {
    let mut buffer = [0u8; 256];
    let handler = SimpleHandler;
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    // This should compile without lifetime issues using HRTB + tokenizer + event array
    parser.write(b"true").unwrap(); // Valid JSON
    parser.finish::<()>().unwrap();
    let _handler = parser.destroy();
issue (testing): Missing tests for invalid or malformed Unicode escape sequences.
Please add tests for cases like missing digits, invalid hex characters, or truncated escapes to verify the parser correctly returns errors.
#[test]
fn test_borrowed_vs_unescaped_simple() {
suggestion (testing): Missing test for mixed borrowed and unescaped strings in the same object.
Please add a test case where an object includes both borrowed and unescaped strings to verify correct parser behavior in mixed scenarios.
#[test]
fn test_string_with_actual_escapes() {
    // Test that escape sequences in strings are properly processed
    let json_string = r#"{"message": "Hello\nWorld\t!"}"#;
    let json = json_string.as_bytes();

    let handler = EventCollector::new();
    let mut buffer = [0u8; 256];
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);
issue (testing): Missing test for invalid escape sequences in keys.
Please add tests for cases like incomplete \u escapes or unsupported escape characters to verify the parser correctly returns errors.
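A sketch of the kind of negative test being requested, reusing the `PushParser::<_, DefaultConfig>` setup from the existing tests; `SimpleHandler` is the no-op handler already used elsewhere in this test suite, and the exact error variant is deliberately left unchecked since it is an implementation detail.

```rust
#[test]
fn test_invalid_unicode_escape_in_key_is_rejected() {
    let mut buffer = [0u8; 64];
    let handler = SimpleHandler;
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    // "\u00ZZ" contains invalid hex digits; either write() or finish() should fail.
    let write_failed = parser.write(br#"{"bad\u00ZZ": "v"}"#).is_err();
    let failed = write_failed || parser.finish::<()>().is_err();
    assert!(
        failed,
        "malformed \\u escape in a key should be reported as an error"
    );
}
```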
Code Review

This pull request introduces a new `PushParser`, a SAX-style JSON parser, along with extensive tests. The implementation is comprehensive but has a few critical issues. The most significant is an inefficient byte-by-byte processing loop in the `write` method, which should be refactored to handle chunks for better performance. Additionally, several error conditions, particularly in Unicode escape handling, are silently ignored, which could lead to incorrect parsing or hide errors in the input. The new tests are a great addition, but many of them print results instead of using assertions, and some test helpers use patterns that could make debugging difficult. My review focuses on improving correctness, performance, and test quality.
for (local_pos, &byte) in data.iter().enumerate() {
    // Update current position to absolute position
    self.current_position = self.position_offset + local_pos;
    let mut append_byte = true;

    // Tokenizer generates events that drive state transitions.
    {
        let mut callback =
            create_tokenizer_callback(&mut event_storage, self.current_position);
        self.core.tokenizer.parse_chunk(&[byte], &mut callback)?;
    }
The `write` method processes the input `data` slice byte-by-byte, calling `self.core.tokenizer.parse_chunk(&[byte], ...)` in a loop. This is highly inefficient, as `parse_chunk` is optimized for larger slices. This implementation will have significant performance overhead due to repeated function calls and loop iterations for every single byte of input.

The method should be refactored to process the entire `data` chunk at once. This will require adjusting the event handling logic to work with offsets within the chunk rather than iterating byte by byte.
picojson/src/push_parser.rs (Outdated)
Err(_e) => {}
The `Result` returned by `self.unicode_escape_collector.add_hex_digit(byte)` is matched, but the `Err` case is ignored. If an invalid hexadecimal digit is encountered within a `\uXXXX` escape sequence, the parser will silently ignore it and continue, potentially producing incorrect output or masking an error in the input JSON. This should be treated as a parsing error.

Suggested change:

Err(e) => return Err(e.into()),
picojson/src/push_parser.rs (Outdated)
Err(_e) => {}
The `Result` from `self.unicode_escape_collector.process_to_utf8(...)` is ignored in the `Err` case. If the collected hex digits form an invalid Unicode scalar value (e.g., a lone surrogate), the error will be silently dropped. This should be propagated as a `ParseError` to ensure correctness.

Suggested change:

Err(e) => return Err(e.into()),
picojson/src/push_parser.rs (Outdated)
Err(_e) => {}
The `Result` returned by `self.unicode_escape_collector.add_hex_digit(byte)` is matched, but the `Err` case is ignored. If an invalid hexadecimal digit is encountered within a `\uXXXX` escape sequence, the parser will silently ignore it and continue, potentially producing incorrect output or masking an error in the input JSON. This should be treated as a parsing error.

Suggested change:

Err(e) => return Err(e.into()),
picojson/src/push_parser.rs (Outdated)
Err(_e) => {}
picojson/tests/push_parser.rs (Outdated)
fn test_unicode_escapes_not_yet_implemented() {
    // Debug handler that captures strings and keys to test Unicode escape processing
    struct UnicodeEscapeTestHandler {
        events: Vec<std::string::String>,
    }

    impl<'a, 'b> PushParserHandler<'a, 'b, ()> for UnicodeEscapeTestHandler {
        fn handle_event(&mut self, event: Event<'a, 'b>) -> Result<(), ()> {
            let event_desc = match event {
                Event::StartObject => "StartObject".to_string(),
                Event::EndObject => "EndObject".to_string(),
                Event::Key(k) => format!("Key({})", k.as_ref()),
                Event::String(s) => format!("String({})", s.as_ref()),
                Event::EndDocument => "EndDocument".to_string(),
                _ => "Other".to_string(),
            };
            self.events.push(event_desc);
            Ok(())
        }
    }

    let mut buffer = [0u8; 256];
    let handler = UnicodeEscapeTestHandler { events: Vec::new() };
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    // Test string with Unicode escape sequence (\u0041 should become 'A')
    parser.write(br#"{"key": "\u0041"}"#).unwrap();
    parser.finish::<()>().unwrap();
    let handler = parser.destroy();

    println!("Unicode escape events: {:?}", handler.events);

    // This test verifies that Unicode escapes are working
}
fn test_slice_parser_comparison() {
    // Test the same JSON with SliceParser to see how it handles escapes
    let json_string = r#"{"message": "Hello\nWorld\t!"}"#;
    let mut scratch = [0u8; 256];
    let mut parser = picojson::SliceParser::with_buffer(json_string, &mut scratch);

    println!("SliceParser results:");
    while let Ok(event) = parser.next_event() {
        match event {
            picojson::Event::EndDocument => break,
            _ => println!("  {:?}", event),
        }
    }
fn test_debug_unicode_processing() {
    use picojson::{Event, PullParser, SliceParser};

    // Debug the exact unicode processing for \u0041 -> A
    let json = r#"{"k":"\u0041"}"#;
    println!("Testing unicode: {} (should produce 'A')", json);

    // First test with SliceParser (known working)
    println!("\n=== SliceParser (working reference) ===");
    let mut buffer = [0u8; 64];
    let mut parser = SliceParser::with_buffer(json, &mut buffer);

    loop {
        match parser.next_event() {
            Ok(Event::EndDocument) => break,
            Ok(event) => println!("SliceParser event: {:?}", event),
            Err(e) => {
                println!("SliceParser error: {:?}", e);
                break;
            }
        }
    }

    // Now test with PushParser (broken)
    println!("\n=== PushParser (broken) ===");
    let mut buffer = [0u8; 64];
    let handler = ReproHandler::new();
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    let result = parser.write(json.as_bytes());
    match result {
        Ok(()) => {
            let finish_result = parser.finish();
            match finish_result {
                Ok(()) => {
                    let handler = parser.destroy();
                    println!("PushParser events: {:?}", handler.events);

                    // Check if we got the expected "A" character
                    let expected_char = "A";
                    let found_char = handler
                        .events
                        .iter()
                        .find(|event| event.starts_with("String("))
                        .map(|event| &event[7..event.len() - 1]); // Extract content between String( and )

                    if let Some(actual) = found_char {
                        if actual == expected_char {
                            println!(
                                "✅ Unicode correctly processed: {} -> {}",
                                "\\u0041", actual
                            );
                        } else {
                            println!(
                                "❌ Unicode incorrectly processed: {} -> {} (expected {})",
                                "\\u0041", actual, expected_char
                            );
                        }
                    }
                }
                Err(e) => {
                    println!("❌ FINISH ERROR: {:?}", e);
                }
            }
        }
        Err(e) => {
            println!("❌ WRITE ERROR: {:?}", e);
        }
    }
}

#[test]
fn test_reproduce_invalidslicebounds_minimal_case() {
    // Try the most minimal case that still has unescaped content
    let test_cases = vec![
        (r#"{"k":""}"#, "empty string"),
        (r#"{"k":"a"}"#, "single char"),
        (r#"{"k":"ab"}"#, "two chars"),
        (r#"{"k":"\n"}"#, "simple escape"),
        (r#"{"k":"\u0041"}"#, "single unicode"),
        (r#"{"k":"\u0123"}"#, "single unicode hex"),
        (r#"{"k":"\u0123\u4567"}"#, "double unicode"),
    ];

    for (json, description) in test_cases {
        println!("Testing: {} - {}", description, json);
        let json_bytes = json.as_bytes();

        // Use a small buffer to trigger the issue
        let mut buffer = [0u8; 32];
        let handler = ReproHandler::new();
        let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

        let result = parser.write(json_bytes);
        match result {
            Ok(()) => {
                let finish_result = parser.finish();
                match finish_result {
                    Ok(()) => {
                        let handler = parser.destroy();
                        println!(
                            "✅ SUCCESS: {} - Events = {:?}",
                            description, handler.events
                        );
                    }
                    Err(e) => {
                        println!("❌ FINISH ERROR: {} - {:?}", description, e);
                    }
                }
            }
            Err(e) => {
                println!("❌ WRITE ERROR: {} - {:?}", description, e);
                return; // Stop at first error to identify the minimal trigger
            }
        }
        println!();
    }
}

#[test]
fn test_reproduce_invalidslicebounds_progressive_size() {
    // Test with progressively smaller buffer sizes to find the exact trigger point
    let json_content = br#"{"key": "\u0123\u4567"}"#;

    for buffer_size in (8..=64).rev() {
        println!("Trying buffer size: {}", buffer_size);
        let mut buffer = vec![0u8; buffer_size];
        let handler = ReproHandler::new();
        let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

        let result = parser.write(json_content);
        match result {
            Ok(()) => {
                let finish_result = parser.finish();
                match finish_result {
                    Ok(()) => {
                        let handler = parser.destroy();
                        println!(
                            "SUCCESS at buffer size {}: Events = {:?}",
                            buffer_size, handler.events
                        );
                    }
                    Err(e) => {
                        println!("FINISH ERROR at buffer size {}: {:?}", buffer_size, e);
                        break;
                    }
                }
            }
            Err(e) => {
                println!("WRITE ERROR at buffer size {}: {:?}", buffer_size, e);
                break;
            }
        }
    }
}
Several tests in this file (`test_debug_unicode_processing`, `test_reproduce_invalidslicebounds_minimal_case`, `test_reproduce_invalidslicebounds_progressive_size`) print their results to the console instead of using assertions. While useful for debugging, tests committed to the repository should programmatically verify behavior using `assert!`, `assert_eq!`, or `panic!` on failure. This ensures they can be run automatically in CI. Please convert these `println!`-based checks into assertions.
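As a concrete illustration, the final check in `test_debug_unicode_processing` above could be expressed as an assertion along these lines; this is a sketch against the handler's string-formatted events (`handler` is the `ReproHandler` recovered via `parser.destroy()` in that test), so a failure surfaces in CI instead of only in the log.

```rust
// Sketch: replace the println!-based success/failure branches with an assertion.
let found_char = handler
    .events
    .iter()
    .find(|event| event.starts_with("String("))
    .map(|event| &event[7..event.len() - 1]); // Extract content between String( and )

assert_eq!(
    found_char,
    Some("A"),
    "expected \\u0041 to decode to 'A'; events were: {:?}",
    handler.events
);
```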
Event::Key(k) => Event::Key(picojson::String::Borrowed(k.as_ref().to_string().leak())),
Event::String(s) => {
    Event::String(picojson::String::Borrowed(s.as_ref().to_string().leak()))
}
The `StressTestHandler` uses `to_string().leak()` to give the event data a `'static` lifetime. While this is a common and often acceptable pattern for simplifying test code, it does leak memory. This could become an issue if the test suite grows very large or is run under memory-constrained conditions (like with `valgrind`).

A comment explaining why `leak()` is used here would be helpful for future maintainers and to prevent this pattern from being accidentally copied into production code.
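Alternatively, where comparing Debug-formatted events is acceptable for the assertion, the leak can be avoided by storing owned copies, the same pattern other test handlers in this PR already use; `OwnedEventHandler` below is illustrative, not part of the test suite.

```rust
use picojson::{Event, PushParserHandler};

// Illustrative alternative to leak(): keep owned String copies of each event,
// so nothing has to outlive the write() call and no memory is leaked.
struct OwnedEventHandler {
    events: Vec<std::string::String>,
}

impl<'a, 'b> PushParserHandler<'a, 'b, ()> for OwnedEventHandler {
    fn handle_event(&mut self, event: Event<'a, 'b>) -> Result<(), ()> {
        // Debug-formatting copies the borrowed data into an owned String.
        self.events.push(format!("{:?}", event));
        Ok(())
    }
}
```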
fn test_push_parser_stress_chunk_patterns() {
    println!("=== PushParser Chunk Pattern Stress Test ===");
    println!(
        "NOTE: Multi-call scenarios may fail - Phase 4 (numbers, streaming, robustness) is pending"
    );
    let scenarios = get_push_parser_test_scenarios();

    // Test patterns: Large chunks should work, tiny chunks may fail due to Phase 4 limitations
    let chunk_patterns: &[&[usize]] = &[
        &[50], // Large chunks - should work with copy-on-escape
        &[10], // Medium chunks - may work
        // NOTE: Byte-by-byte patterns disabled due to Phase 4 limitations
        // &[1],           // Byte-by-byte - Phase 4 not complete
        // &[2],           // Two bytes at a time - Phase 4 not complete
        // &[3, 1, 2],     // Variable small chunks - Phase 4 not complete
        // &[1, 5, 1],     // Mixed tiny and small - Phase 4 not complete
        // &[7, 1, 1, 10], // Irregular pattern - Phase 4 not complete
    ];

    for scenario in &scenarios {
        println!("--- Testing Scenario: {} ---", scenario.name);
        let buffer_size = scenario.min_buffer_size + 10; // Adequate buffer

        for &pattern in chunk_patterns {
            let result = test_push_parsing_with_config(scenario, buffer_size, pattern);

            match result {
                Ok(()) => {
                    println!("✅ [P={:?}] SUCCESS", pattern);
                }
                Err(e) => {
                    // For now, just log failures instead of panicking
                    // TODO: Remove this when Phase 4 is complete
                    println!(
                        "⚠️ [P={:?}] EXPECTED FAILURE for scenario '{}' - {} (Phase 4 pending)",
                        pattern, scenario.name, e
                    );
                }
            }
        }
    }
}

#[test]
fn test_push_parser_stress_critical_matrix() {
    println!("=== PushParser Critical Size Matrix Test ===");
    println!(
        "NOTE: Multi-call chunking may fail - Phase 4 (numbers, streaming, robustness) is pending"
    );
    let scenarios = get_push_parser_test_scenarios();

    for scenario in &scenarios {
        println!("--- Testing Scenario: {} ---", scenario.name);
        let critical_buffer_sizes: Vec<usize> =
            (scenario.min_buffer_size.saturating_sub(2)..=scenario.min_buffer_size + 5).collect();
        // Use only larger chunk patterns that are more likely to work
        let chunk_patterns: &[&[usize]] = &[&[50]]; // Only large chunks for now

        for &buffer_size in &critical_buffer_sizes {
            for &pattern in chunk_patterns {
                let result = test_push_parsing_with_config(scenario, buffer_size, pattern);
                let expected_success =
                    should_succeed_push_parser(buffer_size, scenario.min_buffer_size);

                match (result.is_ok(), expected_success) {
                    (true, true) => {
                        println!("✅ [B={}, P={:?}] SUCCESS (expected)", buffer_size, pattern);
                    }
                    (false, false) => {
                        println!("✅ [B={}, P={:?}] FAIL (expected)", buffer_size, pattern);
                    }
                    (true, false) => {
                        // With copy-on-escape optimization, we might succeed with smaller buffers
                        println!("✅ [B={}, P={:?}] Unexpected SUCCESS - copy-on-escape working better than expected", buffer_size, pattern);
                    }
                    (false, true) => {
                        // For now, log instead of panic for Phase 4 limitations
                        println!("⚠️ [B={}, P={:?}] EXPECTED FAILURE for scenario '{}' - {} (Phase 4 pending)",
                            buffer_size, pattern, scenario.name, result.unwrap_err());
                    }
                }
            }
        }
    }
}

#[test]
fn test_push_parser_stress_unicode_edge_cases() {
    println!("=== PushParser Unicode Edge Cases Stress Test ===");

    let unicode_scenarios = vec![
        TestScenario {
            name: "Consecutive Unicode",
            json: br#"["\u0123\u4567\u89AB\uCDEF"]"#,
            expected_events: vec![
                Event::StartArray,
                Event::String(picojson::String::Unescaped("ģ䕧覫췯")), // Consecutive unicode
                Event::EndArray,
                Event::EndDocument,
            ],
            min_buffer_size: 25, // Unicode processing buffer for consecutive escapes
        },
        TestScenario {
            name: "Unicode at Chunk Boundary",
            json: br#"["\u0041XYZ"]"#,
            expected_events: vec![
                Event::StartArray,
                Event::String(picojson::String::Unescaped("AXYZ")), // \u0041 = A
                Event::EndArray,
                Event::EndDocument,
            ],
            min_buffer_size: 15, // Unicode + normal text processing
        },
        TestScenario {
            name: "Empty Key with Unicode Value",
            json: br#"{"": "\u2603"}"#, // Snowman character
            expected_events: vec![
                Event::StartObject,
                Event::Key("".into()),
                Event::String(picojson::String::Unescaped("☃")), // \u2603 = ☃
                Event::EndObject,
                Event::EndDocument,
            ],
            min_buffer_size: 12, // Empty key + unicode value processing
        },
    ];

    for scenario in &unicode_scenarios {
        println!("--- Testing Unicode Scenario: {} ---", scenario.name);

        // Test specifically challenging chunk patterns for unicode
        let unicode_chunk_patterns: &[&[usize]] = &[
            &[1],       // Byte-by-byte (challenges unicode boundaries)
            &[6, 1],    // Split unicode escapes
            &[3, 2, 1], // Irregular splits
        ];

        let buffer_size = scenario.min_buffer_size + 5;

        for &pattern in unicode_chunk_patterns {
            let result = test_push_parsing_with_config(scenario, buffer_size, pattern);

            match result {
                Ok(()) => {
                    println!("✅ [P={:?}] Unicode SUCCESS", pattern);
                }
                Err(e) => {
                    panic!(
                        "❌ [P={:?}] Unicode FAILURE for scenario '{}' - {}",
                        pattern, scenario.name, e
                    );
                }
            }
        }
    }
}

#[test]
fn test_push_parser_stress_document_validation() {
    println!("=== PushParser Document Validation Stress Test ===");

    // Test incomplete documents that should fail
    let invalid_scenarios: Vec<(&str, &[u8], &str)> = vec![
        ("Unclosed Array", b"[\"hello\"", "array not closed"),
        (
            "Unclosed Object",
            b"{\"key\": \"value\"",
            "object not closed",
        ),
        ("Extra Comma", b"{\"key\": \"value\",}", "trailing comma"),
        ("Missing Value", b"{\"key\":}", "missing value"),
    ];

    for (name, json, _description) in &invalid_scenarios {
        println!("--- Testing Invalid: {} ---", name);

        let buffer_size = 50; // Adequate buffer
        let chunk_patterns: &[&[usize]] = &[&[1], &[3], &[10]];

        for &pattern in chunk_patterns {
            let mut buffer = vec![0u8; buffer_size];
            let handler = StressTestHandler::new(&[]);
            let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);
            let mut writer = ChunkedWriter::new(json, pattern);

            let result = writer.write_to_parser(&mut parser);

            if result.is_ok() {
                panic!(
                    "❌ [P={:?}] Expected FAILURE for '{}', but got SUCCESS",
                    pattern, name
                );
            } else {
                println!("✅ [P={:?}] Correctly FAILED for '{}'", pattern, name);
            }
        }
    }
}
The stress tests in this file have inconsistent failure handling. Some tests `panic!` on failure (e.g., `test_push_parser_stress_unicode_edge_cases`), while others print a ⚠️ warning and continue (e.g., `test_push_parser_stress_chunk_patterns`). For CI integration and clarity, all tests should fail explicitly on unexpected results, usually by panicking. Please unify the behavior to `panic!` on unexpected failures. The comments about 'Phase 4 pending' can be kept to explain why a test might be expected to fail for now.
Actionable comments posted: 4
🧹 Nitpick comments (8)
picojson/tests/push_parser_escapes.rs (1)

89-103: Consider adding assertions to the comparison test.

While the debug output is useful for manual verification, adding assertions to compare the PushParser and SliceParser outputs would make this test more robust and catch regressions automatically.

 #[test]
 fn test_slice_parser_comparison() {
     // Test the same JSON with SliceParser to see how it handles escapes
     let json_string = r#"{"message": "Hello\nWorld\t!"}"#;
     let mut scratch = [0u8; 256];
     let mut parser = picojson::SliceParser::with_buffer(json_string, &mut scratch);
+    let mut slice_events = Vec::new();
     println!("SliceParser results:");
     while let Ok(event) = parser.next_event() {
         match event {
             picojson::Event::EndDocument => break,
-            _ => println!("  {:?}", event),
+            event => {
+                println!("  {:?}", event);
+                slice_events.push(format!("{:?}", event));
+            }
         }
     }
+
+    // Could add assertions here to compare with PushParser output
+    assert!(!slice_events.is_empty(), "SliceParser should produce events");
 }

picojson/tests/json_checker_tests.rs (1)

89-105: Use underscore prefix for intentionally unused variable

 loop {
     match parser.next_event() {
         Ok(Event::EndDocument) => break,
-        Ok(event) => {
+        Ok(_event) => {
             event_count += 1;
         }
         Err(e) => return Err(e),
     }
 }

picojson/tests/push_parser.rs (4)

344-412: Debug tests could validate parsed content

These debug tests only verify that parsing completes without errors but don't validate the actual parsed content. Consider capturing and asserting on the events for more thorough testing.

485-492: Remove or implement the incomplete logging code

The event counting handler has an incomplete logging implementation.

 fn handle_event(&mut self, _event: Event<'input, 'scratch>) -> Result<(), ()> {
     self.event_count += 1;
-    if self.event_count % 10 == 0 {
-        // Log every 10 events
-    }
     Ok(())
 }

586-619: Remove duplicate unicode escape test

This test duplicates the functionality of test_unicode_escapes (lines 250-294) with a confusing name suggesting it's not implemented. Consider removing this duplicate test or clearly differentiating its purpose from the earlier unicode escape test.

677-679: Remove redundant imports

These imports are already present at the module level.

 #[test]
 fn test_single_slash_escape() {
-    use picojson::{DefaultConfig, Event, PushParser, PushParserHandler};
-
     struct Handler {

picojson/tests/push_parser_invalidslicebounds_repro.rs (1)

205-210: Improve event content extraction

The manual string extraction from event description is fragile and could break if the format changes.

-    let found_char = handler
-        .events
-        .iter()
-        .find(|event| event.starts_with("String("))
-        .map(|event| &event[7..event.len() - 1]); // Extract content between String( and )
+    let found_char = handler
+        .events
+        .iter()
+        .find_map(|event| {
+            if let Some(content) = event.strip_prefix("String(") {
+                content.strip_suffix(")")
+            } else {
+                None
+            }
+        });

picojson/tests/push_parser_stress_test.rs (1)

369-378: Re-enable chunk pattern tests when Phase 4 is complete

Most chunk patterns are disabled due to Phase 4 limitations, reducing test coverage. These should be re-enabled when the parser robustness improvements are complete.

Would you like me to create an issue to track the Phase 4 parser robustness improvements and the re-enabling of these tests?
📒 Files selected for processing (10)
- picojson/src/lib.rs (1 hunks)
- picojson/src/parse_error.rs (2 hunks)
- picojson/src/push_parser.rs (1 hunks)
- picojson/src/stream_parser.rs (1 hunks)
- picojson/tests/json_checker_tests.rs (5 hunks)
- picojson/tests/push_parser.rs (1 hunks)
- picojson/tests/push_parser_copy_on_escape.rs (1 hunks)
- picojson/tests/push_parser_escapes.rs (1 hunks)
- picojson/tests/push_parser_invalidslicebounds_repro.rs (1 hunks)
- picojson/tests/push_parser_stress_test.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/src/stream_parser.rs (4)
Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson;
alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
picojson/src/lib.rs (6)
Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson;
alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: #1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via use
statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the crate::
path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
picojson/src/parse_error.rs (3)
Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson;
alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
picojson/tests/push_parser_escapes.rs (7)
Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson;
alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/tests/push_parser_copy_on_escape.rs (7)
Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson;
alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
🔇 Additional comments (26)
picojson/src/stream_parser.rs (1)
759-759: LGTM! Appropriate Clippy lint suppression. The `#[allow(clippy::approx_constant)]` attribute correctly suppresses the warning for this test case. The value 3.14 here represents a literal JSON number being parsed, not a mathematical approximation of π, so using `core::f64::consts::PI` would be semantically incorrect.

picojson/src/lib.rs (1)

101-102: LGTM! Clean API extension for the new push parser. The module declaration and public re-exports follow consistent patterns with the existing codebase. The new `PushParser`, `PushParserHandler`, and `PushParseError` types provide a complete push-based parsing API that complements the existing pull parsers.
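To make the push model concrete, here is a minimal, self-contained sketch of the handler/callback shape the new API follows. All names below (`JsonEvent`, `EventSink`, `Counter`, `feed`) are illustrative stand-ins rather than the crate's actual definitions; the real trait is `PushParserHandler` and the real driver is `PushParser::write`/`finish`.

```rust
// Illustrative stand-ins only; the real API is `PushParser`/`PushParserHandler`.
#[derive(Debug)]
enum JsonEvent<'a> {
    Key(&'a str),
    Value(&'a str),
    End,
}

trait EventSink {
    type Error;
    fn handle_event(&mut self, event: JsonEvent<'_>) -> Result<(), Self::Error>;
}

struct Counter {
    events: usize,
}

impl EventSink for Counter {
    type Error = ();
    fn handle_event(&mut self, _event: JsonEvent<'_>) -> Result<(), Self::Error> {
        self.events += 1; // count every event pushed by the parser
        Ok(())
    }
}

// A stand-in "parser" that pushes events into the handler as data arrives.
fn feed<S: EventSink>(sink: &mut S, key: &str, value: &str) -> Result<(), S::Error> {
    sink.handle_event(JsonEvent::Key(key))?;
    sink.handle_event(JsonEvent::Value(value))?;
    sink.handle_event(JsonEvent::End)
}

fn main() {
    let mut counter = Counter { events: 0 };
    feed(&mut counter, "name", "picojson").unwrap();
    assert_eq!(counter.events, 3);
}
```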
picojson/src/parse_error.rs (3)

14-15: LGTM! Useful UTF-8 error variant addition. The new `Utf8(core::str::Utf8Error)` variant provides detailed UTF-8 validation error information, which is particularly valuable for incremental parsing scenarios where UTF-8 validation may occur across buffer boundaries.
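For background, the distinction that makes this variant useful is already encoded in `core::str::Utf8Error`: `error_len()` returns `None` when a chunk merely ends mid-character and `Some(_)` when the bytes are genuinely invalid. A self-contained sketch of that check (not the crate's code):

```rust
use core::str;

/// Classify a chunk that may end in the middle of a multi-byte character.
fn classify_chunk(chunk: &[u8]) -> &'static str {
    match str::from_utf8(chunk) {
        Ok(_) => "complete, valid UTF-8",
        // `error_len() == None` means the error is only due to truncation:
        // more bytes could still complete the character.
        Err(e) if e.error_len().is_none() => "valid so far, but ends mid-character",
        Err(_) => "contains invalid UTF-8",
    }
}

fn main() {
    let snowman = "☃".as_bytes(); // 3-byte UTF-8 sequence: E2 98 83
    assert_eq!(classify_chunk(snowman), "complete, valid UTF-8");
    assert_eq!(classify_chunk(&snowman[..2]), "valid so far, but ends mid-character");
    assert_eq!(classify_chunk(&[0xFF]), "contains invalid UTF-8");
}
```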
18-19: LGTM! Complementary buffer error variant. The `InputBufferFull` variant nicely complements the existing `ScratchBufferFull`, providing clear distinction between different types of buffer capacity issues in the parsing pipeline.

80-84: LGTM! Clean error conversion implementation. The `From<ujson::Error>` implementation provides a clean conversion path from the tokenizer layer, following standard Rust error handling patterns and enabling seamless error propagation.
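For readers unfamiliar with the pattern, this is the usual shape of such a conversion; the types below are stand-ins rather than the crate's `ujson::Error`/`ParseError`, so treat it as a sketch of the idiom, not the actual implementation:

```rust
// Stand-in types; the real crate converts `ujson::Error` into a `ParseError` variant.
#[derive(Debug)]
struct LowLevelError(&'static str);

#[derive(Debug)]
enum HighLevelError {
    Tokenizer(LowLevelError),
}

impl From<LowLevelError> for HighLevelError {
    fn from(e: LowLevelError) -> Self {
        HighLevelError::Tokenizer(e)
    }
}

// With the `From` impl in place, `?` converts the error automatically.
fn tokenize() -> Result<(), LowLevelError> {
    Err(LowLevelError("unexpected byte"))
}

fn parse() -> Result<(), HighLevelError> {
    tokenize()?; // LowLevelError -> HighLevelError via From
    Ok(())
}

fn main() {
    match parse() {
        Err(HighLevelError::Tokenizer(LowLevelError(msg))) => assert_eq!(msg, "unexpected byte"),
        other => panic!("expected a tokenizer error, got {:?}", other),
    }
}
```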
picojson/tests/push_parser_copy_on_escape.rs (3)

1-53: LGTM! Well-structured test for zero-copy optimization. This test effectively verifies that the push parser correctly performs zero-copy string borrowing when no escape sequences are present. The handler implementation properly checks the `String` enum variants, and the assertions clearly validate the expected optimization behavior.

55-101: LGTM! Comprehensive escape sequence handling test. This test properly validates that strings containing escape sequences are correctly identified as requiring unescaping and copying. The test covers both keys and values, ensuring consistent behavior across different string contexts.

103-170: LGTM! Important buffer isolation verification. This test addresses a critical concern in incremental parsing - ensuring that buffer content doesn't accumulate or leak between different string events. The byte-level content verification provides strong assurance that the parser maintains proper buffer boundaries.
picojson/tests/push_parser_escapes.rs (3)
5-33: LGTM! Clean event collection handler implementation. The `EventCollector` provides a clear way to capture and verify parser events. The handler implementation properly converts events to readable debug strings, making test failures easy to diagnose.

35-60: LGTM! Comprehensive multi-escape sequence test. This test effectively validates that multiple escape sequences (\n, \t) within a single string are correctly processed and converted to their actual character representations.

105-158: LGTM! Thorough key escape sequence testing. These tests properly validate that escape sequences work correctly in JSON keys, not just values. Testing both newline and quote escapes in keys ensures comprehensive coverage of the escape processing logic.
picojson/tests/json_checker_tests.rs (3)
17-20: LGTM! The imports are well-organized and all imported types are utilized in the implementation.

324-349: LGTM! The macro generates comprehensive failure tests for PushParser, maintaining consistency with SliceParser tests.

375-394: LGTM! The known deviation tests ensure PushParser aligns with SliceParser behavior for modern JSON standards (RFC 7159) and deep nesting support.
picojson/tests/push_parser.rs (1)
9-15: LGTM! Simple handler implementation appropriate for basic compilation tests.
picojson/tests/push_parser_invalidslicebounds_repro.rs (2)
9-38: LGTM! Well-structured handler for diagnostic purposes that properly handles event lifetime by converting to owned strings.

40-162: LGTM! Comprehensive test scenarios for reproducing buffer boundary issues with clear error reporting.
picojson/src/push_parser.rs (7)
1-11: LGTM! Clear module documentation and appropriate imports.

12-27: LGTM! Well-defined state enums that clearly represent the parser's state machine.

98-156: LGTM! Well-structured escape sequence handling with proper UTF-8 conversion for Unicode escapes.

348-375: LGTM! Robust number parsing that handles various termination scenarios and validates slice bounds.

489-543: LGTM! Sophisticated Unicode escape handling that correctly processes the final hex digit and manages buffer positions to exclude the escape sequence from the output.

596-627: LGTM! Well-designed error handling with clear separation between parser and handler errors, and comprehensive error type conversions.

629-647: Intentional fixed-size event storage for embedded environments. This implementation uses a fixed-size array of 2 events and silently drops events when full, which aligns with the design for embedded/constrained environments requiring zero-allocation guarantee and deterministic memory usage.
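As an illustration of that design, here is a self-contained sketch of a fixed-capacity, drop-on-overflow event store in the spirit described above; it is not the parser's actual storage, and the names are hypothetical:

```rust
/// A two-slot event buffer that never allocates and silently drops
/// events once both slots are occupied, trading completeness for a
/// fixed memory footprint.
struct EventSlots<T> {
    slots: [Option<T>; 2],
    len: usize,
}

impl<T> EventSlots<T> {
    fn new() -> Self {
        EventSlots { slots: [None, None], len: 0 }
    }

    /// Store an event if a slot is free; otherwise drop it.
    fn push(&mut self, event: T) {
        if self.len < self.slots.len() {
            self.slots[self.len] = Some(event);
            self.len += 1;
        }
        // else: deliberately dropped to keep memory usage deterministic
    }

    /// Drain stored events in insertion order.
    fn drain(&mut self) -> impl Iterator<Item = T> + '_ {
        self.len = 0;
        self.slots.iter_mut().filter_map(Option::take)
    }
}

fn main() {
    let mut slots = EventSlots::new();
    slots.push("ObjectStart");
    slots.push("Key");
    slots.push("Value"); // third event is dropped: both slots are full
    assert_eq!(slots.drain().count(), 2);
}
```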
picojson/tests/push_parser_stress_test.rs (2)
228-237: Unicode test expects incorrect output. The test expects raw Unicode escape sequences ("\\u0041\\u0042\\u0043") instead of the parsed result ("ABC"). The comment suggests Unicode processing still has issues. This should be tracked as a known issue or the test should be updated when Unicode processing is fixed.

518-558: LGTM! Comprehensive validation tests for invalid JSON documents with good coverage of common syntax errors.
```rust
fn run_push_parser_test(json_content: &str) -> Result<usize, ParseError> {
    let mut buffer = [0u8; 2048]; // Larger buffer for pass1.json
    let handler = ConformanceTestHandler {
        event_count: 0,
        error: None,
    };
    let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

    match parser.write(json_content.as_bytes()) {
        Ok(()) => {}
        Err(e) => {
            return Err(match e {
                PushParseError::Parse(parse_err) => parse_err,
                PushParseError::Handler(_handler_err) => {
                    // Convert handler error to a parse error for consistency
                    ParseError::InvalidNumber // Use any available ParseError variant
                }
            });
        }
    }

    match parser.finish::<String>() {
        Ok(()) => {}
        Err(e) => {
            return Err(match e {
                PushParseError::Parse(parse_err) => parse_err,
                PushParseError::Handler(_handler_err) => {
                    ParseError::InvalidNumber // Use any available ParseError variant
                }
            });
        }
    }

    let handler = parser.destroy();
    Ok(handler.event_count)
}
```
🛠️ Refactor suggestion
Improve error handling and remove unused field initialization
- The error conversion uses `ParseError::InvalidNumber` as a placeholder for handler errors, which is misleading
- The handler initialization includes the unused `error` field
```diff
 fn run_push_parser_test(json_content: &str) -> Result<usize, ParseError> {
     let mut buffer = [0u8; 2048]; // Larger buffer for pass1.json
     let handler = ConformanceTestHandler {
         event_count: 0,
-        error: None,
     };
     let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);
     match parser.write(json_content.as_bytes()) {
         Ok(()) => {}
         Err(e) => {
             return Err(match e {
                 PushParseError::Parse(parse_err) => parse_err,
                 PushParseError::Handler(_handler_err) => {
-                    // Convert handler error to a parse error for consistency
-                    ParseError::InvalidNumber // Use any available ParseError variant
+                    // Handler errors shouldn't occur in conformance tests
+                    unreachable!("ConformanceTestHandler never returns errors")
                 }
             });
         }
     }
```
Consider adding a more appropriate ParseError variant for handler errors if they need to be propagated.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In picojson/tests/json_checker_tests.rs around lines 52 to 87, the error
handling converts handler errors to the misleading ParseError::InvalidNumber
variant, and the handler struct initializes an unused error field. Remove the
unused error field from the ConformanceTestHandler initialization and update the
error conversion logic to use a more appropriate or custom ParseError variant
for handler errors to accurately represent the error source.
Actionable comments posted: 1
♻️ Duplicate comments (7)
picojson/tests/json_checker_tests.rs (4)
40-42: Remove unused `error` field. The `error` field in `ConformanceTestHandler` is never used and should be removed.

63-66: Improve error handling for handler errors. Converting `PushParseError::Handler` to an arbitrary `ParseError::InvalidNumber` is misleading when debugging test failures.

76-79: Improve error handling for handler errors during finish. Converting `PushParseError::Handler` from `finish()` to an arbitrary `ParseError::InvalidNumber` is misleading when debugging test failures.

158-177: Enhance test to validate parsed content. The test only verifies that parsing doesn't panic but doesn't validate the actual parsed string content.
picojson/src/push_parser.rs (1)
78-88: Inefficient byte-by-byte processing. The `write` method processes input data byte-by-byte, which is highly inefficient as `parse_chunk` is optimized for larger slices.

picojson/tests/push_parser_stress_test.rs (1)

399-406: Inconsistent failure handling in stress tests. This test prints a warning instead of panicking on failure, which is inconsistent with other tests. For CI integration, tests should fail explicitly.
picojson/tests/push_parser.rs (1)
342-583: Comprehensive debugging and robustness tests for complex JSON scenarios. These tests provide excellent coverage of real-world JSON parsing scenarios and help ensure parser stability with complex inputs. The investigative nature and minimal assertions are appropriate for robustness testing.
However, based on previous review feedback, there's still missing coverage for invalid or malformed Unicode escape sequences.
Please add tests for cases like missing digits, invalid hex characters, or truncated escapes to verify the parser correctly returns errors, as mentioned in previous reviews.
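Purely as an illustration of the cases being requested (this is not the crate's test harness), a stand-alone checker for the body of a `\u` escape makes the categories concrete; a real test would feed the same malformed inputs through `PushParser` and assert that `write`/`finish` return an error:

```rust
/// Minimal stand-alone validator for the body of a `\u` escape:
/// expects exactly four hex digits immediately after the `\u`.
fn valid_unicode_escape_body(rest: &[u8]) -> bool {
    rest.len() >= 4 && rest[..4].iter().all(|b| b.is_ascii_hexdigit())
}

fn main() {
    // Well-formed escape body: four hex digits.
    assert!(valid_unicode_escape_body(b"0041\""));
    // Missing digits: escape body cut short by the closing quote.
    assert!(!valid_unicode_escape_body(b"00\""));
    // Invalid hex characters.
    assert!(!valid_unicode_escape_body(b"00ZZ"));
    // Truncated at end of input.
    assert!(!valid_unicode_escape_body(b"0"));
}
```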
🧹 Nitpick comments (1)
picojson/tests/push_parser.rs (1)
250-294: Good Unicode escape sequence validation with minor comment inconsistency. The test correctly validates Unicode escape processing (\u0041 → 'A'). However, the comment on line 289 shows "\\u0041" (a doubled backslash) when it should show "\u0041" to match the actual escape sequence being tested.

```diff
-            "String(A)".to_string(), // \\u0041 should be converted to 'A'
+            "String(A)".to_string(), // \u0041 should be converted to 'A'
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- picojson/src/parse_error.rs (4 hunks)
- picojson/src/push_parser.rs (1 hunks)
- picojson/src/stream_parser.rs (1 hunks)
- picojson/tests/json_checker_tests.rs (5 hunks)
- picojson/tests/push_parser.rs (1 hunks)
- picojson/tests/push_parser_stress_test.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- picojson/src/stream_parser.rs
- picojson/src/parse_error.rs
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run AVR Test Suites
🔇 Additional comments (12)
picojson/src/push_parser.rs (2)
188-193: Good error handling for incomplete parsing states. The finish method now properly checks for incomplete string and key parsing states, ensuring incomplete JSON is rejected.

507-545: Proper error handling for unicode escape sequences. The unicode escape processing now correctly propagates errors for invalid hex digits and invalid unicode scalar values, addressing previous concerns.
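For context, both failure modes mentioned above (bad hex digits and invalid scalar values such as unpaired surrogates) fall out of the standard conversion path; a self-contained sketch, not the parser's actual code:

```rust
/// Decode the four hex digits of a `\uXXXX` escape into a `char`,
/// rejecting both non-hex input and invalid Unicode scalar values
/// (e.g. unpaired surrogates such as U+D800).
fn decode_basic_unicode_escape(hex: &str) -> Result<char, &'static str> {
    let code = u32::from_str_radix(hex, 16).map_err(|_| "invalid hex digit")?;
    char::from_u32(code).ok_or("invalid Unicode scalar value")
}

fn main() {
    assert_eq!(decode_basic_unicode_escape("0041"), Ok('A'));
    assert_eq!(decode_basic_unicode_escape("00ZZ"), Err("invalid hex digit"));
    // A lone surrogate is not a valid scalar value; real parsers must
    // additionally handle surrogate *pairs* for characters above U+FFFF.
    assert_eq!(decode_basic_unicode_escape("D800"), Err("invalid Unicode scalar value"));
}
```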
picojson/tests/push_parser.rs (10)
1-6: Well-structured test module with appropriate imports. The module structure is clean with proper licensing, test attributes, and comprehensive imports from the picojson crate.

8-27: Effective minimal test handler for basic functionality verification. The SimpleHandler provides a clean baseline test that verifies parser compilation, basic operations (write, finish, destroy), and lifecycle management without event processing complexity.

29-64: Excellent HRTB pattern validation test. This test effectively validates the Higher-Ranked Trait Bounds pattern which is crucial for lifetime management in the push parser. The test correctly verifies both string processing and event counting, ensuring proper integration between the tokenizer and HRTB system.

66-104: Well-designed test for string extraction and zero-copy parsing. The test effectively validates string content extraction while properly handling lifetimes by using owned strings in the test handler. This approach allows testing the underlying borrowed string mechanism without introducing lifetime complications in the test code.

106-155: Comprehensive debug handler for event stream visibility. The DebugHandler provides excellent coverage of event types and serves as a valuable debugging tool. The relaxed assertions are appropriate for this type of exploratory test that focuses on event generation rather than strict content validation.

157-202: Thorough validation of object key-value parsing. The test effectively validates the complete event sequence for object parsing, ensuring keys and values are properly distinguished and emitted in the correct order. The strict assertion provides good coverage of the parser's object handling capabilities.

204-248: Solid validation of basic escape sequence processing. The test correctly validates that escape sequences like `\n` are properly converted to their corresponding characters (actual newline). This is fundamental JSON parsing behavior and the test implementation is sound.

296-341: Excellent coverage of mixed escape sequence processing. This test provides comprehensive validation of complex escape scenarios including Unicode escapes, control characters, and special characters. The expected output correctly demonstrates proper conversion of all escape types, making this a valuable test for parser robustness.

586-640: Thorough validation of number parsing across different formats. The test effectively validates number parsing for integers, floats, and negative numbers, ensuring proper Number event emission with correct string representation. Good coverage of various number formats in JSON.

642-673: Clean and focused test for forward slash escape sequence. The test correctly validates that the `\/` escape sequence is properly converted to `/`, which is standard JSON behavior. The focused approach and clear assertion make this an effective unit test.
Actionable comments posted: 2
🧹 Nitpick comments (2)
picojson/tests/json_checker_tests.rs (1)
88-104: Fix unused variable warning. The `event` variable on line 97 is unused and should be prefixed with an underscore to suppress the warning.

```diff
-            Ok(event) => {
+            Ok(_event) => {
```

picojson/src/push_parser.rs (1)

72-176: Performance concern documented but implementation is correct. The byte-by-byte processing approach is acknowledged as suboptimal in the TODO comment on line 81-82. While this impacts performance for large inputs, the implementation correctly handles the complex state transitions and escape sequence processing.

Consider implementing chunk-based processing in a future iteration to improve performance for large JSON inputs.
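One common shape for such a change, sketched with hypothetical names (this is not the crate's code): scan each incoming chunk for the next byte that needs individual handling and copy the plain run in bulk, falling back to per-byte logic only at those boundaries.

```rust
/// Copy the plain (non-escape, non-quote) run at the start of `chunk` into
/// `scratch[*len..]`, returning how many input bytes were consumed.
/// Bytes needing per-byte handling ('"' or '\\') stop the bulk copy.
fn copy_plain_run(chunk: &[u8], scratch: &mut [u8], len: &mut usize) -> usize {
    // Find the first byte the state machine must look at individually.
    let boundary = chunk
        .iter()
        .position(|&b| b == b'"' || b == b'\\')
        .unwrap_or(chunk.len());
    let run = &chunk[..boundary];
    scratch[*len..*len + run.len()].copy_from_slice(run); // bulk copy, no per-byte loop
    *len += run.len();
    boundary
}

fn main() {
    let mut scratch = [0u8; 64];
    let mut len = 0;
    let consumed = copy_plain_run(b"hello world\\n...", &mut scratch, &mut len);
    assert_eq!(consumed, 11); // stopped at the backslash that starts the escape
    assert_eq!(&scratch[..len], b"hello world");
}
```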
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- picojson/src/push_parser.rs (1 hunks)
- picojson/tests/json_checker_tests.rs (5 hunks)
- picojson/tests/push_parser.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/src/push_parser.rs (7)
Learnt from: kaidokert
PR: #1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: #44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: #55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: #5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the use tokenizer as ujson;
alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: #1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: #55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
🪛 GitHub Actions: Build and test
picojson/tests/json_checker_tests.rs
[error] 172-172: Test failure in 'test_slice_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.
[error] 276-276: Test failure in 'test_stream_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.
[warning] 97-97: Unused variable 'event'. Consider prefixing it with an underscore if intentional.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run AVR Test Suites
🔇 Additional comments (11)
picojson/tests/json_checker_tests.rs (3)
39-86: LGTM: Clean handler implementation and appropriate error handling
The ConformanceTestHandler is simple and effective for counting events. The error conversion from PushParseError::Handler to ParseError::ReaderError is appropriate since it represents external (handler) errors.
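For readers skimming the conversation, a minimal standalone sketch of the counting-handler idea follows. The Event enum, handler trait, and CountingHandler here are simplified stand-ins, not the actual picojson types or the ConformanceTestHandler from the test file.

// Simplified stand-ins for the event type and handler trait; the real
// picojson types carry richer data and error types.
#[derive(Debug)]
enum Event<'a> {
    StartObject,
    EndObject,
    Key(&'a str),
    String(&'a str),
    End,
}

trait PushParserHandler<'a, E> {
    fn handle_event(&mut self, event: Event<'a>) -> Result<(), E>;
}

// A conformance-style handler only needs to observe events and count them.
struct CountingHandler {
    events_seen: usize,
}

impl<'a> PushParserHandler<'a, ()> for CountingHandler {
    fn handle_event(&mut self, event: Event<'a>) -> Result<(), ()> {
        let _ = event; // a real handler might also validate ordering here
        self.events_seen += 1;
        Ok(())
    }
}

fn main() {
    let mut handler = CountingHandler { events_seen: 0 };
    let _ = handler.handle_event(Event::StartObject);
    let _ = handler.handle_event(Event::Key("answer"));
    let _ = handler.handle_event(Event::EndObject);
    assert_eq!(handler.events_seen, 3);
}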
363-388: Excellent macro-based test generation
The generate_push_parser_fail_tests macro ensures consistent test coverage between SliceParser and PushParser for all expected failure cases. This systematic approach reduces the risk of missing test cases.
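The general shape of such a macro is sketched below. The macro name matches the review comment, but the fixtures and the body of each generated test are placeholders; the real macro presumably drives both parsers against the expected-failure inputs.

// Illustrative shape of a macro-generated failure-test suite; not the real
// implementation from json_checker_tests.rs.
macro_rules! generate_push_parser_fail_tests {
    ($($name:ident => $input:expr),* $(,)?) => {
        $(
            #[test]
            fn $name() {
                let input: &str = $input;
                // The real tests feed `input` to both SliceParser and
                // PushParser and assert that parsing fails; here we only
                // check the fixture so the sketch stays self-contained.
                assert!(!input.is_empty());
            }
        )*
    };
}

generate_push_parser_fail_tests! {
    fail_unclosed_array => "[1, 2",
    fail_trailing_comma => "[1,]",
}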
414-433: Good consistency validation between parsers
The PushParser known-deviation tests ensure that the new parser maintains the same behavior as SliceParser for edge cases such as scalar root values and deep nesting. This is crucial for API compatibility.
picojson/tests/push_parser.rs (5)
17-64: Solid foundation tests for HRTB and lifetime management
These tests effectively validate that the complex Higher-Ranked Trait Bounds (HRTB) pattern works correctly with the PushParser. The progression from a simple compilation test to event capturing demonstrates proper lifetime management.
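As a rough illustration of the HRTB shape being exercised, the standalone sketch below uses a for<'a> bound so a handler can accept borrows of any lifetime the parser hands it. The trait and function names are stand-ins, not the picojson API.

// `for<'a>` says the handler must work for *every* lifetime the parser might
// hand it, which is what lets the parser pass borrows of its internal buffer.
trait Handler<'a> {
    fn on_event(&mut self, s: &'a str);
}

fn drive<H>(mut handler: H)
where
    H: for<'a> Handler<'a>,
{
    // The "parser" owns short-lived data and lends it to the handler.
    let chunk = String::from("{\"key\": 1}");
    handler.on_event(&chunk);
}

struct Collect(Vec<String>);

impl<'a> Handler<'a> for Collect {
    fn on_event(&mut self, s: &'a str) {
        self.0.push(s.to_owned());
    }
}

fn main() {
    drive(Collect(Vec::new()));
}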
212-348: Comprehensive escape sequence testing
The tests cover simple escapes (\n), Unicode escapes (\u0041), and consecutive mixed escapes effectively. The progression from basic to complex escape scenarios provides good validation coverage.
681-825: Excellent error case validation
The tests for invalid Unicode escapes (incomplete sequences, invalid hex characters) and invalid escape sequences in keys provide crucial validation of parser error handling. This addresses the robustness requirements for a production parser.
753-801: Smart validation of the copy-on-escape optimization
This test effectively validates the parser's optimization strategy of using String::Borrowed for simple strings and String::Unescaped only when escape processing is required. This is crucial for performance in real-world usage.
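A stand-in sketch of the two-variant result type described here; the variant names follow the review comment, but the real picojson type may differ in detail.

// Stand-in for the two-variant string type: unescaped content can be borrowed
// straight from the input, while escaped content is copied into scratch space.
enum JsonString<'input, 'scratch> {
    Borrowed(&'input str),
    Unescaped(&'scratch str),
}

fn describe(s: &JsonString<'_, '_>) -> &'static str {
    match s {
        // Zero-copy path: no escapes were present in the raw string.
        JsonString::Borrowed(_) => "borrowed from input (no copy)",
        // Copy-on-escape path: at least one escape forced unescaping.
        JsonString::Unescaped(_) => "unescaped into the scratch buffer",
    }
}

fn main() {
    let scratch = String::from("line\nbreak");
    println!("{}", describe(&JsonString::Borrowed("plain")));
    println!("{}", describe(&JsonString::Unescaped(&scratch)));
}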
350-508: Systematic debugging approach for complex inputs
The progressive testing approach, from specific problematic lines to larger sections of pass1.json, demonstrates good debugging methodology. This systematic isolation helps identify the root cause of parsing issues.
picojson/src/push_parser.rs (3)
119-156: Proper error handling for Unicode escape processing
The Unicode escape processing now correctly handles and propagates errors from both add_hex_digit() and process_to_utf8() calls. This addresses the error handling concerns from previous reviews.
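A simplified, self-contained sketch of this propagation pattern. The method names mirror the ones cited above, but the collector and error types here are stand-ins, not the real UnicodeEscapeCollector and ParseError.

// Stand-in collector and error type; only the propagation pattern matters.
#[derive(Debug)]
enum EscapeError {
    InvalidUnicodeEscape,
}

struct HexCollector {
    digits: [u8; 4],
    len: usize,
}

impl HexCollector {
    // Reject non-hex digits instead of silently ignoring them.
    fn add_hex_digit(&mut self, byte: u8) -> Result<(), EscapeError> {
        if !byte.is_ascii_hexdigit() || self.len >= 4 {
            return Err(EscapeError::InvalidUnicodeEscape);
        }
        self.digits[self.len] = byte;
        self.len += 1;
        Ok(())
    }

    // Convert the collected hex digits into UTF-8 bytes, surfacing failures.
    fn process_to_utf8(&self, out: &mut [u8; 4]) -> Result<usize, EscapeError> {
        let hex = core::str::from_utf8(&self.digits[..self.len])
            .map_err(|_| EscapeError::InvalidUnicodeEscape)?;
        let code = u32::from_str_radix(hex, 16)
            .map_err(|_| EscapeError::InvalidUnicodeEscape)?;
        let ch = char::from_u32(code).ok_or(EscapeError::InvalidUnicodeEscape)?;
        Ok(ch.encode_utf8(out).len())
    }
}

fn main() -> Result<(), EscapeError> {
    let mut collector = HexCollector { digits: [0; 4], len: 0 };
    for b in *b"CAFE" {
        collector.add_hex_digit(b)?;
    }
    let mut out = [0u8; 4];
    let n = collector.process_to_utf8(&mut out)?;
    assert_eq!(core::str::from_utf8(&out[..n]).unwrap(), "\u{CAFE}");
    Ok(())
}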
178-210: Comprehensive state validation in the finish method
The finish method properly handles all parser states, including appropriate error handling for incomplete strings and keys (lines 193-198). This ensures incomplete JSON documents are properly rejected.
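A stand-in sketch of the finish-time state check. The state names mirror the ParserState enum from the walkthrough, but whether a trailing number is accepted at finish is an assumption made for this sketch.

// Stand-in state and error types; not the real PushParser internals.
#[derive(Debug, PartialEq)]
enum ParserState {
    Idle,
    ParsingString,
    ParsingKey,
    ParsingNumber,
}

#[derive(Debug)]
enum FinishError {
    UnexpectedEndOfInput,
}

fn check_finish(state: &ParserState) -> Result<(), FinishError> {
    match state {
        // Idle is fine; a trailing number can still be flushed as an event.
        ParserState::Idle | ParserState::ParsingNumber => Ok(()),
        // Ending inside a string or key means the closing quote never arrived.
        ParserState::ParsingString | ParserState::ParsingKey => {
            Err(FinishError::UnexpectedEndOfInput)
        }
    }
}

fn main() {
    assert!(check_finish(&ParserState::Idle).is_ok());
    assert!(check_finish(&ParserState::ParsingString).is_err());
}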
643-658: Correct position tracking in the tokenizer callback
The callback correctly converts relative positions to absolute positions using base_position + relative_pos. This ensures accurate event position tracking throughout the parsing process.
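A minimal sketch of the bookkeeping implied by base_position + relative_pos; the struct and method names are illustrative, not the parser's actual fields.

// Illustrative bookkeeping only; the real parser tracks equivalent state
// in its own fields.
struct PositionTracker {
    // Bytes consumed by previous write() calls.
    base_position: usize,
}

impl PositionTracker {
    // Convert a chunk-relative position into an absolute document position.
    fn absolute(&self, relative_pos: usize) -> usize {
        self.base_position + relative_pos
    }

    // After a chunk has been fully processed, advance the base by its length.
    fn advance_chunk(&mut self, chunk_len: usize) {
        self.base_position += chunk_len;
    }
}

fn main() {
    let mut pos = PositionTracker { base_position: 0 };
    pos.advance_chunk(16); // first chunk was 16 bytes
    assert_eq!(pos.absolute(4), 20); // byte 4 of the second chunk
}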
// Debug test for tracing SliceParser unicode escape processing
#[test]
fn test_slice_parser_unicode_trace() {
    // Test the problematic sequence from pass1.json line 45
    let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;

    let mut buffer = [0u8; 1024];
    let mut parser = SliceParser::with_buffer(unicode_sequence, &mut buffer);

    match parser.next_event() {
        Ok(Event::String(s)) => {
            let parsed_content = s.as_ref();
            // Verify that unicode escapes are properly decoded
            assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
            assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
            assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
            assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
            assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
            assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
            // Verify that other escape sequences are also processed
            assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
            assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
            assert!(parsed_content.contains('"'), "Should contain \" from \\\" escape");
            assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
            assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
            assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
            assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
            assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
        }
        Ok(other) => {
            panic!("Expected String event, got: {:?}", other);
        }
        Err(e) => {
            panic!(
                "SliceParser should handle this sequence, got error: {:?}",
                e
            );
        }
    }
}
Fix double-escaping in Unicode test string
The test is failing because the Unicode escape sequences are double-escaped. The string contains "\\uCAFE\\uBABE...", which represents literal \uCAFE text, not Unicode escape sequences.
For the test to work correctly, use single backslashes:
- let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
+ let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
This will allow the parser to properly decode the Unicode escape sequences into their corresponding characters.
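To see why the doubled backslashes matter, a small standalone Rust snippet (independent of picojson) compares the two raw-string forms:

fn main() {
    // In a Rust raw string literal, backslashes are not escape characters for
    // Rust itself, so every character lands in the string exactly as written.
    let double = r#"\\uCAFE"#; // backslash, backslash, 'u', 'C', 'A', 'F', 'E'
    let single = r#"\uCAFE"#;  // backslash, 'u', 'C', 'A', 'F', 'E'

    assert_eq!(double.len(), 7);
    assert_eq!(single.len(), 6);

    // A JSON parser reads the doubled form as an escaped literal backslash
    // followed by the plain text "uCAFE", whereas the single-backslash form is
    // a JSON Unicode escape that should decode to the character U+CAFE.
    println!("double = {double}, single = {single}");
}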
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
    // (remainder of test_slice_parser_unicode_trace unchanged from the block above)
🧰 Tools
🪛 GitHub Actions: Build and test
[error] 172-172: Test failure in 'test_slice_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.
🤖 Prompt for AI Agents
In picojson/tests/json_checker_tests.rs around lines 159 to 198, the test string
for Unicode escape sequences is double-escaped, causing the parser to interpret
them as literal text rather than actual Unicode characters. Fix this by
replacing the double backslashes before the Unicode sequences with single
backslashes, so the parser correctly decodes the Unicode escapes into their
corresponding characters.
// Debug test for tracing StreamParser unicode escape processing
#[test]
fn test_stream_parser_unicode_trace() {
    // Test the problematic sequence from pass1.json line 45
    let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;

    let reader = ChunkReader::full_slice(unicode_sequence.as_bytes());
    let mut buffer = [0u8; 1024];
    let mut parser = StreamParser::<_, DefaultConfig>::new(reader, &mut buffer);

    match parser.next_event() {
        Ok(Event::String(s)) => {
            let parsed_content = s.as_ref();
            // Verify that unicode escapes are properly decoded
            assert!(parsed_content.contains('\u{CAFE}'), "Should contain \\uCAFE decoded as Unicode character");
            assert!(parsed_content.contains('\u{BABE}'), "Should contain \\uBABE decoded as Unicode character");
            assert!(parsed_content.contains('\u{AB98}'), "Should contain \\uAB98 decoded as Unicode character");
            assert!(parsed_content.contains('\u{FCDE}'), "Should contain \\uFCDE decoded as Unicode character");
            assert!(parsed_content.contains('\u{bcda}'), "Should contain \\ubcda decoded as Unicode character");
            assert!(parsed_content.contains('\u{ef4A}'), "Should contain \\uef4A decoded as Unicode character");
            // Verify that other escape sequences are also processed
            assert!(parsed_content.contains('/'), "Should contain / from \\/ escape");
            assert!(parsed_content.contains('\\'), "Should contain \\ from \\\\ escape");
            assert!(parsed_content.contains('"'), "Should contain \" from \\\" escape");
            assert!(parsed_content.contains('\u{08}'), "Should contain backspace from \\b escape");
            assert!(parsed_content.contains('\u{0C}'), "Should contain form feed from \\f escape");
            assert!(parsed_content.contains('\n'), "Should contain newline from \\n escape");
            assert!(parsed_content.contains('\r'), "Should contain carriage return from \\r escape");
            assert!(parsed_content.contains('\t'), "Should contain tab from \\t escape");
        }
        Ok(other) => {
            panic!("Expected String event, got: {:?}", other);
        }
        Err(e) => {
            panic!(
                "StreamParser should handle this sequence, got error: {:?}",
                e
            );
        }
    }
}
Fix double-escaping in StreamParser Unicode test string
Same issue as the SliceParser test: the Unicode escape sequences are double-escaped, preventing proper decoding.
- let unicode_sequence = r#""\/\\\"\\uCAFE\\uBABE\\uAB98\\uFCDE\\ubcda\\uef4A\\b\\f\\n\\r\\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
+ let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    let unicode_sequence = r#""\\/\\\"\uCAFE\uBABE\uAB98\uFCDE\ubcda\uef4A\b\f\n\r\t`1~!@#$%^&*()_+-=[]{}|;:',./<>?""#;
    // (remainder of test_stream_parser_unicode_trace unchanged from the block above)
🧰 Tools
🪛 GitHub Actions: Build and test
[error] 276-276: Test failure in 'test_stream_parser_unicode_trace': Should contain \uCAFE decoded as Unicode character.
🤖 Prompt for AI Agents
In picojson/tests/json_checker_tests.rs around lines 262 to 302, the test string
for StreamParser unicode escape sequences is double-escaped, causing improper
decoding. Fix this by removing the extra escaping so that the Unicode escape
sequences are correctly represented as single escapes in the raw string literal,
allowing the parser to decode them properly.
Summary by Sourcery
Introduce a new no_std SAX-style push parser along with extensive conformance, stress, and edge-case tests, and enhance error handling in the core library.
New Features:
Enhancements:
Tests:
Summary by CodeRabbit
New Features
Bug Fixes
Tests