Skip to content

Conversation

kaidokert
Copy link
Owner

@kaidokert kaidokert commented Aug 4, 2025

Summary by Sourcery

Reintroduce a push-based JSON parser into picojson by adding PushParser and related infrastructure, updating ParserCore to handle chunked input, and integrating comprehensive tests and an example to validate and demonstrate the new functionality.

New Features:

  • Introduce PushParser for SAX-style incremental JSON parsing with chunked input support
  • Add StreamParser and ChunkReader to support streaming JSON parsing from reader interfaces
  • Expose PushParserHandler trait and PushParseError for handling parser events and errors

Enhancements:

  • Extend ParserCore to differentiate chunked vs non-chunked parsing and adjust end-of-data behavior
  • Implement PushContentExtractor with selective byte accumulation (copy-on-escape) and Unicode escape handling
  • Add shared utilities: ContentPiece.as_bytes and StreamBuffer.truncate_unescaped_by
  • Refactor event_processor to support PushParser lifecycle and flush semantics

Documentation:

  • Add push_parser_demo example illustrating incremental parsing with PushParser

Tests:

  • Add comprehensive conformance tests for PushParser and StreamParser using JSONChecker pass/fail suites
  • Implement extensive unit tests covering string escapes, Unicode escapes, keys, numbers, and error conditions for PushParser
  • Add stress tests for PushParser to validate buffer size and chunking scenarios, as well as document validation
  • Add tests for InputBufferFull and ScratchBufferFull error variants in StreamParser

Summary by CodeRabbit

  • New Features

    • Introduced a push-based, SAX-style JSON parser supporting incremental, chunked input and event-driven handling.
    • Added public API for push parsing and handler traits, enabling efficient streaming JSON parsing.
    • Provided a demonstration example illustrating incremental JSON parsing with event-driven output.
    • Enhanced parser core to distinguish chunked versus non-chunked input handling.
  • Bug Fixes

    • Improved buffer boundary tracking and error handling for chunked input, preventing invalid slice bounds and ensuring robust parsing across buffer splits.
  • Tests

    • Added extensive test suites for the push parser, covering escape handling, buffer overflow scenarios, conformance, stress testing, and edge cases with chunked and large inputs.
    • Included tests verifying correct handling of escape sequences, Unicode escapes, and mixed borrowed/unescaped strings.
    • Added tests reproducing specific buffer boundary issues and validating error signaling on buffer overflow.

Copy link

coderabbitai bot commented Aug 4, 2025

Walkthrough

This update introduces a comprehensive SAX-style incremental JSON push parser to the picojson crate. It adds new modules and public APIs for push parsing, event-driven handlers, and buffer management. Extensive conformance and stress tests are implemented, along with detailed examples and error handling for chunked and escaped input scenarios.

Changes

Cohort / File(s) Change Summary
Push Parser Core Implementation
picojson/src/push_parser.rs, picojson/src/push_content_builder.rs
Introduces a generic, event-driven push parser (PushParser) with a content extractor for chunked input, escape handling, and event forwarding. Defines handler traits, error types, and robust buffer management for incremental parsing.
Public API Integration
picojson/src/lib.rs
Adds and re-exports the new push parser modules and handler traits to the public API, making push-based parsing available to users.
Parser Core Modifications
picojson/src/event_processor.rs
Updates the core parser logic to distinguish chunked (push) and non-chunked parsers, altering end-of-input control flow and adding constructors for both modes.
Buffer and Content Handling Enhancements
picojson/src/shared.rs, picojson/src/stream_buffer.rs
Adds utility methods for accessing buffer content and truncating unescaped buffers, supporting efficient and safe content extraction in streaming contexts.
Example Usage
picojson/examples/push_parser_demo.rs
Provides a runnable example demonstrating incremental parsing with PushParser, showing event handling and chunked input processing.
Push Parser Conformance and Stress Tests
picojson/tests/push_parser.rs, picojson/tests/push_parser_copy_on_escape.rs, picojson/tests/push_parser_escapes.rs, picojson/tests/push_parser_invalidslicebounds_repro.rs, picojson/tests/push_parser_stress_test.rs
Adds extensive tests covering escape handling, buffer boundaries, copy-on-escape, Unicode, chunked input, event correctness, and error cases for the push parser.
Input Buffer Limit Tests
picojson/tests/input_buffer_full_test.rs
Introduces tests to verify correct error signaling when buffer limits are exceeded during parsing of large or complex JSON inputs.
JSON Checker Integration Tests
picojson/tests/json_checker_tests.rs
Extends the conformance test suite to include the new push parser and stream parser, validating correct behavior on canonical passing, failing, and deviation test cases.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PushParser
    participant Handler
    participant ContentExtractor

    User->>PushParser: write(chunk)
    PushParser->>ContentExtractor: set_chunk(chunk)
    loop While events available
        PushParser->>ContentExtractor: extract next event
        ContentExtractor-->>PushParser: Event or Error
        PushParser->>Handler: handle_event(Event)
        Handler-->>PushParser: Ok/Error
    end
    PushParser->>ContentExtractor: reset_input()
    User->>PushParser: finish()
    PushParser->>Handler: handle_event(EndDocument)
    Handler-->>PushParser: Ok
    PushParser-->>User: Handler (finalized)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90 minutes

Possibly related PRs

Poem

In fields of code, a parser hops,
Through streams and chunks it never stops.
Escapes and buffers, all in stride,
With borrowed strings it does not hide.
Events it prints with bunny cheer—
JSON streams are crystal clear!
🥕

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch push_full3

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

sourcery-ai bot commented Aug 4, 2025

Reviewer's Guide

This PR reintroduces the PushParser by extending the core parser to handle chunked input via a new PushContentExtractor and flags in ParserCore, updates library exports accordingly, and complements these changes with extensive conformance, escape, stress, and buffer-isolation tests while enhancing shared utilities and stream buffer capabilities.

Sequence diagram for PushParser chunked input processing

sequenceDiagram
    actor User
    participant PushParser
    participant PushContentExtractor
    participant ParserCore
    participant Handler as "PushParserHandler"
    User->>PushParser: write(chunk)
    PushParser->>PushContentExtractor: set_chunk(chunk)
    loop for each event in chunk
        PushParser->>ParserCore: next_event_impl_with_flags(...)
        ParserCore->>PushContentExtractor: handle_byte_accumulation(byte)
        ParserCore-->>PushParser: Event or EndOfData
        alt Event
            PushParser->>Handler: handle_event(event)
        end
    end
    PushParser->>PushContentExtractor: reset_input()
    PushParser->>PushContentExtractor: add_position_offset(chunk.len())
Loading

Class diagram for PushParser and related types

classDiagram
    class PushParser {
        - PushContentExtractor extractor
        - H handler
        - ParserCore core
        + new(handler, buffer)
        + write(data)
        + finish()
    }
    class PushContentExtractor {
        - StreamBuffer stream_buffer
        - State parser_state
        - UnicodeEscapeCollector unicode_escape_collector
        - bool unescaped_reset_queued
        - usize position_offset
        - usize current_position
        - usize token_start_pos
        - bool using_unescaped_buffer
        - &'input [u8] current_chunk
        - usize chunk_cursor
        - bool in_unicode_escape
        - bool in_simple_escape
        + new(buffer)
        + set_chunk(chunk)
        + reset_input()
        + ...
    }
    class PushParserHandler {
        <<trait>>
        + handle_event(event)
    }
    class ParserCore {
        - Tokenizer tokenizer
        - ParserState parser_state
        - bool in_escape_sequence
        - bool handles_chunked_input
        + new()
        + new_chunked()
        + next_event_impl_with_flags(...)
    }
    class PushParseError {
        <<enum>>
        + Parse(ParseError)
        + Handler(E)
    }
    PushParser --> PushContentExtractor : uses
    PushParser --> PushParserHandler : uses
    PushParser --> ParserCore : uses
    PushContentExtractor --> StreamBuffer : uses
    PushContentExtractor --> UnicodeEscapeCollector : uses
    PushContentExtractor --> State : uses
    ParserCore --> Tokenizer : uses
    ParserCore --> ParserState : uses
    PushParserHandler <|.. JsonEventPrinter : implements
    PushParser o-- PushParseError : error handling
Loading

Class diagram for PushContentExtractor escape and buffer logic

classDiagram
    class PushContentExtractor {
        - StreamBuffer stream_buffer
        - UnicodeEscapeCollector unicode_escape_collector
        - bool in_unicode_escape
        - bool in_simple_escape
        - bool using_unescaped_buffer
        + handle_byte_accumulation(byte)
        + begin_escape_sequence()
        + begin_unicode_escape()
        + process_simple_escape_event(...)
        + copy_partial_content_to_scratch()
        + truncate_unescaped_by(count)
    }
    PushContentExtractor --> StreamBuffer : manages
    PushContentExtractor --> UnicodeEscapeCollector : manages
Loading

Class diagram for ParserCore chunked input support

classDiagram
    class ParserCore {
        - bool handles_chunked_input
        + new()
        + new_chunked()
        + next_event_impl_with_flags(...)
    }
    ParserCore : +handles_chunked_input
    ParserCore : +new_chunked() for PushParser
    ParserCore : +new() for non-chunked parsers
Loading

File-Level Changes

Change Details Files
Implement PushParser with chunked input support
  • Added push_parser.rs defining a SAX-style PushParser struct and error type
  • Created push_content_builder.rs with PushContentExtractor handling escapes and scratch buffer
  • Extended ParserCore to track handles_chunked_input and distinct finish behavior
  • Updated lib.rs to export PushParser, PushParseError, PushParserHandler
src/push_parser.rs
src/push_content_builder.rs
src/event_processor.rs
src/lib.rs
Enhance shared utilities and stream buffer for escape and scratch operations
  • Added as_bytes() to shared ContentPiece for unified byte access
  • Implemented truncate_unescaped_by() in StreamBuffer to support unescaped buffer slicing
src/shared.rs
src/stream_buffer.rs
Add comprehensive PushParser conformance and stress tests
  • Extended json_checker_tests.rs with run_push_parser_test and run_stream_parser_test functions and macros
  • Introduced multiple new test modules covering basic parsing, escapes, copy-on-escape, stress scenarios, and repro for InvalidSliceBounds
  • Added input_buffer_full_test.rs to verify buffer-full error variants
tests/json_checker_tests.rs
tests/push_parser.rs
tests/push_parser_stress_test.rs
tests/push_parser_escapes.rs
tests/push_parser_copy_on_escape.rs
tests/push_parser_invalidslicebounds_repro.rs
tests/input_buffer_full_test.rs
Provide PushParser demo example
  • Added examples/push_parser_demo.rs showcasing incremental chunked parsing and event printing
examples/push_parser_demo.rs

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @kaidokert, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the picojson library by re-introducing a PushParser. This new parser provides a flexible and efficient way to process JSON data incrementally, which is crucial for applications dealing with large or streaming JSON inputs. The changes involve core parser modifications, a new content extraction mechanism, and a thorough set of tests to validate its behavior under diverse conditions.

Highlights

  • New Push Parser: I've re-introduced and implemented a new PushParser for SAX-style, event-driven JSON parsing. This parser allows for incremental processing of JSON data, making it suitable for streaming scenarios where the full input is not available at once.
  • Content Extraction and Buffer Management: A new PushContentExtractor has been added to manage the parsing state and handle content extraction, including complex escape sequences and tokens that span across multiple input chunks. It utilizes a scratch buffer for unescaped content and implements a 'copy-on-escape' optimization to minimize memory allocations.
  • Core Parser Enhancements: The underlying ParserCore has been updated to differentiate between chunked and non-chunked input sources, allowing the PushParser to signal EndOfData when a chunk is exhausted, rather than prematurely ending the document. This enables seamless incremental parsing.
  • Comprehensive Testing: Extensive new test suites have been added, including dedicated tests for the PushParser's functionality, escape handling, copy-on-escape optimization, and robust stress tests covering various buffer sizes, chunking patterns, and edge cases to ensure stability and correctness.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kaidokert - I've reviewed your changes - here's some feedback:

  • Consider refactoring the conformance and stress tests into table-driven or parameterized tests instead of generating dozens of near-duplicate functions to reduce boilerplate and ease future maintenance.
  • The handles_chunked_input flag in ParserCore adds branching for chunked vs non-chunked flows; splitting those behaviors into dedicated parser core implementations could simplify the control flow and make each path more self-contained.
  • The PushContentExtractor and escape-handling logic have grown highly intricate—breaking out escape processing, byte accumulation, and content extraction into smaller, well-documented modules would improve readability and maintainability.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the conformance and stress tests into table-driven or parameterized tests instead of generating dozens of near-duplicate functions to reduce boilerplate and ease future maintenance.
- The handles_chunked_input flag in ParserCore adds branching for chunked vs non-chunked flows; splitting those behaviors into dedicated parser core implementations could simplify the control flow and make each path more self-contained.
- The PushContentExtractor and escape-handling logic have grown highly intricate—breaking out escape processing, byte accumulation, and content extraction into smaller, well-documented modules would improve readability and maintainability.

## Individual Comments

### Comment 1
<location> `picojson/src/push_parser.rs:192` </location>
<code_context>
+
+// Implement From<ParseError> for common error types used in tests
+// This needs to be globally accessible for integration tests, not just unit tests
+impl From<ParseError> for () {
+    fn from(_: ParseError) -> Self {}
+}
</code_context>

<issue_to_address>
Empty implementation of From<ParseError> for () may hide errors.

This implementation discards error details, making debugging harder. If required, please document the reasoning.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request re-introduces the push parser. The implementation is thorough, and the addition of a comprehensive test suite, including extensive stress tests, is a fantastic contribution that builds confidence in the new functionality.

I've identified a few areas for improvement, primarily related to error handling in the new example, and a bug in one of the new tests. These are small refinements that will help improve the overall quality and correctness of the codebase. Great work!

let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer);

// Test: simple string followed by escaped string
parser.write(br#"{"simple": "esc\\n"}"#).unwrap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There's an issue in the test_buffer_isolation test case. The input JSON string "esc\n" is parsed as esc followed by a newline character. However, the test asserts that the result is esc followed by a literal backslash and n.

The comment // JSON "esc\\n" should become string "esc\n" suggests the intent was to test a literal backslash. To achieve this, the JSON string needs to be "esc\\n", which requires escaping the backslash in the Rust byte string.

Suggested change
parser.write(br#"{"simple": "esc\\n"}"#).unwrap();
parser.write(br#"{\"simple\": \"esc\\\\n\"}"#).unwrap();

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (5)
picojson/tests/input_buffer_full_test.rs (1)

1-2: Add SPDX license header for consistency.

Other test files in the project include the SPDX license header.

+// SPDX-License-Identifier: Apache-2.0
+
 // Test for InputBufferFull error variant
 use picojson::{ParseError, PullParser, StreamParser};
picojson/src/push_parser.rs (2)

107-119: Consider refactoring the special case handling for Numbers.

The nested conditions and special case for Number state handling adds complexity. The comment mentions "stale flag from previous Key processing" which suggests there might be a state management issue.

Consider extracting this logic into a separate method with clearer documentation about why Numbers need special handling:

-                // Special case: For Numbers, check if the scratch buffer is actually empty
-                // This handles the byte-by-byte case where the flag is stale from previous Key processing
-                if matches!(extractor_state, State::Number(_)) {
-                    let buffer_slice = self.extractor.get_unescaped_slice().unwrap_or(&[]);
-                    let buffer_empty = buffer_slice.is_empty();
-
-                    if buffer_empty {
-                        self.extractor.copy_partial_content_to_scratch()?;
-                    }
-                }
+                self.handle_number_boundary_case(&extractor_state)?;

191-194: Remove or document the generic unit conversion.

The From<ParseError> for () implementation seems like a convenience for tests but could be confusing in production code.

Either remove this if it's only for tests, or add clear documentation:

-// Implement From<ParseError> for common error types used in tests
-// This needs to be globally accessible for integration tests, not just unit tests
+#[cfg(test)]
 impl From<ParseError> for () {
     fn from(_: ParseError) -> Self {}
 }
picojson/tests/push_parser.rs (1)

283-324: Consider validating parsed output

While this test correctly validates that problematic lines from pass1.json parse without errors, it would be more robust to also verify the actual parsed content in the TraceHandler events.

picojson/tests/json_checker_tests.rs (1)

51-71: Error handling uses workaround for non-exported type

The implementation is correct, but using InvalidNumber as a placeholder for handler errors (line 61) is a workaround. Consider exposing a more appropriate error variant for handler errors in the public API.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31328b5 and d07229e.

📒 Files selected for processing (14)
  • picojson/examples/push_parser_demo.rs (1 hunks)
  • picojson/src/event_processor.rs (2 hunks)
  • picojson/src/lib.rs (1 hunks)
  • picojson/src/push_content_builder.rs (1 hunks)
  • picojson/src/push_parser.rs (1 hunks)
  • picojson/src/shared.rs (1 hunks)
  • picojson/src/stream_buffer.rs (1 hunks)
  • picojson/tests/input_buffer_full_test.rs (1 hunks)
  • picojson/tests/json_checker_tests.rs (5 hunks)
  • picojson/tests/push_parser.rs (1 hunks)
  • picojson/tests/push_parser_copy_on_escape.rs (1 hunks)
  • picojson/tests/push_parser_escapes.rs (1 hunks)
  • picojson/tests/push_parser_invalidslicebounds_repro.rs (1 hunks)
  • picojson/tests/push_parser_stress_test.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Applied to files:

  • picojson/src/stream_buffer.rs
  • picojson/src/lib.rs
  • picojson/tests/push_parser_escapes.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/input_buffer_full_test.rs
  • picojson/src/push_content_builder.rs
  • picojson/tests/json_checker_tests.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Applied to files:

  • picojson/src/stream_buffer.rs
  • picojson/src/lib.rs
  • picojson/tests/push_parser_escapes.rs
  • picojson/src/shared.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/json_checker_tests.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Applied to files:

  • picojson/src/lib.rs
  • picojson/tests/push_parser_escapes.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/input_buffer_full_test.rs
  • picojson/src/push_content_builder.rs
  • picojson/tests/json_checker_tests.rs
📚 Learning: in rust crate organization, functions can be imported into lib.rs via `use` statements not for direc...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.

Applied to files:

  • picojson/src/lib.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.

Applied to files:

  • picojson/tests/push_parser_escapes.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/input_buffer_full_test.rs
  • picojson/src/push_content_builder.rs
  • picojson/tests/json_checker_tests.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Applied to files:

  • picojson/tests/push_parser_escapes.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/input_buffer_full_test.rs
  • picojson/src/push_content_builder.rs
  • picojson/tests/json_checker_tests.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Applied to files:

  • picojson/tests/push_parser_escapes.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/input_buffer_full_test.rs
  • picojson/src/push_content_builder.rs
  • picojson/tests/json_checker_tests.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Applied to files:

  • picojson/tests/push_parser_escapes.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/input_buffer_full_test.rs
  • picojson/src/push_content_builder.rs
  • picojson/tests/json_checker_tests.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

Applied to files:

  • picojson/tests/push_parser_escapes.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/input_buffer_full_test.rs
  • picojson/src/push_content_builder.rs
  • picojson/tests/json_checker_tests.rs
🧬 Code Graph Analysis (2)
picojson/src/stream_buffer.rs (1)
picojson/src/push_content_builder.rs (1)
  • truncate_unescaped_by (130-132)
picojson/tests/json_checker_tests.rs (5)
picojson/examples/push_parser_demo.rs (4)
  • handle_event (25-65)
  • parser (111-112)
  • parser (119-120)
  • new (12-17)
picojson/tests/push_parser.rs (22)
  • handle_event (12-14)
  • handle_event (36-49)
  • handle_event (72-82)
  • handle_event (111-123)
  • handle_event (157-168)
  • handle_event (202-213)
  • handle_event (247-257)
  • handle_event (291-302)
  • handle_event (334-346)
  • handle_event (431-442)
  • handle_event (467-482)
  • parser (25-25)
  • parser (58-58)
  • parser (93-93)
  • parser (132-132)
  • parser (177-177)
  • parser (222-222)
  • parser (266-266)
  • parser (496-496)
  • parser (552-552)
  • parser (582-582)
  • parser (688-688)
picojson/src/push_parser.rs (1)
  • new (40-46)
picojson/src/stream_content_builder.rs (1)
  • new (30-39)
picojson/src/chunk_reader.rs (1)
  • full_slice (113-119)
🔇 Additional comments (47)
picojson/src/stream_buffer.rs (1)

258-261: LGTM! Safe buffer truncation implementation.

The method correctly uses saturating_sub to prevent underflow and follows the defensive programming patterns established in this codebase. The implementation is simple, safe, and serves the push parser's content extraction needs effectively.

picojson/src/shared.rs (1)

248-254: LGTM! Clean abstraction for uniform byte access.

The method provides a clean abstraction for accessing underlying bytes regardless of whether content comes from input or scratch buffers. The lifetime annotation is correct and the implementation follows the established pattern of the ContentPiece enum.

picojson/src/lib.rs (1)

101-105: LGTM! Well-structured API extension.

The push parser API additions are properly integrated with the existing module structure and follow established patterns. The exports provide a clean interface for the new SAX-style parsing functionality.

picojson/examples/push_parser_demo.rs (1)

1-131: Excellent demonstration of push parser capabilities!

This example effectively showcases the push parser's SAX-style event handling with clean, educational code. The incremental chunk feeding clearly demonstrates streaming capabilities, error handling follows best practices, and the emoji-enhanced output makes it engaging and easy to follow.

picojson/tests/push_parser_escapes.rs (1)

1-178: Comprehensive escape sequence test coverage!

These tests provide excellent coverage of escape sequence handling, properly distinguishing between literal backslashes and actual escape sequences. The test cases cover all major escape types (basic, quotes, Unicode) and test both keys and values. The EventCollector pattern makes assertions clean and maintainable.

picojson/src/event_processor.rs (3)

25-28: LGTM! Clear documentation for the new field.

The documentation clearly explains the purpose of handles_chunked_input and its effect on parser behavior.


31-49: Well-structured constructors with clear purpose.

The two constructors clearly differentiate between chunked and non-chunked parser initialization. Good use of descriptive names.


104-120: Correct handling of chunked vs non-chunked parser behavior.

The conditional logic properly differentiates between parser types at end of input. For chunked parsers, returning EndOfData allows proper chunk boundary handling, while non-chunked parsers correctly finalize the document.

picojson/tests/push_parser_copy_on_escape.rs (3)

8-52: Well-structured test for borrowed string optimization.

The test correctly verifies that strings without escape sequences are returned as borrowed references, which is an important performance optimization.


158-165: Correct validation of JSON escape sequence decoding.

The test and comment correctly explain that JSON "esc\\n" (double backslash) should decode to the string "esc\n" (literal backslash + n), not a newline character. This is the expected JSON behavior.


84-84: Escape sequences are correct and consistent—LGTM.

The use of \\n and \\t in the raw byte string literal matches other parser tests (e.g. push_parser.rs), confirming that JSON escape sequences are formatted properly across the codebase. Approved.

picojson/tests/input_buffer_full_test.rs (1)

40-86: Good test coverage for buffer overflow scenarios.

The test correctly validates that the parser reports appropriate buffer overflow errors when processing large JSON documents with insufficient buffer space. The error handling for both InputBufferFull and ScratchBufferFull is appropriate.

picojson/tests/push_parser_invalidslicebounds_repro.rs (2)

8-66: Excellent test handler design for event validation.

The ReproHandler provides thorough validation by comparing events against expected sequences and ensuring all events are received. The event_to_string method provides clear event representation for debugging.


125-131: Good chunked parsing test strategy.

Writing in small 8-byte chunks effectively stress-tests the parser's buffer boundary handling, which is crucial for validating the fix for InvalidSliceBounds errors.

picojson/src/push_parser.rs (2)

23-33: Well-designed push parser structure.

The generic parameters are clearly documented, and the structure properly separates concerns between content extraction, event handling, and core parsing logic.


69-71: Correct use of always_accumulate_during_escapes flag.

Setting this flag to true ensures that all hex digits in Unicode escape sequences are properly accumulated, which is essential for correct escape handling in chunked parsing.

picojson/tests/push_parser.rs (13)

17-26: LGTM!

Clean test that validates basic PushParser compilation and usage without lifetime issues.


28-62: LGTM!

Well-structured test validating HRTB pattern integration with tokenizer. The event count assertion correctly expects 2 events (String + EndDocument).


64-101: LGTM!

Correctly tests string extraction with proper lifetime handling using to_owned().


103-147: LGTM!

Comprehensive test for object parsing that correctly validates the complete event sequence including keys and values.


149-192: LGTM!

Correctly tests simple escape sequence processing, verifying that \n in JSON is converted to an actual newline character.


194-237: LGTM!

Properly tests Unicode escape processing, correctly expecting \u0041 to be converted to 'A'.


239-281: LGTM!

Excellent test for consecutive Unicode escapes. The conversion of \uCAFE\uBABE to Korean characters is correct and tests an important edge case.


326-402: LGTM!

Good comprehensive test with proper feature-gated sections for float support. The larger buffer size (2048) is appropriate for the complex JSON structure.


404-457: LGTM!

Excellent test comparing SliceParser and PushParser behavior for empty keys, ensuring consistent handling across parser types.


459-526: LGTM!

Comprehensive number parsing test with proper feature gating for float support. The expected event sequences correctly match the conditional JSON input.


623-645: Remove redundant imports

The imports on line 625 are redundant.

 #[test]
 fn test_invalid_unicode_escape_in_key() {
-    use picojson::{DefaultConfig, PushParser, PushParserHandler};
-
     struct Handler;
⛔ Skipped due to learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

698-723: Remove redundant imports

The imports on line 700 are redundant.

 #[test]
 fn test_invalid_escape_sequences_in_keys() {
-    use picojson::{DefaultConfig, PushParser, PushParserHandler};
-
     struct Handler;
⛔ Skipped due to learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

647-696: Remove redundant imports

The imports on line 649 are redundant except for String enum which is not imported at module level.

 #[test]
 fn test_mixed_borrowed_and_unescaped_strings() {
-    use picojson::{DefaultConfig, Event, PushParser, PushParserHandler, String};
+    use picojson::String;
 
     struct Handler {
         events: Vec<std::string::String>,
     }
⛔ Skipped due to learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
picojson/tests/json_checker_tests.rs (4)

17-20: LGTM!

Clean imports and minimal handler implementation appropriate for conformance testing.

Also applies to: 39-49


73-89: LGTM!

Clean StreamParser test implementation using ChunkReader::full_slice for in-memory testing.


144-226: LGTM!

Comprehensive conformance tests for PushParser and StreamParser that correctly mirror the existing SliceParser tests.


265-290: LGTM!

Fail tests and known deviations correctly extend coverage to PushParser, maintaining consistency with SliceParser behavior.

Also applies to: 316-335

picojson/tests/push_parser_stress_test.rs (6)

13-114: LGTM!

Well-designed event handling infrastructure with OwnedEvent for lifetime-safe comparisons and immediate validation in StressTestHandler.


116-161: LGTM!

ChunkedWriter provides excellent control over data delivery patterns for comprehensive stress testing.


172-307: LGTM!

Excellent test scenario coverage including edge cases like empty strings, Unicode escapes, deeply nested structures, and buffer boundary conditions.


337-405: LGTM!

Sophisticated buffer size calculation that correctly accounts for copy-on-escape optimization and chunk boundary effects.


407-533: LGTM!

Comprehensive stress tests with good coverage of buffer sizes and chunk patterns. The pragmatic handling of copy-on-escape optimization allowing smaller buffers than expected is appropriate.


535-647: LGTM!

Excellent edge case coverage for Unicode handling and document validation. The invalid JSON scenarios appropriately test error detection.

picojson/src/push_content_builder.rs (8)

11-21: LGTM!

Well-designed trait for SAX-style event handling with proper lifetime separation between input data and scratch buffer.


23-68: LGTM!

Comprehensive state tracking structure with proper initialization. The separation of concerns between position tracking, buffer management, and escape processing is clear.


70-138: LGTM!

Clean utility methods with clear single responsibilities for state management.


144-251: LGTM!

Sophisticated escape processing with proper state management. The selective byte accumulation logic correctly handles transitions between normal content and escape sequences.


259-344: LGTM!

Correct ContentExtractor implementation handling both borrowed and unescaped content paths. The position adjustments for PushParser are properly documented.


372-411: LGTM!

Excellent copy-on-escape optimization implementation with proper bounds checking and error handling.


413-452: LGTM!

Correct handling of tokenizer's Unicode escape processing behavior. The critical comment at line 420 properly documents why the first hex digit needs special handling.


455-536: LGTM!

Robust DataSource implementation with correct position translation and chunk boundary handling. The state-based content copying logic properly handles all token types.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
picojson/src/push_parser.rs (1)

143-145: Use the existing From implementation for error construction.

The error is constructed with a fully qualified path when it could use the From implementation.

-            return Err(crate::push_parser::PushParseError::Parse(
-                ParseError::EndOfData,
-            ));
+            return Err(ParseError::EndOfData.into());
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d07229e and 7cd6d6d.

📒 Files selected for processing (9)
  • picojson/examples/push_parser_demo.rs (1 hunks)
  • picojson/src/lib.rs (1 hunks)
  • picojson/src/push_parser.rs (1 hunks)
  • picojson/tests/json_checker_tests.rs (5 hunks)
  • picojson/tests/push_parser.rs (1 hunks)
  • picojson/tests/push_parser_copy_on_escape.rs (1 hunks)
  • picojson/tests/push_parser_escapes.rs (1 hunks)
  • picojson/tests/push_parser_invalidslicebounds_repro.rs (1 hunks)
  • picojson/tests/push_parser_stress_test.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • picojson/src/lib.rs
  • picojson/examples/push_parser_demo.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/push_parser.rs
  • picojson/tests/push_parser_escapes.rs
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

Applied to files:

  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/push_parser.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (11)
picojson/src/push_parser.rs (3)

23-46: Well-structured push parser implementation.

The generic parameters and lifetime annotations are appropriately designed. The use of ParserCore::new_chunked() correctly distinguishes this parser from non-chunked variants.


157-188: Clean error handling design.

The error enum properly distinguishes between parser and handler errors, with appropriate From implementations for error propagation.


109-118: Ignore the Number‐state workaround suggestion; the unescaped‐content flag is already managed correctly.

The extractor’s has_unescaped_content() flag is cleared via apply_unescaped_reset_if_queued() at the start of each chunk and reset to false when the scratch buffer is cleared in PushContentBuilder. Since numeric tokens don’t generate escapes, the special‐case in push_parser.rs simply detects an empty scratch buffer for chunked numbers and is necessary to handle partial numeric content spanning buffers. No additional flag resets are needed here.

Likely an incorrect or invalid review comment.

picojson/tests/json_checker_tests.rs (3)

39-49: Clean test handler implementation.

The ConformanceTestHandler appropriately focuses on event counting for conformance validation.


51-85: Well-structured test runner functions.

Both run_push_parser_test and run_stream_parser_test follow consistent patterns and properly handle error cases.


140-331: Comprehensive conformance test coverage.

The additions properly extend the test suite to cover PushParser and StreamParser, maintaining consistency with the existing SliceParser tests. The macro usage for fail tests is efficient and maintainable.

picojson/tests/push_parser_stress_test.rs (5)

13-114: Robust test infrastructure design.

The owned event representation and handler implementations provide a solid foundation for stress testing. The immediate validation in StressTestHandler helps pinpoint failures quickly.


116-161: Well-implemented chunked data delivery.

The ChunkedWriter provides excellent control over input chunking patterns for stress testing parser robustness.


172-307: Comprehensive test scenario coverage.

The scenarios effectively cover edge cases including empty strings, long values, deep nesting, Unicode escapes, and complex structures. The documented minimum buffer sizes provide valuable insights into parser requirements.


339-408: Intelligent buffer size calculation logic.

The functions correctly account for copy-on-escape optimization and special requirements for escape processing scenarios. The differentiation based on chunk patterns is appropriate.


410-650: Excellent stress test coverage.

The stress tests systematically validate parser behavior across:

  • Various buffer sizes (1-50 bytes)
  • Multiple chunk patterns (byte-by-byte to large chunks)
  • Unicode edge cases including chunk boundary splits
  • Invalid document scenarios

The tests appropriately handle cases where copy-on-escape optimization performs better than expected (line 524).

@kaidokert kaidokert closed this Aug 4, 2025
@kaidokert kaidokert deleted the push_full3 branch August 4, 2025 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant