
Conversation

@kaidokert kaidokert commented Jul 30, 2025

Summary by Sourcery

Introduce a DataSource abstraction to unify borrowed and unescaped JSON content extraction, refactor slice and stream content builders to leverage the new interface, and extend CopyOnEscape to support scratch buffer access.

Enhancements:

  • Add DataSource trait, ContentPiece enum, and get_content_piece helper to abstract content source handling
  • Refactor SliceContentBuilder to use DataSource for string, key, and number extraction
  • Refactor StreamContentBuilder to use DataSource for string and key extraction and remove custom helpers
  • Implement DataSource for SliceContentBuilder and StreamContentBuilder to unify access to input and scratch buffers
  • Extend CopyOnEscape with methods to expose unescaped content availability and scratch buffer slicing
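
The trait, enum, and helper listed above can be sketched as follows. This is a simplified, hypothetical reconstruction from the summary: the real picojson signatures carry explicit `'input`/`'scratch` lifetimes and a richer `ParseError`, and `InMemorySource` is an invented toy implementor used only to exercise the helper.

```rust
#[derive(Debug)]
struct ParseError;

/// Content either borrowed from the original input or taken from the
/// scratch buffer after escape processing.
enum ContentPiece<'a> {
    Input(&'a [u8]),
    Scratch(&'a [u8]),
}

/// Unified access to raw input bytes and unescaped scratch content.
trait DataSource {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> Result<&[u8], ParseError>;
    fn get_unescaped_slice(&self) -> Result<&[u8], ParseError>;
    fn has_unescaped_content(&self) -> bool;
}

/// Pick the scratch buffer when escapes were processed; otherwise borrow
/// the span [start_pos, current_pos) straight from the input.
fn get_content_piece<D: DataSource>(
    source: &D,
    start_pos: usize,
    current_pos: usize,
) -> Result<ContentPiece<'_>, ParseError> {
    if source.has_unescaped_content() {
        source.get_unescaped_slice().map(ContentPiece::Scratch)
    } else {
        source
            .get_borrowed_slice(start_pos, current_pos)
            .map(ContentPiece::Input)
    }
}

// Toy implementor, invented for this demo only.
struct InMemorySource {
    input: Vec<u8>,
    scratch: Vec<u8>,
    escaped: bool,
}

impl DataSource for InMemorySource {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> Result<&[u8], ParseError> {
        self.input.get(start..end).ok_or(ParseError) // bounds-checked, no panics
    }
    fn get_unescaped_slice(&self) -> Result<&[u8], ParseError> {
        Ok(&self.scratch)
    }
    fn has_unescaped_content(&self) -> bool {
        self.escaped
    }
}

fn main() {
    // "abc" has no escapes: borrow it from the input between the quotes.
    let plain = InMemorySource {
        input: b"\"abc\"".to_vec(),
        scratch: Vec::new(),
        escaped: false,
    };
    match get_content_piece(&plain, 1, 4).unwrap() {
        ContentPiece::Input(b) => assert_eq!(b, &b"abc"[..]),
        _ => panic!("expected borrowed content"),
    }
    // An out-of-range span becomes an error instead of a panic.
    assert!(get_content_piece(&plain, 3, 99).is_err());
}
```

The design point is that callers never branch on escape state themselves; the helper owns the borrowed-vs-scratch decision for every builder that implements the trait.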

Summary by CodeRabbit

  • New Features

    • Introduced a trait-based abstraction for content extraction, enabling more flexible and robust handling of JSON input, including chunked and incremental parsing.
    • Added a new PushContentExtractor component to support efficient event-driven, chunked JSON parsing.
    • Exposed new methods for accessing and managing unescaped content and Unicode escapes.
  • Refactor

    • Centralized and unified content retrieval logic across parsers using shared helper utilities and a new DataSource trait.
    • Simplified buffer and escape sequence management by delegating responsibilities to dedicated components.
    • Streamlined parser finalization and handler retrieval, reducing boilerplate in both library code and tests.
    • Removed manual state and buffer management from the push parser, delegating to the new content extractor.
    • Updated escape processing to use trait-based interfaces for improved modularity.
  • Bug Fixes

    • Improved handling of edge cases in content range calculations to prevent invalid slice bounds.
  • Tests

    • Updated test suites to use the new parser finalization approach and improved test data consistency.
  • Documentation

    • Enhanced code documentation and clarified public API changes for new and refactored components.


sourcery-ai bot commented Jul 30, 2025

Reviewer's Guide

This PR introduces a generic content data source abstraction to unify raw and unescaped JSON content extraction across different parser implementations, refactors existing builders to leverage this trait, and exposes scratch buffer access in CopyOnEscape.

Class diagram for new DataSource trait and related types

classDiagram
    class DataSource {
        <<trait>>
        +get_borrowed_slice(start: usize, end: usize) Result<&[u8], ParseError>
        +get_unescaped_slice() Result<&[u8], ParseError>
        +has_unescaped_content() bool
    }
    class ContentPiece {
        <<enum>>
        +Input(&[u8])
        +Scratch(&[u8])
        +to_string() Result<String, ParseError>
    }
    class SliceContentBuilder {
        +get_borrowed_slice(start, end)
        +get_unescaped_slice()
        +has_unescaped_content()
    }
    class StreamContentBuilder {
        +get_borrowed_slice(start, end)
        +get_unescaped_slice()
        +has_unescaped_content()
    }
    DataSource <|.. SliceContentBuilder
    DataSource <|.. StreamContentBuilder
    class CopyOnEscape {
        +has_unescaped_content() bool
        +get_scratch_buffer_slice(start, end) Result<&[u8], ParseError>
        +get_scratch_range() (usize, usize)
    }
    SliceContentBuilder o-- CopyOnEscape
    class StreamBuffer {
        +get_string_slice(start, end) Result<&[u8], ParseError>
        +get_unescaped_slice() Result<&[u8], ParseError>
        +has_unescaped_content() bool
    }
    StreamContentBuilder o-- StreamBuffer

Class diagram for get_content_piece helper and ContentExtractor refactor

classDiagram
    class get_content_piece {
        +get_content_piece(source, start_pos, current_pos) Result<ContentPiece, ParseError>
    }
    class ContentExtractor {
        <<trait>>
        +extract_string_content(start_pos)
        +extract_key_content(start_pos)
        +extract_number(start_pos, from_container_end, finished)
    }
    class Event {
        +String(String)
        +Key(String)
        +Number(JsonNumber)
    }
    ContentExtractor <|.. SliceContentBuilder
    ContentExtractor <|.. StreamContentBuilder
    get_content_piece ..> DataSource : uses
    SliceContentBuilder ..> get_content_piece : uses
    StreamContentBuilder ..> get_content_piece : uses
    Event <.. ContentPiece : produces

File-Level Changes

Change Details Files
Add DataSource trait and ContentPiece enum with helper in shared module
  • Define DataSource trait with methods for borrowed and unescaped content
  • Introduce ContentPiece enum and to_string conversion
  • Implement get_content_piece helper to select between borrowed or scratch slices
picojson/src/shared.rs
Refactor SliceContentBuilder to use DataSource abstraction
  • Update extract_string_content, extract_key_content, extract_number to call get_content_piece
  • Remove manual escape completion logic and bespoke number slicing
  • Implement DataSource trait for SliceContentBuilder
picojson/src/slice_content_builder.rs
Refactor StreamContentBuilder to leverage DataSource trait
  • Drop custom create_unescaped_string/create_borrowed_string helpers
  • Use get_content_piece in extract_string_content and extract_key_content
  • Add DataSource implementation and import adjustments
picojson/src/stream_content_builder.rs
Expose scratch buffer access in CopyOnEscape for DataSource support
  • Add has_unescaped_content, get_scratch_buffer_slice, and get_scratch_range methods
picojson/src/copy_on_escape.rs
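
The three CopyOnEscape accessors named above might look roughly like this. `ScratchState` is a stand-in for the real struct, which tracks much more escape state; only the shape of the accessors (bounds-checked slicing, a boolean query, and a range getter) is taken from the PR summary.

```rust
#[derive(Debug)]
struct ParseError;

// Stand-in for CopyOnEscape's scratch-buffer bookkeeping.
struct ScratchState {
    scratch: Vec<u8>,   // unescaped bytes written so far
    start: usize,       // start of the current string within the scratch buffer
    used_scratch: bool, // did the current string contain any escapes?
}

impl ScratchState {
    /// True when the current string required escape processing, so its
    /// content lives in the scratch buffer rather than the input.
    fn has_unescaped_content(&self) -> bool {
        self.used_scratch
    }

    /// Bounds-checked slicing: invalid ranges become errors, not panics.
    fn get_scratch_buffer_slice(&self, start: usize, end: usize) -> Result<&[u8], ParseError> {
        self.scratch.get(start..end).ok_or(ParseError)
    }

    /// The (start, end) range of the current string in the scratch buffer.
    fn get_scratch_range(&self) -> (usize, usize) {
        (self.start, self.scratch.len())
    }
}

fn main() {
    let s = ScratchState { scratch: b"a\nb".to_vec(), start: 0, used_scratch: true };
    assert!(s.has_unescaped_content());
    let (lo, hi) = s.get_scratch_range();
    assert_eq!(s.get_scratch_buffer_slice(lo, hi).unwrap(), &b"a\nb"[..]);
    assert!(s.get_scratch_buffer_slice(0, 99).is_err()); // invalid bounds -> error
}
```

Exposing exactly these three methods is what lets a `DataSource` implementation delegate `get_unescaped_slice` and `has_unescaped_content` to the escape handler without reaching into its internals.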



coderabbitai bot commented Jul 30, 2025

Walkthrough

This change introduces a significant refactor to the picojson parser, especially its push (incremental) parsing infrastructure. It adds a new PushContentExtractor abstraction, a DataSource trait for unified content access, and centralizes escape and content extraction logic. Many parser state and buffer management responsibilities are moved out of PushParser into this new component. Associated tests and examples are updated to use the new APIs and finalization patterns.

Changes

Cohort / File(s) Change Summary
Push Parser Refactor & Content Extraction
picojson/src/push_parser.rs, picojson/src/push_content_builder.rs, picojson/src/lib.rs, picojson/src/stream_content_builder.rs, picojson/src/slice_content_builder.rs, picojson/src/shared.rs, picojson/src/event_processor.rs, picojson/src/escape_processor.rs, picojson/src/copy_on_escape.rs
Major refactor of push parser: introduces PushContentExtractor, moves content extraction and escape logic into this new component, implements DataSource for unified content access, modifies PushParser to delegate to extractor, updates event processor for chunked input, and exposes new/updated public methods and traits for buffer and escape management.
Tests & Examples Update
picojson/tests/push_parser.rs, picojson/tests/push_parser_escapes.rs, picojson/tests/push_parser_copy_on_escape.rs, picojson/tests/push_parser_invalidslicebounds_repro.rs, picojson/tests/push_parser_stress_test.rs, picojson/tests/json_checker_tests.rs, picojson/examples/push_parser_demo.rs
Updates tests and examples to use new parser finalization pattern (finish() returns handler), removes explicit destroy() calls, adapts to new chunked parsing and handler extraction, and adjusts test data and assertions where needed.
CopyOnEscape Enhancements
picojson/src/copy_on_escape.rs
Added three new public methods to CopyOnEscape to expose internal scratch buffer state and usage: checking if unescaped content exists, accessing a slice of the scratch buffer safely, and retrieving the current scratch buffer range.
Shared Content Abstraction
picojson/src/shared.rs
Introduced DataSource trait for abstracting JSON content retrieval, added ContentPiece enum for representing content from input or scratch buffers, updated content range method to handle invalid ranges safely, and added a generic helper function to unify content extraction logic.
SliceContentBuilder Updates
picojson/src/slice_content_builder.rs
Revised content extraction methods to use the DataSource trait and shared helper function, improved lifetime and state handling for unescaped content, simplified number extraction, and implemented DataSource trait for SliceContentBuilder.
StreamContentBuilder Updates
picojson/src/stream_content_builder.rs
Removed private string construction helpers, replaced content extraction with calls to shared helper and DataSource trait methods, updated escape processing to use trait-based approach, and implemented DataSource trait for StreamContentBuilder.
Escape Processor Refactor
picojson/src/escape_processor.rs
Introduced a new type alias for Unicode escape results, added getter and setter for pending high surrogate in UnicodeEscapeCollector, refactored process_unicode_escape_sequence to use a temporary collector and DataSource trait for hex digit access, and updated return types accordingly.
Event Processor Enhancements
picojson/src/event_processor.rs
Added support for chunked input parsing via a new flag, introduced a new constructor for chunked parsers, modified core parsing loop to accumulate bytes during escape sequences when needed, and changed end-of-input handling to differentiate chunked from non-chunked parsers.
SliceInputBuffer Cleanup
picojson/src/slice_input_buffer.rs
Removed the data_len method from SliceInputBuffer.
Library Module Adjustments
picojson/src/lib.rs
Added push_content_builder as a public module and changed the re-export of PushParserHandler to come from this new module instead of push_parser.
Test Simplifications
picojson/tests/push_parser.rs, picojson/tests/push_parser_escapes.rs, picojson/tests/push_parser_copy_on_escape.rs, picojson/tests/push_parser_invalidslicebounds_repro.rs, picojson/tests/json_checker_tests.rs
Simplified parser finalization by replacing calls to parser.destroy() with a single call to parser.finish() returning the handler, streamlining test code without changing logic or assertions.
PushParser Stress Test Updates
picojson/tests/push_parser_stress_test.rs
Renamed and changed signature of ChunkedWriter method to take ownership of parser and return handler, updated test JSON literals and expected values for Unicode escapes, adjusted buffer size parameters, and adapted test calls to new parser API.
Example Formatting Fixes
picojson/examples/push_parser_demo.rs
Adjusted indentation of lines in the main function for better code style and consolidated parser finishing and handler retrieval into a single statement.
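
The finalization change described in the test rows above, a consuming `finish()` that returns the handler and replaces a separate `destroy()` call, can be sketched with stand-in types; the real `PushParser` is generic over its handler and performs actual parsing in `write()`.

```rust
// Stand-in handler that just records what it saw.
struct Handler {
    events: Vec<String>,
}

// Stand-in for the push parser; `complete` models document-completion state.
struct PushParser {
    handler: Handler,
    complete: bool,
}

impl PushParser {
    fn new(handler: Handler) -> Self {
        PushParser { handler, complete: false }
    }

    fn write(&mut self, chunk: &str) {
        // Real parsing elided; just record that a chunk arrived.
        self.handler.events.push(format!("chunk:{}", chunk));
        self.complete = true;
    }

    /// Consumes the parser, checks completion, and hands the handler back,
    /// so callers no longer need a separate destroy() step.
    fn finish(self) -> Result<Handler, &'static str> {
        if self.complete {
            Ok(self.handler)
        } else {
            Err("incomplete document")
        }
    }
}

fn main() {
    let mut parser = PushParser::new(Handler { events: Vec::new() });
    parser.write("{}");
    let handler = parser.finish().expect("document should be complete");
    assert_eq!(handler.events.len(), 1);
}
```

Because `finish(self)` takes ownership, the borrow checker guarantees the parser cannot be written to after finalization, which is what makes the one-line test pattern safe.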

Sequence Diagram(s)

PushParser Event Flow with PushContentExtractor

sequenceDiagram
    participant User
    participant PushParser
    participant PushContentExtractor
    participant ParserCore
    participant Handler

    User->>PushParser: write(data)
    PushParser->>PushContentExtractor: provide chunk
    PushContentExtractor->>ParserCore: next_event_impl_with_flags
    ParserCore->>PushContentExtractor: request byte, accumulate, handle escapes
    PushContentExtractor->>ParserCore: provide bytes, manage escape state
    ParserCore->>PushContentExtractor: emit Event
    PushContentExtractor->>Handler: handle_event(Event)
    User->>PushParser: finish()
    PushParser->>PushContentExtractor: finalize, check completion
    PushParser->>Handler: emit EndDocument
    PushParser-->>User: return Handler

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

In fields of bytes, a parser hops,
Refactored now, it never stops!
With chunked escapes and buffers neat,
The PushContentExtractor can’t be beat.
Rabbits dance—no more destroy,
Just finish, hop, and parse with joy!
🐇✨


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a fundamental abstraction for content extraction within the picojson library by defining a DataSource trait. This trait standardizes how different JSON parser implementations access both raw input data and processed, unescaped content. The change centralizes content retrieval logic, leading to more consistent, maintainable, and less duplicated code across the parsing components.

Highlights

  • New Abstraction: DataSource Trait: Introduced a new DataSource trait in picojson/src/shared.rs to provide a unified interface for accessing JSON content. This trait abstracts whether content is borrowed directly from the input or retrieved from a temporary scratch buffer after unescaping.
  • Unified Content Extraction Logic: Added a generic helper function get_content_piece in picojson/src/shared.rs that leverages the DataSource trait. This function centralizes the logic for determining and retrieving the correct content (borrowed or unescaped) for strings and keys, reducing duplication across different parser implementations.
  • Adaptation of Content Builders: The SliceContentBuilder (picojson/src/slice_content_builder.rs) and StreamContentBuilder (picojson/src/stream_content_builder.rs) have been updated to implement the new DataSource trait. Their extract_string_content, extract_key_content, and extract_number methods now utilize this new abstraction, streamlining content retrieval.
  • Enhanced CopyOnEscape: The CopyOnEscape struct in picojson/src/copy_on_escape.rs gained new public methods (has_unescaped_content, get_scratch_buffer_slice, get_scratch_range) to expose its internal state and scratch buffer. This enables CopyOnEscape to be seamlessly integrated with the new DataSource trait for slice-based parsing.

@sourcery-ai sourcery-ai bot left a comment


Hey @kaidokert - I've reviewed your changes - here's some feedback:

  • Rename ContentPiece::to_string to something like into_json_string or into_string to avoid shadowing the standard to_string and make it clear it returns the custom String enum.
  • Consider simplifying the DataSource trait signatures by removing the explicit &'input self/&'scratch self receivers and relying on the return lifetimes, which would reduce complexity and improve readability.

## Individual Comments

### Comment 1
<location> `picojson/src/slice_content_builder.rs:74` </location>
<code_context>
+    fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
+        // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
+        let current_pos = self.current_position();
+        if self.has_unescaped_content() {
+            let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
+            self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
</code_context>

<issue_to_address>
Redundant CopyOnEscape processing if get_content_piece also checks for unescaped content.

Consolidate CopyOnEscape handling to a single location to prevent redundant processing and potential state inconsistencies.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
        let current_pos = self.current_position();
        if self.has_unescaped_content() {
            let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
            self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
        }

        // Use the unified helper function to get the content
        let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
        Ok(Event::String(content_piece.to_string()?))
=======
        let current_pos = self.current_position();

        // Use the unified helper function to get the content (including CopyOnEscape handling)
        let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
        Ok(Event::String(content_piece.to_string()?))
>>>>>>> REPLACE

</suggested_fix>



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a DataSource trait to abstract away content extraction from either the input or a scratch buffer. This is a great refactoring that helps unify the logic between SliceContentBuilder and StreamContentBuilder.

My review includes a couple of suggestions:

  • In picojson/src/shared.rs, the ContentPiece::to_string method can be made more concise and idiomatic.
  • In picojson/src/slice_content_builder.rs, the refactored extract_string_content and extract_key_content methods have become less efficient. I've suggested reverting them to a simpler form that avoids redundant work, as the CopyOnEscape helper already provides the necessary logic.

Overall, the abstraction is a good addition, and with these changes, the implementation will be even better.

Comment on lines 71 to 82

    fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
        // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
        let current_pos = self.current_position();
        if self.has_unescaped_content() {
            let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
            self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
        }

        // Use the unified helper function to get the content
        let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
        Ok(Event::String(content_piece.to_string()?))
    }


Severity: high

This new implementation is less efficient than the previous version. When has_unescaped_content() is true, it calls self.copy_on_escape.end_string(end_pos)?, which creates a String that is immediately discarded. Then, get_content_piece is called to get the content again and create another String.

The copy_on_escape.end_string() method already encapsulates the logic of choosing between a borrowed or unescaped string, which is what get_content_piece aims to abstract. For SliceContentBuilder, it's more direct and efficient to just use the result of end_string().

I suggest reverting to a simpler implementation similar to the one before this change.

    fn extract_string_content(&mut self, _start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
        let end_pos = ContentRange::end_position_excluding_delimiter(self.current_position());
        let value_result = self.copy_on_escape.end_string(end_pos)?;
        Ok(Event::String(value_result))
    }

Comment on lines 84 to 95

    fn extract_key_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
        // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
        let current_pos = self.current_position();
        if self.has_unescaped_content() {
            let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
            self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
        }

        // Use the unified helper function to get the content
        let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
        Ok(Event::Key(content_piece.to_string()?))
    }


Severity: high

Similar to extract_string_content, this implementation is inefficient due to the redundant creation of a String instance. The call to self.copy_on_escape.end_string(end_pos)? creates a String that is discarded, and then get_content_piece re-processes to create it again.

It would be more efficient to use the result of end_string() directly, as it already handles both borrowed and unescaped cases correctly.

    fn extract_key_content(&mut self, _start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
        let end_pos = ContentRange::end_position_excluding_delimiter(self.current_position());
        let key_result = self.copy_on_escape.end_string(end_pos)?;
        Ok(Event::Key(key_result))
    }

Comment on lines 224 to 235

    pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
        match self {
            ContentPiece::Input(bytes) => {
                let content_str = from_utf8(bytes)?;
                Ok(String::Borrowed(content_str))
            }
            ContentPiece::Scratch(bytes) => {
                let content_str = from_utf8(bytes)?;
                Ok(String::Unescaped(content_str))
            }
        }
    }


Severity: medium

This method can be made more concise. The logic inside both match arms is very similar. You can use .map() to transform the Result from from_utf8 directly into the desired String variant. This will make the code more idiomatic and reduce duplication.

    pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
        match self {
            ContentPiece::Input(bytes) => from_utf8(bytes).map(String::Borrowed),
            ContentPiece::Scratch(bytes) => from_utf8(bytes).map(String::Unescaped),
        }
    }
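
As a standalone check of the map-based shape, here is a compilable sketch with stand-ins: `JsonString`, `ParseError`, and a crate-style `from_utf8` wrapper are invented for illustration, and the method is named `into_string` following the rename suggested in the Sourcery review.

```rust
#[derive(Debug, PartialEq)]
enum JsonString<'a> {
    Borrowed(&'a str),  // zero-copy, straight from the input
    Unescaped(&'a str), // from the scratch buffer after escape processing
}

#[derive(Debug, PartialEq)]
struct ParseError;

// Crate-style wrapper so the match arms can return ParseError directly.
fn from_utf8(bytes: &[u8]) -> Result<&str, ParseError> {
    core::str::from_utf8(bytes).map_err(|_| ParseError)
}

enum ContentPiece<'a> {
    Input(&'a [u8]),
    Scratch(&'a [u8]),
}

impl<'a> ContentPiece<'a> {
    /// The map-based form: each arm converts bytes to str and wraps the
    /// result in the matching JsonString variant.
    fn into_string(self) -> Result<JsonString<'a>, ParseError> {
        match self {
            ContentPiece::Input(bytes) => from_utf8(bytes).map(JsonString::Borrowed),
            ContentPiece::Scratch(bytes) => from_utf8(bytes).map(JsonString::Unescaped),
        }
    }
}

fn main() {
    assert_eq!(ContentPiece::Input(b"abc").into_string(), Ok(JsonString::Borrowed("abc")));
    assert_eq!(ContentPiece::Scratch(b"a\nb").into_string(), Ok(JsonString::Unescaped("a\nb")));
    // Invalid UTF-8 surfaces as a parse error rather than a panic.
    assert!(ContentPiece::Input(&[0xFF]).into_string().is_err());
}
```

Note this sketch assumes the crate's `from_utf8` already yields `ParseError`; if it returns `core::str::Utf8Error` instead, the `.map()` form additionally needs a `map_err` (or the original `?`-based conversion).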


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
picojson/src/shared.rs (1)

231-242: Simplify the to_string method using map().

The logic in both match arms is identical and can be consolidated.

-    pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
-        match self {
-            ContentPiece::Input(bytes) => {
-                let content_str = from_utf8(bytes)?;
-                Ok(String::Borrowed(content_str))
-            }
-            ContentPiece::Scratch(bytes) => {
-                let content_str = from_utf8(bytes)?;
-                Ok(String::Unescaped(content_str))
-            }
-        }
-    }
+    pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
+        match self {
+            ContentPiece::Input(bytes) => from_utf8(bytes).map(String::Borrowed),
+            ContentPiece::Scratch(bytes) => from_utf8(bytes).map(String::Unescaped),
+        }
+    }
🧹 Nitpick comments (1)
picojson/src/slice_content_builder.rs (1)

71-82: Consider documenting why CopyOnEscape finalization is necessary here.

While this appears redundant (as noted in past reviews), the end_string() call is necessary to finalize the CopyOnEscape state before extracting content. Consider adding a comment explaining this two-step process to prevent future confusion.

     fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
         // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
         let current_pos = self.current_position();
         if self.has_unescaped_content() {
             let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
-            self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
+            // Note: end_string() must be called to finalize escape processing state,
+            // even though we discard its return value and re-extract via get_content_piece
+            self.copy_on_escape.end_string(end_pos)?;
         }

         // Use the unified helper function to get the content
         let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
         Ok(Event::String(content_piece.to_string()?))
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e602cb5 and 6293ac3.

📒 Files selected for processing (7)
  • picojson/examples/push_parser_demo.rs (2 hunks)
  • picojson/src/copy_on_escape.rs (1 hunks)
  • picojson/src/escape_processor.rs (3 hunks)
  • picojson/src/push_parser.rs (5 hunks)
  • picojson/src/shared.rs (1 hunks)
  • picojson/src/slice_content_builder.rs (4 hunks)
  • picojson/src/stream_content_builder.rs (5 hunks)
✅ Files skipped from review due to trivial changes (2)
  • picojson/examples/push_parser_demo.rs
  • picojson/src/copy_on_escape.rs
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.

Applied to files:

  • picojson/src/escape_processor.rs
  • picojson/src/push_parser.rs
  • picojson/src/slice_content_builder.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Applied to files:

  • picojson/src/escape_processor.rs
  • picojson/src/push_parser.rs
  • picojson/src/shared.rs
  • picojson/src/stream_content_builder.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Applied to files:

  • picojson/src/escape_processor.rs
  • picojson/src/push_parser.rs
  • picojson/src/shared.rs
  • picojson/src/stream_content_builder.rs
  • picojson/src/slice_content_builder.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Applied to files:

  • picojson/src/push_parser.rs
  • picojson/src/shared.rs
  • picojson/src/stream_content_builder.rs
  • picojson/src/slice_content_builder.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Applied to files:

  • picojson/src/push_parser.rs
  • picojson/src/shared.rs
  • picojson/src/stream_content_builder.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

Applied to files:

  • picojson/src/push_parser.rs
  • picojson/src/shared.rs
  • picojson/src/stream_content_builder.rs
  • picojson/src/slice_content_builder.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Applied to files:

  • picojson/src/push_parser.rs
  • picojson/src/stream_content_builder.rs
  • picojson/src/slice_content_builder.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Applied to files:

  • picojson/src/push_parser.rs
  • picojson/src/stream_content_builder.rs
  • picojson/src/slice_content_builder.rs
🧬 Code Graph Analysis (1)
picojson/src/shared.rs (5)
picojson/src/slice_content_builder.rs (3)
  • get_borrowed_slice (177-179)
  • get_unescaped_slice (181-192)
  • has_unescaped_content (194-196)
picojson/src/stream_content_builder.rs (3)
  • get_borrowed_slice (327-331)
  • get_unescaped_slice (333-335)
  • has_unescaped_content (337-339)
picojson/src/push_parser.rs (3)
  • get_borrowed_slice (54-70)
  • get_unescaped_slice (72-76)
  • has_unescaped_content (78-80)
picojson/src/stream_buffer.rs (2)
  • get_unescaped_slice (218-225)
  • has_unescaped_content (243-245)
picojson/src/copy_on_escape.rs (1)
  • has_unescaped_content (210-212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (13)
picojson/src/shared.rs (2)

174-213: Well-designed trait abstraction for unified content access.

The DataSource trait provides a clean and flexible interface for accessing both borrowed and unescaped content with proper lifetime management.


257-278: Clean implementation of the content extraction helper.

The function effectively consolidates the content extraction logic using the DataSource trait, providing a consistent interface across all parsers.
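To make the shape of this abstraction concrete, here is a minimal, self-contained sketch of the DataSource idea and a get_content_piece-style helper. The method names (get_borrowed_slice, get_unescaped_slice, has_unescaped_content) are taken from the review's code-graph summary; the signatures, the SliceSource type, and the use of Vec for the scratch buffer are simplifications for illustration, not picojson's actual API.

```rust
#[derive(Debug, PartialEq)]
enum ContentPiece<'a> {
    Input(&'a [u8]),   // borrowed directly from the input slice
    Scratch(&'a [u8]), // unescaped content from the scratch buffer
}

trait DataSource {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> &[u8];
    fn get_unescaped_slice(&self) -> &[u8];
    fn has_unescaped_content(&self) -> bool;
}

struct SliceSource<'a> {
    input: &'a [u8],
    scratch: Vec<u8>, // filled only when escape sequences were processed
}

impl DataSource for SliceSource<'_> {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> &[u8] {
        &self.input[start..end]
    }
    fn get_unescaped_slice(&self) -> &[u8] {
        &self.scratch
    }
    fn has_unescaped_content(&self) -> bool {
        !self.scratch.is_empty()
    }
}

// Unified helper in the spirit of get_content_piece: prefer the scratch
// buffer when escapes were unescaped, otherwise borrow from the input.
fn get_content_piece<'s>(src: &'s impl DataSource, start: usize, end: usize) -> ContentPiece<'s> {
    if src.has_unescaped_content() {
        ContentPiece::Scratch(src.get_unescaped_slice())
    } else {
        ContentPiece::Input(src.get_borrowed_slice(start, end))
    }
}

fn main() {
    // A string with no escapes borrows from the input (zero-copy).
    let plain = SliceSource { input: b"\"hello\"", scratch: Vec::new() };
    assert_eq!(get_content_piece(&plain, 1, 6), ContentPiece::Input(b"hello".as_slice()));

    // A string with escapes comes back from the scratch buffer.
    let escaped = SliceSource { input: b"\"a\\nb\"", scratch: b"a\nb".to_vec() };
    assert_eq!(get_content_piece(&escaped, 1, 5), ContentPiece::Scratch(b"a\nb".as_slice()));
    println!("ok");
}
```

The point of the trait is exactly what the review describes: slice, stream, and push builders each implement the same three methods, so one helper can serve all of them.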

picojson/src/escape_processor.rs (2)

270-278: LGTM! Clean accessor methods for surrogate state management.

These methods properly expose the pending high surrogate state for external management.


660-738: Good refactoring to use the DataSource trait abstraction.

The function now properly uses the DataSource trait for buffer access and externalizes surrogate state management, making it reusable across different parser implementations.

picojson/src/push_parser.rs (4)

41-81: Well-implemented DataSource adapter for push parser.

The PushParserDataSource correctly handles position conversion for chunked input and properly delegates to the stream buffer for unescaped content.


230-238: Correct usage of DataSource pattern in finish method.

The empty input data is appropriate here since we're only accessing the unescaped buffer content.


300-327: Clean refactoring using the unified content extraction pattern.

The code correctly handles position calculations and properly uses the get_content_piece helper for consistent string/key extraction.


352-374: Consistent application of the DataSource pattern for number extraction.

The unified approach using get_content_piece simplifies the code and maintains consistency across all content types.

picojson/src/slice_content_builder.rs (2)

97-112: Clean simplification of number extraction.

Good use of the DataSource trait method directly for number bytes extraction.


172-197: Well-implemented DataSource trait with proper safety checks.

The implementation correctly handles both borrowed and unescaped content access with appropriate error handling.

picojson/src/stream_content_builder.rs (3)

198-206: Good refactoring with proper StreamParser-specific handling.

The queue reset mechanism appropriately handles content cleanup for streaming scenarios while using the unified extraction pattern.


218-235: Clean implementation using the DataSource abstraction.

The number extraction correctly handles the end position calculation and uses the trait method appropriately.


320-340: Correct DataSource implementation for streaming context.

The implementation properly handles the unified lifetime scenario for stream parsing and correctly delegates to the StreamBuffer.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6293ac3 and d3d3c4e.

📒 Files selected for processing (10)
  • picojson/examples/push_parser_demo.rs (2 hunks)
  • picojson/src/lib.rs (1 hunks)
  • picojson/src/push_content_builder.rs (1 hunks)
  • picojson/src/push_parser.rs (20 hunks)
  • picojson/tests/json_checker_tests.rs (1 hunks)
  • picojson/tests/push_parser.rs (14 hunks)
  • picojson/tests/push_parser_copy_on_escape.rs (3 hunks)
  • picojson/tests/push_parser_escapes.rs (6 hunks)
  • picojson/tests/push_parser_invalidslicebounds_repro.rs (3 hunks)
  • picojson/tests/push_parser_stress_test.rs (9 hunks)
✅ Files skipped from review due to trivial changes (5)
  • picojson/tests/push_parser_escapes.rs
  • picojson/tests/json_checker_tests.rs
  • picojson/tests/push_parser_invalidslicebounds_repro.rs
  • picojson/tests/push_parser_copy_on_escape.rs
  • picojson/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • picojson/examples/push_parser_demo.rs
🧰 Additional context used
🧬 Code Graph Analysis (2)
picojson/tests/push_parser_stress_test.rs (3)
picojson/tests/push_parser.rs (9)
  • parser (25-25)
  • parser (58-58)
  • parser (93-93)
  • parser (132-132)
  • parser (177-177)
  • parser (222-222)
  • parser (266-266)
  • parser (496-496)
  • parser (552-552)
picojson/src/push_content_builder.rs (1)
  • new (47-59)
picojson/src/push_parser.rs (1)
  • new (61-69)
picojson/src/push_parser.rs (4)
picojson/src/push_content_builder.rs (2)
  • stream_buffer (109-111)
  • new (47-59)
picojson/src/escape_processor.rs (1)
  • new (200-206)
picojson/src/stream_buffer.rs (1)
  • current_byte (81-89)
picojson/src/parse_error.rs (5)
  • from (41-48)
  • from (52-63)
  • from (67-69)
  • from (73-75)
  • from (79-81)
🪛 GitHub Actions: AVR Tests
picojson/src/push_parser.rs

[error] 12-12: Rust compiler error E0463: can't find crate for std. The avr-none target may not support the standard library. Consider building the standard library from source with cargo build -Zbuild-std.

🔇 Additional comments (17)
picojson/tests/push_parser.rs (1)

25-25: LGTM! Consistent API simplification across all tests.

The refactoring to use parser.finish::<()>().unwrap() instead of separate finish() and destroy() calls is consistently applied across all test functions. This aligns with the new PushParser API where finish consumes the parser and returns the handler directly.

Also applies to: 58-58, 93-93, 132-132, 177-177, 222-222, 266-266, 313-313, 323-323, 401-401, 451-451, 496-496, 552-552, 688-688

picojson/src/push_content_builder.rs (8)

11-21: Well-designed SAX-style event handler trait.

The trait properly separates input and scratch buffer lifetimes, following the established pattern for push-based parsers. The generic error type E provides flexibility for different error handling strategies.
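A hedged sketch of the handler shape described here, with separate input and scratch lifetimes and a generic error type E. The Event variants and method name below are simplified stand-ins, not the trait actually defined in push_content_builder.rs.

```rust
// Two lifetimes: content may borrow from the caller's input chunk
// ('input) or from the parser's scratch buffer ('scratch).
enum Event<'input, 'scratch> {
    Key(&'input str),         // borrowed, zero-copy
    Unescaped(&'scratch str), // unescaped into the scratch buffer
}

// Generic error type E lets each handler pick its own failure strategy.
trait PushHandler<E> {
    fn on_event(&mut self, event: Event<'_, '_>) -> Result<(), E>;
}

struct Counter {
    count: usize,
}

impl PushHandler<()> for Counter {
    fn on_event(&mut self, _event: Event<'_, '_>) -> Result<(), ()> {
        self.count += 1;
        Ok(())
    }
}

fn main() {
    let mut h = Counter { count: 0 };
    h.on_event(Event::Key("name")).unwrap();
    h.on_event(Event::Unescaped("a\nb")).unwrap();
    assert_eq!(h.count, 2);
    println!("ok");
}
```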


46-59: Proper initialization of content builder state.

All fields are correctly initialized with appropriate default values. The StreamBuffer encapsulation provides clean separation of buffer management concerns.


119-124: Correct use of HRTB for flexible lifetime handling.

The method properly delegates to the handler with appropriate lifetime constraints, ensuring the handler can work with any combination of input and scratch lifetimes.


137-157: Atomic number emission with proper cleanup.

The method correctly ensures the unescaped buffer is cleared after event emission, preventing data leakage between parsing operations. Error handling appropriately wraps both parsing and handler errors.


159-201: Clean delegation to shared content extraction logic.

The method effectively uses the DataSource pattern to avoid code duplication while properly handling position adjustments for quoted content. The unified handling of strings and keys reduces complexity.


204-239: Consistent number emission using DataSource pattern.

The method maintains consistency with string/key extraction by using the DataSource pattern while ensuring atomic buffer cleanup after event emission.


255-381: Proper ContentExtractor adaptation for push parsing.

The implementation correctly handles the differences between pull and push parsing. The next_byte returning None is appropriate since bytes are fed externally. Unicode escape processing and state management are properly implemented.


384-423: Well-implemented DataSource abstraction for chunk processing.

The implementation correctly handles position offset conversion using saturating_sub to prevent underflow. Bounds checking ensures safe slice access, and the abstraction cleanly separates borrowed and unescaped content sources.
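The saturating_sub guard can be illustrated in isolation: an absolute stream position is converted to an offset inside the current chunk, and a position that precedes the chunk start clamps to 0 instead of panicking on underflow. The helper name and parameters here are hypothetical, for illustration only.

```rust
// Convert an absolute stream position to an offset within the current
// chunk. saturating_sub clamps at 0 rather than underflowing when the
// position predates the chunk (e.g. a token started in a prior chunk).
fn to_chunk_offset(absolute_pos: usize, chunk_start_pos: usize) -> usize {
    absolute_pos.saturating_sub(chunk_start_pos)
}

fn main() {
    assert_eq!(to_chunk_offset(10, 4), 6); // position inside the chunk
    assert_eq!(to_chunk_offset(3, 4), 0);  // before chunk start: clamps to 0
    println!("ok");
}
```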

picojson/tests/push_parser_stress_test.rs (4)

116-141: Method signature correctly updated for new parser API.

The run method now properly takes ownership of the parser and returns the handler, aligning with the new finish() API that consumes the parser. The generic constraint E: From<ParseError> ensures proper error conversion support.


225-225: Test scenarios correctly updated for escape processing.

The changes properly reflect:

  1. Correct JSON escape sequence format using single backslashes in raw byte string literals
  2. Expected Unescaped string type for content with escape sequences
  3. Appropriate buffer size adjustments for the new escape processing logic

Also applies to: 247-254


304-311: Consistent parser usage pattern with proper ownership transfer.

The immutable parser binding correctly reflects that ownership transfers to ChunkedWriter::run(). The pattern of creating parser → running → verifying handler is consistently applied throughout the tests.

Also applies to: 614-618


528-529: Unicode test expectations properly updated.

The tests correctly expect picojson::String::Unescaped variants for Unicode escape sequences, with properly decoded Unicode characters in the expected values.

Also applies to: 539-540, 551-552

picojson/src/push_parser.rs (4)

40-69: Clean separation of concerns with PushContentBuilder delegation.

The refactored struct properly delegates content management to PushContentBuilder while maintaining high-level state control. This improves modularity and reduces complexity in the parser implementation.


72-161: Proper delegation pattern maintains control flow integrity.

The write method correctly delegates all content operations to PushContentBuilder while preserving the original parsing logic. Position tracking, buffer management, and escape processing are appropriately handled through delegation.


163-216: Finish method correctly implements ownership transfer pattern.

The method properly:

  1. Validates parsing state completeness
  2. Emits the final EndDocument event
  3. Consumes the parser and returns the handler

This API change simplifies usage by eliminating the need for a separate destroy() method.
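The ownership-transfer pattern can be sketched as follows. The Parser, Handler, and ParseError types here are simplified stand-ins for picojson's real types; the point is only that a consuming finish(self) both validates completion and statically prevents further use of the parser.

```rust
struct Handler {
    events: usize,
}

struct Parser {
    handler: Handler,
    depth: usize, // open containers still unclosed
}

#[derive(Debug)]
struct ParseError;

impl Parser {
    // Consumes the parser: the caller gets the handler back and can no
    // longer feed bytes afterwards, so no separate destroy() is needed.
    fn finish(self) -> Result<Handler, ParseError> {
        if self.depth != 0 {
            return Err(ParseError); // document incomplete
        }
        Ok(self.handler)
    }
}

fn main() {
    let parser = Parser { handler: Handler { events: 3 }, depth: 0 };
    let handler = parser.finish().unwrap();
    assert_eq!(handler.events, 3);
    println!("ok");
}
```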


581-602: Helper functions properly handle tokenizer integration.

The functions correctly:

  1. Convert relative tokenizer positions to absolute positions
  2. Use the established 2-slot event storage pattern
  3. Provide clean iteration over stored events

Comment on lines +11 to +12
#[cfg(any(test, debug_assertions))]
extern crate std;

⚠️ Potential issue

Fix AVR build failure by restricting std dependency to tests only.

The AVR target failure occurs because debug_assertions might be enabled in AVR builds, causing the std import. Since these error conversions are only used in tests, restrict the configuration to tests only.

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 extern crate std;

 // ... later in file ...

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 impl From<ParseError> for std::string::String {
     fn from(_: ParseError) -> Self {
         std::string::String::new()
     }
 }

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 impl From<ParseError> for () {
     fn from(_: ParseError) -> Self {
         ()
     }
 }

Also applies to: 606-618

🧰 Tools
🪛 GitHub Actions: AVR Tests

[error] 12-12: Rust compiler error E0463: can't find crate for std. The avr-none target may not support the standard library. Consider building the standard library from source with cargo build -Zbuild-std.

🤖 Prompt for AI Agents
In picojson/src/push_parser.rs at lines 11-12 and also lines 606-618, the std
crate is imported with a cfg attribute that includes debug_assertions, which
causes build failures on AVR targets. Modify the cfg attribute to restrict the
std import to tests only by using #[cfg(test)] instead of #[cfg(any(test,
debug_assertions))]. Apply this change to both the initial import and the code
block at lines 606-618 to fix the AVR build failure.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
picojson/src/push_parser.rs (1)

13-14: Fix AVR build failure by restricting std dependency to tests only.

The AVR target failure occurs because debug_assertions might be enabled in AVR builds, causing the std import. Since these error conversions are only used in tests, restrict the configuration to tests only.

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 extern crate std;

 // ... later in file ...

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 impl From<ParseError> for std::string::String {
     fn from(_: ParseError) -> Self {
         std::string::String::new()
     }
 }

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 impl From<ParseError> for () {
     fn from(_: ParseError) -> Self {
         ()
     }
 }

Also applies to: 197-209

🧹 Nitpick comments (2)
picojson/src/push_content_builder.rs (2)

351-356: Remove or document the empty code block.

The empty else block serves no purpose and should be removed for cleaner code.

-        if self.unicode_escape_collector.is_in_progress() {
-            // The collector is still in progress, which means not all hex digits were processed
-            // This can happen when the End UnicodeEscape event consumes the last hex digit before
-            // the byte accumulator can process it. For now, we'll log this and not treat it as an error.
-        } else {
-        }
+        // The collector might still be in progress if the End UnicodeEscape event consumed
+        // the last hex digit before the byte accumulator could process it.
+        // This is expected behavior for chunked input and not an error.

564-565: Remove empty else block.

-        } else {
-        }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3d3c4e and 6f7b767.

📒 Files selected for processing (7)
  • picojson/src/event_processor.rs (4 hunks)
  • picojson/src/json_number.rs (2 hunks)
  • picojson/src/json_string.rs (1 hunks)
  • picojson/src/push_content_builder.rs (1 hunks)
  • picojson/src/push_parser.rs (3 hunks)
  • picojson/src/shared.rs (3 hunks)
  • picojson/tests/push_parser_stress_test.rs (10 hunks)
✅ Files skipped from review due to trivial changes (1)
  • picojson/src/json_string.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • picojson/tests/push_parser_stress_test.rs
  • picojson/src/shared.rs
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Applied to files:

  • picojson/src/json_number.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Applied to files:

  • picojson/src/json_number.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

Applied to files:

  • picojson/src/json_number.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Applied to files:

  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Applied to files:

  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Applied to files:

  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Applied to files:

  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.

Applied to files:

  • picojson/src/event_processor.rs
  • picojson/src/push_content_builder.rs
  • picojson/src/push_parser.rs
📚 Learning: in rust crate organization, functions can be imported into lib.rs via `use` statements not for direc...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.

Applied to files:

  • picojson/src/push_parser.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (7)
picojson/src/json_number.rs (1)

28-28: LGTM!

Adding Clone to NumberResult and JsonNumber is appropriate since all contained types implement Clone. This enables duplication of parsed numeric values in the event-driven parsing model.

Also applies to: 50-50

picojson/src/event_processor.rs (6)

25-27: LGTM! Clear documentation for the chunked input flag.

The addition of handles_chunked_input field with its comprehensive documentation clearly explains the behavioral difference between chunked (PushParser) and non-chunked (Slice/Stream) parsers.


31-49: Well-structured constructors for different parser modes.

The separation into new() for non-chunked parsers and new_chunked() for chunked parsers provides clear API boundaries and correctly initializes the handles_chunked_input flag.


51-54: Useful accessor for escape sequence state.

The in_escape_sequence() method provides clean read-only access to the escape sequence state, supporting the refactored escape handling logic.


59-70: Good refactoring with delegation pattern.

The updated next_event_impl maintains backward compatibility while delegating to the new next_event_impl_with_flags method. This provides a clean upgrade path for existing callers.


95-114: Well-designed byte accumulation logic for different parser modes.

The refactored accumulation logic correctly handles the different requirements for PushParser (which needs to accumulate during escape sequences) and other parsers. The detailed comments clearly explain the reasoning behind each branch.


116-133: Correct end-of-input handling for different parser types.

The implementation properly differentiates between chunked parsers (which return EndOfData to signal chunk boundaries) and non-chunked parsers (which finalize the document). The callback lifetime is correctly managed, and any events generated by finish() are properly processed.

Comment on lines +198 to +201
} else {
// Non-hex digit during Unicode escape - this shouldn't happen in valid JSON
self.in_unicode_escape = false;
}

⚠️ Potential issue

Consider returning an error for non-hex digits during Unicode escape.

When a non-hex digit is encountered during Unicode escape processing, the code silently resets the flag. This could mask parsing errors in malformed JSON.

Consider returning a parsing error instead:

} else {
    // Non-hex digit during Unicode escape - this shouldn't happen in valid JSON
-   self.in_unicode_escape = false;
+   return Err(ParseError::InvalidUnicodeEscape);
}
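A minimal sketch of the stricter behavior. The error variant name `InvalidUnicodeEscape` is taken from the diff above, but the `EscapeError` enum and `accept_hex_digit` function here are illustrative stand-ins, not the crate's actual API:

```rust
/// Hypothetical error type standing in for ParseError, for illustration only.
#[derive(Debug, PartialEq)]
enum EscapeError {
    InvalidUnicodeEscape,
}

/// Decode one hex digit of a \uXXXX escape, rejecting non-hex input
/// instead of silently resetting parser state.
fn accept_hex_digit(byte: u8) -> Result<u8, EscapeError> {
    match byte {
        b'0'..=b'9' => Ok(byte - b'0'),
        b'a'..=b'f' => Ok(byte - b'a' + 10),
        b'A'..=b'F' => Ok(byte - b'A' + 10),
        // Propagate an error so malformed JSON surfaces to the caller.
        _ => Err(EscapeError::InvalidUnicodeEscape),
    }
}
```

With this shape, a stray `g` inside `\u00g0` fails loudly at the offending byte rather than leaving the escape state silently cleared.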
🤖 Prompt for AI Agents
In picojson/src/push_content_builder.rs around lines 198 to 201, the code
currently resets the in_unicode_escape flag silently when encountering a non-hex
digit during Unicode escape processing. Instead, modify the code to return a
parsing error at this point to properly signal malformed JSON input, ensuring
the error is propagated to the caller for handling.

Comment on lines +547 to +551
let mut temp_buffer = [0u8; 1024]; // Should handle most token sizes
let byte_count = partial_slice.len();
if byte_count > temp_buffer.len() {
return Err(ParseError::InputBufferFull);
}

💡 Verification agent

❓ Verification inconclusive

Document the 1024-byte token size limitation.

The fixed buffer size could cause issues with long strings or keys. Consider documenting this limitation or making it configurable.

Consider adding a constant and documentation:

+    /// Maximum size for a single token when copying across chunk boundaries
+    const MAX_TOKEN_SIZE: usize = 1024;
+
     // Copy bytes to a fixed-size buffer to avoid borrowing conflicts
-    let mut temp_buffer = [0u8; 1024]; // Should handle most token sizes
+    let mut temp_buffer = [0u8; Self::MAX_TOKEN_SIZE];
     let byte_count = partial_slice.len();
     if byte_count > temp_buffer.len() {
-        return Err(ParseError::InputBufferFull);
+        // Token too large for cross-chunk handling
+        return Err(ParseError::InputBufferFull);
     }

Document and parameterize the 1024-byte scratch buffer

The fixed-capacity buffer in copy_partial_content_to_scratch should be defined as a named constant (with documentation) and the empty else block removed to improve clarity and maintainability.

• Introduce a private constant in push_content_builder.rs:

/// Maximum size for a single token when copying across chunk boundaries.
const MAX_TOKEN_SIZE: usize = 1024;

• Replace the anonymous buffer with the named constant:

- let mut temp_buffer = [0u8; 1024]; // Should handle most token sizes
+ let mut temp_buffer = [0u8; Self::MAX_TOKEN_SIZE];

• Remove the empty else {} branch:

@@ -561,7 +561,6 @@
     let byte_count = partial_slice.len();
     if byte_count > temp_buffer.len() {
         // Token too large for cross-chunk handling
         return Err(ParseError::InputBufferFull);
     }
-    else { }
     // continue processing…

• (Optional) If consumers need larger tokens in non-embedded contexts, consider exposing MAX_TOKEN_SIZE as a configurable parameter on the builder or via a feature flag.
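If configurability is wanted, a const generic keeps the zero-allocation guarantee while letting non-embedded callers raise the limit. This free-standing sketch only demonstrates the mechanics; the type and method names are illustrative, not the builder's real API:

```rust
/// Illustrative helper parameterized over the scratch capacity.
/// Defaults to the current 1024-byte limit.
struct ScratchCopier<const MAX_TOKEN_SIZE: usize = 1024>;

impl<const MAX_TOKEN_SIZE: usize> ScratchCopier<MAX_TOKEN_SIZE> {
    /// Copy a partial token into a fixed-size buffer, failing cleanly
    /// when the token exceeds the configured capacity.
    fn copy_partial(partial: &[u8]) -> Result<[u8; MAX_TOKEN_SIZE], &'static str> {
        if partial.len() > MAX_TOKEN_SIZE {
            // Token too large for cross-chunk handling.
            return Err("InputBufferFull");
        }
        let mut buf = [0u8; MAX_TOKEN_SIZE];
        buf[..partial.len()].copy_from_slice(partial);
        Ok(buf)
    }
}
```

Callers on constrained targets would use the default, while others could instantiate `ScratchCopier::<4096>` without any feature flags.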

🤖 Prompt for AI Agents
In picojson/src/push_content_builder.rs around lines 547 to 551, the 1024-byte
scratch buffer should be replaced with a named private constant MAX_TOKEN_SIZE
with a documentation comment explaining its purpose. Define this constant at the
top of the file or an appropriate location. Replace the fixed-size array length
1024 with MAX_TOKEN_SIZE. Remove the empty else block following the if condition
to improve code clarity. Optionally, consider making MAX_TOKEN_SIZE configurable
if larger tokens are needed in some contexts.

Comment on lines 113 to 123
// Special case: For Numbers, check if the scratch buffer is actually empty
// This handles the byte-by-byte case where the flag is stale from previous Key processing
if matches!(extractor_state, State::Number(_)) {
let buffer_slice = self.extractor.get_unescaped_slice().unwrap_or(&[]);
let buffer_empty = buffer_slice.is_empty();

if buffer_empty {
self.extractor.copy_partial_content_to_scratch()?;
}
}
}

🛠️ Refactor suggestion

Consider improving state management to avoid stale flags.

The special case for numbers suggests that has_unescaped_content() can return stale values from previous token processing. This workaround indicates a state management issue that should be addressed at the root.

Consider resetting the using_unescaped_buffer flag when transitioning between tokens to ensure it accurately reflects the current state, eliminating the need for this special case.
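A sketch of the reset-on-transition idea. The `using_unescaped_buffer` field comes from the comment above, but this simplified struct and the `begin_token` name are assumptions, not the extractor's real interface:

```rust
/// Simplified stand-in for the content extractor's per-token state.
struct Extractor {
    using_unescaped_buffer: bool,
    scratch: Vec<u8>,
}

impl Extractor {
    /// Clear per-token state whenever a new token starts, so that
    /// has_unescaped_content() never reports a previous token's flag.
    fn begin_token(&mut self) {
        self.using_unescaped_buffer = false;
        self.scratch.clear();
    }

    fn has_unescaped_content(&self) -> bool {
        self.using_unescaped_buffer
    }
}
```

Calling `begin_token()` at every token boundary would make the Number special case in the parser unnecessary.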

🤖 Prompt for AI Agents
In picojson/src/push_parser.rs around lines 113 to 123, the current code uses a
special case to handle stale flags in has_unescaped_content() for Number states,
indicating a state management issue. To fix this, identify where the
using_unescaped_buffer flag is set and reset it appropriately during token
transitions, ensuring it reflects the current token's state accurately. This
will remove the need for the special case by preventing stale flag values from
previous token processing.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (6)
picojson/src/push_content_builder.rs (2)

198-201: Consider returning an error for non-hex digits during Unicode escape.

When a non-hex digit is encountered during Unicode escape processing, the code silently resets the flag. This could mask parsing errors in malformed JSON.


540-544: Document and parameterize the 1024-byte scratch buffer.

The fixed-capacity buffer should be defined as a named constant with documentation, and the size limitation should be clearly documented.

picojson/src/slice_content_builder.rs (2)

84-95: Same inefficiency in key extraction as in string extraction.

This method has the same double-processing issue where end_string() is called but its result is discarded.

Apply the same fix as for extract_string_content:

 fn extract_key_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
-    // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
     let current_pos = self.current_position();
-    if self.has_unescaped_content() {
-        let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
-        self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
-    }
-
-    // Use the unified helper function to get the content
-    let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
-    Ok(Event::Key(content_piece.into_string()?))
+    let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
+    let key_result = self.copy_on_escape.end_string(end_pos)?;
+    Ok(Event::Key(key_result))
 }

71-82: Inefficient double processing of unescaped content.

When has_unescaped_content() is true, the code calls end_string() which creates a String that is immediately discarded, then get_content_piece recreates it. This duplicates work unnecessarily.

Consider using the result of end_string() directly:

 fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
-    // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
     let current_pos = self.current_position();
-    if self.has_unescaped_content() {
-        let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
-        self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
-    }
-
-    // Use the unified helper function to get the content
-    let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
-    Ok(Event::String(content_piece.into_string()?))
+    let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
+    let value_result = self.copy_on_escape.end_string(end_pos)?;
+    Ok(Event::String(value_result))
 }
picojson/src/push_parser.rs (2)

13-14: Fix AVR build failure by restricting std dependency to tests only.

The AVR target failure occurs because debug_assertions might be enabled in AVR builds, causing the std import. Since these error conversions are only used in tests, restrict the configuration to tests only.

Also applies to: 195-205


111-121: Address state management to avoid stale flags.

The special case for numbers indicates that has_unescaped_content() can return stale values from previous token processing. This workaround suggests an underlying state management issue.

🧹 Nitpick comments (1)
picojson/src/event_processor.rs (1)

90-106: Well-designed byte accumulation logic for escape sequences.

The conditional accumulation logic correctly handles the different requirements of chunked parsers during escape sequences while maintaining backward compatibility for other parsers.

Consider extracting the accumulation logic into a helper method for better readability:

-let should_accumulate = if always_accumulate_during_escapes {
-    // For PushParser: accumulate during escapes even when events are generated
-    // This ensures hex digits reach the accumulator even when End UnicodeEscape events consume them
-    // BUT still respect the normal logic when not in escape sequences
-    if self.in_escape_sequence {
-        true // Always accumulate during escape sequences
-    } else {
-        !have_events(&self.parser_state.evts) // Normal behavior outside escapes
-    }
-} else {
-    // For other parsers: only accumulate when no events generated and not in escape
-    !have_events(&self.parser_state.evts) && !self.in_escape_sequence
-};
+let should_accumulate = self.should_accumulate_byte(
+    always_accumulate_during_escapes,
+    &self.parser_state.evts
+);
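If the helper is extracted, its body could look like this standalone sketch (written as a free function for testability; the real method would live on the struct and read `self.in_escape_sequence` and the event buffer directly):

```rust
/// Decide whether the current byte should be accumulated, mirroring the
/// inlined logic above for chunked (PushParser) vs. non-chunked parsers.
fn should_accumulate_byte(
    always_accumulate_during_escapes: bool,
    in_escape_sequence: bool,
    have_events: bool,
) -> bool {
    if always_accumulate_during_escapes {
        // PushParser: always accumulate inside escape sequences so hex
        // digits reach the accumulator even when events consume them;
        // otherwise use the normal "no pending events" rule.
        in_escape_sequence || !have_events
    } else {
        // Slice/Stream parsers: accumulate only when no events are
        // pending and we are not inside an escape sequence.
        !have_events && !in_escape_sequence
    }
}
```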
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da9ed78 and 027f84e.

📒 Files selected for processing (9)
  • picojson/src/copy_on_escape.rs (1 hunks)
  • picojson/src/escape_processor.rs (4 hunks)
  • picojson/src/event_processor.rs (4 hunks)
  • picojson/src/push_content_builder.rs (1 hunks)
  • picojson/src/push_parser.rs (3 hunks)
  • picojson/src/shared.rs (2 hunks)
  • picojson/src/slice_content_builder.rs (4 hunks)
  • picojson/src/slice_input_buffer.rs (0 hunks)
  • picojson/src/stream_content_builder.rs (5 hunks)
💤 Files with no reviewable changes (1)
  • picojson/src/slice_input_buffer.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • picojson/src/copy_on_escape.rs
  • picojson/src/stream_content_builder.rs
  • picojson/src/shared.rs
  • picojson/src/escape_processor.rs
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/slice_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/slice_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/slice_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/slice_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/slice_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.

Applied to files:

  • picojson/src/push_content_builder.rs
  • picojson/src/slice_content_builder.rs
  • picojson/src/event_processor.rs
  • picojson/src/push_parser.rs
📚 Learning: in rust crate organization, functions can be imported into lib.rs via `use` statements not for direc...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.

Applied to files:

  • picojson/src/push_parser.rs
🧬 Code Graph Analysis (1)
picojson/src/slice_content_builder.rs (6)
picojson/src/push_content_builder.rs (6)
  • extract_string_content (288-303)
  • extract_key_content (305-318)
  • extract_number (320-339)
  • get_borrowed_slice (478-500)
  • get_unescaped_slice (502-506)
  • has_unescaped_content (508-510)
picojson/src/stream_content_builder.rs (6)
  • extract_string_content (198-206)
  • extract_key_content (208-216)
  • extract_number (218-235)
  • get_borrowed_slice (327-331)
  • get_unescaped_slice (333-335)
  • has_unescaped_content (337-339)
picojson/src/slice_input_buffer.rs (1)
  • current_pos (43-45)
picojson/src/shared.rs (5)
  • end_position_excluding_delimiter (153-155)
  • get_content_piece (264-282)
  • get_borrowed_slice (198-202)
  • get_unescaped_slice (209-209)
  • has_unescaped_content (216-216)
picojson/src/escape_processor.rs (1)
  • process_unicode_escape_sequence (684-746)
picojson/src/copy_on_escape.rs (1)
  • has_unescaped_content (210-212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (7)
picojson/src/push_content_builder.rs (2)

11-21: Well-designed trait for push parser event handling.

The trait is properly documented with clear generic parameters and follows Rust conventions.


477-511: Correct implementation of DataSource for chunked input.

The implementation properly handles position offset conversions and validates bounds, which is essential for chunked parsing.
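The offset conversion and bounds validation mentioned here can be illustrated with a hypothetical helper (names and signature are assumed for the sketch; the crate's actual `DataSource` implementation differs):

```rust
/// Map an absolute-position span onto the currently buffered chunk,
/// returning None when the span is not fully inside the chunk.
fn chunk_relative(
    abs_start: usize,
    abs_end: usize,
    chunk_offset: usize,
    chunk_len: usize,
) -> Option<(usize, usize)> {
    // checked_sub rejects spans that begin before the current chunk.
    let start = abs_start.checked_sub(chunk_offset)?;
    let end = abs_end.checked_sub(chunk_offset)?;
    // Reject inverted ranges and spans running past the chunk's end.
    if start <= end && end <= chunk_len {
        Some((start, end))
    } else {
        None
    }
}
```

Validating both ends before slicing is what keeps chunked parsing panic-free when a token straddles a chunk boundary.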

picojson/src/event_processor.rs (2)

31-49: Clear separation of chunked vs non-chunked parser construction.

The dual constructor approach makes the parser type explicit and improves code clarity.


111-128: Correct end-of-input handling for different parser types.

The logic properly differentiates between chunked parsers (which need more data) and non-chunked parsers (which finalize parsing), ensuring correct behavior for both use cases.
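The branch being praised can be reduced to a small illustration; the enum and function names here are invented, not the event_processor's actual types:

```rust
/// Hypothetical outcome of hitting the end of the current buffer.
#[derive(Debug, PartialEq)]
enum Outcome {
    NeedMoreData,
    Finalize,
}

fn on_end_of_buffer(chunked: bool) -> Outcome {
    if chunked {
        // A chunked parser may receive further input later.
        Outcome::NeedMoreData
    } else {
        // A non-chunked parser has seen the whole document; finish parsing.
        Outcome::Finalize
    }
}
```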

picojson/src/slice_content_builder.rs (1)

172-197: Well-implemented DataSource trait with proper lifetime management.

The implementation correctly delegates to the underlying buffers and maintains proper lifetime boundaries between input and scratch data.

picojson/src/push_parser.rs (2)

26-49: Excellent refactoring with clear separation of concerns.

The new structure properly delegates content extraction to PushContentExtractor and shares parsing logic through ParserCore, resulting in cleaner and more maintainable code.


133-157: Clean implementation of parser finalization.

The finish method properly validates parsing completion, emits the final event, and returns the handler in a single operation, improving the API ergonomics.

@kaidokert kaidokert closed this Aug 3, 2025
@kaidokert kaidokert deleted the datasource_intro branch August 4, 2025 02:55