Conversation

kaidokert (Owner) commented Jul 15, 2025

Consolidate to single parser

Summary by Sourcery

Unify StreamParser and SliceParser event loops by extracting common logic into a new ParserCore, and refactor parsing responsibilities into dedicated content builder and provider modules to eliminate duplication, simplify escape handling, and streamline byte provisioning across both parsers.

New Features:

  • Introduce ParserCore to centralize the core event processing loop.

Enhancements:

  • Refactor StreamParser and SliceParser to delegate to ParserCore with StreamContentBuilder and SliceContentBuilder abstractions.
  • Simplify EscapeHandler and ContentExtractor traits by adding a unified extract_number method and standardizing escape timing.
  • Remove legacy state machines, processing enums, and duplicated buffer management code in favor of provider-based architecture.
  • Improve Display implementations for ParseError and JsonNumber with updated formatting macros.

Build:

  • Allow non_snake_case test names and clean up build.rs code generation attributes.

Tests:

  • Add debug_unicode_issue test to exercise complex escape sequences in streaming parser.

Chores:

  • Remove unused log and test-log dependencies from Cargo.toml.

Summary by CodeRabbit

  • New Features

    • Introduced unified parsing core and content builder modules for both slice-based and streaming JSON parsing, improving efficiency and maintainability.
    • Added new internal modules and traits to support zero-copy and streaming content extraction.
    • Enhanced streaming parser with improved Unicode and escape sequence handling.
  • Refactor

    • Refactored slice and stream parsers to delegate parsing logic to shared core components, reducing code duplication and simplifying state management.
    • Updated formatting styles for output consistency.
  • Bug Fixes

    • Improved buffer bounds checking in streaming buffer operations to prevent overflow errors.
  • Tests

    • Added a new debug test for Unicode handling in streaming parsing.
    • Removed obsolete and debug test files for cleaner test coverage.
  • Chores

    • Removed unused dependencies and cleaned up configuration files.

sourcery-ai bot commented Jul 15, 2025

Reviewer's Guide

This PR consolidates StreamParser and SliceParser into a shared parsing core by introducing a new ParserCore module, and refactors each parser to delegate the main loop and byte provision to specialized provider/builder types. Common logic for token handling, escape processing, and content extraction has been moved into ParserCore, StreamContentBuilder, and SliceContentBuilder, removing duplication and simplifying parser implementations.

Class diagram for refactored parser architecture

classDiagram
    class ParserCore {
        +Tokenizer
        +ParserState
        +next_event_impl_unified()
        +next_event_impl_unified_with_accumulator()
    }
    class StreamParser {
        +ParserCore parser_core
        +StreamParserProvider provider
    }
    class SliceParser {
        +ParserCore parser_core
        +SliceContentBuilder content_builder
    }
    class StreamParserProvider {
        +StreamContentBuilder content_builder
        +Reader reader
        +bool finished
    }
    class StreamContentBuilder {
        +StreamBuffer stream_buffer
        +State parser_state
        +UnicodeEscapeCollector unicode_escape_collector
        +bool unescaped_reset_queued
        +bool in_unicode_escape
        +bool in_escape_sequence
        +bool finished
    }
    class SliceContentBuilder {
        +SliceInputBuffer buffer
        +CopyOnEscape copy_on_escape
        +State parser_state
        +UnicodeEscapeCollector unicode_escape_collector
        +bool in_escape_sequence
    }
    ParserCore <|-- StreamParser
    ParserCore <|-- SliceParser
    StreamParser o-- StreamParserProvider
    StreamParserProvider o-- StreamContentBuilder
    SliceParser o-- SliceContentBuilder
    StreamContentBuilder o-- StreamBuffer
    StreamContentBuilder o-- UnicodeEscapeCollector
    SliceContentBuilder o-- CopyOnEscape
    SliceContentBuilder o-- UnicodeEscapeCollector
    SliceContentBuilder o-- SliceInputBuffer
    StreamParserProvider o-- Reader

Class diagram for trait relationships in the new parser core

classDiagram
    class ByteProvider {
        <<trait>>
        +next_byte()
    }
    class ContentExtractor {
        <<trait>>
        +parser_state_mut()
        +current_position()
        +begin_string_content()
        +unicode_escape_collector_mut()
        +extract_string_content()
        +extract_key_content()
        +extract_number_content()
        +extract_number()
    }
    class EscapeHandler {
        <<trait>>
        +parser_state()
        +process_unicode_escape_with_collector()
        +handle_simple_escape_char()
        +begin_escape_sequence()
        +begin_unicode_escape()
    }
    class ParserProvider {
        <<trait>>
    }
    ByteProvider <|.. ParserProvider
    ContentExtractor <|.. ParserProvider
    ContentExtractor <|.. StreamContentBuilder
    ContentExtractor <|.. SliceContentBuilder
    EscapeHandler <|.. StreamContentBuilder
    EscapeHandler <|.. SliceContentBuilder
    ByteProvider <|.. StreamContentBuilder
    ByteProvider <|.. SliceContentBuilder
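To make the trait diagram concrete, here is a minimal Rust sketch of how the pieces compose, assuming only the method names shown above; the real definitions in event_processor.rs and parser_core.rs carry lifetimes and more methods:

```rust
// Minimal sketch: `ParseError` and `Event` are stand-ins for the crate's
// real types, and each trait is trimmed to one representative method.
struct ParseError;
struct Event;

trait ByteProvider {
    /// Pull the next input byte, or Ok(None) at end of input.
    fn next_byte(&mut self) -> Result<Option<u8>, ParseError>;
}

trait ContentExtractor {
    /// Turn accumulated bytes into a String/Key/Number event.
    fn extract_string_content(&mut self, start_pos: usize) -> Result<Event, ParseError>;
}

trait EscapeHandler {
    /// Hook invoked when a backslash escape starts inside a string.
    fn begin_escape_sequence(&mut self) -> Result<(), ParseError>;
}

/// A provider bundles all three capabilities, which is what lets one
/// generic event loop in ParserCore serve both parsers.
trait ParserProvider: ByteProvider + ContentExtractor + EscapeHandler {}
```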

Class diagram for removed and replaced fields in StreamParser and SliceParser

classDiagram
    class StreamParser {
        -Tokenizer tokenizer
        -ParserState parser_state
        -Reader reader
        -StreamBuffer stream_buffer
        -ProcessingState processing_state
        -UnicodeEscapeCollector unicode_escape_collector
        +ParserCore parser_core
        +StreamParserProvider provider
    }
    class SliceParser {
        -Tokenizer tokenizer
        -SliceInputBuffer buffer
        -ParserState parser_state
        -CopyOnEscape copy_on_escape
        -UnicodeEscapeCollector unicode_escape_collector
        +ParserCore parser_core
        +SliceContentBuilder content_builder
    }

Class diagram for new modules: parser_core, stream_content_builder, slice_content_builder

classDiagram
    class parser_core {
        +ParserCore
        +EscapeTiming
    }
    class stream_content_builder {
        +StreamContentBuilder
        +StreamContentBuilderWithFiller
        +StreamContentBuilderWithReader
    }
    class slice_content_builder {
        +SliceContentBuilder
    }
    parser_core ..> StreamContentBuilder : uses
    parser_core ..> SliceContentBuilder : uses
    StreamContentBuilderWithFiller o-- StreamContentBuilder
    StreamContentBuilderWithReader o-- StreamContentBuilder

Class diagram for ParserState changes

classDiagram
    class ParserState {
        -State state
        +[Option<Event>; 2] evts
    }
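That is, ParserState shrinks to just the two-slot event queue. A sketch of the resulting struct, with `Event` standing in for the tokenizer's event type:

```rust
struct Event; // stand-in for the ujson tokenizer event type

struct ParserState {
    /// Fixed-size, zero-allocation queue. Per the project's design notes,
    /// two slots cover ujson's event generation patterns when events are
    /// drained in a tight loop.
    evts: [Option<Event>; 2],
}

impl ParserState {
    fn new() -> Self {
        ParserState { evts: [None, None] }
    }
}
```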

Class diagram for StreamParserProvider and its role

classDiagram
    class StreamParserProvider {
        +StreamContentBuilder content_builder
        +Reader reader
        +bool finished
        +new()
    }
    StreamParserProvider o-- StreamContentBuilder
    StreamParserProvider o-- Reader

File-Level Changes

Change / Details / Files
Introduce a unified ParserCore to handle shared parsing loop
  • Add parser_core.rs implementing ParserCore, ParserProvider trait, and EscapeTiming enum
  • Replace duplicated next_event_impl loops in stream_parser.rs and slice_parser.rs with calls to ParserCore
  • Wire ParserCore into both parser implementations
picojson/src/parser_core.rs
picojson/src/stream_parser.rs
picojson/src/slice_parser.rs
Refactor StreamParser to use ParserCore and a provider abstraction
  • Remove manual tokenizer, buffer, and state fields in StreamParser struct
  • Add StreamParserProvider to handle reader access and buffer management
  • Update next_event to invoke parser_core.next_event_impl_unified_with_accumulator
picojson/src/stream_parser.rs
Extract stream content and escape logic into StreamContentBuilder
  • Create stream_content_builder.rs with StreamContentBuilder handling buffer compaction, escape processing, and content extraction
  • Implement ContentExtractor and EscapeHandler for StreamContentBuilder and its reader wrapper
picojson/src/stream_content_builder.rs
Refactor SliceParser to use ParserCore and a content builder
  • Remove CopyOnEscape, tokenizer, and state fields from SliceParser struct
  • Add SliceContentBuilder for zero-copy string and number extraction
  • Delegate next_event to parser_core.next_event_impl_unified with OnBegin timing
picojson/src/slice_parser.rs
picojson/src/slice_content_builder.rs
Update event_processor and related utilities for new architecture
  • Add default begin_unicode_escape and extract_number methods to EscapeHandler and ContentExtractor traits
  • Adjust create_tokenizer_callback signature
  • Tidy parse_error Display impl and json_number formatting
  • Remove unused dev-dependencies and tweak Cargo.toml
picojson/src/event_processor.rs
picojson/src/parse_error.rs
picojson/src/json_number.rs
Cargo.toml
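The EscapeTiming enum introduced in parser_core.rs captures the one behavioral difference left between the two parsers. A hedged sketch of the idea; the variant names come from the change list above, and the doc comments paraphrase the rationale given later in this review:

```rust
/// When should a simple escape token (\n, \t, ...) be processed?
#[derive(Clone, Copy, PartialEq, Eq)]
enum EscapeTiming {
    /// SliceParser: on Begin events, because CopyOnEscape must start
    /// unescaping the moment the token begins to keep its zero-copy path.
    OnBegin,
    /// StreamParser: on End events, because StreamBuffer has to accumulate
    /// every byte of the token before the escape can be resolved.
    OnEnd,
}
```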


coderabbitai bot commented Jul 15, 2025

Warning

Rate limit exceeded

@kaidokert has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 42 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between cdc45a9 and cc553ac.

📒 Files selected for processing (7)
  • picojson/src/escape_processor.rs (1 hunks)
  • picojson/src/event_processor.rs (6 hunks)
  • picojson/src/slice_content_builder.rs (1 hunks)
  • picojson/src/slice_parser.rs (4 hunks)
  • picojson/src/stream_buffer.rs (2 hunks)
  • picojson/src/stream_content_builder.rs (1 hunks)
  • picojson/src/stream_parser.rs (3 hunks)

Walkthrough

This change refactors the JSON parser by introducing unified core abstractions (ParserCore, SliceContentBuilder, StreamContentBuilder) for both slice-based and streaming parsing. It removes manual event, escape, and state management from SliceParser and StreamParser, delegating these to the new shared components. Several new modules and traits are added, and related tests are updated or removed.

Changes

File(s) / Change Summary
picojson/Cargo.toml Removed log and test-log dependencies.
picojson/build.rs Suppressed non-snake-case warnings in generated tests; added #[allow(dead_code)] to some functions; updated string formatting style.
picojson/src/chunk_reader.rs Updated Reader trait implementation to use anonymous lifetime.
picojson/src/event_processor.rs Modified EscapeHandler and ContentExtractor traits; added new method signatures; simplified lifetime annotations; updated test mocks.
picojson/src/json_number.rs
picojson/src/parse_error.rs
Updated formatting macros to use Rust's named argument style.
picojson/src/lib.rs Added module declarations: parser_core, stream_content_builder, slice_content_builder.
picojson/src/number_parser.rs Added a blank line to a doc comment.
picojson/src/parser_core.rs New file: Introduced ParserCore, unified event loop, and ParserProvider trait.
picojson/src/shared.rs Removed state field from ParserState struct and its initialization.
picojson/src/slice_content_builder.rs New file: Added SliceContentBuilder struct with trait implementations for content extraction and escape handling.
picojson/src/slice_input_buffer.rs Added data_len method to SliceInputBuffer.
picojson/src/slice_parser.rs Refactored to use ParserCore and SliceContentBuilder, removed manual event and escape handling.
picojson/src/stream_buffer.rs Replaced wrapping addition with checked addition in buffer compaction; removed a debug print in tests.
picojson/src/stream_content_builder.rs New file: Added StreamContentBuilder for streaming parsing, with wrappers for reader integration and trait implementations.
picojson/src/stream_parser.rs Refactored to use ParserCore and StreamParserProvider; removed manual state, event, and escape management.
picojson/tests/debug_root_numbers.rs
picojson/tests/debug_stress_case.rs
Deleted: Removed test files for debugging root numbers and a stress case.
picojson/tests/stream_parser_stress_test.rs Added debug_unicode_issue test for debugging Unicode and escape handling in streaming parser.
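The stream_buffer.rs row above is the one behavioral fix in this table: compaction arithmetic now fails loudly instead of wrapping. A small illustrative sketch with hypothetical helper names; only the wrapping_add to checked_add substitution reflects the actual change:

```rust
fn old_advance(pos: usize, offset: usize) -> usize {
    pos.wrapping_add(offset) // overflow silently wraps around to a small index
}

fn new_advance(pos: usize, offset: usize) -> Result<usize, &'static str> {
    pos.checked_add(offset).ok_or("buffer position overflow") // overflow becomes an error
}
```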

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Parser (SliceParser/StreamParser)
    participant ParserCore
    participant ContentBuilder (Slice/Stream)
    participant Reader/Buffer

    User->>Parser: next_event()
    Parser->>ParserCore: next_event_impl_unified(...)
    ParserCore->>ContentBuilder: extract content / handle escapes
    ParserCore->>ContentBuilder: request next byte
    ContentBuilder->>Reader/Buffer: fetch/consume byte
    ContentBuilder-->>ParserCore: return byte/content
    ParserCore-->>Parser: Event or Error
    Parser-->>User: Event or Error
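A hedged sketch of the pull loop the diagram describes, reusing the API surface quoted elsewhere in this thread (ChunkReader::new, StreamParser::new, next_event); exact signatures and event variants may differ:

```rust
use picojson::{ChunkReader, Event, ParseError, StreamParser};

fn drain(json: &[u8]) -> Result<(), ParseError> {
    let reader = ChunkReader::new(json, 5); // deliver input 5 bytes at a time
    let mut buffer = [0u8; 64];
    let mut parser = StreamParser::new(reader, &mut buffer);

    loop {
        // Each call drives ParserCore, which pulls bytes through the
        // content builder and hands back exactly one event.
        match parser.next_event()? {
            Event::EndDocument => break,
            _other => { /* handle String/Key/Number/container events here */ }
        }
    }
    Ok(())
}
```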

Possibly related PRs

  • kaidokert/picojson-rs#55: Introduces the foundational event_processor module and shared traits that enable the unification and refactor in this PR.
  • kaidokert/picojson-rs#39: Improves StreamBuffer compaction and position tracking, related to buffer handling changes in this PR.

Poem

A rabbit hopped through parser fields,
Refactoring code for greater yields.
Escapes and buffers, once a mess,
Now unified for parsing success!
With builders new and core so bright,
JSON is parsed just right.
🐇✨


gemini-code-assist bot left a comment

Summary of Changes

Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a substantial refactor of the picojson library's parsing architecture. The primary goal is to eliminate code duplication between the SliceParser and StreamParser by consolidating their common event processing logic into a new, unified ParserCore. This change introduces a more modular design, abstracting input handling and content extraction into dedicated builder components, which significantly enhances maintainability and clarity of the parsing pipeline.

Highlights

  • Unified Parser Core: Introduced a new ParserCore module (picojson/src/parser_core.rs) that encapsulates the shared event processing loop previously duplicated in SliceParser and StreamParser. This centralizes the core parsing logic.
  • Content Builder Abstraction: Created dedicated content builder modules, SliceContentBuilder (picojson/src/slice_content_builder.rs) and StreamContentBuilder (picojson/src/stream_content_builder.rs), which implement the ContentExtractor and ByteProvider traits. These builders now handle parser-specific input management, string/key/number extraction, and escape sequence processing.
  • Simplified Parser Structures: The SliceParser and StreamParser structs have been significantly simplified. They now primarily compose the new ParserCore and their respective content builders/providers, delegating most of the parsing responsibilities to these new components.
  • Refined Escape Handling: The EscapeHandler trait and related functions in picojson/src/event_processor.rs have been updated to better accommodate the distinct escape processing timings required by slice-based (on Begin events for zero-copy) and stream-based (on End events for accumulation) parsing.
  • Dependency and Linting Cleanup: Removed unused test-log and log dependencies from Cargo.toml and added #[allow(dead_code)] attributes in build.rs for functions only used by the build script, improving code hygiene.

sourcery-ai bot left a comment

Hey @kaidokert - I've reviewed your changes - here's some feedback:

  • The two filler/provider structs (StreamContentBuilderWithFiller and StreamContentBuilderWithReader) have empty impl blocks and duplicate buffer-filling logic—please consolidate or remove them to avoid unused code.
  • The ContentExtractor impls duplicate number extraction in both extract_number_content and extract_number; consider defaulting one in the trait or merging them to cut down boilerplate.
  • There are multiple #[allow(dead_code)] / #[allow(non_snake_case)] attributes sprinkled across new modules—please audit and remove any that are no longer necessary.
Individual Comments

Comment 1 (picojson/src/stream_content_builder.rs:500)

Code context:

    fn process_unicode_escape_with_collector(&mut self) -> Result<(), ParseError> {

UTF-8 byte copying for escape sequences could be simplified.

Consider iterating directly over the UTF-8 bytes instead of copying them to a fixed-size array. If the copy is required, please document the reason.

Suggested implementation:

```rust
        // Shared Unicode escape processing pattern - process UTF-8 bytes directly to avoid unnecessary copying
        let utf8_bytes_result = {
            let current_pos = self.stream_buffer.current_position();
            let hex_slice_provider = |start, end| {
                self.stream_buffer
                    .get_string_slice(start, end)
                    .map_err(Into::into)
            };
            // Instead of copying to a fixed-size array, iterate directly over the UTF-8 bytes.
            // If a copy is required for borrow checker reasons, document it here.
            // For now, we assume direct iteration is possible, e.g.:
            // for b in utf8_bytes.iter() {
            //     // process each byte
            // }
```

If the rest of the function (not shown) still uses a fixed-size array for UTF-8 bytes, you should refactor it to use direct iteration over the byte slice returned by the string slice provider. If you encounter borrow checker issues that require a copy, add a comment explaining why the copy is necessary.

Comment 2 (picojson/tests/stream_parser_stress_test.rs:217)
New test added for debugging Unicode and escape handling in StreamParser.

Please add assertions to verify the parsed content and character codes, so the test can automatically detect regressions instead of relying on printed output.

Suggested implementation:

```rust
#[test]
fn debug_unicode_issue() {
    let json = br#"["a\nb\t\"\\c\u1234d"]"#;
    let reader = ChunkReader::new(json, 5);
    let mut buffer = [0u8; 30];

    let mut parser = StreamParser::new(reader, &mut buffer);
    let mut event_count = 0;
    let mut parsed_string = None;

    loop {
        let event = match parser.next_event() {
            Ok(ev) => ev,
            Err(e) => panic!("parse error: {:?}", e),
        };
        if let Some(ev) = event {
            event_count += 1;
            // Capture the parsed string value
            if let picojson::Event::String(s) = &ev {
                parsed_string = Some(s.clone());
            }
        } else {
            break;
        }
    }

    // The expected string is: a\nb\t"\\c\u{1234}d
    let expected = "a\nb\t\"\\c\u{1234}d";
    let parsed = parsed_string.expect("No string parsed");
    assert_eq!(parsed, expected, "Parsed string does not match expected value");

    // Check the character codes
    let expected_codes: Vec<u32> = expected.chars().map(|c| c as u32).collect();
    let parsed_codes: Vec<u32> = parsed.chars().map(|c| c as u32).collect();
    assert_eq!(parsed_codes, expected_codes, "Parsed character codes do not match expected");

    // Optionally, check event count (should be 3: StartArray, String, EndArray)
    assert_eq!(event_count, 3, "Unexpected number of events");
}
```



gemini-code-assist bot left a comment

Code Review

This is a significant and well-executed refactoring that unifies the SliceParser and StreamParser logic into a shared ParserCore. The new architecture using ContentBuilder and Provider traits greatly reduces code duplication and improves maintainability. I've found a few issues, including a potential correctness bug in surrogate pair handling and some dead code. Once these are addressed, this will be a very solid improvement.

Comment on lines 165 to 187
        // Fix: escape_start_pos should point to the backslash, not the 'u'
        // But don't subtract if it's already pointing to the backslash
        let actual_escape_start_pos = escape_start_pos;

        // Handle UTF-8 bytes if we have them (not a high surrogate waiting for low surrogate)
        if let Some(utf8_bytes) = utf8_bytes_opt {
            if had_pending_high_surrogate {
                // This is completing a surrogate pair - need to consume both escapes
                // First call: consume the high surrogate (6 bytes earlier)
                self.copy_on_escape
                    .handle_unicode_escape(actual_escape_start_pos, &[])?;
                // Second call: consume the low surrogate and write UTF-8
                self.copy_on_escape
                    .handle_unicode_escape(actual_escape_start_pos + 6, utf8_bytes)?;
            } else {
                // Single Unicode escape - normal processing
                self.copy_on_escape
                    .handle_unicode_escape(actual_escape_start_pos, utf8_bytes)?;
            }
        }

        Ok(())
    }


Severity: high

The logic for handling surrogate pairs may be incorrect. The // Fix: comment on line 165 is not accompanied by any change in logic, as actual_escape_start_pos is just assigned the value of escape_start_pos. This is confusing and should be either implemented or removed. Also, the surrogate pair handling logic starting on line 171 seems flawed. When processing a low surrogate, actual_escape_start_pos points to its start. The high surrogate was 6 bytes before this. The current code calls handle_unicode_escape with actual_escape_start_pos and actual_escape_start_pos + 6, which seems incorrect.

The logic should probably be:

  • Handle the high surrogate at actual_escape_start_pos - 6.
  • Handle the low surrogate at actual_escape_start_pos and write the combined UTF-8 character.

Here is a suggested correction, which also removes the confusing actual_escape_start_pos variable:

        // Shared Unicode escape processing pattern
        let had_pending_high_surrogate = self.unicode_escape_collector.has_pending_high_surrogate();

        let mut utf8_buf = [0u8; 4];
        let (utf8_bytes_opt, escape_start_pos) =
            crate::escape_processor::process_unicode_escape_sequence(
                current_pos,
                &mut self.unicode_escape_collector,
                hex_slice_provider,
                &mut utf8_buf,
            )?;

        // Handle UTF-8 bytes if we have them (not a high surrogate waiting for low surrogate)
        if let Some(utf8_bytes) = utf8_bytes_opt {
            if had_pending_high_surrogate {
                // This is completing a surrogate pair.
                // First, handle the high surrogate at its original position (6 bytes before the current one).
                self.copy_on_escape
                    .handle_unicode_escape(escape_start_pos.saturating_sub(6), &[])?;
                // Then, handle the low surrogate and write the combined UTF-8 character.
                self.copy_on_escape
                    .handle_unicode_escape(escape_start_pos, utf8_bytes)?;
            } else {
                // Single Unicode escape - normal processing
                self.copy_on_escape
                    .handle_unicode_escape(escape_start_pos, utf8_bytes)?;
            }
        }

        Ok(())

        // Update positions
        let _old_tokenize_pos = self.tokenize_pos;


Severity: medium

This variable _old_tokenize_pos is assigned but never used. It appears to be a leftover from debugging and should be removed to improve code clarity.

Comment on lines 303 to 459
/// Custom ByteProvider that can handle reader filling for StreamParser
pub struct StreamContentBuilderWithFiller<'a, 'b, R: crate::stream_parser::Reader> {
    content_builder: &'a mut StreamContentBuilder<'b>,
    reader: &'a mut R,
    finished: &'a mut bool,
}

impl<R: crate::stream_parser::Reader> StreamContentBuilderWithFiller<'_, '_, R> {}

impl<R: crate::stream_parser::Reader> crate::shared::ByteProvider
    for StreamContentBuilderWithFiller<'_, '_, R>
{
    fn next_byte(&mut self) -> Result<Option<u8>, crate::ParseError> {
        // If buffer is empty, try to fill it first
        if self.content_builder.stream_buffer.is_empty() {
            self.content_builder.fill_buffer_from_reader(self.reader)?;
        }

        // If still empty after fill attempt, we're at EOF
        if self.content_builder.stream_buffer.is_empty() {
            // Set finished flag when we reach end of stream
            if !*self.finished {
                *self.finished = true;
                self.content_builder.set_finished(true);
            }
            return Ok(None);
        }

        // Get byte and advance
        let byte = self.content_builder.stream_buffer.current_byte()?;
        self.content_builder.stream_buffer.advance()?;
        Ok(Some(byte))
    }
}

/// Provider that can fill the buffer from a reader
pub struct StreamContentBuilderWithReader<'a, 'b, R: crate::stream_parser::Reader> {
    pub content_builder: &'a mut StreamContentBuilder<'b>,
    reader: &'a mut R,
    finished: &'a mut bool,
}

impl<R: crate::stream_parser::Reader> StreamContentBuilderWithReader<'_, '_, R> {}

impl<R: crate::stream_parser::Reader> crate::shared::ByteProvider
    for StreamContentBuilderWithReader<'_, '_, R>
{
    fn next_byte(&mut self) -> Result<Option<u8>, crate::ParseError> {
        // If buffer is empty, try to fill it first
        if self.content_builder.stream_buffer.is_empty() {
            if let Some(fill_slice) = self.content_builder.stream_buffer.get_fill_slice() {
                let bytes_read = self
                    .reader
                    .read(fill_slice)
                    .map_err(|_| crate::ParseError::ReaderError)?;
                self.content_builder
                    .stream_buffer
                    .mark_filled(bytes_read)
                    .map_err(crate::ParseError::from)?;
            }
        }

        // If still empty after fill attempt, we're at EOF
        if self.content_builder.stream_buffer.is_empty() {
            // Set finished flag when we reach end of stream
            if !*self.finished {
                *self.finished = true;
                self.content_builder.set_finished(true);
            }
            return Ok(None);
        }

        // Get byte and advance
        let byte = self.content_builder.stream_buffer.current_byte()?;
        self.content_builder.stream_buffer.advance()?;
        Ok(Some(byte))
    }
}

impl<R: crate::stream_parser::Reader> crate::event_processor::ContentExtractor
    for StreamContentBuilderWithReader<'_, '_, R>
{
    fn parser_state_mut(&mut self) -> &mut crate::shared::State {
        self.content_builder.parser_state_mut()
    }

    fn current_position(&self) -> usize {
        self.content_builder.current_position()
    }

    fn begin_string_content(&mut self, pos: usize) {
        self.content_builder.begin_string_content(pos);
    }

    fn unicode_escape_collector_mut(
        &mut self,
    ) -> &mut crate::escape_processor::UnicodeEscapeCollector {
        self.content_builder.unicode_escape_collector_mut()
    }

    fn extract_string_content(
        &mut self,
        start_pos: usize,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder.extract_string_content(start_pos)
    }

    fn extract_key_content(
        &mut self,
        start_pos: usize,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder.extract_key_content(start_pos)
    }

    fn extract_number_content(
        &mut self,
        start_pos: usize,
        from_container_end: bool,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder
            .extract_number_content(start_pos, from_container_end)
    }

    fn extract_number(
        &mut self,
        start_pos: usize,
        from_container_end: bool,
        finished: bool,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder
            .extract_number(start_pos, from_container_end, finished)
    }
}

impl<R: crate::stream_parser::Reader> crate::event_processor::EscapeHandler
    for StreamContentBuilderWithReader<'_, '_, R>
{
    fn parser_state(&self) -> &crate::shared::State {
        self.content_builder.parser_state()
    }

    fn process_unicode_escape_with_collector(&mut self) -> Result<(), crate::ParseError> {
        self.content_builder.process_unicode_escape_with_collector()
    }

    fn handle_simple_escape_char(&mut self, escape_char: u8) -> Result<(), crate::ParseError> {
        self.content_builder.handle_simple_escape_char(escape_char)
    }

    fn begin_escape_sequence(&mut self) -> Result<(), crate::ParseError> {
        self.content_builder.begin_escape_sequence()
    }

    fn begin_unicode_escape(&mut self) -> Result<(), crate::ParseError> {
        self.content_builder.begin_unicode_escape()
    }
}


Severity: medium

The structs StreamContentBuilderWithFiller and StreamContentBuilderWithReader and their implementations appear to be unused, as StreamParser now uses StreamParserProvider to bridge the Reader and the ParserCore. These structs seem like leftovers from a previous stage of the refactoring and should be removed if they are indeed dead code to improve maintainability.

coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
picojson/src/parser_core.rs (1)

120-149: Consider refactoring duplicate escape token patterns.

The pattern matching for escape tokens is duplicated between the OnBegin and OnEnd cases. While the current implementation is clear with good comments explaining the different timing requirements, you could reduce duplication by extracting the token pattern into a helper function or constant.

+/// Helper to check if a token is a simple escape token
+fn is_simple_escape_token(token: &EventToken) -> Option<&EventToken> {
+    match token {
+        EventToken::EscapeQuote
+        | EventToken::EscapeBackslash
+        | EventToken::EscapeSlash
+        | EventToken::EscapeBackspace
+        | EventToken::EscapeFormFeed
+        | EventToken::EscapeNewline
+        | EventToken::EscapeCarriageReturn
+        | EventToken::EscapeTab => Some(token),
+        _ => None,
+    }
+}
+
                 ujson::Event::Begin(EventToken::EscapeSequence) => {
                     process_begin_escape_sequence_event(provider)?;
                 }
                 _ if process_unicode_escape_events(&taken, provider)? => {
                     // Unicode escape events handled by shared function
                 }
-                ujson::Event::Begin(
-                    escape_token @ (EventToken::EscapeQuote
-                    | EventToken::EscapeBackslash
-                    | EventToken::EscapeSlash
-                    | EventToken::EscapeBackspace
-                    | EventToken::EscapeFormFeed
-                    | EventToken::EscapeNewline
-                    | EventToken::EscapeCarriageReturn
-                    | EventToken::EscapeTab),
-                ) if escape_timing == EscapeTiming::OnBegin => {
+                ujson::Event::Begin(token) if escape_timing == EscapeTiming::OnBegin => {
+                    if let Some(escape_token) = is_simple_escape_token(token) {
                     // SliceParser-specific: Handle simple escape sequences on Begin events
                     // because CopyOnEscape requires starting unescaping immediately when
                     // the escape token begins to maintain zero-copy optimization
                     process_simple_escape_event(&escape_token, provider)?;
+                    }
                 }
-                ujson::Event::End(
-                    escape_token @ (EventToken::EscapeQuote
-                    | EventToken::EscapeBackslash
-                    | EventToken::EscapeSlash
-                    | EventToken::EscapeBackspace
-                    | EventToken::EscapeFormFeed
-                    | EventToken::EscapeNewline
-                    | EventToken::EscapeCarriageReturn
-                    | EventToken::EscapeTab),
-                ) if escape_timing == EscapeTiming::OnEnd => {
+                ujson::Event::End(token) if escape_timing == EscapeTiming::OnEnd => {
+                    if let Some(escape_token) = is_simple_escape_token(token) {
                     // StreamParser-specific: Handle simple escape sequences on End events
                     // because StreamBuffer must wait until the token ends to accumulate
                     // all bytes before processing the complete escape sequence
                     process_simple_escape_event(&escape_token, provider)?;
+                    }
                 }
picojson/src/stream_content_builder.rs (1)

84-119: Refactor repeated position update pattern.

The position update logic after compaction repeats the same pattern for each state variant. This could be simplified.

     fn update_positions_after_compaction(
         &mut self,
         compaction_offset: usize,
     ) -> Result<(), crate::ParseError> {
+        // Helper to safely subtract compaction offset
+        let adjust_position = |pos: &mut usize| {
+            *pos = pos.saturating_sub(compaction_offset);
+        };
+
         // Update positions - since we compact from the token start position,
         // positions should not be discarded in normal operation
         match &mut self.parser_state {
             crate::shared::State::None => {
                 // No position-based state to update
             }
-            crate::shared::State::Key(pos) => {
-                if *pos >= compaction_offset {
-                    *pos = pos.checked_sub(compaction_offset).unwrap_or(0);
-                } else {
-                    // This shouldn't happen since we compact from the token start
-                    *pos = 0;
-                }
-            }
-            crate::shared::State::String(pos) => {
-                if *pos >= compaction_offset {
-                    *pos = pos.checked_sub(compaction_offset).unwrap_or(0);
-                } else {
-                    // This shouldn't happen since we compact from the token start
-                    *pos = 0;
-                }
-            }
-            crate::shared::State::Number(pos) => {
-                if *pos >= compaction_offset {
-                    *pos = pos.checked_sub(compaction_offset).unwrap_or(0);
-                } else {
-                    // This shouldn't happen since we compact from the token start
-                    *pos = 0;
-                }
-            }
+            crate::shared::State::Key(pos) 
+            | crate::shared::State::String(pos) 
+            | crate::shared::State::Number(pos) => {
+                adjust_position(pos);
+            }
         }
         Ok(())
     }
picojson/src/stream_parser.rs (1)

78-87: Consider making the documentation more concise.

The current comments are somewhat repetitive. A more concise version would improve readability:

-/// Implement the required traits for StreamParser to work with unified ParserCore
-/// This provider handles the StreamParser-specific operations needed by the unified parser core
-/// It bridges the gap between the generic ParserCore and the StreamParser's specific requirements
-/// for streaming input and buffer management
-/// The provider contains mutable references to the StreamParser's internal state
-/// which allows the unified parser core to control the parsing process
+/// Provider that bridges StreamParser with the unified ParserCore.
+/// Handles streaming-specific operations like buffer management and reader access.
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 6be34a4 and cdc45a9.

📒 Files selected for processing (19)
  • picojson/Cargo.toml (0 hunks)
  • picojson/build.rs (3 hunks)
  • picojson/src/chunk_reader.rs (1 hunks)
  • picojson/src/event_processor.rs (5 hunks)
  • picojson/src/json_number.rs (1 hunks)
  • picojson/src/lib.rs (1 hunks)
  • picojson/src/number_parser.rs (1 hunks)
  • picojson/src/parse_error.rs (1 hunks)
  • picojson/src/parser_core.rs (1 hunks)
  • picojson/src/shared.rs (0 hunks)
  • picojson/src/slice_content_builder.rs (1 hunks)
  • picojson/src/slice_input_buffer.rs (1 hunks)
  • picojson/src/slice_parser.rs (4 hunks)
  • picojson/src/stream_buffer.rs (1 hunks)
  • picojson/src/stream_content_builder.rs (1 hunks)
  • picojson/src/stream_parser.rs (4 hunks)
  • picojson/tests/debug_root_numbers.rs (0 hunks)
  • picojson/tests/debug_stress_case.rs (0 hunks)
  • picojson/tests/stream_parser_stress_test.rs (1 hunks)
💤 Files with no reviewable changes (4)
  • picojson/src/shared.rs
  • picojson/tests/debug_root_numbers.rs
  • picojson/Cargo.toml
  • picojson/tests/debug_stress_case.rs
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
picojson/src/lib.rs (4)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
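A side-by-side sketch of the two consumption styles described in that last entry; the trait shape is taken from the learning itself, while the function bodies and the `picojson` import path are illustrative assumptions:

use picojson::{Event, ParseError, PullParser};

// Iterator style: next() yields None once EndDocument has been seen,
// so a while-let loop terminates on its own.
fn drain_iter_style(parser: &mut impl PullParser) -> Result<(), ParseError> {
    while let Some(event) = parser.next() {
        let _event = event?;
        // ... handle the event ...
    }
    Ok(())
}

// Direct style: next_event() returns the Result directly, and the
// caller breaks on EndDocument explicitly.
fn drain_direct_style(parser: &mut impl PullParser) -> Result<(), ParseError> {
    loop {
        match parser.next_event()? {
            Event::EndDocument => break,
            _other => { /* ... handle the event ... */ }
        }
    }
    Ok(())
}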
🧬 Code Graph Analysis (2)
picojson/src/event_processor.rs (3)
picojson/src/stream_content_builder.rs (4)
  • begin_unicode_escape (456-458)
  • begin_unicode_escape (493-498)
  • extract_number (267-282)
  • extract_number (426-434)
picojson/src/stream_parser.rs (2)
  • begin_unicode_escape (142-144)
  • extract_number (183-191)
picojson/src/slice_content_builder.rs (1)
  • extract_number (106-129)
picojson/src/stream_content_builder.rs (7)
picojson/src/stream_buffer.rs (3)
  • new (73-80)
  • current_position (236-238)
  • current_position (1007-1009)
picojson/src/number_parser.rs (4)
  • new (93-99)
  • current_position (22-22)
  • current_position (110-112)
  • parse_number_with_delimiter_logic (42-55)
picojson/src/stream_parser.rs (17)
  • new (46-48)
  • new (91-97)
  • new (258-260)
  • parser_state_mut (148-150)
  • current_position (152-154)
  • begin_string_content (156-158)
  • unicode_escape_collector_mut (160-164)
  • extract_string_content (166-168)
  • extract_key_content (170-172)
  • extract_number_content (174-181)
  • extract_number (183-191)
  • next_byte (101-122)
  • parser_state (126-128)
  • process_unicode_escape_with_collector (130-132)
  • handle_simple_escape_char (134-136)
  • begin_escape_sequence (138-140)
  • begin_unicode_escape (142-144)
picojson/src/escape_processor.rs (2)
  • new (200-206)
  • process_unicode_escape_sequence (659-700)
picojson/src/event_processor.rs (25)
  • new (425-431)
  • parser_state_mut (108-108)
  • parser_state_mut (457-459)
  • current_position (102-102)
  • current_position (449-451)
  • begin_string_content (105-105)
  • begin_string_content (453-455)
  • unicode_escape_collector_mut (112-114)
  • unicode_escape_collector_mut (461-465)
  • extract_string_content (117-120)
  • extract_string_content (467-472)
  • extract_key_content (123-126)
  • extract_key_content (474-479)
  • extract_number_content (129-133)
  • extract_number_content (481-487)
  • extract_number (141-146)
  • extract_number (489-496)
  • parser_state (14-14)
  • parser_state (435-437)
  • process_unicode_escape_with_collector (17-17)
  • process_unicode_escape_with_collector (439-441)
  • handle_simple_escape_char (20-20)
  • handle_simple_escape_char (443-445)
  • begin_escape_sequence (24-26)
  • begin_unicode_escape (30-32)
picojson/src/shared.rs (5)
  • new (62-66)
  • from_utf8 (181-183)
  • string_content_bounds_from_content_start (118-124)
  • end_position_excluding_delimiter (149-151)
  • next_byte (178-178)
picojson/src/parse_error.rs (4)
  • from (39-46)
  • from (50-61)
  • from (65-67)
  • from (71-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (24)
picojson/build.rs (3)

82-82: LGTM: Improved lint suppression scope

The change from #[allow(clippy::non_snake_case)] to #![allow(non_snake_case)] is appropriate as it applies to the entire generated module, ensuring consistent test naming that matches the original JSON test suite conventions.
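For context, a minimal illustration of the two attribute forms (the function and module names here are hypothetical):

// Outer attribute: applies only to the single item that follows.
#[allow(non_snake_case, dead_code)]
fn y_string_accepts_UTF8() {}

// Inner attribute: placed at the top of a module or generated file,
// it covers every item inside.
mod generated_tests {
    #![allow(non_snake_case, dead_code)]

    fn y_object_simple() {}
    fn n_string_unescaped_TAB() {}
}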


133-133: LGTM: Appropriate dead code suppression

The #[allow(dead_code)] attributes are correctly applied to utility functions that are part of the build script infrastructure but may not be actively used in all configurations.

Also applies to: 168-168, 174-174, 180-180


161-161: LGTM: Modern string formatting syntax

The update to use format!("{result}_{count}") follows Rust's modern formatting style introduced in 1.58+, improving code readability.
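The two forms are equivalent; a quick illustration with made-up values:

fn main() {
    let result = "y_object";
    let count = 2;
    // Positional arguments (pre-1.58) and captured identifiers (1.58+)
    // produce identical output.
    assert_eq!(format!("{}_{}", result, count), format!("{result}_{count}"));
}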

picojson/src/number_parser.rs (1)

64-64: LGTM: Minor documentation formatting improvement

The added blank line improves the readability of the documentation comment without affecting functionality.

picojson/src/slice_input_buffer.rs (1)

57-61: LGTM: Useful API addition

The new data_len() method provides a clean way to access the underlying data length for bounds checking, which supports the refactored parsing architecture.

picojson/src/json_number.rs (1)

146-146: LGTM: Modern string formatting syntax

The updates to use captured identifiers in format strings (`{val}` instead of `"{}", val`) follow modern Rust formatting conventions and improve code readability.

Also applies to: 148-148

picojson/src/chunk_reader.rs (1)

122-122: LGTM: Simplified lifetime annotation

The change from explicit lifetime parameter 'a to anonymous lifetime '_ is appropriate here since the lifetime doesn't need to be explicitly referenced within the implementation.
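The same simplification in miniature, shown on a hypothetical type rather than ChunkReader itself:

struct Wrapper<'a> {
    data: &'a [u8],
}

// Naming the lifetime adds noise when the impl body never refers to it:
// impl<'a> core::fmt::Debug for Wrapper<'a> { ... }
// The anonymous lifetime '_ states "some lifetime" without binding a name:
impl core::fmt::Debug for Wrapper<'_> {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "Wrapper({} bytes)", self.data.len())
    }
}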

picojson/src/lib.rs (1)

66-67: LGTM: Clean module additions for unified parser architecture

The new module declarations (parser_core, stream_content_builder, slice_content_builder) are well-organized and align with the PR objective of consolidating to a single parser. The placement within the existing module hierarchy is appropriate.

Also applies to: 70-71, 74-75

picojson/src/parse_error.rs (1)

79-81: LGTM: Modern formatting syntax improvements

The updates to use Rust's more concise formatting syntax (`{e}` instead of `"{}", e`) are clean modernizations that improve readability without changing functionality.

picojson/src/stream_buffer.rs (2)

149-156: Excellent safety improvement: Replace wrapping addition with checked addition

The change from wrapping_add to checked_add with proper error handling significantly improves safety by preventing silent integer overflow that could lead to memory safety issues. Returning InvalidSliceBounds error on overflow is the correct approach for bounds validation.
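A minimal sketch of the pattern, with an illustrative error type standing in for the crate's InvalidSliceBounds variant:

#[derive(Debug)]
enum BufferError {
    InvalidSliceBounds,
}

// Advance a buffer offset, surfacing arithmetic overflow as an error.
// wrapping_add would instead produce a small, wrong-but-plausible offset.
fn advance(pos: usize, len: usize) -> Result<usize, BufferError> {
    pos.checked_add(len).ok_or(BufferError::InvalidSliceBounds)
}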


164-164: LGTM: Intentional unused variable for debugging

The _old_tokenize_pos variable properly captures the previous value and is appropriately marked as unused. This appears to be intentional for debugging or future use.

picojson/tests/stream_parser_stress_test.rs (1)

217-262: LGTM: Comprehensive Unicode escape test for streaming parser

The new debug_unicode_issue test function provides excellent coverage for Unicode escape handling in the streaming parser. The test setup with chunked reading (5-byte chunks, 30-byte buffer) and detailed debug output for character analysis is well-designed for validating the unified parser architecture's escape sequence handling.
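A sketch of the test's shape based on the description above; the ChunkReader and StreamParser constructor signatures and the Event payload API are assumptions, not verbatim from the crate:

use picojson::{ChunkReader, Event, StreamParser};

#[test]
fn unicode_escape_across_chunks() {
    // \u00e9 escapes forced through 5-byte reads into a 30-byte buffer.
    let json = br#"{"name": "caf\u00e9 \u00e9clair"}"#;
    let reader = ChunkReader::new(json, 5); // assumed: (data, chunk_size)
    let mut buffer = [0u8; 30];
    let mut parser = StreamParser::new(reader, &mut buffer); // assumed signature

    while let Some(event) = parser.next() {
        if let Event::String(s) = event.expect("parse error") {
            // The escapes should arrive decoded as 'é'.
            assert!(s.contains('é')); // assumes the payload derefs to str
        }
    }
}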

picojson/src/event_processor.rs (5)

28-32: LGTM: Well-designed trait method with sensible default

The new begin_unicode_escape method in the EscapeHandler trait is well-designed with clear documentation and a sensible default no-op implementation. This allows parsers that don't need special Unicode escape handling to use the default while providing flexibility for those that do.


135-146: LGTM: Comprehensive number extraction method

The new extract_number method in the ContentExtractor trait is well-documented with clear parameter descriptions. The method signature properly handles the differences between slice and stream parsing (the finished parameter for StreamParser-specific logic) while maintaining a unified interface.
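Paraphrased, the trait surface under discussion looks roughly like this (names and signatures are reconstructed from the review, with placeholder Event and ParseError types so the sketch stands alone):

// Placeholder types; the crate's real Event and ParseError are richer.
struct Event;
struct ParseError;

trait EscapeHandler {
    // Called when a \u escape sequence begins. The default no-op lets
    // parsers without special Unicode bookkeeping skip overriding it.
    fn begin_unicode_escape(&mut self) -> Result<(), ParseError> {
        Ok(())
    }
}

trait ContentExtractor {
    // Unified number extraction. `finished` signals end of input so the
    // streaming parser can apply its delimiter logic at document end;
    // the slice parser can ignore it.
    fn extract_number(
        &mut self,
        start_pos: usize,
        finished: bool,
    ) -> Result<Event, ParseError>;
}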


205-207: LGTM: Simplified lifetime parameters improve readability

The simplification of lifetime parameters in create_tokenizer_callback using anonymous lifetimes ('_) improves readability while maintaining the same functionality. This aligns with modern Rust style guidelines.


280-280: LGTM: Properly guarded method call

The call to begin_unicode_escape() is properly guarded by the state check ensuring it only executes when inside a string or key context. This maintains the same pattern as other escape handling code.


489-496: LGTM: Appropriate mock implementation

The stub implementation of extract_number in the mock is appropriate for testing. Using unimplemented!() with a clear message is the correct approach for mock methods that aren't exercised in tests.

picojson/src/slice_content_builder.rs (1)

85-129: Excellent defensive programming in number extraction.

The bounds clamping logic using core::cmp::min() prevents slice bounds errors when parsing standalone numbers at document end. The explicit acknowledgment of the unused finished parameter with clear explanation is also good practice.
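The clamping idea in isolation (the helper below is illustrative, not the builder's actual method):

// Extract a number token's bytes, clamping the end so a standalone
// number at document end cannot index past the input slice.
fn number_bytes(data: &[u8], start: usize, end: usize) -> Option<&[u8]> {
    let end = core::cmp::min(end, data.len());
    data.get(start..end)
}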

picojson/src/stream_parser.rs (6)

3-38: Excellent simplification of StreamParser structure.

The consolidation from 6 internal fields to just 2 (parser_core and provider) significantly reduces state management complexity and aligns well with the unified parser architecture.


193-208: Good handling of EOF for number parsing.

The override of validate_and_extract_number correctly uses the finished state to handle number parsing at end-of-document, which is essential for streaming parsers.


210-226: Clean delegation to unified parser core.

The implementation properly delegates to parser_core.next_event_impl_unified_with_accumulator with appropriate timing (EscapeTiming::OnEnd) for streaming parsers. The byte accumulation closure correctly delegates to the content builder.


232-245: Correct implementation of PullParser trait.

The implementation properly checks the finished flag and applies unescaped reset before delegating to next_event_impl, maintaining consistency with the SliceParser pattern.


1203-1214: Good conversion to assertion-based testing.

The change from printing to assertions improves test automation and makes failures more explicit.


289-1282: Comprehensive test coverage for the refactored StreamParser.

The test suite thoroughly covers:

  • Basic JSON structures (objects, arrays, primitives)
  • Escape sequence processing including Unicode
  • Number parsing with various configurations
  • Edge cases like buffer boundaries and EOF handling
  • Error recovery scenarios

This extensive coverage provides confidence in the refactored implementation.

Ok(())
// Get byte and advance
let byte = self.content_builder.stream_buffer().current_byte()?;
let _current_pos = self.content_builder.stream_buffer().current_position();
⚠️ Potential issue

Remove unused variable.

The _current_pos variable is created but never used.

-        let _current_pos = self.content_builder.stream_buffer().current_position();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let _current_pos = self.content_builder.stream_buffer().current_position();
🤖 Prompt for AI Agents
In picojson/src/stream_parser.rs at line 119, the variable _current_pos is
declared but never used. Remove the declaration of _current_pos entirely to
clean up the code and avoid unused variable warnings.

@kaidokert kaidokert closed this Jul 15, 2025
@kaidokert kaidokert deleted the big2 branch July 16, 2025 08:29