Conversation

kaidokert (Owner) commented Jul 15, 2025

Consolidate to single parser

Summary by Sourcery

Unify StreamParser and SliceParser event loops by extracting common logic into a new ParserCore, and refactor parsing responsibilities into dedicated content builder and provider modules to eliminate duplication, simplify escape handling, and streamline byte provisioning across both parsers.

New Features:

  • Introduce ParserCore to centralize the core event processing loop.

Enhancements:

  • Refactor StreamParser and SliceParser to delegate to ParserCore with StreamContentBuilder and SliceContentBuilder abstractions.
  • Simplify EscapeHandler and ContentExtractor traits by adding a unified extract_number method and standardizing escape timing.
  • Remove legacy state machines, processing enums, and duplicated buffer management code in favor of provider-based architecture.
  • Improve Display implementations for ParseError and JsonNumber with updated formatting macros.

Build:

  • Allow non_snake_case test names and clean up build.rs code generation attributes.

Tests:

  • Add debug_unicode_issue test to exercise complex escape sequences in streaming parser.

Chores:

  • Remove unused log and test-log dependencies from Cargo.toml.

Summary by CodeRabbit

  • New Features

    • Introduced unified parsing core and content builder modules for both slice-based and streaming JSON parsing, improving efficiency and maintainability.
    • Added new internal modules and traits to support zero-copy and streaming content extraction.
    • Enhanced streaming parser with improved Unicode and escape sequence handling.
  • Refactor

    • Refactored slice and stream parsers to delegate parsing logic to shared core components, reducing code duplication and simplifying state management.
    • Updated formatting styles for output consistency.
  • Bug Fixes

    • Improved buffer bounds checking in streaming buffer operations to prevent overflow errors.
  • Tests

    • Added a new debug test for Unicode handling in streaming parsing.
    • Removed obsolete and debug test files for cleaner test coverage.
  • Chores

    • Removed unused dependencies and cleaned up configuration files.

sourcery-ai bot commented Jul 15, 2025

Reviewer's Guide

This PR consolidates StreamParser and SliceParser into a shared parsing core by introducing a new ParserCore module, and refactors each parser to delegate the main loop and byte provision to specialized provider/builder types. Common logic for token handling, escape processing, and content extraction has been moved into ParserCore, StreamContentBuilder, and SliceContentBuilder, removing duplication and simplifying parser implementations.

Class diagram for refactored parser architecture

classDiagram
    class ParserCore {
        +Tokenizer
        +ParserState
        +next_event_impl_unified()
        +next_event_impl_unified_with_accumulator()
    }
    class StreamParser {
        +ParserCore parser_core
        +StreamParserProvider provider
    }
    class SliceParser {
        +ParserCore parser_core
        +SliceContentBuilder content_builder
    }
    class StreamParserProvider {
        +StreamContentBuilder content_builder
        +Reader reader
        +bool finished
    }
    class StreamContentBuilder {
        +StreamBuffer stream_buffer
        +State parser_state
        +UnicodeEscapeCollector unicode_escape_collector
        +bool unescaped_reset_queued
        +bool in_unicode_escape
        +bool in_escape_sequence
        +bool finished
    }
    class SliceContentBuilder {
        +SliceInputBuffer buffer
        +CopyOnEscape copy_on_escape
        +State parser_state
        +UnicodeEscapeCollector unicode_escape_collector
        +bool in_escape_sequence
    }
    ParserCore <|-- StreamParser
    ParserCore <|-- SliceParser
    StreamParser o-- StreamParserProvider
    StreamParserProvider o-- StreamContentBuilder
    SliceParser o-- SliceContentBuilder
    StreamContentBuilder o-- StreamBuffer
    StreamContentBuilder o-- UnicodeEscapeCollector
    SliceContentBuilder o-- CopyOnEscape
    SliceContentBuilder o-- UnicodeEscapeCollector
    SliceContentBuilder o-- SliceInputBuffer
    StreamParserProvider o-- Reader

Class diagram for trait relationships in the new parser core

classDiagram
    class ByteProvider {
        <<trait>>
        +next_byte()
    }
    class ContentExtractor {
        <<trait>>
        +parser_state_mut()
        +current_position()
        +begin_string_content()
        +unicode_escape_collector_mut()
        +extract_string_content()
        +extract_key_content()
        +extract_number_content()
        +extract_number()
    }
    class EscapeHandler {
        <<trait>>
        +parser_state()
        +process_unicode_escape_with_collector()
        +handle_simple_escape_char()
        +begin_escape_sequence()
        +begin_unicode_escape()
    }
    class ParserProvider {
        <<trait>>
    }
    ByteProvider <|.. ParserProvider
    ContentExtractor <|.. ParserProvider
    ContentExtractor <|.. StreamContentBuilder
    ContentExtractor <|.. SliceContentBuilder
    EscapeHandler <|.. StreamContentBuilder
    EscapeHandler <|.. SliceContentBuilder
    ByteProvider <|.. StreamContentBuilder
    ByteProvider <|.. SliceContentBuilder
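To make the trait diagram concrete, here is a minimal Rust sketch of how the pieces compose, assuming only the method names shown above; the real definitions in event_processor.rs and parser_core.rs carry lifetimes and more methods:

```rust
// Minimal sketch: `ParseError` and `Event` are stand-ins for the crate's
// real types, and each trait is trimmed to one representative method.
struct ParseError;
struct Event;

trait ByteProvider {
    /// Pull the next input byte, or Ok(None) at end of input.
    fn next_byte(&mut self) -> Result<Option<u8>, ParseError>;
}

trait ContentExtractor {
    /// Turn accumulated bytes into a String/Key/Number event.
    fn extract_string_content(&mut self, start_pos: usize) -> Result<Event, ParseError>;
}

trait EscapeHandler {
    /// Hook invoked when a backslash escape starts inside a string.
    fn begin_escape_sequence(&mut self) -> Result<(), ParseError>;
}

/// A provider bundles all three capabilities, which is what lets one
/// generic event loop in ParserCore serve both parsers.
trait ParserProvider: ByteProvider + ContentExtractor + EscapeHandler {}
```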

Class diagram for removed and replaced fields in StreamParser and SliceParser

classDiagram
    class StreamParser {
        -Tokenizer tokenizer
        -ParserState parser_state
        -Reader reader
        -StreamBuffer stream_buffer
        -ProcessingState processing_state
        -UnicodeEscapeCollector unicode_escape_collector
        +ParserCore parser_core
        +StreamParserProvider provider
    }
    class SliceParser {
        -Tokenizer tokenizer
        -SliceInputBuffer buffer
        -ParserState parser_state
        -CopyOnEscape copy_on_escape
        -UnicodeEscapeCollector unicode_escape_collector
        +ParserCore parser_core
        +SliceContentBuilder content_builder
    }

Class diagram for new modules: parser_core, stream_content_builder, slice_content_builder

classDiagram
    class parser_core {
        +ParserCore
        +EscapeTiming
    }
    class stream_content_builder {
        +StreamContentBuilder
        +StreamContentBuilderWithFiller
        +StreamContentBuilderWithReader
    }
    class slice_content_builder {
        +SliceContentBuilder
    }
    parser_core ..> StreamContentBuilder : uses
    parser_core ..> SliceContentBuilder : uses
    StreamContentBuilderWithFiller o-- StreamContentBuilder
    StreamContentBuilderWithReader o-- StreamContentBuilder

Class diagram for ParserState changes

classDiagram
    class ParserState {
        -State state
        +[Option<Event>; 2] evts
    }
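That is, ParserState shrinks to just the two-slot event queue. A sketch of the resulting struct, with `Event` standing in for the tokenizer's event type:

```rust
struct Event; // stand-in for the ujson tokenizer event type

struct ParserState {
    /// Fixed-size, zero-allocation queue. Per the project's design notes,
    /// two slots cover ujson's event generation patterns when events are
    /// drained in a tight loop.
    evts: [Option<Event>; 2],
}

impl ParserState {
    fn new() -> Self {
        ParserState { evts: [None, None] }
    }
}
```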

Class diagram for StreamParserProvider and its role

classDiagram
    class StreamParserProvider {
        +StreamContentBuilder content_builder
        +Reader reader
        +bool finished
        +new()
    }
    StreamParserProvider o-- StreamContentBuilder
    StreamParserProvider o-- Reader

File-Level Changes

Change / Details / Files
Introduce a unified ParserCore to handle shared parsing loop
  • Add parser_core.rs implementing ParserCore, ParserProvider trait, and EscapeTiming enum
  • Replace duplicated next_event_impl loops in stream_parser.rs and slice_parser.rs with calls to ParserCore
  • Wire ParserCore into both parser implementations
picojson/src/parser_core.rs
picojson/src/stream_parser.rs
picojson/src/slice_parser.rs
Refactor StreamParser to use ParserCore and a provider abstraction
  • Remove manual tokenizer, buffer, and state fields in StreamParser struct
  • Add StreamParserProvider to handle reader access and buffer management
  • Update next_event to invoke parser_core.next_event_impl_unified_with_accumulator
picojson/src/stream_parser.rs
Extract stream content and escape logic into StreamContentBuilder
  • Create stream_content_builder.rs with StreamContentBuilder handling buffer compaction, escape processing, and content extraction
  • Implement ContentExtractor and EscapeHandler for StreamContentBuilder and its reader wrapper
picojson/src/stream_content_builder.rs
Refactor SliceParser to use ParserCore and a content builder
  • Remove CopyOnEscape, tokenizer, and state fields from SliceParser struct
  • Add SliceContentBuilder for zero-copy string and number extraction
  • Delegate next_event to parser_core.next_event_impl_unified with OnBegin timing
picojson/src/slice_parser.rs
picojson/src/slice_content_builder.rs
Update event_processor and related utilities for new architecture
  • Add default begin_unicode_escape and extract_number methods to EscapeHandler and ContentExtractor traits
  • Adjust create_tokenizer_callback signature
  • Tidy parse_error Display impl and json_number formatting
  • Remove unused dev-dependencies and tweak Cargo.toml
picojson/src/event_processor.rs
picojson/src/parse_error.rs
picojson/src/json_number.rs
Cargo.toml
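The EscapeTiming enum introduced in parser_core.rs captures the one behavioral difference left between the two parsers. A hedged sketch of the idea; the variant names come from the change list above, and the doc comments paraphrase the rationale given later in this review:

```rust
/// When should a simple escape token (\n, \t, ...) be processed?
#[derive(Clone, Copy, PartialEq, Eq)]
enum EscapeTiming {
    /// SliceParser: on Begin events, because CopyOnEscape must start
    /// unescaping the moment the token begins to keep its zero-copy path.
    OnBegin,
    /// StreamParser: on End events, because StreamBuffer has to accumulate
    /// every byte of the token before the escape can be resolved.
    OnEnd,
}
```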


coderabbitai bot commented Jul 15, 2025

Warning

Rate limit exceeded

@kaidokert has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 42 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between cdc45a9 and cc553ac.

📒 Files selected for processing (7)
  • picojson/src/escape_processor.rs (1 hunks)
  • picojson/src/event_processor.rs (6 hunks)
  • picojson/src/slice_content_builder.rs (1 hunks)
  • picojson/src/slice_parser.rs (4 hunks)
  • picojson/src/stream_buffer.rs (2 hunks)
  • picojson/src/stream_content_builder.rs (1 hunks)
  • picojson/src/stream_parser.rs (3 hunks)

Walkthrough

This change refactors the JSON parser by introducing unified core abstractions (ParserCore, SliceContentBuilder, StreamContentBuilder) for both slice-based and streaming parsing. It removes manual event, escape, and state management from SliceParser and StreamParser, delegating these to the new shared components. Several new modules and traits are added, and related tests are updated or removed.

Changes

File(s) / Change Summary
picojson/Cargo.toml Removed log and test-log dependencies.
picojson/build.rs Suppressed non-snake-case warnings in generated tests; added #[allow(dead_code)] to some functions; updated string formatting style.
picojson/src/chunk_reader.rs Updated Reader trait implementation to use anonymous lifetime.
picojson/src/event_processor.rs Modified EscapeHandler and ContentExtractor traits; added new method signatures; simplified lifetime annotations; updated test mocks.
picojson/src/json_number.rs
picojson/src/parse_error.rs
Updated formatting macros to use Rust's named argument style.
picojson/src/lib.rs Added module declarations: parser_core, stream_content_builder, slice_content_builder.
picojson/src/number_parser.rs Added a blank line to a doc comment.
picojson/src/parser_core.rs New file: Introduced ParserCore, unified event loop, and ParserProvider trait.
picojson/src/shared.rs Removed state field from ParserState struct and its initialization.
picojson/src/slice_content_builder.rs New file: Added SliceContentBuilder struct with trait implementations for content extraction and escape handling.
picojson/src/slice_input_buffer.rs Added data_len method to SliceInputBuffer.
picojson/src/slice_parser.rs Refactored to use ParserCore and SliceContentBuilder, removed manual event and escape handling.
picojson/src/stream_buffer.rs Replaced wrapping addition with checked addition in buffer compaction; removed a debug print in tests.
picojson/src/stream_content_builder.rs New file: Added StreamContentBuilder for streaming parsing, with wrappers for reader integration and trait implementations.
picojson/src/stream_parser.rs Refactored to use ParserCore and StreamParserProvider; removed manual state, event, and escape management.
picojson/tests/debug_root_numbers.rs
picojson/tests/debug_stress_case.rs
Deleted: Removed test files for debugging root numbers and a stress case.
picojson/tests/stream_parser_stress_test.rs Added debug_unicode_issue test for debugging Unicode and escape handling in streaming parser.
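The stream_buffer.rs row above is the one behavioral fix in this table: compaction arithmetic now fails loudly instead of wrapping. A small illustrative sketch with hypothetical helper names; only the wrapping_add to checked_add substitution reflects the actual change:

```rust
fn old_advance(pos: usize, offset: usize) -> usize {
    pos.wrapping_add(offset) // overflow silently wraps around to a small index
}

fn new_advance(pos: usize, offset: usize) -> Result<usize, &'static str> {
    pos.checked_add(offset).ok_or("buffer position overflow") // overflow becomes an error
}
```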

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Parser (SliceParser/StreamParser)
    participant ParserCore
    participant ContentBuilder (Slice/Stream)
    participant Reader/Buffer

    User->>Parser: next_event()
    Parser->>ParserCore: next_event_impl_unified(...)
    ParserCore->>ContentBuilder: extract content / handle escapes
    ParserCore->>ContentBuilder: request next byte
    ContentBuilder->>Reader/Buffer: fetch/consume byte
    ContentBuilder-->>ParserCore: return byte/content
    ParserCore-->>Parser: Event or Error
    Parser-->>User: Event or Error
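A hedged sketch of the pull loop the diagram describes, reusing the API surface quoted elsewhere in this thread (ChunkReader::new, StreamParser::new, next_event); exact signatures and event variants may differ:

```rust
use picojson::{ChunkReader, Event, ParseError, StreamParser};

fn drain(json: &[u8]) -> Result<(), ParseError> {
    let reader = ChunkReader::new(json, 5); // deliver input 5 bytes at a time
    let mut buffer = [0u8; 64];
    let mut parser = StreamParser::new(reader, &mut buffer);

    loop {
        // Each call drives ParserCore, which pulls bytes through the
        // content builder and hands back exactly one event.
        match parser.next_event()? {
            Event::EndDocument => break,
            _other => { /* handle String/Key/Number/container events here */ }
        }
    }
    Ok(())
}
```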

Possibly related PRs

  • kaidokert/picojson-rs#55: Introduces the foundational event_processor module and shared traits that enable the unification and refactor in this PR.
  • kaidokert/picojson-rs#39: Improves StreamBuffer compaction and position tracking, related to buffer handling changes in this PR.

Poem

A rabbit hopped through parser fields,
Refactoring code for greater yields.
Escapes and buffers, once a mess,
Now unified for parsing success!
With builders new and core so bright,
JSON is parsed just right.
🐇✨


gemini-code-assist bot left a comment

Summary of Changes

Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a substantial refactor of the picojson library's parsing architecture. The primary goal is to eliminate code duplication between the SliceParser and StreamParser by consolidating their common event processing logic into a new, unified ParserCore. This change introduces a more modular design, abstracting input handling and content extraction into dedicated builder components, which significantly enhances maintainability and clarity of the parsing pipeline.

Highlights

  • Unified Parser Core: Introduced a new ParserCore module (picojson/src/parser_core.rs) that encapsulates the shared event processing loop previously duplicated in SliceParser and StreamParser. This centralizes the core parsing logic.
  • Content Builder Abstraction: Created dedicated content builder modules, SliceContentBuilder (picojson/src/slice_content_builder.rs) and StreamContentBuilder (picojson/src/stream_content_builder.rs), which implement the ContentExtractor and ByteProvider traits. These builders now handle parser-specific input management, string/key/number extraction, and escape sequence processing.
  • Simplified Parser Structures: The SliceParser and StreamParser structs have been significantly simplified. They now primarily compose the new ParserCore and their respective content builders/providers, delegating most of the parsing responsibilities to these new components.
  • Refined Escape Handling: The EscapeHandler trait and related functions in picojson/src/event_processor.rs have been updated to better accommodate the distinct escape processing timings required by slice-based (on Begin events for zero-copy) and stream-based (on End events for accumulation) parsing.
  • Dependency and Linting Cleanup: Removed unused test-log and log dependencies from Cargo.toml and added #[allow(dead_code)] attributes in build.rs for functions only used by the build script, improving code hygiene.

sourcery-ai bot left a comment

Hey @kaidokert - I've reviewed your changes - here's some feedback:

  • The two filler/provider structs (StreamContentBuilderWithFiller and StreamContentBuilderWithReader) have empty impl blocks and duplicate buffer-filling logic—please consolidate or remove them to avoid unused code.
  • The ContentExtractor impls duplicate number extraction in both extract_number_content and extract_number; consider defaulting one in the trait or merging them to cut down boilerplate.
  • There are multiple #[allow(dead_code)] / #[allow(non_snake_case)] attributes sprinkled across new modules—please audit and remove any that are no longer necessary.
Individual Comments

Comment 1 (picojson/src/stream_content_builder.rs:500)

Code context:

    fn process_unicode_escape_with_collector(&mut self) -> Result<(), ParseError> {

UTF-8 byte copying for escape sequences could be simplified.

Consider iterating directly over the UTF-8 bytes instead of copying them to a fixed-size array. If the copy is required, please document the reason.

Suggested implementation:

```rust
        // Shared Unicode escape processing pattern - process UTF-8 bytes directly to avoid unnecessary copying
        let utf8_bytes_result = {
            let current_pos = self.stream_buffer.current_position();
            let hex_slice_provider = |start, end| {
                self.stream_buffer
                    .get_string_slice(start, end)
                    .map_err(Into::into)
            };
            // Instead of copying to a fixed-size array, iterate directly over the UTF-8 bytes.
            // If a copy is required for borrow checker reasons, document it here.
            // For now, we assume direct iteration is possible, e.g.:
            // for b in utf8_bytes.iter() {
            //     // process each byte
            // }
```

If the rest of the function (not shown) still uses a fixed-size array for UTF-8 bytes, you should refactor it to use direct iteration over the byte slice returned by the string slice provider. If you encounter borrow checker issues that require a copy, add a comment explaining why the copy is necessary.

Comment 2 (picojson/tests/stream_parser_stress_test.rs:217)
New test added for debugging Unicode and escape handling in StreamParser.

Please add assertions to verify the parsed content and character codes, so the test can automatically detect regressions instead of relying on printed output.

Suggested implementation:

```rust
#[test]
fn debug_unicode_issue() {
    let json = br#"["a\nb\t\"\\c\u1234d"]"#;
    let reader = ChunkReader::new(json, 5);
    let mut buffer = [0u8; 30];

    let mut parser = StreamParser::new(reader, &mut buffer);
    let mut event_count = 0;
    let mut parsed_string = None;

    loop {
        let event = match parser.next_event() {
            Ok(ev) => ev,
            Err(e) => panic!("parse error: {:?}", e),
        };
        if let Some(ev) = event {
            event_count += 1;
            // Capture the parsed string value
            if let picojson::Event::String(s) = &ev {
                parsed_string = Some(s.clone());
            }
        } else {
            break;
        }
    }

    // The expected string is: a\nb\t"\\c\u{1234}d
    let expected = "a\nb\t\"\\c\u{1234}d";
    let parsed = parsed_string.expect("No string parsed");
    assert_eq!(parsed, expected, "Parsed string does not match expected value");

    // Check the character codes
    let expected_codes: Vec<u32> = expected.chars().map(|c| c as u32).collect();
    let parsed_codes: Vec<u32> = parsed.chars().map(|c| c as u32).collect();
    assert_eq!(parsed_codes, expected_codes, "Parsed character codes do not match expected");

    // Optionally, check event count (should be 3: StartArray, String, EndArray)
    assert_eq!(event_count, 3, "Unexpected number of events");
}
```



gemini-code-assist bot left a comment

Code Review

This is a significant and well-executed refactoring that unifies the SliceParser and StreamParser logic into a shared ParserCore. The new architecture using ContentBuilder and Provider traits greatly reduces code duplication and improves maintainability. I've found a few issues, including a potential correctness bug in surrogate pair handling and some dead code. Once these are addressed, this will be a very solid improvement.

Comment on lines 165 to 187
        // Fix: escape_start_pos should point to the backslash, not the 'u'
        // But don't subtract if it's already pointing to the backslash
        let actual_escape_start_pos = escape_start_pos;

        // Handle UTF-8 bytes if we have them (not a high surrogate waiting for low surrogate)
        if let Some(utf8_bytes) = utf8_bytes_opt {
            if had_pending_high_surrogate {
                // This is completing a surrogate pair - need to consume both escapes
                // First call: consume the high surrogate (6 bytes earlier)
                self.copy_on_escape
                    .handle_unicode_escape(actual_escape_start_pos, &[])?;
                // Second call: consume the low surrogate and write UTF-8
                self.copy_on_escape
                    .handle_unicode_escape(actual_escape_start_pos + 6, utf8_bytes)?;
            } else {
                // Single Unicode escape - normal processing
                self.copy_on_escape
                    .handle_unicode_escape(actual_escape_start_pos, utf8_bytes)?;
            }
        }

        Ok(())
    }


Severity: high

The logic for handling surrogate pairs may be incorrect. The // Fix: comment on line 165 is not accompanied by any change in logic, as actual_escape_start_pos is just assigned the value of escape_start_pos. This is confusing and should be either implemented or removed. Also, the surrogate pair handling logic starting on line 171 seems flawed. When processing a low surrogate, actual_escape_start_pos points to its start. The high surrogate was 6 bytes before this. The current code calls handle_unicode_escape with actual_escape_start_pos and actual_escape_start_pos + 6, which seems incorrect.

The logic should probably be:

  • Handle the high surrogate at actual_escape_start_pos - 6.
  • Handle the low surrogate at actual_escape_start_pos and write the combined UTF-8 character.

Here is a suggested correction, which also removes the confusing actual_escape_start_pos variable:

        // Shared Unicode escape processing pattern
        let had_pending_high_surrogate = self.unicode_escape_collector.has_pending_high_surrogate();

        let mut utf8_buf = [0u8; 4];
        let (utf8_bytes_opt, escape_start_pos) =
            crate::escape_processor::process_unicode_escape_sequence(
                current_pos,
                &mut self.unicode_escape_collector,
                hex_slice_provider,
                &mut utf8_buf,
            )?;

        // Handle UTF-8 bytes if we have them (not a high surrogate waiting for low surrogate)
        if let Some(utf8_bytes) = utf8_bytes_opt {
            if had_pending_high_surrogate {
                // This is completing a surrogate pair.
                // First, handle the high surrogate at its original position (6 bytes before the current one).
                self.copy_on_escape
                    .handle_unicode_escape(escape_start_pos.saturating_sub(6), &[])?;
                // Then, handle the low surrogate and write the combined UTF-8 character.
                self.copy_on_escape
                    .handle_unicode_escape(escape_start_pos, utf8_bytes)?;
            } else {
                // Single Unicode escape - normal processing
                self.copy_on_escape
                    .handle_unicode_escape(escape_start_pos, utf8_bytes)?;
            }
        }

        Ok(())

        // Update positions
        let _old_tokenize_pos = self.tokenize_pos;


Severity: medium

This variable _old_tokenize_pos is assigned but never used. It appears to be a leftover from debugging and should be removed to improve code clarity.

Comment on lines 303 to 459
/// Custom ByteProvider that can handle reader filling for StreamParser
pub struct StreamContentBuilderWithFiller<'a, 'b, R: crate::stream_parser::Reader> {
    content_builder: &'a mut StreamContentBuilder<'b>,
    reader: &'a mut R,
    finished: &'a mut bool,
}

impl<R: crate::stream_parser::Reader> StreamContentBuilderWithFiller<'_, '_, R> {}

impl<R: crate::stream_parser::Reader> crate::shared::ByteProvider
    for StreamContentBuilderWithFiller<'_, '_, R>
{
    fn next_byte(&mut self) -> Result<Option<u8>, crate::ParseError> {
        // If buffer is empty, try to fill it first
        if self.content_builder.stream_buffer.is_empty() {
            self.content_builder.fill_buffer_from_reader(self.reader)?;
        }

        // If still empty after fill attempt, we're at EOF
        if self.content_builder.stream_buffer.is_empty() {
            // Set finished flag when we reach end of stream
            if !*self.finished {
                *self.finished = true;
                self.content_builder.set_finished(true);
            }
            return Ok(None);
        }

        // Get byte and advance
        let byte = self.content_builder.stream_buffer.current_byte()?;
        self.content_builder.stream_buffer.advance()?;
        Ok(Some(byte))
    }
}

/// Provider that can fill the buffer from a reader
pub struct StreamContentBuilderWithReader<'a, 'b, R: crate::stream_parser::Reader> {
    pub content_builder: &'a mut StreamContentBuilder<'b>,
    reader: &'a mut R,
    finished: &'a mut bool,
}

impl<R: crate::stream_parser::Reader> StreamContentBuilderWithReader<'_, '_, R> {}

impl<R: crate::stream_parser::Reader> crate::shared::ByteProvider
    for StreamContentBuilderWithReader<'_, '_, R>
{
    fn next_byte(&mut self) -> Result<Option<u8>, crate::ParseError> {
        // If buffer is empty, try to fill it first
        if self.content_builder.stream_buffer.is_empty() {
            if let Some(fill_slice) = self.content_builder.stream_buffer.get_fill_slice() {
                let bytes_read = self
                    .reader
                    .read(fill_slice)
                    .map_err(|_| crate::ParseError::ReaderError)?;
                self.content_builder
                    .stream_buffer
                    .mark_filled(bytes_read)
                    .map_err(crate::ParseError::from)?;
            }
        }

        // If still empty after fill attempt, we're at EOF
        if self.content_builder.stream_buffer.is_empty() {
            // Set finished flag when we reach end of stream
            if !*self.finished {
                *self.finished = true;
                self.content_builder.set_finished(true);
            }
            return Ok(None);
        }

        // Get byte and advance
        let byte = self.content_builder.stream_buffer.current_byte()?;
        self.content_builder.stream_buffer.advance()?;
        Ok(Some(byte))
    }
}

impl<R: crate::stream_parser::Reader> crate::event_processor::ContentExtractor
    for StreamContentBuilderWithReader<'_, '_, R>
{
    fn parser_state_mut(&mut self) -> &mut crate::shared::State {
        self.content_builder.parser_state_mut()
    }

    fn current_position(&self) -> usize {
        self.content_builder.current_position()
    }

    fn begin_string_content(&mut self, pos: usize) {
        self.content_builder.begin_string_content(pos);
    }

    fn unicode_escape_collector_mut(
        &mut self,
    ) -> &mut crate::escape_processor::UnicodeEscapeCollector {
        self.content_builder.unicode_escape_collector_mut()
    }

    fn extract_string_content(
        &mut self,
        start_pos: usize,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder.extract_string_content(start_pos)
    }

    fn extract_key_content(
        &mut self,
        start_pos: usize,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder.extract_key_content(start_pos)
    }

    fn extract_number_content(
        &mut self,
        start_pos: usize,
        from_container_end: bool,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder
            .extract_number_content(start_pos, from_container_end)
    }

    fn extract_number(
        &mut self,
        start_pos: usize,
        from_container_end: bool,
        finished: bool,
    ) -> Result<crate::Event<'_, '_>, crate::ParseError> {
        self.content_builder
            .extract_number(start_pos, from_container_end, finished)
    }
}

impl<R: crate::stream_parser::Reader> crate::event_processor::EscapeHandler
    for StreamContentBuilderWithReader<'_, '_, R>
{
    fn parser_state(&self) -> &crate::shared::State {
        self.content_builder.parser_state()
    }

    fn process_unicode_escape_with_collector(&mut self) -> Result<(), crate::ParseError> {
        self.content_builder.process_unicode_escape_with_collector()
    }

    fn handle_simple_escape_char(&mut self, escape_char: u8) -> Result<(), crate::ParseError> {
        self.content_builder.handle_simple_escape_char(escape_char)
    }

    fn begin_escape_sequence(&mut self) -> Result<(), crate::ParseError> {
        self.content_builder.begin_escape_sequence()
    }

    fn begin_unicode_escape(&mut self) -> Result<(), crate::ParseError> {
        self.content_builder.begin_unicode_escape()
    }
}


Severity: medium

The structs StreamContentBuilderWithFiller and StreamContentBuilderWithReader and their implementations appear to be unused, as StreamParser now uses StreamParserProvider to bridge the Reader and the ParserCore. These structs seem like leftovers from a previous stage of the refactoring and should be removed if they are indeed dead code to improve maintainability.

coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
picojson/src/parser_core.rs (1)

120-149: Consider refactoring duplicate escape token patterns.

The pattern matching for escape tokens is duplicated between the OnBegin and OnEnd cases. While the current implementation is clear with good comments explaining the different timing requirements, you could reduce duplication by extracting the token pattern into a helper function or constant.

+/// Helper to check if a token is a simple escape token
+fn is_simple_escape_token(token: &EventToken) -> Option<&EventToken> {
+    match token {
+        EventToken::EscapeQuote
+        | EventToken::EscapeBackslash
+        | EventToken::EscapeSlash
+        | EventToken::EscapeBackspace
+        | EventToken::EscapeFormFeed
+        | EventToken::EscapeNewline
+        | EventToken::EscapeCarriageReturn
+        | EventToken::EscapeTab => Some(token),
+        _ => None,
+    }
+}
+
                 ujson::Event::Begin(EventToken::EscapeSequence) => {
                     process_begin_escape_sequence_event(provider)?;
                 }
                 _ if process_unicode_escape_events(&taken, provider)? => {
                     // Unicode escape events handled by shared function
                 }
-                ujson::Event::Begin(
-                    escape_token @ (EventToken::EscapeQuote
-                    | EventToken::EscapeBackslash
-                    | EventToken::EscapeSlash
-                    | EventToken::EscapeBackspace
-                    | EventToken::EscapeFormFeed
-                    | EventToken::EscapeNewline
-                    | EventToken::EscapeCarriageReturn
-                    | EventToken::EscapeTab),
-                ) if escape_timing == EscapeTiming::OnBegin => {
+                ujson::Event::Begin(token) if escape_timing == EscapeTiming::OnBegin => {
+                    if let Some(escape_token) = is_simple_escape_token(token) {
                     // SliceParser-specific: Handle simple escape sequences on Begin events
                     // because CopyOnEscape requires starting unescaping immediately when
                     // the escape token begins to maintain zero-copy optimization
                     process_simple_escape_event(&escape_token, provider)?;
+                    }
                 }
-                ujson::Event::End(
-                    escape_token @ (EventToken::EscapeQuote
-                    | EventToken::EscapeBackslash
-                    | EventToken::EscapeSlash
-                    | EventToken::EscapeBackspace
-                    | EventToken::EscapeFormFeed
-                    | EventToken::EscapeNewline
-                    | EventToken::EscapeCarriageReturn
-                    | EventToken::EscapeTab),
-                ) if escape_timing == EscapeTiming::OnEnd => {
+                ujson::Event::End(token) if escape_timing == EscapeTiming::OnEnd => {
+                    if let Some(escape_token) = is_simple_escape_token(token) {
                     // StreamParser-specific: Handle simple escape sequences on End events
                     // because StreamBuffer must wait until the token ends to accumulate
                     // all bytes before processing the complete escape sequence
                     process_simple_escape_event(&escape_token, provider)?;
+                    }
                 }
picojson/src/stream_content_builder.rs (1)

84-119: Refactor repeated position update pattern.

The position update logic after compaction repeats the same pattern for each state variant. This could be simplified.

     fn update_positions_after_compaction(
         &mut self,
         compaction_offset: usize,
     ) -> Result<(), crate::ParseError> {
+        // Helper to safely subtract compaction offset
+        let adjust_position = |pos: &mut usize| {
+            *pos = pos.saturating_sub(compaction_offset);
+        };
+
         // Update positions - since we compact from the token start position,
         // positions should not be discarded in normal operation
         match &mut self.parser_state {
             crate::shared::State::None => {
                 // No position-based state to update
             }
-            crate::shared::State::Key(pos) => {
-                if *pos >= compaction_offset {
-                    *pos = pos.checked_sub(compaction_offset).unwrap_or(0);
-                } else {
-                    // This shouldn't happen since we compact from the token start
-                    *pos = 0;
-                }
-            }
-            crate::shared::State::String(pos) => {
-                if *pos >= compaction_offset {
-                    *pos = pos.checked_sub(compaction_offset).unwrap_or(0);
-                } else {
-                    // This shouldn't happen since we compact from the token start
-                    *pos = 0;
-                }
-            }
-            crate::shared::State::Number(pos) => {
-                if *pos >= compaction_offset {
-                    *pos = pos.checked_sub(compaction_offset).unwrap_or(0);
-                } else {
-                    // This shouldn't happen since we compact from the token start
-                    *pos = 0;
-                }
-            }
+            crate::shared::State::Key(pos) 
+            | crate::shared::State::String(pos) 
+            | crate::shared::State::Number(pos) => {
+                adjust_position(pos);
+            }
         }
         Ok(())
     }
picojson/src/stream_parser.rs (1)

78-87: Consider making the documentation more concise.

The current comments are somewhat repetitive. A more concise version would improve readability:

-/// Implement the required traits for StreamParser to work with unified ParserCore
-/// This provider handles the StreamParser-specific operations needed by the unified parser core
-/// It bridges the gap between the generic ParserCore and the StreamParser's specific requirements
-/// for streaming input and buffer management
-/// The provider contains mutable references to the StreamParser's internal state
-/// which allows the unified parser core to control the parsing process
+/// Provider that bridges StreamParser with the unified ParserCore.
+/// Handles streaming-specific operations like buffer management and reader access.
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 6be34a4 and cdc45a9.

📒 Files selected for processing (19)
  • picojson/Cargo.toml (0 hunks)
  • picojson/build.rs (3 hunks)
  • picojson/src/chunk_reader.rs (1 hunks)
  • picojson/src/event_processor.rs (5 hunks)
  • picojson/src/json_number.rs (1 hunks)
  • picojson/src/lib.rs (1 hunks)
  • picojson/src/number_parser.rs (1 hunks)
  • picojson/src/parse_error.rs (1 hunks)
  • picojson/src/parser_core.rs (1 hunks)
  • picojson/src/shared.rs (0 hunks)
  • picojson/src/slice_content_builder.rs (1 hunks)
  • picojson/src/slice_input_buffer.rs (1 hunks)
  • picojson/src/slice_parser.rs (4 hunks)
  • picojson/src/stream_buffer.rs (1 hunks)
  • picojson/src/stream_content_builder.rs (1 hunks)
  • picojson/src/stream_parser.rs (4 hunks)
  • picojson/tests/debug_root_numbers.rs (0 hunks)
  • picojson/tests/debug_stress_case.rs (0 hunks)
  • picojson/tests/stream_parser_stress_test.rs (1 hunks)
💤 Files with no reviewable changes (4)
  • picojson/src/shared.rs
  • picojson/tests/debug_root_numbers.rs
  • picojson/Cargo.toml
  • picojson/tests/debug_stress_case.rs
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
picojson/src/lib.rs (4)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
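A side-by-side sketch of the two consumption styles described in that last entry; the trait shape is taken from the learning itself, while the function bodies and the `picojson` import path are illustrative assumptions:

use picojson::{Event, ParseError, PullParser};

// Iterator style: next() yields None once EndDocument has been seen,
// so a while-let loop terminates on its own.
fn drain_iter_style(parser: &mut impl PullParser) -> Result<(), ParseError> {
    while let Some(event) = parser.next() {
        let _event = event?;
        // ... handle the event ...
    }
    Ok(())
}

// Direct style: next_event() returns the Result directly, and the
// caller breaks on EndDocument explicitly.
fn drain_direct_style(parser: &mut impl PullParser) -> Result<(), ParseError> {
    loop {
        match parser.next_event()? {
            Event::EndDocument => break,
            _other => { /* ... handle the event ... */ }
        }
    }
    Ok(())
}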
🧬 Code Graph Analysis (2)
picojson/src/event_processor.rs (3)
picojson/src/stream_content_builder.rs (4)
  • begin_unicode_escape (456-458)
  • begin_unicode_escape (493-498)
  • extract_number (267-282)
  • extract_number (426-434)
picojson/src/stream_parser.rs (2)
  • begin_unicode_escape (142-144)
  • extract_number (183-191)
picojson/src/slice_content_builder.rs (1)
  • extract_number (106-129)
picojson/src/stream_content_builder.rs (7)
picojson/src/stream_buffer.rs (3)
  • new (73-80)
  • current_position (236-238)
  • current_position (1007-1009)
picojson/src/number_parser.rs (4)
  • new (93-99)
  • current_position (22-22)
  • current_position (110-112)
  • parse_number_with_delimiter_logic (42-55)
picojson/src/stream_parser.rs (17)
  • new (46-48)
  • new (91-97)
  • new (258-260)
  • parser_state_mut (148-150)
  • current_position (152-154)
  • begin_string_content (156-158)
  • unicode_escape_collector_mut (160-164)
  • extract_string_content (166-168)
  • extract_key_content (170-172)
  • extract_number_content (174-181)
  • extract_number (183-191)
  • next_byte (101-122)
  • parser_state (126-128)
  • process_unicode_escape_with_collector (130-132)
  • handle_simple_escape_char (134-136)
  • begin_escape_sequence (138-140)
  • begin_unicode_escape (142-144)
picojson/src/escape_processor.rs (2)
  • new (200-206)
  • process_unicode_escape_sequence (659-700)
picojson/src/event_processor.rs (25)
  • new (425-431)
  • parser_state_mut (108-108)
  • parser_state_mut (457-459)
  • current_position (102-102)
  • current_position (449-451)
  • begin_string_content (105-105)
  • begin_string_content (453-455)
  • unicode_escape_collector_mut (112-114)
  • unicode_escape_collector_mut (461-465)
  • extract_string_content (117-120)
  • extract_string_content (467-472)
  • extract_key_content (123-126)
  • extract_key_content (474-479)
  • extract_number_content (129-133)
  • extract_number_content (481-487)
  • extract_number (141-146)
  • extract_number (489-496)
  • parser_state (14-14)
  • parser_state (435-437)
  • process_unicode_escape_with_collector (17-17)
  • process_unicode_escape_with_collector (439-441)
  • handle_simple_escape_char (20-20)
  • handle_simple_escape_char (443-445)
  • begin_escape_sequence (24-26)
  • begin_unicode_escape (30-32)
picojson/src/shared.rs (5)
  • new (62-66)
  • from_utf8 (181-183)
  • string_content_bounds_from_content_start (118-124)
  • end_position_excluding_delimiter (149-151)
  • next_byte (178-178)
picojson/src/parse_error.rs (4)
  • from (39-46)
  • from (50-61)
  • from (65-67)
  • from (71-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run AVR Test Suites
🔇 Additional comments (24)
picojson/build.rs (3)

82-82: LGTM: Improved lint suppression scope

The change from #[allow(clippy::non_snake_case)] to #![allow(non_snake_case)] is appropriate as it applies to the entire generated module, ensuring consistent test naming that matches the original JSON test suite conventions.
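For context, a minimal illustration of the two attribute forms (the function and module names here are hypothetical):

// Outer attribute: applies only to the single item that follows.
#[allow(non_snake_case, dead_code)]
fn y_string_accepts_UTF8() {}

// Inner attribute: placed at the top of a module or generated file,
// it covers every item inside.
mod generated_tests {
    #![allow(non_snake_case, dead_code)]

    fn y_object_simple() {}
    fn n_string_unescaped_TAB() {}
}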


133-133: LGTM: Appropriate dead code suppression

The #[allow(dead_code)] attributes are correctly applied to utility functions that are part of the build script infrastructure but may not be actively used in all configurations.

Also applies to: 168-168, 174-174, 180-180


161-161: LGTM: Modern string formatting syntax

The update to use format!("{result}_{count}") follows Rust's modern formatting style introduced in 1.58+, improving code readability.
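The two forms are equivalent; a quick illustration with made-up values:

fn main() {
    let result = "y_object";
    let count = 2;
    // Positional arguments (pre-1.58) and captured identifiers (1.58+)
    // produce identical output.
    assert_eq!(format!("{}_{}", result, count), format!("{result}_{count}"));
}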

picojson/src/number_parser.rs (1)

64-64: LGTM: Minor documentation formatting improvement

The added blank line improves the readability of the documentation comment without affecting functionality.

picojson/src/slice_input_buffer.rs (1)

57-61: LGTM: Useful API addition

The new data_len() method provides a clean way to access the underlying data length for bounds checking, which supports the refactored parsing architecture.

picojson/src/json_number.rs (1)

146-146: LGTM: Modern string formatting syntax

The updates to use captured identifiers in format strings (`{val}` instead of `"{}", val`) follow modern Rust formatting conventions and improve code readability.

Also applies to: 148-148

picojson/src/chunk_reader.rs (1)

122-122: LGTM: Simplified lifetime annotation

The change from explicit lifetime parameter 'a to anonymous lifetime '_ is appropriate here since the lifetime doesn't need to be explicitly referenced within the implementation.
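The same simplification in miniature, shown on a hypothetical type rather than ChunkReader itself:

struct Wrapper<'a> {
    data: &'a [u8],
}

// Naming the lifetime adds noise when the impl body never refers to it:
// impl<'a> core::fmt::Debug for Wrapper<'a> { ... }
// The anonymous lifetime '_ states "some lifetime" without binding a name:
impl core::fmt::Debug for Wrapper<'_> {
    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
        write!(f, "Wrapper({} bytes)", self.data.len())
    }
}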

picojson/src/lib.rs (1)

66-67: LGTM: Clean module additions for unified parser architecture

The new module declarations (parser_core, stream_content_builder, slice_content_builder) are well-organized and align with the PR objective of consolidating to a single parser. The placement within the existing module hierarchy is appropriate.

Also applies to: 70-71, 74-75

picojson/src/parse_error.rs (1)

79-81: LGTM: Modern formatting syntax improvements

The updates to use Rust's more concise formatting syntax (`{e}` instead of `"{}", e`) are clean modernizations that improve readability without changing functionality.

picojson/src/stream_buffer.rs (2)

149-156: Excellent safety improvement: Replace wrapping addition with checked addition

The change from wrapping_add to checked_add with proper error handling significantly improves safety by preventing silent integer overflow that could lead to memory safety issues. Returning InvalidSliceBounds error on overflow is the correct approach for bounds validation.
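A minimal sketch of the pattern, with an illustrative error type standing in for the crate's InvalidSliceBounds variant:

#[derive(Debug)]
enum BufferError {
    InvalidSliceBounds,
}

// Advance a buffer offset, surfacing arithmetic overflow as an error.
// wrapping_add would instead produce a small, wrong-but-plausible offset.
fn advance(pos: usize, len: usize) -> Result<usize, BufferError> {
    pos.checked_add(len).ok_or(BufferError::InvalidSliceBounds)
}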


164-164: LGTM: Intentional unused variable for debugging

The _old_tokenize_pos variable properly captures the previous value and is appropriately marked as unused. This appears to be intentional for debugging or future use.

picojson/tests/stream_parser_stress_test.rs (1)

217-262: LGTM: Comprehensive Unicode escape test for streaming parser

The new debug_unicode_issue test function provides excellent coverage for Unicode escape handling in the streaming parser. The test setup with chunked reading (5-byte chunks, 30-byte buffer) and detailed debug output for character analysis is well-designed for validating the unified parser architecture's escape sequence handling.
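A sketch of the test's shape based on the description above; the ChunkReader and StreamParser constructor signatures and the Event payload API are assumptions, not verbatim from the crate:

use picojson::{ChunkReader, Event, StreamParser};

#[test]
fn unicode_escape_across_chunks() {
    // \u00e9 escapes forced through 5-byte reads into a 30-byte buffer.
    let json = br#"{"name": "caf\u00e9 \u00e9clair"}"#;
    let reader = ChunkReader::new(json, 5); // assumed: (data, chunk_size)
    let mut buffer = [0u8; 30];
    let mut parser = StreamParser::new(reader, &mut buffer); // assumed signature

    while let Some(event) = parser.next() {
        if let Event::String(s) = event.expect("parse error") {
            // The escapes should arrive decoded as 'é'.
            assert!(s.contains('é')); // assumes the payload derefs to str
        }
    }
}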

picojson/src/event_processor.rs (5)

28-32: LGTM: Well-designed trait method with sensible default

The new begin_unicode_escape method in the EscapeHandler trait is well-designed with clear documentation and a sensible default no-op implementation. This allows parsers that don't need special Unicode escape handling to use the default while providing flexibility for those that do.


135-146: LGTM: Comprehensive number extraction method

The new extract_number method in the ContentExtractor trait is well-documented with clear parameter descriptions. The method signature properly handles the differences between slice and stream parsing (the finished parameter for StreamParser-specific logic) while maintaining a unified interface.
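Paraphrased, the trait surface under discussion looks roughly like this (names and signatures are reconstructed from the review, with placeholder Event and ParseError types so the sketch stands alone):

// Placeholder types; the crate's real Event and ParseError are richer.
struct Event;
struct ParseError;

trait EscapeHandler {
    // Called when a \u escape sequence begins. The default no-op lets
    // parsers without special Unicode bookkeeping skip overriding it.
    fn begin_unicode_escape(&mut self) -> Result<(), ParseError> {
        Ok(())
    }
}

trait ContentExtractor {
    // Unified number extraction. `finished` signals end of input so the
    // streaming parser can apply its delimiter logic at document end;
    // the slice parser can ignore it.
    fn extract_number(
        &mut self,
        start_pos: usize,
        finished: bool,
    ) -> Result<Event, ParseError>;
}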


205-207: LGTM: Simplified lifetime parameters improve readability

The simplification of lifetime parameters in create_tokenizer_callback using anonymous lifetimes ('_) improves readability while maintaining the same functionality. This aligns with modern Rust style guidelines.


280-280: LGTM: Properly guarded method call

The call to begin_unicode_escape() is properly guarded by the state check ensuring it only executes when inside a string or key context. This maintains the same pattern as other escape handling code.


489-496: LGTM: Appropriate mock implementation

The stub implementation of extract_number in the mock is appropriate for testing. Using unimplemented!() with a clear message is the correct approach for mock methods that aren't exercised in tests.

picojson/src/slice_content_builder.rs (1)

85-129: Excellent defensive programming in number extraction.

The bounds clamping logic using core::cmp::min() prevents slice bounds errors when parsing standalone numbers at document end. The explicit acknowledgment of the unused finished parameter with clear explanation is also good practice.
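The clamping idea in isolation (the helper below is illustrative, not the builder's actual method):

// Extract a number token's bytes, clamping the end so a standalone
// number at document end cannot index past the input slice.
fn number_bytes(data: &[u8], start: usize, end: usize) -> Option<&[u8]> {
    let end = core::cmp::min(end, data.len());
    data.get(start..end)
}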

picojson/src/stream_parser.rs (6)

3-38: Excellent simplification of StreamParser structure.

The consolidation from 6 internal fields to just 2 (parser_core and provider) significantly reduces state management complexity and aligns well with the unified parser architecture.


193-208: Good handling of EOF for number parsing.

The override of validate_and_extract_number correctly uses the finished state to handle number parsing at end-of-document, which is essential for streaming parsers.


210-226: Clean delegation to unified parser core.

The implementation properly delegates to parser_core.next_event_impl_unified_with_accumulator with appropriate timing (EscapeTiming::OnEnd) for streaming parsers. The byte accumulation closure correctly delegates to the content builder.


232-245: Correct implementation of PullParser trait.

The implementation properly checks the finished flag and applies unescaped reset before delegating to next_event_impl, maintaining consistency with the SliceParser pattern.


1203-1214: Good conversion to assertion-based testing.

The change from printing to assertions improves test automation and makes failures more explicit.


289-1282: Comprehensive test coverage for the refactored StreamParser.

The test suite thoroughly covers:

  • Basic JSON structures (objects, arrays, primitives)
  • Escape sequence processing including Unicode
  • Number parsing with various configurations
  • Edge cases like buffer boundaries and EOF handling
  • Error recovery scenarios

This extensive coverage provides confidence in the refactored implementation.

Ok(())
// Get byte and advance
let byte = self.content_builder.stream_buffer().current_byte()?;
let _current_pos = self.content_builder.stream_buffer().current_position();
⚠️ Potential issue

Remove unused variable.

The _current_pos variable is created but never used.

-        let _current_pos = self.content_builder.stream_buffer().current_position();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let _current_pos = self.content_builder.stream_buffer().current_position();
🤖 Prompt for AI Agents
In picojson/src/stream_parser.rs at line 119, the variable _current_pos is
declared but never used. Remove the declaration of _current_pos entirely to
clean up the code and avoid unused variable warnings.

@kaidokert kaidokert closed this Jul 15, 2025
@kaidokert kaidokert deleted the big2 branch July 16, 2025 08:29