-
Notifications
You must be signed in to change notification settings - Fork 2
Re-introduce push parser #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces a comprehensive SAX-style incremental JSON push parser to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant PushParser
participant Handler
participant ContentExtractor
User->>PushParser: write(chunk)
PushParser->>ContentExtractor: set_chunk(chunk)
loop While events available
PushParser->>ContentExtractor: extract next event
ContentExtractor-->>PushParser: Event or Error
PushParser->>Handler: handle_event(Event)
Handler-->>PushParser: Ok/Error
end
PushParser->>ContentExtractor: reset_input()
User->>PushParser: finish()
PushParser->>Handler: handle_event(EndDocument)
Handler-->>PushParser: Ok
PushParser-->>User: Handler (finalized)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90 minutes Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Reviewer's GuideThis PR reintroduces the PushParser by extending the core parser to handle chunked input via a new PushContentExtractor and flags in ParserCore, updates library exports accordingly, and complements these changes with extensive conformance, escape, stress, and buffer-isolation tests while enhancing shared utilities and stream buffer capabilities. Sequence diagram for PushParser chunked input processingsequenceDiagram
actor User
participant PushParser
participant PushContentExtractor
participant ParserCore
participant Handler as "PushParserHandler"
User->>PushParser: write(chunk)
PushParser->>PushContentExtractor: set_chunk(chunk)
loop for each event in chunk
PushParser->>ParserCore: next_event_impl_with_flags(...)
ParserCore->>PushContentExtractor: handle_byte_accumulation(byte)
ParserCore-->>PushParser: Event or EndOfData
alt Event
PushParser->>Handler: handle_event(event)
end
end
PushParser->>PushContentExtractor: reset_input()
PushParser->>PushContentExtractor: add_position_offset(chunk.len())
Class diagram for PushParser and related typesclassDiagram
class PushParser {
- PushContentExtractor extractor
- H handler
- ParserCore core
+ new(handler, buffer)
+ write(data)
+ finish()
}
class PushContentExtractor {
- StreamBuffer stream_buffer
- State parser_state
- UnicodeEscapeCollector unicode_escape_collector
- bool unescaped_reset_queued
- usize position_offset
- usize current_position
- usize token_start_pos
- bool using_unescaped_buffer
- &'input [u8] current_chunk
- usize chunk_cursor
- bool in_unicode_escape
- bool in_simple_escape
+ new(buffer)
+ set_chunk(chunk)
+ reset_input()
+ ...
}
class PushParserHandler {
<<trait>>
+ handle_event(event)
}
class ParserCore {
- Tokenizer tokenizer
- ParserState parser_state
- bool in_escape_sequence
- bool handles_chunked_input
+ new()
+ new_chunked()
+ next_event_impl_with_flags(...)
}
class PushParseError {
<<enum>>
+ Parse(ParseError)
+ Handler(E)
}
PushParser --> PushContentExtractor : uses
PushParser --> PushParserHandler : uses
PushParser --> ParserCore : uses
PushContentExtractor --> StreamBuffer : uses
PushContentExtractor --> UnicodeEscapeCollector : uses
PushContentExtractor --> State : uses
ParserCore --> Tokenizer : uses
ParserCore --> ParserState : uses
PushParserHandler <|.. JsonEventPrinter : implements
PushParser o-- PushParseError : error handling
Class diagram for PushContentExtractor escape and buffer logicclassDiagram
class PushContentExtractor {
- StreamBuffer stream_buffer
- UnicodeEscapeCollector unicode_escape_collector
- bool in_unicode_escape
- bool in_simple_escape
- bool using_unescaped_buffer
+ handle_byte_accumulation(byte)
+ begin_escape_sequence()
+ begin_unicode_escape()
+ process_simple_escape_event(...)
+ copy_partial_content_to_scratch()
+ truncate_unescaped_by(count)
}
PushContentExtractor --> StreamBuffer : manages
PushContentExtractor --> UnicodeEscapeCollector : manages
Class diagram for ParserCore chunked input supportclassDiagram
class ParserCore {
- bool handles_chunked_input
+ new()
+ new_chunked()
+ next_event_impl_with_flags(...)
}
ParserCore : +handles_chunked_input
ParserCore : +new_chunked() for PushParser
ParserCore : +new() for non-chunked parsers
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @kaidokert, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the picojson
library by re-introducing a PushParser
. This new parser provides a flexible and efficient way to process JSON data incrementally, which is crucial for applications dealing with large or streaming JSON inputs. The changes involve core parser modifications, a new content extraction mechanism, and a thorough set of tests to validate its behavior under diverse conditions.
Highlights
- New Push Parser: I've re-introduced and implemented a new
PushParser
for SAX-style, event-driven JSON parsing. This parser allows for incremental processing of JSON data, making it suitable for streaming scenarios where the full input is not available at once. - Content Extraction and Buffer Management: A new
PushContentExtractor
has been added to manage the parsing state and handle content extraction, including complex escape sequences and tokens that span across multiple input chunks. It utilizes a scratch buffer for unescaped content and implements a 'copy-on-escape' optimization to minimize memory allocations. - Core Parser Enhancements: The underlying
ParserCore
has been updated to differentiate between chunked and non-chunked input sources, allowing thePushParser
to signalEndOfData
when a chunk is exhausted, rather than prematurely ending the document. This enables seamless incremental parsing. - Comprehensive Testing: Extensive new test suites have been added, including dedicated tests for the
PushParser
's functionality, escape handling, copy-on-escape optimization, and robust stress tests covering various buffer sizes, chunking patterns, and edge cases to ensure stability and correctness.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kaidokert - I've reviewed your changes - here's some feedback:
- Consider refactoring the conformance and stress tests into table-driven or parameterized tests instead of generating dozens of near-duplicate functions to reduce boilerplate and ease future maintenance.
- The handles_chunked_input flag in ParserCore adds branching for chunked vs non-chunked flows; splitting those behaviors into dedicated parser core implementations could simplify the control flow and make each path more self-contained.
- The PushContentExtractor and escape-handling logic have grown highly intricate—breaking out escape processing, byte accumulation, and content extraction into smaller, well-documented modules would improve readability and maintainability.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the conformance and stress tests into table-driven or parameterized tests instead of generating dozens of near-duplicate functions to reduce boilerplate and ease future maintenance.
- The handles_chunked_input flag in ParserCore adds branching for chunked vs non-chunked flows; splitting those behaviors into dedicated parser core implementations could simplify the control flow and make each path more self-contained.
- The PushContentExtractor and escape-handling logic have grown highly intricate—breaking out escape processing, byte accumulation, and content extraction into smaller, well-documented modules would improve readability and maintainability.
## Individual Comments
### Comment 1
<location> `picojson/src/push_parser.rs:192` </location>
<code_context>
+
+// Implement From<ParseError> for common error types used in tests
+// This needs to be globally accessible for integration tests, not just unit tests
+impl From<ParseError> for () {
+ fn from(_: ParseError) -> Self {}
+}
</code_context>
<issue_to_address>
Empty implementation of From<ParseError> for () may hide errors.
This implementation discards error details, making debugging harder. If required, please document the reasoning.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request re-introduces the push parser. The implementation is thorough, and the addition of a comprehensive test suite, including extensive stress tests, is a fantastic contribution that builds confidence in the new functionality.
I've identified a few areas for improvement, primarily related to error handling in the new example, and a bug in one of the new tests. These are small refinements that will help improve the overall quality and correctness of the codebase. Great work!
let mut parser = PushParser::<_, DefaultConfig>::new(handler, &mut buffer); | ||
|
||
// Test: simple string followed by escaped string | ||
parser.write(br#"{"simple": "esc\\n"}"#).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an issue in the test_buffer_isolation
test case. The input JSON string "esc\n"
is parsed as esc
followed by a newline character. However, the test asserts that the result is esc
followed by a literal backslash and n
.
The comment // JSON "esc\\n" should become string "esc\n"
suggests the intent was to test a literal backslash. To achieve this, the JSON string needs to be "esc\\n"
, which requires escaping the backslash in the Rust byte string.
parser.write(br#"{"simple": "esc\\n"}"#).unwrap(); | |
parser.write(br#"{\"simple\": \"esc\\\\n\"}"#).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (5)
picojson/tests/input_buffer_full_test.rs (1)
1-2
: Add SPDX license header for consistency.Other test files in the project include the SPDX license header.
+// SPDX-License-Identifier: Apache-2.0 + // Test for InputBufferFull error variant use picojson::{ParseError, PullParser, StreamParser};picojson/src/push_parser.rs (2)
107-119
: Consider refactoring the special case handling for Numbers.The nested conditions and special case for Number state handling adds complexity. The comment mentions "stale flag from previous Key processing" which suggests there might be a state management issue.
Consider extracting this logic into a separate method with clearer documentation about why Numbers need special handling:
- // Special case: For Numbers, check if the scratch buffer is actually empty - // This handles the byte-by-byte case where the flag is stale from previous Key processing - if matches!(extractor_state, State::Number(_)) { - let buffer_slice = self.extractor.get_unescaped_slice().unwrap_or(&[]); - let buffer_empty = buffer_slice.is_empty(); - - if buffer_empty { - self.extractor.copy_partial_content_to_scratch()?; - } - } + self.handle_number_boundary_case(&extractor_state)?;
191-194
: Remove or document the generic unit conversion.The
From<ParseError> for ()
implementation seems like a convenience for tests but could be confusing in production code.Either remove this if it's only for tests, or add clear documentation:
-// Implement From<ParseError> for common error types used in tests -// This needs to be globally accessible for integration tests, not just unit tests +#[cfg(test)] impl From<ParseError> for () { fn from(_: ParseError) -> Self {} }picojson/tests/push_parser.rs (1)
283-324
: Consider validating parsed outputWhile this test correctly validates that problematic lines from pass1.json parse without errors, it would be more robust to also verify the actual parsed content in the TraceHandler events.
picojson/tests/json_checker_tests.rs (1)
51-71
: Error handling uses workaround for non-exported typeThe implementation is correct, but using
InvalidNumber
as a placeholder for handler errors (line 61) is a workaround. Consider exposing a more appropriate error variant for handler errors in the public API.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
picojson/examples/push_parser_demo.rs
(1 hunks)picojson/src/event_processor.rs
(2 hunks)picojson/src/lib.rs
(1 hunks)picojson/src/push_content_builder.rs
(1 hunks)picojson/src/push_parser.rs
(1 hunks)picojson/src/shared.rs
(1 hunks)picojson/src/stream_buffer.rs
(1 hunks)picojson/tests/input_buffer_full_test.rs
(1 hunks)picojson/tests/json_checker_tests.rs
(5 hunks)picojson/tests/push_parser.rs
(1 hunks)picojson/tests/push_parser_copy_on_escape.rs
(1 hunks)picojson/tests/push_parser_escapes.rs
(1 hunks)picojson/tests/push_parser_invalidslicebounds_repro.rs
(1 hunks)picojson/tests/push_parser_stress_test.rs
(1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Applied to files:
picojson/src/stream_buffer.rs
picojson/src/lib.rs
picojson/tests/push_parser_escapes.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/input_buffer_full_test.rs
picojson/src/push_content_builder.rs
picojson/tests/json_checker_tests.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Applied to files:
picojson/src/stream_buffer.rs
picojson/src/lib.rs
picojson/tests/push_parser_escapes.rs
picojson/src/shared.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/json_checker_tests.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Applied to files:
picojson/src/lib.rs
picojson/tests/push_parser_escapes.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/input_buffer_full_test.rs
picojson/src/push_content_builder.rs
picojson/tests/json_checker_tests.rs
📚 Learning: in rust crate organization, functions can be imported into lib.rs via `use` statements not for direc...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
Applied to files:
picojson/src/lib.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Applied to files:
picojson/tests/push_parser_escapes.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/input_buffer_full_test.rs
picojson/src/push_content_builder.rs
picojson/tests/json_checker_tests.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Applied to files:
picojson/tests/push_parser_escapes.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/input_buffer_full_test.rs
picojson/src/push_content_builder.rs
picojson/tests/json_checker_tests.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Applied to files:
picojson/tests/push_parser_escapes.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/input_buffer_full_test.rs
picojson/src/push_content_builder.rs
picojson/tests/json_checker_tests.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Applied to files:
picojson/tests/push_parser_escapes.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/input_buffer_full_test.rs
picojson/src/push_content_builder.rs
picojson/tests/json_checker_tests.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Applied to files:
picojson/tests/push_parser_escapes.rs
picojson/examples/push_parser_demo.rs
picojson/tests/push_parser_copy_on_escape.rs
picojson/tests/push_parser.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/tests/push_parser_invalidslicebounds_repro.rs
picojson/tests/input_buffer_full_test.rs
picojson/src/push_content_builder.rs
picojson/tests/json_checker_tests.rs
🧬 Code Graph Analysis (2)
picojson/src/stream_buffer.rs (1)
picojson/src/push_content_builder.rs (1)
truncate_unescaped_by
(130-132)
picojson/tests/json_checker_tests.rs (5)
picojson/examples/push_parser_demo.rs (4)
handle_event
(25-65)parser
(111-112)parser
(119-120)new
(12-17)picojson/tests/push_parser.rs (22)
handle_event
(12-14)handle_event
(36-49)handle_event
(72-82)handle_event
(111-123)handle_event
(157-168)handle_event
(202-213)handle_event
(247-257)handle_event
(291-302)handle_event
(334-346)handle_event
(431-442)handle_event
(467-482)parser
(25-25)parser
(58-58)parser
(93-93)parser
(132-132)parser
(177-177)parser
(222-222)parser
(266-266)parser
(496-496)parser
(552-552)parser
(582-582)parser
(688-688)picojson/src/push_parser.rs (1)
new
(40-46)picojson/src/stream_content_builder.rs (1)
new
(30-39)picojson/src/chunk_reader.rs (1)
full_slice
(113-119)
🔇 Additional comments (47)
picojson/src/stream_buffer.rs (1)
258-261
: LGTM! Safe buffer truncation implementation.The method correctly uses
saturating_sub
to prevent underflow and follows the defensive programming patterns established in this codebase. The implementation is simple, safe, and serves the push parser's content extraction needs effectively.picojson/src/shared.rs (1)
248-254
: LGTM! Clean abstraction for uniform byte access.The method provides a clean abstraction for accessing underlying bytes regardless of whether content comes from input or scratch buffers. The lifetime annotation is correct and the implementation follows the established pattern of the
ContentPiece
enum.picojson/src/lib.rs (1)
101-105
: LGTM! Well-structured API extension.The push parser API additions are properly integrated with the existing module structure and follow established patterns. The exports provide a clean interface for the new SAX-style parsing functionality.
picojson/examples/push_parser_demo.rs (1)
1-131
: Excellent demonstration of push parser capabilities!This example effectively showcases the push parser's SAX-style event handling with clean, educational code. The incremental chunk feeding clearly demonstrates streaming capabilities, error handling follows best practices, and the emoji-enhanced output makes it engaging and easy to follow.
picojson/tests/push_parser_escapes.rs (1)
1-178
: Comprehensive escape sequence test coverage!These tests provide excellent coverage of escape sequence handling, properly distinguishing between literal backslashes and actual escape sequences. The test cases cover all major escape types (basic, quotes, Unicode) and test both keys and values. The EventCollector pattern makes assertions clean and maintainable.
picojson/src/event_processor.rs (3)
25-28
: LGTM! Clear documentation for the new field.The documentation clearly explains the purpose of
handles_chunked_input
and its effect on parser behavior.
31-49
: Well-structured constructors with clear purpose.The two constructors clearly differentiate between chunked and non-chunked parser initialization. Good use of descriptive names.
104-120
: Correct handling of chunked vs non-chunked parser behavior.The conditional logic properly differentiates between parser types at end of input. For chunked parsers, returning
EndOfData
allows proper chunk boundary handling, while non-chunked parsers correctly finalize the document.picojson/tests/push_parser_copy_on_escape.rs (3)
8-52
: Well-structured test for borrowed string optimization.The test correctly verifies that strings without escape sequences are returned as borrowed references, which is an important performance optimization.
158-165
: Correct validation of JSON escape sequence decoding.The test and comment correctly explain that JSON
"esc\\n"
(double backslash) should decode to the string"esc\n"
(literal backslash + n), not a newline character. This is the expected JSON behavior.
84-84
: Escape sequences are correct and consistent—LGTM.The use of
\\n
and\\t
in the raw byte string literal matches other parser tests (e.g. push_parser.rs), confirming that JSON escape sequences are formatted properly across the codebase. Approved.picojson/tests/input_buffer_full_test.rs (1)
40-86
: Good test coverage for buffer overflow scenarios.The test correctly validates that the parser reports appropriate buffer overflow errors when processing large JSON documents with insufficient buffer space. The error handling for both
InputBufferFull
andScratchBufferFull
is appropriate.picojson/tests/push_parser_invalidslicebounds_repro.rs (2)
8-66
: Excellent test handler design for event validation.The
ReproHandler
provides thorough validation by comparing events against expected sequences and ensuring all events are received. Theevent_to_string
method provides clear event representation for debugging.
125-131
: Good chunked parsing test strategy.Writing in small 8-byte chunks effectively stress-tests the parser's buffer boundary handling, which is crucial for validating the fix for InvalidSliceBounds errors.
picojson/src/push_parser.rs (2)
23-33
: Well-designed push parser structure.The generic parameters are clearly documented, and the structure properly separates concerns between content extraction, event handling, and core parsing logic.
69-71
: Correct use ofalways_accumulate_during_escapes
flag.Setting this flag to
true
ensures that all hex digits in Unicode escape sequences are properly accumulated, which is essential for correct escape handling in chunked parsing.picojson/tests/push_parser.rs (13)
17-26
: LGTM!Clean test that validates basic PushParser compilation and usage without lifetime issues.
28-62
: LGTM!Well-structured test validating HRTB pattern integration with tokenizer. The event count assertion correctly expects 2 events (String + EndDocument).
64-101
: LGTM!Correctly tests string extraction with proper lifetime handling using
to_owned()
.
103-147
: LGTM!Comprehensive test for object parsing that correctly validates the complete event sequence including keys and values.
149-192
: LGTM!Correctly tests simple escape sequence processing, verifying that
\n
in JSON is converted to an actual newline character.
194-237
: LGTM!Properly tests Unicode escape processing, correctly expecting
\u0041
to be converted to 'A'.
239-281
: LGTM!Excellent test for consecutive Unicode escapes. The conversion of
\uCAFE\uBABE
to Korean characters is correct and tests an important edge case.
326-402
: LGTM!Good comprehensive test with proper feature-gated sections for float support. The larger buffer size (2048) is appropriate for the complex JSON structure.
404-457
: LGTM!Excellent test comparing SliceParser and PushParser behavior for empty keys, ensuring consistent handling across parser types.
459-526
: LGTM!Comprehensive number parsing test with proper feature gating for float support. The expected event sequences correctly match the conditional JSON input.
623-645
: Remove redundant importsThe imports on line 625 are redundant.
#[test] fn test_invalid_unicode_escape_in_key() { - use picojson::{DefaultConfig, PushParser, PushParserHandler}; - struct Handler;⛔ Skipped due to learnings
Learnt from: kaidokert PR: kaidokert/picojson-rs#69 File: picojson/tests/json_checker_tests.rs:296-297 Timestamp: 2025-07-27T05:05:22.707Z Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert PR: kaidokert/picojson-rs#1 File: stax/src/flex_parser.rs:0-0 Timestamp: 2025-06-28T18:12:30.015Z Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert PR: kaidokert/picojson-rs#55 File: picojson/src/event_processor.rs:188-203 Timestamp: 2025-07-13T05:06:50.688Z Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert PR: kaidokert/picojson-rs#5 File: picojson/src/lib.rs:0-0 Timestamp: 2025-06-29T17:48:18.198Z Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
698-723
: Remove redundant importsThe imports on line 700 are redundant.
#[test] fn test_invalid_escape_sequences_in_keys() { - use picojson::{DefaultConfig, PushParser, PushParserHandler}; - struct Handler;⛔ Skipped due to learnings
Learnt from: kaidokert PR: kaidokert/picojson-rs#69 File: picojson/tests/json_checker_tests.rs:296-297 Timestamp: 2025-07-27T05:05:22.707Z Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert PR: kaidokert/picojson-rs#1 File: stax/src/flex_parser.rs:0-0 Timestamp: 2025-06-28T18:12:30.015Z Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert PR: kaidokert/picojson-rs#5 File: picojson/src/lib.rs:0-0 Timestamp: 2025-06-29T17:48:18.198Z Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert PR: kaidokert/picojson-rs#55 File: picojson/src/slice_parser.rs:0-0 Timestamp: 2025-07-13T05:10:01.847Z Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert PR: kaidokert/picojson-rs#55 File: picojson/src/event_processor.rs:188-203 Timestamp: 2025-07-13T05:06:50.688Z Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert PR: kaidokert/picojson-rs#44 File: picojson/src/chunk_reader.rs:28-33 Timestamp: 2025-07-07T01:39:55.177Z Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert PR: kaidokert/picojson-rs#55 File: picojson/src/slice_parser.rs:273-286 Timestamp: 2025-07-13T05:11:46.914Z Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
647-696
: Remove redundant importsThe imports on line 649 are redundant except for
String
enum which is not imported at module level.#[test] fn test_mixed_borrowed_and_unescaped_strings() { - use picojson::{DefaultConfig, Event, PushParser, PushParserHandler, String}; + use picojson::String; struct Handler { events: Vec<std::string::String>, }⛔ Skipped due to learnings
Learnt from: kaidokert PR: kaidokert/picojson-rs#69 File: picojson/tests/json_checker_tests.rs:296-297 Timestamp: 2025-07-27T05:05:22.707Z Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert PR: kaidokert/picojson-rs#1 File: stax/src/flex_parser.rs:0-0 Timestamp: 2025-06-28T18:12:30.015Z Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert PR: kaidokert/picojson-rs#55 File: picojson/src/slice_parser.rs:0-0 Timestamp: 2025-07-13T05:10:01.847Z Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert PR: kaidokert/picojson-rs#5 File: picojson/src/lib.rs:0-0 Timestamp: 2025-06-29T17:48:18.198Z Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert PR: kaidokert/picojson-rs#55 File: picojson/src/slice_parser.rs:273-286 Timestamp: 2025-07-13T05:11:46.914Z Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert PR: kaidokert/picojson-rs#44 File: picojson/src/chunk_reader.rs:28-33 Timestamp: 2025-07-07T01:39:55.177Z Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert PR: kaidokert/picojson-rs#55 File: picojson/src/event_processor.rs:188-203 Timestamp: 2025-07-13T05:06:50.688Z Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
picojson/tests/json_checker_tests.rs (4)
17-20
: LGTM!Clean imports and minimal handler implementation appropriate for conformance testing.
Also applies to: 39-49
73-89
: LGTM!Clean StreamParser test implementation using ChunkReader::full_slice for in-memory testing.
144-226
: LGTM!Comprehensive conformance tests for PushParser and StreamParser that correctly mirror the existing SliceParser tests.
265-290
: LGTM!Fail tests and known deviations correctly extend coverage to PushParser, maintaining consistency with SliceParser behavior.
Also applies to: 316-335
picojson/tests/push_parser_stress_test.rs (6)
13-114
: LGTM!Well-designed event handling infrastructure with OwnedEvent for lifetime-safe comparisons and immediate validation in StressTestHandler.
116-161
: LGTM!ChunkedWriter provides excellent control over data delivery patterns for comprehensive stress testing.
172-307
: LGTM!Excellent test scenario coverage including edge cases like empty strings, Unicode escapes, deeply nested structures, and buffer boundary conditions.
337-405
: LGTM!Sophisticated buffer size calculation that correctly accounts for copy-on-escape optimization and chunk boundary effects.
407-533
: LGTM!Comprehensive stress tests with good coverage of buffer sizes and chunk patterns. The pragmatic handling of copy-on-escape optimization allowing smaller buffers than expected is appropriate.
535-647
: LGTM!Excellent edge case coverage for Unicode handling and document validation. The invalid JSON scenarios appropriately test error detection.
picojson/src/push_content_builder.rs (8)
11-21
: LGTM!Well-designed trait for SAX-style event handling with proper lifetime separation between input data and scratch buffer.
23-68
: LGTM!Comprehensive state tracking structure with proper initialization. The separation of concerns between position tracking, buffer management, and escape processing is clear.
70-138
: LGTM!Clean utility methods with clear single responsibilities for state management.
144-251
: LGTM!Sophisticated escape processing with proper state management. The selective byte accumulation logic correctly handles transitions between normal content and escape sequences.
259-344
: LGTM!Correct ContentExtractor implementation handling both borrowed and unescaped content paths. The position adjustments for PushParser are properly documented.
372-411
: LGTM!Excellent copy-on-escape optimization implementation with proper bounds checking and error handling.
413-452
: LGTM!Correct handling of tokenizer's Unicode escape processing behavior. The critical comment at line 420 properly documents why the first hex digit needs special handling.
455-536
: LGTM!Robust DataSource implementation with correct position translation and chunk boundary handling. The state-based content copying logic properly handles all token types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
picojson/src/push_parser.rs (1)
143-145
: Use the existing From implementation for error construction.The error is constructed with a fully qualified path when it could use the From implementation.
- return Err(crate::push_parser::PushParseError::Parse( - ParseError::EndOfData, - )); + return Err(ParseError::EndOfData.into());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
picojson/examples/push_parser_demo.rs
(1 hunks)picojson/src/lib.rs
(1 hunks)picojson/src/push_parser.rs
(1 hunks)picojson/tests/json_checker_tests.rs
(5 hunks)picojson/tests/push_parser.rs
(1 hunks)picojson/tests/push_parser_copy_on_escape.rs
(1 hunks)picojson/tests/push_parser_escapes.rs
(1 hunks)picojson/tests/push_parser_invalidslicebounds_repro.rs
(1 hunks)picojson/tests/push_parser_stress_test.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- picojson/src/lib.rs
- picojson/examples/push_parser_demo.rs
- picojson/tests/push_parser_copy_on_escape.rs
- picojson/tests/push_parser_invalidslicebounds_repro.rs
- picojson/tests/push_parser.rs
- picojson/tests/push_parser_escapes.rs
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Applied to files:
picojson/tests/json_checker_tests.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run AVR Test Suites
🔇 Additional comments (11)
picojson/src/push_parser.rs (3)
23-46
: Well-structured push parser implementation.The generic parameters and lifetime annotations are appropriately designed. The use of
ParserCore::new_chunked()
correctly distinguishes this parser from non-chunked variants.
157-188
: Clean error handling design.The error enum properly distinguishes between parser and handler errors, with appropriate From implementations for error propagation.
109-118
: Ignore the Number‐state workaround suggestion; the unescaped‐content flag is already managed correctly.The extractor’s
has_unescaped_content()
flag is cleared viaapply_unescaped_reset_if_queued()
at the start of each chunk and reset tofalse
when the scratch buffer is cleared inPushContentBuilder
. Since numeric tokens don’t generate escapes, the special‐case inpush_parser.rs
simply detects an empty scratch buffer for chunked numbers and is necessary to handle partial numeric content spanning buffers. No additional flag resets are needed here.Likely an incorrect or invalid review comment.
picojson/tests/json_checker_tests.rs (3)
39-49
: Clean test handler implementation.The
ConformanceTestHandler
appropriately focuses on event counting for conformance validation.
51-85
: Well-structured test runner functions.Both
run_push_parser_test
andrun_stream_parser_test
follow consistent patterns and properly handle error cases.
140-331
: Comprehensive conformance test coverage.The additions properly extend the test suite to cover
PushParser
andStreamParser
, maintaining consistency with the existingSliceParser
tests. The macro usage for fail tests is efficient and maintainable.picojson/tests/push_parser_stress_test.rs (5)
13-114
: Robust test infrastructure design.The owned event representation and handler implementations provide a solid foundation for stress testing. The immediate validation in
StressTestHandler
helps pinpoint failures quickly.
116-161
: Well-implemented chunked data delivery.The
ChunkedWriter
provides excellent control over input chunking patterns for stress testing parser robustness.
172-307
: Comprehensive test scenario coverage.The scenarios effectively cover edge cases including empty strings, long values, deep nesting, Unicode escapes, and complex structures. The documented minimum buffer sizes provide valuable insights into parser requirements.
339-408
: Intelligent buffer size calculation logic.The functions correctly account for copy-on-escape optimization and special requirements for escape processing scenarios. The differentiation based on chunk patterns is appropriate.
410-650
: Excellent stress test coverage.The stress tests systematically validate parser behavior across:
- Various buffer sizes (1-50 bytes)
- Multiple chunk patterns (byte-by-byte to large chunks)
- Unicode edge cases including chunk boundary splits
- Invalid document scenarios
The tests appropriately handle cases where copy-on-escape optimization performs better than expected (line 524).
Summary by Sourcery
Reintroduce a push-based JSON parser into picojson by adding PushParser and related infrastructure, updating ParserCore to handle chunked input, and integrating comprehensive tests and an example to validate and demonstrate the new functionality.
New Features:
Enhancements:
Documentation:
Tests:
Summary by CodeRabbit
New Features
Bug Fixes
Tests