Datasource intro #77
* Push parser
* Push demo
* Apply suggestions from code review
Reviewer's Guide

This PR introduces a generic content data source abstraction to unify raw and unescaped JSON content extraction across different parser implementations, refactors existing builders to leverage this trait, and exposes scratch buffer access in CopyOnEscape.

Class diagram for new DataSource trait and related types

classDiagram
class DataSource {
<<trait>>
+get_borrowed_slice(start: usize, end: usize) Result<&[u8], ParseError>
+get_unescaped_slice() Result<&[u8], ParseError>
+has_unescaped_content() bool
}
class ContentPiece {
<<enum>>
+Input(&[u8])
+Scratch(&[u8])
+to_string() Result<String, ParseError>
}
class SliceContentBuilder {
+get_borrowed_slice(start, end)
+get_unescaped_slice()
+has_unescaped_content()
}
class StreamContentBuilder {
+get_borrowed_slice(start, end)
+get_unescaped_slice()
+has_unescaped_content()
}
DataSource <|.. SliceContentBuilder
DataSource <|.. StreamContentBuilder
class CopyOnEscape {
+has_unescaped_content() bool
+get_scratch_buffer_slice(start, end) Result<&[u8], ParseError>
+get_scratch_range() (usize, usize)
}
SliceContentBuilder o-- CopyOnEscape
class StreamBuffer {
+get_string_slice(start, end) Result<&[u8], ParseError>
+get_unescaped_slice() Result<&[u8], ParseError>
+has_unescaped_content() bool
}
StreamContentBuilder o-- StreamBuffer
Class diagram for get_content_piece helper and ContentExtractor refactor

classDiagram
class get_content_piece {
+get_content_piece(source, start_pos, current_pos) Result<ContentPiece, ParseError>
}
class ContentExtractor {
<<trait>>
+extract_string_content(start_pos)
+extract_key_content(start_pos)
+extract_number(start_pos, from_container_end, finished)
}
class Event {
+String(String)
+Key(String)
+Number(JsonNumber)
}
ContentExtractor <|.. SliceContentBuilder
ContentExtractor <|.. StreamContentBuilder
get_content_piece ..> DataSource : uses
SliceContentBuilder ..> get_content_piece : uses
StreamContentBuilder ..> get_content_piece : uses
Event <.. ContentPiece : produces
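To make the diagrams concrete, here is a minimal, self-contained Rust sketch of the shapes they describe. It is an illustration only: the ParseError placeholder, the simplified quote handling, and the exact lifetime plumbing are assumptions rather than the code from this PR.

// Hedged sketch of the DataSource abstraction shown above -- not the PR's source.
use core::str::from_utf8;

#[derive(Debug)]
pub enum ParseError {
    InvalidUtf8, // placeholder variants; the real crate has its own error type
    OutOfBounds,
}

/// Output string: borrowed straight from input, or unescaped into a scratch buffer.
pub enum String<'input, 'scratch> {
    Borrowed(&'input str),
    Unescaped(&'scratch str),
}

/// Where a piece of content came from.
pub enum ContentPiece<'input, 'scratch> {
    Input(&'input [u8]),
    Scratch(&'scratch [u8]),
}

impl<'input, 'scratch> ContentPiece<'input, 'scratch> {
    pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
        match self {
            ContentPiece::Input(bytes) => from_utf8(bytes)
                .map(String::Borrowed)
                .map_err(|_| ParseError::InvalidUtf8),
            ContentPiece::Scratch(bytes) => from_utf8(bytes)
                .map(String::Unescaped)
                .map_err(|_| ParseError::InvalidUtf8),
        }
    }
}

/// Unified view over raw input and unescaped scratch content.
pub trait DataSource<'input, 'scratch> {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> Result<&'input [u8], ParseError>;
    fn get_unescaped_slice(&self) -> Result<&'scratch [u8], ParseError>;
    fn has_unescaped_content(&self) -> bool;
}

/// Shared helper: prefer scratch content when escapes were processed, else borrow input.
pub fn get_content_piece<'input, 'scratch, D: DataSource<'input, 'scratch>>(
    source: &D,
    start_pos: usize,
    current_pos: usize,
) -> Result<ContentPiece<'input, 'scratch>, ParseError> {
    if source.has_unescaped_content() {
        source.get_unescaped_slice().map(ContentPiece::Scratch)
    } else {
        // The real helper adjusts these positions for the surrounding quote
        // delimiters (via ContentRange); they are passed through here for brevity.
        source
            .get_borrowed_slice(start_pos, current_pos)
            .map(ContentPiece::Input)
    }
}

In the PR, SliceContentBuilder and StreamContentBuilder implement this kind of trait so that extract_string_content, extract_key_content, and extract_number can all funnel through the same helper.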
Walkthrough

This change introduces a significant refactor to the picojson parser, especially its push (incremental) parsing infrastructure.
Sequence Diagram(s)

PushParser Event Flow with PushContentExtractor

sequenceDiagram
participant User
participant PushParser
participant PushContentExtractor
participant ParserCore
participant Handler
User->>PushParser: write(data)
PushParser->>PushContentExtractor: provide chunk
PushContentExtractor->>ParserCore: next_event_impl_with_flags
ParserCore->>PushContentExtractor: request byte, accumulate, handle escapes
PushContentExtractor->>ParserCore: provide bytes, manage escape state
ParserCore->>PushContentExtractor: emit Event
PushContentExtractor->>Handler: handle_event(Event)
User->>PushParser: finish()
PushParser->>PushContentExtractor: finalize, check completion
PushParser->>Handler: emit EndDocument
PushParser-->>User: return Handler
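To ground the diagram in code, the following toy example mirrors the same call pattern: chunks go in through write(), events are pushed into a handler, and a consuming finish() returns the handler. All names here (ToyPushParser, PushHandler, Counter) are invented for illustration and are not the crate's actual API.

// Toy, self-contained sketch of the push-style call pattern in the diagram above.
// ToyPushParser, PushHandler, and Event are invented stand-ins, not picojson's real API.
#[derive(Debug)]
enum Event {
    Chunk(usize), // stand-in for real JSON events (Key, String, Number, ...)
    EndDocument,
}

trait PushHandler<E> {
    fn handle_event(&mut self, event: Event) -> Result<(), E>;
}

struct ToyPushParser<H> {
    handler: H,
}

impl<H: PushHandler<()>> ToyPushParser<H> {
    fn new(handler: H) -> Self {
        Self { handler }
    }

    // The caller feeds arbitrary chunks; events are pushed into the handler as they arise.
    fn write(&mut self, chunk: &[u8]) -> Result<(), ()> {
        self.handler.handle_event(Event::Chunk(chunk.len()))
    }

    // finish() consumes the parser, emits EndDocument, and hands the handler back,
    // mirroring the `parser.finish::<()>().unwrap()` pattern used in the PR's tests.
    fn finish(mut self) -> Result<H, ()> {
        self.handler.handle_event(Event::EndDocument)?;
        Ok(self.handler)
    }
}

struct Counter {
    events: usize,
}

impl PushHandler<()> for Counter {
    fn handle_event(&mut self, event: Event) -> Result<(), ()> {
        println!("got {:?}", event);
        self.events += 1;
        Ok(())
    }
}

fn main() {
    let mut parser = ToyPushParser::new(Counter { events: 0 });
    parser.write(br#"{"key""#).unwrap();
    parser.write(br#": "value"}"#).unwrap();
    let handler = parser.finish().unwrap();
    println!("handler saw {} events", handler.events);
}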
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Summary of Changes
Hello @kaidokert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a fundamental abstraction for content extraction within the picojson library by defining a DataSource trait. This trait standardizes how different JSON parser implementations access both raw input data and processed, unescaped content. The change centralizes content retrieval logic, leading to more consistent, maintainable, and less duplicated code across the parsing components.
Highlights
- New Abstraction: DataSource Trait: Introduced a new DataSource trait in picojson/src/shared.rs to provide a unified interface for accessing JSON content. This trait abstracts whether content is borrowed directly from the input or retrieved from a temporary scratch buffer after unescaping.
- Unified Content Extraction Logic: Added a generic helper function get_content_piece in picojson/src/shared.rs that leverages the DataSource trait. This function centralizes the logic for determining and retrieving the correct content (borrowed or unescaped) for strings and keys, reducing duplication across different parser implementations.
- Adaptation of Content Builders: The SliceContentBuilder (picojson/src/slice_content_builder.rs) and StreamContentBuilder (picojson/src/stream_content_builder.rs) have been updated to implement the new DataSource trait. Their extract_string_content, extract_key_content, and extract_number methods now utilize this new abstraction, streamlining content retrieval.
- Enhanced CopyOnEscape: The CopyOnEscape struct in picojson/src/copy_on_escape.rs gained new public methods (has_unescaped_content, get_scratch_buffer_slice, get_scratch_range) to expose its internal state and scratch buffer. This enables CopyOnEscape to be seamlessly integrated with the new DataSource trait for slice-based parsing. A sketch of how these pieces fit together follows this list.
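As a rough illustration of the last two highlights, a slice-based builder could satisfy the DataSource-style interface by delegating to CopyOnEscape's new accessors. The field names, the simplified CopyOnEscape stand-in, and the error variant below are assumptions made for the sketch, not the PR's actual implementation.

// Hedged sketch: how a SliceContentBuilder-like type might delegate DataSource-style
// accessors to CopyOnEscape. Types here are simplified stand-ins, not the PR's code.
#[derive(Debug)]
enum ParseError {
    OutOfBounds,
}

// Stand-in for CopyOnEscape: tracks whether escapes were copied into scratch
// and exposes the relevant scratch region, as the new public methods do.
struct CopyOnEscape<'scratch> {
    scratch: &'scratch [u8],
    unescaped_len: usize, // 0 means no escapes were encountered
}

impl<'scratch> CopyOnEscape<'scratch> {
    fn has_unescaped_content(&self) -> bool {
        self.unescaped_len > 0
    }
    fn get_scratch_range(&self) -> (usize, usize) {
        (0, self.unescaped_len)
    }
    fn get_scratch_buffer_slice(&self, start: usize, end: usize) -> Result<&'scratch [u8], ParseError> {
        self.scratch.get(start..end).ok_or(ParseError::OutOfBounds)
    }
}

struct SliceContentBuilder<'input, 'scratch> {
    input: &'input [u8],
    copy_on_escape: CopyOnEscape<'scratch>,
}

// DataSource-style accessors: borrow from input when possible, fall back to scratch.
impl<'input, 'scratch> SliceContentBuilder<'input, 'scratch> {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> Result<&'input [u8], ParseError> {
        self.input.get(start..end).ok_or(ParseError::OutOfBounds)
    }
    fn get_unescaped_slice(&self) -> Result<&'scratch [u8], ParseError> {
        let (start, end) = self.copy_on_escape.get_scratch_range();
        self.copy_on_escape.get_scratch_buffer_slice(start, end)
    }
    fn has_unescaped_content(&self) -> bool {
        self.copy_on_escape.has_unescaped_content()
    }
}

fn main() {
    let scratch: &[u8] = b"value with tab\t";
    let builder = SliceContentBuilder {
        input: br#""value with tab\t""#,
        copy_on_escape: CopyOnEscape { scratch, unescaped_len: scratch.len() },
    };
    let bytes = if builder.has_unescaped_content() {
        builder.get_unescaped_slice()
    } else {
        builder.get_borrowed_slice(1, builder.input.len() - 1)
    };
    println!("{:?}", bytes);
}

In the PR itself, accessors of this shape are what the shared get_content_piece helper consumes.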
Hey @kaidokert - I've reviewed your changes - here's some feedback:
- Rename ContentPiece::to_string to something like into_json_string or into_string to avoid shadowing the standard to_string and make it clear it returns the custom String enum.
- Consider simplifying the DataSource trait signatures by removing the explicit &'input self / &'scratch self receivers and relying on the return lifetimes, which would reduce complexity and improve readability (see the sketch below).
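For readers skimming the diff, the receiver suggestion amounts to something like the comparison below. The trait shown is a hedged stand-in with assumed lifetimes, not the exact trait from this PR.

// Hedged stand-in trait with assumed lifetimes -- not the exact trait from this PR.
pub struct ParseError;

// As reportedly written: the receiver borrow is pinned to the trait lifetimes.
pub trait DataSourceExplicit<'input, 'scratch> {
    fn get_borrowed_slice(&'input self, start: usize, end: usize) -> Result<&'input [u8], ParseError>;
    fn get_unescaped_slice(&'scratch self) -> Result<&'scratch [u8], ParseError>;
}

// The suggested simplification: an ordinary `&self` receiver, with the data
// lifetimes carried only by the return types.
pub trait DataSourceSimplified<'input, 'scratch> {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> Result<&'input [u8], ParseError>;
    fn get_unescaped_slice(&self) -> Result<&'scratch [u8], ParseError>;
}

Whether the simpler receivers actually compile for the PR's implementors depends on how they store their buffers: a shared &'input [u8] field can hand out 'input data through a plain &self, but a mutable scratch buffer may still require the longer receiver borrow.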
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Rename ContentPiece::to_string to something like `into_json_string` or `into_string` to avoid shadowing the standard `to_string` and make it clear it returns the custom String enum.
- Consider simplifying the DataSource trait signatures by removing the explicit `&'input self`/`&'scratch self` receivers and relying on the return lifetimes, which would reduce complexity and improve readability.
## Individual Comments
### Comment 1
<location> `picojson/src/slice_content_builder.rs:74` </location>
<code_context>
+ fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
+ // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
+ let current_pos = self.current_position();
+ if self.has_unescaped_content() {
+ let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
+ self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
</code_context>
<issue_to_address>
Redundant CopyOnEscape processing if get_content_piece also checks for unescaped content.
Consolidate CopyOnEscape handling to a single location to prevent redundant processing and potential state inconsistencies.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
// SliceParser-specific: Complete CopyOnEscape processing for unescaped content
let current_pos = self.current_position();
if self.has_unescaped_content() {
let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
}
// Use the unified helper function to get the content
let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
Ok(Event::String(content_piece.to_string()?))
=======
let current_pos = self.current_position();
// Use the unified helper function to get the content (including CopyOnEscape handling)
let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
Ok(Event::String(content_piece.to_string()?))
>>>>>>> REPLACE
</suggested_fix>
// SliceParser-specific: Complete CopyOnEscape processing for unescaped content
let current_pos = self.current_position();
if self.has_unescaped_content() {
    let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
    self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
}

// Use the unified helper function to get the content
let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
Ok(Event::String(content_piece.to_string()?))
suggestion: Redundant CopyOnEscape processing if get_content_piece also checks for unescaped content.
Consolidate CopyOnEscape handling to a single location to prevent redundant processing and potential state inconsistencies.
Suggested change:

let current_pos = self.current_position();
// Use the unified helper function to get the content (including CopyOnEscape handling)
let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
Ok(Event::String(content_piece.to_string()?))
Code Review
This pull request introduces a DataSource trait to abstract away content extraction from either the input or a scratch buffer. This is a great refactoring that helps unify the logic between SliceContentBuilder and StreamContentBuilder.

My review includes a couple of suggestions:

- In picojson/src/shared.rs, the ContentPiece::to_string method can be made more concise and idiomatic.
- In picojson/src/slice_content_builder.rs, the refactored extract_string_content and extract_key_content methods have become less efficient. I've suggested reverting them to a simpler form that avoids redundant work, as the CopyOnEscape helper already provides the necessary logic.

Overall, the abstraction is a good addition, and with these changes, the implementation will be even better.
fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
    // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
    let current_pos = self.current_position();
    if self.has_unescaped_content() {
        let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
        self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
    }

    // Use the unified helper function to get the content
    let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
    Ok(Event::String(content_piece.to_string()?))
}
This new implementation is less efficient than the previous version. When has_unescaped_content() is true, it calls self.copy_on_escape.end_string(end_pos)?, which creates a String that is immediately discarded. Then, get_content_piece is called to get the content again and create another String.

The copy_on_escape.end_string() method already encapsulates the logic of choosing between a borrowed or unescaped string, which is what get_content_piece aims to abstract. For SliceContentBuilder, it's more direct and efficient to just use the result of end_string().

I suggest reverting to a simpler implementation similar to the one before this change.
fn extract_string_content(&mut self, _start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
let end_pos = ContentRange::end_position_excluding_delimiter(self.current_position());
let value_result = self.copy_on_escape.end_string(end_pos)?;
Ok(Event::String(value_result))
}
fn extract_key_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
    // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
    let current_pos = self.current_position();
    if self.has_unescaped_content() {
        let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
        self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
    }

    // Use the unified helper function to get the content
    let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
    Ok(Event::Key(content_piece.to_string()?))
}
Similar to extract_string_content, this implementation is inefficient due to the redundant creation of a String instance. The call to self.copy_on_escape.end_string(end_pos)? creates a String that is discarded, and then get_content_piece re-processes to create it again.

It would be more efficient to use the result of end_string() directly, as it already handles both borrowed and unescaped cases correctly.
fn extract_key_content(&mut self, _start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
let end_pos = ContentRange::end_position_excluding_delimiter(self.current_position());
let key_result = self.copy_on_escape.end_string(end_pos)?;
Ok(Event::Key(key_result))
}
picojson/src/shared.rs
Outdated
pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
    match self {
        ContentPiece::Input(bytes) => {
            let content_str = from_utf8(bytes)?;
            Ok(String::Borrowed(content_str))
        }
        ContentPiece::Scratch(bytes) => {
            let content_str = from_utf8(bytes)?;
            Ok(String::Unescaped(content_str))
        }
    }
}
This method can be made more concise. The logic inside both match arms is very similar. You can use .map() to transform the Result from from_utf8 directly into the desired String variant. This will make the code more idiomatic and reduce duplication.
pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
match self {
ContentPiece::Input(bytes) => from_utf8(bytes).map(String::Borrowed),
ContentPiece::Scratch(bytes) => from_utf8(bytes).map(String::Unescaped),
}
}
Actionable comments posted: 0
♻️ Duplicate comments (1)
picojson/src/shared.rs (1)
Lines 231-242: Simplify the to_string method using map(). The logic in both match arms is identical and can be consolidated.

- pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
-     match self {
-         ContentPiece::Input(bytes) => {
-             let content_str = from_utf8(bytes)?;
-             Ok(String::Borrowed(content_str))
-         }
-         ContentPiece::Scratch(bytes) => {
-             let content_str = from_utf8(bytes)?;
-             Ok(String::Unescaped(content_str))
-         }
-     }
- }
+ pub fn to_string(self) -> Result<String<'input, 'scratch>, ParseError> {
+     match self {
+         ContentPiece::Input(bytes) => from_utf8(bytes).map(String::Borrowed),
+         ContentPiece::Scratch(bytes) => from_utf8(bytes).map(String::Unescaped),
+     }
+ }
🧹 Nitpick comments (1)
picojson/src/slice_content_builder.rs (1)
Lines 71-82: Consider documenting why CopyOnEscape finalization is necessary here. While this appears redundant (as noted in past reviews), the end_string() call is necessary to finalize the CopyOnEscape state before extracting content. Consider adding a comment explaining this two-step process to prevent future confusion.

fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
    // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
    let current_pos = self.current_position();
    if self.has_unescaped_content() {
        let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
-       self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
+       // Note: end_string() must be called to finalize escape processing state,
+       // even though we discard its return value and re-extract via get_content_piece
+       self.copy_on_escape.end_string(end_pos)?;
    }

    // Use the unified helper function to get the content
    let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
    Ok(Event::String(content_piece.to_string()?))
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- picojson/examples/push_parser_demo.rs (2 hunks)
- picojson/src/copy_on_escape.rs (1 hunks)
- picojson/src/escape_processor.rs (3 hunks)
- picojson/src/push_parser.rs (5 hunks)
- picojson/src/shared.rs (1 hunks)
- picojson/src/slice_content_builder.rs (4 hunks)
- picojson/src/stream_content_builder.rs (5 hunks)
✅ Files skipped from review due to trivial changes (2)
- picojson/examples/push_parser_demo.rs
- picojson/src/copy_on_escape.rs
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Applied to files:
picojson/src/escape_processor.rs
picojson/src/push_parser.rs
picojson/src/slice_content_builder.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Applied to files:
picojson/src/escape_processor.rs
picojson/src/push_parser.rs
picojson/src/shared.rs
picojson/src/stream_content_builder.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Applied to files:
picojson/src/escape_processor.rs
picojson/src/push_parser.rs
picojson/src/shared.rs
picojson/src/stream_content_builder.rs
picojson/src/slice_content_builder.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Applied to files:
picojson/src/push_parser.rs
picojson/src/shared.rs
picojson/src/stream_content_builder.rs
picojson/src/slice_content_builder.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Applied to files:
picojson/src/push_parser.rs
picojson/src/shared.rs
picojson/src/stream_content_builder.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Applied to files:
picojson/src/push_parser.rs
picojson/src/shared.rs
picojson/src/stream_content_builder.rs
picojson/src/slice_content_builder.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Applied to files:
picojson/src/push_parser.rs
picojson/src/stream_content_builder.rs
picojson/src/slice_content_builder.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Applied to files:
picojson/src/push_parser.rs
picojson/src/stream_content_builder.rs
picojson/src/slice_content_builder.rs
🧬 Code Graph Analysis (1)
picojson/src/shared.rs (5)
- picojson/src/slice_content_builder.rs (3): get_borrowed_slice (177-179), get_unescaped_slice (181-192), has_unescaped_content (194-196)
- picojson/src/stream_content_builder.rs (3): get_borrowed_slice (327-331), get_unescaped_slice (333-335), has_unescaped_content (337-339)
- picojson/src/push_parser.rs (3): get_borrowed_slice (54-70), get_unescaped_slice (72-76), has_unescaped_content (78-80)
- picojson/src/stream_buffer.rs (2): get_unescaped_slice (218-225), has_unescaped_content (243-245)
- picojson/src/copy_on_escape.rs (1): has_unescaped_content (210-212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run AVR Test Suites
🔇 Additional comments (13)
picojson/src/shared.rs (2)

- 174-213: Well-designed trait abstraction for unified content access. The DataSource trait provides a clean and flexible interface for accessing both borrowed and unescaped content with proper lifetime management.
- 257-278: Clean implementation of the content extraction helper. The function effectively consolidates the content extraction logic using the DataSource trait, providing a consistent interface across all parsers.

picojson/src/escape_processor.rs (2)

- 270-278: LGTM! Clean accessor methods for surrogate state management. These methods properly expose the pending high surrogate state for external management.
- 660-738: Good refactoring to use the DataSource trait abstraction. The function now properly uses the DataSource trait for buffer access and externalizes surrogate state management, making it reusable across different parser implementations.

picojson/src/push_parser.rs (4)

- 41-81: Well-implemented DataSource adapter for push parser. The PushParserDataSource correctly handles position conversion for chunked input and properly delegates to the stream buffer for unescaped content.
- 230-238: Correct usage of DataSource pattern in finish method. The empty input data is appropriate here since we're only accessing the unescaped buffer content.
- 300-327: Clean refactoring using the unified content extraction pattern. The code correctly handles position calculations and properly uses the get_content_piece helper for consistent string/key extraction.
- 352-374: Consistent application of the DataSource pattern for number extraction. The unified approach using get_content_piece simplifies the code and maintains consistency across all content types.

picojson/src/slice_content_builder.rs (2)

- 97-112: Clean simplification of number extraction. Good use of the DataSource trait method directly for number bytes extraction.
- 172-197: Well-implemented DataSource trait with proper safety checks. The implementation correctly handles both borrowed and unescaped content access with appropriate error handling.

picojson/src/stream_content_builder.rs (3)

- 198-206: Good refactoring with proper StreamParser-specific handling. The queue reset mechanism appropriately handles content cleanup for streaming scenarios while using the unified extraction pattern.
- 218-235: Clean implementation using the DataSource abstraction. The number extraction correctly handles the end position calculation and uses the trait method appropriately.
- 320-340: Correct DataSource implementation for streaming context. The implementation properly handles the unified lifetime scenario for stream parsing and correctly delegates to the StreamBuffer.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- picojson/examples/push_parser_demo.rs (2 hunks)
- picojson/src/lib.rs (1 hunks)
- picojson/src/push_content_builder.rs (1 hunks)
- picojson/src/push_parser.rs (20 hunks)
- picojson/tests/json_checker_tests.rs (1 hunks)
- picojson/tests/push_parser.rs (14 hunks)
- picojson/tests/push_parser_copy_on_escape.rs (3 hunks)
- picojson/tests/push_parser_escapes.rs (6 hunks)
- picojson/tests/push_parser_invalidslicebounds_repro.rs (3 hunks)
- picojson/tests/push_parser_stress_test.rs (9 hunks)
✅ Files skipped from review due to trivial changes (5)
- picojson/tests/push_parser_escapes.rs
- picojson/tests/json_checker_tests.rs
- picojson/tests/push_parser_invalidslicebounds_repro.rs
- picojson/tests/push_parser_copy_on_escape.rs
- picojson/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- picojson/examples/push_parser_demo.rs
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Applied to files:
picojson/tests/push_parser.rs
picojson/src/push_content_builder.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Applied to files:
picojson/tests/push_parser.rs
picojson/src/push_content_builder.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Applied to files:
picojson/tests/push_parser.rs
picojson/src/push_content_builder.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Applied to files:
picojson/tests/push_parser.rs
picojson/src/push_content_builder.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Applied to files:
picojson/tests/push_parser.rs
picojson/src/push_content_builder.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Applied to files:
picojson/tests/push_parser.rs
picojson/src/push_content_builder.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Applied to files:
picojson/tests/push_parser.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Applied to files:
picojson/tests/push_parser.rs
picojson/src/push_content_builder.rs
picojson/tests/push_parser_stress_test.rs
picojson/src/push_parser.rs
🧬 Code Graph Analysis (2)
picojson/tests/push_parser_stress_test.rs (3)

- picojson/tests/push_parser.rs (9): parser (25-25), parser (58-58), parser (93-93), parser (132-132), parser (177-177), parser (222-222), parser (266-266), parser (496-496), parser (552-552)
- picojson/src/push_content_builder.rs (1): new (47-59)
- picojson/src/push_parser.rs (1): new (61-69)

picojson/src/push_parser.rs (4)

- picojson/src/push_content_builder.rs (2): stream_buffer (109-111), new (47-59)
- picojson/src/escape_processor.rs (1): new (200-206)
- picojson/src/stream_buffer.rs (1): current_byte (81-89)
- picojson/src/parse_error.rs (5): from (41-48), from (52-63), from (67-69), from (73-75), from (79-81)
🪛 GitHub Actions: AVR Tests
picojson/src/push_parser.rs
[error] 12-12: Rust compiler error E0463: can't find crate for `std`. The `avr-none` target may not support the standard library. Consider building the standard library from source with `cargo build -Zbuild-std`.
🔇 Additional comments (17)
picojson/tests/push_parser.rs (1)

- 25-25: LGTM! Consistent API simplification across all tests. The refactoring to use parser.finish::<()>().unwrap() instead of separate finish() and destroy() calls is consistently applied across all test functions. This aligns with the new PushParser API where finish consumes the parser and returns the handler directly. Also applies to: 58-58, 93-93, 132-132, 177-177, 222-222, 266-266, 313-313, 323-323, 401-401, 451-451, 496-496, 552-552, 688-688

picojson/src/push_content_builder.rs (8)

- 11-21: Well-designed SAX-style event handler trait. The trait properly separates input and scratch buffer lifetimes, following the established pattern for push-based parsers. The generic error type E provides flexibility for different error handling strategies.
- 46-59: Proper initialization of content builder state. All fields are correctly initialized with appropriate default values. The StreamBuffer encapsulation provides clean separation of buffer management concerns.
- 119-124: Correct use of HRTB for flexible lifetime handling. The method properly delegates to the handler with appropriate lifetime constraints, ensuring the handler can work with any combination of input and scratch lifetimes.
- 137-157: Atomic number emission with proper cleanup. The method correctly ensures the unescaped buffer is cleared after event emission, preventing data leakage between parsing operations. Error handling appropriately wraps both parsing and handler errors.
- 159-201: Clean delegation to shared content extraction logic. The method effectively uses the DataSource pattern to avoid code duplication while properly handling position adjustments for quoted content. The unified handling of strings and keys reduces complexity.
- 204-239: Consistent number emission using DataSource pattern. The method maintains consistency with string/key extraction by using the DataSource pattern while ensuring atomic buffer cleanup after event emission.
- 255-381: Proper ContentExtractor adaptation for push parsing. The implementation correctly handles the differences between pull and push parsing. The next_byte returning None is appropriate since bytes are fed externally. Unicode escape processing and state management are properly implemented.
- 384-423: Well-implemented DataSource abstraction for chunk processing. The implementation correctly handles position offset conversion using saturating_sub to prevent underflow. Bounds checking ensures safe slice access, and the abstraction cleanly separates borrowed and unescaped content sources.

picojson/tests/push_parser_stress_test.rs (4)

- 116-141: Method signature correctly updated for new parser API. The run method now properly takes ownership of the parser and returns the handler, aligning with the new finish() API that consumes the parser. The generic constraint E: From<ParseError> ensures proper error conversion support.
- 225-225: Test scenarios correctly updated for escape processing. The changes properly reflect the correct JSON escape sequence format using single backslashes in raw byte string literals, the expected Unescaped string type for content with escape sequences, and appropriate buffer size adjustments for the new escape processing logic. Also applies to: 247-254
- 304-311: Consistent parser usage pattern with proper ownership transfer. The immutable parser binding correctly reflects that ownership transfers to ChunkedWriter::run(). The pattern of creating parser → running → verifying handler is consistently applied throughout the tests. Also applies to: 614-618
- 528-529: Unicode test expectations properly updated. The tests correctly expect picojson::String::Unescaped variants for Unicode escape sequences, with properly decoded Unicode characters in the expected values. Also applies to: 539-540, 551-552

picojson/src/push_parser.rs (4)

- 40-69: Clean separation of concerns with PushContentBuilder delegation. The refactored struct properly delegates content management to PushContentBuilder while maintaining high-level state control. This improves modularity and reduces complexity in the parser implementation.
- 72-161: Proper delegation pattern maintains control flow integrity. The write method correctly delegates all content operations to PushContentBuilder while preserving the original parsing logic. Position tracking, buffer management, and escape processing are appropriately handled through delegation.
- 163-216: Finish method correctly implements ownership transfer pattern. The method properly validates parsing state completeness, emits the final EndDocument event, and consumes the parser to return the handler. This API change simplifies usage by eliminating the need for a separate destroy() method.
- 581-602: Helper functions properly handle tokenizer integration. The functions correctly convert relative tokenizer positions to absolute positions, use the established 2-slot event storage pattern, and provide clean iteration over stored events.
#[cfg(any(test, debug_assertions))]
extern crate std;
Fix AVR build failure by restricting std dependency to tests only.
The AVR target failure occurs because debug_assertions might be enabled in AVR builds, causing the std import. Since these error conversions are only used in tests, restrict the configuration to tests only.
-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
extern crate std;
// ... later in file ...
-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
impl From<ParseError> for std::string::String {
fn from(_: ParseError) -> Self {
std::string::String::new()
}
}
-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
impl From<ParseError> for () {
fn from(_: ParseError) -> Self {
()
}
}
Also applies to: 606-618
🧰 Tools
🪛 GitHub Actions: AVR Tests
[error] 12-12: Rust compiler error E0463: can't find crate for `std`. The `avr-none` target may not support the standard library. Consider building the standard library from source with `cargo build -Zbuild-std`.
🤖 Prompt for AI Agents
In picojson/src/push_parser.rs at lines 11-12 and also lines 606-618, the std
crate is imported with a cfg attribute that includes debug_assertions, which
causes build failures on AVR targets. Modify the cfg attribute to restrict the
std import to tests only by using #[cfg(test)] instead of #[cfg(any(test,
debug_assertions))]. Apply this change to both the initial import and the code
block at lines 606-618 to fix the AVR build failure.
Actionable comments posted: 3
♻️ Duplicate comments (1)
picojson/src/push_parser.rs (1)
Lines 13-14: Fix AVR build failure by restricting std dependency to tests only. The AVR target failure occurs because debug_assertions might be enabled in AVR builds, causing the std import. Since these error conversions are only used in tests, restrict the configuration to tests only.

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 extern crate std;

 // ... later in file ...

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 impl From<ParseError> for std::string::String {
     fn from(_: ParseError) -> Self {
         std::string::String::new()
     }
 }

-#[cfg(any(test, debug_assertions))]
+#[cfg(test)]
 impl From<ParseError> for () {
     fn from(_: ParseError) -> Self {
         ()
     }
 }

Also applies to: 197-209
🧹 Nitpick comments (2)
picojson/src/push_content_builder.rs (2)
Lines 351-356: Remove or document the empty code block. The empty else block serves no purpose and should be removed for cleaner code.

- if self.unicode_escape_collector.is_in_progress() {
-     // The collector is still in progress, which means not all hex digits were processed
-     // This can happen when the End UnicodeEscape event consumes the last hex digit before
-     // the byte accumulator can process it. For now, we'll log this and not treat it as an error.
- } else {
- }
+ // The collector might still be in progress if the End UnicodeEscape event consumed
+ // the last hex digit before the byte accumulator could process it.
+ // This is expected behavior for chunked input and not an error.

Lines 564-565: Remove empty else block.

- } else {
- }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- picojson/src/event_processor.rs (4 hunks)
- picojson/src/json_number.rs (2 hunks)
- picojson/src/json_string.rs (1 hunks)
- picojson/src/push_content_builder.rs (1 hunks)
- picojson/src/push_parser.rs (3 hunks)
- picojson/src/shared.rs (3 hunks)
- picojson/tests/push_parser_stress_test.rs (10 hunks)
✅ Files skipped from review due to trivial changes (1)
- picojson/src/json_string.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- picojson/tests/push_parser_stress_test.rs
- picojson/src/shared.rs
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Applied to files:
picojson/src/json_number.rs
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Applied to files:
picojson/src/json_number.rs
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Applied to files:
picojson/src/json_number.rs
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Applied to files:
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Applied to files:
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Applied to files:
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Applied to files:
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Applied to files:
picojson/src/event_processor.rs
picojson/src/push_content_builder.rs
picojson/src/push_parser.rs
📚 Learning: in rust crate organization, functions can be imported into lib.rs via `use` statements not for direc...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
Applied to files:
picojson/src/push_parser.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run AVR Test Suites
🔇 Additional comments (7)
picojson/src/json_number.rs (1)
28-28
: LGTM!Adding
Clone
toNumberResult
andJsonNumber
is appropriate since all contained types implementClone
. This enables duplication of parsed numeric values in the event-driven parsing model.Also applies to: 50-50
picojson/src/event_processor.rs (6)
25-27
: LGTM! Clear documentation for the chunked input flag. The addition of the `handles_chunked_input` field with its comprehensive documentation clearly explains the behavioral difference between chunked (PushParser) and non-chunked (Slice/Stream) parsers.
31-49
: Well-structured constructors for different parser modes. The separation into `new()` for non-chunked parsers and `new_chunked()` for chunked parsers provides clear API boundaries and correctly initializes the `handles_chunked_input` flag.
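A minimal sketch of the dual-constructor pattern; `Core` is a simplified stand-in, not the actual layout in event_processor.rs:
struct Core {
    handles_chunked_input: bool,
}

impl Core {
    /// For SliceParser/StreamParser: end of input finalizes the document.
    fn new() -> Self {
        Self { handles_chunked_input: false }
    }

    /// For PushParser: end of input only marks the end of the current chunk.
    fn new_chunked() -> Self {
        Self { handles_chunked_input: true }
    }
}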
51-54
: Useful accessor for escape sequence state. The `in_escape_sequence()` method provides clean read-only access to the escape sequence state, supporting the refactored escape handling logic.
59-70
: Good refactoring with delegation pattern. The updated `next_event_impl` maintains backward compatibility while delegating to the new `next_event_impl_with_flags` method. This provides a clean upgrade path for existing callers.
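A sketch of the delegation shape, with heavily simplified signatures (the real methods are generic over the content extractor and yield parser events):
struct Core;

impl Core {
    fn next_event_impl(&mut self) -> Result<(), &'static str> {
        // Backward-compatible entry point: defaults to non-chunked behavior.
        self.next_event_impl_with_flags(false)
    }

    fn next_event_impl_with_flags(&mut self, chunked: bool) -> Result<(), &'static str> {
        // Flag-aware implementation; the PushParser path calls this with `true`.
        let _ = chunked;
        Ok(())
    }
}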
95-114
: Well-designed byte accumulation logic for different parser modes. The refactored accumulation logic correctly handles the different requirements for PushParser (which needs to accumulate during escape sequences) and other parsers. The detailed comments clearly explain the reasoning behind each branch.
116-133
: Correct end-of-input handling for different parser types. The implementation properly differentiates between chunked parsers (which return `EndOfData` to signal chunk boundaries) and non-chunked parsers (which finalize the document). The callback lifetime is correctly managed, and any events generated by `finish()` are properly processed.
} else {
    // Non-hex digit during Unicode escape - this shouldn't happen in valid JSON
    self.in_unicode_escape = false;
}
Consider returning an error for non-hex digits during Unicode escape.
When a non-hex digit is encountered during Unicode escape processing, the code silently resets the flag. This could mask parsing errors in malformed JSON.
Consider returning a parsing error instead:
} else {
// Non-hex digit during Unicode escape - this shouldn't happen in valid JSON
- self.in_unicode_escape = false;
+ return Err(ParseError::InvalidUnicodeEscape);
}
🤖 Prompt for AI Agents
In picojson/src/push_content_builder.rs around lines 198 to 201, the code
currently resets the in_unicode_escape flag silently when encountering a non-hex
digit during Unicode escape processing. Instead, modify the code to return a
parsing error at this point to properly signal malformed JSON input, ensuring
the error is propagated to the caller for handling.
let mut temp_buffer = [0u8; 1024]; // Should handle most token sizes
let byte_count = partial_slice.len();
if byte_count > temp_buffer.len() {
    return Err(ParseError::InputBufferFull);
}
💡 Verification agent
❓ Verification inconclusive
Document the 1024-byte token size limitation.
The fixed buffer size could cause issues with long strings or keys. Consider documenting this limitation or making it configurable.
Consider adding a constant and documentation:
+ /// Maximum size for a single token when copying across chunk boundaries
+ const MAX_TOKEN_SIZE: usize = 1024;
+
// Copy bytes to a fixed-size buffer to avoid borrowing conflicts
- let mut temp_buffer = [0u8; 1024]; // Should handle most token sizes
+ let mut temp_buffer = [0u8; Self::MAX_TOKEN_SIZE];
let byte_count = partial_slice.len();
if byte_count > temp_buffer.len() {
- return Err(ParseError::InputBufferFull);
+ // Token too large for cross-chunk handling
+ return Err(ParseError::InputBufferFull);
}
Document and parameterize the 1024-byte scratch buffer
The fixed-capacity buffer in `copy_partial_content_to_scratch` should be defined as a named constant (with documentation) and the empty `else` block removed to improve clarity and maintainability.
• Introduce a private constant in `push_content_builder.rs`:
/// Maximum size for a single token when copying across chunk boundaries.
const MAX_TOKEN_SIZE: usize = 1024;
• Replace the anonymous buffer with the named constant:
- let mut temp_buffer = [0u8; 1024]; // Should handle most token sizes
+ let mut temp_buffer = [0u8; Self::MAX_TOKEN_SIZE];
• Remove the empty `else {}` branch:
@@ -561,7 +561,6 @@
let byte_count = partial_slice.len();
if byte_count > temp_buffer.len() {
// Token too large for cross-chunk handling
return Err(ParseError::InputBufferFull);
}
- else { }
// continue processing…
• (Optional) If consumers need larger tokens in non-embedded contexts, consider exposing `MAX_TOKEN_SIZE` as a configurable parameter on the builder or via a feature flag.
🤖 Prompt for AI Agents
In picojson/src/push_content_builder.rs around lines 547 to 551, the 1024-byte
scratch buffer should be replaced with a named private constant MAX_TOKEN_SIZE
with a documentation comment explaining its purpose. Define this constant at the
top of the file or an appropriate location. Replace the fixed-size array length
1024 with MAX_TOKEN_SIZE. Remove the empty else block following the if condition
to improve code clarity. Optionally, consider making MAX_TOKEN_SIZE configurable
if larger tokens are needed in some contexts.
// Special case: For Numbers, check if the scratch buffer is actually empty
// This handles the byte-by-byte case where the flag is stale from previous Key processing
if matches!(extractor_state, State::Number(_)) {
    let buffer_slice = self.extractor.get_unescaped_slice().unwrap_or(&[]);
    let buffer_empty = buffer_slice.is_empty();

    if buffer_empty {
        self.extractor.copy_partial_content_to_scratch()?;
    }
}
}
🛠️ Refactor suggestion
Consider improving state management to avoid stale flags.
The special case for numbers suggests that `has_unescaped_content()` can return stale values from previous token processing. This workaround indicates a state management issue that should be addressed at the root.
Consider resetting the `using_unescaped_buffer` flag when transitioning between tokens to ensure it accurately reflects the current state, eliminating the need for this special case.
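A hedged sketch of the suggested fix; the field and method names below are illustrative stand-ins, not the extractor's actual layout:
struct Extractor {
    using_unescaped_buffer: bool,
    scratch_len: usize,
}

impl Extractor {
    /// Called whenever the tokenizer starts a new token (Key, String, Number, ...).
    fn begin_token(&mut self) {
        // Clearing the flag here keeps has_unescaped_content() from reporting
        // leftover state from a previously processed Key or String token.
        self.using_unescaped_buffer = false;
        self.scratch_len = 0;
    }

    fn has_unescaped_content(&self) -> bool {
        self.using_unescaped_buffer && self.scratch_len > 0
    }
}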
🤖 Prompt for AI Agents
In picojson/src/push_parser.rs around lines 113 to 123, the current code uses a
special case to handle stale flags in has_unescaped_content() for Number states,
indicating a state management issue. To fix this, identify where the
using_unescaped_buffer flag is set and reset it appropriately during token
transitions, ensuring it reflects the current token's state accurately. This
will remove the need for the special case by preventing stale flag values from
previous token processing.
Actionable comments posted: 0
♻️ Duplicate comments (6)
picojson/src/push_content_builder.rs (2)
198-201
: Consider returning an error for non-hex digits during Unicode escape. When a non-hex digit is encountered during Unicode escape processing, the code silently resets the flag. This could mask parsing errors in malformed JSON.
540-544
: Document and parameterize the 1024-byte scratch buffer. The fixed-capacity buffer should be defined as a named constant with documentation, and the size limitation should be clearly documented.
picojson/src/slice_content_builder.rs (2)
84-95
: Same inefficiency in key extraction as in string extraction. This method has the same double-processing issue where `end_string()` is called but its result is discarded. Apply the same fix as for `extract_string_content`:
fn extract_key_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
-    // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
    let current_pos = self.current_position();
-    if self.has_unescaped_content() {
-        let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
-        self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
-    }
-
-    // Use the unified helper function to get the content
-    let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
-    Ok(Event::Key(content_piece.into_string()?))
+    let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
+    let key_result = self.copy_on_escape.end_string(end_pos)?;
+    Ok(Event::Key(key_result))
}
71-82
: Inefficient double processing of unescaped content. When `has_unescaped_content()` is true, the code calls `end_string()`, which creates a String that is immediately discarded, then `get_content_piece` recreates it. This duplicates work unnecessarily. Consider using the result of `end_string()` directly:
fn extract_string_content(&mut self, start_pos: usize) -> Result<Event<'_, '_>, ParseError> {
-    // SliceParser-specific: Complete CopyOnEscape processing for unescaped content
    let current_pos = self.current_position();
-    if self.has_unescaped_content() {
-        let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
-        self.copy_on_escape.end_string(end_pos)?; // Complete the CopyOnEscape processing
-    }
-
-    // Use the unified helper function to get the content
-    let content_piece = crate::shared::get_content_piece(self, start_pos, current_pos)?;
-    Ok(Event::String(content_piece.into_string()?))
+    let end_pos = ContentRange::end_position_excluding_delimiter(current_pos);
+    let value_result = self.copy_on_escape.end_string(end_pos)?;
+    Ok(Event::String(value_result))
}
picojson/src/push_parser.rs (2)
13-14
: Fix AVR build failure by restricting std dependency to tests only. The AVR target failure occurs because `debug_assertions` might be enabled in AVR builds, causing the std import. Since these error conversions are only used in tests, restrict the configuration to tests only.
Also applies to: 195-205
111-121
: Address state management to avoid stale flags. The special case for numbers indicates that `has_unescaped_content()` can return stale values from previous token processing. This workaround suggests an underlying state management issue.
🧹 Nitpick comments (1)
picojson/src/event_processor.rs (1)
90-106
: Well-designed byte accumulation logic for escape sequences. The conditional accumulation logic correctly handles the different requirements of chunked parsers during escape sequences while maintaining backward compatibility for other parsers.
Consider extracting the accumulation logic into a helper method for better readability:
-let should_accumulate = if always_accumulate_during_escapes {
-    // For PushParser: accumulate during escapes even when events are generated
-    // This ensures hex digits reach the accumulator even when End UnicodeEscape events consume them
-    // BUT still respect the normal logic when not in escape sequences
-    if self.in_escape_sequence {
-        true // Always accumulate during escape sequences
-    } else {
-        !have_events(&self.parser_state.evts) // Normal behavior outside escapes
-    }
-} else {
-    // For other parsers: only accumulate when no events generated and not in escape
-    !have_events(&self.parser_state.evts) && !self.in_escape_sequence
-};
+let should_accumulate = self.should_accumulate_byte(
+    always_accumulate_during_escapes,
+    &self.parser_state.evts
+);
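A self-contained sketch of the decision such a helper would encapsulate; the boolean parameters stand in for `self.in_escape_sequence` and `have_events(&self.parser_state.evts)`, and the function name follows the diff above rather than any existing API:
fn should_accumulate_byte(
    always_accumulate_during_escapes: bool, // true only for the PushParser path
    in_escape_sequence: bool,
    have_events: bool,
) -> bool {
    if always_accumulate_during_escapes && in_escape_sequence {
        // PushParser: keep accumulating so hex digits reach the accumulator
        // even when End UnicodeEscape events consume them.
        true
    } else {
        // Otherwise accumulate only when no events were generated and we are
        // not inside an escape sequence.
        !have_events && !in_escape_sequence
    }
}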
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- picojson/src/copy_on_escape.rs (1 hunks)
- picojson/src/escape_processor.rs (4 hunks)
- picojson/src/event_processor.rs (4 hunks)
- picojson/src/push_content_builder.rs (1 hunks)
- picojson/src/push_parser.rs (3 hunks)
- picojson/src/shared.rs (2 hunks)
- picojson/src/slice_content_builder.rs (4 hunks)
- picojson/src/slice_input_buffer.rs (0 hunks)
- picojson/src/stream_content_builder.rs (5 hunks)
💤 Files with no reviewable changes (1)
- picojson/src/slice_input_buffer.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- picojson/src/copy_on_escape.rs
- picojson/src/stream_content_builder.rs
- picojson/src/shared.rs
- picojson/src/escape_processor.rs
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
📚 Learning: the pullparser trait in picojson-rs provides both next() and next_event() methods. the next() method...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/slice_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in the stax json parser codebase, escapesequence event handlers exist in flex_parser.rs not because ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/slice_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs event processing, the sliceparser uses an if/else pattern with process_simple_events(...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/slice_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array ...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/slice_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and ful...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in the picojson-rs project, the bitstack trait was redesigned to return bool instead of option...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in picojson-rs sliceparser, is_empty() and is_past_end() serve different purposes: is_empty() return...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/slice_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in json parsing tests for picojson-rs, when testing unicode escape sequences, raw string literals us...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#69
File: picojson/tests/json_checker_tests.rs:296-297
Timestamp: 2025-07-27T05:05:22.707Z
Learning: In JSON parsing tests for picojson-rs, when testing Unicode escape sequences, raw string literals use double backslashes (e.g., r#"\\uCAFE"#) to create JSON input containing single backslashes (\uCAFE) that the JSON parser processes. The double backslashes are not an error - they correctly represent the JSON input format that contains escape sequences for the parser to decode.
Applied to files:
picojson/src/push_content_builder.rs
picojson/src/slice_content_builder.rs
picojson/src/event_processor.rs
picojson/src/push_parser.rs
📚 Learning: in rust crate organization, functions can be imported into lib.rs via `use` statements not for direc...
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
Applied to files:
picojson/src/push_parser.rs
🧬 Code Graph Analysis (1)
picojson/src/slice_content_builder.rs (6)
- picojson/src/push_content_builder.rs (6)
  - extract_string_content (288-303)
  - extract_key_content (305-318)
  - extract_number (320-339)
  - get_borrowed_slice (478-500)
  - get_unescaped_slice (502-506)
  - has_unescaped_content (508-510)
- picojson/src/stream_content_builder.rs (6)
  - extract_string_content (198-206)
  - extract_key_content (208-216)
  - extract_number (218-235)
  - get_borrowed_slice (327-331)
  - get_unescaped_slice (333-335)
  - has_unescaped_content (337-339)
- picojson/src/slice_input_buffer.rs (1)
  - current_pos (43-45)
- picojson/src/shared.rs (5)
  - end_position_excluding_delimiter (153-155)
  - get_content_piece (264-282)
  - get_borrowed_slice (198-202)
  - get_unescaped_slice (209-209)
  - has_unescaped_content (216-216)
- picojson/src/escape_processor.rs (1)
  - process_unicode_escape_sequence (684-746)
- picojson/src/copy_on_escape.rs (1)
  - has_unescaped_content (210-212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run AVR Test Suites
🔇 Additional comments (7)
picojson/src/push_content_builder.rs (2)
11-21
: Well-designed trait for push parser event handling. The trait is properly documented with clear generic parameters and follows Rust conventions.
477-511
: Correct implementation of DataSource for chunked input. The implementation properly handles position offset conversions and validates bounds, which is essential for chunked parsing.
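An illustrative sketch of the position-offset handling such an implementation needs; the type, fields, and error strings below are assumptions, not the crate's real API:
struct ChunkSource<'a> {
    chunk: &'a [u8],
    chunk_start_offset: usize, // absolute document position of chunk[0]
}

impl<'a> ChunkSource<'a> {
    fn get_borrowed_slice(&self, start: usize, end: usize) -> Result<&'a [u8], &'static str> {
        // Convert absolute document positions into chunk-relative indices and
        // reject ranges that fall outside the current chunk.
        let rel_start = start
            .checked_sub(self.chunk_start_offset)
            .ok_or("range starts before current chunk")?;
        let rel_end = end
            .checked_sub(self.chunk_start_offset)
            .ok_or("range ends before current chunk")?;
        self.chunk
            .get(rel_start..rel_end)
            .ok_or("range extends past current chunk")
    }
}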
picojson/src/event_processor.rs (2)
31-49
: Clear separation of chunked vs non-chunked parser construction. The dual constructor approach makes the parser type explicit and improves code clarity.
111-128
: Correct end-of-input handling for different parser types. The logic properly differentiates between chunked parsers (which need more data) and non-chunked parsers (which finalize parsing), ensuring correct behavior for both use cases.
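A simplified sketch of the branch being described; the enum is a stand-in for the crate's actual error/event signaling:
enum EndOfInput {
    NeedMoreData, // chunked: the caller is expected to supply another chunk
    DocumentDone, // non-chunked: the tokenizer is finalized and EndDocument is emitted
}

fn handle_end_of_input(handles_chunked_input: bool) -> EndOfInput {
    if handles_chunked_input {
        // PushParser: running out of bytes only means the current chunk ended.
        EndOfInput::NeedMoreData
    } else {
        // Slice/Stream parsers: no more input will ever arrive, so finalize.
        EndOfInput::DocumentDone
    }
}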
picojson/src/slice_content_builder.rs (1)
172-197
: Well-implemented DataSource trait with proper lifetime management. The implementation correctly delegates to the underlying buffers and maintains proper lifetime boundaries between input and scratch data.
picojson/src/push_parser.rs (2)
26-49
: Excellent refactoring with clear separation of concerns. The new structure properly delegates content extraction to `PushContentExtractor` and shares parsing logic through `ParserCore`, resulting in cleaner and more maintainable code.
133-157
: Clean implementation of parser finalization. The `finish` method properly validates parsing completion, emits the final event, and returns the handler in a single operation, improving the API ergonomics.
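A rough sketch of the finalization shape described here; the fields, error type, and completion check are placeholders rather than the real `PushParser` internals:
struct PushParser<H> {
    handler: H,
    open_containers: usize, // objects/arrays begun but not yet closed
}

impl<H> PushParser<H> {
    fn finish(self) -> Result<H, &'static str> {
        if self.open_containers != 0 {
            // The document never closed all of its containers.
            return Err("unexpected end of input");
        }
        // The real implementation emits an EndDocument event to the handler here
        // before handing ownership of the handler back to the caller.
        Ok(self.handler)
    }
}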
Summary by Sourcery
Introduce a DataSource abstraction to unify borrowed and unescaped JSON content extraction, refactor slice and stream content builders to leverage the new interface, and extend CopyOnEscape to support scratch buffer access.
Enhancements:
Summary by CodeRabbit
New Features
- `PushContentExtractor` component to support efficient event-driven, chunked JSON parsing.
Refactor
- `DataSource` trait.
Bug Fixes
Tests
Documentation