-
Notifications
You must be signed in to change notification settings - Fork 2
Even bigger refactor #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideThis PR consolidates StreamParser and SliceParser into a shared parsing core by introducing a new ParserCore module, and refactors each parser to delegate the main loop and byte provision to specialized provider/builder types. Common logic for token handling, escape processing, and content extraction has been moved into ParserCore, StreamContentBuilder, and SliceContentBuilder, removing duplication and simplifying parser implementations. Class diagram for refactored parser architectureclassDiagram
class ParserCore {
+Tokenizer
+ParserState
+next_event_impl_unified()
+next_event_impl_unified_with_accumulator()
}
class StreamParser {
+ParserCore parser_core
+StreamParserProvider provider
}
class SliceParser {
+ParserCore parser_core
+SliceContentBuilder content_builder
}
class StreamParserProvider {
+StreamContentBuilder content_builder
+Reader reader
+bool finished
}
class StreamContentBuilder {
+StreamBuffer stream_buffer
+State parser_state
+UnicodeEscapeCollector unicode_escape_collector
+bool unescaped_reset_queued
+bool in_unicode_escape
+bool in_escape_sequence
+bool finished
}
class SliceContentBuilder {
+SliceInputBuffer buffer
+CopyOnEscape copy_on_escape
+State parser_state
+UnicodeEscapeCollector unicode_escape_collector
+bool in_escape_sequence
}
ParserCore <|-- StreamParser
ParserCore <|-- SliceParser
StreamParser o-- StreamParserProvider
StreamParserProvider o-- StreamContentBuilder
SliceParser o-- SliceContentBuilder
StreamContentBuilder o-- StreamBuffer
StreamContentBuilder o-- UnicodeEscapeCollector
SliceContentBuilder o-- CopyOnEscape
SliceContentBuilder o-- UnicodeEscapeCollector
SliceContentBuilder o-- SliceInputBuffer
StreamParserProvider o-- Reader
Class diagram for trait relationships in the new parser coreclassDiagram
class ByteProvider {
<<trait>>
+next_byte()
}
class ContentExtractor {
<<trait>>
+parser_state_mut()
+current_position()
+begin_string_content()
+unicode_escape_collector_mut()
+extract_string_content()
+extract_key_content()
+extract_number_content()
+extract_number()
}
class EscapeHandler {
<<trait>>
+parser_state()
+process_unicode_escape_with_collector()
+handle_simple_escape_char()
+begin_escape_sequence()
+begin_unicode_escape()
}
class ParserProvider {
<<trait>>
}
ByteProvider <|.. ParserProvider
ContentExtractor <|.. ParserProvider
ContentExtractor <|.. StreamContentBuilder
ContentExtractor <|.. SliceContentBuilder
EscapeHandler <|.. StreamContentBuilder
EscapeHandler <|.. SliceContentBuilder
ByteProvider <|.. StreamContentBuilder
ByteProvider <|.. SliceContentBuilder
Class diagram for removed and replaced fields in StreamParser and SliceParserclassDiagram
class StreamParser {
-Tokenizer tokenizer
-ParserState parser_state
-Reader reader
-StreamBuffer stream_buffer
-ProcessingState processing_state
-UnicodeEscapeCollector unicode_escape_collector
+ParserCore parser_core
+StreamParserProvider provider
}
class SliceParser {
-Tokenizer tokenizer
-SliceInputBuffer buffer
-ParserState parser_state
-CopyOnEscape copy_on_escape
-UnicodeEscapeCollector unicode_escape_collector
+ParserCore parser_core
+SliceContentBuilder content_builder
}
Class diagram for new modules: parser_core, stream_content_builder, slice_content_builderclassDiagram
class parser_core {
+ParserCore
+EscapeTiming
}
class stream_content_builder {
+StreamContentBuilder
+StreamContentBuilderWithFiller
+StreamContentBuilderWithReader
}
class slice_content_builder {
+SliceContentBuilder
}
parser_core ..> StreamContentBuilder : uses
parser_core ..> SliceContentBuilder : uses
StreamContentBuilderWithFiller o-- StreamContentBuilder
StreamContentBuilderWithReader o-- StreamContentBuilder
Class diagram for ParserState changesclassDiagram
class ParserState {
-State state
+[Option<Event>; 2] evts
}
Class diagram for StreamParserProvider and its roleclassDiagram
class StreamParserProvider {
+StreamContentBuilder content_builder
+Reader reader
+bool finished
+new()
}
StreamParserProvider o-- StreamContentBuilder
StreamParserProvider o-- Reader
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Warning Rate limit exceeded@kaidokert has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 42 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (7)
WalkthroughThis change refactors the JSON parser by introducing unified core abstractions ( Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Parser (SliceParser/StreamParser)
participant ParserCore
participant ContentBuilder (Slice/Stream)
participant Reader/Buffer
User->>Parser: next_event()
Parser->>ParserCore: next_event_impl_unified(...)
ParserCore->>ContentBuilder: extract content / handle escapes
ParserCore->>ContentBuilder: request next byte
ContentBuilder->>Reader/Buffer: fetch/consume byte
ContentBuilder-->>ParserCore: return byte/content
ParserCore-->>Parser: Event or Error
Parser-->>User: Event or Error
Possibly related PRs
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @kaidokert, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request undertakes a substantial refactor of the picojson
library's parsing architecture. The primary goal is to eliminate code duplication between the SliceParser
and StreamParser
by consolidating their common event processing logic into a new, unified ParserCore
. This change introduces a more modular design, abstracting input handling and content extraction into dedicated builder components, which significantly enhances maintainability and clarity of the parsing pipeline.
Highlights
- Unified Parser Core: Introduced a new
ParserCore
module (picojson/src/parser_core.rs
) that encapsulates the shared event processing loop previously duplicated inSliceParser
andStreamParser
. This centralizes the core parsing logic. - Content Builder Abstraction: Created dedicated content builder modules,
SliceContentBuilder
(picojson/src/slice_content_builder.rs
) andStreamContentBuilder
(picojson/src/stream_content_builder.rs
), which implement theContentExtractor
andByteProvider
traits. These builders now handle parser-specific input management, string/key/number extraction, and escape sequence processing. - Simplified Parser Structures: The
SliceParser
andStreamParser
structs have been significantly simplified. They now primarily compose the newParserCore
and their respective content builders/providers, delegating most of the parsing responsibilities to these new components. - Refined Escape Handling: The
EscapeHandler
trait and related functions inpicojson/src/event_processor.rs
have been updated to better accommodate the distinct escape processing timings required by slice-based (onBegin
events for zero-copy) and stream-based (onEnd
events for accumulation) parsing. - Dependency and Linting Cleanup: Removed unused
test-log
andlog
dependencies fromCargo.toml
and added#[allow(dead_code)]
attributes inbuild.rs
for functions only used by the build script, improving code hygiene.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kaidokert - I've reviewed your changes - here's some feedback:
- The two filler/provider structs (
StreamContentBuilderWithFiller
andStreamContentBuilderWithReader
) have empty impl blocks and duplicate buffer-filling logic—please consolidate or remove them to avoid unused code. - The
ContentExtractor
impls duplicate number extraction in bothextract_number_content
andextract_number
; consider defaulting one in the trait or merging them to cut down boilerplate. - There are multiple
#[allow(dead_code)]
/#[allow(non_snake_case)]
attributes sprinkled across new modules—please audit and remove any that are no longer necessary.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The two filler/provider structs (`StreamContentBuilderWithFiller` and `StreamContentBuilderWithReader`) have empty impl blocks and duplicate buffer-filling logic—please consolidate or remove them to avoid unused code.
- The `ContentExtractor` impls duplicate number extraction in both `extract_number_content` and `extract_number`; consider defaulting one in the trait or merging them to cut down boilerplate.
- There are multiple `#[allow(dead_code)]` / `#[allow(non_snake_case)]` attributes sprinkled across new modules—please audit and remove any that are no longer necessary.
## Individual Comments
### Comment 1
<location> `picojson/src/stream_content_builder.rs:500` </location>
<code_context>
+ fn process_unicode_escape_with_collector(&mut self) -> Result<(), ParseError> {
</code_context>
<issue_to_address>
UTF-8 byte copying for escape sequences could be simplified.
Consider iterating directly over the UTF-8 bytes instead of copying them to a fixed-size array. If the copy is required, please document the reason.
Suggested implementation:
```rust
// Shared Unicode escape processing pattern - process UTF-8 bytes directly to avoid unnecessary copying
let utf8_bytes_result = {
let current_pos = self.stream_buffer.current_position();
let hex_slice_provider = |start, end| {
self.stream_buffer
.get_string_slice(start, end)
.map_err(Into::into)
```
```rust
// Shared Unicode escape processing pattern - process UTF-8 bytes directly to avoid unnecessary copying
let utf8_bytes_result = {
let current_pos = self.stream_buffer.current_position();
let hex_slice_provider = |start, end| {
self.stream_buffer
.get_string_slice(start, end)
.map_err(Into::into)
};
// Instead of copying to a fixed-size array, iterate directly over the UTF-8 bytes
// If a copy is required for borrow checker reasons, document it here.
// For now, we assume direct iteration is possible:
// Example:
// for b in utf8_bytes.iter() {
// // process each byte
// }
```
If the rest of the function (not shown) still uses a fixed-size array for UTF-8 bytes, you should refactor it to use direct iteration over the byte slice returned by the string slice provider. If you encounter borrow checker issues that require a copy, add a comment explaining why the copy is necessary.
</issue_to_address>
### Comment 2
<location> `picojson/tests/stream_parser_stress_test.rs:217` </location>
<code_context>
+ }
}
#[test]
</code_context>
<issue_to_address>
New test added for debugging Unicode and escape handling in StreamParser.
Please add assertions to verify the parsed content and character codes, so the test can automatically detect regressions instead of relying on printed output.
Suggested implementation:
```rust
#[test]
fn debug_unicode_issue() {
let json = br#"["a\nb\t\"\\c\u1234d"]"#;
let reader = ChunkReader::new(json, 5);
let mut buffer = [0u8; 30];
let mut parser = StreamParser::new(reader, &mut buffer);
let mut event_count = 0;
let mut parsed_string = None;
loop {
let event = match parser.next_event() {
```
```rust
loop {
let event = match parser.next_event() {
```
```rust
let event = match parser.next_event() {
Ok(ev) => ev,
Err(e) => panic!("parse error: {:?}", e),
};
if let Some(ev) = event {
event_count += 1;
// Capture the parsed string value
if let picojson::Event::String(s) = &ev {
parsed_string = Some(s.clone());
}
} else {
break;
}
}
// The expected string is: a\nb\t"\\c\u{1234}d
let expected = "a\nb\t\"\\c\u{1234}d";
let parsed = parsed_string.expect("No string parsed");
assert_eq!(parsed, expected, "Parsed string does not match expected value");
// Check the character codes
let expected_codes: Vec<u32> = expected.chars().map(|c| c as u32).collect();
let parsed_codes: Vec<u32> = parsed.chars().map(|c| c as u32).collect();
assert_eq!(parsed_codes, expected_codes, "Parsed character codes do not match expected");
// Optionally, check event count (should be 3: StartArray, String, EndArray)
assert_eq!(event_count, 3, "Unexpected number of events");
}
```
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
.mark_filled(bytes_read) | ||
.map_err(crate::ParseError::from)?; | ||
} | ||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: UTF-8 byte copying for escape sequences could be simplified.
Consider iterating directly over the UTF-8 bytes instead of copying them to a fixed-size array. If the copy is required, please document the reason.
Suggested implementation:
// Shared Unicode escape processing pattern - process UTF-8 bytes directly to avoid unnecessary copying
let utf8_bytes_result = {
let current_pos = self.stream_buffer.current_position();
let hex_slice_provider = |start, end| {
self.stream_buffer
.get_string_slice(start, end)
.map_err(Into::into)
// Shared Unicode escape processing pattern - process UTF-8 bytes directly to avoid unnecessary copying
let utf8_bytes_result = {
let current_pos = self.stream_buffer.current_position();
let hex_slice_provider = |start, end| {
self.stream_buffer
.get_string_slice(start, end)
.map_err(Into::into)
};
// Instead of copying to a fixed-size array, iterate directly over the UTF-8 bytes
// If a copy is required for borrow checker reasons, document it here.
// For now, we assume direct iteration is possible:
// Example:
// for b in utf8_bytes.iter() {
// // process each byte
// }
If the rest of the function (not shown) still uses a fixed-size array for UTF-8 bytes, you should refactor it to use direct iteration over the byte slice returned by the string slice provider. If you encounter borrow checker issues that require a copy, add a comment explaining why the copy is necessary.
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): New test added for debugging Unicode and escape handling in StreamParser.
Please add assertions to verify the parsed content and character codes, so the test can automatically detect regressions instead of relying on printed output.
Suggested implementation:
#[test]
fn debug_unicode_issue() {
let json = br#"["a\nb\t\"\\c\u1234d"]"#;
let reader = ChunkReader::new(json, 5);
let mut buffer = [0u8; 30];
let mut parser = StreamParser::new(reader, &mut buffer);
let mut event_count = 0;
let mut parsed_string = None;
loop {
let event = match parser.next_event() {
loop {
let event = match parser.next_event() {
let event = match parser.next_event() {
Ok(ev) => ev,
Err(e) => panic!("parse error: {:?}", e),
};
if let Some(ev) = event {
event_count += 1;
// Capture the parsed string value
if let picojson::Event::String(s) = &ev {
parsed_string = Some(s.clone());
}
} else {
break;
}
}
// The expected string is: a\nb\t"\\c\u{1234}d
let expected = "a\nb\t\"\\c\u{1234}d";
let parsed = parsed_string.expect("No string parsed");
assert_eq!(parsed, expected, "Parsed string does not match expected value");
// Check the character codes
let expected_codes: Vec<u32> = expected.chars().map(|c| c as u32).collect();
let parsed_codes: Vec<u32> = parsed.chars().map(|c| c as u32).collect();
assert_eq!(parsed_codes, expected_codes, "Parsed character codes do not match expected");
// Optionally, check event count (should be 3: StartArray, String, EndArray)
assert_eq!(event_count, 3, "Unexpected number of events");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a significant and well-executed refactoring that unifies the SliceParser
and StreamParser
logic into a shared ParserCore
. The new architecture using ContentBuilder
and Provider
traits greatly reduces code duplication and improves maintainability. I've found a few issues, including a potential correctness bug in surrogate pair handling and some dead code. Once these are addressed, this will be a very solid improvement.
// Fix: escape_start_pos should point to the backslash, not the 'u' | ||
// But don't subtract if it's already pointing to the backslash | ||
let actual_escape_start_pos = escape_start_pos; | ||
|
||
// Handle UTF-8 bytes if we have them (not a high surrogate waiting for low surrogate) | ||
if let Some(utf8_bytes) = utf8_bytes_opt { | ||
if had_pending_high_surrogate { | ||
// This is completing a surrogate pair - need to consume both escapes | ||
// First call: consume the high surrogate (6 bytes earlier) | ||
self.copy_on_escape | ||
.handle_unicode_escape(actual_escape_start_pos, &[])?; | ||
// Second call: consume the low surrogate and write UTF-8 | ||
self.copy_on_escape | ||
.handle_unicode_escape(actual_escape_start_pos + 6, utf8_bytes)?; | ||
} else { | ||
// Single Unicode escape - normal processing | ||
self.copy_on_escape | ||
.handle_unicode_escape(actual_escape_start_pos, utf8_bytes)?; | ||
} | ||
} | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for handling surrogate pairs may be incorrect. The // Fix:
comment on line 165 is not accompanied by any change in logic, as actual_escape_start_pos
is just assigned the value of escape_start_pos
. This is confusing and should be either implemented or removed. Also, the surrogate pair handling logic starting on line 171 seems flawed. When processing a low surrogate, actual_escape_start_pos
points to its start. The high surrogate was 6 bytes before this. The current code calls handle_unicode_escape
with actual_escape_start_pos
and actual_escape_start_pos + 6
, which seems incorrect.
The logic should probably be:
- Handle the high surrogate at
actual_escape_start_pos - 6
. - Handle the low surrogate at
actual_escape_start_pos
and write the combined UTF-8 character.
Here is a suggested correction, which also removes the confusing actual_escape_start_pos
variable:
// Shared Unicode escape processing pattern
let had_pending_high_surrogate = self.unicode_escape_collector.has_pending_high_surrogate();
let mut utf8_buf = [0u8; 4];
let (utf8_bytes_opt, escape_start_pos) =
crate::escape_processor::process_unicode_escape_sequence(
current_pos,
&mut self.unicode_escape_collector,
hex_slice_provider,
&mut utf8_buf,
)?;
// Handle UTF-8 bytes if we have them (not a high surrogate waiting for low surrogate)
if let Some(utf8_bytes) = utf8_bytes_opt {
if had_pending_high_surrogate {
// This is completing a surrogate pair.
// First, handle the high surrogate at its original position (6 bytes before the current one).
self.copy_on_escape
.handle_unicode_escape(escape_start_pos.saturating_sub(6), &[])?;
// Then, handle the low surrogate and write the combined UTF-8 character.
self.copy_on_escape
.handle_unicode_escape(escape_start_pos, utf8_bytes)?;
} else {
// Single Unicode escape - normal processing
self.copy_on_escape
.handle_unicode_escape(escape_start_pos, utf8_bytes)?;
}
}
Ok(())
picojson/src/stream_buffer.rs
Outdated
} | ||
|
||
// Update positions | ||
let _old_tokenize_pos = self.tokenize_pos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Custom ByteProvider that can handle reader filling for StreamParser | ||
pub struct StreamContentBuilderWithFiller<'a, 'b, R: crate::stream_parser::Reader> { | ||
content_builder: &'a mut StreamContentBuilder<'b>, | ||
reader: &'a mut R, | ||
finished: &'a mut bool, | ||
} | ||
|
||
impl<R: crate::stream_parser::Reader> StreamContentBuilderWithFiller<'_, '_, R> {} | ||
|
||
impl<R: crate::stream_parser::Reader> crate::shared::ByteProvider | ||
for StreamContentBuilderWithFiller<'_, '_, R> | ||
{ | ||
fn next_byte(&mut self) -> Result<Option<u8>, crate::ParseError> { | ||
// If buffer is empty, try to fill it first | ||
if self.content_builder.stream_buffer.is_empty() { | ||
self.content_builder.fill_buffer_from_reader(self.reader)?; | ||
} | ||
|
||
// If still empty after fill attempt, we're at EOF | ||
if self.content_builder.stream_buffer.is_empty() { | ||
// Set finished flag when we reach end of stream | ||
if !*self.finished { | ||
*self.finished = true; | ||
self.content_builder.set_finished(true); | ||
} | ||
return Ok(None); | ||
} | ||
|
||
// Get byte and advance | ||
let byte = self.content_builder.stream_buffer.current_byte()?; | ||
self.content_builder.stream_buffer.advance()?; | ||
Ok(Some(byte)) | ||
} | ||
} | ||
|
||
/// Provider that can fill the buffer from a reader | ||
pub struct StreamContentBuilderWithReader<'a, 'b, R: crate::stream_parser::Reader> { | ||
pub content_builder: &'a mut StreamContentBuilder<'b>, | ||
reader: &'a mut R, | ||
finished: &'a mut bool, | ||
} | ||
|
||
impl<R: crate::stream_parser::Reader> StreamContentBuilderWithReader<'_, '_, R> {} | ||
|
||
impl<R: crate::stream_parser::Reader> crate::shared::ByteProvider | ||
for StreamContentBuilderWithReader<'_, '_, R> | ||
{ | ||
fn next_byte(&mut self) -> Result<Option<u8>, crate::ParseError> { | ||
// If buffer is empty, try to fill it first | ||
if self.content_builder.stream_buffer.is_empty() { | ||
if let Some(fill_slice) = self.content_builder.stream_buffer.get_fill_slice() { | ||
let bytes_read = self | ||
.reader | ||
.read(fill_slice) | ||
.map_err(|_| crate::ParseError::ReaderError)?; | ||
self.content_builder | ||
.stream_buffer | ||
.mark_filled(bytes_read) | ||
.map_err(crate::ParseError::from)?; | ||
} | ||
} | ||
|
||
// If still empty after fill attempt, we're at EOF | ||
if self.content_builder.stream_buffer.is_empty() { | ||
// Set finished flag when we reach end of stream | ||
if !*self.finished { | ||
*self.finished = true; | ||
self.content_builder.set_finished(true); | ||
} | ||
return Ok(None); | ||
} | ||
|
||
// Get byte and advance | ||
let byte = self.content_builder.stream_buffer.current_byte()?; | ||
self.content_builder.stream_buffer.advance()?; | ||
Ok(Some(byte)) | ||
} | ||
} | ||
|
||
impl<R: crate::stream_parser::Reader> crate::event_processor::ContentExtractor | ||
for StreamContentBuilderWithReader<'_, '_, R> | ||
{ | ||
fn parser_state_mut(&mut self) -> &mut crate::shared::State { | ||
self.content_builder.parser_state_mut() | ||
} | ||
|
||
fn current_position(&self) -> usize { | ||
self.content_builder.current_position() | ||
} | ||
|
||
fn begin_string_content(&mut self, pos: usize) { | ||
self.content_builder.begin_string_content(pos); | ||
} | ||
|
||
fn unicode_escape_collector_mut( | ||
&mut self, | ||
) -> &mut crate::escape_processor::UnicodeEscapeCollector { | ||
self.content_builder.unicode_escape_collector_mut() | ||
} | ||
|
||
fn extract_string_content( | ||
&mut self, | ||
start_pos: usize, | ||
) -> Result<crate::Event<'_, '_>, crate::ParseError> { | ||
self.content_builder.extract_string_content(start_pos) | ||
} | ||
|
||
fn extract_key_content( | ||
&mut self, | ||
start_pos: usize, | ||
) -> Result<crate::Event<'_, '_>, crate::ParseError> { | ||
self.content_builder.extract_key_content(start_pos) | ||
} | ||
|
||
fn extract_number_content( | ||
&mut self, | ||
start_pos: usize, | ||
from_container_end: bool, | ||
) -> Result<crate::Event<'_, '_>, crate::ParseError> { | ||
self.content_builder | ||
.extract_number_content(start_pos, from_container_end) | ||
} | ||
|
||
fn extract_number( | ||
&mut self, | ||
start_pos: usize, | ||
from_container_end: bool, | ||
finished: bool, | ||
) -> Result<crate::Event<'_, '_>, crate::ParseError> { | ||
self.content_builder | ||
.extract_number(start_pos, from_container_end, finished) | ||
} | ||
} | ||
|
||
impl<R: crate::stream_parser::Reader> crate::event_processor::EscapeHandler | ||
for StreamContentBuilderWithReader<'_, '_, R> | ||
{ | ||
fn parser_state(&self) -> &crate::shared::State { | ||
self.content_builder.parser_state() | ||
} | ||
|
||
fn process_unicode_escape_with_collector(&mut self) -> Result<(), crate::ParseError> { | ||
self.content_builder.process_unicode_escape_with_collector() | ||
} | ||
|
||
fn handle_simple_escape_char(&mut self, escape_char: u8) -> Result<(), crate::ParseError> { | ||
self.content_builder.handle_simple_escape_char(escape_char) | ||
} | ||
|
||
fn begin_escape_sequence(&mut self) -> Result<(), crate::ParseError> { | ||
self.content_builder.begin_escape_sequence() | ||
} | ||
|
||
fn begin_unicode_escape(&mut self) -> Result<(), crate::ParseError> { | ||
self.content_builder.begin_unicode_escape() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structs StreamContentBuilderWithFiller
and StreamContentBuilderWithReader
and their implementations appear to be unused, as StreamParser
now uses StreamParserProvider
to bridge the Reader
and the ParserCore
. These structs seem like leftovers from a previous stage of the refactoring and should be removed if they are indeed dead code to improve maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
picojson/src/parser_core.rs (1)
120-149
: Consider refactoring duplicate escape token patterns.The pattern matching for escape tokens is duplicated between the
OnBegin
andOnEnd
cases. While the current implementation is clear with good comments explaining the different timing requirements, you could reduce duplication by extracting the token pattern into a helper function or constant.+/// Helper to check if a token is a simple escape token +fn is_simple_escape_token(token: &EventToken) -> Option<&EventToken> { + match token { + EventToken::EscapeQuote + | EventToken::EscapeBackslash + | EventToken::EscapeSlash + | EventToken::EscapeBackspace + | EventToken::EscapeFormFeed + | EventToken::EscapeNewline + | EventToken::EscapeCarriageReturn + | EventToken::EscapeTab => Some(token), + _ => None, + } +} + ujson::Event::Begin(EventToken::EscapeSequence) => { process_begin_escape_sequence_event(provider)?; } _ if process_unicode_escape_events(&taken, provider)? => { // Unicode escape events handled by shared function } - ujson::Event::Begin( - escape_token @ (EventToken::EscapeQuote - | EventToken::EscapeBackslash - | EventToken::EscapeSlash - | EventToken::EscapeBackspace - | EventToken::EscapeFormFeed - | EventToken::EscapeNewline - | EventToken::EscapeCarriageReturn - | EventToken::EscapeTab), - ) if escape_timing == EscapeTiming::OnBegin => { + ujson::Event::Begin(token) if escape_timing == EscapeTiming::OnBegin => { + if let Some(escape_token) = is_simple_escape_token(token) { // SliceParser-specific: Handle simple escape sequences on Begin events // because CopyOnEscape requires starting unescaping immediately when // the escape token begins to maintain zero-copy optimization process_simple_escape_event(&escape_token, provider)?; + } } - ujson::Event::End( - escape_token @ (EventToken::EscapeQuote - | EventToken::EscapeBackslash - | EventToken::EscapeSlash - | EventToken::EscapeBackspace - | EventToken::EscapeFormFeed - | EventToken::EscapeNewline - | EventToken::EscapeCarriageReturn - | EventToken::EscapeTab), - ) if escape_timing == EscapeTiming::OnEnd => { + ujson::Event::End(token) if escape_timing == EscapeTiming::OnEnd => { + if let Some(escape_token) = is_simple_escape_token(token) { // StreamParser-specific: Handle simple escape sequences on End events // because StreamBuffer must wait until the token ends to accumulate // all bytes before processing the complete escape sequence process_simple_escape_event(&escape_token, provider)?; + } }picojson/src/stream_content_builder.rs (1)
84-119
: Refactor repeated position update pattern.The position update logic after compaction repeats the same pattern for each state variant. This could be simplified.
fn update_positions_after_compaction( &mut self, compaction_offset: usize, ) -> Result<(), crate::ParseError> { + // Helper to safely subtract compaction offset + let adjust_position = |pos: &mut usize| { + *pos = pos.saturating_sub(compaction_offset); + }; + // Update positions - since we compact from the token start position, // positions should not be discarded in normal operation match &mut self.parser_state { crate::shared::State::None => { // No position-based state to update } - crate::shared::State::Key(pos) => { - if *pos >= compaction_offset { - *pos = pos.checked_sub(compaction_offset).unwrap_or(0); - } else { - // This shouldn't happen since we compact from the token start - *pos = 0; - } - } - crate::shared::State::String(pos) => { - if *pos >= compaction_offset { - *pos = pos.checked_sub(compaction_offset).unwrap_or(0); - } else { - // This shouldn't happen since we compact from the token start - *pos = 0; - } - } - crate::shared::State::Number(pos) => { - if *pos >= compaction_offset { - *pos = pos.checked_sub(compaction_offset).unwrap_or(0); - } else { - // This shouldn't happen since we compact from the token start - *pos = 0; - } - } + crate::shared::State::Key(pos) + | crate::shared::State::String(pos) + | crate::shared::State::Number(pos) => { + adjust_position(pos); + } } Ok(()) }picojson/src/stream_parser.rs (1)
78-87
: Consider making the documentation more concise.The current comments are somewhat repetitive. A more concise version would improve readability:
-/// Implement the required traits for StreamParser to work with unified ParserCore -/// This provider handles the StreamParser-specific operations needed by the unified parser core -/// It bridges the gap between the generic ParserCore and the StreamParser's specific requirements -/// for streaming input and buffer management -/// The provider contains mutable references to the StreamParser's internal state -/// which allows the unified parser core to control the parsing process +/// Provider that bridges StreamParser with the unified ParserCore. +/// Handles streaming-specific operations like buffer management and reader access.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
picojson/Cargo.toml
(0 hunks)picojson/build.rs
(3 hunks)picojson/src/chunk_reader.rs
(1 hunks)picojson/src/event_processor.rs
(5 hunks)picojson/src/json_number.rs
(1 hunks)picojson/src/lib.rs
(1 hunks)picojson/src/number_parser.rs
(1 hunks)picojson/src/parse_error.rs
(1 hunks)picojson/src/parser_core.rs
(1 hunks)picojson/src/shared.rs
(0 hunks)picojson/src/slice_content_builder.rs
(1 hunks)picojson/src/slice_input_buffer.rs
(1 hunks)picojson/src/slice_parser.rs
(4 hunks)picojson/src/stream_buffer.rs
(1 hunks)picojson/src/stream_content_builder.rs
(1 hunks)picojson/src/stream_parser.rs
(4 hunks)picojson/tests/debug_root_numbers.rs
(0 hunks)picojson/tests/debug_stress_case.rs
(0 hunks)picojson/tests/stream_parser_stress_test.rs
(1 hunks)
💤 Files with no reviewable changes (4)
- picojson/src/shared.rs
- picojson/tests/debug_root_numbers.rs
- picojson/Cargo.toml
- picojson/tests/debug_stress_case.rs
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
picojson/src/lib.rs (4)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
picojson/src/json_number.rs (2)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
picojson/src/slice_input_buffer.rs (1)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/src/chunk_reader.rs (2)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
picojson/src/parse_error.rs (3)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
picojson/src/stream_buffer.rs (5)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
picojson/tests/stream_parser_stress_test.rs (6)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/src/number_parser.rs (5)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
picojson/build.rs (3)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/lib.rs:19-21
Timestamp: 2025-06-28T18:14:22.871Z
Learning: In Rust crate organization, functions can be imported into lib.rs via `use` statements not for direct usage within lib.rs itself, but to make them available to other modules within the same crate via the `crate::` path. This is a valid pattern and such imports should not be flagged as unused even if lib.rs doesn't directly call them.
picojson/src/event_processor.rs (7)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: tokenizer/src/bitstack/mod.rs:0-0
Timestamp: 2025-06-28T23:43:22.783Z
Learning: In the picojson-rs project, the BitStack trait was redesigned to return bool instead of Option<bool> for pop() and top() methods. Empty stacks return false rather than None, which simplifies the API and avoids Option handling.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:273-286
Timestamp: 2025-07-13T05:11:46.914Z
Learning: In picojson-rs SliceParser, is_empty() and is_past_end() serve different purposes: is_empty() returns true when pos >= data.len() (at document boundary, all input consumed), while is_past_end() returns true when pos > data.len() (gone beyond input). For number parsing delimiter logic, is_empty() is correct because it detects when parsing the last token at document end, whereas is_past_end() would incorrectly indicate not at document end for standalone numbers.
picojson/src/parser_core.rs (7)
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/slice_parser.rs:0-0
Timestamp: 2025-07-13T05:10:01.847Z
Learning: In picojson-rs event processing, the SliceParser uses an if/else pattern with process_simple_events() first, then process_begin_events() as fallback. Both branches use identical match statements for all EventResult variants for consistency and maintainability, even though process_begin_events() only returns Continue or None while process_simple_events() can return all variants.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#44
File: picojson/src/chunk_reader.rs:28-33
Timestamp: 2025-07-07T01:39:55.177Z
Learning: The PullParser trait in picojson-rs provides both next() and next_event() methods. The next() method is an iterator-like convenience method that returns Option<Result<Event, ParseError>>, returning None when EndDocument is reached. The next_event() method returns Result<Event, ParseError> directly. Both methods are valid and the choice depends on whether you want iterator-style usage (next) or direct result handling (next_event).
Learnt from: kaidokert
PR: kaidokert/picojson-rs#5
File: picojson/src/lib.rs:0-0
Timestamp: 2025-06-29T17:48:18.198Z
Learning: In the picojson-rs project, the `use tokenizer as ujson;` alias in lib.rs is a transitionary and fully internal private alias used during crate reorganization. Examples and external code no longer depend on this alias, making the private visibility appropriate.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#1
File: stax/src/flex_parser.rs:0-0
Timestamp: 2025-06-28T18:12:30.015Z
Learning: In the stax JSON parser codebase, EscapeSequence event handlers exist in flex_parser.rs not because they're needed by that parser variant, but to avoid catch-all patterns in match statements. The flex parser doesn't need to process EscapeSequence events, but the other parser variant (direct parser) does need them.
Learnt from: kaidokert
PR: kaidokert/picojson-rs#55
File: picojson/src/event_processor.rs:188-203
Timestamp: 2025-07-13T05:06:50.688Z
Learning: In picojson-rs event_processor module, the tokenizer callback intentionally uses a fixed-size array of 2 slots and silently drops events when full. This is deliberate design for embedded/constrained environments requiring: zero-allocation guarantee, panic-free operation, deterministic memory usage, and fixed memory footprint. The array size of 2 is sufficient for ujson's event generation patterns when processed in tight loops.
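A sketch of the drop-when-full idea with illustrative types; the real callback's event type and plumbing differ.

```rust
/// Illustrative fixed-capacity event queue. Two slots mirror the
/// design described above: zero allocation, no panics, and a
/// deterministic memory footprint.
#[derive(Clone, Copy, Debug)]
enum UjsonEvent {
    Begin,
    End,
}

struct EventQueue {
    slots: [Option<UjsonEvent>; 2],
    len: usize,
}

impl EventQueue {
    fn new() -> Self {
        EventQueue { slots: [None; 2], len: 0 }
    }

    /// No allocation, no panic: events beyond capacity are dropped.
    fn push(&mut self, ev: UjsonEvent) {
        if self.len < self.slots.len() {
            self.slots[self.len] = Some(ev);
            self.len += 1;
        }
        // else: silently dropped by design
    }

    fn drain(&mut self) -> impl Iterator<Item = UjsonEvent> + '_ {
        let n = self.len;
        self.len = 0;
        self.slots[..n].iter_mut().filter_map(|s| s.take())
    }
}

fn main() {
    let mut q = EventQueue::new();
    q.push(UjsonEvent::Begin);
    q.push(UjsonEvent::End);
    q.push(UjsonEvent::Begin); // dropped: queue is full
    assert_eq!(q.drain().count(), 2);
}
```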
🧬 Code Graph Analysis (2)
picojson/src/event_processor.rs (3)
  picojson/src/stream_content_builder.rs (4)
    begin_unicode_escape (456-458)
    begin_unicode_escape (493-498)
    extract_number (267-282)
    extract_number (426-434)
  picojson/src/stream_parser.rs (2)
    begin_unicode_escape (142-144)
    extract_number (183-191)
  picojson/src/slice_content_builder.rs (1)
    extract_number (106-129)
picojson/src/stream_content_builder.rs (7)
  picojson/src/stream_buffer.rs (3)
    new (73-80)
    current_position (236-238)
    current_position (1007-1009)
  picojson/src/number_parser.rs (4)
    new (93-99)
    current_position (22-22)
    current_position (110-112)
    parse_number_with_delimiter_logic (42-55)
  picojson/src/stream_parser.rs (17)
    new (46-48)
    new (91-97)
    new (258-260)
    parser_state_mut (148-150)
    current_position (152-154)
    begin_string_content (156-158)
    unicode_escape_collector_mut (160-164)
    extract_string_content (166-168)
    extract_key_content (170-172)
    extract_number_content (174-181)
    extract_number (183-191)
    next_byte (101-122)
    parser_state (126-128)
    process_unicode_escape_with_collector (130-132)
    handle_simple_escape_char (134-136)
    begin_escape_sequence (138-140)
    begin_unicode_escape (142-144)
  picojson/src/escape_processor.rs (2)
    new (200-206)
    process_unicode_escape_sequence (659-700)
  picojson/src/event_processor.rs (25)
    new (425-431)
    parser_state_mut (108-108)
    parser_state_mut (457-459)
    current_position (102-102)
    current_position (449-451)
    begin_string_content (105-105)
    begin_string_content (453-455)
    unicode_escape_collector_mut (112-114)
    unicode_escape_collector_mut (461-465)
    extract_string_content (117-120)
    extract_string_content (467-472)
    extract_key_content (123-126)
    extract_key_content (474-479)
    extract_number_content (129-133)
    extract_number_content (481-487)
    extract_number (141-146)
    extract_number (489-496)
    parser_state (14-14)
    parser_state (435-437)
    process_unicode_escape_with_collector (17-17)
    process_unicode_escape_with_collector (439-441)
    handle_simple_escape_char (20-20)
    handle_simple_escape_char (443-445)
    begin_escape_sequence (24-26)
    begin_unicode_escape (30-32)
  picojson/src/shared.rs (5)
    new (62-66)
    from_utf8 (181-183)
    string_content_bounds_from_content_start (118-124)
    end_position_excluding_delimiter (149-151)
    next_byte (178-178)
  picojson/src/parse_error.rs (4)
    from (39-46)
    from (50-61)
    from (65-67)
    from (71-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run AVR Test Suites
🔇 Additional comments (24)
picojson/build.rs (3)
82-82: LGTM: Improved lint suppression scope
The change from #[allow(clippy::non_snake_case)] to #![allow(non_snake_case)] is appropriate, as the inner attribute applies to the entire generated module, ensuring consistent test naming that matches the original JSON test suite conventions.
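A sketch of the difference between the two attribute forms; the module and function names here are made up:

```rust
// Outer form: attached at the declaration site, by whoever
// declares the module.
#[allow(non_snake_case)]
mod generated_one {
    pub fn y_object_Test() {}
}

// Inner form: written inside the module itself, so a generated
// file can carry its own lint suppression without the including
// code needing to know about it.
mod generated_two {
    #![allow(non_snake_case)]
    pub fn y_array_Test() {}
}

fn main() {
    generated_one::y_object_Test();
    generated_two::y_array_Test();
}
```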
133-133: LGTM: Appropriate dead code suppression
The #[allow(dead_code)] attributes are correctly applied to utility functions that are part of the build script infrastructure but may not be actively used in all configurations.
Also applies to: 168-168, 174-174, 180-180
161-161: LGTM: Modern string formatting syntax
The update to format!("{result}_{count}") follows Rust's inline format-args style introduced in 1.58+, improving code readability.
picojson/src/number_parser.rs (1)
64-64: LGTM: Minor documentation formatting improvement
The added blank line improves the readability of the documentation comment without affecting functionality.
picojson/src/slice_input_buffer.rs (1)
57-61: LGTM: Useful API addition
The new data_len() method provides a clean way to access the underlying data length for bounds checking, which supports the refactored parsing architecture.
picojson/src/json_number.rs (1)
146-146: LGTM: Modern string formatting syntax
The updates to named arguments in format strings ({val} instead of "{}", val) follow modern Rust formatting conventions and improve code readability.
Also applies to: 148-148
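For illustration, the two equivalent spellings:

```rust
fn main() {
    let val = 3.25;
    let count = 2;

    // Positional style:
    let old = format!("{}_{}", val, count);
    // Inline named-argument style (Rust 1.58+): the variable is
    // captured directly by name inside the braces.
    let new = format!("{val}_{count}");

    assert_eq!(old, new);
}
```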
picojson/src/chunk_reader.rs (1)
122-122: LGTM: Simplified lifetime annotation
The change from the explicit lifetime parameter 'a to the anonymous lifetime '_ is appropriate here, since the lifetime doesn't need to be explicitly referenced within the implementation.
picojson/src/lib.rs (1)
66-67: LGTM: Clean module additions for unified parser architecture
The new module declarations (parser_core, stream_content_builder, slice_content_builder) are well-organized and align with the PR objective of consolidating to a single parser. The placement within the existing module hierarchy is appropriate.
Also applies to: 70-71, 74-75
picojson/src/parse_error.rs (1)
79-81: LGTM: Modern formatting syntax improvements
The updates to Rust's more concise formatting syntax ({e} instead of "{}", e) are clean modernizations that improve readability without changing functionality.
picojson/src/stream_buffer.rs (2)
149-156: Excellent safety improvement: Replace wrapping addition with checked addition
The change from wrapping_add to checked_add with proper error handling significantly improves safety by preventing silent integer overflow that could lead to memory safety issues. Returning an InvalidSliceBounds error on overflow is the correct approach for bounds validation.
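A sketch of the pattern, with an assumed error enum standing in for the crate's actual error type:

```rust
// Illustrative error type and bounds check; names are assumptions,
// not the crate's actual API.
#[derive(Debug)]
enum Error {
    InvalidSliceBounds,
}

fn slice_end(start: usize, len: usize) -> Result<usize, Error> {
    // wrapping_add would silently wrap on overflow and could
    // produce a small, "valid-looking" bound; checked_add turns
    // the overflow into an explicit error instead.
    start.checked_add(len).ok_or(Error::InvalidSliceBounds)
}

fn main() {
    assert!(matches!(slice_end(10, 20), Ok(30)));
    assert!(matches!(
        slice_end(usize::MAX, 1),
        Err(Error::InvalidSliceBounds)
    ));
}
```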
164-164: LGTM: Intentional unused variable for debugging
The _old_tokenize_pos variable properly captures the previous value and is appropriately marked as unused. This appears to be intentional for debugging or future use.
picojson/tests/stream_parser_stress_test.rs (1)
217-262: LGTM: Comprehensive Unicode escape test for streaming parser
The new debug_unicode_issue test function provides excellent coverage for Unicode escape handling in the streaming parser. The test setup with chunked reading (5-byte chunks, 30-byte buffer) and detailed debug output for character analysis is well-designed for validating the unified parser architecture's escape sequence handling.
picojson/src/event_processor.rs (5)
28-32: LGTM: Well-designed trait method with sensible default
The new begin_unicode_escape method in the EscapeHandler trait is well-designed, with clear documentation and a sensible default no-op implementation. This allows parsers that don't need special Unicode escape handling to use the default while providing flexibility for those that do.
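A sketch of the default-method pattern under simplified types; the real trait and its implementors differ:

```rust
// Simplified trait shape for illustration only.
trait EscapeHandler {
    /// Called when a \u escape starts. Default is a no-op, so
    /// only parsers that need extra bookkeeping override it.
    fn begin_unicode_escape(&mut self) {}
}

struct SliceSide;
impl EscapeHandler for SliceSide {} // default no-op is enough

struct StreamSide {
    in_unicode_escape: bool,
}

impl EscapeHandler for StreamSide {
    fn begin_unicode_escape(&mut self) {
        self.in_unicode_escape = true; // stream side tracks state
    }
}

fn main() {
    let mut a = SliceSide;
    let mut b = StreamSide { in_unicode_escape: false };
    a.begin_unicode_escape();
    b.begin_unicode_escape();
    assert!(b.in_unicode_escape);
}
```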
135-146: LGTM: Comprehensive number extraction method
The new extract_number method in the ContentExtractor trait is well-documented with clear parameter descriptions. The method signature properly handles the differences between slice and stream parsing (the finished parameter for StreamParser-specific logic) while maintaining a unified interface.
205-207: LGTM: Simplified lifetime parameters improve readability
The simplification of lifetime parameters in create_tokenizer_callback using anonymous lifetimes ('_) improves readability while maintaining the same functionality. This aligns with modern Rust style guidelines.
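For illustration, the two equivalent spellings of a lifetime that is never referenced by name:

```rust
struct Wrapper<'a> {
    data: &'a [u8],
}

// Naming the lifetime (impl<'a> Wrapper<'a>) is only needed when
// it is referenced elsewhere in the impl; otherwise '_ says
// "some lifetime" without introducing a name.
impl Wrapper<'_> {
    fn len(&self) -> usize {
        self.data.len()
    }
}

fn main() {
    let bytes = [1u8, 2, 3];
    let w = Wrapper { data: &bytes };
    assert_eq!(w.len(), 3);
}
```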
280-280: LGTM: Properly guarded method call
The call to begin_unicode_escape() is properly guarded by the state check, ensuring it only executes when inside a string or key context. This maintains the same pattern as other escape handling code.
489-496: LGTM: Appropriate mock implementation
The stub implementation of extract_number in the mock is appropriate for testing. Using unimplemented!() with a clear message is the correct approach for mock methods that aren't exercised in tests.
picojson/src/slice_content_builder.rs (1)
85-129: Excellent defensive programming in number extraction
The bounds clamping logic using core::cmp::min() prevents slice bounds errors when parsing standalone numbers at document end. The explicit acknowledgment of the unused finished parameter with a clear explanation is also good practice.
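A sketch of the clamping idea described above, with assumed slice-and-offset shapes; the real extract_number works on the parser's own buffer types:

```rust
// Illustrative bounds clamp; names and layout are assumptions.
fn number_slice(data: &[u8], start: usize, end: usize) -> &[u8] {
    // For a standalone number at document end, `end` can point one
    // past the data (there is no trailing delimiter). Clamping with
    // min() keeps the range inside the slice instead of panicking.
    let end = core::cmp::min(end, data.len());
    let start = core::cmp::min(start, end);
    &data[start..end]
}

fn main() {
    let doc = b"42";
    // end == 3 would overshoot: the clamp reduces it to data.len().
    assert_eq!(number_slice(doc, 0, 3), b"42");
}
```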
picojson/src/stream_parser.rs (6)
3-38: Excellent simplification of StreamParser structure
The consolidation from six internal fields to just two (parser_core and provider) significantly reduces state management complexity and aligns well with the unified parser architecture.
193-208: Good handling of EOF for number parsing
The override of validate_and_extract_number correctly uses the finished state to handle number parsing at end-of-document, which is essential for streaming parsers.
210-226: Clean delegation to unified parser core
The implementation properly delegates to parser_core.next_event_impl_unified_with_accumulator with appropriate timing (EscapeTiming::OnEnd) for streaming parsers. The byte accumulation closure correctly delegates to the content builder.
232-245: Correct implementation of PullParser trait
The implementation properly checks the finished flag and applies the unescaped reset before delegating to next_event_impl, maintaining consistency with the SliceParser pattern.
1203-1214: Good conversion to assertion-based testing
The change from printing to assertions improves test automation and makes failures more explicit.
289-1282: Comprehensive test coverage for the refactored StreamParser
The test suite thoroughly covers:
- Basic JSON structures (objects, arrays, primitives)
- Escape sequence processing including Unicode
- Number parsing with various configurations
- Edge cases like buffer boundaries and EOF handling
- Error recovery scenarios
This extensive coverage provides confidence in the refactored implementation.
Ok(())
// Get byte and advance
let byte = self.content_builder.stream_buffer().current_byte()?;
let _current_pos = self.content_builder.stream_buffer().current_position();
Remove unused variable.
The _current_pos variable is created but never used.
- let _current_pos = self.content_builder.stream_buffer().current_position();
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
let _current_pos = self.content_builder.stream_buffer().current_position();
🤖 Prompt for AI Agents
In picojson/src/stream_parser.rs at line 119, the variable _current_pos is
declared but never used. Remove the declaration of _current_pos entirely to
clean up the code and avoid unused variable warnings.
Consolidate to single parser
Summary by Sourcery
Unify StreamParser and SliceParser event loops by extracting common logic into a new ParserCore, and refactor parsing responsibilities into dedicated content builder and provider modules to eliminate duplication, simplify escape handling, and streamline byte provisioning across both parsers.