Add JSON::ResumableParser#partial_value? and #empty? #1048

Merged
byroot merged 1 commit into ruby:master from Watson1978:resumable-parser-incomplete-p
Jul 3, 2026

Conversation

@Watson1978 Watson1978 commented Jul 3, 2026

Contributor

Fixes #1047

partial_value?

Returns whether a document is currently under construction (unclosed container, key awaiting its value, etc.). It answers the same question as !partial_value.nil?, but as a cheap predicate on the parser's internal value stack, without materializing the partially parsed Ruby objects.
A fully parsed document whose value hasn't been retrieved yet is not under construction; that state is covered by value?.

Note that a container whose first key or element hasn't been parsed yet (e.g. a stream ending right after {) has nothing registered on the parser's stacks, so partial_value? is false there, consistent with partial_value returning nil.
The truncation is still observable through the buffer: eos? is false and rest isn't empty. The test suite asserts partial_value? == !partial_value.nil? across all cases.

empty?

Strict semantics: true only when the buffer is fully consumed, no document is under construction, and no parsed value awaits retrieval.

def empty?
  eos? && !partial_value? && !value?
end

This is the single call answering the truncation question from #1047:

| Input (after draining with parse/value) | empty? | Note |
|---|---|---|
| '' | true | nothing fed |
| '{"a":1}' | true | |
| '{"a":1}{"b":2}' | true | |
| '{"a":1} ' | true | trailing whitespace |
| '{"a":1}{"b":2' | false | inside a number token |
| '{"a":1}{"b":' | false | right after a colon (token boundary) |
| '{"a":1}{' | false | right after an object open |
| '{"a":1,' | false | right after a comma (token boundary) |
| '"abc' | false | inside a string token |
| '[1,2' | false | unclosed array |
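
To make the composition easier to reason about, here is a minimal sketch that models the three underlying predicates as plain flags. ParserState is a hypothetical stand-in, not JSON::ResumableParser, and the flag values attached to each input shape are my reading of the description above rather than output captured from the real parser.

```ruby
# Hypothetical stand-in for the parser's three observable states.
# This is NOT JSON::ResumableParser, only a model of its predicates.
ParserState = Struct.new(:eos, :partial, :pending, keyword_init: true) do
  def eos?
    eos
  end

  def partial_value?
    partial
  end

  def value?
    pending
  end

  # Same composition as the Ruby definition in this PR.
  def empty?
    eos? && !partial_value? && !value?
  end
end

# Flag values below are assumed from the description, not measured.
drained        = ParserState.new(eos: true,  partial: false, pending: false)
mid_token      = ParserState.new(eos: false, partial: false, pending: false) # '"abc'
token_boundary = ParserState.new(eos: true,  partial: true,  pending: false) # '{"a":1,'
unretrieved    = ParserState.new(eos: true,  partial: false, pending: true)  # parsed, value not called

states = { drained: drained, mid_token: mid_token,
           token_boundary: token_boundary, unretrieved: unretrieved }
states.each { |name, state| puts "#{name}: empty? => #{state.empty?}" }
```

Any single flag in the "wrong" position is enough to make empty? false, which is why one call can replace the combined rest/partial_value check.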

As raised in review, empty? is meaningful mid-drain too: after feeding '{"a":1}{"b":2}' and a single successful parse, empty? is false (a document remains in the buffer, and the parsed value hasn't been retrieved);
it only turns true once both documents are parsed and retrieved. Likewise a fully parsed but unretrieved value keeps empty? false.
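
The mid-drain behaviour can be walked through with a small stub. ScriptedParser below is hypothetical: it replays a fixed list of already-decoded documents instead of parsing bytes, and its queue merely stands in for "bytes still sitting in the buffer". It only illustrates when empty? flips, under the strict definition above.

```ruby
# Hypothetical stub that replays a fixed script; it does no real parsing.
class ScriptedParser
  def initialize(documents)
    @queue = documents.dup # stands in for unparsed documents in the buffer
    @current = nil
    @has_value = false
  end

  # Mirrors the parse contract described in this PR: true when a document
  # was completed, false when nothing more can be parsed.
  def parse
    return false if @queue.empty?

    @current = @queue.shift
    @has_value = true
    true
  end

  def value?
    @has_value
  end

  # Retrieving the value clears the "awaiting retrieval" state.
  def value
    @has_value = false
    @current
  end

  # Strict: nothing left to parse and no unretrieved value.
  def empty?
    @queue.empty? && !value?
  end
end

parser = ScriptedParser.new([{ "a" => 1 }, { "b" => 2 }])

parser.parse
after_first_parse = parser.empty?  # a document remains and a value is pending
first = parser.value
after_first_value = parser.empty?  # a document still remains

parser.parse
second = parser.value
after_drain = parser.empty?        # both documents parsed and retrieved
```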

Comment thread ext/json/ext/parser/parser.c Outdated
/*
* call-seq: incomplete? -> true or false
*
* Returns whether the stream ends in the middle of a document.

Member

I think your implementation does a bit more than that. Because it also returns true if there is more left to parse in the buffer:

parser << '{"a":1}{"b":2}'
parser.parse # => true
parser.incomplete? # => true
parser.parse # => true
parser.incomplete? # => false

Contributor Author

Good catch!
I think this is inherent to a predicate-style API rather than an implementation detail. At the point of your example, the buffer holds '{"b":2}' as raw undecoded bytes; whether that remainder is a complete document or a truncated one cannot be known without parsing it, which is exactly what parse does. So a side-effect-free predicate can only give a meaningful truncation answer once parse has returned false (i.e., after the usual while parser.parse drain loop). At that point, no complete document can remain buffered, and the predicate coincides exactly with "the stream was cut mid-document".

I see two ways forward.

  1. Keep the current behavior, define the semantics explicitly as "there are unconsumed bytes or a partially built document", document that truncation detection requires draining first, and pin your example in a test as expected behavior. I'm also open to renaming if incomplete? reads as misleading mid-drain.
  2. Replace the predicate with an explicit end-of-stream signal, e.g. parser.finish, which raises ParserError if a document is in progress (same shape as yajl's complete_parse or Zlib::Inflate#finish). This removes the timing ambiguity structurally, at the cost of forcing exception handling on callers that only want to log a warning.

I have a mild preference for (1) since our use case (and I suspect most) calls this after the drain loop anyway, but happy to go either way.
Which shape would you prefer?
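
For illustration only, callers who do want the raising shape from option 2 could layer it on top of a predicate themselves. The sketch below assumes the empty? predicate this PR ended up with; finish! and FakeParser are hypothetical names, not part of the json gem.

```ruby
require "json"

# Hypothetical caller-side helper; finish! is not provided by the json gem.
# Raises if anything is still pending inside the parser after draining.
def finish!(parser)
  return if parser.empty?

  raise JSON::ParserError, "stream ended in the middle of a document"
end

# Hypothetical stub exposing only the predicate the helper needs.
FakeParser = Struct.new(:drained) do
  def empty?
    drained
  end
end

finish!(FakeParser.new(true)) # nothing pending, returns quietly

begin
  finish!(FakeParser.new(false))
rescue JSON::ParserError => e
  warn "truncated stream: #{e.message}"
end
```

Callers who only want to log a warning can keep calling the predicate directly and skip the exception handling.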

Member

I think it's mostly a naming concern. I'm trying to think of a better alternative.

We can probably also define partial_value? as an efficient way to know if the parser stopped midway (because using #partial_value requires building the object graph, which is costly).

That would be more flexible as you could check something like:

def incomplete?
  parser.partial_value? || !parser.eof?
end

But I think ultimately the semantic most people would want is something like:

def complete?
  parser.eof? && !parser.partial_value?
end

Which is a less ambiguous way to express:

  • There is nothing left to parse.
  • There are no bytes unaccounted for.

byroot commented Jul 3, 2026

Member

What about empty? as the name? I think it's a bit less ambiguous about its meaning, no?

@Watson1978

Contributor Author

empty? works for me — read as a pure state predicate ("nothing pending inside the parser") it's arguably cleaner than complete?, which smuggles in a notion of "the stream ended".

One edge it forces us to decide explicitly: a parsed-but-unretrieved value (parse returned true, value not yet called). For complete? semantics I'd have said true there; but a parser holding an unretrieved value doesn't read as "empty", so if we go with empty? I'd define it strictly: no unconsumed bytes, no partial document, and no unretrieved value. For the EOF-after-drain call site this makes no difference, and the strict version keeps the name honest. Does that match your intent?

byroot commented Jul 3, 2026

Member

> Does that match your intent?

I think so yes.

Also, ideally your PR only defines partial_value? in C, and empty? can be defined in Ruby, making it easier for users to read.

There was no single API answering "does the stream end in the middle of
a document?" once all parseable fed bytes have been consumed. Callers
had to combine two complementary APIs:

  !parser.rest.empty? || !parser.partial_value.nil?

`rest` only reflects unconsumed tokenizer bytes, so it is empty when the
stream is truncated exactly on a token boundary (right after a ':' or
','), while `partial_value` is nil when truncation happens mid-token
before any container is registered. Neither alone covers all shapes,
and `partial_value` materializes the partially built Ruby objects just
to test for nil.

`partial_value?` answers the same question as `!partial_value.nil?` by
looking at the parser's internal value stack directly, without building
the partial Ruby object graph.

`empty?` is strict: true only when the buffer is fully consumed, no
document is under construction and no parsed value awaits retrieval
with `value`. It is defined in Ruby as the composition of the three
underlying predicates so its definition doubles as documentation:

  def empty?
    eos? && !partial_value? && !value?
  end

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Watson1978 Watson1978 force-pushed the resumable-parser-incomplete-p branch from b20f590 to 59936d4 Compare July 3, 2026 08:27
@Watson1978 Watson1978 changed the title Add JSON::ResumableParser#incomplete? Add JSON::ResumableParser#partial_value? and #empty? Jul 3, 2026
@byroot byroot merged commit 0864e83 into ruby:master Jul 3, 2026
42 checks passed
@Watson1978

Contributor Author

Thanks

@Watson1978 Watson1978 deleted the resumable-parser-incomplete-p branch July 3, 2026 09:25

byroot commented Jul 3, 2026

Member

Welcome, thanks for the feedback on the API and the patch.

Development

Successfully merging this pull request may close these issues.

[Question] ResumableParser: recommended way to detect a truncated stream at EOF?

2 participants