Add JSON::ResumableParser#partial_value? and #empty?#1048
Conversation
| /* | ||
| * call-seq: incomplete? -> true or false | ||
| * | ||
| * Returns whether the stream ends in the middle of a document. |
There was a problem hiding this comment.
I think your implementation does a bit more than that. Because it also returns true if there is more left to parse in the buffer:
parser << '{"a":1}{"b":2}'
parser.parse # => true
parser.incomplete? # => true
parser.parse # => true
parser.incomplete? # => falseThere was a problem hiding this comment.
Good catch!
I think this is inherent to a predicate-style API rather than an implementation detail. At the point of your example, the buffer holds '{"b":2}' as raw undecoded bytes; whether that remainder is a complete document or a truncated one cannot be known without parsing it, which is exactly what parse does. So a side-effect-free predicate can only give a meaningful truncation answer once parse has returned false (i.e., after the usual while parser.parse drain loop). At that point, no complete document can remain buffered, and the predicate coincides exactly with "the stream was cut mid-document".
I see two ways forward.
- Keep the current behavior, define the semantics explicitly as "there are unconsumed bytes or a partially built document", document that truncation detection requires draining first, and pin your example in a test as expected behavior. I'm also open to renaming if
incomplete?reads misleading mid-drain. - Replace the predicate with an explicit end-of-stream signal, e.g.
parser.finish, which raisesParserErrorif a document is in progress (same shape as yajl'scomplete_parseorZlib::Inflate#finish). This removes the timing ambiguity structurally, at the cost of forcing exception handling on callers that only want to log a warning.
I have a mild preference for (1) since our use case (and I suspect most) calls this after the drain loop anyway, but happy to go either way.
Which shape would you prefer?
There was a problem hiding this comment.
I think it's mostly a naming concern. I'm trying to think of a better alternative.
We can probably also define partial_value? as an efficient way to know if the parser stopped midway. (because using #partial_value requires to build the object graph, which is costly).
That would be more flexible as you could check something like:
def incomplete?
parser.partial_value? || !parser.eof?
endBut I think ultimately the semantic most people would want is something like:
def complete?
parser.eof? && !parser.partial_value?
endWhich is a less ambiguous way to express:
- There is nothing less to parse
- There is no bytes unaccounted for.
|
What about |
|
One edge it forces us to decide explicitly: a parsed-but-unretrieved value ( |
I think so yes. Also ideally your PR only defines |
There was no single API answering "does the stream end in the middle of
a document?" once all parseable fed bytes have been consumed. Callers
had to combine two complementary APIs:
!parser.rest.empty? || !parser.partial_value.nil?
`rest` only reflects unconsumed tokenizer bytes, so it is empty when the
stream is truncated exactly on a token boundary (right after a ':' or
','), while `partial_value` is nil when truncation happens mid-token
before any container is registered. Neither alone covers all shapes,
and `partial_value` materializes the partially built Ruby objects just
to test for nil.
`partial_value?` answers the same question as `!partial_value.nil?` by
looking at the parser's internal value stack directly, without building
the partial Ruby object graph.
`empty?` is strict: true only when the buffer is fully consumed, no
document is under construction and no parsed value awaits retrieval
with `value`. It is defined in Ruby as the composition of the three
underlying predicates so its definition doubles as documentation:
def empty?
eos? && !partial_value? && !value?
end
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
b20f590 to
59936d4
Compare
|
Thanks |
|
Welcome, thanks for the feedback on the API and the patch. |
Fixes #1047
partial_value?Returns whether a document is currently under construction (unclosed container, key awaiting its value, etc.). It answers the same question as
!partial_value.nil?, but as a cheap predicate on the parser's internal value stack, without materializing the partially parsed Ruby objects.A fully parsed document whose value hasn't been retrieved yet is not under construction, that state is covered by
value?.Note that a container whose first key or element hasn't been parsed yet (e.g. a stream ending right after
{) has nothing registered on the parser's stacks sopartial_value?isfalsethere, consistently withpartial_valuereturningnil.The truncation is still observable through the buffer:
eos?isfalseandrestisn't empty. The test suite assertspartial_value? == !partial_value.nil?across all cases.empty?Strict semantics:
trueonly when the buffer is fully consumed, no document is under construction, and no parsed value awaits retrieval.This is the single call answering the truncation question from #1047:
parse/value)empty?''true'{"a":1}'true'{"a":1}{"b":2}'true'{"a":1} 'true'{"a":1}{"b":2'false'{"a":1}{"b":'false'{"a":1}{'false'{"a":1,'false'"abc'false'[1,2'falseAs raised in review,
empty?is meaningful mid-drain too: after feeding'{"a":1}{"b":2}'and a single successfulparse,empty?isfalse(a document remains in the buffer, and the parsed value hasn't been retrieved);it only turns
trueonce both documents are parsed and retrieved. Likewise a fully parsed but unretrieved value keepsempty?false.