ijson.items(file, prefix) waits for EOF #104
Comments
Thanks @green-green-avk for the interesting report. Do you have an example file that can be used for testing this? In particular, I'd like to understand/confirm that you really don't get any values yielded until you hit the EOF. How are you 100% certain of that? The example code looks correct of course, so I suppose your report is based on observations of some kind. If
Also: does this happen with other backends? I just noticed this was reported for the python backend, so maybe (hopefully!) the issue, if there is one, is specific to this backend.
Uh. Huh. It's about streams, not files at all. Please consider this one-liner repro case: python3 -c "$(echo -en 'import time\nwhile(True): print("{\"test\":1}", flush=True);time.sleep(1)')" | python3 -c "$(echo -en 'import sys\nimport ijson\nfor v in ijson.items(sys.stdin.buffer, "", multiple_values=True): print(v)')"
Ouch, it seems, I missed It works: python3 -c "$(echo -en 'import time\nwhile(True): print("{\"test\":1}", flush=True);time.sleep(1)')" | python3 -c "$(echo -en 'import sys\nimport ijson\nfor v in ijson.items(sys.stdin.buffer, "", multiple_values=True, buf_size=1): print(v)')" However, I think, the exact semantics of the
@green-green-avk thanks for that reproducer and the clarification on it being a continuous, potentially infinite stream of data (you did mention it in the original description, and I was a bit puzzled about what you exactly meant, but didn't ask further). In any case, it being a "stream" or a file on disk is irrelevant: ijson is presented with a file object regardless, and the Am I understanding correctly that there is actually no issue, and that the actual problem was that your stream was generating data in chunks smaller than the default To clarify: my understanding is that ijson was blocking when reading data from stdin via On the matter of the documentation of
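To illustrate the buffering point in the comment above, here is a minimal, stdlib-only sketch (it does not use ijson). It assumes the consumer issues one buffered `read(size)` call per chunk; `ChunkedRaw` and `read_once` are hypothetical names made up for this illustration. A single `read(size)` on an `io.BufferedReader` keeps asking the underlying raw stream for more data until it has `size` bytes or reaches EOF:

```python
import io


class ChunkedRaw(io.RawIOBase):
    """Hypothetical raw stream that hands out one small chunk per raw read."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = list(chunks)
        self.raw_reads = 0  # how many times the buffer layer asked for data

    def readable(self):
        return True

    def readinto(self, b):
        self.raw_reads += 1
        if not self._chunks:
            return 0  # EOF
        chunk = self._chunks.pop(0)
        n = min(len(b), len(chunk))
        b[:n] = chunk[:n]
        if n < len(chunk):
            # Keep whatever did not fit for the next raw read.
            self._chunks.insert(0, chunk[n:])
        return n


def read_once(chunks, size):
    """Issue a single buffered read(size); return the data and the raw read count."""
    raw = ChunkedRaw(chunks)
    data = io.BufferedReader(raw).read(size)
    return data, raw.raw_reads
```

With three 8-byte chunks and `size=64`, the one `read()` call drains every chunk before returning. On a live pipe, where the next chunk only arrives a second later, that same loop is what makes the caller wait.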
Aha! In this case, we just need to add an option to use read1() instead. This way we could avoid the performance impact of By data streams, I mean cases such as a network socket where small portions of data arrive every few seconds. ijson must be able to yield parsed objects as soon as they are available, without waiting for the buffer to be full.
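A rough sketch of the read1() idea from this comment, using only the standard library. `Read1Adapter` and `demo` are hypothetical names, not part of ijson, and the sketch assumes the consumer only ever calls `read(size)` on the object it is handed. `BufferedReader.read1()` performs at most one raw read, so it returns whatever is currently available instead of waiting for `size` bytes:

```python
import os


class Read1Adapter:
    """Hypothetical wrapper exposing read() but delegating to read1()."""

    def __init__(self, buffered):
        self._buffered = buffered

    def read(self, size=-1):
        # At most one raw read: returns as soon as some data is available.
        return self._buffered.read1(size)


def demo():
    """Write one small document to a pipe and read it back with a large size."""
    r_fd, w_fd = os.pipe()
    reader = os.fdopen(r_fd, "rb")  # an io.BufferedReader over the pipe
    try:
        os.write(w_fd, b'{"test":1}\n')
        # The write end is still open, so no EOF has been seen yet.
        return Read1Adapter(reader).read(64 * 1024)
    finally:
        os.close(w_fd)
        reader.close()
```

Here `demo()` gets the 11 bytes back even though 64 KiB were requested and the writer has not closed the pipe. Whether such a wrapper is acceptable to every ijson backend is an assumption that would need checking.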
Or try with I'd try to exhaust your options before jumping in and trying to implement a new feature that seems to cover only a very minor corner-case, but would take some effort to get right (and thus I'd be a bit unwilling to implement myself, although I'd be happy to review PRs). If you really want to go down that route, please let's track that on a separate issue to keep things separate. Also, could I ask again if you could address the questions I posed in my previous comment? I want to confirm there's no actual issue in ijson ATM, and that this is mostly about managing expectations when using stdin.
Haven't heard back in a week for further feedback, and the issue wasn't really a problem with ijson itself, so I'm closing this one.
Describe the bug
ijson.items(file, prefix) waits for EOF before yielding any values.
How to reproduce
The input (file object f) represents an infinite stream of JSON objects: {"a":1} {"b":2} ...
Expected behavior
Yield a value as soon as an object is parsed. Don't wait for EOF.
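As a point of comparison, the expected behavior can be sketched with the standard library alone. This is not ijson and not how ijson parses; `iter_json_values` is a hypothetical helper that assumes text chunks and top-level values that are objects or arrays (a bare number split across two chunks would be misread). It yields each value as soon as enough input has arrived to decode it:

```python
import json


def iter_json_values(chunks):
    """Yield concatenated JSON values from an iterable of text chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # value is incomplete: wait for the next chunk
            yield value
            buf = buf[end:]


# Values come out as soon as their closing brace has been seen,
# regardless of how the input happens to be split into chunks.
values = list(iter_json_values(['{"a"', ':1} {"b":2}', ' {"c":', '3}']))
```

Because the function is a generator driven by the chunk iterator, it never needs to see the end of the input before producing the first value.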
Execution information:
Additional context
It looks similar to #72, but I see no reason to use any async here, as this is not an asynchronous situation at all: it is just the simplest stream pull parser use case.