Full support for byte stream generator #92

Closed
prrvchr opened this issue Apr 13, 2023 · 9 comments

prrvchr commented Apr 13, 2023

Hi all,

I am trying to use your API from Python under the LibreOffice / OpenOffice UNO API. I already use the feed(byte sequence) method of Python's xml.etree.ElementTree.XMLPullParser, and I would like to do the same for JSON data.

However, I would like to feed your API only with a stream of bytes coming from the iter_content(chunk_size, decode_unicode) method of the Python Requests API.

Going through the documentation, I read that byte streams are converted to Python file objects...

Maybe there is an optimized method for this type of input?

Thanks

prrvchr changed the title from "Full support for byte stream" to "Full support for byte stream generator" Apr 13, 2023

rtobar commented Apr 13, 2023

@prrvchr thanks for opening this.

Could you have a look at #44? It sounds like that issue asks for exactly the same thing you'd like to see supported here. As mentioned there, there is currently no support for using generators as sources of data, only file-like objects. Which is unfortunate, because internally we do turn the file-like objects into generators... You'll also see in that issue why it's not immediately trivial to support this, but it should be possible.

The only way to work around this limitation at the moment is to write a file-like class that internally advances your generator on every call to read, then pass an object of that class to ijson. I'm sure I've seen/used something like that before; I'll post an update here if I find something relevant.

Please confirm that these two issues are one and the same, and I'll close this as a duplicate of #44.

prrvchr commented Apr 13, 2023

Hi rtobar,

Thanks for your API...
It does appear to be the same enhancement request.

I've looked at the code, and it looks like if one could bypass the file_source(f, buf_size=64*1024) function and use the generator directly, that would do the trick...

rtobar commented Apr 13, 2023

@prrvchr thanks for confirming that, I'm closing this issue as a duplicate of #44 then, we can continue further discussions there.

I'll say this before closing though: like I mentioned before, it's not immediately trivial to make the change. Yes, what you found is the generator I mentioned earlier, the one used to wrap file-like objects. That's not the issue. The problem is that when the main API functions inspect their input argument to decide which mode to work in (i.e., is this a file-like object, an async file-like object, etc.?), generators are already treated in a particular way to support event interception. See for example the

elif is_iterable(source):

check and the code around it. It's this breakage in behaviour, together with the peculiarities of the yajl2_c backend, which is implemented in C and needs to be taken care of separately (it doesn't use that file-object-to-generator function, for instance), that makes it more difficult than you'd anticipate to add the support you'd like to see.

rtobar closed this as not planned Apr 13, 2023

rtobar commented Apr 13, 2023

See #58 (comment) for an (untested) example of a simple file-like wrapper around a generator.
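
For readers who cannot follow the link, here is a minimal sketch of what such a wrapper might look like. It is not the code from the linked comment: the class name IterableReader is made up for illustration, and it assumes the wrapped iterable yields bytes chunks (as Requests' iter_content() does when decode_unicode is left off).

```python
class IterableReader(object):
    """Hypothetical file-like wrapper around an iterable of bytes chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def read(self, size=-1):
        if size is None:
            size = -1
        # Advance the underlying iterator until the request can be
        # satisfied or the source is exhausted.
        while size < 0 or len(self._buffer) < size:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        if size < 0:
            # Unbounded read: hand out everything that was collected.
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


reader = IterableReader([b'{"a"', b': [1, ', b'2]}'])
head = reader.read(3)   # served from the first chunk only
rest = reader.read()    # drains the remaining chunks
```

An object of this class could then be passed wherever ijson expects a file-like object, for example ijson.items(IterableReader(chunks), "item"), where chunks and the "item" prefix stand in for your own data.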

prrvchr commented Apr 13, 2023

I got lost: it's not generator support that I'm looking for, just the possibility of parsing the content of an HTTP JSON response in several chunks.

I own the generator: it's the Requests iter_content() function, which I get through the UNO API as a com.sun.star.container.XEnumeration interface.

But in fact the API does not seem to be designed to be used this way, since I would first need to initialize a parser (with its buffer) and then, for each chunk of JSON, make successive calls to an API function that parses that chunk.

It's a shame, especially since I have to remain compatible with Python 2.7 for OpenOffice and there are not many choices left for JSON streaming...

rtobar commented Apr 13, 2023

Yes, I get that you own the generator, but the problem is that you can't pass a generator to ijson -- but you can wrap it in a file-like class like the one I linked above, and pass that to ijson.

Alternatively (I should have mentioned this before) you can iterate over your iterator yourself and pass the individual chunks to ijson; see the "push interfaces" section of the docs. That's a more complex API, but it should work if you'd rather go in that direction.

prrvchr commented Apr 13, 2023

Thanks for pointing me to the "push interfaces": I use ijson.sendable_list and it works perfectly (sample code)

I had to use a modified version of ijson that only uses relative imports in order to integrate and use the package in my LibreOffice / OpenOffice .oxt extension file.

Maybe this deserves an enhancement request, so that I could do without my modified version...

Anyway thank you for your help and your API.

prrvchr commented Apr 13, 2023

I'm seeing some strange behavior:

I get a parser with parser = ijson.parse_coro(events)
If I leave my iteration prematurely (with a break, for example) once I have obtained the necessary data, then the program fails when I try to close the parser with parser.close()

The error is:

yajl2.py, line 50, in basic_parse_basecoro
    raise exception(error)
ijson.common.IncompleteJSONError: parse error: premature EOF

In fact I'm trying to perform lazy loading and quit as soon as I have the data I'm looking for...

rtobar commented Apr 15, 2023

@prrvchr the error on parser.close is expected, as the underlying parser is being flushed but it realises that a full JSON document hasn't been pushed through. I think you can safely ignore the error if you know you are breaking prematurely; otherwise the error is useful for when you think you are pushing a full JSON document through, but you aren't actually.

I see you've opened another issue about the relative imports, which is great, I'd rather we don't mix too many topics in one issue. Likewise, if you have other issues with the push interface let's also discuss them in a new issue.
