Feature request: Return bytes at prefix #43
@jpmckinney if I understand your use case correctly from the description above, you basically want to iterate over your target object in the JSON content twice: once just to extract its raw bytes from the stream, and then again to actually parse it.

The feature you propose is actually far from trivial to implement. Almost all (if not all) parsing backends return already-parsed values, so in order to return the bytes at the given prefix we'd have to reconstruct the original bytes from the given values -- which is not even always possible. Even if all backends gave us access to the underlying bytes and offsets required to keep track of the original contents, note that "return the bytes from the given prefix, without parsing them" is self-contradictory: there is no way to detect the end of a prefix's content without actually parsing the content that comes after it. You can get away without creating the actual objects, but the lexing and parsing will still be necessary. Finally, even if all of this were implemented, you'd end up parsing the JSON content twice. In your case that might not be much of a concern given that you are dealing with small amounts of data, but it just doesn't sound necessary.

All in all, this sounds like a huge amount of work, with a high chance of not being possible to implement in the first place, and for little benefit. So unfortunately I'll have to mark this as "won't fix".

Back to the problem at hand: what's not entirely clear to me is why it is undesirable to collapse the two steps into one. On the one hand you mentioned you already have some code that accepts the extra prefix, but on the other you say that it results in having to maintain "multiple pipelines". I'm not saying you don't have valid reasons, it's just that I don't understand them from the given information.

A third alternative worth considering is event interception: use
Thanks for the explanation! I’m convinced it will be simpler to just collapse the two steps in the end :)
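As an aside, the point that one cannot return the bytes at a prefix without lexing them can be made concrete with a minimal, hypothetical scanner (not part of ijson) that slices out a raw object/array value: even this "don't parse, just slice" helper must examine every byte to track string boundaries and nesting depth.

```python
def raw_value_bytes(data: bytes, start: int) -> bytes:
    """Return the raw bytes of the JSON object/array beginning at `start`."""
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(data)):
        c = data[i:i + 1]
        if in_string:
            # Inside a string: only watch for escapes and the closing quote.
            if escaped:
                escaped = False
            elif c == b'\\':
                escaped = True
            elif c == b'"':
                in_string = False
        elif c == b'"':
            in_string = True
        elif c in b'{[':
            depth += 1
        elif c in b'}]':
            depth -= 1
            if depth == 0:
                return data[start:i + 1]
    raise ValueError('value is incomplete')

data = b'{"result": {"a": [1, 2]}, "x": 1}'
start = data.index(b'{', 1)          # position of the "result" value
print(raw_value_bytes(data, start))  # b'{"a": [1, 2]}'
```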
I don't know if this is too narrow a use case for this library, or if there is another way to do this.
I work with large remote JSON objects. If I know a JSON key occurs near the start of the file, I might download only a few kilobytes, and then use `ijson.items` to read that key. `ijson` will not raise an `IncompleteJSONError`, because it never had to read to the end of the file. So far, so good.

While most of these JSON objects are standardized, some are not. For example, a publisher might wrap a standard object inside another object, like `{"result": { standardized content } }`. My data pipeline currently handles this by being able to set the prefix for a given remote file (in this case `result`). A pipeline step returns the data at that prefix, using `ijson.items`.

When I combine these two tactics, I run into trouble.
`ijson.items(data, 'result')` tries to read the entire `result` key, but the JSON is incomplete, so it raises an error.

One solution is to collapse the two steps into one, i.e. index to
`result.some_key` in one step. In my case, this is undesirable, because then I'll have multiple pipelines using similar code, which is more work to maintain.

Another solution might be to have a method that just returns the bytes from the given prefix, without parsing them. (Since the prefix might include one or more `item`, I guess the return value would be a string of bytes.) In that case, I could then use `ijson.items` as usual on one of the return values.