Default JSONEncoder won't serialize iterators, can't easily be extended. #1

Closed
etrepum opened this issue Feb 6, 2011 · 8 comments

@etrepum
Member

etrepum commented Feb 6, 2011

http://code.google.com/p/simplejson/issues/detail?id=88

Reported by i...@simplegeo.com, Jan 17, 2011
What steps will reproduce the problem?

  1. simplejson.dumps(iter(range(10)))

What is the expected output? What do you see instead?
The iterator should generate a JSON serialized list.
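A minimal sketch of the reproduction step above. The helper name `try_dumps` is made up for illustration, and the stdlib fallback is an assumption so the snippet also runs where simplejson is not installed; with default options both modules are expected to reject a bare iterator with a TypeError.

```python
try:
    import simplejson as json  # the library this issue is about
except ImportError:
    import json  # stdlib fallback (assumption: it rejects iterators the same way)


def try_dumps(obj):
    """Return the JSON text, or the TypeError message if obj is rejected."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_dumps(list(range(10))))  # a real list serializes as a JSON array
print(try_dumps(iter(range(10))))  # a bare iterator is rejected by default
```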

This is a somewhat thorny issue. I have a large dataset which is lazily loaded in via a generator. I'd like to avoid materializing the entire dataset before serializing, ideally by having SimpleJSON consume from my input generator and write to a FD via dump().

The extension mechanism via `default=some_method` doesn't work, because I need to yield individual elements from the seq, not a materialized seq. The existing _iterencode_list() method is totally capable of serializing lazy seqs, but _iterencode() lacks the necessary conditional to invoke it for those types, and the number of possible types is extremely large. Every function in itertools, for example, returns a different type.
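To illustrate the limitation described above, here is a rough sketch of what a `default=` hook can do with a generator. The names `numbers` and `materialize` are hypothetical. The hook has to return a single serializable object, so the only option is to build the whole list in memory first.

```python
try:
    import simplejson as json
except ImportError:
    import json  # stdlib fallback; the default= hook behaves the same way here


def numbers():
    """Stand-in for a large, lazily loaded dataset."""
    for n in range(5):
        yield n


def materialize(obj):
    """default= hook: it must return one serializable object, so it builds a full list."""
    if hasattr(obj, "__iter__"):
        return list(obj)  # the entire sequence is held in memory at this point
    raise TypeError("%r is not JSON serializable" % (obj,))


text = json.dumps(numbers(), default=materialize)
print(text)
```

The output is correct, but nothing is streamed: the generator is fully consumed inside `materialize` before any JSON is produced.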

I was able to make this work with stdlib's json module, though it was not pretty. I had to subclass JSONEncoder, copy-paste _iterencode() in, and add the necessary conditional to send lazy seqs through _iterencode_list(). This approach is significantly more painful with SimpleJSON, since _iterencode() and friends are now hidden inside _make_iterencode(), where they can't be touched — I'd have to copy-paste the entire method (and its inner methods) to add one more conditional.

I understand that there's going to be some level of heuristic to detect these types, due to the lack of a shared base type for lazy seqs, but is there not some better way to handle this?

@etrepum
Member Author

etrepum commented Sep 4, 2011

https://github.com/simplejson/simplejson/tree/iterable_as_array-gh1

Does this branch solve the problem sufficiently? Note that you'll still end up materializing all of the output before returning... unless you create a JSONEncoder and call iterencode(), which will use a slower Python implementation but give you "flow control".
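A sketch of the iterencode() route mentioned above, assuming an installed simplejson that includes the `iterable_as_array` option from this branch. The helper names `squares` and `encode_lazily` are made up for illustration, and the import guard is only there so the snippet stays importable without simplejson.

```python
try:
    import simplejson  # third-party; assumed to include iterable_as_array
except ImportError:
    simplejson = None  # keeps the sketch importable where simplejson is absent


def squares(limit, log):
    """Lazy source that records when each item is actually produced."""
    for n in range(limit):
        log.append(n)
        yield n * n


def encode_lazily(iterable):
    """Return a generator of JSON text chunks instead of one big string."""
    encoder = simplejson.JSONEncoder(iterable_as_array=True)
    return encoder.iterencode(iterable)


if simplejson is not None:
    produced = []
    chunks = encode_lazily(squares(4, produced))
    first = next(chunks)  # the caller decides when to pull the next chunk
    print(repr(first), "after producing", produced)
    print(first + "".join(chunks))
```

Because the caller pulls chunks one at a time, each chunk can be written out before the next input item is requested, which is the "flow control" referred to above.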

@pferreir

pferreir commented Jun 7, 2012

+1

@etrepum
Member Author

etrepum commented Jun 8, 2012

Does your "+1" mean that you've tried the branch and it solves a problem you have?

@pferreir

pferreir commented Jun 9, 2012

No, it was +1 for the issue, but I will give it a try.

@etrepum
Member Author

etrepum commented Apr 6, 2013

Since nobody seems to care, I'm going to retire this issue and leave the branch in limbo.

@etrepum etrepum closed this as completed Apr 6, 2013
@nickbabcock
Contributor

I realize that this issue is very old and has been closed for a long time, but I tried out the branch and it worked swimmingly. Combining iterencode with iterable_as_array=True let me stream several gigabytes of JSON output while using under 20KB of memory, without resorting to the Stack Overflow hack that I'm uncomfortable with (an iterator masquerading as a list!).
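For context, the "iterator masquerading as a list" workaround usually looks something like the sketch below. This is an assumption about the general shape of that hack, not the exact Stack Overflow answer, and the class name `IterableAsList` is hypothetical. It uses the stdlib json module and depends on how the pure-Python encoder behind json.dump() treats list subclasses, which is an implementation detail.

```python
import io
import json


class IterableAsList(list):
    """Hypothetical wrapper: a list subclass that iterates over a lazy source."""

    def __init__(self, source):
        super().__init__()
        self._source = source

    def __iter__(self):
        # The encoder loops over "the list", which actually drains the lazy source.
        return iter(self._source)

    def __len__(self):
        # Report a non-zero length so the encoder does not short-circuit to "[]".
        return 1


buf = io.StringIO()
json.dump(IterableAsList(n * 2 for n in range(4)), buf)  # dump() writes chunk by chunk
print(buf.getvalue())
```

The wrapper lies about its length and contents, which is why an explicit encoder option is the cleaner design.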

@etrepum
Member Author

etrepum commented Jul 10, 2015

@nickbabcock Would you be interested in porting this branch back to the current master for a pull request?

@etrepum etrepum reopened this Jul 10, 2015
@nickbabcock
Contributor

Yes, no problem. I'll create a pull request for review when I'm done.
