How to parse a large gzip json file. #110

Closed
davies-w opened this issue May 21, 2024 · 2 comments
Labels
question Further information is requested

Comments

@davies-w

Description
How can I use ijson to parse a large gzipped JSON file?
Background: I have a 3 GB compressed JSON file that expands to 32 GB. I'd rather process it record by record.

Detailed description
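It's actually handled completely transparently. Just open the file with gzip.open and pass the resulting file object to ijson:

import gzip
import ijson

# gzip.open returns a file-like object that decompresses as it is read
with gzip.open("very_large_file.json.gz", "rb") as f:
    # the 'item' prefix selects each element of a top-level JSON array
    for item in ijson.items(f, "item"):
        print(item)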

Why is this not clear from the documentation
It's not mentioned, so I created this pseudo-issue to make it obvious to other people in the future.

@davies-w davies-w added the question Further information is requested label May 21, 2024
@jpmckinney

Hmm, isn't that how the gzip module works for any file?

@rtobar

rtobar commented May 22, 2024

@davies-w thanks for submitting the question/answer combo. I'll close this issue and won't take any further action, although it will remain in the system in perpetuity and will eventually be indexed by search engines, so it can be found by whoever runs into this same issue.

As pointed out by @jpmckinney though, this is the expected behaviour of the gzip module (i.e., its open function returns an object with a file-like interface). I do agree that the ijson documentation could do better in specifying its requirements, for example by providing signatures of the methods that are expected in input file-like objects, or by having a more elaborate "Examples" section, but that's further work that I'm not planning to do at the moment (I'm open to discussing PRs though).
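
To illustrate the "file-like interface" point using only the standard library, here is a small sketch showing that the object returned by gzip.open exposes a read method that hands back decompressed bytes in caller-sized chunks. The payload and the chunk size of 8 are arbitrary choices for the demonstration, and nothing here is a statement about which methods ijson itself calls.

```python
import gzip
import io

# Compress a small payload in memory (hypothetical sample data).
payload = b'[{"id": 1}, {"id": 2}, {"id": 3}]'
compressed = gzip.compress(payload)

# gzip.open accepts either a path or an existing binary file object.
with gzip.open(io.BytesIO(compressed), "rb") as f:
    has_read = hasattr(f, "read")
    chunks = []
    while True:
        chunk = f.read(8)  # request at most 8 decompressed bytes
        if not chunk:      # an empty bytes object signals end of stream
            break
        chunks.append(chunk)

# The chunks, joined together, reproduce the original uncompressed data.
print(has_read, b"".join(chunks) == payload)
```

Any consumer that reads its input in bounded chunks like this never needs the whole decompressed file in memory at once.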

@rtobar rtobar closed this as completed May 22, 2024