Memory Explosion Using Parser #57
Comments
@raeldor unfortunately without a code example and the document (or part of it) you are trying to load, there's little help that can be provided. Please provide more details; it could be that the memory explosion is happening somewhere else.
@raeldor to answer your original question: no, there's no dictionary construction or anything like that going on at the level of the parser itself.
Thanks for the reply. Not sure code will help, since the line is literally just... I suspect you would need to have the same data to replicate.
After reading the FAQ, I suspect this is what's happening: "However if a text-mode file object is given then the library will automatically encode the strings into UTF-8 bytes".
Regardless of the memory taken, even just running through the parser like

parser = ijson.parse(cellset_string)

takes about 10 minutes for my roughly 320 MB string of JSON. I feel like I'm doing something wrong here.
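The FAQ warning quoted above can be made concrete without ijson at all: handing a parser text instead of bytes forces a full UTF-8 encoding pass up front, i.e. a second document-sized buffer is allocated before any parsing starts. A minimal stdlib-only sketch (the document contents are made up for illustration):

```python
import sys

# Stand-in for a large in-memory JSON document held as a str.
text = '{"value": 42},' * 100_000

# What a text-mode input forces on the library: a full encode,
# i.e. a second document-sized buffer allocated up front.
encoded = text.encode("utf-8")

print(sys.getsizeof(text), sys.getsizeof(encoded))
```

Starting from bytes in the first place (for example, reading the file with `open(path, 'rb')`) avoids this extra copy entirely.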
@raeldor thanks for giving more details. Even though the code was simple, it actually helped me figure out what's going on: a StringIO object copies the contents of the string it wraps, so wrapping your 320 MB string immediately costs a large amount of extra memory. A quick demonstration with tracemalloc:

$> python
Python 3.9.5 (default, May 11 2021, 08:20:37)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tracemalloc
>>> import io
>>> tracemalloc.start()
>>> tracemalloc.get_traced_memory()
(14106, 36242)
>>> x = ' ' * 10**6
>>> len(x)
1000000
>>> tracemalloc.get_traced_memory()
(1014943, 1024853)
>>> i = io.StringIO(x)
>>> tracemalloc.get_traced_memory()
(5015725, 5025635)

We could certainly simplify this in ijson by using a simpler file-like object wrapper that doesn't require copying the input string. I'll create an issue to remember doing that.

As you found out in the FAQ, the best input you can give ijson is binary data, not textual data. Also, where is your in-memory string coming from? You must be loading it from a file, the network, or some other external source. In that case it's always better to just give the file object to ijson so it reads the data for you, instead of you reading the whole data and giving it to ijson.
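The interactive session above can be replayed as a script, with an io.BytesIO wrapper added for comparison: in CPython, BytesIO initialized from a bytes object shares the existing buffer until the first write, so it avoids the large up-front allocation that StringIO makes. (A sketch of the underlying Python behavior, not part of ijson itself.)

```python
import io
import tracemalloc

tracemalloc.start()

text = " " * 10**6            # 1 MB ASCII string, as in the session above
data = text.encode("ascii")   # the same content as 1 MB of bytes

before, _ = tracemalloc.get_traced_memory()
s = io.StringIO(text)         # copies (and widens) the string into an internal buffer
middle, _ = tracemalloc.get_traced_memory()
b = io.BytesIO(data)          # shares the existing bytes object until first write
after, _ = tracemalloc.get_traced_memory()

print("StringIO cost:", middle - before)   # several MB for a 1 MB string
print("BytesIO cost:", after - middle)     # close to zero
```

This is why feeding ijson binary data (or a binary file object) sidesteps the memory explosion: there is simply no document-sized copy to make.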
Thanks for the quick response. It's the response.text from an HTTP call. Is there a way to wrap the string to prevent the conversion and improve the performance? For some reason the performance is also very slow. Or maybe it's because I'm in debug mode?
@raeldor It seems you're using the requests library? I'm no expert in it, so take this with a grain of salt. If you are using the requests library you can access the response body as bytes via response.content. However, the best would be to find out how to use the response itself as a binary file-like object, so ijson can read from it incrementally instead of the whole body being loaded into memory first.
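For reference, and assuming the requests library really is in play as guessed above (response.content and the stream=True/response.raw pattern are standard requests features, but nothing below is ijson's own API): response.content gives the whole body as bytes, while stream=True plus response.raw yields a binary file-like object a streaming parser can consume chunk by chunk. A stdlib-only sketch of the streaming side, with io.BytesIO standing in for the network stream:

```python
import io

# BytesIO standing in for requests' response.raw (with stream=True):
# a binary file-like object a streaming parser can read incrementally.
raw = io.BytesIO(b'{"cells": [{"v": 1}, {"v": 2}]}')

# What a streaming parser effectively does with a file-like input:
# repeated fixed-size reads, so only one chunk is resident at a time.
chunks = []
while chunk := raw.read(8):
    chunks.append(len(chunk))

print(chunks)   # sizes of the chunks read, never the whole body at once
```

The key point is that memory usage is bounded by the chunk size, not the document size.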
Are those 10 minutes spent only in ijson? Please check the performance section of the documentation. In particular make sure you have a fast backend available. 320 MB of JSON shouldn't take that long to parse (but who knows, maybe you have a particularly difficult JSON document to parse...) |
Thank you. I'll investigate the response options further. Writing the string out to a physical file as binary and opening it removed the memory issue, as suspected.
Yes, 1 minute for reading and parsing sounds much better (still a bit high, but probably because of the extra I/O to disk). Note that you should be able to skip the writing to a file though; just pass the bytes from response.content directly to ijson. In any case, please close this issue if you're happy with the responses. I'll deal with replacing our internal usage of StringIO in a separate issue.
Using response.content worked great without impacting memory. Really appreciate your fast assistance, thank you. |
Hi,
I am using this module because parsing with json.loads() makes the memory usage of my already large JSON string (about 900 MB) go up by about 10x (to over 9 GB). I was expecting to be able to parse the JSON line by line. It works, but I was a little surprised that when I call ijson.parse() it grabs about 3 GB of memory. May I ask why the memory usage is so large? More conversion to dictionaries behind the scenes?
Thanks
Ray