Segmentation fault when trying to load large json file #11344
you need to use
I get the same segfault when using read_json.
Well, not sure how you generated it; it's possible it's not valid JSON / parseable by read_json
I've generated the json file with both Python's built-in
Can you try to narrow it down to a smaller example that generates the segfault, and post that file?
As I said originally, I think the issue is with the size of the JSON file.
Can you
and see if either of those fail?
I realized that the subset only took 200 MB of RAM when loaded in the REPL; once dumped to a JSON file on disk it was much smaller. Four joined subsets did not present a problem, so I incremented the number of subsets: 7 subsets at 561 MB and 11 at 881 MB worked fine, but 14 joined subsets at 1.1 GB crashed ipython:
Might be worth your while to generate your own large JSON file to debug why
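Generating such a file might look like the following; this is a hypothetical sketch (not the script used in this thread), with `make_json_blob` being my own name. Small sizes are shown so it runs quickly; scaling `n_rows` into the millions would approach the file sizes reported above.

```python
import numpy as np
import pandas as pd

def make_json_blob(n_rows, path):
    """Write an n_rows x 4 DataFrame of mixed dtypes to `path` as JSON."""
    df = pd.DataFrame({
        "ints": np.arange(n_rows),
        "floats": np.random.rand(n_rows),
        "strings": ["row-%d" % i for i in range(n_rows)],
        "bools": np.arange(n_rows) % 2 == 0,
    })
    df.to_json(path)

# Grow the file and re-read it each time; at the sizes in this thread,
# the read_json call is where the reported segfault occurred.
for n_rows in (1000, 10000):
    make_json_blob(n_rows, "blob.json")
    loaded = pd.read_json("blob.json")
    assert len(loaded) == n_rows
```

Mixing dtypes in the generated frame matters here, since the maintainer's comment below points at dtypes as the usual culprit in this class of bug.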
@eddie-dunn it would be really helpful for you to show a copy-pastable example that reproduces the problem. These cases are almost always a function of the dtypes of the structure you are saving. This would make debugging much easier, as this is the first report I have ever seen for this type of bug. Further, pls show
Generate example code:
Test run example code:
Please note that if you try to run the script with PANDAS=False you will need approximately 8 GB of RAM or it will exit with an out-of-memory exception.
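The generate/test scripts referenced above were not preserved in this copy of the thread. As a hypothetical reconstruction of the described toggle, a loader switched between pandas and the stdlib parser might look like this (`load_blob` and the `use_pandas` flag are my names, not from the original script):

```python
import json
import pandas as pd

def load_blob(path, use_pandas=True):
    """Load a JSON file either through pandas.read_json or through the
    stdlib json module (the latter path was reported to need ~8 GB of
    RAM for the 1.1 GB file)."""
    if use_pandas:
        return pd.read_json(path)
    with open(path) as fh:
        return pd.DataFrame(json.load(fh))

# Example: round-trip a small frame through both code paths.
pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}).to_json("tiny.json")
via_pandas = load_blob("tiny.json", use_pandas=True)
via_stdlib = load_blob("tiny.json", use_pandas=False)
assert via_pandas.shape == via_stdlib.shape == (3, 2)
```

Running both paths on the same file is what lets the report isolate the crash to pandas' C parser rather than to the file itself.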
pls try this with 0.17.0 and report back.
I can reproduce it on 0.17.
Yes, pandas still segfaults on 0.17.
ok will reopen if anyone cares to dig into the C code
Will hopefully submit a PR tonight
xref #7641 as well
does #11393 fix it for you? (you have to build from the PR to test)
I'm also having a segmentation fault when using read_csv to load a file of around 160 MB with pandas 0.17.0.
@adri0
Okay, thanks. I just opened #11419
I'm seeing this traceback from the same cause on pandas 0.23.4 + Python 3.7.0.
I have a 1.1 GB JSON file that I try to load with pandas.json.load. It breaks with the following output:

I can load the file with Python's built-in json module. What is going wrong here?