Segmentation fault when trying to load large json file #11344
Comments
you need to use
jreback
closed this
Oct 16, 2015
jreback
added IO JSON Compat
labels
Oct 16, 2015
eddie-dunn
commented
Oct 16, 2015
I get the same segfault when using read_json.
well, not sure how you generated it; it's possible it's not valid JSON / parseable by read_json
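A minimal sketch of the validity check being suggested here: parse the file with the stdlib `json` module first, then hand it to `pandas.read_json`. If the stdlib parser succeeds but pandas crashes, the file itself is valid JSON. The helper name and the tiny demo file are illustrative, not from the thread.

```python
import json
import tempfile

import pandas as pd

def check_json_readable(path):
    with open(path) as fh:
        json.load(fh)          # raises ValueError if the file is not valid JSON
    return pd.read_json(path)  # pandas can still fail even on valid JSON

# Small stand-in for the real 1.1 GB file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}, fh)
    path = fh.name

df = check_json_readable(path)
```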
eddie-dunn
commented
Oct 16, 2015
I've generated the json file with both Python's built-in
Can you try to narrow it down to a smaller example that generates the segfault, and post that file?
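One hypothetical way to produce the smaller reproduction file asked for here: load the original with the stdlib `json` module (which succeeds in this report), keep only the first `n` records, and write the subset back out. This sketch assumes the file holds a top-level list of records; the slicing would need adjusting for other layouts.

```python
import json

def shrink_json(src_path, dst_path, n=1000):
    # Load with the stdlib parser, which handles the file fine.
    with open(src_path) as fh:
        data = json.load(fh)
    # Keep only the first n records and dump the smaller file.
    with open(dst_path, "w") as fh:
        json.dump(data[:n], fh)
    return dst_path
```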
eddie-dunn
commented
Oct 19, 2015
As I said originally, I think the issue is with the size of the json file.

Can you
and see if either of those fail?
eddie-dunn
commented
Oct 19, 2015
I realized that the subset only took 200 MB of RAM when loaded in the REPL; once dumped to a json file on disk it was much smaller. Four joined subsets did not present a problem, so I incremented the number of subsets: 7 subsets at 561 MB and 11 at 881 MB worked fine, but 14 joined subsets at 1.1 GB crashed ipython.

Might be worth your while to generate your own large json file to debug why
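The size-probing experiment described above can be sketched as follows: generate JSON files of increasing size and try each one with `pandas.read_json` in a child process, so a segfault only kills the child rather than the test driver. The record shape and row counts are illustrative, not the reporter's actual data.

```python
import json
import subprocess
import sys
import tempfile

def make_file(n_rows):
    # Write n_rows simple records to a temporary JSON file.
    fh = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
    json.dump([{"id": i, "value": float(i)} for i in range(n_rows)], fh)
    fh.close()
    return fh.name

def survives_read_json(path):
    code = "import pandas as pd; pd.read_json({!r})".format(path)
    # A negative returncode means the child died from a signal (e.g. SIGSEGV).
    return subprocess.run([sys.executable, "-c", code]).returncode == 0

# e.g.: for n in (10_000, 100_000, 1_000_000):
#           print(n, survives_read_json(make_file(n)))
```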
@eddie-dunn it would be really helpful for you to show a copy-pastable example that reproduces the problem. These cases are almost always a function of the dtypes of the structure you are saving. This would make debugging much easier, as this is the first report I have ever seen for this type of bug. Further pls show
eddie-dunn
commented
Oct 20, 2015
Generate example code:
Test run example code:
Please note that if you try to run the script with PANDAS=False, you will need approximately 8 GB of RAM or it will exit with an out-of-memory exception.
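The ~8 GB requirement mentioned above comes from building the entire structure in memory before dumping it. As an aside, a streaming writer keeps memory bounded regardless of file size; this is a sketch with an illustrative record shape, not the script from the thread.

```python
import json

def write_big_json(path, n_rows):
    # Emit a JSON array record by record instead of serializing one
    # giant in-memory object, so peak memory stays roughly constant.
    with open(path, "w") as fh:
        fh.write("[")
        for i in range(n_rows):
            if i:
                fh.write(",")
            fh.write(json.dumps({"id": i, "value": float(i)}))
        fh.write("]")
```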
pls try this with 0.17.0 and report back.
I can reproduce it on 0.17.
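When retesting against a specific release like this, it's worth confirming which pandas build the interpreter actually picks up, especially once a locally built PR branch enters the mix. A one-liner suffices:

```python
import pandas as pd

# Print the installed pandas version before re-running the reproduction,
# to be sure the intended build (e.g. 0.17.0 vs. a PR branch) is in use.
print(pd.__version__)
```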
eddie-dunn
commented
Oct 20, 2015
Yes, pandas still segfaults on 0.17.
jreback
added this to the
Next Major Release
milestone
Oct 20, 2015
jreback
added Difficulty Intermediate Effort Medium
labels
Oct 20, 2015
jreback
reopened this
Oct 20, 2015
ok, will reopen if anyone cares to dig into the C code
Will hopefully submit a PR tonight
xref #7641 as well
kawochen
referenced
this issue
Oct 21, 2015
Merged
BUG: GH11344 in pandas.json when file to read is big #11393
jreback
modified the milestone: 0.17.1, Next Major Release
Oct 21, 2015
does #11393 fix this for you? (You have to build from the PR to test.)
adri0
commented
Oct 23, 2015
I'm also getting a segmentation fault when using read_csv to load a file of around 160 MB in pandas 0.17.0.
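A possible workaround while a crash like this is investigated: read the CSV in chunks so no single parser call has to materialize the whole file at once. This is a generic sketch; the path and chunk size are illustrative, and it may or may not sidestep the particular crash reported here.

```python
import pandas as pd

def read_csv_chunked(path, chunksize=100_000):
    # read_csv with chunksize returns an iterator of DataFrames;
    # concatenate the pieces into one frame at the end.
    pieces = pd.read_csv(path, chunksize=chunksize)
    return pd.concat(pieces, ignore_index=True)
```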
@adri0
adri0
commented
Oct 23, 2015
Okay, thanks. I just opened #11419
eddie-dunn
commented
Oct 16, 2015
I have a 1.1 GB json file that I try to load with pandas.json.load: it breaks with the following output
I can load the file with Python's built-in json module. What is going wrong here?