Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug tojson memleak #45489

Merged
merged 5 commits into from
Jan 22, 2022
Merged

Bug tojson memleak #45489

merged 5 commits into from
Jan 22, 2022

Conversation

vernetya
Copy link
Contributor

Fix memory leak when calling to_json appeared in 1.1.0. From these changes, in get_values function, it looks like values from here are not cleaned in this path.

I run the following loop and plot the memory usage using memory_profiler:

def foo():
    data = {str(c): list(range(100_000)) for c in range(10)}
    for i in range(500):
        print(i)
        df = pd.DataFrame(data)
        df.to_json(orient='split', indent=0)

Here on linux (python 3.8.10)
pandas/main:
mem_x_main

with changes in this PR:
mem_x_pr

On Windows 10 (python 3.8.5):
pandas/main:
mem_win_main

with changes in this PR:
mem_win_fix

I run under valgrind (10 loops), less revealing but there're diff:
pandas/main:

...
==3116== LEAK SUMMARY:
==3116==    definitely lost: 94,024 bytes in 583 blocks
==3116==    indirectly lost: 7,248,888 bytes in 537 blocks
==3116==      possibly lost: 5,900,491 bytes in 40,529 blocks
==3116==    still reachable: 10,492,550 bytes in 67,907 blocks
==3116==                       of which reachable via heuristic:
==3116==                         stdstring          : 2,484 bytes in 62 blocks
==3116==         suppressed: 0 bytes in 0 blocks
==3116== Reachable blocks (those to which a pointer was found) are not shown.
==3116== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3116== 
==3116== For lists of detected and suppressed errors, rerun with: -s
==3116== ERROR SUMMARY: 6706 errors from 6703 contexts (suppressed: 4 from 4)

with changes in this PR:

...
==3026== LEAK SUMMARY:
==3026==    definitely lost: 92,296 bytes in 565 blocks
==3026==    indirectly lost: 47,232 bytes in 483 blocks
==3026==      possibly lost: 5,099,615 bytes in 40,511 blocks
==3026==    still reachable: 10,492,630 bytes in 67,908 blocks
==3026==                       of which reachable via heuristic:
==3026==                         stdstring          : 2,484 bytes in 62 blocks
==3026==         suppressed: 0 bytes in 0 blocks
==3026== Reachable blocks (those to which a pointer was found) are not shown.
==3026== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3026== 
==3026== For lists of detected and suppressed errors, rerun with: -s
==3026== ERROR SUMMARY: 6698 errors from 6695 contexts (suppressed: 8 from 4)

@WillAyd
Copy link
Member

WillAyd commented Jan 20, 2022

@vernetya great find! I think this will end up making its way into 1.4.1 whenever the whatsnew for that gets created. Code-wise lgtm @jreback @jbrockmendel @simonjayhawkins

@jreback jreback added this to the 1.4 milestone Jan 21, 2022
@jreback jreback added the IO JSON read_json, to_json, json_normalize label Jan 21, 2022
@@ -227,7 +227,7 @@ MultiIndex
I/O
^^^
- Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`)
-
- Bug in :meth:`to_json` fix memory leak (:issue:`43877`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vernetya can you move to 1.4 (if you can do tonight will merge) otherwise can put in 1.4.1

@jreback
Copy link
Contributor

jreback commented Jan 21, 2022

@simonjayhawkins not a blocker for release (we can put in 1.4.1 as well)

@jreback jreback merged commit 8456833 into pandas-dev:main Jan 22, 2022
@jreback
Copy link
Contributor

jreback commented Jan 22, 2022

thanks @vernetya

@jreback
Copy link
Contributor

jreback commented Jan 22, 2022

@meeseeksdev backport 1.4.x

@lumberbot-app
Copy link

lumberbot-app bot commented Jan 22, 2022

Something went wrong ... Please have a look at my logs.

simonjayhawkins pushed a commit that referenced this pull request Jan 22, 2022
Co-authored-by: vernetya <52132110+vernetya@users.noreply.github.com>
@vernetya
Copy link
Contributor Author

Thanks for the backport

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: to_json memory leak (introduced in 1.1.0)
4 participants