test(python): Memory usage test infrastructure, plus a test for #15098 (#15285)

Merged

itamarst
Contributor

Fixes #15231

This relies on the tracemalloc infrastructure built into Python. Not all libraries support it (notably PyArrow), but it is cross-platform, unlike the alternatives.

Followups:

  • I tried to also track PyArrow memory usage, but failed. Possibly because it's hard, possibly because there's a memory leak somewhere... I will try to poke at this some more.
  • Once this is merged, there are probably a bunch of other code paths that could benefit from this testing infrastructure.
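
For readers unfamiliar with tracemalloc, here is a minimal sketch of the kind of peak-memory measurement it enables from the Python side. The helper name `measure_peak_bytes` is hypothetical and is not the fixture added in this PR; it assumes Python 3.9+ for `tracemalloc.reset_peak()`, and it only sees allocations that are reported to tracemalloc.

```python
import tracemalloc


def measure_peak_bytes(func, *args, **kwargs):
    """Run func and return (result, peak traced bytes above the starting level).

    Hypothetical helper for illustration only; not part of Polars.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    # Reset the peak so earlier allocations do not count against this call.
    tracemalloc.reset_peak()
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Only stop tracing if this helper was the one that started it.
        if not was_tracing:
            tracemalloc.stop()
    return result, peak - baseline


# Example: a 10 MB buffer should show up as at least 10 MB of peak usage.
data, peak = measure_peak_bytes(lambda: bytearray(10_000_000))
print(f"peak traced bytes: {peak}")
```

A test could then assert that `peak` stays below some limit for a given code path, which is the general idea behind memory usage tests.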

@github-actions github-actions bot added the internal (An internal refactor or improvement) and python (Related to Python Polars) labels Mar 25, 2024
@itamarst itamarst marked this pull request as ready for review March 25, 2024 15:56

codecov bot commented Mar 25, 2024

Codecov Report

Attention: Patch coverage is 86.36364%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.36%. Comparing base (03c5f73) to head (2cb081d).

Files Patch % Lines
py-polars/src/memory.rs 86.36% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15285   +/-   ##
=======================================
  Coverage   81.36%   81.36%           
=======================================
  Files        1364     1365    +1     
  Lines      176612   176634   +22     
  Branches     2525     2525           
=======================================
+ Hits       143694   143716   +22     
  Misses      32434    32434           
  Partials      484      484           


@ritchie46
Member

This is cool stuff!

@itamarst
Contributor Author

The lack of abi3 support for the tracemalloc API is apparently a problem on Windows 😢

@itamarst itamarst marked this pull request as draft March 25, 2024 17:13
@itamarst
Contributor Author

From looking at pyo3-ffi, it seems I need to specify the DLL link name on Windows, so I'm going to try that. If I can't make that work, I will just skip these tests on Windows.

@itamarst itamarst changed the base branch from main to mixed-constructor-types March 25, 2024 17:14
@itamarst itamarst changed the base branch from mixed-constructor-types to main March 25, 2024 17:15
@itamarst itamarst marked this pull request as ready for review March 25, 2024 17:19
@itamarst itamarst marked this pull request as draft March 25, 2024 17:20
@itamarst
Contributor Author

Asked the pyo3 people for some help with the Windows linking errors; will hopefully have a reply by tomorrow.

@itamarst itamarst marked this pull request as ready for review March 26, 2024 14:50
@itamarst
Contributor Author

@stinodego OK it's ready I think.

@itamarst itamarst requested a review from ritchie46 March 26, 2024 20:50
Member

@stinodego stinodego left a comment

I'm pretty amazed that you've managed to set this up with so little code. Nice!

I left some minor comments on the organization of things on the Python side. And a few just for my own understanding.

py-polars/tests/unit/io/test_parquet.py (resolved)
py-polars/tests/unit/conftest.py (resolved)
py-polars/tests/unit/conftest.py (resolved)
Comment on lines +77 to +78
#[cfg(all(target_family = "unix", debug_assertions))]
static ALLOC: TracemallocAllocator<Jemalloc> = TracemallocAllocator::new(Jemalloc);
Member

So debug builds will now always use the tracemalloc allocator, correct? Can we expect any impact on performance, or is it 'harmless'?

Contributor Author

I would be surprised if it has a noticeable impact. It adds an extra function call per allocation/deallocation, but unless tracemalloc is started, that call does nothing and returns immediately, so it should be pretty fast. And in general I would expect Polars not to do a particularly large number of allocations, since most allocations are large chunks.
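
The "does nothing unless tracemalloc is started" behavior can be observed from Python. This is only a sketch of the on/off switch using the standard `tracemalloc` module; it does not exercise the Rust allocator from this PR, and the helper name `traced_bytes_for` is hypothetical.

```python
import tracemalloc


def traced_bytes_for(make_object, tracing):
    """Return how many bytes tracemalloc attributes to one allocation.

    Hypothetical helper for illustration. Note that it stops any tracing
    that was already active, so it is not suitable inside a real test run.
    """
    tracemalloc.stop()  # Start from a known state; harmless if already stopped.
    if tracing:
        tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    obj = make_object()  # Keep a reference so the memory stays allocated.
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del obj
    return after - before


size = 1_000_000
# With tracing off, nothing is recorded, so the counter does not move.
untracked = traced_bytes_for(lambda: bytearray(size), tracing=False)
# With tracing on, the same allocation is attributed to tracemalloc.
tracked = traced_bytes_for(lambda: bytearray(size), tracing=True)
print(untracked, tracked)
```

When tracing is off, the bookkeeping is skipped entirely, which is why the wrapper's per-allocation overhead is expected to be small outside of tests that explicitly start tracemalloc.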

py-polars/tests/unit/io/test_parquet.py (resolved)
itamarst and others added 2 commits March 27, 2024 09:08
Co-authored-by: Stijn de Gooijer <stijndegooijer@gmail.com>
@itamarst
Contributor Author

@stinodego back to you.

@itamarst itamarst requested a review from stinodego March 27, 2024 13:15
Member

@stinodego stinodego left a comment

All right, I'm fine merging this as-is. Letting @ritchie46 make the final call, in case I missed something here.

Member

@ritchie46 ritchie46 left a comment

Looks great! Thank you @itamarst

@ritchie46 ritchie46 merged commit 9c46183 into pola-rs:main Mar 28, 2024
15 checks passed
Labels
internal (An internal refactor or improvement), python (Related to Python Polars)
Development

Successfully merging this pull request may close these issues.

Add Python test infrastructure for testing memory usage limits
4 participants