
Very large test suites take *far* too much RAM #619


Open
pytestbot opened this issue Oct 23, 2014 · 17 comments
Labels
status: help wanted · type: performance

Comments

@pytestbot
Contributor

Originally reported by: Alex Gaynor (BitBucket: alex_gaynor, GitHub: alex_gaynor)


The cryptography test suite on OS X -- which comprises 149,150 test functions (almost entirely generated with @pytest.mark.parametrize) -- results in a CPython 2.7 process that, after collection, takes about 1.26 GB.

That works out to about 8 KB per test (1.26 GB / 149,150 tests). This seems like far too much; it would be fantastic to bring it down significantly.


@pytestbot
Contributor Author

Original comment by Alex Gaynor (BitBucket: alex_gaynor, GitHub: alex_gaynor):


Link to the repo: https://github.com/pyca/cryptography. The suite can easily be run with pip install -r dev-requirements.txt; py.test

@pytestbot
Contributor Author

Original comment by holger krekel (BitBucket: hpk42, GitHub: hpk42):


Thanks for the report and the reproduction steps. Did you try to analyze it yourself? What would you use to analyze RAM usage these days?

@pytestbot
Contributor Author

Original comment by Alex Gaynor (BitBucket: alex_gaynor, GitHub: alex_gaynor):


Eep, email just showed up this morning!

I haven't tried yet. It works on py3k, so I'd probably start with https://docs.python.org/3/library/tracemalloc.html
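
For reference, a minimal sketch of how such a measurement could look (the tests/ path is illustrative): run collection only, then print the top allocation sites.

import tracemalloc
import pytest

tracemalloc.start()
# Collect without running tests, so the numbers reflect collection overhead.
pytest.main(["--collect-only", "-q", "tests/"])
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB")
# The ten call sites that allocated the most memory during collection.
for stat in tracemalloc.take_snapshot().statistics("lineno")[:10]:
    print(stat)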

@pytestbot added the type: bug label Jun 15, 2015
@DuncanBetts
Contributor

Is this because Python creates a first-class object for every test function?
I ask because this issue brought that to mind.

@nicoddemus
Member

Yes, at the very least pytest creates a Function object for each test function, and there are other utility objects that may be created depending on the features the test uses (for example, parametrization).

In general pytest has been optimized for speed rather than memory usage (there are a lot of caches, for example).
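
As a quick illustration -- a sketch using pytest's standard collection hook -- a conftest.py can show that every collected test is one of these objects:

# conftest.py
def pytest_collection_modifyitems(session, config, items):
    # Each collected test is an item object (Function for plain test
    # functions); with huge parametrizations, len(items) is what explodes.
    kinds = {type(item).__name__ for item in items}
    print(f"collected {len(items)} items of types: {kinds}")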

@Zac-HD added the status: help wanted and type: performance labels and removed the type: bug label Oct 21, 2018
@ocehugo

ocehugo commented Mar 4, 2019

Sorry for resurrecting this issue, but I'm trying to find an alternative way to reduce the memory usage of parametrization:

import numpy as np
import pytest

one = range(5)
two = range(30)
three = range(9000)
# Four large arrays, roughly 1.6 GB each (50 * 2000 * 2000 float64 values).
four = [np.zeros([50, 2000, 2000]), np.ones([50, 2000, 2000]),
        np.zeros([50, 2000, 2000]) + 2, np.zeros([50, 2000, 2000]) + 3]


@pytest.mark.parametrize("A", four)
@pytest.mark.parametrize("B", three)
@pytest.mark.parametrize("C", two)
@pytest.mark.parametrize("D", one)
def test_highmem(A, B, C, D):
    # np.all avoids the ValueError the builtin all() raises on a
    # multi-dimensional array.
    assert np.all(A >= 0)
    assert B == C == D

This quad-loop code copies the numpy arrays into every test, and memory usage skyrockets.
Even with the A parametrization removed it still uses a lot of memory (~5.4 GB on my machine), and collecting the 1,350,000 test functions is also very slow.

Is there a special function call to avoid the copies? Any idea other than coding for-loops inside a single test and printing the success of each case?

My use case is similar to the above, but A is a list of objects for which coverage should be complete [100%]. These objects are "read-only" in the sense that they do not mutate during the test.

PS: All of the above is also supposed to run under pytest-xdist to minimize the overall runtime.

@earonesty
Contributor

earonesty commented Jan 9, 2020

A large portion of this problem is that pytest retains references to the stack for error reporting. I couldn't find a way to disable this; --tb=no and --assert=plain still cause a failure to garbage-collect things.
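
Here is an untested sketch of a probe for this: the first test fails on purpose while holding a local object, and the second checks via a weak reference whether that object could be garbage-collected.

import gc
import weakref

_refs = []

class Big:
    # Hypothetical stand-in for a large object held in a test's locals.
    pass

def test_fails_holding_local():
    obj = Big()
    _refs.append(weakref.ref(obj))
    assert False, "intentional failure"

def test_local_was_collected():
    gc.collect()
    # If pytest still references the failing frame, the weakref is alive
    # and this assertion fails.
    assert _refs[0]() is None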

@nicoddemus
Member

@earonesty you are correct that pytest retains a reference to the stack (actually to the exception object) for error reporting. For passing tests it should not retain anything, though.

pytest.mark.parametrize will always evaluate the parameter set during collection, so this will be a problem for huge parametrizations.
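
One way to keep huge objects out of the parameter sets evaluated at collection -- a sketch, not a dedicated pytest feature -- is to parametrize over lightweight indices and build (and cache) the heavy values lazily in a fixture:

import functools

import numpy as np
import pytest

@functools.lru_cache(maxsize=None)
def make_array(i):
    # Built on first use and cached, so each array exists at most once.
    return np.full([50, 2000, 2000], float(i))

@pytest.fixture(params=range(4))
def big_array(request):
    # Only the integer index is stored at collection time.
    return make_array(request.param)

@pytest.mark.parametrize("d", range(5))
def test_highmem(big_array, d):
    assert big_array.shape == (50, 2000, 2000)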

I would also like to point to pytest-subtests (which is still under development); it might be a temporary solution to this problem.
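
With pytest-subtests, the parameter space moves from collection time to run time; a minimal sketch:

# Requires pip install pytest-subtests, which provides the subtests fixture.
def test_many_cases(subtests):
    for i in range(9000):
        # Each iteration is reported as a sub-test, but only one test
        # item exists at collection time.
        with subtests.test(i=i):
            assert i >= 0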

@earonesty
Contributor

earonesty commented Jan 9, 2020

For this reason, I have a whole set of memory-clearing and __del__ tests that I still run under nosetests because they won't work under pytest.

@nicoddemus
Member

> For this reason, I have a whole set of memory-clearing and __del__ tests that I still run under nosetests because they won't work under pytest.

Some questions:

  1. Do your tests use parametrize?
  2. Fixtures?
  3. All tests pass and they still take too much RAM?

@earonesty
Contributor

earonesty commented Apr 24, 2020

  1. Not yet; I'm trying to convert a test suite that passes under nose to pytest.
  2. Yes, we have whole suites of complex fixtures, some of which spin up multiple websocket servers, etc.
  3. Never gotten that far. Usually what happens is 100 tests pass, one fails, then another 100 pass, then another fails... then more and more start failing until all I see are "EEEEEEEEEEE". It seems to be OK with 1 or 2 failures, but with a dozen or more I'm toast.
  4. I can try organizing tests into folders and running isolated subsets on separate command lines.

@ordamariPalo

Just ran into this bug. Running a session with 3k tests results in RAM usage growing steadily without being garbage-collected.

  1. We are using parametrization.
  2. We are using fixtures.
  3. It doesn't matter whether the tests pass or fail.

Is there any solution or progress on this bug?

@nicoddemus
Member

Not at the moment; we would need access to a repository which reproduces the problem.

TBH I don't think there's a bug in pytest per se, but there must be some fixture/parametrization which is wrongly causing pytest to consume too much RAM.

At work we have complex fixtures at all scopes, parametrization, and many test suites where 3k+ tests are usual, and we don't see any of the problems reported here; that's why I don't think it is necessarily a bug in pytest itself.

@earonesty
Contributor

earonesty commented Feb 4, 2022

FYI: this repo does not reproduce the problem. It has a very simple 6 tests: https://github.com/AtakamaLLC/pysecbytes.

Running it with pytest works, meaning memory is freed, so this problem is definitely not intrinsic to pytest; it is more likely, as you said, an interaction with a specific use of pytest.

If I can modify this fairly minimal test suite to make it fail under pytest (memory leaked) but succeed under nose (memory freed), that would be a way to isolate the real issue.

@sawatts

sawatts commented Aug 14, 2023

We have an issue with a single, undecorated test case blowing up memory usage 100x when run under pytest.

This may not be your usual test case -- it is a load test which generates 100 concurrent calls each to two systems, runs for an extended period, and reviews the call metrics. So 200 threads for, say, an hour each.

The baseline memory footprint is just under 1 GB. When run with pytest this grows to over 100 GB -- which is why it kept failing in our automated test framework. (The 1 GB baseline is excessive to start with!)

pytest 3.3.2 on Python 3.8 (Python version dictated by the framework), on Linux (RHEL 7).

@RonnyPfannschmidt
Member

Pytest 3 is so phenomenally outdated that we can't support it; please get help from your vendor.

@gurka-adrian

Hello, I hope this is relevant. We have a similar issue with our project: we have a rather large test suite, and when many tests fail (which is common), we run out of memory very quickly. We believe this is because pytest stores variables for failed test cases, which is generally desired behavior; however, it causes problems for us, as even 120 GB of RAM is not really enough to run the whole suite.

Is there any way we could run the test suite without storing the state of failed test cases? Really, we'd like to run the suite with failed test cases treated the same as passed ones -- without storing the failing state.

We're running on Python 3.11 and pytest 7.4.0. Most of the tests are parametrized, if that's relevant.
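
One untested workaround sketch (the hookwrapper pattern and attribute names are standard pytest API; the memory savings are the assumption being tested): a conftest.py hook that replaces each failure's stored representation with a one-line summary. Whether this frees enough memory depends on what actually holds the references in your suite.

# conftest.py -- untested sketch, not a supported pytest option.
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.failed and call.excinfo is not None:
        # Keep only a short string instead of the full traceback object graph.
        report.longrepr = f"{item.nodeid}: {call.excinfo.exconly()}"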
