Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infection fails with "Allowed memory size ... exhausted" for a project with a really big coverage #705

Closed
maks-rafalko opened this issue Jun 17, 2019 · 33 comments · Fixed by #1106
Assignees

Comments

@maks-rafalko
Copy link
Member

Question Answer
Infection version 0.13.2
Test Framework version PHPUnit
PHP version 7.2
Github Repo https://github.com/vimeo/psalm

Extracted from https://twitter.com/sorsoup/status/1140021198765658118

Infection seems to be crashing when I try to run it to check the test suite of Psalm - output at (link: https://circleci.com/gh/bdsl/psalm/232) circleci.com/gh/bdsl/psalm/…

Don't know if I'm doing anything wrong.

Command line:

php -d memory_limit=12G ~/sites/infection/bin/infection --coverage=build/phpunit --only-covered --threads=4

Output:

     ____      ____          __  _
    /  _/___  / __/__  _____/ /_(_)___  ____
    / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
  _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
 /___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/

Running initial test suite...

PHPUnit version: 7.5.12

    1 [============================] < 1 sec

Generate mutants...

Processing source code files:   0/543

PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted (tried to allocate 20480 bytes) in /infection/src/TestFramework/PhpUnit/Coverage/CoverageXmlParser.php on line 170

So it fails with when the coverage data is being accumulated in the following lines of the code (depending on how much memory_limit is set

Unfortunately, code coverage takes 2+ hours for Psalm, so here is a compressed archive of needed coverage files: https://www.dropbox.com/s/66tvd0tctfbdki3/phpunit.bz?dl=0 for commit vimeo/psalm@4c57c67

@bdsl
Copy link
Contributor

bdsl commented Jun 17, 2019

Unfortunately, code coverage takes 2+ hours for Psalm,

I guess this is using xdebug. In the Psalm CircleCI set up coverage data is collected using PCOV, and takes around ten minutes.

@bdsl
Copy link
Contributor

bdsl commented Jun 17, 2019

Do you know why I don't see the message about allowed memory size? I did see that before I added -d memory_limit=4G but it isn't there in my recent builds, e.g. https://circleci.com/gh/bdsl/psalm/353

@sanmai
Copy link
Member

sanmai commented Jun 18, 2019

Allowed memory size of 268435456 bytes exhausted

That's 256 Mb for short. It seems like memory_limit didn't stick through.

@sanmai
Copy link
Member

sanmai commented Jun 18, 2019

@bdsl if you ssh into that image, will be there more details? It may be hitting some outside limits, like if that VM has no swap, and not 4 Gb of RAM.

@maks-rafalko
Copy link
Member Author

That's 256 Mb for short. It seems like memory_limit didn't stick through.

I've copied it from local machine. When I run it with php -dmemory_limit=12G it fails with Allowed memory size of 12884901888 bytes exhausted. So there is no issue with passing memory_limit to the infection process. At least locally.

The issue is indeed in the very big coverage data files (~2Gb in total).

@maks-rafalko
Copy link
Member Author

I've created a PR with some minor changes that allowed me to run Infection for Psalm with memory_limit=12G without that fatal error.

This is not a final fix, just a few minor (but still useful) improvements.

@bdsl I'm pretty sure, you need to run Infection for Psalm only for changed files. This will dramatically reduce the mutation time. I'm not sure Infection is ready for such many mutations:

Processing source code files: 543/543
Creating mutated files and processes:    0/20516

You can see how we do it for Infection itself.

The further investigation of what is going on with coverage data is still needed, the issue is not resolved.

@bdsl
Copy link
Contributor

bdsl commented Jun 18, 2019

@sanmai I've re-run the job and sshd into the docker container at CircleCI. Not sure what details you need though. Maybe these will help:

docker@8aa2179df9f7:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            68G         12G         14G        343M         41G         54G
Swap:            0B          0B          0B

docker@8aa2179df9f7:~$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
10  0      0 8389180 12677688 36648404    0    0   163  2581   25   17 18  6 76  0  0

docker@8aa2179df9f7:~$ swapon -s    
docker@8aa2179df9f7:~$ 

@bdsl
Copy link
Contributor

bdsl commented Jun 18, 2019

Thank you @maks-rafalko . I've subscribed to that PR and will try to give it a go, probably after it's merged to master and maybe not until it's in a release of Infection - I don't have a lot of time to dedicate to this unfortunately.

Will also try to give running only for changed files a go soon.

Roughly how long did it take you to run Infection for Psalm with the 12GB limit?

@maks-rafalko
Copy link
Member Author

maks-rafalko commented Jun 18, 2019

Roughly how long did it take you to run Infection for Psalm with the 12GB limit?

I ctrl+c'd it because it was too slow :(

As I said, that PR is just a small starting step to fix performance issues. Files are being processed too slow because of a big coverage array. We need to run Infection with some profilers and brainstorm about possible ways to improve it. I have some crazy ideas with APCu / Redis, but this is just early thoughts.

@bdsl
Copy link
Contributor

bdsl commented Jun 26, 2019

@maks-rafalko Thanks again for looking in to this. I finally gave your suggestion of just checking changed files a go, but without much luck, as you can see at https://circleci.com/gh/bdsl/psalm/596 . After that I ran into some bash syntax errors, so for now I'm afraid I've made a PR to stop running infection in the Psalm pipeline. vimeo/psalm#1850

@sanmai
Copy link
Member

sanmai commented Jun 27, 2019

Let's not hurry, please give us a chance. We'll get to the bottom of it. For once we can even start using Circle CI with Infection.

@bdsl
Copy link
Contributor

bdsl commented Jun 27, 2019

Hi @sanmai . I wasn't trying to hurry you, but I also didn't want to keep an ineffective step in the Psalm pipeline for too long, since it adds complexity for other Psalm contributors wanting to understand that pipeline.

I'm still subscribed to this issue and very happy to give it another go in future.

@theofidry theofidry added Bug and removed Performance labels Jan 7, 2020
@helpr helpr bot added the Has PR label Mar 5, 2020
sanmai added a commit to sanmai/infection that referenced this issue Mar 6, 2020
This PR is a continuation of infection#1082. Primary goal of this PR is memory optimization. Fixes infection#705.

With big projects with huge coverage reports Infection still has a lot of difficulty. Major part of this problem stems from the fact that Infection loads virtually all PHPUnit coverage reports at the start, and then goes one by one over source files looking up each file in the reports to understand if it need mutating, where, how, and so on. Not only it is expensive to load all reports at once, but also working with them isn't fast. Basically, it is one big hash table, where Infection tries to access rows from here and from there.

- Yes, it may worth adding a cache, because, say, methods of XMLLineCodeCoverage are called sequentially over data for the same file, but this isn't what I want to propose.
- Yes, with some careful ArrayAccess application you can make the $coverage array clever to load data only as needed, and neither this is what I want to propose.

What's the idea?

If you'd look at the PHPUnit coverage reports, and how we parse them, you would notice that the coverage reports have the same file names we need. Therefore, instead of iterating over files on disk, we can iterate over coverage reports, parsing them one by one, and discarding them once we finish mutating a file.

There are some problems:

- We not only consider these reports, we also consider JUnit reports, joining them together. This procedure had to be split into two. First we load coverage files, select files we want to be mutated (covered, uncovered, filtered), and only then we add JUnit timings, proceeding with mutation.
- For BC reasons we have to consider files outside of coverage reports, therefore we collect them too, adding missing files at the end, but only if needed. Suffice to say if we'd went half-way towards proposal infection#1064, this part could be removed from our pipeline, chances are together with the configuration step where user enters source paths.
- Overall I had to reshuffle a lot of things, while trying to not to rename them too much. E.g. old tests for the new parser should probably work, and so on.
sanmai added a commit that referenced this issue Mar 10, 2020
* Use coverage report as a primary source of files to mutate

This PR is a continuation of #1082. Primary goal of this PR is memory optimization. Fixes #705.

With big projects with huge coverage reports Infection still has a lot of difficulty. Major part of this problem stems from the fact that Infection loads virtually all PHPUnit coverage reports at the start, and then goes one by one over source files looking up each file in the reports to understand if it need mutating, where, how, and so on. Not only it is expensive to load all reports at once, but also working with them isn't fast. Basically, it is one big hash table, where Infection tries to access rows from here and from there.

- Yes, it may worth adding a cache, because, say, methods of XMLLineCodeCoverage are called sequentially over data for the same file, but this isn't what I want to propose.
- Yes, with some careful ArrayAccess application you can make the $coverage array clever to load data only as needed, and neither this is what I want to propose.

What's the idea?

If you'd look at the PHPUnit coverage reports, and how we parse them, you would notice that the coverage reports have the same file names we need. Therefore, instead of iterating over files on disk, we can iterate over coverage reports, parsing them one by one, and discarding them once we finish mutating a file.

There are some problems:

- We not only consider these reports, we also consider JUnit reports, joining them together. This procedure had to be split into two. First we load coverage files, select files we want to be mutated (covered, uncovered, filtered), and only then we add JUnit timings, proceeding with mutation.
- For BC reasons we have to consider files outside of coverage reports, therefore we collect them too, adding missing files at the end, but only if needed. Suffice to say if we'd went half-way towards proposal #1064, this part could be removed from our pipeline, chances are together with the configuration step where user enters source paths.
- Overall I had to reshuffle a lot of things, while trying to not to rename them too much. E.g. old tests for the new parser should probably work, and so on.

* Bring back FileCodeCoverageProvider

* s/createFor/provideFor/ as in #1107

* Integrate FileCodeCoverageProvider

* PhpUnitXmlCoveredFileDataFactory -> Provider

* Configuration/ConfigurationTest

* ConfigurationFactoryTest

* Fix for updated configuration

* Fix PhpUnitXmlCoveredFileDataProviderTest

* FileCodeCoverageProviderFactoryTest

* FileMutationGeneratorTest

* Add stubs for new tests

* Fix MutationGeneratorTest

* Fix CoverageChecker

* WIP

* FileCodeCoverageProviderTest

* Update XmlCoverageParser

* Update src/FileSystem/SourceFileFilter.php

Co-Authored-By: Théo FIDRY <theo.fidry@gmail.com>

* CS

* CS

* SourceFileFilterTest

* TestFileDataAdderTest

* CoveredFileNameFilterTest

* Use parse() instead of parseLazy(), LegacyXmlCoverageParser

* CoveredFileDataTest

* Introduce SourceFileInfoProvider

* CS

* CoveredFileDataFactoryTest

* SourceFileInfoProviderTest

* Update XmlCoverageFixtures

* Update SourceFileInfoProviderTest

* XmlCoverageParserTest

* CS

* XmlCoverageParserTest

* CoveredFileDataFactoryTest

* Bump MSI

* CS

* CS

* CS

* CS

* XPathFactory

* Update ProjectCodeProvider

* Simplify XmlCoverageParser

* Fix ProjectCodeProvider

* Revert change

* Update XmlCoverageParserTest

* Simplify XmlCoverageParser

* Remove @see

* Update src/TestFramework/Coverage/CoveredFileData.php

* CS

* SourceFileFilter: comment

* Make testExecutionInfoAdder a Generator

* Update src/FileSystem/SourceFileFilter.php

Co-Authored-By: Théo FIDRY <theo.fidry@gmail.com>

* Update tests/phpunit/FileSystem/SourceFileFilterTest.php

Co-Authored-By: Théo FIDRY <theo.fidry@gmail.com>

* Update src/TestFramework/PhpUnit/Coverage/IndexXmlCoverageParser.php

Co-Authored-By: Théo FIDRY <theo.fidry@gmail.com>

* CS

* CoverageFileData -> CoverageReport

* s/seenFiles/filesSeen/g

* s/TestFileDataAdder/JUnitTestExecutionInfoAdder/

* CS

* Update tests/phpunit/FileSystem/SourceFileFilterTest.php

Co-Authored-By: Théo FIDRY <theo.fidry@gmail.com>

* CS

* s/assertFilter/assertCanFilterInput/g

* SourceFileFilterTest

* FileMutationGenerator

* CoveredFileData -> SourceFileData

* s/coveredFileData/sourceFileData/g

* MutationGenerator

* Document MutationGenerator

* Remove CoveredFileNameFilter

* s/coverageFileData/coverageReport/g

* MutationGenerator: just files

* ->getSourceFilesFilter()

* SourceFileFilter: missing imports

* SourceFileFilter: notes

* CS

* SourceFileData: remove getRealPath

* Add TODOs

* Update src/TestFramework/Coverage/SourceFileDataProvider.php

* Update src/TestFramework/Coverage/SourceFileDataFactory.php

* Revert "SourceFileData: remove getRealPath"

This reverts commit 26a0c4e.

* SourceFileData

* $testFrameworkKey can now be removed since it's no longer in use

* CS

Co-authored-by: Théo FIDRY <theo.fidry@gmail.com>
@helpr helpr bot added PR merged and removed Has PR labels Mar 10, 2020
@sanmai
Copy link
Member

sanmai commented Mar 10, 2020

@bdsl in the master, and in the next version, huge coverage files are no longer an issue.

You may try to integrate Infection with a smaller subset of mutators:

{
    ....
    "mutators": {
        "@removal": true,
        "@return_value": true,
        "@zero_iteration": true
    }
}

Although I can tell you right away it isn't going to be quick for reasons beyond our control. It should be smooth yet, but not quick.

@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

@sanmai Great, thanks for pinging me. I'm trying it out now with Psalm: https://github.com/bdsl/psalm/commits/start-mutation-testing-again

Assuming this works I'll most likely make a PR to add infection back into the Psalm test pipeline once thse memory optimisations are in a full release.

@maks-rafalko
Copy link
Member Author

@sanmai, shouldn't @bdsl use --no-progress & --only-covered to get the maximum of your optimizations?

@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

I'm already using --only-covered. Happy to try adding --no-progress and see how that affects things.

https://github.com/bdsl/psalm/blob/699cb3e284a4abdba3e16686f7fa1d05dc12489e/.circleci/config.yml#L93 (not sure why github doesn't want to automatically show that line):

          command: php -d memory_limit=4G ~/.composer/vendor/bin/infection --coverage=build/phpunit --only-covered --threads=2 || echo 'Temporarily ignoring unexplained infection failure'

@maks-rafalko
Copy link
Member Author

more context for --no-progress #1082

@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

Looks like maybe infection no longer works via a composer global install? https://app.circleci.com/pipelines/github/bdsl/psalm/218/workflows/5dfa1e19-20de-492c-9a58-95c9a03f61f9/jobs/934

#!/bin/bash -eo pipefail
php -d memory_limit=4G ~/.composer/vendor/bin/infection --coverage=build/phpunit --only-covered --threads=2 || echo 'Temporarily ignoring unexplained infection failure'


Fatal error: Uncaught OutOfBoundsException: Required package "infection/infection" is not installed: check your ./vendor/composer/installed.json and/or ./composer.lock files in /home/docker/project/vendor/ocramius/package-versions/src/PackageVersions/Versions.php:89
Stack trace:
#0 /home/docker/.composer/vendor/infection/infection/src/Console/Application.php(81): PackageVersions\Versions::getVersion('infection/infec...')
#1 /home/docker/.composer/vendor/infection/infection/bin/infection(83): Infection\Console\Application->__construct(Object(Infection\Container))
#2 {main}
  thrown in /home/docker/project/vendor/ocramius/package-versions/src/PackageVersions/Versions.php on line 89
Temporarily ignoring unexplained infection failure

CircleCI received exit code 0

@maks-rafalko
Copy link
Member Author

I see Log In to CircleCI opening this link. Could you please past the result here or open a new issue?

@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

@maks-rafalko Edited my comment above.

@maks-rafalko
Copy link
Member Author

ok, extracted to #1148

we also have another issue related to ocramius/package-versions that might be related #876

@sanmai
Copy link
Member

sanmai commented Mar 10, 2020

@bdsl

  • Please definitely use --no-progress and --only-covered will help too.
  • I can't see from your environment, but PCOV is also a must.
  • If you can, try limiting mutation testing to unit-tests only.

If you have several long integration tests, covering a lot of lines, Infection will run them for every line they cover. This is going to take as much time as these tests take, multiplied by the number of mutations, divided by the number of threads.

You can exclude a certain group with --test-framework-options="--exclude-group=long", or by using a specialised phpunit.xml.

I should have mentioned these points before, but for some reason thought you'd be waiting for a release. My mistake.

@sanmai
Copy link
Member

sanmai commented Mar 10, 2020

I'm pretty sure there will be no memory-related issues, but lets have @bdsl confirm this is indeed the case.

@sanmai sanmai reopened this Mar 10, 2020
@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

Psalm isn't well covered with actual unit tests - most features are tested simply by passing samples of PHP code that should have and not have issues detectable by psalm and asserting on the returned set of issues - see vimeo/psalm#1788 (comment)

We are using pcov.

I have infection part way through a run in an ssh session to circleci right now - so far it's tested just over 400 mutants, so it's looking good.

@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

... and now it's died I'm afraid:

docker@6ac31042464b:~/project$ php -d memory_limit=4G vendor/bin/infection --coverage=build/phpunit --no-progress --only-covered --threads=2

    ____      ____          __  _
   /  _/___  / __/__  _____/ /_(_)___  ____
   / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
 _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/

Infection - PHP Mutation Testing Framework dev-master@42b722334a3818a9156ff361dc081d23cf5e6528

Running initial test suite...

PHPUnit version: 8.5.2

.: killed, M: escaped, S: uncovered, E: fatal error, T: timed out

....MM.M...M.T....M..M.....MMMMMM..TM...MMM.......   (   50)
..................................................   (  100)
.................T................TT..TT..T.......   (  150)
...........................T......................   (  200)
....................T..T..MM..MTT..M.....MT.......   (  250)
T.......TTTT...T....T.T...T....M.T.TMT.T.T..TTT.T.   (  300)
.........TMMM.TTTT..................T.MM.......TM.   (  350)
..M.M.M..M..TT.....MM..M.M......TT..TMM..........T   (  400)
..M..T...T.T..T
Fatal error: Allowed memory size of 4294967296 bytes exhausted (tried to allocate 4294967304 bytes) in /home/docker/project/vendor/infection/infection/src/Mutator/NodeMutationGenerator.php on line 128

@sanmai
Copy link
Member

sanmai commented Mar 10, 2020

Fun times. Just to confirm, are you using a master version for Psalm?

@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

@sanmai Yes, the direct comparison between my branch and Psalm master is https://bit.ly/2Q4LpEx . Only the circleci config file is changed. I had to re-post the comment because the permalink bot is editing the link repeatedly and getting it wrong if I don't use bit.ly

@bdsl
Copy link
Contributor

bdsl commented Mar 10, 2020

Circle CI only provides 4GB ram in containers on the free plan, so I'm not sure whether this would work if we upped the memory limit. I should be able to try it on my own machine but I need to sleep and do other stuff first. Have a good day / night Maks and Alexey, let's talk again soon.

@sanmai
Copy link
Member

sanmai commented Mar 11, 2020

Have a good night, and good morning. It's early morning for me, so no worries here.

Thank you for linking vimeo/psalm#1788.

Now, if you take a closer look at the report:

find build/*/ -type f  | xargs ls -sGSrh | tail

You'll notice there are several big coverage reports:

  69M build/logs/coverage-xml/Psalm/Internal/Type/ParseTree.php.xml
  80M build/logs/coverage-xml/Psalm/Internal/Codebase/Reflection.php.xml
  85M build/logs/coverage-xml/Psalm/Internal/Type/TypeCombination.php.xml
  89M build/logs/coverage-xml/Psalm/Internal/Analyzer/TypeAnalyzer.php.xml
 146M build/logs/coverage-xml/Psalm/Type.php.xml
 163M build/logs/coverage-xml/Psalm/Internal/Analyzer/CommentAnalyzer.php.xml
 171M build/logs/coverage-xml/Psalm/Internal/Codebase/Populator.php.xml
 382M build/logs/coverage-xml/Psalm/Internal/Visitor/ReflectorVisitor.php.xml

As a rule of thumb, Infection needs twice the RAM for a given coverage report. E.g. it'll need 800M for 400M report. We see here that a single report shouldn't be a concern with 4GB available.

Before Infection would try to load all reports at once. What Infection does now it loads these reports one by one, while trying to work with them, and it can so happen that Infection has several these reports loaded at once, blowing the memory limit.

Simplest thing here to do is to remove largest of the reports, and remove their entires from the index.xml.

rm build/*/coverage-xml/Psalm/Type.php.xml
rm build/*/coverage-xml/Psalm/Internal/Analyzer/CommentAnalyzer.php.xml
rm build/*/coverage-xml/Psalm/Internal/Codebase/Populator.php.xml
rm build/*/coverage-xml/Psalm/Internal/Visitor/ReflectorVisitor.php.xml
edit build/*/coverage-xml/index.xml

This will make corresponding files effectively uncovered, but would let Infection shuffle along.

What we can do here Infection-wise:

  • The proper thing to do is to use a different, streaming, XML parser. There are several of them. It's a big job, and I wish I was mistaken here.

  • As a workaround, we can control for memory limit, and do not load/queue further jobs.

    • ParallelProcessRunner controls the queue, but if done inside ParallelProcessRunner this'll make things slower sometimes without a reason.
    • IndexXmlCoverageParser can postpone loading certain files if it knows there's no way to fit them into memory. But this will require it to either pass a No-Op mutations, or communicate with ParallelProcessRunner thought events to trigger freeTerminatedProcesses.
    • Both workarounds will require a memory limit set in the first place, which is not guaranteed.

I cannot say I'm a fan of any of the workarounds.

@sanmai
Copy link
Member

sanmai commented Mar 11, 2020

With Psalm there's another problem: there are hidden dependencies between tests. Therefore Infection would not reliably detect all mutations. This is a much bigger problem, so I guess we'll have to just keep that in mind. Verify with:

php vendor/bin/phpunit --order-by=random --stop-on-defect

If it fails, means there's an undeclared dependency.

@theofidry
Copy link
Member

@sanmai can it be that we have a leak still?

@sanmai
Copy link
Member

sanmai commented Mar 11, 2020

@theofidry I was able to run Infection with a smaller subset of profiles, and with timeout of one second.

@maks-rafalko
Copy link
Member Author

we did many changes to run Infection for big projects like Psalm, I don't think we need to keep it open.

Really big projects can/should run Infection for a diff (modified and/or added files) only

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants