Infection fails with "Allowed memory size ... exhausted" for a project with a really big coverage #705
I guess this is using Xdebug. In the Psalm CircleCI setup, coverage data is collected using PCOV and takes around ten minutes.
Do you know why I don't see the message about allowed memory size? I did see that before I added
That's 256 MB, in short. It seems like
@bdsl if you SSH into that image, will there be more details? It may be hitting some outside limits, e.g. if that VM has no swap and less than 4 GB of RAM.
I've copied it from my local machine. The issue is indeed in the very big coverage data files (~2 GB in total).
I've created a PR with some minor changes that allowed me to run Infection for Psalm. This is not a final fix, just a few minor (but still useful) improvements.

@bdsl I'm pretty sure you need to run Infection for Psalm only for changed files. This will dramatically reduce the mutation time. I'm not sure Infection is ready for this many mutations:

```
Processing source code files: 543/543
Creating mutated files and processes: 0/20516
```

You can see how we do it for Infection itself. Further investigation of what is going on with the coverage data is still needed; the issue is not resolved.
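Mutating only changed files usually means feeding Infection a file list derived from `git diff`. A minimal sketch, under stated assumptions: Infection's `--filter` option accepts a comma-separated file list, `master` is the base branch, and the three sample paths below are made up for illustration.

```shell
# Stand-in for `git diff master --name-only` (hypothetical sample output);
# keep only PHP files and join them with commas for --filter.
changed=$(printf 'src/Checker.php\nsrc/Parser.php\nREADME.md\n' \
  | grep '\.php$' \
  | paste -s -d, -)
echo "$changed"
# infection --filter="$changed" --threads=4   # mutate only these files
```

In a real pipeline the `printf` line is replaced by the actual `git diff` against your base branch.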
@sanmai I've re-run the job and SSH'd into the Docker container at CircleCI. Not sure what details you need, though. Maybe these will help:
Thank you @maks-rafalko. I've subscribed to that PR and will try to give it a go, probably after it's merged to master, and maybe not until it's in a release of Infection; I don't have a lot of time to dedicate to this, unfortunately. I'll also try running it only for changed files soon. Roughly how long did it take you to run Infection for Psalm with the 12GB limit?
As I said, that PR is just a small first step toward fixing the performance issues. Files are processed too slowly because of a big coverage array. We need to run Infection with some profilers and brainstorm possible ways to improve it. I have some crazy ideas involving APCu / Redis, but these are just early thoughts.
@maks-rafalko Thanks again for looking into this. I finally gave your suggestion of just checking changed files a go, but without much luck, as you can see at https://circleci.com/gh/bdsl/psalm/596. After that I ran into some bash syntax errors, so for now I'm afraid I've made a PR to stop running Infection in the Psalm pipeline: vimeo/psalm#1850
Let's not hurry, please give us a chance. We'll get to the bottom of it. For once, we can even start using CircleCI with Infection.
Hi @sanmai. I wasn't trying to hurry you, but I also didn't want to keep an ineffective step in the Psalm pipeline for too long, since it adds complexity for other Psalm contributors trying to understand that pipeline. I'm still subscribed to this issue and very happy to give it another go in the future.
This PR is a continuation of infection#1082. The primary goal of this PR is memory optimization. Fixes infection#705.

Infection still has a lot of difficulty with big projects that have huge coverage reports. A major part of this problem stems from the fact that Infection loads virtually all PHPUnit coverage reports at the start, and then goes one by one over source files, looking up each file in the reports to understand whether it needs mutating, where, how, and so on. Not only is it expensive to load all reports at once, but working with them isn't fast either. Basically, it is one big hash table, where Infection tries to access rows from here and from there.

- Yes, it may be worth adding a cache, because, say, methods of XMLLineCodeCoverage are called sequentially over data for the same file, but this isn't what I want to propose.
- Yes, with some careful ArrayAccess application you can make the $coverage array clever enough to load data only as needed, but neither is this what I want to propose.

What's the idea? If you look at the PHPUnit coverage reports, and at how we parse them, you will notice that the coverage reports contain the same file names we need. Therefore, instead of iterating over files on disk, we can iterate over the coverage reports, parsing them one by one and discarding each once we finish mutating a file. There are some problems:

- We not only consider these reports, we also consider JUnit reports, joining them together. This procedure had to be split in two: first we load coverage files and select the files we want mutated (covered, uncovered, filtered), and only then do we add JUnit timings and proceed with mutation.
- For BC reasons we have to consider files outside of the coverage reports, therefore we collect them too, adding missing files at the end, but only if needed. Suffice it to say that if we went halfway towards proposal infection#1064, this part could be removed from our pipeline, possibly together with the configuration step where the user enters source paths.
- Overall I had to reshuffle a lot of things, while trying not to rename them too much. E.g. old tests for the new parser should probably still work, and so on.
@bdsl in master, and in the next version, huge coverage files are no longer an issue. You may try to integrate Infection with a smaller subset of mutators:
Although I can tell you right away that it isn't going to be quick, for reasons beyond our control. It should be smooth now, though not quick.
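For context, a smaller mutator subset is configured through the `mutators` section of Infection's configuration file. The sketch below is illustrative only: the profile names `@arithmetic` and `@boolean` come from Infection's built-in mutator profiles, and the source directory is an assumption.

```json
{
    "source": {
        "directories": ["src"]
    },
    "mutators": {
        "@arithmetic": true,
        "@boolean": true
    }
}
```

Enabling only a couple of profiles keeps the mutation count (and therefore the run time) far below what the full `@default` set would produce.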
@sanmai Great, thanks for pinging me. I'm trying it out now with Psalm: https://github.com/bdsl/psalm/commits/start-mutation-testing-again Assuming this works, I'll most likely make a PR to add Infection back into the Psalm test pipeline once these memory optimisations are in a full release.
I'm already using https://github.com/bdsl/psalm/blob/699cb3e284a4abdba3e16686f7fa1d05dc12489e/.circleci/config.yml#L93 (not sure why GitHub doesn't want to automatically show that line):
More context for
Looks like maybe Infection no longer works via a Composer global install? https://app.circleci.com/pipelines/github/bdsl/psalm/218/workflows/5dfa1e19-20de-492c-9a58-95c9a03f61f9/jobs/934
I see
@maks-rafalko Edited my comment above. |
If you have several long integration tests covering a lot of lines, Infection will run them for every line they cover. This is going to take as much time as these tests take, multiplied by the number of mutations, divided by the number of threads. You can exclude a certain group with

I should have mentioned these points before, but for some reason I thought you'd be waiting for a release. My mistake.
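The back-of-the-envelope estimate above can be made concrete. All numbers below are illustrative assumptions: a 30-second integration suite covering the mutated lines, the 20,516 mutations reported earlier in this thread, and 4 threads.

```shell
# worst case: every mutation re-runs the covering tests in full
test_time_s=30
mutations=20516
threads=4
echo "$(( test_time_s * mutations / threads )) seconds"
```

With these assumed inputs the estimate lands in the range of days, not minutes, which is why excluding long integration tests or reducing the mutation count matters so much.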
I'm pretty sure there will be no memory-related issues, but let's have @bdsl confirm this is indeed the case.
Psalm isn't well covered by actual unit tests: most features are tested simply by passing samples of PHP code that should and should not have issues detectable by Psalm, and asserting on the returned set of issues; see vimeo/psalm#1788 (comment). We are using PCOV. I have Infection part way through a run in an SSH session to CircleCI right now; so far it's tested just over 400 mutants, so it's looking good.
... and now it's died, I'm afraid:
Fun times. Just to confirm, are you using the master version for Psalm?
@sanmai Yes, the direct comparison between my branch and Psalm master is https://bit.ly/2Q4LpEx. Only the CircleCI config file is changed. I had to re-post the comment because the permalink bot keeps editing the link and getting it wrong if I don't use bit.ly.
CircleCI only provides 4 GB of RAM in containers on the free plan, so I'm not sure whether this would work if we upped the memory limit. I should be able to try it on my own machine, but I need to sleep and do other stuff first. Have a good day / night Maks and Alexey, let's talk again soon.
Have a good night, and good morning. It's early morning for me, so no worries here. Thank you for linking vimeo/psalm#1788. Now, if you take a closer look at the report:
You'll notice there are several big coverage reports:
As a rule of thumb, Infection needs twice the RAM of a given coverage report, e.g. it'll need 800 MB for a 400 MB report. We can see that a single report shouldn't be a concern with 4 GB available. Before, Infection would try to load all reports at once. What Infection does now is load these reports one by one while working with them, and it can happen that Infection has several of these reports loaded at once, blowing the memory limit. The simplest thing to do here is to remove the largest of the reports, and remove their entries from the
This will make the corresponding files effectively uncovered, but it would let Infection shuffle along. What we can do here, Infection-wise:
I cannot say I'm a fan of any of the workarounds.
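Applying the twice-the-report rule of thumb from above: with an illustrative 4096 MB container (the CircleCI free plan mentioned earlier) and hypothetical report sizes, a quick check of which reports are safe to keep looks like this.

```shell
available_mb=4096
# hypothetical sizes of individual coverage reports, in MB
for report_mb in 400 900 2100; do
  needed_mb=$(( report_mb * 2 ))          # rule of thumb: ~2x the report size
  if [ "$needed_mb" -le "$available_mb" ]; then
    echo "${report_mb}MB report: fits (${needed_mb}MB needed)"
  else
    echo "${report_mb}MB report: too big (${needed_mb}MB needed)"
  fi
done
```

By this estimate, a report around the ~2 GB total mentioned at the top of the thread is exactly the kind that would need to be dropped (or split) on a 4 GB container.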
With Psalm there's another problem: there are hidden dependencies between tests. Therefore Infection will not reliably detect all mutations. This is a much bigger problem, so I guess we'll just have to keep that in mind. Verify with:
If it fails, it means there's an undeclared dependency.
@sanmai can it be that we still have a leak?
@theofidry I was able to run Infection with a smaller subset of profiles, and with a timeout of one second.
We made many changes to run Infection for big projects like Psalm; I don't think we need to keep this open. Really big projects can/should run Infection only for a diff (modified and/or added files).
Extracted from https://twitter.com/sorsoup/status/1140021198765658118
Command line:

```shell
php -d memory_limit=12G ~/sites/infection/bin/infection --coverage=build/phpunit --only-covered --threads=4
```
Output:
So it fails when the coverage data is being accumulated, in the following lines of the code (depending on how high `memory_limit` is set):

- src/TestFramework/PhpUnit/Coverage/CoverageXmlParser.php:170
- src/TestFramework/Coverage/CodeCoverageData.php:220
Unfortunately, code coverage takes 2+ hours for Psalm, so here is a compressed archive of needed coverage files: https://www.dropbox.com/s/66tvd0tctfbdki3/phpunit.bz?dl=0 for commit vimeo/psalm@4c57c67