
Large memory consumption #4072

Closed
LuciferDevStar opened this issue Nov 11, 2020 · 52 comments

Comments

@LuciferDevStar commented Nov 11, 2020

Large memory consumption in a Bitbucket pipeline

I have 6 GB of RAM available for PHPStan and I run it with the following command:

vendor/bin/phpstan analyse -l 1 app -c app/config/phpstan.neon

My phpstan.neon:

parameters:
	parallel:
		maximumNumberOfProcesses: 1
		processTimeout: 3000.0

[screenshot from the Bitbucket pipeline]

Does somebody know why 6 GB of RAM is not enough? Is there some way to decrease memory consumption?

@ondrejmirtes (Member)

Hi, how big is the project you're analysing? How many files? What does PHPStan report at the end when you run it with -vvv?

@spaceemotion commented Nov 12, 2020

Same issue here: I am at 8 threads max, using 6 GB of RAM on a project with 6969 files (stuck at 1620).


Edit: I've rerun the analysis on a computer with a lot more RAM; it maxed out at 30 GB of memory with all 32 threads fully used.

@ondrejmirtes (Member)

I'm running PHPStan on a project with 10k files and 3GB RAM is enough for it. Most likely it has a problem with a few big files with some structure that it gets stuck on. For example files like this https://github.com/arron0/phpstan-issue-4026/blob/master/src/TexyConvert.php can consume a lot of RAM.

You should run with --debug and see which files take a long time to process. If you report what these files look like, I have more data points to optimize PHPStan. Also, you can exclude them with excludes_analyse until I get around to optimizing these scenarios.
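For reference, a minimal sketch of what such an exclusion could look like in a 0.12-era phpstan.neon (the paths below are placeholders, not files from this thread; newer releases renamed the option to excludePaths):

parameters:
	excludes_analyse:
		# hypothetical examples: list whatever --debug reports as slow or memory-hungry
		- app/presenters/SomeHugePresenter.php
		- app/generated/*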

@pyguerder

@ondrejmirtes Facing the same problem (excessive RAM usage making BB pipelines fail), I would like to investigate, but --debug only prints the list of files, not how much time or memory each one takes to process... Would it be possible to print more information in debug mode to help investigate this? Thanks in advance

@ondrejmirtes (Member)

Alright, if you run phpstan/phpstan dev-master with --debug -vvv, it will show output similar to this:

/Users/ondrej/Development/phpstan/build/PHPStan/Build/ServiceLocatorDynamicReturnTypeExtension.php
--- consumed 24 MB, total 78 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/LazyTypeNodeResolverExtensionRegistryProvider.php
--- consumed 4 MB, total 82 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/TypeNodeResolverExtensionRegistry.php
--- consumed 0 B, total 82 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/PhpDocInheritanceResolver.php
--- consumed 8 MB, total 90 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/TypeNodeResolverExtensionRegistryProvider.php
--- consumed 0 B, total 90 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/TypeStringResolver.php
--- consumed 2 MB, total 92 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/StubPhpDocProvider.php
--- consumed 8 MB, total 100 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/TypeNodeResolverAwareExtension.php
--- consumed 0 B, total 100 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/ConstExprNodeResolver.php
--- consumed 4 MB, total 104 MB
/Users/ondrej/Development/phpstan/src/PhpDoc/StubValidator.php
--- consumed 32 MB, total 136 MB

Implemented in: phpstan/phpstan-src@29f8938

@LuciferDevStar (Author)

PHPStan scans 3626 files in my project, and when I try --debug -vvv my output looks like this:

presenters/SettingLoyaltyClientProgramPresenter.php
--- consumed 0 B, total 250.5 MB
presenters/AdvertCarTagPresenter.php
--- consumed 0 B, total 250.5 MB
presenters/CarGearboxPresenter.php
--- consumed 2 MB, total 252.5 MB
presenters/SupervisorPresenter.php
--- consumed 0 B, total 252.5 MB
presenters/RentServicePresenter.php
--- consumed 2 MB, total 254.5 MB
presenters/SettingCarPresenter.php
--- consumed 4 MB, total 258.5 MB
presenters/OperationCategoryPresenter.php
--- consumed 0 B, total 258.5 MB
presenters/RentOfferedServicePresenter.php
--- consumed 0 B, total 258.5 MB
presenters/RentOccupancyPresenter.php
--- consumed 6 MB, total 264.5 MB

Most files consume 0–10 MB, but some of my presenters consume 220 MB, and I don't know what is wrong with these files because they look similar to the others. Is memory not freed continuously?

@pyguerder

Same observation here: I have a file that shows "0 B" and it is quite similar to another that shows ~ 120 MB.

@ondrejmirtes can I send you those two files privately?

@KacerCZ commented Nov 17, 2020

I have a project with 5500 files organized into several modules, with the memory limit for PHPStan set to 2 GB.
Analysis of the whole project runs out of memory, but analysis of the individual modules goes fine.
I believe there is some memory leak, rather than a single file causing excessive memory usage.

@ondrejmirtes (Member)

Does anyone have any tool recommendation to investigate memory leaks?

@szepeviktor (Contributor)

https://blackfire.io/ is the best on the market.
Has a free plan.

@ondrejmirtes (Member)

I tried it but the memory output isn't very useful. Can you use it to find the memory leak @szepeviktor?

@KacerCZ commented Nov 27, 2020

Using level 0/1/2/3 didn't change the consumed memory.
Returning an empty StatementResult at the beginning of PHPStan\Analyser\NodeScopeResolver::processStmtNode() greatly reduced the consumed memory.
Now I'm trying to figure out what is happening there and whether stored references to objects prevent them from being disposed.

@fezfez commented Jan 20, 2021

Maybe fixed by #4401?

@pyguerder

@fezfez Thank you very much for your investigation.

I can see a clear improvement using 0.12.70 compared to 0.12.68, but still not enough for me to use it on Bitbucket Pipelines.

With my project, it's working fine in BB Pipelines with 0.12.25 and it works most of the time with 0.12.48, but every version after 0.12.48 has an excessive memory footprint...

On your results in #4435, would you mind also comparing with versions 0.12.25, 0.12.48, 0.12.49? How exactly do you measure the memory footprint? Do you use --debug -vvv or another way? I can't get precise data with those versions... Thanks in advance!

@fezfez commented Jan 28, 2021

@pyguerder I use --debug -vvv, and I tried with 0.12.25, but it doesn't give memory info...

@ondrejmirtes (Member)

@pyguerder You can probably lower your total memory footprint by using fewer threads: https://phpstan.org/config-reference#parallel-processing
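As a rough sketch mirroring the reporter's config at the top of this issue, limiting parallelism looks like this; the total footprint scales roughly with the per-process peak multiplied by the number of processes, so fewer processes means less RAM at the cost of a slower run:

parameters:
	parallel:
		maximumNumberOfProcesses: 2	# hypothetical value: lower it until the pipeline fits into RAM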

@pyguerder

@ondrejmirtes thanks, I did that already. Here are my observations:

With version 0.12.25, it works with 2 threads in 2-3 min.
With versions < 0.12.48, it works with 1 or 2 threads in 3-4 min.
With versions > 0.12.49, it fails even with one thread.
With version 0.12.70, it works with one thread in ~5 min.

I'm thinking that more memory optimizations could bring versions > 0.12.70 down to the memory footprint of 0.12.25, so I could use more threads and speed up the pipelines...

Thanks for all your work! Best regards

@KacerCZ commented Jan 29, 2021

@ondrejmirtes Thanks for the recent memory optimizations, they help a lot.
Have you considered restarting the worker process after a certain number of processed files?
The Apache HTTP Server does something similar: a worker process is ended after a given number of served requests to free resources.

@ondrejmirtes (Member)

@KacerCZ It'd make matters worse.

  1. In an already running process the caches are warmed up, so periodic restarts would slow the analysis down.
  2. Periodic restarts are a band-aid. I'd rather address the underlying memory leaks than do this.

@ondrejmirtes (Member)

PHPStan 0.12.71 should again be a bit better performance- and memory-wise; please test it and report back, thank you :)

@pyguerder

@ondrejmirtes Thanks! I have set up a small script to report memory consumption and execution time on two of my projects (a small one and a big one).

You can see the script here: https://pastebin.com/kRgmx1Sf

And the results here: https://pastebin.com/WQ6LqWdF (big project)
and here: https://pastebin.com/17DQbuBt (small project)

I hope it can provide useful information. Best regards

@alfredbez

> Does anyone have any tool recommendation to investigate memory leaks?

Maybe https://github.com/arnaud-lb/php-memory-profiler from @arnaud-lb can help find memory leaks here.

I have a codebase that needs a total of 4.21 GB (PHPStan 0.12.70) or 4.15 GB (PHPStan 0.12.71). The file that consumed the most memory took 397.56 MB, and I found it like this:

$ docker run --rm -i -w /app -v $PWD:/app <DOCKERIMAGE:TAG> vendor/bin/phpstan analyse --memory-limit=6G --debug -vvv -- > /tmp/phpstan-memory.txt
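# the grep below picks the line just before the largest non-zero "consumed" value, i.e. the file name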
$ grep -B1 $(grep consumed /tmp/phpstan-memory.txt | awk '{print $3}' | grep -v -w '0' | sort -nr | head -1) /tmp/phpstan-memory.txt

I was not able to collect profiling data because I had no idea where to put memprof_dump_callgrind(), since PHPStan ships as a phar file and I can't edit that. Adding that line to the end of vendor/bin/phpstan didn't help.

If someone points me in the right direction, I can profile that on my machine.
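One possible direction, as a hedged sketch rather than a tested recipe: since the phar itself can't be edited, a small wrapper script could register a shutdown function that dumps the profile and then hand control to the regular binary. The file name and paths below are made up, memprof still has to be enabled through its MEMPROF_PROFILE environment variable, and if the process dies on the memory limit the shutdown dump may itself fail for lack of memory.

<?php declare(strict_types = 1);

// memprof-phpstan.php (hypothetical wrapper)
// Run as: MEMPROF_PROFILE=1 php memprof-phpstan.php analyse --debug -vvv

register_shutdown_function(static function (): void {
    if (function_exists('memprof_dump_callgrind')) {
        // Dump whatever is still allocated when PHPStan finishes (or aborts).
        memprof_dump_callgrind(fopen('/tmp/phpstan.callgrind', 'w'));
    }
});

require __DIR__ . '/vendor/bin/phpstan'; // delegates to the shipped phar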

@ondrejmirtes (Member)

@alfredbez If you were able to isolate that file that consumes 400 MB on its own to a separate repository, I might be able to look into it and come up with an optimization.

@arnaud-lb (Contributor)

@alfredbez I had a memprof branch that can dump the profile when the memory limit is exceeded, so I've just pushed it here: arnaud-lb/php-memory-profiler#47. With this change, you don't need to call memprof_dump_callgrind() explicitly :)
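A hedged usage sketch, assuming the branch's dump-on-limit behaviour is toggled through the MEMPROF_PROFILE environment variable and that the callgrind file lands in memprof's default output directory (/tmp); check the extension's README for the exact option names:

# let memprof dump a callgrind profile automatically once the memory limit is hit
MEMPROF_PROFILE=dump_on_limit php -d extension=memprof.so \
    vendor/bin/phpstan analyse --debug -vvv --memory-limit=400M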

@alfredbez

Nice! I'll take a look at that tomorrow 👍

@alfredbez

I managed to profile the file I mentioned with the branch @arnaud-lb provided (thanks for that 👍), not sure if that can help here:
memprof.callgrind.zip

Looks like PhpParser\NodeAbstract::setAttribute was called more than a million times in that run, using 25.49% exclusively, but I'm not that experienced with memory profiling, so maybe I'm interpreting it wrong:

[screenshot of the callgrind viewer]

> @alfredbez If you were able to isolate that file that consumes 400 MB on its own to a separate repository, I might be able to look into it and come up with an optimization.

I tried that yesterday, but it failed. I could give you access to the private repository if that helps.

@alfredbez

I also analyzed that with blackfire, you can have a look at this here: https://blackfire.io/profiles/568084d9-776d-4593-b552-64a27142be37/graph

@szepeviktor (Contributor)

> I also analyzed that with blackfire

Now we're talking.

@alfredbez

Note that the callgrind file shows a run that hit the memory limit (400M), while the Blackfire graph comes from a run with a much larger memory limit (1G) that did not hit it.

@arnaud-lb (Contributor) commented Feb 4, 2021

Thanks @alfredbez!

The profile data shows what caused the allocation of all the memory at the time of exhaustion.

The "Inc." and "Excl." columns show the exact amount of memory for each function. The number of calls is usually less significant.

Sorting by "Incl" or "Excl both point at the parser. Intuitively, I would say that:

  • Either there is one source file generating a big syntax tree, causing a spike in memory usage
  • Or there is some nested parsing, e.g. one source file causes the parsing of many other source files
  • Or some ASTs / Nodes are retained / accumulate over time in some cases, causing the memory usage to slowly increase

Note that memprof eliminates freed memory from the profile data, so the profile shows "live" memory only, not memory that was already freed.

@alfredbez

By the way, here's the file I was analyzing; it basically overrides this code:

<?php declare(strict_types = 1);

namespace Pyz\Zed\Stock\Persistence;

use Generated\Shared\Transfer\StoreTransfer;
use Orm\Zed\Stock\Persistence\SpyStockProductQuery;
use Spryker\Zed\PropelOrm\Business\Runtime\ActiveQuery\Criteria;
use Spryker\Zed\Stock\Persistence\StockRepository as SprykerStockRepository;

class StockRepository extends SprykerStockRepository
{
    /**
     * @param string $abstractSku
     * @param \Generated\Shared\Transfer\StoreTransfer $storeTransfer
     *
     * @return \Orm\Zed\Stock\Persistence\SpyStockProductQuery
     */
    protected function queryStockProductByProductAbstractSkuAndStore(string $abstractSku, StoreTransfer $storeTransfer): SpyStockProductQuery
    {
        return $this->getFactory()
            ->createStockProductQuery()
            ->useSpyProductQuery(null, Criteria::LEFT_JOIN)
                ->filterByIsActive(true)
                ->useSpyProductAbstractQuery(null, Criteria::LEFT_JOIN)
                    ->filterBySku($abstractSku)
                ->endUse()
            ->endUse()
            ->useStockQuery(null, Criteria::LEFT_JOIN)
                ->filterByIsActive(true)
                ->useStockStoreQuery(null, Criteria::LEFT_JOIN)
                    ->useStoreQuery(null, Criteria::LEFT_JOIN)
                        ->filterByName($storeTransfer->getName())
                    ->endUse()
                ->endUse()
            ->endUse();
    }
}

Two of the used classes are also on GitHub:

A third one just extends AbstractSpyStockProductQuery:

<?php declare(strict_types = 1);

namespace Orm\Zed\Stock\Persistence;

use Spryker\Zed\Stock\Persistence\Propel\AbstractSpyStockProductQuery as BaseSpyStockProductQuery;

class SpyStockProductQuery extends BaseSpyStockProductQuery
{
}

@ondrejmirtes (Member)

@alfredbez Please create a small reproducing repository that shows the high memory consumption.

@pyguerder

@alfredbez that's very interesting: I am using the same structure on half of my projects, based on the Propel ORM (Propel2 in my case).

I'm wondering if changes like this one can reduce (or increase) PHPStan memory consumption: https://github.com/propelorm/Propel2/pull/1657/files

@alfredbez

@pyguerder I tried that on my project, but it still shows the high memory consumption. The memory usage even increased a bit, from 543 MB to 545 MB in my test.

@pyguerder

I investigated on my side and found a simple explanation concerning memory consumption in my case.
I cloned phpstan-src and ran phpstan-src/bin/phpstan analyze src --memory-limit=3G --debug --xdebug -vvv with the xdebug extension enabled on my PHP container.

I observed that nearly 60% of the RAM consumption is due to applying preg_replace to the PHPDoc:

[screenshots: calls list and call graph]

I suspected some memory leak in preg_replace but I could not find any (after a quick search).

I tried to replace
$docComment = \Nette\Utils\Strings::replace($docComment, '#\s+#', ' ');
with
$docComment = preg_replace('#\s+#', ' ', $docComment);
in /src/Type/FileTypeMapper.php:522 but it did not change memory consumption.

As I wrote, the explanation is quite simple: I use the Propel2 ORM, which generates classes with a lot of PHPDoc. For simple models, the base Query class contains a single PHPDoc of 7.6 KB. For my most complex Query class, there is a single PHPDoc of 107 KB!

Example:

/**
 * Base class that represents a query for the 'quota' table.
 *
 *
 *
 * @method     ChildQuotaQuery orderById($order = Criteria::ASC) Order by the id column

[skipped lines... there are 70 lines here for simple models and 800 for complex ones!] 

 * @method     ChildQuota[]|\Propel\Runtime\Util\PropelModelPager paginate($page = 1, $maxPerPage = 10, ConnectionInterface $con = null) Issue a SELECT query based on the current ModelCriteria and uses a page and a maximum number of results per page to compute an offset and a limit
 *
 */

I think those lines are essential to PHPStan because they explicitly define methods that are actually handled at runtime by the __call method in https://github.com/propelorm/Propel2/blob/master/src/Propel/Runtime/ActiveQuery/ModelCriteria.php (see the illustrative sketch below).
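For illustration only (this is not Propel's actual generated code), the pattern those PHPDoc blocks describe looks roughly like this; without the @method line PHPStan would not know that orderById() exists or what it returns, because the real call is dispatched through __call() at runtime:

<?php declare(strict_types = 1);

/**
 * @method self orderById(string $order = 'ASC') Order by the id column
 */
class GeneratedQuery
{
    /**
     * @param array<int, mixed> $arguments
     */
    public function __call(string $name, array $arguments): self
    {
        // runtime dispatch for orderById(), filterByX(), ... as described above
        return $this;
    }
}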

@ondrejmirtes Do you think there is a way to optimize memory consumption in that case? Maybe by clearing some references after getPhpDocKey is performed?

I hope this provides useful information. Sorry if it is too specific to my case and does not benefit the more general case of this issue...

@MartinMystikJonas (Contributor)

@pyguerder Memory used by preg_replace is most probably released immediately. Your report shows the sum of how much memory it allocated during the entire run, not how much memory it holds simultaneously at any given time.
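A quick way to see that difference, as a standalone sketch (not code from PHPStan): watch memory_get_usage() around a tight preg_replace() loop. The profiler's allocation totals grow with every call, while the memory actually held stays flat:

<?php declare(strict_types = 1);

// Build a docblock-sized string, roughly like the generated Propel PHPDoc.
$docComment = str_repeat("* @method ChildQuotaQuery orderById(\$order = Criteria::ASC)\n", 2000);

$before = memory_get_usage();

for ($i = 0; $i < 10000; $i++) {
    // Each call allocates a new collapsed string...
    $collapsed = preg_replace('#\s+#', ' ', $docComment);
    // ...but the previous result is freed as soon as $collapsed is overwritten.
}

$after = memory_get_usage();

// Expect a small, constant difference (one copy of $collapsed), not 10000 copies.
echo ($after - $before) . " bytes held\n";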

@MartinMystikJonas (Contributor)

@pyguerder Could you try to get xhprof snapshots? They could give better info about where memory is allocated and not released.

@ondrejmirtes (Member)

@pyguerder I just committed a little micro-optimization that no longer calls getReformattedText. Please fetch the latest phpstan-src, run composer install, and see if the TOTAL memory consumption at the end of the run (when you run with --debug -vvv) improves.

@pyguerder

@ondrejmirtes using the latest commit of phpstan-src (using ondrejmirtes/better-reflection 4.3.49) instead of the previous one (using 4.3.48), memory consumption is unchanged, at 1.13 GB (I tested twice; of course, I ran composer install in phpstan-src in between to get the right version of ondrejmirtes/better-reflection). Best regards

@ondrejmirtes (Member)

Yeah so @MartinMystikJonas is right, this isn't the reason. Better to revert that.

@pyguerder

@MartinMystikJonas @ondrejmirtes I did not manage to generate memory snapshots with xhprof but I could do it with Blackfire: https://blackfire.io/profiles/032a8d59-5e65-4d18-935d-b01b3d1b3eee/graph

Does this provide useful information? Best regards

@matweew commented May 28, 2021

PHPStan 0.12.88

I tried removing the PHPStan PHPDoc (6 methods like this one) from a file that uses Propel2 classes:
@phpstan-return \Orm\Zed\ProductImage\Persistence\SpyProductImageQuery<\Orm\Zed\ProductImage\Persistence\SpyProductImage>

and memory consumption decreased from 247.16 MB to 46.5 MB.

@matweew commented May 28, 2021

But the real problem is still when nested query objects are used:

$priceProductDefaultQuery->usePriceProductStoreQuery()
                ->usePriceProductQuery()
                    ->filterByFkProduct($priceProductTableCriteriaTransfer->getIdProductConcrete())
                    ->useProductQuery()
                        ->useSpyProductAbstractQuery()
                            ->useSpyMerchantProductAbstractQuery()
                                ->filterByFkMerchant($priceProductTableCriteriaTransfer->getIdMerchantOrFail())
                            ->endUse()
                        ->endUse()
                    ->endUse()
                ->endUse()
            ->endUse()

@ondrejmirtes (Member)

Can everyone here try PHPStan 1.0 to see if memory consumption improved? There were quite a lot of improvements in this direction.

@kdckrs commented Dec 10, 2021

@ondrejmirtes just upgraded our CI/CD flow to work with the latest version and it started failing because of excessive memory usage. Running version 0.12.x did not have this issue. Any idea?

@ondrejmirtes (Member)

@kdckrs Please run PHPStan with --debug -vvv. You'll see how much memory each file consumes. You'll eventually find the bottleneck.

@pyguerder commented Dec 12, 2021

@ondrejmirtes on my side, unfortunately, it is not significantly better than with 0.12.x, meaning my pipelines will only work if I reduce the number of threads. With my projects, high memory consumption is clearly related to big Propel queries like the ones above. Splitting them into multiple blocks can improve the situation, but in my opinion it reduces code readability. Regards

@kdckrs commented Dec 13, 2021

@ondrejmirtes when I run it with --debug -vvv, it somehow passes our CI/CD flow without any (memory) issues. After removing --debug -vvv it somehow breaks again. Any idea?

@ondrejmirtes (Member)

@kdckrs With --debug it runs in a single thread and takes much longer, but the total memory consumed comes from that one thread. With a parallel run, it's gonna finish much faster, but it's gonna consume more memory, multiplied by the number of processes.

You should go through the --debug -vvv output and see which file consumes a lot of memory.

@alfredbez

Just checked that on our project and it looks better now with version 1.4.2; it cuts the memory usage in half 🎉

@ondrejmirtes (Member)

Awesome! I feel good about closing this.

@github-actions bot commented Mar 1, 2022

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2022