Introduce --threads=max option value to automatically detect the number of CPU cores #1723

Merged
maks-rafalko merged 3 commits into master from feature/threads-max on Sep 4, 2022

maks-rafalko
Member

@maks-rafalko maks-rafalko commented Sep 3, 2022

Implements #1722

Doc PR: infection/site#240

@maks-rafalko maks-rafalko linked an issue Sep 3, 2022 that may be closed by this pull request
Member

@sanmai sanmai left a comment

Left a few comments, mostly nits. The only real issue is the LogicException at the end of provide().

@sidz
Member

sidz commented Sep 3, 2022

My 2 cents: this is not Infection's responsibility. We just make our codebase more complicated. -j$(nproc) or -j$(sysctl -n hw.ncpu) works fine. End users can use Make or Composer scripts if they don't want to type --threads=

@maks-rafalko
Member Author

Addressed your comments; the initial version was blindly copied from Psalm.

My 2 cents: this is not Infection's responsibility. We just make our codebase more complicated.

Well, yes and no. IMO it's all about "those tiny things for DX". You set --threads=max and

  • don't have to care about cases where $(nproc) might not work because of incorrect escaping (in a Jenkinsfile you have to escape it, like sh 'infection ... --threads=\$(nproc)')
  • can let anyone copy this command line between Linux and macOS machines without changes

Yes, it's not critical at all and both points can be ignored, but it's a little thing that might be more convenient for someone (which is why this feature request came from an end user, not from me :) ).
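
For context, the cross-platform lookup that --threads=max spares users from writing by hand could be sketched in shell roughly as below. It only combines the two commands already mentioned in this thread (nproc and sysctl -n hw.ncpu); the function name detect_cpu_count and the fallback to 1 are assumptions made for illustration, not what the PR implements.

```shell
#!/bin/sh
# Hypothetical helper (not part of Infection): print the number of CPU cores.
# Tries nproc (GNU coreutils, typical on Linux) first, then sysctl (macOS/BSD).
detect_cpu_count() {
    count=$(nproc 2>/dev/null) \
        || count=$(sysctl -n hw.ncpu 2>/dev/null) \
        || count=1

    # Assumption: fall back to a single thread if the output is empty,
    # zero, or not a plain non-negative integer.
    case "$count" in
        ''|0|*[!0-9]*) count=1 ;;
    esac

    echo "$count"
}

# Possible usage in a script:
#   infection --threads="$(detect_cpu_count)"
```

Even in this small form, the wrapper has to deal with a missing command and unexpected output, which is the kind of detail the comment above argues users should not need to think about.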

Member

@sanmai sanmai left a comment

I agree this is a very dangerous option, yet if people want this I guess that's what we have to add 🤔

@maks-rafalko maks-rafalko merged commit 60361a7 into master Sep 4, 2022
@maks-rafalko maks-rafalko deleted the feature/threads-max branch September 4, 2022 13:36
@dkarlovi

dkarlovi commented Sep 4, 2022

@sanmai what makes it dangerous?

@sanmai
Member

sanmai commented Sep 4, 2022

  • Infection does surprising things with the code. For one, it can create infinite loops, and a mutated version could also consume a lot of memory. Running many such processes at once could make your machine unresponsive for prolonged periods of time and generally lead to a bad experience, especially if it's your first time using Infection.
  • Running Infection with more threads does not necessarily lead to better performance, unlike, say, purely CPU-bound tasks such as static analysis. Thus this option could give a false promise of fast execution when in fact it could have the opposite effect, or not work at all.

I would seriously object to enabling this option by default, or to recommending it to new users.

@dkarlovi

dkarlovi commented Sep 5, 2022

That means the upper limit on threads should be restricted/validated regardless of how you specify it, which further supports the idea that Infection should know the upper limit.

For example, if max=12, what difference does it make whether I say --threads=12 or --threads=max? From the user's POV these are totally equivalent. If running it like that is "dangerous", Infection should disallow it or warn the user about it.

To me, adding --threads=max seems unrelated to whether using the maximum number of threads is "dangerous".
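
One way to picture the argument above: if both spellings resolve to the same number, any warning has to be attached to the resolved value rather than to the keyword. The shell sketch below is purely illustrative; the function name resolve_threads, the warning text, and the choice to warn rather than reject are assumptions, and nothing in this PR implements such a check.

```shell
#!/bin/sh
# Hypothetical helper: turn a --threads value ("max" or a number) into a
# thread count, given a CPU core count detected elsewhere.
# Usage: resolve_threads <value> <cpu_count>
resolve_threads() {
    value=$1
    cpus=$2

    case "$value" in
        max)
            threads=$cpus
            ;;
        ''|0|*[!0-9]*)
            echo "invalid --threads value: $value" >&2
            return 1
            ;;
        *)
            threads=$value
            ;;
    esac

    # Assumption: only warn, since more threads than cores can be a
    # deliberate choice.
    if [ "$threads" -gt "$cpus" ]; then
        echo "warning: $threads threads requested, $cpus CPU cores detected" >&2
    fi

    echo "$threads"
}
```

With 12 detected cores, resolve_threads max 12 and resolve_threads 12 12 both print 12 and neither triggers the warning, which is the equivalence described above.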

@sanmai
Member

sanmai commented Sep 5, 2022

Infection can't know the upper limit. Say, if your test suite is predominantly IO-bound, then it makes sense to run Infection with more threads than you have CPU cores.

Do you think there's a sure way to know what to do in this situation? How can we know if a test suite is CPU-bound or IO-bound to get the best results?

The user should be responsible for setting the number of threads, and if we're setting the number for the user, we're setting them up for potentially unpleasant consequences.

@dkarlovi

dkarlovi commented Sep 5, 2022

This is knowledge you have and the user doesn't. It sounds like Infection could have named threads calculator strategies which the user can try out, using whichever one makes sense.

For example, conservative, per-cpu, io-bound, etc.

@dkarlovi

dkarlovi commented Sep 5, 2022

@maks-rafalko what do you think here?

@sanmai
Member

sanmai commented Sep 5, 2022

threads calculator strategies

How should we approach this idea, given Infection is a PHP app?

@dkarlovi

dkarlovi commented Sep 5, 2022

@sanmai not sure what you mean here?

Since Infection now knows the number of CPUs available, I can imagine you could do something like

  • --threads=conservative = number of CPUs * 0.8
  • --threads=per-cpu = number of CPUs, which is current max
  • --threads=io-bound = number of CPUs * 1.5

All names and numbers are illustrative, of course.
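
A rough shell sketch of how the strategy names listed above could map to numbers. The multipliers are the illustrative ones from the list; the function name threads_for_strategy, the rounding down to whole threads, and the floor of one thread are additional assumptions, and none of this is part of the PR.

```shell
#!/bin/sh
# Hypothetical helper: map a strategy name and a CPU count to a thread count.
# Usage: threads_for_strategy <strategy> <cpu_count>
threads_for_strategy() {
    strategy=$1
    cpus=$2

    # Reject anything that is not a plain non-negative integer.
    case "$cpus" in
        ''|*[!0-9]*)
            echo "invalid CPU count: $cpus" >&2
            return 1
            ;;
    esac

    case "$strategy" in
        conservative) threads=$(( cpus * 8 / 10 )) ;;  # CPUs * 0.8, rounded down
        per-cpu)      threads=$cpus ;;                  # same as the current max
        io-bound)     threads=$(( cpus * 3 / 2 )) ;;    # CPUs * 1.5, rounded down
        *)
            echo "unknown strategy: $strategy" >&2
            return 1
            ;;
    esac

    # Assumption: never go below one thread (e.g. conservative on one core).
    [ "$threads" -ge 1 ] || threads=1

    echo "$threads"
}
```

On a 12-core machine this gives 9 for conservative, 12 for per-cpu, and 18 for io-bound.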

maks-rafalko added a commit to infection/site that referenced this pull request Sep 8, 2022
/**
* @internal
*/
final class CpuCoresCountProvider
Member

What do you think of CpuInfo::getCount() as a name instead?

*/
public static function provide(): int
{
if (defined('PHP_WINDOWS_VERSION_MAJOR')) {
Member

Might be worth caching the result? Although I don't expect it to be called twice, you also don't expect the system to change the number of CPU cores available within the same process, and it's only an int value.

final class CpuCoresCountProvider
{
/**
* Copied and adapted from Psalm project: https://github.com/vimeo/psalm/blob/4.x/src/Psalm/Internal/Analyzer/ProjectAnalyzer.php#L1454
Member

Is there any chance Psalm is willing to extract it to a dedicated package? Otherwise, provided they would be interested in a package, I'd be happy to provide one.

This might be interesting as some sort of tiny "system information" HAL package which could be reused across the ecosystem indeed. 👍

@theofidry
Member

Sorry for the late review, great addition! Happy to do the changes myself

Successfully merging this pull request may close these issues.

Support --threads=max
5 participants