Skip to content

Experimental run-tests.php parallelisation (2019 rebase) #3838

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

hikari-no-yume
Copy link
Contributor

@hikari-no-yume hikari-no-yume commented Feb 16, 2019

====⚡️===========================================================⚡️====
====⚡️==== WELCOME TO THE FUTURE: run-tests PARALLEL EDITION ====⚡️====
====⚡️===========================================================⚡️====

Careful rebase and slight cleanup of #2822. It had this internals thread: http://news.php.net/php.internals/100838

The short and sweet summary is that if you have an 8-core 16-thread CPU and do run-tests.php -j16 it will run… well, not quite 16x faster, but much closer to that than to the current speed. ^^

Works on Windows and *nixen. Compatibility with all run-tests.php flags not guaranteed, but it's not on by default.

@hikari-no-yume
Copy link
Contributor Author

hikari-no-yume commented Feb 16, 2019

@nikic said this on the previous PR:

From what I remember, this PR implements parallelization on the directory level and adds marker files to exclude directories from parallelization.

Basically, but the other way round. It implements parallelisation on the directory level — to be precise, tests from the same directory will by default not be run concurrently — but uses marker files to opt in a particular directory to parallelisation on the file level.

I wanted to propose an alternative approach for handling tests that can't be run together, which I think should be both relatively simple to implement and allows a smooth transition from coarse-grained (to start with) to fine-grained (once we have time to review individual tests) handling:

Add a new --CONFLICTS-- section to the phpt format, which contains a list of strings (or possibly just one string to start with). All tests which have a non-empty intersection in CONFLICTS cannot be run together (but may still be run together with other tests). For example, ext/mysqli and ext/pdo_mysql would use --CONFLICTS-- \n mysql, while ext/pgsql and ext/pdo_pgsql would use --CONFLICTS -- \n pgsql. This prevents MySQL tests from running in parallel, but they can still run in parallel with PgSQL tests.

Additionally a CONFLICTS file can be added to a whole directory, so this doesn't need to be specified per-file.

This would be possible. It would be a bit more annoying to implement, though, because the current parallelisation approach doesn't require reading any test files, it batches everything at the finding-files step.

@hikari-no-yume
Copy link
Contributor Author

There's a significant bottleneck in the current code that becomes obvious when handling file tests, where the all the worker processes get bottlenecked by the supervisor process, probably because it's waiting on one particular worker process (to write to it?). This is probably easily fixed by me being less lazy and making the code more non-blocking.

@nikic
Copy link
Member

nikic commented Feb 16, 2019

Basically, but the other way round. It implements parallelisation on the directory level — to be precise, tests from the same directory will by default not be run concurrently — but uses marker files to opt in a particular directory to parallelisation on the file level.

Programmer thinking. I assumed that !CAN_BE_PARALLELISED was your peculiar was of spelling CAN_NOT_BE_PARALLELISED :D

@bwoebi
Copy link
Member

bwoebi commented Feb 16, 2019

Same here - better use another leading symbol than something similar to a not-symbol :-D

However, very nice and thanks for the patch.

$shameList
----⚠️-----------------------------------------------------------⚠️----

NAME_AND_SHAME;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you ever heard of the heredoc indentation improvements in PHP 7.3?

@nikic
Copy link
Member

nikic commented Feb 16, 2019

Can you please add -j2 to Travis & AppVeyor to get a test / timing run?

@petk petk added the Feature label Feb 16, 2019
@hikari-no-yume
Copy link
Contributor Author

Ahaha, I understand now. I used ! because I wanted the files to appear first in sort ordering and look less like a normal file, if that makes sense. There's another problem with the ! though, it's a special character that means something (I have no idea what :D) in bash and needs escaping. I'll change it to something else.

@petk
Copy link
Member

petk commented Feb 16, 2019

To put the meta files (!CAN_BE_PARALLELISED) at the top of the directory an underscore (_CAN_BE_PARALLELISED) might be another way probably instead of special characters, yes. Or something like a configuration file is even better.

Maybe just a side note/recap (and I really don't want to stop this essential and important missing feature in progress). I find the run-tests.php file with 3000 lines too long to do proper coding on it anymore. It might be a good time to consider going into a separate repository for this script. Like once this was a plan: https://github.com/php/phpruntests
Separation of concerns is very important and results are better and more manageable code. Even unit testing on the run-tests script can be done with that... Otherwise, just an idea. There of course comes many issues as well like phar requirement, or delivery to the php-src repository and similar.

Anyhow, thank you for working on this. :)

error("'$workers' is not a valid number of workers, try e.g. -j16 for 16 workers");
}
$workers = intval($workers, 10);
$environment['SKIP_IO_CAPTURE_TESTS'] = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to skip IO capture tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual answer to this is “I wish I knew”. They behave completely incorrectly when parallelised testing is in use and for no obvious reason. I gave up trying to figure out why.

error("Could not find worker stdout in array of worker stdouts, THIS SHOULD NOT HAPPEN.");
}
while (FALSE !== ($rawMessage = fgets($workerSock))) {
// work around fgets truncating things
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checked whether this is expected. From the stream_set_blocking documentation:

If mode is FALSE, the given stream will be switched to non-blocking mode, and if TRUE, it will be switched to blocking mode. This affects calls like fgets() and fread() that read from the stream. In non-blocking mode an fgets() call will always return right away while in blocking mode it will wait for data to become available on the stream.

Which makes sense, of course a non-blocking socket can't just wait until a full line is received.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though this also makes me think that you shouldn't be just looping over fgets() here. That should only be done if there's actually data available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what the stream_select is supposed to be for, I guess.

echo $resultText;
show_test($test_idx, "⚡️[" . count($workerProcs) . "/$workers concurrent test workers running]⚡️");

if (!is_array($name) && $result != 'REDIR') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does an array $name happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was something to do with redirections, but I don't remember exactly.

@carusogabriel
Copy link
Contributor

carusogabriel commented Feb 16, 2019

The short and sweet summary is that if you have an 8-core 16-thread CPU and do run-tests.php -j16 it will run

@hikari-no-yume The number we are supposed to pass to -j is the result for

nproc

in Linux and

sysctl -n hw.physicalcpu

in macOS?

@hikari-no-yume
Copy link
Contributor Author

@carusogabriel If that's what you'd pass to make, then probably. I haven't tested the optimal value and it will vary by system.

@hikari-no-yume hikari-no-yume force-pushed the parallelised-run-tests-2019 branch 2 times, most recently from 8319a3d to 24d5931 Compare February 17, 2019 18:26
@hikari-no-yume
Copy link
Contributor Author

I renamed !CAN_BE_PARALLELISED to @CAN_BE_PARALLELISED. @ doesn't mean “not” to programmers or anything special to Bash :)

@hikari-no-yume hikari-no-yume force-pushed the parallelised-run-tests-2019 branch from 24d5931 to f93556a Compare February 17, 2019 18:47
@hikari-no-yume
Copy link
Contributor Author

I added -j2 to Travis and AppVeyor. Based on the commit history from October 2017, I think I previously determined -j4 wasn't worthwhile for those.

@nikic
Copy link
Member

nikic commented Feb 17, 2019

Timings from CI runs:

Travis: 905s -> 587s
AppVeyor: 1696s -> 920s

There are also some test failures. On Travis it's some tests using the built-in server. My guess would be that this is due to the assumption that all directories can be run in parallel, but here likely many directories contain tests using the built-in server and some probably have overlapping ports. On AppVeyor there are also some multi-byte filesystem tests failing, not sure if that can be related to the parallelization (some kind of environment difference?)

@nikic
Copy link
Member

nikic commented Feb 18, 2019

While this functionality isn't perfect yet, and not stable enough for use in CI, it seems to be reliable for Zend tests (aka the only important tests!), so I went ahead and applied the first commit via 39792f5.

We can implement further improvements based on this...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants