Skip to content

Parallel test improvements #3851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Parallel test improvements #3851

wants to merge 3 commits into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Feb 18, 2019

This a) reenables IO capture tests by inheriting std streams from the main process and b) implements the --CONFLICTS-- mechanism mentioned in #2822.

Some more CONFLICTS very likely missing, but I hope that we can start using this in CI and fix issues as they come up.

cc @hikari-no-yume

@nikic nikic force-pushed the parallel-tests branch 2 times, most recently from d01e1a4 to 43b6cb9 Compare February 19, 2019 09:37
@nikic nikic changed the base branch from master to PHP-7.4 February 19, 2019 12:42
@nikic
Copy link
Member Author

nikic commented Feb 19, 2019

@weltling @cmb69 There are some Windows specific failures here that I can't figure out. The failing tests are windows_mb_path tests, with diffs like this:

========DIFF========
001+ failed to create dir 'C:\projects\php-src\ext\standard\tests\file\windows_mb_path\dir_cp1252\tschüß3'
006- string(9) "tschüß4"
007- bool(true)
007+ string(7) "tsch��4"
008+ bool(false)
========DONE========
FAIL Test mkdir/rmdir cp1252 to UTF-8 path [C:\projects\php-src\ext\standard\tests\file\windows_mb_path\test_cp1252_to_utf8_1.phpt]

In the current setup, what happens is that all the windows_mb_path tests themselves do not run in parallel (i.e. no two windows_mb_path tests will run at the same time). However other tests run in parallel to the windows_mb_path tests. Which tests run can be seen in the AppVeyor log from the [1] and [2] prefixes.

Do you have any ideas why these failures happen? Can it be that that the codepage settings bleed over across processes with a common parent or something?

run-tests.php Outdated

escape:
while ($testDirsToGo || ($testDirsInProgress > 0)) {
while ($test_files || $testsInProgress > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why test_files and not testFiles?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$test_files is an existing variable. Would have to touch a lot of code (which all seems to use underscore style) to rename it.

@hikari-no-yume
Copy link
Contributor

The --CONFLICTS-- mechanism's implementation looks pleasantly simple! ^^

@hikari-no-yume
Copy link
Contributor

If/when this is merged, please squash it.

@cmb69
Copy link
Member

cmb69 commented Feb 19, 2019

I have not been able to reproduce the Windows test failures on my machine. Will have a closer look tomorrow.

@@ -6,9 +6,9 @@ if (!extension_loaded('sockets')) {
die('SKIP sockets extension not available.');
}
$s = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
$br = socket_bind($s, '0.0.0.0', 58381);
$br = socket_bind($s, '0.0.0.0', 58379);
Copy link
Contributor

@staabm staabm Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to declare CONFLICTS for this tests based on the used port number?
That way you dont need to check the whole codebase which tests use which port

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to adjust the test (if it's very simple, like here), as it allows it to run in parallel. Adding a port-based CONFLICT would prevent spurious failures, but reduce parallelization opportunities.

In the future we might want to switch code to use dynamic port assignment instead (this is what HHVM did in their tests), but that's a lot more effort than changing a fixed port :)

@KalleZ
Copy link
Member

KalleZ commented Feb 20, 2019

@staabm I think checking the codebase gradually is worth while as it allows better parallelization in the long run

@nikic nikic force-pushed the parallel-tests branch 2 times, most recently from 8cae72a to d87c1a3 Compare February 20, 2019 10:31
@nikic
Copy link
Member Author

nikic commented Feb 20, 2019

I've merged the conflict handling implementation in c0e15a3. Keeping this open until the Windows issue is resolved.

I'll probably experimentally enable use of -j2 on Travis as well, and will keep an eye on spurious build failures.

@nikic
Copy link
Member Author

nikic commented Feb 20, 2019

I've enabled -j2 on Travis now.

Also got a SOAP failure that I can't reproduce from that (https://travis-ci.org/php/php-src/jobs/495937131):

========DIFF========
002+ ��������0���>���*z#H�	7�����������j��mAj�Eo�ә���������[����������X��a�ɱ�����"JEQC��`h���.�}E�@�;%��_QD��������DisJ�@m'�f����t�&�kg��>C!s,��sb�	C	=����$��؞����(ZC�F��|c�j9F~��0�G'��B8�r���b'b_��
003+ 
004+ 
005+ -----------
006+ 
007+ <?xml version="1.0" encoding="UTF-8"?>
008+ <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><SOAP-ENV:Fault><faultcode>SOAP-ENV:Client</faultcode><faultstring>Bad Request</faultstring></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>
002- <?xml version="1.0" encoding="ISO-8859-1"?>
003- <SOAP-ENV:Envelope
004-   SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
005-   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
006-   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
007-   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
008-   xmlns:si="http://soapinterop.org/xsd">
009-   <SOAP-ENV:Body>
010-     <ns1:test xmlns:ns1="http://testuri.org" />
011-   </SOAP-ENV:Body>
012- </SOAP-ENV:Envelope>
013- 
014- 
015- -----------
016- 
017- <?xml version="1.0" encoding="UTF-8"?>
018- <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="http://testuri.org" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><SOAP-ENV:Body><ns1:testResponse><return xsi:type="xsd:string">Hello World</return></ns1:testResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
019- ok
========DONE========
FAIL SOAP Server 29-CGI: new/addfunction/handle [ext/soap/tests/server029.phpt] 

It looks like we're reading garbage from php://input, which seems pretty bad...

@pcrov
Copy link
Contributor

pcrov commented Feb 20, 2019

That garbage looks like the gzipped post from server019.phpt.

@nikic
Copy link
Member Author

nikic commented Feb 20, 2019

@pcrov Ooooh, thanks, that makes a lot of sense. "uniq"id strikes again: https://github.com/php/php-src/blob/master/run-tests.php#L1967

@nikic
Copy link
Member Author

nikic commented Feb 20, 2019

The SOAP issue should be fixed with 967fa51.

@cmb69 I am able to occasionally reproduce these locally using run-tests.php ext/standard/tests/file/windows_mb_path/ ext/standard/tests/general_functions/ -j2. The most common failure is ext\standard\tests\file\windows_mb_path\bug75063_cp1251.phpt.

@cmb69
Copy link
Member

cmb69 commented Feb 20, 2019

@nikic Sorry for the delay; still handicaped by a flu. Anyhow, it seems to me, that for CLI API, the codepage is set for the console which is inherited by all these processes. Therefore we're facing these race-conditions. I don't see a way to fix this, but to mark the respective tests, which appear to be many, as conflicting. :(

@nikic
Copy link
Member Author

nikic commented Feb 20, 2019

For the purposes of test parallelization I guess the solution will be to add an extra mode where a test conflicts with everything and thus enforce that no other tests may run in parallel with it.

This behavior seems pretty broken to me though. It means that it's impossible to safely switch PHP's internal CP, because it will also always switch the console CP as well (on CLI that is).

@nikic nikic force-pushed the parallel-tests branch 2 times, most recently from 6b339eb to 9cff793 Compare February 21, 2019 10:09
@nikic
Copy link
Member Author

nikic commented Feb 21, 2019

The all conflict is now implemented in 152e539 and I've enabled -j2 on AppVeyor in e3d502f.

Reusing this PR to check whether there's any benefit to using -j3...

@cmb69
Copy link
Member

cmb69 commented Feb 21, 2019

I agree that the inherent coupling of the internal and console CP is a limitation which we may want to abolish sometime, but for most practical purposes it appears not to be a problem. After all, it's better than having no way to influence the CP as it was in PHP 7.0. :)

Thanks for taking care of the test framework work-around, anyway!

@nikic
Copy link
Member Author

nikic commented Feb 21, 2019

Using -j3 seems to be slower than -j2 (which is reasonable, as we only have 2 cores on AppVeyor). I got quite a few failures though where the test simply has no output at all: https://ci.appveyor.com/project/php/php-src/builds/22539918/job/r10v7qsx8qqx4y1s

@nikic
Copy link
Member Author

nikic commented Feb 22, 2019

It looks like the failures with no output always occur in the THREAD_SAFE=1, OPCACHE=1, INTRINSICS=AVX build. I'm wondering if this might be related to opcache. Are there any problems with trying to instantiate multiple opcache instances at the same time on Windows?

@cmb69
Copy link
Member

cmb69 commented Feb 22, 2019

Are there any problems with trying to instantiate multiple opcache instances at the same time on Windows?

I though that a single OPcache instance would be used for all test processes. :S

@nikic
Copy link
Member Author

nikic commented Feb 22, 2019

@cmb69 At least on Linux it's going to be a separate one for each test (shm opcache is never shared on cli sapi). Is that different on Windows?

@cmb69
Copy link
Member

cmb69 commented Feb 22, 2019

At least two instances of the built-in Webserver would share a single SHM Cache.

@nikic
Copy link
Member Author

nikic commented Feb 22, 2019

@cmb69 Interesting, didn't know that opcache behaves so differently on Windows. Do you know if there's any way to avoid this? Looking at shared_alloc_win32.c probably not -- seems like the only way to get a separate cache is to switch users?

@cmb69
Copy link
Member

cmb69 commented Feb 22, 2019

It should be possible to change the temporary directory to enforce a new mmap file; can give that a try later.

PS: no, that won't work ("Unable to open base address file").

@nikic
Copy link
Member Author

nikic commented Feb 27, 2019

@cmb69 I've now tried to set a per-worker TEMP, TMP and TMPDIR. It does create separate base address files in that case. However, it still tries to use the same base address (wat?). I then tried explicitly setting opcache.mmap_base for each worker to a separate address. I see the base address change in the file, but overall this doesn't help with the test failures -- in fact there are more of them :(

@cmb69
Copy link
Member

cmb69 commented Feb 27, 2019

However, it still tries to use the same base address (wat?).

Guess that comment explains it:

/* Starting from windows Vista, heap randomization occurs which might cause our mapping base to
be taken (fail to map). So under Vista, we try to map into a hard coded predefined addresses
in high memory. */

@weltling
Copy link
Contributor

weltling commented Mar 1, 2019

Most likely this Opcache scenario will need to be covered, as we want to support the parallel tests with Opcache as well. The current expectation is, that same user would run on just one instance of the shared memory. This approach might have to be changed to support an arbitrary number of separate Opcache instances for the same user.

It is a complex topic though, which needs a careful consideration. It involves both shared memory and mutex handling. I can imagine, why is was done the way it is right now - having automatic namings has an advantage of less user errors and thus less false bug reports. Possible hard configuration mistakes can impact security and system stability. Some conversations about this happened in the past, which might be useful to read https://bugs.php.net/bug.php?id=72645.

Thanks.

@nikic
Copy link
Member Author

nikic commented Mar 4, 2019

Going to close this testing PR. The current state is that parallel testing is enabled on Travis, and the non-opcache job on AppVeyor. I'm stilling fixing occasional failures, but generally the functionality seems fairly stable :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants