
@dataProvider tests parallelization in functional mode #154

Merged (7 commits) on Mar 29, 2015

Conversation

@sormy
Contributor

sormy commented Feb 13, 2015

  • @dataProvider support
  • --filter support (functional mode only)
  • --max-batch-size support (functional mode only)
  • print progress like vanilla phpunit
  • print progress without blank lines on the default Windows console
  • dump the process command line for debugging if a process does not return any results
  • one failing vanilla test: PHPUnitTest::testStopOnFailurePreventsStartingFurtherTestsAfterFailure
  • no automated tests for the new features yet

@sormy
Contributor Author

sormy commented Feb 13, 2015

Just to note: the one failing test is marked as skipped because the problem appears to be on the test side (the test needs fixing), not in the code. The failing test is related to the stop-on-failure feature. Previously it stopped after method execution; now it stops after batch execution, which can contain more than one method.

@julianseeger
Contributor

Phew, thanks a lot. This is going to take some time to review because this PR includes several changes at once.

You added a lot of code but I don't see any new tests, which is probably a bad sign. It would be nice if you could add some ... or a lot. Maybe some tests could also help with debugging, because the output contains at least one bug:

I took a testsuite that looked like this:

julian@x1:~/PhpstormProjects/Metrics$ ./vendor/bin/paratest -f

Running phpunit in 5 processes with /home/julian/PhpstormProjects/Metrics/vendor/bin/phpunit. Functional mode is on

Configuration read from /home/julian/PhpstormProjects/Metrics/phpunit.xml

.........................................

Time: 630 ms, Memory: 6.25Mb

OK (41 tests, 134 assertions)

Then I added a dataProvider with 200 data sets to one of the methods which looks like this in phpunit:

julian@x1:~/PhpstormProjects/Metrics$ ./vendor/bin/phpunit 
PHPUnit 4.7-g7c1de6a by Sebastian Bergmann and contributors.

Configuration read from /home/julian/PhpstormProjects/Metrics/phpunit.xml

Deprecated configuration setting "strict" used

...............................................................  63 / 240 ( 26%)
............................................................... 126 / 240 ( 52%)
............................................................... 189 / 240 ( 78%)
...................................................

Time: 465 ms, Memory: 7.25Mb

OK (240 tests, 333 assertions)

And the current brianium/paratest master looks like this:

julian@x1:~/PhpstormProjects/Metrics$ ./vendor/bin/paratest -f

Running phpunit in 5 processes with /home/julian/PhpstormProjects/Metrics/vendor/bin/phpunit. Functional mode is on

Configuration read from /home/julian/PhpstormProjects/Metrics/phpunit.xml

................................................................. 65 ( 27%)
................................................................. 130 ( 54%)
................................................................. 195 ( 81%)
.............................................

Time: 1.39 seconds, Memory: 7.00Mb

OK (240 tests, 333 assertions)

Not exactly the same, but consistent. (It looks the same if you omit -f.)

Then I updated to sormy/paratest and it looked like this:

julian@x1:~/PhpstormProjects/Metrics$ ./vendor/bin/paratest -f

Running phpunit in 5 processes with /home/julian/PhpstormProjects/Metrics/vendor/bin/phpunit. Functional mode is on

Configuration read from /home/julian/PhpstormProjects/Metrics/phpunit.xml

...............................................................  63 / 240 ( 26%)
............................................................... 126 / 240 ( 52%)
............................................................... 189 / 393 ( 48%)
............................................................... 252 / 393 ( 64%)
............................................................... 315 / 393 ( 80%)
............................................................... 378 / 393 ( 96%)
...............

Time: 964 ms, Memory: 6.75Mb

OK (393 tests, 486 assertions)

There are 240 tests, but 393 tests are displayed, and the total test count increases during the run. When I omit -f, it looks exactly like phpunit.

I'm not yet sure what the max-batch-size does, but it has an influence on the output:

julian@x1:~/PhpstormProjects/Metrics$ ./vendor/bin/paratest -f --max-batch-size=1

Running phpunit in 5 processes with /home/julian/PhpstormProjects/Metrics/vendor/bin/phpunit. Functional mode is on

Configuration read from /home/julian/PhpstormProjects/Metrics/phpunit.xml

...............................................................  63 / 350 ( 18%)
............................................................... 126 / 360 ( 35%)
............................................................... 189 / 420 ( 45%)
............................................................... 252 / 480 ( 52%)
............................................................... 315 / 530 ( 59%)
............................................................... 378 / 530 ( 71%)
............................................................... 441 / 530 ( 83%)
............................................................... 504 / 530 ( 95%)
...........................

Time: 8.91 seconds, Memory: 12.00Mb

OK (531 tests, 625 assertions)
julian@x1:~/PhpstormProjects/Metrics$ ./vendor/bin/paratest -f --max-batch-size=5000

Running phpunit in 5 processes with /home/julian/PhpstormProjects/Metrics/vendor/bin/phpunit. Functional mode is on

Configuration read from /home/julian/PhpstormProjects/Metrics/phpunit.xml

...............................................................  63 / 240 ( 26%)
............................................................... 126 / 240 ( 52%)
............................................................... 189 / 240 ( 78%)
...................................................

Time: 765 ms, Memory: 6.50Mb

OK (240 tests, 333 assertions)

It would be nice if you could extend the readme to make the max-batch-size option clearer. Does it only apply to dataProviders?

Edit: and with some tests in place, you could start refactoring some overextended methods that have become pretty hard to read, like SuiteLoader::getMethodBatches. Some method extractions could make your intent clearer.

@sormy
Contributor Author

sormy commented Feb 13, 2015

Thanks, this is just an initial attempt to get the feature working. We have a suite that tests a DICOM stack with over 4000 tests; it works for me exactly like phpunit, and the speedup is around 2x (18 min vs 38 min) on my 2013 MacBook.

If you can write a failing test for your example, that would be perfect!

A big max-batch-size can break phpunit, because the command line becomes too long, or the filter regular expression becomes too hard to parse without changing PHP's PCRE settings.
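
For illustration, a minimal sketch (hypothetical helper, not paratest's actual internals) of why a big batch inflates the --filter pattern passed to phpunit:

<?php
// Hypothetical sketch: a batch is selected by handing phpunit a --filter
// regex that matches every test method in the batch, so the pattern
// grows linearly with the batch size.
function buildFilterPattern(array $methodNames)
{
    return '/(' . implode('|', array_map('preg_quote', $methodNames)) . ')/';
}

$batch = array();
for ($i = 1; $i <= 5000; $i++) {
    $batch[] = "testSomething{$i}";
}
echo strlen(buildFilterPattern($batch)) . " characters\n";
// A pattern this long can exceed the OS command-line limit or PCRE's
// backtracking limits, which is what breaks phpunit.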

@sormy
Contributor Author

sormy commented Feb 14, 2015

Julian, it seems the new code has problems with dependent tests. Could you try running something without dependent tests on your end?

@julianseeger
Contributor

If you mean without an @depends annotation: I never use it. The original tests are from https://github.com/julianseeger/Metrics and there are no tests that depend on any other test.

@sormy
Contributor Author

sormy commented Feb 18, 2015

OK, I found the problem and fixed it. I also added tests. I need your feedback before refactoring the code.
max-batch-size controls the number of atomic tests that run in one isolated process: 1 atomic test is 1 test method without a dataProvider, or 1 dataset of 1 method. Even if there are no datasets, it groups different test methods into batches, which also gives a boost on large numbers of small tests because paratest spends less time spawning phpunit. I will add that note to the readme if you like the explanation.
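
A rough sketch of that batching idea (simplified, with hypothetical names, not the actual SuiteLoader::getMethodBatches code): each method contributes one atomic test per dataset, and methods are grouped greedily until max-batch-size is reached:

<?php
// Simplified sketch of the batching idea. Each entry is
// array(methodName, atomicTestCount): a method without a dataProvider
// counts as 1, a method with N datasets counts as N.
function groupIntoBatches(array $methods, $maxBatchSize)
{
    $batches = array();
    $current = array();
    $size = 0;
    foreach ($methods as $method) {
        list($name, $count) = $method;
        if ($size > 0 && $size + $count > $maxBatchSize) {
            $batches[] = $current; // close the batch before it overflows
            $current = array();
            $size = 0;
        }
        $current[] = $name;
        $size += $count;
    }
    if ($current) {
        $batches[] = $current;
    }
    return $batches;
}

// testA/testC have no dataProvider, testB has 200 datasets:
print_r(groupIntoBatches(array(
    array('testA', 1), array('testB', 200), array('testC', 1),
), 100));
// => three batches; each one runs in its own isolated phpunit process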

@sormy
Contributor Author

sormy commented Feb 18, 2015

Also, I don't know what scrutinizer means by this: "1 failure condition was met by this inspection: issues.new.exists"

@sormy
Contributor Author

sormy commented Feb 18, 2015

In my test scenario:

Artems-MBP:emsow sormy$ phpunit --filter Dicom
PHPUnit 4.5.0 by Sebastian Bergmann and contributors.

Configuration read from /Users/sormy/htdocs/project/phpunit.xml

.............................................................   61 / 5013 (  1%)
.............................................................  122 / 5013 (  2%)
.............................I...............................  183 / 5013 (  3%)
<cut>
............................................................. 4941 / 5013 ( 98%)
............................................................. 5002 / 5013 ( 99%)
..........I

Time: 41.03 minutes, Memory: 241.25Mb

OK, but incomplete, skipped, or risky tests!
Tests: 5013, Assertions: 83719, Incomplete: 2.
Artems-MBP:emsow sormy$ ../paratest/bin/paratest --filter Dicom

Running phpunit in 5 processes with /Users/sormy/htdocs/paratest/vendor/bin/phpunit. Functional mode is on

Configuration read from /Users/sormy/htdocs/project/phpunit.xml

.............................................................   61 / 5013 (  1%)
.............................................................  122 / 5013 (  2%)
<cut>
............................................................. 1098 / 5013 ( 21%)
............................................................. 1159 / 5013 ( 23%)
............................................................. 1220 / 5012 ( 24%)
............................................................. 1281 / 5012 ( 25%)
<cut>
............................................................. 4819 / 5012 ( 96%)
............................................................. 4880 / 5012 ( 97%)
............................................................. 4941 / 5011 ( 98%)
............................................................. 5002 / 5011 ( 99%)
.........

Time: 17.38 minutes, Memory: 20.25Mb

OK (5011 tests, 83719 assertions)

The difference comes from the 2 incomplete tests, and the speedup is 2.35x on a mobile Core i7 with 2 cores.

@julianseeger
Contributor

About scrutinizer: take a look here https://scrutinizer-ci.com/g/brianium/paratest/inspections/91f9f970-f009-404d-a5c6-8114eb97fb4a and you see that the code quality of a couple of classes has decreased (but that didn't cause the failure) and that you introduced 82 new issues. An issue is something like a PSR violation. And PSR2 is the coding standard we stick to (https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-2-coding-style-guide.md PSR2 implicitly includes PSR1, which includes PSR0).

Unfortunately, these new issues don't actually seem to be introduced by you. Scrutinizer shows

Analysis Tools Updated
Some analysis tools were updated to newer versions. This might result in changes which are unrelated to your code changes!
PHP Analyzer: 1.6.0 -> 1.6.1 (changelog)

and I can't find any issue that is ~11h old. The PHP Analyzer changelog has nothing important in it, but I guess scrutinizer marks every issue as new because of the new analyzer version. So I will merge this PR even if scrutinizer screams about these new issues. The classes that got worse can be cleaned up with the small refactoring you already mentioned.

I'm sorry to say that you'll have to wait until tomorrow for feedback from me on your new code because I am handicapped today. But it looks good. Meanwhile you could take a look at testStopOnFailurePreventsStartingFurtherTestsAfterFailure. Maybe it is enough to add a max-batch-size of 1, or to make 1 the default for maximum BC. Whether or not the test still makes sense, it represents a feature that we don't want to lose: if one test fails and the --stop-on-failure flag is used, no new tests should be scheduled. With -p 1 and a max-batch-size of 1, that means not a single test should run after the failing one. And every feature we don't want to lose needs a test ;)
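
For example, a functional test along these lines could pin that down (a sketch only: the fixture path and the output parsing are made up, and the real PHPUnitTest helpers would differ):

<?php
// Sketch of a functional test for --stop-on-failure (hypothetical fixture
// suite whose first test fails; not the actual paratest test code).
class StopOnFailureTest extends PHPUnit_Framework_TestCase
{
    public function testStopOnFailurePreventsStartingFurtherTests()
    {
        $output = shell_exec(
            'bin/paratest -f -p 1 --max-batch-size=1 --stop-on-failure '
            . 'test/fixtures/failing-tests'
        );
        // With one process and batch size 1, nothing may run after the
        // failure: the progress output must end right at the "F".
        $progress = trim(strstr($output, "\nTime:", true));
        $this->assertStringEndsWith('F', $progress);
    }
}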

@sormy
Contributor Author

sormy commented Feb 18, 2015

No problem, will wait for full feedback.

$this->path = $suitePath;
$this->name = $name;
$this->path = $testPath;
$this->filters = (array)$filters;
Contributor

what is this cast to array for?

Contributor Author

For compatibility with other code (tests), which can pass a string (one filter) instead of an array.
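
For illustration, the cast normalizes all of these call styles:

$filters = (array) 'testFoo';                   // array('testFoo')
$filters = (array) array('testFoo', 'testBar'); // unchanged
$filters = (array) null;                        // array()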

@julianseeger
Contributor

note on scrutinizer:
On #157, scrutinizer was temporarily disabled, so further commits should pass on scrutinizer and give more meaningful feedback.

feedback:
Overall, it looks fine with the tests. Maybe you should consider writing more tests at the unit level; this is easier with TDD than when writing them afterwards, and it is especially hard on this project, but it is generally worth it because unit-level tests are more stable.

stuff that should be fixed:

  • re-enable the skipped tests in the PHPUnitTest.php
  • remove the total count from the ResultPrinter.php (not possible without reported skipped/incompletes)
  • and of course refactoring is needed, especially on SuiteLoader::getMethodBatches
  • ...and the docblocks in TestMethod.php

stuff to be discussed:

  • probably the default for max-batch-size should be 1. This is the closest to the current behavior.
    It is not even guaranteed that a batch-size>1 is faster because the slow tests are not distributed equally between the batches, so a big batch full of slow tests may be slower than a couple of small, slow batches.

Again, thanks for your effort! This is a much appreciated and often requested feature, but this process is necessary to keep up some quality standard for the project. And if you think what I'm saying is bs, I'm always open to constructive feedback ;)

* Functional mode is on => Functional mode is ON.
* refactoring to simplify code
* PSR-related fixes
* docblock comment fixes
* method visibility fixes
* implemented printing skipped and incomplete tests as skipped on the console
  when the group and group-exclude options are not passed to paratest
* dataset-related optimizations are disabled by default and can be activated by setting max-batch-size,
  so this version of paratest works the same as vanilla paratest
* max batch size is 0 by default
* enabled auto test: PHPUnitTest::testStopOnFailurePreventsStartingFurtherTestsAfterFailure()
* TestMethod::prepareOptions() fix to get the same behaviour as vanilla paratest when max-batch-size is set to 0
* functional mode with working group and group-exclude filters
* added an assert that the process command line length is acceptable to the operating system, Windows only for now
* readme fixes to describe the max-batch-size option
@sormy
Contributor Author

sormy commented Feb 20, 2015

probably the default for max-batch-size should be 1.

the default max-batch-size is set to 0, and in that case paratest works the same as before my modifications

re-enable the skipped tests in the PHPUnitTest.php

done

remove the total count from the ResultPrinter.php (not possible without reported skipped/incompletes)

kept it; implemented skipped/incomplete reporting as skipped for situations where the group/group-exclude options are not passed to paratest

and of course refactoring is needed. Especially on SuiteLoader::getMethodBatches

did some work to make the code better

...and the docblocks at the TestMethod.php

done

Overall, it looks fine with the tests. Maybe you should consider writing more test on unit level

only functional tests, at least for now

waiting for your feedback

@sormy
Contributor Author

sormy commented Feb 20, 2015

The new code base produces the following result in my test scenario; you can compare it with my previous comment.
You can see that incomplete tests are reported as skipped and the total test count stays constant during test execution.

Artems-MBP:project sormy$ ../paratest/bin/paratest --filter Dicom --max-batch-size 100

Running phpunit in 5 processes with /Users/sormy/htdocs/paratest/vendor/bin/phpunit. Functional mode is ON.

Configuration read from /Users/sormy/htdocs/project/phpunit.xml

.............................................................   61 / 5013 (  1%)
.............................................................  122 / 5013 (  2%)
<skip>
............................................................. 2074 / 5013 ( 41%)
............................................................. 2135 / 5013 ( 42%)
......................S...................................... 2196 / 5013 ( 43%)
............................................................. 2257 / 5013 ( 45%)
<skip>
............................................................. 4697 / 5013 ( 93%)
............................................................. 4758 / 5013 ( 94%)
....................................S........................ 4819 / 5013 ( 96%)
............................................................. 4880 / 5013 ( 97%)
............................................................. 4941 / 5013 ( 98%)
............................................................. 5002 / 5013 ( 99%)
...........

Time: 18.54 minutes, Memory: 19.75Mb

OK, but incomplete, skipped, or risky tests!
Tests: 5013, Assertions: 83719, Incomplete: 2.

@julianseeger
Contributor

To wrap this up: I understand if you don't have time to close the last gaps. If you leave it in this state, I will of course merge it anyway, but I would fix the last "comments" myself before releasing it.
Thanks again for this massive +777-141

@sormy
Contributor Author

sormy commented Feb 21, 2015

I don't know the project's code rules, so I'm coding based on my experience.
If you think it's better to remove the refactoring TODO notes, that's OK - I will remove them. It's not a big deal.
I would just like to hear whether there is anything else I can do to make my part of the work better without going deep into the sources and rewriting some strange parts.
My current list:

  • remove todos
  • replace @ with isset
  • unit tests?
  • more functional tests? are there any other functional tests I could write?

Let's merge this when you have no unresolved questions left about my code.
I also understand that code review takes time as well - your time.

@julianseeger
Contributor

I'm just trying to keep the contribution experience as enjoyable as possible while keeping an eye on the quality, which is - to some extent - subjective. About the todos: most pull requests at paratest are about features people want/need, or changes they have to make in order to do subsequent pull requests (which are then related to their features). That's probably normal for an open source project, but as a result, not many people will address any todo they stumble upon. So if it is possible to create a consistent solution with reasonable effort, I would always prefer such a solution over a todo. If it is not possible with reasonable effort, then let's leave the todo where it is. It always leaves the question "who is going to address it?". (With consistent I mean: because group and group-exclude are commented out, they need the @ suppression, which others don't need.)

It's not that TODOs are always a bad thing. The todo for the maximum command size, for example, is helpful because it gives a hint on how to implement it (echo | xargs --show-limits etc). But if you take, for example, TODOs in this style:
//TODO some tests didn't work with/without the following line
this should be true for every line in well tested code. Take well tested code, remove any line, and either some tests fail or the code is not tested very well. So it seems pretty redundant to me. Maybe it would make sense to collapse all TODOs concerning group and group-exclude into a single todo and add it to the docblock of the defaults() method (or something like that).

So for your list:

  • remove/collapse todos (that don't contain more information than "tests are failing if/if not")
  • replace @ with isset (see the sketch after this list)
  • functional tests: I think you covered all important paths on the functional level. If you can think of any others, feel free to add them, but I don't think there is a need for more
  • unit tests: it looks like you covered all of your code. Hard to tell, because the coverage reports don't seem to recognize the code covered by subprocesses (spawned by the functional tests), but I don't think it adds much value to write additional unit tests for already logically covered code. If you want, you can take a look at https://scrutinizer-ci.com/g/brianium/paratest/inspections/60bbd396-9902-4816-96cf-e67c9f3532ce/code-structure/test-coverage which shows how the coverage changes with your diff. Like I said, it doesn't work very well with the functional tests, but it says a lot about the unit tests.
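
For the @-to-isset point, the usual replacement looks like this (sketched on a generic options array, not the actual paratest code):

<?php
$options = array('processes' => 5); // no 'group' key set

// Error suppression hides the notice, but also any real problem:
$group = @$options['group'];

// The explicit check says what is meant and triggers no notice:
$group = isset($options['group']) ? $options['group'] : null;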

PS: about the max-batch-size: I'm not sure if --max-batch-size=0 is the optimal default value from a performance point of view. Probably it's not, but currently it's there for the sake of BC. We are driving towards a 1.0.0 release, and that's the point where we could do a complete makeover of the default values. But that has to be based on real data (like a test run for every big PHP project on GitHub with different parameter sets) and not only a feeling that it may be more performant in some cases.

@sormy
Contributor Author

sormy commented Feb 22, 2015

julianseeger, you are right about the useless comments; I need to either make them address the problem more directly or remove them if everything works fine without them.

@sormy
Contributor Author

sormy commented Feb 24, 2015

I would like to find time to try to fix the group, group-exclude and skipped/incomplete print support, or leave it in a proper state, to make this changeset complete.

@julianseeger
Contributor

Any progress here?

@sormy
Contributor Author

sormy commented Mar 18, 2015

Sorry Julian. I think I will have time on Saturday to complete everything and prepare for merge.

* README usage section fixes
* fixes for inline docs based on code review
* options refactoring to avoid using @
* --filter option in non-functional mode will raise an exception
* incomplete/skipped functional tests for the default test mode
* --group/--exclude-group fixes
* code refactoring to make the code less complex
* removed unused function arguments
@sormy
Contributor Author

sormy commented Mar 27, 2015

OK, Julian, I think I'm done =) Could you please check whether everything is OK?
PS: Travis lies =)

@julianseeger
Contributor

I didn't go through all >1k loc of the change but it looked good before and the open points are addressed.
Thanks :)

Oh, and there will be a separate issue about PHP 5.3 support. Thanks for addressing the compatibility issues; I hope we can get rid of that soon.

julianseeger added a commit that referenced this pull request Mar 29, 2015
@dataProvider tests parallelization in functional mode
julianseeger merged commit b107f32 into paratestphp:master Mar 29, 2015
@chbiel
Contributor

chbiel commented Apr 13, 2015

@sormy @julianseeger does this also address the issue (without the -f option) that classes with dataProviders do not output correct percentages of passed tests?

At the moment I see my tests repeatedly go from 79% to 99% and back to 79% when using @dataProvider.

If not, I will try to debug this on my side, but some feedback first would be nice :)

@sormy
Contributor Author

sormy commented Apr 13, 2015

In non-functional mode it will work the same as before. Actually, I think that if we add different types of parallelization (per test method, per test case, per test file, per test suite) to functional mode, then we can completely remove non-functional mode and get a good percentage during test execution. Paratest in non-functional mode does not try to look into the tests and estimate the number of tests, which is why it works the way it does.

@julianseeger
Contributor

@chbiel yes. With functional mode and max-batch-size > 0, this feature is activated and progress is tracked per method × dataset. So there should be no incorrect progress because of dataProviders.

@sormy In most cases, per test file and per test case should be the same, and that's what non-functional mode does. Per test method is functional mode + max-batch-size=0; per dataProvider is functional mode + max-batch-size > 0. And per suite does not exist, because it is so high-level that you can probably achieve it more easily by using per-testcase concurrency, or just use your build tool to execute multiple commands concurrently.
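
In command-line terms (using only the flags discussed in this thread):

# per test case / file (non-functional mode, the default)
./vendor/bin/paratest

# per test method (functional mode, batching off)
./vendor/bin/paratest -f --max-batch-size=0

# per dataProvider dataset, in batches of up to 50 atomic tests
./vendor/bin/paratest -f --max-batch-size=50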

Maybe we should always use this new feature so progress is always correct, and split parsing from execution.

Your input is welcome on what the default should be and which options/combinations should be offered to make it easier to use (and to make the "optimizing speed" section of the readme intuitive). The current state is based on backwards compatibility: by default, nothing should have changed in the last releases. But when optimizing, you can turn -f and --max-batch-size on and off and see what does the job best. From my experience, you have about a 50% chance that data-provider concurrency increases or significantly decreases performance (it depends on things like bootstrapping and array caches) and about an 80% chance of better performance in functional mode over non-functional. So there are no best options for everyone.
