Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Parallelize Tests #197

Merged
merged 1 commit into from Jul 31, 2023
Merged

Conversation

rahulsom
Copy link
Contributor

This will use all available cores to run tests, and parallelize by all - suite/class/method. On machines with multiple cores, this will vastly improve test performance. These times are on my M1 MBP with 10 (8P + 2E) cores. They were reported by maven on running mvn verify.

I first ran mvn verify and ignored the time.
Then I ran it thrice without this change, and thrice with this change.

All times in seconds.

Run 1 Run 2 Run 3 Average
Before 876 873 878 876
After 300 300 301 300
Savings 576
% 66

Testing done

All existing tests pass

Submitter checklist

Edit tasklist title
Beta Give feedback Tasklist Submitter checklist, more options

Delete tasklist

Delete tasklist block?
Are you sure? All relationships in this tasklist will be removed.
  1. Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
    Options
  2. Ensure that the pull request title represents the desired changelog entry
    Options
  3. Please describe what you did
    Options
  4. Link to relevant issues in GitHub or Jira
    Options
  5. Link to relevant pull requests, esp. upstream and downstream changes
    Options
  6. Ensure you have provided tests - that demonstrates feature works or fixes the issue
    Options

This will use all available cores to run tests, and parallelize by all - suite/class/method.
On machines with multiple cores, this will vastly improve test performance.
These times are on my M1 MBP with 10 (8P + 2E) cores. They were reported by maven on running `mvn verify`.

I first ran `mvn verify` and ignored the time.
Then I ran it thrice without this change, and thrice with this change.

All times in seconds.

|        | Run 1 | Run 2 | Run 3 | Average |
| ------ | ----: | ----: | ----: | ------: |
| Before |   876 |   873 |   878 |     876 |
| After  |   300 |   300 |   301 |     300 |
| Savings|       |       |       |     576 |
| %      |       |       |       |      66 |
@rahulsom rahulsom requested a review from a team as a code owner July 17, 2023 14:55
@allancth
Copy link
Contributor

allancth commented Jul 18, 2023

I would love to merge this PR but the tests are failing when building in my Quad cores notebook laptop. Not sure why but I am looking into it but I am quite sure it's not introduced in this PR. If other maintainers has no issue with this, please help merge or let me know I can merge it. Thanks.

@MarkEWaite
Copy link
Contributor

Thanks for checking @allancth. One of the benefits of parallel tests it that they sometimes expose undetected environment assumptions or execution order dependencies.

I don't see test failures for the pull request on any of the machines where I tested the pull request, including:

  • 4 core Intel i5-2400 running Debian Linux testing with 32 GB RAM
  • 4 core Intel i5-6400 running Ubuntu Linux 22.04 with 32 GB RAM
  • 12 core AMD Ryzen 5 5600X running Red Hat Enterprise Linux 8 with 32 GB RAM
  • 8 core Intel Atom C2750 running FreeBSD 13.1 with 32 GB RAM

Do the tests fail consistently for you or intermittently? Anything particular about the machine that is different from the configurations that I was running?

@allancth
Copy link
Contributor

allancth commented Jul 18, 2023

Thanks @MarkEWaite for the response.

I am still running the tests, making sure I am checking out the correct repo, tinkering with this and that. The laptop that is running the tests has a Quad core i7 Ivy Bridge chipset running Ubuntu 22.04 + 16GB RAM.

One thing that I think might have an effect on this PR is that I have disabled the HyperThreading SpeedStep feature in BIOS so that I can cap the CPU to run at 1.2GHz to prevent over heating, which happens easily. Having said that, as I am responding and running the tests in the background, I don't think this is the cause.

The number of tests that is failing is inconsistent, but so do the commands I am running. I think I will start the investigation of the warning "The POM for org.jenkins-ci.tools:maven-hpi-plugin:jar:3.47 is missing, no dependency information available" when I did a mvn verify or a mvn clean test. Maybe if you know the reason or if this warning could be the cause of the failing tests?

CORRECTION: Not HyperThreading but Intel's SpeedStep technology.

@allancth
Copy link
Contributor

allancth commented Jul 18, 2023

@MarkEWaite I've enabled SpeedStep in BIOS and reran the test. In one instance, there are 2 failures when running mvn verify. Next, I reran the test with for the particular failed test case and the test passed.

I think I can conclude the speed of the CPU as in my case here, affects the outcome of the test. It takes about ~30 mins (last test took ~26 mins) for each mvn verify. I'm not sure what else I can do.

Strange thing is even after I've commented out the change (surefire-plugin), the test still fail. I need to test more though.

EDIT: I have deleted the .m2 repo for jenkinsci and jenkins-ci and rebuilding the module. This might take a while. Do not let me stop this PR from being merged since it has no issue on other tested machines. It is very likely the problem is all at my end here.

@MarkEWaite
Copy link
Contributor

I think that it would be unwise for this to be merged when tests are failing for you as the maintainer. The tests are most valuable for you as the maintainer. Others gain some small benefit from the existence of the tests, but you need them to be confident that any changes you make or any changes that you accept are in good condition.

Could you share the names of the tests that are failing and the test failure messages in case others can investigate? It may be possible to better understand the failure modes by reading the source code. The Jenkins test harness was recently enhanced to fail tests if they have not correctly ended jobs at the end of the test. That type of failure can depend on processor performance, memory performance, and disc speed.

I'll be out of the office until the end of the week, so I won't be able to provide more help until I return, but others may be able to explore further.

@allancth
Copy link
Contributor

allancth commented Jul 19, 2023

The result of mvn verify, 2 tests are failing

  • TriggeredBuildSelectorTest#testUseNewest
  • TriggeredBuildSelectorTest#testUseOldestNested

2 tests have error

  • LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration
  • LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_production

I've uploaded the build log here. I am still running the tests to make sure I am getting consistent outcome.

UPDATE: Running mvn verify with the changes commented out returns success. Build log here. I'll run a few more tests over the next few days to verify the outcome is consistent.

@MarkEWaite
Copy link
Contributor

MarkEWaite commented Jul 19, 2023 via email

@allancth
Copy link
Contributor

I ran mvn verify another 2 times.

  • Result for second test
    • 2 failures, 1 error
    • Failures:
      • CopyArtifactTest#testTriggeredBuildSelectorToMatrix:908
      • CopyArtifactTest#testTriggeredBuildSelectorWithParentOfParent:836
    • Error:
      • LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration:577
  • Result of third test
    • 0 failure, 1 error
    • Error:
      • LegacyJobConfigMigrationMonitorTest#remove_succeeded:838
      • LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration:577

All 3 tests have the error LegacyJobConfigMigrationMonitorTest#workflowJob_param_copy_legacy_migration:577. The blanket solution I think we could have is to have the plugin move under a profile. By the default the profile is set to active and can be deactivated with mvn verify -P -parallel.

Meanwhile, I am looking into the surefire-plugin to see if it can specify specific tests or group certain tests to run in parallel, but I think a more thorough understanding of the test structure is required.

@allancth
Copy link
Contributor

I think there's a workaround for my case. If the property jenkins.test.timeout is 0, jenkins-test-harness will have infinite timeout. I guess I can specify this parameter when I am building the plugin. There's nothing that needs to be changed, I think.

@allancth allancth merged commit d689167 into jenkinsci:master Jul 31, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants