-
Notifications
You must be signed in to change notification settings - Fork 819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gradle and CI improvements #2156
Conversation
build.gradle.kts
Outdated
testRuntimeOnly(files("../build.properties")) | ||
testRuntimeOnly(files("../ssltest.properties")) | ||
if (file("../build.local.properties").exists()) { | ||
testRuntimeOnly(files("../build.local.properties")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use inputs.file(...) API: https://docs.gradle.org/current/userguide/more_about_tasks.html#sec:task_input_output_runtime_api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example? I'm not familiar with those APIs.
build.gradle.kts
Outdated
@@ -188,6 +188,14 @@ allprojects { | |||
if (file("../build.local.properties").exists()) { | |||
testRuntimeOnly(files("../build.local.properties")) | |||
} | |||
// CI platform can populate these files to impact the respective task cache key. | |||
// For example, adding the PG server information to ensure tests are not considered cached across different versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use regular input.property('pg.version', ...) instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does that get populated and what value would that have? Is that supposed to be the PG_VERSION
matrix value? If so then it does not work because the value would only be the major versions, e.g. "10". It would not reflect patch version changes as new versions are released and the major version Docker tag gets repointed to newer releases.
@sehrope , please clarify what you are trying to achieve here.
Do you have evidence of such a case? pgjdbc/pgjdbc/build.gradle.kts Lines 72 to 74 in 02cc5ba
|
Two objectives, both related to testing. The first was to ensure that properties file changes are reflected in test runs. The second was to ensure that environment changes (e.g. the server changing from 10.0.20 to 10.0.21) triggered rerunning tests.
Right now if you run build the project and then run same test twice, gradle will cache the output unless you explicitly clean the test outputs. First run:
Change some properties:
Second run of same test is still cached:
I had thought the same is happening in CI based on the run times for the scheduled actions but now I'm thinking there's a different issue. GitHub does not set the default values for inputs when doing scheduled cron of actions runs so GRADLE_ARGS is empty and it's just not doing anything during that step. I'm going to fix that in a different branch then come back to this again after. |
That is valid, and we should fix that. However, adding the file to the classpath is the wrong approach to tackle the issue. |
What other API or approach should it be using? This one seemed to work well as it allows gradle to continue to cache all the test class compile operations but still consider the runs stale if a properties file changes. |
For instance: https://docs.gradle.org/current/userguide/more_about_tasks.html#sec:runtime_api_for_adhoc
Well, if you put a properties file on the classpath, then I would expect that there's some code that really fetches the resource from the classpath and use it somehow. However, if you do only for staleness check, then it is using the wrong tool for the job. For instance, if you add files to the classpath, then the files could be silently included into the If you use |
cd225e3
to
2e9203f
Compare
Updated to move the properties files checks into task inputs. I removed the CI specific DB version info file as I want to see how this all plays out before adding that back in. If it's required we can add it in a similar way. While testing this I cleaned up the naming for the omni jobs so each one better reflects the slice of the environment it's supposed to be testing. I don't think it'll show up in the repo's action until this is actually merged so see here for an example: https://github.com/sehrope/pgjdbc/runs/2612337762?check_suite_focus=true That clean up also lead to figuring out a way to get the replication tests to pass that partly involves running first by themselves. With no other tests writing WAL to the test DB yet they consistently succeed. It feels clumsy compared to running the entire suite, but it's better than it consistently failing due to that that buffer issue hanging the tests. |
What is the purpose of splitting the test jobs into three? It somewhat defeats Gradle build scan which aggregates test outcomes from a single execution, and if you split the tests, then you need to aggregate test results. |
To force the replication tests to go first and the slow tests to go last. Without overriding the dispatch inputs, the majority of the matrix jobs only run the main set of tests. Right now there's no scan enabled for these builds. Is there a way to run everything but enforce a specific order for the tests? |
Glad you asked :) JUnit5 has |
An alternative option is to create several Gradle tasks, however, I am not sure they would play well with the build scan. |
Oh that looks like it'd be perfect. I'll try it out. |
Is I guess it was there to make the tests faster. Does it make a difference? Can we move WAL logs to tmpfs/ramfs or something like that? |
Yes it was causing issues as the tests expect all the WAL to be flushed prior to starting. I was trying out some ideas of forcing the replication connections to enable it and force a sync, but that still had issues. There's enough other randomness causing those replication tests to fail that I decided to simplify it and deal it it later. Plus we already have fsync disabled too. |
Dang. Looks like class ordering is only in 5.8 snapshot. The method level annotation is available in 5.6 but we'd have to add it to every replication method individually. |
Actually maybe we can use a custom MethodOrderer that does our own sorting. |
Does the ordering exist in 5.8.0-M1 ? |
Just wondering: why do you want to impose ordering on WAL tests? I agree it might be worth ordering slow tests last (to get faster feedback for developers) or first (to get faster CI build times) However, I do not understand why do you want to order replication tests. |
Yes looks like it's there in the snapshot build: https://junit.org/junit5/docs/snapshot/api/org.junit.jupiter.api/org/junit/jupiter/api/ClassOrderer.html I want to order them because if they run first (or alone) they'll actually execute successfully. Right now those tests always fail and we just mark them as ignored. Try spinning up a clean test DB and running just the replication tests. They'll all pass. But if you run the full suite something it fails. My theory is that it's something to do with existing WAL prior to the tests starting. While it'd be great to either have that bug fixed server side or fix the test itself, it's relatively low hanging fruit to just run those tests first so we can see they succeed in CI. |
It is really great you dig that, however, it would be cool if you documented
Frankly speaking, it is hard to tell if your commit is just a debug only or if you intend to merge it. If the commit is one-time for debugging only, then it does not require review. However, if that is the case, you could probably do that in your own fork to avoid raising unexpected reviews. You could add "WIP, do not review" label, etc. On the other hand, if you indeed intend merging the commit, then please add the relevant explanations on why you do something. For instance, you mention that you split the tests in three, however, there was no explanation on why do you do that. You probably spent at least a day, and it would be sad for others to spend another couple of days later trying to figure out the intention. |
My mistake on that. I think the commit messages describing splitting out the replication tests got removed when I was squashing things. As a general rule I try to keep commits as isolated as possible to simplify review and I only force push to the PR branch when something is ready for wider review. I really appreciate your feedback on all of this. It's been very helpful! My goal with all these CI related items has been to get us back to an all-green CI system that actually runs all the tests, preferably in a timely manner. I think we're almost there too. I'm going to try out that ordering you mentioned in a separate branch and I'll pull it all in when it's ready for review. If it doesn't seem like it's going to pan out then I'll probably keep the broken tests marked experimental so we get the rest of the nice renaming etc. Then on to Travis... |
Ok, great we are on the same page. I just saw a well-prepared list of commits that is almost ready for merge, so my fear was "sehrope is going to merge that" :)
Just in case, if the replication tests fail because they run concurrently with something else, then the resolution is to add something like |
Adds backs enabling synchronous_commit by default to the test postgres server container as having it disabled causes some replication tests to fail due to unflushed WAL.
Adds .properties files to gradle test tasks as file inputs so that changes to any one of those files will invalidate cached test runs. Previously gradle did not consider a property file change so adjusting build properties would not actually re-run tests without explicitly cleaning cached test output.
2e9203f
to
bba6c86
Compare
This latest version should have all the recent feedback incorporated into it. The commit messages go into more detail but here's the high level:
|
Naming is way better now, thank you. I think we could drop "Other JDK -" from the label (I guess it adds no value). |
I thought you might like them :). Yes that's a good idea with the JDK name. I'm going to add it and the actual PG version as the suffix across the board. Anything with a more specific name will be up front., e.g. "Adopt Hotspot 8 x PG 13 " or "Slow Tests - Zulu 11 x PG 13" |
…dalone Improves names of omni actions jobs and present them in a sorted manner so that all the experimental jobs appear last. Also changes the job execution so that a single job performs the replication test on their own. Running them combined with other tests causes them to randomly fail and this is a hopefully temporary measure to get the tests running successfully even if it is in isolation. Gradle args as a dispatch parameter have been removed. All jobs now run test suite with code coverage and enable Gradle build scans. By default slow and replication tests are not run on the majority of matrix jobs. This can be overridden with the new "Default test group" input to specify which tests they will be running: fast - Only fast tests slow - Fast and slow tests all - All tests including replication tests
Separates omni action gradle cache so that each combination of JDK distribution and version has its own cache. That way two different JDK 11 distributions do not share the same cache.
bba6c86
to
4c0a6b3
Compare
I think we got a winner: https://github.com/sehrope/pgjdbc/actions/runs/861226522 |
First commit marks two more "big" methods as slow tests.
Second teaches gradle to include
build.properties
,build.local.properties
(if it exists), andssltest.properties
as runtime dependencies for tests. Otherwise changes to those files don't actually trigger re-running tests because gradle thinks all the inputs match its cached values.Third commit teaches gradle to include the files
gradle.ci.test-runtime-only
andgradle.ci.test-implementation
as dependencies and has the CI actions populate the test-runtime file with the matrix, OS, and PostgreSQL server version information. The actual contents of the file don't matter, it just needs to include something to uniquely identify that environment.I noticed that similar to
build.properties
, changing the PostgreSQL server version didn't trigger a new test run as it's the same cache key. I thought about having it actually connect to the test DB to determine the version at runtime but figured there's going to be other external details that we may want to include so I went with this approach. Additional information can be appended to the same file.Prior to this change, if there's no code change in the repo then the daily omni action would not actually run the tests if the PG server changed, i.e. if core releases a new patch version. Now it should reflect those and any other changes to the matrix while still preserve caching across similar runs.