Gradle and CI improvements #2156

sehrope · 2021-05-17T20:02:52Z

First commit marks two more "big" methods as slow tests.

Second teaches gradle to include build.properties, build.local.properties (if it exists), and ssltest.properties as runtime dependencies for tests. Otherwise changes to those files don't actually trigger re-running tests because gradle thinks all the inputs match its cached values.

Third commit teaches gradle to include the files gradle.ci.test-runtime-only and gradle.ci.test-implementation as dependencies and has the CI actions populate the test-runtime file with the matrix, OS, and PostgreSQL server version information. The actual contents of the file don't matter; it just needs to include something that uniquely identifies that environment.

I noticed that similar to build.properties, changing the PostgreSQL server version didn't trigger a new test run as it's the same cache key. I thought about having it actually connect to the test DB to determine the version at runtime but figured there are going to be other external details that we may want to include, so I went with this approach. Additional information can be appended to the same file.

Prior to this change, if there's no code change in the repo then the daily omni action would not actually run the tests if the PG server changed, i.e. if core releases a new patch version. Now it should reflect those and any other changes to the matrix while still preserving caching across similar runs.

vlsi · 2021-05-17T20:11:40Z

build.gradle.kts

+            testRuntimeOnly(files("../build.properties"))
+            testRuntimeOnly(files("../ssltest.properties"))
+            if (file("../build.local.properties").exists()) {
+                testRuntimeOnly(files("../build.local.properties"))


Please use inputs.file(...) API: https://docs.gradle.org/current/userguide/more_about_tasks.html#sec:task_input_output_runtime_api

Can you give an example? I'm not familiar with those APIs.

vlsi · 2021-05-17T20:12:35Z

build.gradle.kts

@@ -188,6 +188,14 @@ allprojects {
            if (file("../build.local.properties").exists()) {
                testRuntimeOnly(files("../build.local.properties"))
            }
+            // CI platform can populate these files to impact the respective task cache key.
+            // For example, adding the PG server information to ensure tests are not considered cached across different versions.


Please use regular inputs.property("pg.version", ...) instead.

Where does that get populated and what value would that have? Is that supposed to be the PG_VERSION matrix value? If so then it does not work because the value would only be the major versions, e.g. "10". It would not reflect patch version changes as new versions are released and the major version Docker tag gets repointed to newer releases.

vlsi · 2021-05-18T07:17:21Z

@sehrope, please clarify what you are trying to achieve here.

I noticed that similar to build.properties, changing the PostgreSQL server version didn't trigger a new test run as it's the same cache key

Do you have evidence of such a case?
I am quite confident that the current test task is not cacheable:

pgjdbc/pgjdbc/build.gradle.kts

Lines 72 to 74 in 02cc5ba

    
    outputs.cacheIf("test results on the database configuration, so we can't cache it") {
        false
    }

sehrope · 2021-05-18T10:57:45Z

@sehrope, please clarify what you are trying to achieve here.

Two objectives, both related to testing. The first was to ensure that properties file changes are reflected in test runs. The second was to ensure that environment changes (e.g. the server changing from 10.0.20 to 10.0.21) triggered rerunning tests.

Do you have evidence of such a case?

Right now if you build the project and then run the same test twice, gradle will cache the output unless you explicitly clean the test outputs.

First run:

$ ./gradlew postgresql:test --tests org.postgresql.test.jdbc4.UUIDTest
Starting a Gradle Daemon, 1 stopped Daemon could not be reused, use --status for details

> Configure project :
Building pgjdbc 42.3.0-SNAPSHOT

> Task :postgresql:test

UUIDTest > testUUID[binary=REGULAR, stringType=UNSPECIFIED] STANDARD_OUT
          0.8sec,    8 completed,   0 failed,   0 skipped, org.postgresql.test.jdbc4.UUIDTest
          1.8sec,    8 completed,   0 failed,   0 skipped, Gradle Test Run :postgresql:test

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.8.3/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 14s
6 actionable tasks: 1 executed, 5 up-to-date

Change some properties:

$ echo 'preferQueryMode=simple' >>build.local.properties

Second run of same test is still cached:

$ ./gradlew postgresql:test --tests org.postgresql.test.jdbc4.UUIDTest

> Configure project :
Building pgjdbc 42.3.0-SNAPSHOT

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.8.3/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 1s
6 actionable tasks: 6 up-to-date

I had thought the same was happening in CI based on the run times for the scheduled actions, but now I'm thinking there's a different issue. GitHub does not set the default values for inputs on scheduled (cron) runs of actions, so GRADLE_ARGS is empty and it's just not doing anything during that step. I'm going to fix that in a different branch and then come back to this after.

vlsi · 2021-05-18T11:17:04Z

Right now if you build the project and then run the same test twice, gradle will cache the output unless you explicitly

That is valid, and we should fix that.

However, adding the file to the classpath is the wrong approach to tackle the issue.
If you want to explain to Gradle that the "test" task depends on the contents of a properties file, then please do so. There are APIs to do that, and adding the file to the classpath is just using the wrong tool for the purpose. Even if it works now, it would make maintenance hard in the long run.

sehrope · 2021-05-18T11:56:10Z

What other API or approach should it be using?

This one seemed to work well as it allows gradle to continue to cache all the test class compile operations but still consider the runs stale if a properties file changes.

vlsi · 2021-05-18T12:08:20Z

For instance: https://docs.gradle.org/current/userguide/more_about_tasks.html#sec:runtime_api_for_adhoc

This one seemed to work well as it allows gradle to continue to cache all the test class compile operations but still consider the runs stale if a properties file changes.

Well, if you put a properties file on the classpath, then I would expect that there's some code that really fetches the resource from the classpath and uses it somehow.

However, if you do it only for a staleness check, then it is using the wrong tool for the job.

For instance, if you add files to the classpath, then the files could be silently included in the -tests.jar (if we build it), and they might leak credentials (if any).

If you use inputs.property(...) or inputs.file(...) then it would be quite obvious what the thing means, the impact would be confined to the staleness check, and ./gradlew -i ... would even print the reason for task re-execution.
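
As a rough illustration of the task-input approach described above, the properties files could be registered on the test task along these lines. This is a minimal sketch in the Gradle Kotlin DSL, not the change that was eventually merged: the relative paths mirror the ones in the diff quoted earlier, while the "pg.version" property name comes from the review comment and the PG_VERSION environment variable is an assumption.

```kotlin
// build.gradle.kts (sketch; paths and names are assumptions, see above)
tasks.withType<Test>().configureEach {
    // Declare the properties files as plain task inputs, so a change to their
    // contents makes the test task out of date without touching the classpath.
    inputs.file("../build.properties")
    inputs.file("../ssltest.properties")

    // build.local.properties may not exist, so only register it when present.
    if (file("../build.local.properties").exists()) {
        inputs.file("../build.local.properties")
    }

    // A simple value (here: a hypothetical PG_VERSION environment variable)
    // can be tracked the same way with inputs.property.
    inputs.property("pg.version", System.getenv("PG_VERSION") ?: "unknown")
}
```

With inputs declared this way, the files only take part in the up-to-date check; they are not added to the test runtime classpath or to any jar built from it.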

sehrope · 2021-05-18T17:01:14Z

Updated to move the properties files checks into task inputs. I removed the CI specific DB version info file as I want to see how this all plays out before adding that back in. If it's required we can add it in a similar way.

While testing this I cleaned up the naming for the omni jobs so each one better reflects the slice of the environment it's supposed to be testing. I don't think it'll show up in the repo's action until this is actually merged so see here for an example: https://github.com/sehrope/pgjdbc/runs/2612337762?check_suite_focus=true

That clean up also led to figuring out a way to get the replication tests to pass that partly involves running them first by themselves. With no other tests writing WAL to the test DB yet, they consistently succeed. It feels clumsy compared to running the entire suite, but it's better than consistently failing due to that buffer issue hanging the tests.

vlsi · 2021-05-18T17:28:10Z

What is the purpose of splitting the test jobs into three?

It somewhat defeats Gradle build scan which aggregates test outcomes from a single execution, and if you split the tests, then you need to aggregate test results.

sehrope · 2021-05-18T17:38:39Z

To force the replication tests to go first and the slow tests to go last. Without overriding the dispatch inputs, the majority of the matrix jobs only run the main set of tests. Right now there's no scan enabled for these builds.

Is there a way to run everything but enforce a specific order for the tests?

vlsi · 2021-05-18T17:41:42Z

Is there a way to run everything but enforce a specific order for the tests?

Glad you asked :)

JUnit5 has @TestMethodOrder and @TestClassOrder annotations: https://junit.org/junit5/docs/5.5.0/api/org/junit/jupiter/api/TestMethodOrder.html
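
For reference, method-level ordering with JUnit 5 looks roughly like the following. This is a minimal sketch written in Kotlin; the class and method names are hypothetical and are not taken from the pgjdbc test suite.

```kotlin
import org.junit.jupiter.api.MethodOrderer
import org.junit.jupiter.api.Order
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.TestMethodOrder

// Hypothetical test class: the @Order values decide the execution order
// of the test methods inside this one class.
@TestMethodOrder(MethodOrderer.OrderAnnotation::class)
class ExampleOrderedTest {
    @Test
    @Order(1)
    fun runsFirst() {
        // test body omitted
    }

    @Test
    @Order(2)
    fun runsSecond() {
        // test body omitted
    }
}
```

Note that @TestMethodOrder only orders methods within a single class; ordering across classes is what @TestClassOrder is for.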

vlsi · 2021-05-18T17:42:59Z

An alternative option is to create several Gradle tasks, however, I am not sure they would play well with the build scan.

sehrope · 2021-05-18T17:55:41Z

Oh that looks like it'd be perfect. I'll try it out.

vlsi · 2021-05-18T18:07:35Z

Is synchronous_commit really an issue for replication tests?

I guess it was there to make the tests faster. Does it make a difference? Can we move WAL logs to tmpfs/ramfs or something like that?

sehrope · 2021-05-18T18:10:29Z

Yes, it was causing issues as the tests expect all the WAL to be flushed prior to starting. I was trying out some ideas of forcing the replication connections to enable it and force a sync, but that still had issues. There's enough other randomness causing those replication tests to fail that I decided to simplify it and deal with it later. Plus we already have fsync disabled too.

sehrope · 2021-05-18T18:17:32Z

Dang. Looks like class ordering is only in 5.8 snapshot. The method level annotation is available in 5.6 but we'd have to add it to every replication method individually.

sehrope · 2021-05-18T18:28:13Z

Actually maybe we can use a custom MethodOrderer that does our own sorting.

vlsi · 2021-05-18T18:39:53Z

Does the ordering exist in 5.8.0-M1 ?

vlsi · 2021-05-18T18:42:31Z

Just wondering: why do you want to impose ordering on WAL tests?

I agree it might be worth ordering slow tests last (to get faster feedback for developers) or first (to get faster CI build times)

However, I do not understand why you want to order replication tests.

sehrope · 2021-05-18T18:46:01Z

Yes looks like it's there in the snapshot build: https://junit.org/junit5/docs/snapshot/api/org.junit.jupiter.api/org/junit/jupiter/api/ClassOrderer.html

I want to order them because if they run first (or alone) they'll actually execute successfully. Right now those tests always fail and we just mark them as ignored.

Try spinning up a clean test DB and running just the replication tests. They'll all pass. But if you run the full suite, something fails. My theory is that it's something to do with existing WAL prior to the tests starting. While it'd be great to either have that bug fixed server side or fix the test itself, it's relatively low hanging fruit to just run those tests first so we can see they succeed in CI.

vlsi · 2021-05-18T18:53:59Z

It is really great that you dug into that; however, it would be cool if you documented why something is done rather than what's done.

it's relatively low hanging fruit to just run those tests first so we can see they succeed in CI

Frankly speaking, it is hard to tell if your commit is debug-only or if you intend to merge it.

If the commit is one-time for debugging only, then it does not require review. However, if that is the case, you could probably do that in your own fork to avoid raising unexpected reviews. You could add a "WIP, do not review" label, etc.

On the other hand, if you indeed intend to merge the commit, then please add the relevant explanations of why you do something. For instance, you mention that you split the tests in three; however, there was no explanation of why you did that. You probably spent at least a day, and it would be sad for others to spend another couple of days later trying to figure out the intention.

sehrope · 2021-05-18T19:09:35Z

My mistake on that. I think the commit messages describing splitting out the replication tests got removed when I was squashing things. As a general rule I try to keep commits as isolated as possible to simplify review and I only force push to the PR branch when something is ready for wider review.

I really appreciate your feedback on all of this. It's been very helpful!

My goal with all these CI related items has been to get us back to an all-green CI system that actually runs all the tests, preferably in a timely manner. I think we're almost there too.

I'm going to try out that ordering you mentioned in a separate branch and I'll pull it all in when it's ready for review. If it doesn't seem like it's going to pan out then I'll probably keep the broken tests marked experimental so we get the rest of the nice renaming etc. Then on to Travis...

vlsi · 2021-05-18T19:18:52Z

I really appreciate your feedback on all of this. It's been very helpful!

Ok, great we are on the same page. I just saw a well-prepared list of commits that is almost ready for merge, so my fear was "sehrope is going to merge that" :)
I do not want to be nit-picky when it comes to debug-only commits.

I'm going to try out that ordering you mentioned in a separate branch

Just in case, if the replication tests fail because they run concurrently with something else, then the resolution is to add something like @Isolated rather than enforce ordering.
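
A minimal sketch of the @Isolated suggestion, written in Kotlin. The class and method names are hypothetical, and the annotation only has an effect when JUnit Jupiter parallel execution is enabled (it requires JUnit Jupiter 5.7 or later).

```kotlin
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.parallel.Isolated

// Hypothetical class name. @Isolated asks JUnit Jupiter not to run any other
// test class concurrently with this one when parallel execution is enabled.
@Isolated
class ExampleReplicationTest {
    @Test
    fun readsChangesFromReplicationSlot() {
        // test body omitted
    }
}
```

This addresses concurrency only. It does not make the class run before other classes, so it would not help if the failures depend on WAL written by tests that ran earlier.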

Adds .properties files to gradle test tasks as file inputs so that changes to any one of those files will invalidate cached test runs. Previously gradle did not consider a property file change so adjusting build properties would not actually re-run tests without explicitly cleaning cached test output.

sehrope · 2021-05-20T16:44:44Z

This latest version should have all the recent feedback incorporated into it. The commit messages go into more detail but here's the high level:

Improves omni job naming: https://github.com/sehrope/pgjdbc/actions/runs/860925787
Splits out the omni replication and slow tests so they run as their own matrix item. The action still has a single task to run the gradle build; it's all controlled by the include / exclude tags. Isolating the replication tests allows them to consistently run successfully. Code coverage for all the tasks gets merged in the Codecov report too, so we can see that they actually ran: https://codecov.io/gh/sehrope/pgjdbc/tree/bba6c86ba9796beb2d5de877bf676c36fe531ade/pgjdbc/src/main/java/org/postgresql
Prevents the seed task from running if there are no S3 access keys (See Question regarding "Seed Build Cache" action #2159 for more detail)
Removes ACTIONS_ALLOW_UNSECURE_COMMANDS=true from all the actions. Was not needed so let's remove it while we can.
Improves the main action job naming so the entire line fits in the web UI and does some re-ordering / clean up there as well.
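
The include / exclude tag selection mentioned in the list above could be wired into Gradle roughly as follows. This is a sketch only, assuming JUnit Platform tags; the tag names "slow" and "replication" and the testGroup project property are hypothetical, and the merged build may use a different mechanism.

```kotlin
// build.gradle.kts (sketch; tag and property names are hypothetical)
tasks.withType<Test>().configureEach {
    // Selected on the command line, e.g. ./gradlew test -PtestGroup=slow
    val group = (findProperty("testGroup") as String?) ?: "fast"

    useJUnitPlatform {
        when (group) {
            // Only fast tests: leave out both slow and replication tests.
            "fast" -> excludeTags("slow", "replication")
            // Fast and slow tests: leave out only the replication tests.
            "slow" -> excludeTags("replication")
            // Anything else (e.g. "all"): nothing is excluded.
            else -> { }
        }
    }
}
```

Keeping a single test task and switching only the tag filter means each CI matrix entry still runs one Gradle invocation.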

vlsi · 2021-05-20T17:01:27Z

Naming is way better now, thank you.
However, do you think you could display the JDK version itself instead of "Latest JDK"?
For instance, Zulu 11, OpenJ9 15, etc.

I think we could drop "Other JDK -" from the label (I guess it adds no value).

sehrope · 2021-05-20T17:13:15Z

I thought you might like them :).

Yes, that's a good idea with the JDK name. I'm going to add it and the actual PG version as the suffix across the board. Anything with a more specific name will be up front, e.g. "Adopt Hotspot 8 x PG 13" or "Slow Tests - Zulu 11 x PG 13".

…dalone

Improves names of omni actions jobs and presents them in a sorted manner so that all the experimental jobs appear last.

Also changes the job execution so that a single job performs the replication tests on their own. Running them combined with other tests causes them to randomly fail and this is a hopefully temporary measure to get the tests running successfully even if it is in isolation.

Gradle args as a dispatch parameter have been removed. All jobs now run the test suite with code coverage and enable Gradle build scans.

By default slow and replication tests are not run on the majority of matrix jobs. This can be overridden with the new "Default test group" input to specify which tests they will be running:

fast - Only fast tests
slow - Fast and slow tests
all - All tests including replication tests


sehrope · 2021-05-20T17:19:47Z

I think we got a winner: https://github.com/sehrope/pgjdbc/actions/runs/861226522

vlsi reviewed May 17, 2021

sehrope force-pushed the fix-gradle-ci-caching branch from cd225e3 to 2e9203f on May 18, 2021 16:53

sehrope added 3 commits May 20, 2021 11:15

test: Mark large sizes in ByteStreamWriterTest as SlowTest

1163929

test: Enable synchronous_commit by default in test postgres container

a6c8bf0

Re-enables synchronous_commit by default in the test postgres server container, as having it disabled causes some replication tests to fail due to unflushed WAL.

sehrope force-pushed the fix-gradle-ci-caching branch from 2e9203f to bba6c86 on May 20, 2021 16:30

sehrope added 7 commits May 20, 2021 13:16

test: Separate omni action gradle cache by jdk type and version

e2a2c35

Separates omni action gradle cache so that each combination of JDK distribution and version has its own cache. That way two different JDK 11 distributions do not share the same cache.

test: Remove ACTIONS_ALLOW_UNSECURE_COMMANDS from all actions

9c10eef

test: Restrict seed build cache action to only run when S3 keys are available

d3271e2

test: Rename main CI actions so they fit better in the web UI

54acf21

test: Reorder main CI actions so that non-matrix builds are always on top

c246eef

test: Remove dead code from main CI action source distribution check job

4c0a6b3

sehrope force-pushed the fix-gradle-ci-caching branch from bba6c86 to 4c0a6b3 on May 20, 2021 17:19

sehrope changed the title ~~Teach Gradle to include server version and CI matrix in test runtime cache key~~ Gradle and CI improvements May 21, 2021

sehrope merged commit 305ee1a into pgjdbc:master May 21, 2021
