-
Notifications
You must be signed in to change notification settings - Fork 819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI Spring Cleaning #2137
CI Spring Cleaning #2137
Conversation
I'm afraid the change kills testing for the old DB versions. There are lots of combinations (Java in [8, 11, 16]; os in [Linux, windows, mac]; GSS in [Y, N]; PG version in [9...14]; timezone in [...]; and so on) What if |
Yes we definitely need a wider coverage than what's in the current actions matrix but rather than have it be random, I'm thinking we handle that with a separate full workflow that's run on a schedule (e.g. daily). I don't like randomizing the PR action test matrix itself. Maybe if the randomness was seeded with something that was fixed per-PR (hash of the branch name?) so that it's consistent. But that seems like it'd be more hassle than it's worth. Especially if we have a separate task providing wider coverage. |
What is your plan? Do you intend to replicate the comparable coverage via Travis? |
Yes that's the plan. A full matrix covering everything currently in Travis and hopefully even more. I'm putting it together as an improved docker configuration. It'll be leveraged by the GitHub Actions and also provide a simple way for developers to test their changes against the exact same server configurations. There's no rush to remove anything until the new coverage is in place. We can hold off on this entire PR or merge the rest and leave out the Travis removal. |
It's still a work in progress but see here for the "full matrix" branch: https://github.com/sehrope/pgjdbc/tree/omni-ci I haven't pushed it to this PR branch yet as I didn't want to spam the repo. Took quite few a few pushes to get the actions actually running. Once it's ready for review I'm going to layer it atop of here. That branch adds an "Omni CI" action to test a gigantic matrix of JDK x PG x SSL x XA. Also has a couple extra for replication and TZ testing. Rather than waiting for it for every PR, it runs every day at 06:00UTC so we'll have a daily report (Here's an example: https://github.com/sehrope/pgjdbc/actions/runs/782109472). We can also start it manually from the GitHub UI which is handy for a release branch (you can pick the branch it builds). While working on this I found a couple of race conditions in the tests but overall they succeed across everything that's tested so far. The flakiest ones are the replication tests that hang on that large read/write so I've marked those as "experimental" for now. I can reproduce that locally when running the full suite but interestingly running just the failing tests does not cause the error. It must require a build-up of WAL generated by the rest of the suite or some other slowdown to trigger the exact situation. I've also cleaned up the docker environment a bit and fiddled with some of it's default server settings to speed things up (e.g. we don't need fsync for ephemeral tests). On my local machine I'm seeing a 30-40% improvement in run time for the full suite running against the stock test docker container. As the Still need to add a couple of the more advanced test configurations, additional JDK options (right now it's just OpenJDK 8-15) , and possibly limiting the overall matrix a bit as the N x M x O x P... is a bit overkill. I'm hoping to circle back to this next week. If it looks like the additional complex configs are taking too long to set up, I'll rip out Travis removal and get this ready for inclusion with the current improvements. |
Do we really need to test every version of jdk 8-<latest>?
Would it make more sense to test LTS releases and latest and maybe an EA of
next?
So right now that would be 8, 11, 16, and 17ea.
It seems that could cut down the test execution time.
…On Sun, Apr 25, 2021 at 6:55 AM Sehrope Sarkuni ***@***.***> wrote:
It's still a work in progress but see here for the "full matrix" branch:
https://github.com/sehrope/pgjdbc/tree/omni-ci
I haven't pushed it to this PR branch yet as I didn't want to spam the
repo. Took quite few a few pushes to get the actions actually running. Once
it's ready for review I'm going to layer it atop of here.
That branch adds an "Omni CI" action to test a gigantic matrix of JDK x PG
x SSL x XA. Also has a couple extra for replication and TZ testing. Rather
than waiting for it for every PR, it runs every day at 06:00UTC so we'll
have a daily report (Here's an example:
https://github.com/sehrope/pgjdbc/actions/runs/782109472). We can also
start it manually from the GitHub UI which is handy for a release branch
(you can pick the branch it builds).
While working on this I found a couple of race conditions in the tests but
overall they succeed across everything that's tested so far. The flakiest
ones are the replication tests that hang on that large read/write so I've
marked those as "experimental" for now. I can reproduce that locally when
running the full suite but interestingly running just the failing tests
does not cause the error. It must require a build-up of WAL generated by
the rest of the suite or some other slowdown to trigger the exact situation.
I've also cleaned up the docker environment a bit and fiddled with some of
it's default server settings to speed things up (e.g. we don't need fsync
for ephemeral tests). On my local machine I'm seeing a 30-40% improvement
in run time for the full suite running against the stock test docker
container. As the main.yml uses the same container, it should speed up
the regular PR testing flow as well.
Still need to add a couple of the more advanced test configurations,
additional JDK options (right now it's just OpenJDK 8-15) , and possibly
limiting the overall matrix a bit as the N x M x O x P... is a bit
overkill. I'm hoping to circle back to this next week. If it looks like the
additional complex configs are taking too long to set up, I'll rip out
Travis removal and get this ready for inclusion with the current
improvements.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#2137 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAW3U3IIAXCU6IFC53MQASDTKP7LFANCNFSM43O5LESQ>
.
|
I am afraid that does not qualify as fair use of what GitHub kindly provides. |
Yes I'm definitely going to limit things before bringing it into this PR. Part of the big matrix approach was just to see if it would even work (both the tests on all the platforms and the overkill matrix). Will probably do something like:
So it'd be on the order of: (Number of PG versions) + (Number of stable JDKs) + 1 + (Number of "interesting" combos) + (Numer of "other" JDKs) I think that should be an acceptable amount and it won't grow that fast either (linearly with each new PG or JDK release). From what I've read in the GitHub Actions docs, I think we can parameterize the actions as well. That means we could have either the randomization factor be controlled (defaulting to N random tests but override to run a different limit) or even add options to turn on slower / extensive tests that would be disabled by default. I'll circle back once I've had some time to clean this up some more. |
I pushed an update that cleans up some more tests and the omni ci now runs with the following builds:
Here's a sample run: https://github.com/sehrope/pgjdbc/actions/runs/797070873 For the "Other JDKs" it should be able to handle anything the GitHub action/install-jdk can handle directly or anything that can supply a tarball. For now I've included:
Thanks to a combination of defaulting to disabling some of the slower tests and fancy server settings (e.g. disable fsync and full page writes), the entire run takes only 17m. I think that should be fine for something that runs daily. We can also manually fire it for a branch from the GitHub UI. This should cover just about everything that Travis was previously testing. Only thing left out is running against server HEAD revision. We can set that up as a separate exercise as the container required for it is totally different. Another positive side effect of all this is that all the scripts for bootstrapping the CI environment work locally. There's nothing GitHub specific so for somebody new to the repo can easily reproduce CI errors locally via: # Start server in the foreground in a terminal:
$ docker/bin/postgres-server
# Build and test entire driver:
$ ./gradlew clean build test |
Any other thoughts on this prior to merging? Still some open TODO items to get a server source CI target going, but that might as well be it's own separate action given how completely different it will be. |
a) The key issue is that it is really hard to maintain the set of tested (and untested) combinations manually. Travis did include hand-crafted list of jobs, and it was hard to figure out which one can be removed. b) We need to collect coverage so we know if the test code does touch the lines affected by PR. I guess we don't need to collect coverage from all the jobs since most of them might be similar and coverage collection makes tests slower, hovewer, XA and Java might touch different code paths, so we aggregate coverage data from several jobs. c) the current labels are very long, and it makes Actions UI almost useless as you don't see what is tested. d) 27 jobs for each PR might be too much. |
We're not really losing anything with Travis right now anyway as the jobs run successfully there anymore. We could always leave it on, but having every PR complain about not passing CI because Travis itself doesn't work is annoying.
We could do the reverse and see what's being included in the "default" test environment to figure out what's missing.
I tried out a couple options for giving descriptive names and settled on the JSON representation as it's the only one that has the full details. I figured that we're primarily going to look at them when one of them failed and the JSON has everything needed to recreate it locally. The matrix generation in the CI template is commented to describe the environment and what impact that would have on testing. Easier to check there to understand the combos.
This isn't meant to be run per-PR. It's a single CI pipeline that runs daily at 6am UTC or manually (e.g. before a release or a one-off to test a particularly complex branch). |
496f6c5
to
68fcded
Compare
I removed the commit that deletes Travis and added a couple that might fix things over there. The two issues are the underlying OS for Travis is ancient (Ubuntu Trusty 14.04) and it's pointing at apt.postgresql.org which does not have the out of support server versions. I'm trying some more modern OS options and switched out PGDG repo to use apt-archive.postgresql.org. Let's see if this finally gets that environment working again. |
322c3dd
to
748fb7f
Compare
…oncurrency friendly
Adds a helper script to wait in a loop for a postgres server to startup.
Refactors docker environment to support versions going back to 8.4. Speeds up tests by defaulting to disabling fsync, syncronous_commit, and full_page_writes. All three can be enabled back via environment variables. Adds omni.yml GitHub Actions CI configuration to automate testing of a wide dynamic matrix of server versions, JDKs, and other testing options on an daily basis. Also adds a helper script, docker/bin/postgres-server, that can be used from on a developer's local machine to create a server environment exactly matching the CI platform.
d08fae4
to
50bc398
Compare
After going through the Travis setup I'm convinced we should scrap it entirely. Between the yaml file, the scripts, and the matrix environment flags, it's impossible to figure out where anything gets configured. If anybody wants to follow up on trying to get it working, I've got my latest attempt here: https://github.com/sehrope/pgjdbc/tree/ci-spring-cleaning-misc-travis-non-working I've removed the "Removing Travis" commit from this branch. Once it passes CI I'm going to let it sit till tomorrow and then merge. All these changes either fix things or are purely additive to the test setup (e.g. the new omni action). We can have a separate PR to either remove Travis or rewrite it from scratch. |
This PR addresses a couple of the topics in #2136
This does not yet include the third-party action version pinning as I'm hoping for a way to store the value just once and reference it throughout the template. I tried top level environment values but you can't reference them in
uses: ...
clauses.The "Only run self-hosted on pgjdbc/pgjdbc..." is done via a new build step that dynamically creates the build matrix to use for downstream steps. It outputs a JSON string that gets parsed by the other jobs as their
matrix
key. More info on how this works here: https://github.blog/changelog/2020-04-15-github-actions-new-workflow-features/