Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend timeout for CI tests #123

Merged
merged 1 commit into from
Nov 17, 2021
Merged

Conversation

galargh
Copy link
Contributor

@galargh galargh commented Nov 16, 2021

I believe the tests in #112 just about miss the 10m mark and that by extending the timeout to 20m we will be able to get more consistent runs.

I have tested it in CI on my forked repo: https://github.com/galargh/go-ds-crdt/actions/runs/1467239280 (the one failure on mac was cause by network issues).

By merging this PR we can restore the common, integrated CI and retire the custom action (note that the custom testing action was also prone to the timeout issue in some runs - it's just less noticeable because we're testing on fewer platforms there).

@welcome
Copy link

welcome bot commented Nov 16, 2021

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution
    guidelines

    if they apply.

Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviews, discussed and approvals as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@galargh galargh force-pushed the go-test-setup branch 2 times, most recently from 244f553 to 79fa39c Compare November 16, 2021 14:14
@galargh galargh marked this pull request as ready for review November 16, 2021 14:57
@galargh
Copy link
Contributor Author

galargh commented Nov 16, 2021

@hsanjuan do you think that would be an acceptable approach to testing in this repo?

@hsanjuan
Copy link
Collaborator

It is acceptable but I would just modity the actual action file directly for the test that takes long and add the necessary flag there.

@galargh
Copy link
Contributor Author

galargh commented Nov 16, 2021

@marten-seemann please correct me if I'm wrong. The actual go call that times out is defined here:

run: go test -v ./...
. We cannot modify this file directly because the automatic PRs from web3dt-bot (like this one #112) would overwrite it in the future. AFAICT custom actions within .github/actions/go-test-setup is the most fitting place to put the timeout override, at least for now.

@marten-seemann
Copy link
Member

@galargh Your understanding is correct.
Without having any insight into the code being tested here, is it really necessary to have a test that runs for more than 10 minutes? That seems like something we’d want to avoid anyway.

@hsanjuan
Copy link
Collaborator

I have removed the "automatic" CI stuff made by the bot because it only introduced tests that failed and took overly long. And honestly, this was a repo with its CI that was working well and I had no problem with it until things were automated for me. I am not sure that the 10 minute limit was the reason that things were failing though.

Regarding the 10 minute test and the last failure, the datastore test-suite is quite heavy on this datastore so yes, I think it is probably normal that it takes around that time, also the reason why it runs separately from the other tests and the reason also why it is disabled when running with race detector.

@marten-seemann
Copy link
Member

I have removed the "automatic" CI stuff made by the bot because it only introduced tests that failed and took overly long.

Please don't modify / delete any files deployed by web3-bot. It will be re-added automatically the next time the deploy script runs. Really, I don't think "let's not test at all" is the right answer to a failing test case.

@hsanjuan
Copy link
Collaborator

It's not "let's not test". Things are tested. The bot is just causing me work and solving problems I don't have.

How can I opt out of web3-bot?

@galargh
Copy link
Contributor Author

galargh commented Nov 17, 2021

I am not sure that the 10 minute limit was the reason that things were failing though.

Looking at the past runs it does seem so (here for example https://github.com/ipfs/go-ds-crdt/runs/3905630735). You can also see a set of passing runs for the automatic CI workflow here: https://github.com/galargh/go-ds-crdt/actions/runs/1467239280 (this is with the extended timeout).

I'd also like to add that the custom CI workflow is also prone to timeout issue. See these runs for example:

So even if the custom CI workflow was to stay in this repo I think extending the timeout for it or optimising the test might be a good idea.

I'm not too well versed in the PL ways yet but there are definitely some advantages I can see in the automatic CI approach:

  • more thorough testing (matrix of platforms + architectures)
  • adherence to PL policy (e.g support for current and previous Go versions)
  • automatic updates to the workflow code

It definitely comes at a cost of less customisation options.

I've had one more idea looking at the setup in this repo. Maybe we could setup tests here so that if -short flag is provided on the command line then TestDatastoreSuite is not executed. Then, we could add the -short flag to the automatic CI setup which would remove the need for extending timeout. Finally, we could remove the build section from https://github.com/ipfs/go-ds-crdt/blob/master/.github/workflows/go.yml#L6 and leave only the suite part there. That way we'll still have quick multi-platform, multi-arch CI + custom long running CI. What do you think? I could make the required changes if that's something you were interested in.

@marten-seemann
Copy link
Member

How can I opt out of web3-bot?

In principle, by removing this repo from https://github.com/protocol/.github. However, I would heavily advise against this. We've decided to standardize our workflows because having a separate config in every single repo just didn't work, especially given the large number of repos we have. We had a long discussion on how Go repositories should be tested, and looking at https://github.com/ipfs/go-ds-crdt/blob/master/.github/workflows/go.yml, it seems like a lot of use cases we care about remain untested here:

  • it doesn't test with the current Go version (which is the version we use to cut releases with!),
  • it doesn't test on 32-bit systems,
  • it doesn't test on OSX and Windows
  • additionally, it's *vulnerable to supply-chain attacks (via staticcheck)

@galargh's suggestion to pass the timeout flag to our go test calls sounds pretty reasonable to me, if the test case can only test things meaningfully in that time frame. I think we can increase the timeout globally (@galargh, do you want to create a PR in .github to do so?), maybe to 30 minutes or so? I assume that repo maintainers still have an incentive to keep their tests fast, as nobody likes waiting on CI.

@hsanjuan
Copy link
Collaborator

Ok, I am not against web3-bot existing, I am against it being added here and causing test failures when the original testing works based on one-glove-fits-all assumptions.

All the "advantanges" are mostly irrelevant here, for this library concretely.

@hsanjuan hsanjuan merged commit fbf3c3a into ipfs:web3-bot/sync Nov 17, 2021
@galargh galargh deleted the go-test-setup branch November 17, 2021 10:06
@hsanjuan
Copy link
Collaborator

I've merged because the patch should potentially solve the test failures, also with the web3-ci, whenever it gets re-added.

I have no problem having it around if it works. I have problem with failures that come from lack of options for customization but are not signalling any real issues.

@galargh
Copy link
Contributor Author

galargh commented Nov 17, 2021

I have problem with failures that ... are not signalling any real issues.

I think that's a very valid point. FYI, right now, I'm making my way through all of the unmerged web3d-bot PRs to figure out what the reasons are for the auto CI failures.

marten-seemann pushed a commit that referenced this pull request Nov 17, 2021
@marten-seemann
Copy link
Member

when the original testing works based on one-glove-fits-all assumptions

Not sure which "one-glove-fits-all assumptions" you mean (as I've said before, I think extending the test timeout is a very reasonable thing to do). When it comes to running tests on all platforms that we care about, I see that pretty much as a requirement.

I have no problem having it around if it works. I have problem with failures that come from lack of options for customization but are not signalling any real issues.

Happy to discuss customizations further. If you're missing a customization option, can you please open an issue in https://github.com/protocol/.github?
The goal of Unified CI is not to make your life hard, the goal is that we have to worry about CI (and updating CI) less! And we've achieved this for more than 150 repos across our entire stack already.

hsanjuan pushed a commit that referenced this pull request Nov 17, 2021
* bump go.mod to Go 1.16 and run go fix

* run go mod tidy

* run gofmt -s

* update .github/workflows/automerge.yml

* update .github/workflows/go-test.yml

* update .github/workflows/go-check.yml

* extend timeout for CI tests (#123)

Co-authored-by: web3-bot <web3-bot@users.noreply.github.com>
Co-authored-by: Piotr Galar <piotr.galar@gmail.com>
@hsanjuan
Copy link
Collaborator

When it comes to running tests on all platforms that we care about, I see that pretty much as a requirement.

This is a small library and these tests are meant to test the logic implemented on this library, and this does not deal with OS, arch or anything close to that so for me testing on MacOS or Windows or 32bit HERE has very little value. That testing happens or should happen below (i.e. in badger) or above (i.e. in cluster) where those things maybe relevant.

For the same reason, it is going to pass all those tests, just taking more time. I already had to adapt things on my code because the assumption of the unified-CI that everything can run with the race detector enabled and still need to run the tests as I had them before so the datastore test suite runs correctly.

the goal is that we have to worry about CI (and updating CI) less!

Yes, so I am trying to explain that this is causing me worries, because as maintainer/babysitter of this library I had no trouble at all with what I had, whereas now I am spending time with Unified CI stuff, and that was the reason I removed it.

I do thank you and @galargh for solving the issues with the timeouts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants