Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR] Improve check-bioc.yml action #8

Closed
nuno-agostinho opened this issue Aug 15, 2020 · 9 comments
Closed

[FR] Improve check-bioc.yml action #8

nuno-agostinho opened this issue Aug 15, 2020 · 9 comments

Comments

@nuno-agostinho
Copy link

nuno-agostinho commented Aug 15, 2020

Hello!

I really like the work you put into creating check-bioc.yml and I used it as the basis to create my own GitHub Actions. Thank you for this amazing resource! However, I think the file is a bit complex to be understood and modified by novice users.

Here I present you some suggestions to improve on it. You can see an example of a GitHub Action of mine in production with the suggestions that follow.

Please let me know what do you think! Again, thank you for your amazing work :)

Branches to run on

First of all, I think all branches should be run on GitHubActions (with possible exceptions like github-pages). As an R/Bioconductor package developer, I have development branches to which I always want to run in GitHub Actions. So, instead of:

on:
push:
branches:
- master
- 'RELEASE_*'
pull_request:
branches:
- master
- 'RELEASE_*'

We could have:

on:
  push:
    branches-ignore:
      - gh-pages
  pull_request:
    branches-ignore:
      - gh-pages

Windows, Mac and Linux builds

I would argue that check-bioc.yml's complexity narrows down to code redundancy largely due to testing the Linux build in a Docker image separately from Windows and Mac. Also, this also means that we have to wait for the Linux version to finish testing before running the build in Windows and Mac.

As such, maybe it would be best to remove the Docker part (an alternative would be to test Windows/Mac in Docker images, but GitHub Actions does not support this as far as I know).

Build in both bioc-release and bioc-devel

I always like to know if there are issues with the code I am writing in both the release and development version of Bioconductor. This way, we know if something is going to break in a specific version of Bioconductor beforehand. My matrix configuration is like so:

matrix:
        config:
          - {os: windows-latest, biocversion: "release"}
          - {os: windows-latest, biocversion: "devel"}
          - {os: macOS-latest,   biocversion: "release"}
          - {os: macOS-latest,   biocversion: "devel"}
          - {os: ubuntu-latest,  biocversion: "release"}
          - {os: ubuntu-latest,  biocversion: "devel"}

To find the R version related with each version of Bioconductor, we can use bash:

      - name: Find R version to run
        run: |
          config="https://bioconductor.org/config.yaml"
          rversion=$(curl ${config} | \
            grep r_version_associated_with_${BIOCVERSION} | \
            grep -o "[0-9]*\.[0-9]*\.[0-9]*")
          echo "Using R ${rversion}..."
          echo "::set-output name=rversion::${rversion}"
        shell:
          bash {0}

Ubuntu versions

When defining the Ubuntu version to test on, why is RSPM also defined?

# - {os: ubuntu-16.04, rspm: "https://packagemanager.rstudio.com/cran/__linux__/xenial/latest"}

Although this is fine for specific versions of Ubuntu, a novice may simply update the version without understanding what the RSPM is and get confused with potential failures afterwards. I would suggest to either remove the RSPM variable or to add other Ubuntu versions with their respective RSPM values.

Also, I would suggest to put here - {os: ubuntu-latest}.

@nuno-agostinho
Copy link
Author

nuno-agostinho commented Aug 21, 2020

It may also be relevant to regularly run these checks (daily? every three days? weekly?)

To do so, we can add a cron scheduler:

on: 
  push: 
    branches: 
      - master 
      - 'RELEASE_*' 
  pull_request: 
    branches: 
      - master 
      - 'RELEASE_*' 
  schedule: 
    - cron: "0 7 * * *" # Run every day at 07:00 UTC

Some other alternatives (using crontab.guru makes it easier to create these):

# Every Monday at 07:00 UTC
cron: "0 7 * * 1"

# Every Monday, Wednesday and Friday at 07:00 UTC
cron: "0 7 * * 1,3,5"

# Every weekday at 07:00 UTC
cron: "0 7 * * 1-5"

# Every two days at 07:00 UTC
cron: "0 7 * * */2"

# Every two days from Monday to Friday at 07:00 UTC
cron: "0 0 * * 1-5/2"

# Every month on the first day of the month at 07:00 UTC
cron: "0 0 1 * *"

@LiNk-NY
Copy link
Contributor

LiNk-NY commented Sep 1, 2020

Hi Nuno, @nuno-agostinho

These are helpful suggestions.

I think that developers should have one bioc_check.yml file per branch to keep it as simple as possible.
That means each branch (master and RELEASE_3_11) would have one bioc_check.yml file with
on: push and a descriptive name: field to differentiate the GHA workflows.

My opinion is that the purpose of the GH Actions functionality is to test the package development before
it gets pushed up to Bioconductor. Once it is in Bioconductor, the build system takes up the "role" of the
cron jobs proposed here.

Thanks!
-Marcel

@lcolladotor
Copy link
Owner

Hi Nuno,

I agree with Marcel that once the package is in Bioconductor, the Bioconductor daily builds are the best for regular reports on all operating systems. That way, if you are running the GHA workflow on a private repo, you can have more control on your GHA compute hours.

I like the idea of the GHA workflow working in mostly all branches, but that's something that can be edited by the user after using the template from biocthis.

On a similar line of thought, I would not drop the bioconductor-docker code because that's exactly the system that is most useful for reproducing bugs seen on the Bioconductor build machines. If that fails, there's no point really in running tests on Windows/Mac.

I agree that the code complexity is introduced by the redundancy, but at least when I wrote this GHA workflow, there was no way to get around it.

Best,
Leo

@lcolladotor
Copy link
Owner

And yes, it's a complex GHA workflow. But it's what's really made it much easier for me to work and collaborate on new R packages that we are about to submit to Bioconductor and it's helped me a lot with reducing the time spent trying to reproduce errors I might not have on my system.

Instead of simplifying the GHA workflow, I need to expand the documentation.

@nuno-agostinho
Copy link
Author

Hey @lcolladotor !

Thanks for the reply!

On a similar line of thought, I would not drop the bioconductor-docker code because that's exactly the system that is most useful for reproducing bugs seen on the Bioconductor build machines. If that fails, there's no point really in running tests on Windows/Mac.

Okay, but imagine that you are trying to fix a Mac and/or Windows-exclusive issue: you have to wait for the Linux one to finish first (even if you know it’s not problematic) before other builds start. Why not get the R and Bioconductor versions in the Mac/Windows builds from somewhere else so all builds are run in parallel?

Instead of simplifying the GHA workflow, I need to expand the documentation.

Always in favour of improving documentation :)

Cheers!

@nuno-agostinho
Copy link
Author

nuno-agostinho commented Oct 1, 2020

Also,

the Bioconductor daily builds are the best for regular reports on all operating systems.

I would say GHA has its advantages, namely that you can restart a failed build immediately (instead of waiting a day for the next build) and check if something just crashed momentarily (e.g. a timeout of an external API). To test if it’s just a temporary error, if using Bioconductor system:

  • receive alert from Bioconductor from builds whose start time is not defined by the user (so if they run on Friday after work, I will only try to fix it on Monday)
  • branch my repository in GH
  • wait for GHA results (note that the user has to enable running GHA on any branch, which is not part of the default check-bioc.yml)
  • delete the branch if everything’s now okay

whereas using GHA:

  • receive alert from GHA from builds that start at user-defined times
  • click “Re-run all jobs”
  • wait for GHA results

I don’t see the problem of testing in redundant systems and I find GHA more flexible.

@lcolladotor
Copy link
Owner

Hi,

I just finished merging #11 and adding several features. The stuff about docker first then mac & Windows is no longer true, since now all 3 of them run in parallel, thanks to Marcel ^^.

Regarding cron, we can mention it somewhere on the docs, but I wouldn't make it the default. Just as a note, you might want to avoid Fridays since that's the day that the bioconductor docker devel image gets updated.

I have a few meetings and have to polish another package later today. But I'll come back to this issue. If you have time, I'd appreciate a PR for documenting the cron code and/or ubuntu names & RSPM.

Best,
Leo

@LiNk-NY
Copy link
Contributor

LiNk-NY commented Oct 1, 2020

Hi all, @lcolladotor

I agree with @nuno-agostinho about having parallel checks on the package running.
If we were to consider the "pure" Bioconductor approach, we would only use Biconductor "ordained" Docker images which would point to the Linux builds. As mentioned in other conversations, the Bioconductor Build System (BBS) is quite hard to replicate due to its complexity. Currently, the Bioc Docker images are the ones closest to matching the BBS setup. Any failures specific to Windows / Mac will not necessarily be the case on the BBS which sort of detracts from having the Windows / Mac versions available.

Cron jobs are icing on the cake and should be optional inputs to the use_bioc_github_action function for maintainers that want continual checks.

RSPM:
I agree that the RSPM link will change when the Ubuntu version changes. It is a matter for maintainers to pick this up when it does. The next LTS release version is quite a ways ahead of us (it gets released every 2 years AFAIK). We could even use this map to automatically change that link: https://wiki.ubuntu.com/DevelopmentCodeNames

Best,
Marcel

@lcolladotor
Copy link
Owner

Those cron jobs could be expensive icing on the cake hehe ;) Specially if you are testing a package on a private GitHub repo.

And thanks for the link on RSPM. I don't think that we need to make that code dynamic really, given the 2 year time window you outlined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants