Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setup-r / setup-pandoc stuck on macOS (hitting 6h timeout) #435

Closed
riccardoporreca opened this issue Nov 13, 2021 · 25 comments
Closed

setup-r / setup-pandoc stuck on macOS (hitting 6h timeout) #435

riccardoporreca opened this issue Nov 13, 2021 · 25 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@riccardoporreca
Copy link
Contributor

Describe the bug
Over the last weeks, I have often seen setup-r (and setup-pandoc, just today) getting stuck and eventually hitting the default 6h timeout.
This has happened on macOS-latest, in different repos and for various R versions.
From the log it is not very clear where and why the execution got stuck (especially for setup-r), and the issue is transient in that a re-run usually goes through.

To Reproduce

Examples of failures, with relevant logs including timing for convenience and for the records:

Sat, 23 Oct 2021 00:51:51 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Sat, 23 Oct 2021 00:51:57 GMT 🍺  /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Sat, 23 Oct 2021 00:51:57 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Sat, 23 Oct 2021 00:52:03 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Sat, 23 Oct 2021 06:51:13 GMT Error: The operation was canceled.
Mon, 25 Oct 2021 14:07:22 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Mon, 25 Oct 2021 14:07:27 GMT 🍺  /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Mon, 25 Oct 2021 14:07:27 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Mon, 25 Oct 2021 14:07:32 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Mon, 25 Oct 2021 20:06:43 GMT Error: The operation was canceled.
Sat, 06 Nov 2021 00:14:44 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Sat, 06 Nov 2021 00:14:48 GMT 🍺  /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Sat, 06 Nov 2021 00:14:48 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Sat, 06 Nov 2021 00:14:51 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Sat, 06 Nov 2021 06:14:12 GMT Error: The operation was canceled.
Sat, 13 Nov 2021 00:16:48 GMT sudo installer -pkg /Users/runner/work/_temp/pandoc-2.7.3-macOS.pkg -target /
Sat, 13 Nov 2021 06:15:27 GMT Error: The operation was canceled.

Expected behavior
It would be great if setup-r and setup-pandoc would capture explicitly the issue causing the action to get stuck, or perhaps just handle a meaningful timeout internally for individual operations, perhaps hinting at re-running the workflow in a timeout error message.
Using the action in a workflow, one can of course set a timeout for the action step, see e.g. riccardoporreca/rmdgallery@65b87fe, but I believe for the users this should be best handled in the action itself.

Additional context
Given how common the issue is (I see it every week in cron runs), it is not great to keep a runner busy for 6hours if we can prevent it.

@riccardoporreca riccardoporreca added the bug an unexpected problem or unintended behavior label Nov 13, 2021
@jimhester
Copy link
Member

This is likely a network failure that never times out, probably from homebrew, I don't think there is anything we can do, just cancel these jobs if you see one taking too long.

@riccardoporreca
Copy link
Contributor Author

Thanks @jimhester, closing this.

@drmowinckels
Copy link

I'm experiencing this quite often (on website daily builds).

Shame theres no way to tag something to retry if there is a timeout.

@ha0ye
Copy link

ha0ye commented Dec 6, 2021

What about adding a manual timeout for that particular step in the GH action?
https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes

@gaborcsardi
Copy link
Member

These do not look like network errors to me. They seem to hang after calling installer, either for pandoc or gfortran. This is probably because there is a dialog window for the user to OK.

@gaborcsardi gaborcsardi reopened this Dec 6, 2021
@gaborcsardi
Copy link
Member

I added some debugging to the installer calls, in the master branch. Can you switch some of your projects to that? Hopefully we'll learn what is happening.

riccardoporreca added a commit to riccardoporreca/rmdgallery that referenced this issue Dec 6, 2021
@riccardoporreca
Copy link
Contributor Author

@gaborcsardi, I have triggered a number or builds using setup-r/-pandoc from master. One on the builds hit a timeout (which I have set to 10 min), see https://github.com/riccardoporreca/rmdgallery/runs/4434994244?check_suite_focus=true#step:4:90

Mon, 06 Dec 2021 19:17:15 GMT ==> Pouring qpdf--10.4.0.big_sur.bottle.tar.gz
Mon, 06 Dec 2021 19:17:19 GMT 🍺  /usr/local/Cellar/qpdf/10.4.0: 73 files, 6.4MB
Mon, 06 Dec 2021 19:17:19 GMT ==> Pouring checkbashisms--2.21.5.all.bottle.tar.gz
Mon, 06 Dec 2021 19:17:21 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.5: 6 files, 71.8KB
Mon, 06 Dec 2021 19:26:56 GMT Error: The action has timed out.

Not sure how much more informative this is, but the installer extra logs might help. For reference, here is the log to an equivalent successful run: https://github.com/riccardoporreca/rmdgallery/runs/4434997287?check_suite_focus=true#step:4:89, I just noticed the failing job above does not log anything like "installer: Package name is R 4.2.0 for macOS". Hope this helps

I hit another failure in https://github.com/riccardoporreca/rmdgallery/runs/4434994121?check_suite_focus=true#step:4:89, which should be completely unrelated

installer: Package name is gfortran
installer: Installing at base path /
installer: The install was successful.
/usr/bin/sudo hdiutil detach /Volumes/gfortran-8.2-Mojave
hdiutil: couldn't unmount "disk2" - Resource busy
Error: Failed to get R 4.0.5: Failed to get R 4.0.5: Failed to umount /Volumes/gfortran-8.2-Mojave: Error: The process '/usr/bin/sudo' failed with exit code 16

@gaborcsardi
Copy link
Member

gaborcsardi commented Dec 6, 2021

Thanks, that is useful! It seems that the R installer can be stuck as well, so I should add the logging to that as well. My theory is that the issue is the concurrent installs interfering. (Pretty much a guess at this point.) More soon.

Btw. the disk eject failure could also be a concurrency issue, but in general we should ignore disk ejection failures, the attached disks do not bother anyone.

@gaborcsardi
Copy link
Member

For the record, I added logging for the R installer as well in the master branch, if you encounter timeouts, please link them to this issue. Thanks!

@riccardoporreca
Copy link
Contributor Author

riccardoporreca commented Dec 6, 2021

@gaborcsardi, I have done another battery of runs (same setup as above), and here the logs of three timeout failures

Hope having this few more examples, including the additional logging, help.

@gaborcsardi
Copy link
Member

@riccardoporreca Thanks, very helpful indeed! One of the installer processes clearly hangs. I still cannot reproduce the this locally, but nevertheless I made the macOS installations sequential now, hopefully this will help.

gaborcsardi added a commit that referenced this issue Dec 10, 2021
Maybe this was causing the hangs, see #435.
@gaborcsardi
Copy link
Member

My impression is that with v2 this is now better, but I still saw some hangs on gfortran:
https://github.com/r-lib/revdepcheck/runs/4477261536?check_suite_focus=true#step:4:45

I'll remove the detach from the gfortran volume, in case that causes it.

@gaborcsardi
Copy link
Member

I reran it a bunch of times, and it seems to work better, so fingers crossed:
https://github.com/r-lib/revdepcheck/runs/4483425337?check_suite_focus=true

@gaborcsardi
Copy link
Member

I would encourage everyone to switch to @v2. I still haven't seen any freezes with the new tag. We can keep this issue still open a bit longer. If there won't be any new issues in about a month, I'll close it.

@drmowinckels
Copy link

Hi.

I got a timeout yesterday it seems with v2
https://github.com/rladies/awesome-rladies-blogs/runs/4540433972?check_suite_focus=true

First one I've seen with this version.

@gaborcsardi
Copy link
Member

Can you try using v2 for everything? I am not sure what GHA does if you checkout both: https://github.com/rladies/awesome-rladies-blogs/runs/4540433972?check_suite_focus=true#step:1:32

setup-pandoc@v2 is pretty much unchanged from v1 I think.

It is a very long shot...

@drmowinckels
Copy link

Sure, but does GHA really care if there is a mix of action versions as long as they dont collide?
I reran the jobs asis and it worked fine this time. As it usually does on rerun.

@gaborcsardi
Copy link
Member

For a second I was thinking if GHA checks them out into the same directory. But apparently not, because the setup-pandoc action does not have debugging, so it is v1, whereas setup-r does, so it is v2.

I would still use v2, maybe the debugging helps. It seems to me that installer returns the prompt while still running some setup processes in the background. This could be different if we specify -dumplog.

@gaborcsardi
Copy link
Member

Anyway, we can add timeouts and retries, if we'll still see errors with a fully v2 workflow. Which is likely.

@gadenbuie
Copy link

setup-pandoc@v2 timed out for me yesterday, here are the relevant logs.

sudo installer -allowUntrusted -dumplog -pkg /Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg -target /
Jan  3 22:09:45  installer[3368] <Debug>: Product archive /Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg trustLevel=350
Jan  3 22:09:45  installer[3368] <Debug>: External component packages (1) trustLevel=350
Jan  3 22:09:45  installer[3368] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: location = file://localhost
Jan  3 22:09:45  installer[3368] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: file://localhost/Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg#pandoc.pkg
Jan  3 22:09:45  installer[3368] <Debug>: Set authorization level to root for session
Jan  3 22:09:45  installer[3368] <Info>: Administrator authorization granted.
Error: The operation was canceled.

(I'm not sure if it's relevant, but I missed updating setup-r in that workflow and it was at v1 for that run.)

@gaborcsardi
Copy link
Member

I have just seen it twice with all actions on v2: https://github.com/r-lib/actions/runs/4714679983?check_suite_focus=true

In both cases the gfortran installation froze:

2022-01-05T12:44:08.1412420Z [command]/usr/bin/sudo installer -allowUntrusted -dumplog -package /Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg -target /
2022-01-05T12:44:08.2767460Z Jan  5 12:44:08  installer[1197] <Debug>: fileURLForURL = x-disc://gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:08.2871220Z Jan  5 12:44:08  installer[1197] <Debug>: fileURLForURL = file://localhost/Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:08.9498810Z Jan  5 12:44:08  installer[1197] <Debug>: Referenced component packages (1) trustLevel=350
2022-01-05T12:44:08.9602120Z Jan  5 12:44:08  installer[1197] <Debug>: -[IFPKGDerivedDocument sortedPackageLocations]: result = (
2022-01-05T12:44:08.9705050Z 	    "file://localhost"
2022-01-05T12:44:08.9812120Z 	)
2022-01-05T12:44:08.9918330Z Jan  5 12:44:08  installer[1197] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: location = file://localhost
2022-01-05T12:44:09.0025590Z Jan  5 12:44:08  installer[1197] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: file://localhost/Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:09.0128490Z Jan  5 12:44:08  installer[1197] <Debug>: Set authorization level to root for session
2022-01-05T12:44:09.0129580Z Jan  5 12:44:08  installer[1197] <Info>: Administrator authorization granted.
2022-01-05T13:13:53.9292480Z ##[error]The operation was canceled.

riccardoporreca added a commit to miraisolutions/rTRNG that referenced this issue Mar 9, 2022
…` examples

* Closes #26.
* Latest setup got via `usethis::use_github_action("check-standard", save_as = "ci.yaml")`, using development `usethis` version (to support relying on the default branch, now `v2-branch`).
* `covr` installed via pak, so it is cached alongside the other dependencies. (Not done as part of `setup-r-dependencies` for all jobs, since it is only used in conditional steps.)
* Drop fixing macOS symlinks (fixed in `v2`), but keep the 5 minutes timeout for setup-r (#24), since r-lib/actions#435 isn't fully addressed.
@gaborcsardi
Copy link
Member

Just a note that this is still an issue. Hopefully switching to rig will solve it.

@gaborcsardi
Copy link
Member

I haven't seen this for a while, maybe because of the macos updates on GHA. Anyone else has seen this recently?

@gaborcsardi
Copy link
Member

I am going to close this now. Hopefully it is not coming back.

@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue and include a link to this issue

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

6 participants