
Warning: Failed to download action 'https://api.github.com/repos/microsoft/powerplatform-actions/zipball/0c80aacb61f9bfdcb930b880febc420ec4ef2f3e'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing. #416

Closed
Joseph-Duty-VA opened this issue Aug 5, 2023 · 18 comments
Labels
enhancement New feature or request

Comments

@Joseph-Duty-VA

I have started seeing this error repeatedly over the past several days across a number of my workflow calls into this repo:

Warning: Failed to download action 'https://api.github.com/repos/microsoft/powerplatform-actions/zipball/0c80aacb61f9bfdcb930b880febc420ec4ef2f3e'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.

The download is then retried; sometimes the retry succeeds, but other times it keeps failing and the entire workflow run errors out. Actions from other external repos (actions/checkout, some internal ones I have in my customer's org, etc.) all seem to be working just fine.

I am using v0 (e.g. - uses: microsoft/powerplatform-actions/import-data@v0 )
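
For context, a stripped-down version of one of the affected jobs looks roughly like this (the environment URL, secrets, and data file are placeholders, and the input names are from memory, so check them against the action's documentation):

jobs:
  import-data:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v3
      - name: Import data
        uses: microsoft/powerplatform-actions/import-data@v0
        with:
          environment-url: https://myorg.crm.dynamics.com   # placeholder
          user-name: ${{ secrets.PP_USERNAME }}             # placeholder secret
          password-secret: ${{ secrets.PP_PASSWORD }}       # placeholder secret
          data-file: data/import.zip                        # placeholder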

@tehcrashxor
Member

I've seen this issue occur intermittently in one of our own workflows. It used to fail less than once a day, but the failure rate has been increasing lately.

The failure itself occurs during the "Set up job" step, which is an autogenerated step created by GitHub Actions itself, so there's nothing apparent that we can do on our side to affect it.

I've opened up a support ticket with GitHub to see if they will assist.

@Joseph-Duty-VA
Author

This makes sense. I had opened a ticket here because it seems to only happen when I call actions from this repo. Other actions (from other repos) don't seem to have this same issue.

@camiloqp

It is happening to me as well, but only when using actions from repos/microsoft/powerplatform-actions.

@tehcrashxor
Member

No word back from GitHub support on the ticket since Friday.

Our current hypothesis is that this has been occurring more frequently due to the steady size increase of our action's tarball/zipball: each version of PAC is larger than the last, and those binaries are checked into the repo via git-lfs. Reducing that size should lead to fewer timeouts, which may simply be happening during periods of higher GitHub server load.

#424 is an option to do just that - remove PAC from the repo and install via Nuget at action execution time. If we go this route, it will be a breaking change with a major version bump, as all Jobs would need a new install step.
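
To give a rough sense of what installing via NuGet at execution time means in practice, the PAC CLI can be pulled down at runtime along these lines (just an illustration of the idea, not necessarily the exact mechanism #424 ends up using):

$ dotnet tool install --global Microsoft.PowerApps.CLI.Tool
$ pac help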

@Joseph-Duty-VA
Author

Thanks for the follow-up. To clarify, when you say "it will be a breaking change with a major version bump, as all Jobs would need a new install step" - you mean that each job with a "- uses: microsoft/powerplatform-actions/xxxxxx@v0" style step would first need to call a "- uses: microsoft/powerplatform-actions/actions-install@v0" step - is that right?

@tehcrashxor
Member

Correct.
New releases would have their version numbers in the v1.X.Y range instead of the current v0.X.Y, the tag v0 would point to the last v0.X.Y release, and the tags v1 and latest would point to the most recent v1.X.Y release.

@matthewborne13

@tehcrashxor
Thank you for looking into this issue. My organization's applications rely on these actions for deployment, and we are concerned that our production rollouts will be unstable until this issue is resolved. One of our deployments this week failed 4 times over the course of 2.5 hours before succeeding on the 5th try. I don't have any additional information to add, but I am adding this comment in case it helps the teams prioritize this higher.

@petrochuk
Contributor

Our current plan is to release a new version next week.

@tehcrashxor
Member

We heard back from GitHub on our support ticket - they're still looking into why the download speed within the Set up job step drops so drastically, but they recommended what we're already doing via #424: reducing the size of the tarball/zipball downloaded by the Actions runner.

Preliminary findings from that work are that the tarball's size has been cut from 53 MB down to 5.2 MB, and the Set up job step usually takes 10 seconds or less. However, I've seen a couple of runs that took closer to 3 minutes, triggering the 100-second timeout but succeeding on a retry before failing the entire job. Hopefully GitHub will be able to resolve that download issue entirely, but at the very least this should reduce the number of job failures.

@larsxschneider

👋 Hello from GitHub!

You have enabled "Git LFS objects in archives" for this repo (the default is "off").

The commit mentioned in this issue (0c80aacb61f9bfdcb930b880febc420ec4ef2f3e) references 800 files tracked by LFS, and all of those files are added to your archive, which seems to cause a timeout for your repo.

$ git checkout 0c80aacb61f9bfdcb930b880febc420ec4ef2f3e
$ git lfs ls-files | wc -l
     800

We are looking into this issue!

Do you need the LFS files in your archive? If not, then disabling the "Git LFS objects in archives" option for this repo should fix the download of older archives. Archives of the latest main commit seem to work just fine either way, because it only tracks 74 files with LFS.

@petrochuk added the enhancement (New feature or request) label Aug 21, 2023
@tehcrashxor
Member

Do you need the LFS files in your archive?

Yep, the Node code is just a thin shim that reads the parameters passed in the user's YAML and hands them to the binaries stored in LFS.
Even with our plans for the upcoming version to download those binaries from NuGet instead of storing them in LFS, we can't disable the LFS archive inclusion, as that would break all previous versions of our Actions.

@tehcrashxor
Member

Released the new version as v1.0.0, which should alleviate most of the "Set up job" failures, as the tarball/zipball is significantly smaller.

As noted in the release notes, pipelines upgrading to the new version will need to update the @v0 version specifier to @v1 and add the actions-install step before any of our other actions; that step handles grabbing PAC from nuget.org at runtime.
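
A minimal upgraded job looks roughly like this (environment URL and secrets are placeholders; check each action's documentation for its actual inputs):

jobs:
  who-am-i:
    runs-on: windows-latest
    steps:
      - uses: microsoft/powerplatform-actions/actions-install@v1
      - uses: microsoft/powerplatform-actions/who-am-i@v1
        with:
          environment-url: https://myorg.crm.dynamics.com   # placeholder
          user-name: ${{ secrets.PP_USERNAME }}             # placeholder secret
          password-secret: ${{ secrets.PP_PASSWORD }}       # placeholder secret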

@markwong-synechron

markwong-synechron commented Sep 4, 2023

No word back from GitHub support on the ticket since Friday.

Our current hypothesis is that this has been occurring more frequently due to the steady size increase of our action's tarball/zipball: each version of PAC is larger than the last, and those binaries are checked into the repo via git-lfs. Reducing that size should lead to fewer timeouts, which may simply be happening during periods of higher GitHub server load.

#424 is an option to do just that - remove PAC from the repo and install via Nuget at action execution time. If we go this route, it will be a breaking change with a major version bump, as all Jobs would need a new install step.

Hi, this method won't work for my clients. Many large, highly regulated companies do not allow direct access to nuget.org, so grabbing the package from NuGet at runtime would be a dead end for them. Their runners are behind firewalls, and security won't open access to nuget.org given the risk, so they would be stuck on v0 and with this timeout issue. Many of my clients' production deployments are heavily impacted.

@petrochuk
Contributor

Some highly regulated companies use self-hosted GitHub runners.

@danmcpherson

danmcpherson commented Sep 5, 2023

I'm seeing these timeout errors more often than not. I've moved to v1 and added the actions-install step. If I try to manually download the file "https://api.github.com/repos/microsoft/powerplatform-actions/zipball/09afea19cc361004739641ee6dda4ee7d7fac716" from my browser, it gets to 5.6 MB quite quickly, but then it just sits there waiting, which I'm sure is also what's happening inside the GitHub Action.
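
The same stall is reproducible from a shell, for anyone who wants to time it outside the browser (curl here simply downloads the same zipball and prints the size and elapsed time):

$ curl -sL -o pp-actions.zip -w 'downloaded %{size_download} bytes in %{time_total}s\n' \
    https://api.github.com/repos/microsoft/powerplatform-actions/zipball/09afea19cc361004739641ee6dda4ee7d7fac716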


@tehcrashxor
Member

I'm seeing these timeout errors more often than not. I've moved to v1

Our monitoring job showed a fair number of v1 timeouts over Sunday and Monday, but it's still failing significantly less often for us than the v0 runs. We've updated our GitHub support ticket with that info, and we know of at least one large customer that has their own ticket open as well to investigate the timeouts, but we haven't had anything further from GitHub yet.

Hi, this method won't work for my clients. Many large, highly regulated companies do not allow direct access to nuget.org, so grabbing the package from NuGet at runtime would be a dead end for them.

We're looking into updating the actions-install step to support taking the package from either internal feeds or directly from a .nupkg file already on the box, so that such customers can obtain the package and provide it on the build agent / action runner without a direct call to nuget.org.

Obtaining PAC another way, installing it into the expected location that actions-install/index.ts uses, and setting an environment variable named POWERPLATFORMTOOLS_PACINSTALLED to true is sufficient to get the other v1 actions running without our actions-install step, though that is a bit hacky and shouldn't be necessary after we update the install step.
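
As a very rough sketch of that workaround (the feed URL, output path, and other inputs below are placeholders; the real expected install location is whatever actions-install/index.ts resolves, so treat this as illustrative only):

jobs:
  who-am-i:
    runs-on: windows-latest
    env:
      POWERPLATFORMTOOLS_PACINSTALLED: "true"
    steps:
      - name: Provide PAC from an internal feed instead of nuget.org
        # placeholder feed URL and output directory
        run: nuget install Microsoft.PowerApps.CLI -Source https://my-internal-feed/nuget/v3/index.json -OutputDirectory D:\pac
      - uses: microsoft/powerplatform-actions/who-am-i@v1
        with:
          environment-url: https://myorg.crm.dynamics.com   # placeholder
          user-name: ${{ secrets.PP_USERNAME }}             # placeholder secret
          password-secret: ${{ secrets.PP_PASSWORD }}       # placeholder secret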

@tehcrashxor
Member

tehcrashxor commented Sep 18, 2023

We have a monitoring workflow that runs every hour and spins up four jobs to check that the actions download and run correctly on the agent: two v0 who-am-i jobs (one Windows, one Ubuntu) and two jobs consisting of v1 actions-install followed by who-am-i (also one Windows, one Ubuntu).
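
Schematically (inputs and credentials elided; this is a sketch rather than the actual workflow file), that looks something like:

name: monitor
on:
  schedule:
    - cron: '0 * * * *'   # every hour
jobs:
  v0-who-am-i:
    strategy:
      matrix:
        os: [windows-latest, ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: microsoft/powerplatform-actions/who-am-i@v0
        # environment-url / credentials elided
  v1-who-am-i:
    strategy:
      matrix:
        os: [windows-latest, ubuntu-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: microsoft/powerplatform-actions/actions-install@v1
      - uses: microsoft/powerplatform-actions/who-am-i@v1
        # environment-url / credentials elided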

In the last week we've seen only a single v0 job failure, so it looks like things are working better now.

GitHub Support suggests that this resolved incident, https://www.githubstatus.com/incidents/frdfpnnt85s8, was likely the fix for the download slowness and failures, but we're not 100% convinced: it was marked Resolved on September 5th, yet we saw a series of failures afterwards, from September 8th through September 10th.

@tehcrashxor
Member

GitHub Support elaborated that the incident marked Resolved on Sept 5th wasn't fully remediated until Sept 10th, which explains the failures we saw from the 8th through the 10th.

As we've only had a single failure in the monitoring job for the last week, it looks like they may well have fixed the issue. We'll monitor for further failures and reopen this issue and our GitHub Support Ticket if we see it start to fail again.
