arm: validate archive and retry on all errors for FVP and toolchain downloads #19309

Merged
rascani merged 1 commit into pytorch:main from rascani:fix-arm-fvp-toolchain-download-resilience
May 5, 2026

Contributor

@rascani rascani commented May 5, 2026

Summary

The Test ARM Backend workflow has been intermittently failing with `curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR` during the FVP corstone download from developer.arm.com's CDN. The toolchain download in the same setup uses the same bare-curl pattern and fails the same way when the CDN flakes. In both cases the previous flow was a single `curl --output ...` followed by a fatal `verify_md5`, so neither a transient HTTP/2 reset nor a short error body that curl treats as a successful 200 was retried.

Factor out a `download_with_retry` helper in utils.sh that wraps the download in a five-attempt outer loop using `curl --fail --retry-all-errors` and validates each attempt against the published MD5 before proceeding, with the on-disk file size logged on failure for diagnosis. Switch `verify_md5`'s mismatch path from `exit 2` to `return 2` so the helper can treat a bad checksum as a retryable failure; existing callers (`verify_md5 ... || exit 1`) keep the same fatal-on-mismatch behavior since the function still returns non-zero on a bad checksum.
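
To illustrate why the `exit 2` to `return 2` change matters, here is a minimal sketch rather than the actual utils.sh code. The argument order (expected hash first, then file path) and the use of `md5sum` are assumptions made for the example.

```shell
# Hypothetical sketch of verify_md5; the real function in utils.sh may differ.
verify_md5() {
    local expected="$1" file="$2" actual
    # md5sum prints "<hash>  <file>"; keep only the hash field.
    actual="$(md5sum "$file" | awk '{print $1}')"
    if [ "$actual" != "$expected" ]; then
        echo "md5 mismatch for $file: expected $expected, got $actual" >&2
        # `exit 2` here would terminate the whole calling shell;
        # `return 2` hands the decision back to the caller.
        return 2
    fi
    return 0
}
```

With `return`, a caller written as `verify_md5 ... || exit 1` still aborts on a mismatch, while a retry loop can inspect the status and try the download again.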

Use the helper from both fvp_utils.sh and toolchain_utils.sh in place of the bare `curl` + `verify_md5` pair.
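
The retry flow described above could look roughly like the following. This is a sketch under stated assumptions, not the merged code: the argument order, the `--retry 3` count, the `--location` flag, the `RETRY_DELAY` variable, and the compact `verify_md5` stand-in are all hypothetical. Only the five-attempt outer loop, `curl --fail --retry-all-errors`, the MD5 check per attempt, and the file-size logging come from the description.

```shell
# Compact stand-in for verify_md5 (assumed signature: expected hash, file).
verify_md5() {
    [ "$(md5sum "$2" | awk '{print $1}')" = "$1" ] || return 2
}

# Hypothetical sketch of download_with_retry <url> <output> <expected_md5>.
download_with_retry() {
    local url="$1" output="$2" expected_md5="$3"
    local max_attempts=5 attempt=1 size
    while [ "$attempt" -le "$max_attempts" ]; do
        # --fail makes HTTP error statuses a non-zero exit instead of a saved
        # error page; --retry-all-errors (curl 7.71.0+) widens what --retry covers.
        if curl --fail --location --retry 3 --retry-all-errors \
                --output "$output" "$url" \
            && verify_md5 "$expected_md5" "$output"; then
            return 0
        fi
        # Log the on-disk size to help tell a truncated body from a missing file.
        if [ -f "$output" ]; then
            size="$(wc -c < "$output" | tr -d ' ')"
        else
            size=0
        fi
        echo "download attempt $attempt/$max_attempts failed (size: $size bytes): $url" >&2
        rm -f "$output"
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Hypothetical call site; the variable names are placeholders.
# download_with_retry "${fvp_url}" "${fvp_archive}" "${fvp_md5}" || exit 1
```

Checking the MD5 inside the loop is what covers the case where curl reports success but the body is short or wrong, since that failure is invisible to curl's own retry logic.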

Authored with Claude Code.

Test plan

CI

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

@rascani rascani requested a review from digantdesai as a code owner May 5, 2026 21:48

pytorch-bot Bot commented May 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19309

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 171 Pending, 1 Unrelated Failure

As of commit 7150544 with merge base acffcb0 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label May 5, 2026
@github-actions github-actions Bot added the ciflow/trunk and module: arm labels May 5, 2026

github-actions Bot commented May 5, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@rascani rascani merged commit e0cc468 into pytorch:main May 5, 2026
426 of 449 checks passed
@rascani rascani deleted the fix-arm-fvp-toolchain-download-resilience branch May 5, 2026 22:31