
Conversation

@shoumikhin (Contributor) commented May 9, 2025

Treat 403, 429 and 503 http errors as success.
Ignore non-verbal hostnames.
Kill child jobs immediately.


pytorch-bot (bot) commented May 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153246

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 1dca413 with merge base 916f6ba:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label May 9, 2025
@shoumikhin shoumikhin requested review from albanD and malfet May 9, 2025 15:01
@albanD (Collaborator) commented May 9, 2025

Not sure what the logic behind this change is.
Why this rather than only considering 404 a failure?
tbh I don't know my HTTP error codes by heart, so I'm not sure how many other variants we want to consider failures.

@shoumikhin (Contributor, Author) commented May 9, 2025

@albanD other things we likely want to catch:

410 - the resource used to exist but is permanently removed
401 - credentials are needed to see it: likely some private link leaked
400 - the request URL itself was malformed
5xx - could be a genuine outage or mis-deployment, worth noticing

So we're trying to be on the safe side and ignore only a few codes related to "too many requests" or Cloudflare and similar challenges.
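For illustration, the status-code policy described here could be sketched as follows. This is a hypothetical helper, not the actual lint_urls.sh logic; the names are assumptions.

```python
# Hypothetical sketch of the status-code policy discussed above;
# names and structure are illustrative, not the real lint_urls.sh code.

# Codes forgiven by this PR: rate limiting and Cloudflare-style challenges.
SOFT_FAIL_CODES = {403, 429, 503}

def should_flag(status: int) -> bool:
    """Return True if an HTTP status should be reported as a broken link."""
    if 200 <= status < 400:
        return False  # genuine success or redirect
    # 400/401/404/410 and most 5xx remain failures; 403/429/503 are forgiven
    return status not in SOFT_FAIL_CODES
```

Under this policy a rate-limited 429 passes the lint, while a 404 or 410 still fails it.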

@albanD (Collaborator) left a comment

Ok, what's your plan to monitor the flakiness of this to know when it's ok to re-enable?

@shoumikhin (Contributor, Author) commented May 9, 2025

The plan is to turn it back on once it proves to be green on nightly, and in the logs on every PR, for a long period of time (except those PRs that actually do break URLs).

@malfet (Contributor) left a comment

Sure, but is it possible to replace the shell script with Python (`grep -Ev` is a GNU extension)? As it is, one would not be able to run this on their Mac or Windows machine to reproduce a linter failure.

@shoumikhin (Contributor, Author) commented May 9, 2025

@malfet overall, yes, we can turn it into a Python script in the future to support Windows (for users w/o Cygwin, etc.). It should already work on macOS, feel free to try ./scripts/lint_urls.sh.
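A portable stand-in for the GNU-flavored `grep -Ev` step could look like this in Python; the pattern and function name are purely illustrative, not taken from the script.

```python
import re

def grep_invert(pattern: str, lines):
    """Portable equivalent of `grep -Ev PATTERN`: keep lines NOT matching."""
    rx = re.compile(pattern)
    return [line for line in lines if not rx.search(line)]

# Example: drop URLs pointing at hosts we don't want to lint
urls = [
    "https://pytorch.org",
    "http://localhost:8080",
    "https://example.com/x",
]
kept = grep_invert(r"localhost|example\.com", urls)
```

Python's `re` module behaves the same on Linux, macOS, and Windows, which sidesteps the GNU-vs-BSD grep divergence entirely.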

@malfet (Contributor) commented May 9, 2025

The plan is to turn it back on once it proves to be green on nightly and in logs on every PR for a long period of time (except those that do actually break urls).

This does not sound like a good plan to me, because any system that relies on all the websites on the internet always being available is flaky. There should be a periodic process that crawls the URLs and updates their availability in a DB of some sort, which could have something like an availability_rating: a number between 0 (unavailable) and 10 (always available). Every time the periodic job runs, it increases or decreases the rating by one (clamped to the 0-10 range).
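The clamped-score update described here could be as simple as the sketch below; the function name and the dict standing in for the "DB of some sort" are assumptions for illustration.

```python
def update_rating(rating: int, reachable: bool) -> int:
    """Nudge a URL's availability rating by one, clamped to the 0-10 range."""
    return max(0, min(10, rating + (1 if reachable else -1)))

# A plain dict standing in for the hypothetical availability DB;
# the reachability check is simulated here rather than performed over the network.
db = {"https://pytorch.org": 10, "https://dead.example": 1}
reachable = {"https://pytorch.org": True, "https://dead.example": False}
db = {url: update_rating(score, reachable[url]) for url, score in db.items()}
```

A nightly job would then flag only URLs whose rating has decayed to (or near) zero, rather than failing on any single transient outage.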

@shoumikhin (Contributor, Author) commented May 9, 2025

@malfet if I got that right, you're proposing to remember for each URL whether it worked during past checks, and apply some heuristic to either forgive or flag a failure now, depending on its previous status?
That sounds robust, but also a bit over-engineered to me, at least for this specific task.

Let's step back for a moment.

What problem are we trying to solve?
Maintaining URLs to keep them valid.

How can we do that?
Check whether they are alive periodically, and at the time they are introduced or modified.

Can URLs get temporarily broken not due to our fault?
Of course. And they will.

What do we do if we find a broken link with the nightly job?
Triage and address it manually, e.g. check the logs and, as an option, look up whether it failed in previous jobs too, then make a decision on how healthy it is. The periodic job should be FYI-only and should not be treated as a failure (maybe always return success, if there's no concept of an FYI job?). It can also come in handy as a sanity check, especially before the next release.

Can we make the nightly job always pass unless some URLs have been failing consistently for a long time?
Yes, by keeping a log of the health score per URL. Although, is it worth the effort? Not sure.

What do we do if we find a broken URL with a PR job?
The PR job checks only the modified/added code. In the rare case someone changes a line containing a URL, with or without actually touching the link itself, and it turns out to be broken, it's still a good hint to either fix it along the way, or explicitly add the lint-ignore comment/label to the PR to flag it and move on.

Anyhow, this particular PR is to improve the script and let it skip some of the expected HTTP errors. It doesn't change when the script is run.

@shoumikhin shoumikhin mentioned this pull request May 9, 2025
pytorchmergebot pushed a commit that referenced this pull request May 14, 2025
Or ignore them.
Found by running the lint_urls.sh script locally with #153246

Pull Request resolved: #153277
Approved by: https://github.com/malfet
@shoumikhin (Contributor, Author) commented

@pytorchbot merge -f "lint only changes"

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Kkewrr34-kk referenced this pull request May 16, 2025
Summary:
replace read_config with select

For more info, please refer to the [doc](https://docs.google.com/document/d/1e0Hvht8WEHhcRvlCAodq_R9xnAtKBrAhdyvxcAqQjCw/edit?tab=t.hl8j18gza0cv)

Test Plan: CI

Reviewed By: malfet

Differential Revision: D70267850

Pull Request resolved: #148925
Approved by: https://github.com/malfet
@github-actions github-actions bot deleted the shoumikhin-patch-10 branch June 18, 2025 02:19