Enable cooldown for failed index.json fetches #33509
 else:
     # May need to fetch the index and update the local caches
     try:
         needs_regen = self._fetch_and_cache_index(
             cached_mirror_url, expect_hash=cached_index_hash
         )
-        self._last_fetch_times[cached_mirror_url] = now
+        self._last_fetch_times[cached_mirror_url] = (now, True)
         all_methods_failed = False
@haampie I have a question that's only tangentially related to this PR: it appears that if any cached_mirror_url succeeds, then all_methods_failed will be false for all the URLs, so this variable specifically tracks whether we could update at least one mirror index successfully. Is that what you wanted explicitly? (From examining the git blame, it looks like you added that logic; let me know if I got that wrong.)
I don't know what it has turned into, but the idea is for an exception to propagate out of this if nothing worked. Before the PR from my side it was a sad debug message IIRC, or no error at all?
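For readers following the exchange above, here is a minimal sketch of the "raise only if nothing worked" behavior being described. It is not the code in binary_distribution.py: `FetchError`, `update_all`, and the `fetch` callable are hypothetical names introduced only for illustration.

```python
class FetchError(Exception):
    """Hypothetical stand-in for the error raised when an index fetch fails."""


def update_all(mirror_urls, fetch):
    """Try every mirror; raise only if no mirror could be updated.

    `fetch` is a hypothetical callable that raises FetchError on failure.
    Returns the list of URLs that were updated successfully.
    """
    all_methods_failed = True
    errors = []
    updated = []
    for url in mirror_urls:
        try:
            fetch(url)
        except FetchError as e:
            # Remember the failure, but keep trying the remaining mirrors.
            errors.append((url, e))
            continue
        # A single success is enough to clear the flag for the whole loop.
        all_methods_failed = False
        updated.append(url)
    if mirror_urls and all_methods_failed:
        raise FetchError(f"Failed to update any mirror index: {errors}")
    return updated
```

With this shape, one working mirror suppresses the exception even when every other mirror fails, which matches the reading of all_methods_failed given in the question.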
@spackbot run pipeline
I've started that pipeline for you!
While debugging some pipeline generation jobs from the cloud locally, I just pulled this commit in, but it didn't prevent the hundreds of those messages (example below). I'll run with the debugger and see if I can figure out why not.
Oops, too late.
Does merging this make it harder to debug? I can push a PR that reverts this. I figured it would be easier to figure out if it were in (normally I wouldn't handle it like that, but I wasn't sure how easy it was to debug the CI behavior with a PR).
In that case, no worries. You're right it will be easier not to need to pull this in separately.
spack#32137 added an option to update() a BinaryCacheIndex with a cooldown: repeated attempts within this cooldown would not actually retry. However, the cooldown was not properly tracked for failures (which are common when the mirror does not store any binaries and therefore has no index.json). This commit ensures that update(..., with_cooldown=True) will also skip the update even if a failure has occurred within the cooldown period.
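The behavior described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not Spack's implementation: `CooldownFetcher`, its `ttl` and `clock` parameters, and the `fetch` callable are hypothetical; only the idea of storing a (timestamp, success) pair per mirror URL is taken from the diff above.

```python
import time


class CooldownFetcher:
    """Hypothetical helper that remembers (timestamp, success) per URL."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch  # callable(url); raises on failure
        self._ttl = ttl  # cooldown length in seconds
        self._clock = clock  # injectable so the cooldown can be tested
        self._last_fetch_times = {}  # url -> (time of last attempt, succeeded?)

    def update(self, url, with_cooldown=False):
        """Return True if the index is considered up to date, else False."""
        now = self._clock()
        last = self._last_fetch_times.get(url)
        if with_cooldown and last is not None and now - last[0] < self._ttl:
            # Within the cooldown: skip the fetch and report the recorded
            # outcome, whether that attempt succeeded or failed.
            return last[1]
        try:
            self._fetch(url)
        except Exception:
            # Record failures too, so they also start a cooldown.
            self._last_fetch_times[url] = (now, False)
            return False
        self._last_fetch_times[url] = (now, True)
        return True
```

Recording the outcome next to the timestamp lets a skipped call return the same status as the attempt that started the cooldown, instead of pretending the fetch succeeded.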
I did some debugging to get to the bottom of the repeated, failed attempts to fetch the index during pipeline generation. The problem happens if the configuration contains any mirrors without an index. The
@scottwittenburg If spack/lib/spack/spack/binary_distribution.py Lines 360 to 364 in 8be6378
Can you check whether it's actually calling
This is the call site I'm hitting in local testing. When that just returns
I see now. So in short, this entire code block needs to be put inside a conditional just like this. That would enable the cooldown for "new" mirror urls, which then covers the case where a mirror has no index and is always considered "new."
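One way to picture the idea in the comment above is to run the cooldown check before distinguishing cached mirrors from "new" ones. The sketch below is an assumption-laden illustration, not the change made in #33781: `update_mirrors` and all of its parameters are hypothetical names.

```python
def update_mirrors(configured_urls, indexed_urls, last_attempt, fetch_index, now, ttl):
    """Return the URLs for which a fetch was actually attempted.

    All parameter names are hypothetical:
      indexed_urls -- set of mirrors whose index is already cached locally
      last_attempt -- dict mapping url -> (timestamp, success)
      fetch_index  -- callable(url) -> bool, False when there is no index.json
    """
    attempted = []
    for url in configured_urls:
        previous = last_attempt.get(url)
        # The guard runs before the cached/new distinction, so it also covers
        # mirrors that never become "cached" because they have no index.
        if previous is not None and now - previous[0] < ttl:
            continue
        ok = fetch_index(url)
        last_attempt[url] = (now, ok)
        if ok:
            indexed_urls.add(url)
        attempted.append(url)
    return attempted
```

A mirror without an index never enters `indexed_urls`, so without the shared guard it would be retried on every call; with it, the retry happens at most once per cooldown period.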
That is how the logic worked before my PRs, a
Implementation of the above idea over in #33781.
My previous PR (#32137) added a cooldown between fetches of a buildcache's index.json when requested, but only when the fetch was successful. In some cases (#32137 (comment)) this causes long runs of repeated failed fetches.

Enable the cooldown for failed fetches as well, and propagate the success/failure status when a fetch is prevented by the cooldown.