
Enable cooldown for failed index.json fetches #33509

Merged (1 commit, Oct 25, 2022)

Conversation

@blue42u (Contributor) commented Oct 25, 2022

My previous PR (#32137) added a cooldown between fetches of a buildcache's index.json, but only applied it when the fetch was successful. In some cases (#32137 (comment)) this leads to long runs of repeated fetch failures.

Enable the cooldown for failed fetches as well, and propagate the success/failure status when a fetch is prevented by the cooldown.
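For context, a minimal sketch of the intended behavior; this is not Spack's actual implementation, and the class shape, the `COOLDOWN_SECONDS` value, and the injected `fetch_index` callable are all made up for illustration:

```python
import time

COOLDOWN_SECONDS = 600  # hypothetical stand-in for Spack's cooldown interval


class BinaryCacheIndexSketch:
    """Illustrative only: tracks (timestamp, succeeded) per mirror URL."""

    def __init__(self, fetch_index):
        self._fetch_index = fetch_index  # callable(url); raises on failure
        self._last_fetch_times = {}  # url -> (last attempt time, succeeded?)

    def update(self, url, with_cooldown=True):
        now = time.time()
        if with_cooldown and url in self._last_fetch_times:
            last_time, succeeded = self._last_fetch_times[url]
            if now - last_time < COOLDOWN_SECONDS:
                # Within the cooldown the fetch is skipped, for failed
                # attempts too, and the previous attempt's outcome is
                # reported instead of silently pretending it succeeded.
                return succeeded
        try:
            self._fetch_index(url)
            self._last_fetch_times[url] = (now, True)
            return True
        except OSError:
            # Record failed attempts as well, so they cool down too.
            self._last_fetch_times[url] = (now, False)
            return False
```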

@spackbot-app bot added labels: binary-packages, core (PR affects Spack core functionality) on Oct 25, 2022
@scheibelp self-assigned this Oct 25, 2022
@blue42u force-pushed the pr-index-cooldown-on-failure branch from 5962373 to 38621df on October 25, 2022 17:38
The review discussion below was attached to this hunk in lib/spack/spack/binary_distribution.py:

```diff
 else:
     # May need to fetch the index and update the local caches
     try:
         needs_regen = self._fetch_and_cache_index(
             cached_mirror_url, expect_hash=cached_index_hash
         )
-        self._last_fetch_times[cached_mirror_url] = now
+        self._last_fetch_times[cached_mirror_url] = (now, True)
         all_methods_failed = False
```
Review comment from a maintainer (Member):

@haampie I have a question that's only tangentially related to this PR: it appears that if any cached_mirror_url succeeds, then all_methods_failed will be False for all of the URLs, so this variable specifically tracks whether we could update at least one mirror index successfully. Is that what you wanted explicitly (from examining the git blame, it looks like you added that logic; let me know if I got that wrong)?
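To make the question concrete, here is a sketch of the control flow being described; the function name, the mirror list, and the use of RuntimeError in place of Spack's FetchCacheError are stand-ins:

```python
def update_all_mirrors(mirror_urls, fetch_and_cache_index):
    """Sketch: all_methods_failed only records whether at least one
    mirror index could be updated; a single success clears it for good."""
    all_methods_failed = True
    for url in mirror_urls:
        try:
            fetch_and_cache_index(url)
            all_methods_failed = False  # any one success flips this forever
        except OSError:
            continue  # a per-mirror failure does not reset the flag
    if all_methods_failed:
        # Stands in for Spack raising FetchCacheError when nothing worked.
        raise RuntimeError("could not update any mirror index")
```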

Reply from @haampie (Member):

I don't know what it has turned into, but the idea is to have an exception propagate out of this if nothing worked. Before my PR it was a sad debug message, IIRC, or no error at all.

(Review thread on lib/spack/spack/binary_distribution.py marked resolved.)
@scheibelp (Member) commented:

@spackbot run pipeline

@spackbot-app bot commented Oct 25, 2022:

I've started that pipeline for you!

@scheibelp merged commit 9d5151b into spack:develop Oct 25, 2022
@scottwittenburg (Contributor) commented:

While debugging some pipeline generation jobs from the cloud locally, I just pulled this commit in, but it didn't prevent the hundreds of those messages (example below). I'll run with the debugger and see if I can figure out why not.

```
==> [2022-10-25-18:20:32.321047] Checking existence of https://mirror.spack.io/build_cache/index.json
==> [2022-10-25-18:20:32.466065] Failure reading URL: Download failed: HTTP Error 404: Not Found
```

@scottwittenburg (Contributor) commented:

Oops, too late.

@scheibelp (Member) commented:

Does merging this make it harder to debug? I can push a PR that reverts it. I figured it would be easier to investigate with it merged (normally I wouldn't handle it like that, but I wasn't sure how easy it would be to debug the CI behavior from a PR).

@scottwittenburg (Contributor) commented:

In that case, no worries. You're right, it will be easier not to need to pull this in separately.

becker33 pushed a commit to RikkiButler20/spack that referenced this pull request on Nov 2, 2022:

> spack#32137 added an option to update() a BinaryCacheIndex with a cooldown: repeated attempts within this cooldown would not actually retry. However, the cooldown was not properly tracked for failures (which is common when the mirror does not store any binaries and therefore has no index.json).
>
> This commit ensures that update(..., with_cooldown=True) will also skip the update even if a failure has occurred within the cooldown period.
@scottwittenburg (Contributor) commented:

I did some debugging to get to the bottom of the repeated, failed attempts to fetch the index during pipeline generation. The problem happens if the configuration contains any mirrors without an index. The _fetch_and_cache_index function detects that condition and simply returns False, but given the new behavior, I'm guessing it should maybe also add an entry to self._last_fetch_times?
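A hedged sketch of what that suggestion would look like; the signature, the injected dict, and the `index_exists` probe are hypothetical, not Spack's actual code:

```python
import time


def fetch_and_cache_index(url, last_fetch_times, index_exists):
    """Sketch of the suggestion: even when the mirror has no index.json,
    leave a record so the caller's cooldown check applies next time."""
    now = time.time()
    if not index_exists(url):  # hypothetical probe for index.json
        # Without this, the early return leaves no entry behind, so every
        # subsequent update() retries the fetch immediately.
        last_fetch_times[url] = (now, True)  # a call that returns counts as "success"
        return False
    # ... the real fetch-and-cache logic would go here ...
    return True
```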

@blue42u (Contributor, author) commented Nov 7, 2022:

@scottwittenburg If _fetch_and_cache_index returns False, it should already add an entry to self._last_fetch_times:

```python
needs_regen = self._fetch_and_cache_index(
    cached_mirror_url, expect_hash=cached_index_hash
)
self._last_fetch_times[cached_mirror_url] = (now, True)
all_methods_failed = False
```

Can you check whether it's actually calling _fetch_and_cache_index after the first call (i.e. is this conditional always false)? I'm wondering if the fetch failed in the first call, and after that it's constantly re-raising the FetchCacheError on every call to update() from find_by_hash().

@scottwittenburg (Contributor) commented:

This is the call site I'm hitting in local testing. When that just returns False because there is no index, I guess it does put an entry into _last_fetch_times (entering True for the success/failure indicator; is that ok?). And that's the line that triggers the index fetch every time. If False is returned here, then the mirror URL never gets put into self._local_index_cache, and hence it never gets into the loop with the conditional you linked above.

@blue42u (Contributor, author) commented Nov 7, 2022:

I see now. So, in short, this entire code block needs to be put inside a conditional just like that one. That would enable the cooldown for "new" mirror URLs, which covers the case where a mirror has no index and is therefore always considered "new."

> entering True for the success/failure indicator, is that ok?

That is how the logic worked before my PRs: a _fetch_and_cache_index call that returns (rather than raises) is considered "successful" and prevents throwing a FetchCacheError. IMHO the logic doesn't make much sense, but I'm not the author here.
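A rough sketch of that follow-up idea (the real diff is in #33781; the names, the injected dict, and the cooldown value here are hypothetical): the cooldown check moves ahead of all per-mirror work, so even mirror URLs that are always considered "new" get skipped:

```python
import time

COOLDOWN_SECONDS = 600  # hypothetical


def update_one_mirror(url, last_fetch_times, fetch_and_cache, with_cooldown=True):
    """Sketch: check the cooldown before any per-mirror work, covering
    mirrors without an index.json that never enter the cached-URL loop."""
    now = time.time()
    previous = last_fetch_times.get(url)
    if with_cooldown and previous and now - previous[0] < COOLDOWN_SECONDS:
        return previous[1]  # skip the whole block; report the last outcome
    try:
        succeeded = fetch_and_cache(url)  # stand-in for the per-mirror block
    except OSError:
        succeeded = False
    last_fetch_times[url] = (now, succeeded)
    return succeeded
```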

@blue42u deleted the pr-index-cooldown-on-failure branch November 9, 2022 14:55
@blue42u (Contributor, author) commented Nov 9, 2022:

Implementation of the above idea is over in #33781.

charmoniumQ pushed a commit to charmoniumQ/spack that referenced this pull request on Nov 19, 2022.