Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry RPM cloning on http/502 error (bad gateway) #7373

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,14 @@ import (
"os"
"path/filepath"
"strings"
"time"

"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/buildpipeline"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/logger"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/packagerepo/repocloner"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/packagerepo/repomanager/rpmrepomanager"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/pkgjson"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/retry"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/safechroot"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/shell"
"github.com/microsoft/CBL-Mariner/toolkit/tools/internal/tdnf"
Expand Down Expand Up @@ -598,9 +600,14 @@ func (r *RpmRepoCloner) Close() error {
// It will gradually enable more repos to consider until the package is found.
func (r *RpmRepoCloner) clonePackage(baseArgs []string) (preBuilt bool, err error) {
const (
unresolvedOutputPrefix = "No package"
toyboxConflictsPrefix = "toybox conflicts"
unresolvedOutputPostfix = "available"
// With 6 attempts, initial delay of 1 second, and a backoff factor of 3.0 the total time spent retrying will be
// 1 + 3 + 9 + 27 + 81 = 121 seconds.
//
// *NOTE* These values are copied from downloader/downloader.go; they need not be the same but seemed like a
// good enough starting point.
Comment on lines +605 to +607
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment here just for the sake of the draft? It looks like something that we may not want in the final version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually appreciate suggestions of how best to share these -- or perspectives on whether they should be? (Should we create a wrapper with these defaults?)

Perhaps you or @dmcilvaney have thoughts on that?

downloadRetryAttempts = 6
failureBackoffBase = 3.0
downloadRetryDuration = time.Second
)

releaseverCliArg, err := tdnf.GetReleaseverCliArg()
Expand All @@ -615,47 +622,92 @@ func (r *RpmRepoCloner) clonePackage(baseArgs []string) (preBuilt bool, err erro

finalArgs := append(baseArgs, reposArgs...)

var (
stdout string
stderr string
)
stdout, stderr, err = shell.Execute("tdnf", finalArgs...)
// We run in a retry loop on errors deemed retriable.
cancel := make(chan struct{})
retryNum := 1
PawelWMS marked this conversation as resolved.
Show resolved Hide resolved
_, err = retry.RunWithExpBackoff(func() error {
downloadErr, retriable := tdnfDownload(finalArgs...)
if downloadErr != nil {
if retriable {
logger.Log.Warnf("Attempt %d/%d: Failed to clone packages", retryNum, downloadRetryAttempts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change that to a debug entry. These messages tend to generate noise, confuse people who don't know why they are generated (which is most users) and are red herrings when we debug issues.

Suggested change
logger.Log.Warnf("Attempt %d/%d: Failed to clone packages", retryNum, downloadRetryAttempts)
logger.Log.Debugf("Attempt %d/%d: Failed to clone packages", retryNum, downloadRetryAttempts)

} else {
close(cancel)
}
}

logger.Log.Debugf("stdout: %s", stdout)
logger.Log.Debugf("stderr: %s", stderr)
retryNum++
return downloadErr
}, downloadRetryAttempts, downloadRetryDuration, failureBackoffBase, cancel)

if err != nil {
logger.Log.Debugf("tdnf error (will continue if the only errors are toybox conflicts):\n '%s'", stderr)
if err == nil {
preBuilt = r.reposArgsHaveOnlyLocalSources(reposArgs)
break
}
}

// ============== TDNF SPECIFIC IMPLEMENTATION ==============
// Check if TDNF could not resolve a given package. If TDNF does not find a requested package,
// it will not error. Instead it will print a message to stdout. Check for this message.
//
// *NOTE*: TDNF will attempt best effort. If N packages are requested, and 1 cannot be found,
// it will still download N-1 packages while also printing the message.
splitStdout := strings.Split(stdout, "\n")
for _, line := range splitStdout {
trimmedLine := strings.TrimSpace(line)
// Toybox conflicts are a known issue, reset the err value if encountered
if strings.HasPrefix(trimmedLine, toyboxConflictsPrefix) {
logger.Log.Warn("Ignoring known toybox conflict")
err = nil
continue
}
// If a package was not available, update err
if strings.HasPrefix(trimmedLine, unresolvedOutputPrefix) && strings.HasSuffix(trimmedLine, unresolvedOutputPostfix) {
err = fmt.Errorf(trimmedLine)
break
}
}
return
}

if err == nil {
preBuilt = r.reposArgsHaveOnlyLocalSources(reposArgs)
func tdnfDownload(args ...string) (err error, retriable bool) {
const (
unresolvedOutputPrefix = "No package"
toyboxConflictsPrefix = "toybox conflicts"
unresolvedOutputPostfix = "available"
PawelWMS marked this conversation as resolved.
Show resolved Hide resolved
)

var (
stdout string
stderr string
)
stdout, stderr, err = shell.Execute("tdnf", args...)

logger.Log.Debugf("stdout: %s", stdout)
logger.Log.Debugf("stderr: %s", stderr)

if err != nil {
logger.Log.Debugf("tdnf error (will continue if the only errors are toybox conflicts):\n '%s'", stderr)
}

// ============== TDNF SPECIFIC IMPLEMENTATION ==============
//
// Check if TDNF could not resolve a given package. If TDNF does not find a requested package,
// it will not error. Instead it will print a message to stdout. Check for this message.
//
// *NOTE*: TDNF will attempt best effort. If N packages are requested, and 1 cannot be found,
// it will still download N-1 packages while also printing the message.
splitStdout := strings.Split(stdout, "\n")
for _, line := range splitStdout {
trimmedLine := strings.TrimSpace(line)
// Toybox conflicts are a known issue, reset the err value if encountered
if strings.HasPrefix(trimmedLine, toyboxConflictsPrefix) {
logger.Log.Warn("Ignoring known toybox conflict")
err = nil
continue
}
Comment on lines +682 to +686
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've seen a question about toybox and if we still need this check. It looks like toybox is not present in 2.0, so I think we can remove this logic completely.

// If a package was not available, update err
if strings.HasPrefix(trimmedLine, unresolvedOutputPrefix) && strings.HasSuffix(trimmedLine, unresolvedOutputPostfix) {
err = fmt.Errorf(trimmedLine)
break
}
}

//
// *NOTE*: There are cases in which some of our upstream package repositories are hosted
// on services that are prone to intermittent errors (e.g., HTTP 502 errors). We
// specifically look for such known cases and apply some retry logic in hopes of getting
// a better result; note that we don't indiscriminately retry because there are legitimate
// cases in which the upstream repo doesn't contain the package and a 404 error is to be
// expected. This involves scraping through stderr, but it's better than not doing so.
//
if err != nil {
for _, line := range strings.Split(stderr, "\n") {
if strings.Contains(line, "Error: 502 when downloading") {
logger.Log.Warn("Encountered possibly intermittent HTTP 502 error.")
PawelWMS marked this conversation as resolved.
Show resolved Hide resolved
retriable = true
}
}
Comment on lines +703 to +708
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. Do we only want to deal with 502s or all 5XX errors? Since we're here, it might be worth handling all server-side issues.
  2. What do you think about dropping the for loop in favour of a regular expression?
Suggested change
for _, line := range strings.Split(stderr, "\n") {
if strings.Contains(line, "Error: 502 when downloading") {
logger.Log.Warn("Encountered possibly intermittent HTTP 502 error.")
retriable = true
}
}
serverErrorMatch := serverErrorsRegex.FindStringSubmatch(stderr)
if len(serverErrorMatch) > errorCodeIndex {
logger.Log.Warn("Encountered possibly intermittent HTTP %d error.", serverErrorMatch[errorCodeIndex])
retriable = true
}

The regex and the index could be defined as global, private variables (making it global forces a one-time compilation of the regex during program's start-up):

// In global context:
var (
(...)
serverErrorsRegex = regexp.MustCompile(`(?m)Error: (5\d{2}) when downloading`)
errorCodeIndex = 1
...

}

return
}

Expand Down
Loading