Skip to content
This repository has been archived by the owner on Aug 3, 2020. It is now read-only.

Fix error handling #121

Merged
merged 3 commits into from
Aug 23, 2017
Merged

Fix error handling #121

merged 3 commits into from
Aug 23, 2017

Conversation

cjellick
Copy link
Contributor

@cjellick cjellick commented Aug 23, 2017

Addresses rancher/rancher#9669 (and then some)

I ended up revamping our error handling a bit.

The end result is:

  1. Functionality change: calls to the refresh api will now return 500 if the update failed. Previously, it would always return 200.
  2. There is a timeout of 10 seconds added to to attempting to check the remote hash of our git mirror and all github.com repos. Furthermore, if the request times out, we do not attempt to perform the git fetch, as that can take up to five minutes
  3. Reused an http client instead of creating a new one everytime we want to check a remote hash
  4. Upgrade got to 1.8 to get a fix to copy headers when the client receives a redirect. This allowed us to remove a custom redirect handler that was copying headers.
  5. If an error is encountered when updating an existing catalog that prevents us from updating the local git repo do not perform a DB update of the catalogs/templates. Previous behavior was to go ahead an update the db entries again, even though the git fetch failed.
  6. Improved the logging around errors encountered when doing an auto refresh.
    You used to get:
time="2017-08-22T17:42:57Z" level=error msg="Failed to update existing repo cache: exit status 128" 
time="2017-08-22T17:43:10Z" level=error msg="Failed to update existing repo cache: exit status 128" 
time="2017-08-22T17:47:31Z" level=error msg="Failed to update existing repo cache: exit status 128" 
time="2017-08-22T17:47:44Z" level=error msg="Failed to update existing repo cache: exit status 128" 

you now get:

timeme="2017-08-23T16:34:34Z" level=error msg="Multiple errors encountered performing catalog refresh" 
time="2017-08-23T16:34:34Z" level=error msg="Catalog refresh failed for community (https://git.rancher.io/community-catalog.git): Remote commit check failed: Repo [https://git.rancher.io/community-catalog.git] is not accessible: Get https://git.rancher.io/repos/community-catalog/commits/master: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" 
time="2017-08-23T16:34:34Z" level=error msg="Catalog refresh failed for library (https://git.rancher.io/rancher-catalog.git): Remote commit check failed: Repo [https://git.rancher.io/rancher-catalog.git] is not accessible: Get https://git.rancher.io/repos/rancher-catalog/commits/v1.6-development: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" 

@@ -28,56 +26,6 @@ func HeadCommit(path string) (string, error) {
return strings.Trim(string(output), "\n"), err
}

func formatGitURL(endpoint, branch string) string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved these two functions so that they could be attached to a struct so that I could reuse an http client, as opposed to creating a new client on each request, as it was previously doing.

res, err := m.httpClient.Do(req)
if err != nil {
// Return timeout errors so caller can decide whether or not to proceed with updating the repo
if uErr, ok := err.(*url.Error); ok && uErr.Timeout() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to only bubble up the error if it was explicitly a timeout error. The goal is to prevent the next step of doing a git fetch or clone from also timing out (which will take much logner). For non-timeout errors, the thinking is that the clone/fetch will either fail the same way or it wont.

}

func NewManager(cacheRoot string, configFile string, strict bool, db *gorm.DB, uuid string) *Manager {
client := http.Client{
Timeout: time.Second * 10,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This client is used for doing the remote sha checking. Is 10 seconds a safe amount of time to wait for a response?

This pulls in a change to automatically copy most headers on http
redirects. We had a custom redirect handler doing this, but can now
remove it.
Craig Jellick added 2 commits August 23, 2017 09:48
Swallow fewer errors.
Add meaningful context to errors.
Properly return 500 response codes on calls to the refresh API if the refresh fails.
@@ -173,20 +173,26 @@ func formatDSN(user, password, address, dbname, params string) string {
return mysqlConfig.FormatDSN()
}

func refresh(m *manager.Manager, update bool) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this function simply because there were a bunch of functions with "refresh" in their name and it was quite confusing. I wanted to make it obvious that this function served no purpose other than wrapping the call to RefreshAll and logging the result in the context of autorefresh

if err = git.Update(repoPath, branch); err != nil {
// Ignore error unless running in strict mode
if m.strict {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is strict mode always on now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, strict mode is still only turned on for validating catalogs. But as far as I can tell, its primary purpose is to validate the yaml files themselves. That it still controlled by the strict flag.

I removed the strict check here to accomplish the following:
If an error is encountered when updating an existing catalog that prevents us from updating the local git repo do not perform a DB update of the catalogs/templates. Previous behavior was to go ahead an update the db entries again, even though the git fetch failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. This makes sense.

@@ -71,11 +86,11 @@ func (m *Manager) refreshConfigCatalogs(update bool) error {
catalog = existingCatalog
}
if err := m.refreshCatalog(catalog, update); err != nil {
refreshErrors = append(refreshErrors, fmt.Sprintf("%s: %v", catalog.Name, err))
refreshErrors = append(refreshErrors, errors.Wrapf(err, "Catalog refresh failed for %v (%v)", catalog.Name, catalog.URL))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First should be %s, right? Not sure if it really matters.

Copy link
Contributor Author

@cjellick cjellick Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt matter. %v on a string gets you the string

@joshwget
Copy link
Contributor

LGTM. Error handling is much nicer now.

@cjellick cjellick merged commit 9074fab into rancher:master Aug 23, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants