Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/maintner: isTempError improvements #28994

Open
orthros opened this issue Nov 29, 2018 · 3 comments

Comments

@orthros
Copy link

commented Nov 29, 2018

Currently isTempError logs the error and vacuously returns true

func isTempErr(err error) bool {
	log.Printf("IS TEMP ERROR? %T %v", err, err)
	return true
}

This is used in the sync function's error group members in conjunction with loop to determine if the error returned from the various source sync functions is a temporary error or not.

for {
  err := gr.sync(ctx, token, loop)
  if loop && isTempErr(err) {
	log.Printf("Temporary error from github %v: %v", gr.ID(), err)
	time.Sleep(30 * time.Second)
	continue
  }
  log.Printf("github sync ending for %v: %v", gr.ID(), err)
	return err
}

This guarantees that if SyncLoop is called, the routines will loop forever (loop is true and isTempErr always returns true), but it also means that if Sync is called, and any repository sync operation fails, the entire error group will exit. In a corpus that is tracking hundreds of repositories with tens of thousands of issues, a single temporary error in the Sync call will always abort all other routines in the group.

It would be useful to be able to provide criteria for what constitutes a temporary error or not.

I propose something similar to the following:

type errCheck func(error) bool

// SyncLoop runs forever (until an error or context expiration) and
// updates the corpus as the tracked sources change.
func (c *Corpus) SyncLoop(ctx context.Context, check errCheck) error {
	if check == nil {
		check = isTempErr
	}
	return c.sync(ctx, true, check)
}

// Sync updates the corpus from its tracked sources.
func (c *Corpus) Sync(ctx context.Context, check errCheck) error {
	if check == nil {
		check = isTempErr
	}
	return c.sync(ctx, false, check)
}
func (c *Corpus) sync(ctx context.Context, loop bool, check errCheck) error {
	if _, ok := c.mutationSource.(*netMutSource); ok {
		return errors.New("maintner: can't run Corpus.Sync on a Corpus using NetworkMutationSource (did you mean Update?)")
	}

	group, ctx := errgroup.WithContext(ctx)
	for _, w := range c.watchedGithubRepos {
		gr, token := w.gr, w.token
		group.Go(func() error {
			log.Printf("Polling %v ...", gr.id)
			for {
				err := gr.sync(ctx, token, loop)
                                isTempErr := check(err)
				if loop && isTempErr {
					log.Printf("Temporary error from github %v: %v", gr.ID(), err)
					time.Sleep(30 * time.Second)
					continue
				}
				log.Printf("github sync ending for %v: %v", gr.ID(), err)
                                if isTempErr {
                                    err = nil
                                }
				return err
			}
		})
	}
  // Rest of function similarly calls check
}

This would give the consumers of corpus the ability to determine what constitutes a temporary error, but default back to the old behaviour by passing nil to Sync and SyncLoop

@gopherbot gopherbot added this to the Unreleased milestone Nov 29, 2018

@gopherbot gopherbot added the Builders label Nov 29, 2018

@gopherbot

This comment has been minimized.

Copy link

commented Nov 29, 2018

Change https://golang.org/cl/151658 mentions this issue: maintner: add ability to define custom checks on sync errors

@andybons

This comment has been minimized.

Copy link
Member

commented Feb 5, 2019

@dmitshur

This comment has been minimized.

Copy link
Member

commented Feb 6, 2019

if Sync is called, and any repository sync operation fails, the entire error group will exit
...
a single temporary error in the Sync call will always abort all other routines in the group

I agree that behavior of Sync (without loop) is suboptimal in that regard. I hope we can come up with a solution that is better than asking the user to provide a custom isTempErr function.

How would you like Sync to behave if one of the repos runs into an error? Should it retry temporary errors up to a fixed number of times, before giving up (and aborting all other repo goroutines)? Or should it collect errors it runs into, and report them at the end (without aborting other repo goroutines)? Or a mix of the two?

If we decide to make it possible for users to provide a custom isTempErr, perhaps it'd be cleaner to make it an exported field on Corpus rather than a new parameter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.