Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/pkgsite: devtools/cmd/seeddb -keep_going defers termination until all modules have had a fetch attempt #48587

Open
sfllaw opened this issue Sep 23, 2021 · 3 comments

Comments

@sfllaw
Copy link

@sfllaw sfllaw commented Sep 23, 2021

Current behaviour

When devtools/cmd/seeddb encounters any errors at all, it immediately aborts:

username@host:pkgsite$ go run ./devtools/cmd/seeddb -seed <(echo github.com/quantcast/g2@all)
...
2021/09/23 11:11:35 Info (map[fetch:github.com/quantcast/g2@v0.0.1]): Updated module version state for github.com/quantcast/g2@v0.0.1: code=491, num_packages=0, err=fetchAndInsertModule("github.com/quantcast/g2", "v0.0.1"): FetchModule("github.com/quantcast/g2", "v0.0.1"): module path=github.com/quantcast/g2, go.mod path=github.com/ssmccoy/g2: alternative module; timings: db.UpdateModuleVersionState=0.214s, fetch.FetchModule=0.362s, worker.deleteModule=0.462s, worker.updatedVersionMap=0.132s
2021/09/23 11:11:35 Critical: fetchFunc(ctx, f, "github.com/quantcast/g2", "v0.0.1"): FetchAndUpdateState("github.com/quantcast/g2", "v0.0.1", ""): fetchAndInsertModule("github.com/quantcast/g2", "v0.0.1"): FetchModule("github.com/quantcast/g2", "v0.0.1"): module path=github.com/quantcast/g2, go.mod path=github.com/ssmccoy/g2: alternative module

This is because the run function uses an errgroup.Group, for early termination:

https://cs.opensource.google/go/x/pkgsite/+/fc5196d5:devtools/cmd/seeddb/main.go;l=91-114

	r := results{}
	g := new(errgroup.Group)
	f := &worker.Fetcher{
		ProxyClient:  proxyClient,
		SourceClient: sourceClient,
		DB:           postgres.New(db),
	}
	for path, vers := range versionsByPath {
		path := path
		vers := vers
		// Process versions of the same module sequentially, to avoid DB contention.
		g.Go(func() error {
			for _, v := range vers {
				if err := fetch(ctx, db, f, path, v, &r); err != nil {
					return err
				}
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}
	log.Infof(ctx, "Successfully fetched all modules: %v", time.Since(start))

Since historical versions of github.com/quantcast/g2 are broken, you cannot seed with “github.com/quantcast/g2@all”. In addition, if there is any module that is broken in the seed listing, it must be excluded and the seeddb command must be run again. In our codebase, this seems quite common due to early confusion around the “module” directive in go.mod files.

Desired behaviour

When seeding with a list of modules, an operator probably wants to fetch as many valid modules as possible. Understandably, this is the opposite behavior from what a developer wants.

I suggest a -keep_going flag, inspired by GNU make’s --keep-going, which would collect errors until the program has tried to fetch every version:

	r := results{}
	var (
		mu     sync.Mutex
		errors []error
	)
	g := new(errgroup.Group)
	f := &worker.Fetcher{
		ProxyClient:  proxyClient,
		SourceClient: sourceClient,
		DB:           postgres.New(db),
	}
	for path, vers := range versionsByPath {
		path := path
		vers := vers
		// Process versions of the same module sequentially, to avoid DB contention.
		g.Go(func() error {
			for _, v := range vers {
				if err := fetch(ctx, db, f, path, v, &r); err != nil {
					if *keep_going {
						mu.Lock()
						errors = append(errors, err)
						mu.Unlock()
						continue
					}
					return err
				}
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return err
	}
	if len(errors) != 0 {
		return multierror.New(errors)
	}
	log.Infof(ctx, "Successfully fetched all modules: %v", time.Since(start))

Workaround

Users may manually specify valid versions to seed, but that requires them to know which versions are valid. For our example repository, our seed.txt contains:

github.com/quantcast/g2@v0.6.3
github.com/quantcast/g2@v0.6.4
github.com/quantcast/g2@v0.6.5
github.com/quantcast/g2@v0.6.6
github.com/quantcast/g2@v0.6.7
@gopherbot gopherbot added this to the Unreleased milestone Sep 23, 2021
@jamalc jamalc removed this from the Unreleased milestone Sep 23, 2021
@jamalc jamalc added this to the pkgsite/unplanned milestone Sep 23, 2021
@jamalc jamalc assigned jamalc and julieqiu and unassigned jamalc Sep 23, 2021
@julieqiu julieqiu assigned jba and unassigned julieqiu Sep 23, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Sep 27, 2021

Change https://golang.org/cl/352489 mentions this issue: devtools/cmd/seeddb: add -keep_going

@jba
Copy link
Contributor

@jba jba commented Sep 27, 2021

Thanks for the clear report and the fix.

gopherbot pushed a commit to golang/pkgsite that referenced this issue Sep 27, 2021
Add a flag that continues fetching modules even if there is an error.

For golang/go#48587

Change-Id: I9df1070ee9bb0e206ddd569228239bde5bcb05cd
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/352489
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
@sfllaw
Copy link
Author

@sfllaw sfllaw commented Sep 29, 2021

:shipit: 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants