Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using info file as an atomic flag #1841

Closed
linzhp opened this issue Mar 2, 2023 · 4 comments
Closed

Using info file as an atomic flag #1841

linzhp opened this issue Mar 2, 2023 · 4 comments

Comments

@linzhp
Copy link
Contributor

linzhp commented Mar 2, 2023

Is your feature request related to a problem? Please describe.
Currently, Athens upload the mod, info and zip file in parallel:

go saveOrAbort("info", "application/json", info)
go saveOrAbort("mod", "text/plain", mod)
go saveOrAbort("zip", "application/octet-stream", zip)

When errors happen during uploading any of the file, the module/version would be left in an incomplete state. As a result, to check whether a module/version exists, Athens has to check all 3 files exist:

found := make(map[string]struct{}, 3)
err := s.s3API.ListObjectsPagesWithContext(ctx, lsParams, func(loo *s3.ListObjectsOutput, lastPage bool) bool {
for _, o := range loo.Contents {
if _, exists := found[*o.Key]; exists {
log.EntryFromContext(ctx).Warnf("duplicate key in prefix %q: %q", *lsParams.Prefix, *o.Key)
continue
}
if *o.Key == config.PackageVersionedName(module, version, "info") ||
*o.Key == config.PackageVersionedName(module, version, "mod") ||
*o.Key == config.PackageVersionedName(module, version, "zip") {
found[*o.Key] = struct{}{}
}
}
return len(found) < 3
})
if err != nil {

This means all storage clients have to implement similar logic. What's worse, it becomes problematic when a module has thousands of versions, all of them are in the same directory with the current layout. Athens will have to rely on the blob storage's ability to filter part of file names (#1840). If a blob storage doesn't support that, the Athens will have to either:

Describe the solution you'd like
If we instead upload the info file synchronously after Athens successfully upload mod and zip file, which can still be in parallel, then the existence of info file serves as a flag that the module/version is complete. The info file is very small, typically less than 200 bytes. Uploading it synchronously is not a problem.

Describe alternatives you've considered
We also discussed putting each version in its own directory in #1840, so we don't need pagenation or file filtering.

@marwan-at-work @r-ashish What do you think?

@r-ashish
Copy link
Member

r-ashish commented Mar 2, 2023

I like the idea but this will also require a cache refresh otherwise we may have false positives while declaring a module version as complete.

@linzhp
Copy link
Contributor Author

linzhp commented Mar 2, 2023

As long as the module/version is ever requested once, Athens will detect the absence of any of the 3 files and force cache refresh. So the only false positive would only happen to those newly cached modules/versions but never read since, which should be very rare. We can deploy the change in two stages to further reduce the chance:

Stage 1: enforce the order of saving the 3 files and have users to run go clean -modcache && go mod tidy to make sure that all active modules are complete
Stage 2: modify the Exits() function to check only the info file.

Alternatively, we can write an empty file instead, and use that as the atomic flag.

@manugupt1
Copy link
Member

Can we close this now @linzhp

@linzhp
Copy link
Contributor Author

linzhp commented Mar 4, 2023

Yeah, with less calls to Exists() after #1842 and #1844 improving the performance, we don't need this anymore

@linzhp linzhp closed this as completed Mar 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants