Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy.golang.org: don't cache /@latest for modules that no longer exist at the upstream head revision #39007

Open
bcmills opened this issue May 11, 2020 · 14 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented May 11, 2020

In gomodule/redigo#440, a nested github.com/gomodule/redigo/redis module was created and subsequently removed from the github.com/gomodule/redigo repository: today, the nested module does not exist at the head revision of that repo, and also does not exist in the latest tagged release of the parent module (github.com/gomodule/redigo).

In gomodule/redigo#440 (comment) and gomodule/redigo#440 (comment), I noted that proxy.golang.org was still serving an old commit for /@latest for the nested module over an hour later, even though GOPROXY=direct correctly reported that there was no latest version of that module, and even though proxy.golang.org did know about a higher (and later) pseudo-version of that module.

As a workaround, the owners of the repo had to tag an explicit release of the nested module, even though they did not intend for anyone to actually use that release.

I don't think that workaround should have been necessary: I think proxy.golang.org should not serve known-stale contents for the /@latest endpoint. That endpoint should be invalidated whenever a newer version of the named module is successfully fetched, even if that version is not itself latest (as interpreted by the go command).

That may mean that /@latest transitions from advertising a version of the module to no longer advertising any such version, but that behavior is no worse than what would happen if the repo owner took some other drastic action to remove the module — such as publishing a commit on the default branch with all of the source code deleted, or publishing a commit with the module line declaring some other path. Regardless of the reason, proxy.golang.org should view the affirmative non-existence of a module as a valid “latest” state for that module, and not get stuck advertising “the last version that did exist”.

This only really matters for modules that never explicitly tagged any release or pre-release version, since the go command does not consult the /@latest endpoint unless the list of release and pre-release versions is empty.

(Note that, even if the /@latest endpoint itself does not advertise any version, the proxy should continue to serve all previously-fetched versions of any module with a suitable license. I am not proposing that we evict any explicit versions from the cache — only the /@latest endpoint itself.)

CC @jayconrod @matloob @katiehockman @heschik @hyangah @stevenh

@bcmills bcmills added this to the Backlog milestone May 11, 2020
@heschik
Copy link
Contributor

@heschik heschik commented May 11, 2020

Sorry, but I don't understand what the actual feature request here is. My best guess is:

module X has an @latest entry.
module X/Y has an @latest entry.
attempted fetches of module X/Y begin failing -- possibly because the origin is flaky, possibly because it's been deleted, or possibly because something is internally wrong with proxy.golang.org.
module X continues to work.

In this situation, you're saying that we should assume that X/Y is failing because it's been deleted, check that a version of X later than the version of X/Y has been fetched, and forget the @latest for X/Y?

If that's wrong please be very specific about the proposal. I can't imagine a version of this that I think is a good idea so I'm struggling to understand the suggestion.

@hyangah
Copy link
Contributor

@hyangah hyangah commented May 11, 2020

fyi - #32239 is related. We tried this as another way to address this type of issue, but we had to give it up because of noticeable performance degradation.

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 11, 2020

In this situation, you're saying that we should assume that X/Y is failing because it's been deleted, check that a version of X later than the version of X/Y has been fetched, and forget the @latest for X/Y?

Behaviorally, yes.

But I would phrase the causality of it a bit differently:

  • At time τ₂, X/Y@latest is fetched and resolves to (pseudo-)version v₁, with metadata indicating commit timestamp τ₁.
  • At time τ₄, module X/Y@yyyyyyyyyyyy is fetched and resolves to version v₃, with metadata indicating commit timestamp τ₃ (where τ₄ > τ₃ > τ₂ > τ₁).

Since τ₃ > τ₁ and τ₄ > τ₂, we can infer that the state of the repo probably changed at some point between τ₂ and τ₃, and thus the @latest cache entry from that point in time is suspect. So we should remove the suspected-stale cache entry, and either not replace it or re-resolve and replace it “soonish”.

The next time X/Y@latest is requested (at τ₅), we should re-resolve it.

  • If that happens to result in failure, that's fine: we should serve the new failure instead of the suspected-stale v₁.
  • If it happens to continue to resolve to v₁, that's fine too: we can cache the new result until we see a version newer than τ₃ (or perhaps τ₄ or τ₅, depending on how much of a race we want to allow — and how much, if any, we are willing to expand the database schema).
@heschik
Copy link
Contributor

@heschik heschik commented May 12, 2020

We can't tell whether a module version is on the default branch, so we would have to invalidate @latest on any commit to any branch that we became aware of. If we then experienced a transient failure refreshing @latest, that would break users.

I think this is a very specific workaround to a very specific problem that has better solutions and causes some amount of collateral damage. It certainly increases complexity. I would rather not do it.

@katiehockman
Copy link
Member

@katiehockman katiehockman commented May 12, 2020

I agree that this seems like a lot of complexity for a pretty niche situation. I definitely empathize with @bcmills' point that we don't want users to have to a special workaround for this, but the tradeoff of this could harm other users (who then may also need a workaround, etc etc). I'm also not confident that I know every single side effect of this change.

If this becomes a bigger problem that more people experience, then let's re-visit this. For now, a reasonable solution is to use gomodule/redigo#440 (comment) as a guideline if anyone else enounters this. Maybe documenting it if that issue isn't good enough.

This also relates back to #30134, where we could do more here if we had more information from the error message (like @heschik alluded to). It's difficult to make the right decision about whether or not to drop one of our cache entries if there was a go command error, since it could be from a legitimate error that we can save, or it could be from a transient server error that we should throw away.

@stevenh
Copy link
Contributor

@stevenh stevenh commented May 12, 2020

Could a user exploit this to break a repo simply by putting in a PR that is then cached in this way?

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 12, 2020

@stevenh, no: the go command refuses to use commits that are not reachable from at least one tag or branch, and unmerged PRs are not reachable from any tag or branch.

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 14, 2020

We can't tell whether a module version is on the default branch, so we would have to invalidate @latest on any commit to any branch that we became aware of.

Yes, that's true.

If we then experienced a transient failure refreshing @latest, that would break users.

It would not break any existing users, since they are all requesting explicit versions. It could indeed break new users of the module, but a module author can already break new users of the module in all sorts of ways (such as by pushing a latest version that explicitly removes all of the source code from within the module). I think it clearly is the proxy's responsibility to provide continuity for existing users, but ensuring that new users can always find a viable latest to use seems outside its charter.

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 14, 2020

I agree that this seems like a lot of complexity for a pretty niche situation.

A similar approach that would reduce overall complexity would be to apply the same TTL for latest that we apply for branch-to-version resolution in general. (After all, latest is just the branch-to-version resolution for “the default branch”.)

That would have fixed github.com/gomodule/redigo/redis after an hour, instead of requiring explicit tagging.

To me the source of complexity here is the desire to treat @latest as somehow different from other branches — and I don't agree that it is different.

@hyangah
Copy link
Contributor

@hyangah hyangah commented May 14, 2020

The reason that proxy is treating @latestdifferently is because that's part of the module mirror's promise (no more left-pad) and affects end users. There are many tools that do not use the versioned tags. For example, the vscode go extension depends on many tools and many of them still go by pseudo-versions.

@heschik
Copy link
Contributor

@heschik heschik commented May 14, 2020

It's not even that. As far as I recall this is exactly the same behavior everything else gets. If a module goes from existing to failing to download, we'll continue to serve it for as long as we can. That goes for branches, tags, pseudoversions, everything.

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 14, 2020

If a module goes from existing to failing to download, we'll continue to serve it for as long as we can. That goes for branches, tags, pseudoversions, everything.

That seems like a major problem. What happens if the branch is, say, security-backports, and the module author decides to rename the branch to lts-security-backports? Users who set up a branch-oriented workflow will end up stuck on the last update, where they should have gotten an explicit error.

(CC @FiloSottile)

@bcmills
Copy link
Member Author

@bcmills bcmills commented May 14, 2020

The reason that proxy is treating @latest differently is because that's part of the module mirror's promise (no more left-pad) and affects end users.

The “mirror's promise” is that existing users will be able to continue to use the dependency versions that they were already using. It does not promise that the latest version of a module will always be a usable version of that module.

As noted in #39007 (comment), it is already possible for a module author to break latest for new users if they so choose, and in the case of left-pad the decision to break users seems to have been intentional.

There are many tools that do not use the versioned tags. For example, the vscode go extension depends on many tools and many of them still go by pseudo-versions.

Expiring the cache entry for latest would not prevent any of those existing users from continuing to use the versions they already depend on.

@heschik
Copy link
Contributor

@heschik heschik commented May 14, 2020

One of the primary goals of the proxy is to protect users from flaky upstreams, in particular outages. Since the go command gives us no way to distinguish outages from deliberate actions, we are required to be conservative and not try to infer intent.

Adding new dependencies is not a trivial use case and it is not acceptable for the ecosystem to break the next time some origin server goes down for a day. I don't think that's negotiable but if you would like to argue that case, this is probably not the place to do it.

If the fix for #24031 isn't sufficient to cover the security use case that sounds like a critical problem with it. The vast majority of people are presumably not going to be tracking a branch.

I don't think this discussion is going anywhere and I think this issue should be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.