Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/pkgsite/cmd/frontend: no longer loads stdlib in proxy mode #63249

Open
aslatter opened this issue Sep 26, 2023 · 9 comments
Open

x/pkgsite/cmd/frontend: no longer loads stdlib in proxy mode #63249

aslatter opened this issue Sep 26, 2023 · 9 comments
Labels
pkgsite/cmd Issues related to x/pkgsite/cmd/pkgsite pkgsite

Comments

@aslatter
Copy link

aslatter commented Sep 26, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/net_home/alatter/.cache/go-build"
GOENV="/net_home/alatter/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/net_home/alatter/go/pkg/mod"
GONOPROXY="*."
GONOSUMDB="*."
GOOS="linux"
GOPATH="/net_home/alatter/go"
GOPRIVATE="*.epic.com"
GOPROXY=",direct"
GOROOT="/net_home/alatter/local/go1.20.2.linux-amd64"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/net_home/alatter/local/go1.20.2.linux-amd64/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/net_home/alatter/source/pkgsite/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1051915940=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We run pkgsite and host it internally to provide documentation for our own packages. We run it in "direct" mode, and send it through our internal go-proxy (Athena), which has credentials to fetch our internal go-modules.

We run pkgsite with arguments like the following:

$ go run ./cmd/frontend -bypass_license_check -direct_proxy -proxy_url https://<redacted> -host :8080

After commit f159e616582641bdbeb2d6a0018ca4c8c472fbb5 we are no longer able to load godocs
for stdlib packages - for example, loading http://localhost:8080/io will return HTTP-500 with the following logs from pkgsite:

2023/09/26 15:11:41 Info: Listening on addr :8080
2023/09/26 15:11:46 Info: 0 /io map[requestType:request start]
2023/09/26 15:11:46 Info: FetchDataSource: fetching std@latest
2023/09/26 15:11:46 Info: FetchDataSource: fetched std@latest using *fetch.proxyModuleGetter in 226.3µs with error FetchModule("std", "latest"): proxy.Client.Info("std", "latest"): Client.readBody("std", "latest", "info"): Client.escapedURL("std", "latest", "info"): path: malformed module path "std": missing dot in first path element: invalid argument
2023/09/26 15:11:46 Error: serveUnitPage(ctx, w, r, ds, &{io std latest}): FetchDataSource.GetUnitMeta("io", "std", "latest"): FetchDataSource.findModule("io", "std", "latest"): FetchDataSource.getModule("std", "latest"): FetchModule("std", "latest"): proxy.Client.Info("std", "latest"): Client.readBody("std", "latest", "info"): Client.escapedURL("std", "latest", "info"): path: malformed module path "std": missing dot in first path element: invalid argument
2023/09/26 15:11:46 Info: 500 /io map[isRobot:false requestType:request end]

The error malformed module path "std": missing dot in first path element: invalid argument appears to be coming from
our go-proxy (Athena)
. It is objected to a module named std, which seems like a reasonable thing to object-to.

(edit: on further testing the error is internal to pkgsite - the module for loading stdlib does not seem to be compatible with the direct-proxy mode)

If I swap the order of the getters used in fetchdatasource.Options in the direct-mode branch of frontend startup everything seems to work for me:

	if *directProxy {
		sourceClient := source.NewClient(&http.Client{Transport: &ochttp.Transport{}, Timeout: 1 * time.Minute})
		ds := fetchdatasource.Options{
			Getters: []fetch.ModuleGetter{
				fetch.NewProxyModuleGetter(proxyClient, sourceClient), // <-- if I move this down one line my problem goes away
				fetch.NewStdlibZipModuleGetter(),
			},
			ProxyClientForLatest: proxyClient,
			BypassLicenseCheck:   *bypassLicenseCheck,
		}.New()
		dsg = func(context.Context) internal.DataSource { return ds }
	} else {

Bisect result:

f159e616582641bdbeb2d6a0018ca4c8c472fbb5 is the first bad commit
commit f159e616582641bdbeb2d6a0018ca4c8c472fbb5
Author: Michael Matloob <matloob@golang.org>
Date:   Wed Jul 5 14:02:46 2023 -0400

    internal/fetch: add two stdlib getters

    This breaks out the behavior of getting the stdlib into its own
    getter, stdlibZipModuleGetter. Most (but not all) of the special cases
    in the fetch code for the stdlib are in the stdlibZipModuleGetter,
    which is installed on the frontend. For the local pkgsite command, we
    add a new builder for the goPackagesModuleGetter that loads the
    packages from the stdlib given a GOROOT. The stdlibZipModuleGetter is
    also installed for pkgsite in as a fallback if the
    goPackagesModuleGetter doesn't successfully fetch the stdlib module
    (I'm not sure if that's possible).

    Fixes #60114

    Change-Id: Ida6a5367343643cc337c6d05e0d40095f79ca9e5
    Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/507883
    Reviewed-by: Jamal Carvalho <jamal@golang.org>
    TryBot-Result: Gopher Robot <gobot@golang.org>
    Run-TryBot: Michael Matloob <matloob@golang.org>
    kokoro-CI: kokoro <noreply+kokoro@google.com>

What did you expect to see?

What did you see instead?

@bcmills bcmills changed the title affected/package: x/pkgsite no longer loads stdlib if go-proxy cannot parse fake stdlib module x/pkgsite/cmd/frontend: no longer loads stdlib if go-proxy cannot parse fake stdlib module Sep 26, 2023
@gopherbot gopherbot added this to the Unreleased milestone Sep 26, 2023
@bcmills bcmills added the pkgsite/cmd Issues related to x/pkgsite/cmd/pkgsite label Sep 26, 2023
@bcmills
Copy link
Contributor

bcmills commented Sep 26, 2023

If I swap the order of the getters used in fetchdatasource.Options in the direct-mode branch of frontend startup everything seems to work for me:

Hmm. I suspect that's a bug in the fetch.proxyModuleGetter. ./internal/fetchdatasource checks for derrors.NotFound to skip errors:
https://cs.opensource.google/go/x/pkgsite/+/master:internal/fetchdatasource/fetchdatasource.go;l=166-168;drc=14233af499055be3ba3fe6816af8e7d4a4efa9ad

Probably if a module with an invalid path is requested from fetch.proxyModuleGetter it should return an error for which errors.Is(err, derrors.NotFound) returns true.

@aslatter, want to send a fix?

@aslatter
Copy link
Author

Just to make sure I understand, you mean something like this?

diff --git a/internal/proxy/client.go b/internal/proxy/client.go
index 92adc32c..dec80330 100644
--- a/internal/proxy/client.go
+++ b/internal/proxy/client.go
@@ -175,7 +175,14 @@ func (c *Client) ZipSize(ctx context.Context, modulePath, resolvedVersion string
 }

 func (c *Client) EscapedURL(modulePath, requestedVersion, suffix string) (_ string, err error) {
-       defer derrors.WrapStack(&err, "Client.escapedURL(%q, %q, %q)", modulePath, requestedVersion, suffix)
+       defer func() {
+               // if we cannot correctly craft a URL for the request module, it's possible we are dealing with
+               // something we cannot proxy (i.e. stdlib).
+               //
+               // Pass-back "NotFound" so we keep searching.
+               err = fmt.Errorf("%w: %s", derrors.NotFound, err)
+               derrors.WrapStack(&err, "Client.escapedURL(%q, %q, %q)", modulePath, requestedVersion, suffix)
+       }()

        if suffix != "info" && suffix != "mod" && suffix != "zip" {
                return "", errors.New(`suffix must be "info", "mod" or "zip"`)

@bcmills
Copy link
Contributor

bcmills commented Sep 27, 2023

That seems like probably too low a level for the transformation.

It looks like there are already special cases in ./internal/fetch for the standard library:
https://cs.opensource.google/go/x/pkgsite/+/master:internal/fetch/fetch.go;l=77-104;drc=14233af499055be3ba3fe6816af8e7d4a4efa9ad

Perhaps the if fr.ModulePath == stdlib.ModulePath condition should be moved up before the switch mg.(type)?

@aslatter
Copy link
Author

I can think of two options:

  1. Special-case "std" inside the methods of the proxy-client, and directly return "NotFound" (at minimum in "info", but extending that to the other "getters" makes sense).
  2. Special-case "std" inside "./internal/fetch", to skip module-getters which are not appropriate

Both of these feel kinda icky - (2) has a higher density of weird coupling of components, but (1) requires the proxy-client having to know about the "std" module.

My comment about flipping the order of the getters in the issue-description was not a serious suggestion - it was more describing what I'd done to diagnose things. But it doesn't seem worse than these options? Every module which is handled by the stdlib-getter can never be handled by the proxy-getter.

@hyangah hyangah modified the milestones: Unreleased, pkgsite/backlog Oct 5, 2023
@hyangah
Copy link
Contributor

hyangah commented Oct 5, 2023

cc @jba

@aslatter aslatter changed the title x/pkgsite/cmd/frontend: no longer loads stdlib if go-proxy cannot parse fake stdlib module x/pkgsite/cmd/frontend: no longer loads stdlib in proxy mode Apr 25, 2024
@aslatter
Copy link
Author

As far as I can tell this issue is a blocker for running in "direct proxy" mode. Any work-arounds would be appreciated (or guidance/advice for running in non-proxy mode).

I'd be happy to send a fix, but it seemed like the direction we wanted to take wasn't clear.

@mxk
Copy link

mxk commented May 4, 2024

I'm not sure if this is related, but I can't get frontend to serve stdlib even in database mode (just get 404 responses). As far as I can tell, the database has to be populated with stdlib info, but that is only done by a worker, which only runs when frontend is hosted on GCP? I'm having a hard time understanding how this is all supposed to work.

@aslatter
Copy link
Author

aslatter commented May 4, 2024

I expect that's a different issue - in direct-proxy mode stdlib browsing works almost works if the request gets routed to the back-end for it. Once that happens the discovery/download happens on the fly, similar to proxy-discovered packages/modules.

@mxk
Copy link

mxk commented May 5, 2024

Yea, I think I figured out that I need to run seeddb in order to populate the database with stdlib info. Unfortunately, when I run it, my PostgreSQL instance gets killed due to OOM error (with 32 GB of RAM). So this is sort of unrelated to direct-proxy mode, but I'm not sure why frontend doesn't just fetch stdlib the same as any other missing package when running with a database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkgsite/cmd Issues related to x/pkgsite/cmd/pkgsite pkgsite
Projects
None yet
Development

No branches or pull requests

5 participants