x/pkgsite: support generating documentation for massive modules #64437

Open
MKrupauskas opened this issue Nov 28, 2023 · 4 comments
Labels
pkgsite WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

@MKrupauskas

MKrupauskas commented Nov 28, 2023

What did you do?

I tried to set up a local pkg.go.dev instance to generate documentation for an internal module that contains more packages than the per-module limit allows and around half a million Go source files.

What did you expect to see?

I expected the documentation to be generated successfully even for a huge module with many packages and tons of source files.

What did you see instead?

Trying to generate the documentation for a large module with many packages and source files resulted in hard failures. The module hit the size limits defined in https://github.com/golang/pkgsite/blob/42af168e68a06ea24646bfc3f17fc6226dc15d38/internal/fetch/limit.go#L9, and generation wouldn't work even if those limits were raised.

I'm looking for ideas and guidance on what needs to be done to support generating documentation for large modules that the current limits disallow. I still need to look into the code, but I assume some sort of lazy module loading would be required. Guidance, ideas, and thoughts are highly appreciated. I'd be happy to implement support for large modules.

@gopherbot gopherbot added this to the Unreleased milestone Nov 28, 2023
@MKrupauskas MKrupauskas changed the title x/pkgsite: support large modules x/pkgsite: support generating documentation for massive modules Nov 28, 2023
@adonovan
Member

adonovan commented Dec 1, 2023

Thanks for reporting this. As you point out, there are limits on the number of packages per module (10,000). Does your module exceed that size? I am told that the number of Go source files shouldn't by itself be a problem, though individual very large files are skipped. We're curious to hear as many details about this monster module as you're comfortable sharing.

@hyangah hyangah added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 7, 2023
@MKrupauskas
Author

Yes, the module contains considerably more packages than the 10k packages-per-module limit. It's a large monorepo structured as a single Go module for central dependency management and compatibility reasons. I will share the rough number of packages the module contains when I have those numbers.

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jan 28, 2024
@MKrupauskas
Author

I did some more digging, and I think the bottleneck is the custom packages.Load at https://github.com/golang/pkgsite/blob/463d7c943fe04a6c6f2b97e71def0fb165fff76a/internal/fetch/getters.go#L252, which runs out of memory when the module is too large. Do you have any ideas for addressing this bottleneck?

@adonovan
Member

That's quite surprising: the packages.Load operation is essentially a wrapper around go list -json to retrieve the metadata (directory, filename, package name, etc.) for each package in the workspace, which is usually efficient and reliable. Are you able to print the list of arguments to packages.Load? If so, could you try running these two commands:

$ time go run golang.org/x/tools/go/packages/gopackages@master ./...
$ time go list -json ./...

(Replace ./... with the arguments you observe in the call to packages.Load.) I would expect both to run to completion rapidly, on the order of several hundred packages per second, so a 10K-package repo should take around half a minute.
