Sharing and distributing log files/build metadata? #441

Open
thoughtpolice opened this issue Oct 9, 2023 · 2 comments

@thoughtpolice

buck2 log is pretty cool, since it lets you look at what build commands a user ran and what their output was, and you can do things like buck2 log cmd or buck2 log replay to watch the build. Is it possible or recommended to 'share' these log files? What would some infrastructure for doing that look like? Maybe something like:

  • Users run buck2 build ... a bunch
  • Their logs can get synchronized to something like an HTTP endpoint (maybe a proxy that just fronts an S3 bucket)
  • Something goes wrong, a user asks for help, posts their Build ID.
  • You could run buck2 log cmd --trace-id $BUILD_ID to find out what happened.
  • This would (transparently?) download the log files from some endpoint (somehow?)

There could be other useful things to derive from this possibly, assuming you had these logs. For example, you could use these logs as a direct source of build analytic information from users to derive information about build times, etc.

Is there any kind of "thing" like this inside Meta? Does this seem interesting? To be truly useful it would need some support in the core executable, I think. It seems like something kind of like what buck2 rage does, right? Except I'd want it in all cases, not just failures.

I imagine this could look like the following for OSS users:

  • A .buckconfig key like buck2_logs.upload_address is set to https://buck2-logs.aseipp.dev/
  • All .pb.zst files are POST'd to $UPLOAD_ADDRESS/upload with their trace ID as a primary key, asynchronously, by the daemon
  • When a user says buck2 log cmd --trace-id $TRACE_ID, then:
    • HTTP GET $UPLOAD_ADDRESS/trace/$TRACE_ID
    • This should return a json object containing metadata and a new address to find the log content at
    • That new address is the public URL to download the .pb.zst file
    • This allows users to materialize logs on demand
  • Authorization of some kind, e.g. with a bearer token (details TBD)
  • Ideally the HTTP protocol is simple enough to implement "by hand" in a short afternoon
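
To make the proposal above concrete, here is a minimal client-side sketch in Python using only the standard library. It is an illustration of the idea, not an existing buck2 feature. Several details are assumptions that the proposal leaves open: passing the trace ID to /upload as a `trace_id` query parameter, the JSON field name `log_url`, and the function names themselves.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional


def _with_auth(req: urllib.request.Request, token: Optional[str]) -> urllib.request.Request:
    # Attach a bearer token if one is configured.
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def build_upload_request(base_url: str, trace_id: str, log_bytes: bytes,
                         token: Optional[str] = None) -> urllib.request.Request:
    """Build the POST $UPLOAD_ADDRESS/upload request for one .pb.zst file.

    Assumption: the trace ID travels as a `trace_id` query parameter.
    """
    quoted = urllib.parse.quote(trace_id, safe="")
    url = f"{base_url.rstrip('/')}/upload?trace_id={quoted}"
    req = urllib.request.Request(url, data=log_bytes, method="POST")
    req.add_header("Content-Type", "application/octet-stream")
    return _with_auth(req, token)


def http_get(url: str, token: Optional[str] = None) -> bytes:
    # Plain HTTP GET returning the response body.
    req = _with_auth(urllib.request.Request(url), token)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def fetch_log(base_url: str, trace_id: str,
              get: Callable[[str, Optional[str]], bytes] = http_get,
              token: Optional[str] = None) -> bytes:
    """Resolve a trace ID to its log content in two steps.

    1. GET $UPLOAD_ADDRESS/trace/$TRACE_ID returns JSON metadata.
    2. The metadata points at a public URL holding the .pb.zst file.

    Assumption: the metadata field is called `log_url`.
    """
    quoted = urllib.parse.quote(trace_id, safe="")
    meta_url = f"{base_url.rstrip('/')}/trace/{quoted}"
    meta = json.loads(get(meta_url, token))
    # The download URL is described as public, so no token is sent.
    return get(meta["log_url"], None)
```

Sending the request from `build_upload_request` is a single `urllib.request.urlopen(req)` call. The `get` parameter of `fetch_log` is injectable so the lookup logic can be exercised without a live server.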

@thoughtpolice commented Oct 9, 2023

Actually, if --trace-id could take an https:// URL and download a file, you could probably do the first part completely outside of the core executable with a file watching tool? Just watch buck-out/v2/log and upload every file that gets written? Is that feasible and perhaps less invasive?
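
A rough sketch of what such an external watcher could look like, in Python with only the standard library. It polls the log directory rather than using OS file-change notifications, and it treats a file as finished once its size and modification time are unchanged between two consecutive polls. That heuristic is an assumption, since the daemon may still be appending to a log. The function name `poll_once` and the `upload` callback are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Set, Tuple


def poll_once(log_dir: Path,
              pending: Dict[Path, Tuple[int, float]],
              done: Set[Path],
              upload: Callable[[Path], None]) -> None:
    """Scan log_dir once and upload .pb.zst files that look finished.

    A file is uploaded when its (size, mtime) matches what the previous
    poll saw, i.e. nothing has written to it in between.
    """
    for path in sorted(log_dir.rglob("*.pb.zst")):
        if path in done:
            continue
        st = path.stat()
        signature = (st.st_size, st.st_mtime)
        if pending.get(path) == signature:
            upload(path)
            done.add(path)
            pending.pop(path, None)
        else:
            # New or still-changing file: remember it and check again later.
            pending[path] = signature


def watch(log_dir: Path, upload: Callable[[Path], None], interval: float = 5.0) -> None:
    # Run forever, polling at a fixed interval.
    pending: Dict[Path, Tuple[int, float]] = {}
    done: Set[Path] = set()
    while True:
        poll_once(log_dir, pending, done, upload)
        time.sleep(interval)
```

Calling `watch(Path("buck-out/v2/log"), upload)` with an `upload` callback that POSTs the file would cover the upload half of the proposal without touching the core executable. The drawback is that a log is only shipped after the build stops writing to it.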

@cjhopman

Yeah, I've been really happy with the entire suite of buck2 log commands.

> Is there any kind of "thing" like this inside Meta? Does this seem interesting?

Yes, there is. You can see some stuff related to log uploading here:

fn log_upload_url(use_vpnless: bool) -> Option<&'static str> {
All the log commands also support fetching a log by trace ID from wherever it is stored; see around here: https://github.com/facebook/buck2/blob/062f014ddfeddbef12542b3e233e2f609eacca9a/app/buck2_client/src/commands/log/options.rs#L90C1-L112

This has been incredibly useful for understanding and providing support for user builds.

I think it would be great if we could figure out the appropriate API for supporting this for open source users as well. It should probably be considered together with #226. The answer there may be that it's actually best to have separate APIs for them (you'll note that internally we handle them almost entirely separately).

One important thing we've found is that we want to upload the log incrementally as the build progresses, to ensure that we get logs even for failed builds (in particular, our CI may aggressively kill jobs that time out or OOM, which makes it hard to reliably capture logs on failure).
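
As an illustration of the incremental idea only (this is not how the internal uploader works, which the thread does not describe), an uploader could remember how many bytes of the log it has already sent and ship just the new tail on each tick. The function names and the `send(offset, chunk)` callback in this Python sketch are hypothetical.

```python
from pathlib import Path
from typing import Callable, Tuple


def read_new_bytes(path: Path, offset: int) -> Tuple[bytes, int]:
    """Return the bytes appended to `path` since `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    return chunk, offset + len(chunk)


def upload_increment(path: Path, offset: int,
                     send: Callable[[int, bytes], None]) -> int:
    """Send whatever has been appended since the last call.

    `send` receives the starting offset of the chunk so the receiving
    side can place it correctly or detect duplicates. Returns the offset
    to pass in on the next call.
    """
    chunk, new_offset = read_new_bytes(path, offset)
    if chunk:
        send(offset, chunk)
    return new_offset
```

Calling `upload_increment` periodically during the build means that if the process is killed, everything up to the last successful call has already left the machine.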
