x/build/cmd/releasebot: canceled release can create truncated tarballs #33025

Open
toothrot opened this issue Jul 10, 2019 · 1 comment

Comments

@toothrot
Contributor

commented Jul 10, 2019

During a recent patch release, releasebot appeared to stall during a --mode=release run. Restarting releasebot appeared to resume the release, and it reported a successful release. However, the generated tarball for at least one architecture was truncated, and the checksum in the accompanying .sha256 file matched the truncated tarball, so comparing checksums did not reveal the problem.

Releasebot should:

  • be safe to resume (either by cleaning up on error, or possibly by not re-using artifacts).
  • ensure that the SHA used is generated on the builder creating the tar, only on success.

Nice-to-have:

  • releasebot should mention what release steps were in-flight when it was cancelled.

Example of error:

$ tar xf goX.Y.Z.platform-arch.tar.gz 

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

$ sha256sum goX.Y.Z.platform-arch.tar.gz
b9c7eb3e77c0489e0801ea5ba0ac7245425c4d3adcf051db99d527315667e965
$ cat goX.Y.Z.platform-arch.tar.gz.sha256
b9c7eb3e77c0489e0801ea5ba0ac7245425c4d3adcf051db99d527315667e965

/cc @dmitshur

@toothrot toothrot added this to the Unreleased milestone Jul 10, 2019

@dmitshur dmitshur self-assigned this Sep 13, 2019

@dmitshur

Member

commented Sep 13, 2019

CL 189537 has helped with this: it changed releasebot to first write release artifacts to a temporary directory while the release is in progress and then, after tests complete successfully, atomically move them into their final location.

That means interrupting releasebot while it's in the process of downloading the release artifact from a buildlet will leave the truncated artifact in a temporary directory, and it will not be incorrectly re-used by the next releasebot run.

So releasebot should be safe to resume now. What's left here is to investigate whether there are other scenarios where this can still happen, and whether the timing and location of the SHA generation are optimal.
