Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev-cmd/bottle: use gnu-tar universally, sort entries & use PAX #11253

Merged
merged 1 commit into from
Apr 27, 2021

Conversation

Bo98
Copy link
Member

@Bo98 Bo98 commented Apr 26, 2021

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

  • ustar is too restrictive on file path lengths - we need PAX.
  • The ordering of contents is non-deterministic, but gnu-tar provides a --sort option to solve this (libarchive does not but can be worked around)
  • GNU Tar's implementation of PAX also has some non-deterministic data. In particular, the headers contain the PID for versions < 1.33. Also clear atime and ctime to be safe (the latter being something that's messy to control). Set --pax-option to deal with this.

System libarchive bsdtar could potentially be retained, but it'll need more testing.

@Bo98 Bo98 added the critical Critical change which should be shipped as soon as possible. label Apr 26, 2021
@BrewTestBot
Copy link
Member

Review period skipped due to critical label.

BrewTestBot
BrewTestBot previously approved these changes Apr 26, 2021
@MikeMcQuaid
Copy link
Member

MikeMcQuaid commented Apr 26, 2021

  • ustar is too restrictive on file path lengths - we need PAX.

@Bo98 Can you elaborate on this? What's failing as-is without it? What about e.g. checking for path lengths and using pax or ustar accordingly?

  • (libarchive does not but can be worked around)

How?


Why prefer pax over the GNU tar default? If we're using pax anyway: why not use pax with libarchive rather than GNU tar everywhere?


If/when this is merged: can the various cellar all: bottles be rebottled so they have consistent local/reproducible checksums?

@MikeMcQuaid
Copy link
Member

Actually: if #11251 is blocking: can it be reverted instead and we hold off on this PR until tomorrow (which could be rebased on the reverted #11251 if need be). Thanks!

@Bo98
Copy link
Member Author

Bo98 commented Apr 26, 2021

Can you elaborate on this? What's failing as-is without it?

ansible/3.3.0/libexec/lib/python3.9/site-packages/ansible_collections/community/aws/tests/unit/plugins/modules/placebo_recordings/aws_direct_connect_link_aggregation_group/nonexistent_lag_does_not_exist/directconnect.DescribeLags_1.json

What about e.g. checking for path lengths and using pax or ustar accordingly?

Potentially, but I'm not sure I see much benefit of this - unless you want to avoid changing any bottle checksums already produced?

How?

Traverse the directory ourselves, sort it and feed it into tar one by one.

Why prefer pax over the GNU tar default?

pax is POSIX standard. GNU tar default is not.

GNU tar actually plans to change to posix/pax by default in the future: "This will change in future releases, since we plan to make ‘POSIX’ format the default."

why not use pax with libarchive rather than GNU tar everywhere?

This might work if we apply external sorting as mentioned above.

I'll need to investigate ctime a bit first since libarchive doesn't offer the option to strip it.

@Bo98
Copy link
Member Author

Bo98 commented Apr 27, 2021

Biggest issue I've had with retaining compatibility between gnu-tar and bsdtar is PAX's ctime (there's a few other things that are likely blockers but this is the one I focussed on).

ctime is something I'm stripping from gnu-tar, as there is no way I know to change it like mtime and atime. This is proving difficult with bsdtar.

The only way to strip ctime from bsdtar is to use the dynamic format mode (for some reason it is always omitted in that scenario even when it picks PAX). However, this means that in order to be consistent with gnu-tar, we need to implement the exact same decision logic to choose whether pax or ustar should be used (namely this: Homebrew/homebrew-core#75880 (comment)) and apply that to gnu-tar. And then keep up-to-date with any logic changes that may occur in the future.

This is annoying, and when looking at upstream for guidance I came across a note, in the commit which contained the most recent change to the ctime logic. It seems what we want bsdtar to do contradicts with what bsdtar is designed to do:

People who are really serious about generating bit-for-bit
identical archives should really build their own command-line
interface: You can still use libarchive to build the output,
but your custom CLI could sort the entries and strip everything
except a bare minimum of basic metadata.

We can of course follow that suggestion without too much issue. A native gem which interfaces with libarchive (statically) and sets up the tar exactly the way we want it regardless of the platform is fairly easy to do (I've done something similar before and it doesn't take long at all), and even means we don't need to install any formula. Do we want to though? (I'm not even sure if this is a rhetorical question - I'm genuinely open to anything)

In the end, there's even subtle differences between libarchive and gnu-tar in ustar mode. I've not looked into those much - the file permission seems to have an extra digit in gnu-tar. I think we're asking for the impossible to be using a hybrid system like we were and expect outputs to be identical. I suppose the next question is: is the output of gnu-tar 1.34 from the formula identical to the 1.28 that ships with Ubuntu 16.04?

BrewTestBot
BrewTestBot previously approved these changes Apr 27, 2021
BrewTestBot
BrewTestBot previously approved these changes Apr 27, 2021
@iMichka
Copy link
Member

iMichka commented Apr 27, 2021

Ubuntu 16.04 -> knowing that we will migrate to Ubuntu 18.04 one day, so everything might change again.

@MikeMcQuaid
Copy link
Member

Potentially, but I'm not sure I see much benefit of this - unless you want to avoid changing any bottle checksums already produced?

Yeh, I'm hoping to do this. If we do, though, let's rebuild the various cellar :any bottles. The alternative (that I favoured yesterday but I think care less about today) is to implement at least our own path length logic (longest path length - HOMEBREW_PREFIX length + HOMEBREW_DEFAULT_LINUX_PREFIX length: if too long, use pax, otherwise ustar).

Do we want to though?

Definitely not.

In the end, there's even subtle differences between libarchive and gnu-tar in ustar mode. I've not looked into those much - the file permission seems to have an extra digit in gnu-tar.

diffoscope didn't seem to show any differences and they were generated with identical checksums on my machine (testing using ack).

I think we're asking for the impossible to be using a hybrid system like we were and expect outputs to be identical.

I don't think so. When we find and can demonstrate differences: we can fix them when it's possible/worthwhile to do so.

MikeMcQuaid
MikeMcQuaid previously approved these changes Apr 27, 2021
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with either this (if we rebottle existing all: bottles using ustar), using the pax/ustar switching logic I mentioned, using system BSD tar on newer macOS versions (unless we already have examples of where it differs) provided the (legit) macOS test failures are fixed.

Note, it's also probably worth adjusting this logic so that it's only going to install/use gnu-tar on macOS when using --only-json-tab because reproducibility is guaranteed to fail otherwise.

# Ensure tar is set up for reproducibility.
# https://reproducible-builds.org/docs/archives/
safe_system tar, "--create", "--numeric-owner", "--format", "pax",
"--owner", "0", "--group", "0", "--sort", "name",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't rely on --owner and --group here as this may not be gnu-tar if the formula isn't present. I'd suggest continuing to return arguments from setup_tar_and_args!.

@MikeMcQuaid
Copy link
Member

using system BSD tar on newer macOS versions (unless we already have examples of where it differs)

I tested this again and ruled it out, it's not an option that can be done consistently.

@MikeMcQuaid
Copy link
Member

I'm fine with either this (if we rebottle existing all: bottles using ustar),

Let's go with this. It's the simplest option at this point and only requires a small amount of short-term pain with rebottling.

BrewTestBot
BrewTestBot previously approved these changes Apr 27, 2021
BrewTestBot
BrewTestBot previously approved these changes Apr 27, 2021
MikeMcQuaid
MikeMcQuaid previously approved these changes Apr 27, 2021
Library/Homebrew/dev-cmd/bottle.rb Outdated Show resolved Hide resolved
Library/Homebrew/dev-cmd/bottle.rb Outdated Show resolved Hide resolved
Library/Homebrew/dev-cmd/bottle.rb Show resolved Hide resolved
Library/Homebrew/dev-cmd/bottle.rb Show resolved Hide resolved
@MikeMcQuaid
Copy link
Member

Great work @Bo98!

@MikeMcQuaid MikeMcQuaid merged commit af1e397 into Homebrew:master Apr 27, 2021
@Bo98 Bo98 deleted the gnu-tar branch April 27, 2021 13:44
@github-actions github-actions bot added the outdated PR was locked due to age label May 28, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
critical Critical change which should be shipped as soon as possible. outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants