Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If SOURCE_DATE_EPOCH is set, produce identical output every time #1356

Closed
wants to merge 12 commits into from

Conversation

dankegel
Copy link
Contributor

Partly fixes #1232

Some work will be required in every package format handler;
for now, all that's been done is the minimum needed to achieve
reproducibility when running the canonical example:

   fpm -s gem -t deb json

See https://reproducible-builds.org/specs/source-date-epoch/
and https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables
for why this is an environment variable, and not a commandline option.

…tly fixes jordansissel#1232

Some work will be required in every package format handler;
for now, all that's been done is the minimum needed to achieve
reproducibility when running the canonical example
   fpm -s gem -t deb json

See https://reproducible-builds.org/specs/source-date-epoch/
and https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables
for why this is an environment variable, and not a commandline option.
@@ -2,4 +2,4 @@

* Package created with FPM.

-- <%= maintainer %> <%= Time.now.strftime("%a, %d %b %Y %T %z") %>
-- <%= maintainer %> <%= (if attributes[:source_date_epoch].nil? then Time.now() else Time.at(attributes[:source_date_epoch].to_i) end).strftime("%a, %d %b %Y %T %z") %>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"%a is locale-specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%a ceased being locale-specific in ruby 1.9....?
https://idiosyncratic-ruby.com/57-what-the-time.html says

"%a | Weekday Name (Abbreviated)
Locale-independent (English) three-letter name of weekday"

Verified:

$ export DEBEMAIL=dank@kegel.com
$ export DEBFULLNAME="Dan Kegel"
$ export SOURCE_DATE_EPOCH=123
$ LANG=de_DE.UTF-8
$ date
Do 22. Jun 15:42:00 PDT 2017
$ ruby -Ilib bin/fpm --verbose -s gem -t deb json
$ dpkg-deb -x *.deb tmp
$ zmore tmp/usr/share/doc/rubygem-json/changelog.gz 
rubygem-json (2.1.0) whatever; urgency=medium

  * Package created with FPM.

 -- Dan Kegel <dank@kegel.com>  Wed, 31 Dec 1969 16:02:03 -0800

return Stud::Temporary.directory("package-#{type}-staging")
else
# Choose a predictable build path; see https://reproducible-builds.org/docs/build-path/
root = ENV["TMP"] || ENV["TMPDIR"] || ENV["TEMP"] || "/tmp"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems somewhat regrettable (ie. it's not actually "predictable" as the comment claims IMHO) but I can see how there is no real workaround for fpm as it cannot, for example, guarantee the availability of, say, /tmp/buildd/whatever.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fpm already has a setting for this: --workdir WORKDIR

We can add an environment variable if you believe you need one, but --workdir is intended for this purpose, at least, as far as I can tell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--workdir, alas, only sets TMP; it doesn't prevent the nondeterministic suffix.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make --workdir do what you need, then?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may not be understanding this scenario, btw. Temporary directories, work directories, etc, are only need to exist during fpm's run -- when fpm exits, they should be automatically removed.

Can you help me understand what about temporary paths is important to this 'repeatable build' goal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you run the command "fpm -s gem -t deb json" twice in a row, it generates two different debs.
In the reproducible builds use case, we want it to generate two identical debs, regardless of who
runs the build when or where, as long as the OS is the same release / version / architecture.
(Otherwise e.g. clicking 'Force Rebuild' in buildbot generates failed uploads because reprepro rejects
the new deb with the same version and different checksum.)

One particular problem is that the C toolchain often embeds the absolute path to the build
directory in the binary, say, in the DW_AT_comp_dir field.
Until we teach the compiler to obey BUILD_PATH_PREFIX_MAP,
we can work around this by using a deterministic build path
( see https://reproducible-builds.org/docs/build-path/ ).

So, this has nothing to do with leaving a clean build machine after running fpm, but everything to do with bit-for-bit identical output packages given bit-for-bit identical input packages.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that helps a bit. Are there known casess with your fpm -s gem -t deb json example where temporary directories (by file content or file path) are showing up somehow in the resulting .deb?

Copy link
Contributor Author

@dankegel dankegel Jun 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the example from the next hunk down shows it. Here it is with more context:

$ fpm -s gem -t deb json; mkdir run1; mv *.deb run1
$ sleep 1
$ fpm -s gem -t deb json; mkdir run2; mv *.deb run2
$ diffoscope run1/rubygem-run2/json_2.1.0_amd64.debrubygem-json_2.1.0_amd64.deb  run2/rubygem-run2/json_2.1.0_amd64.debrubygem-json_2.1.0_amd64.deb 
...
├── data.tar.gz
│   ├── data.tar
│   │   ├── ./var/lib/gems/2.3.0/extensions/x86_64-linux/2.3.0/json-2.1.0/json/ext/generator.so
│   │   │   ├── readelf --wide --notes {}
│   │   │   │ @@ -1,5 +1,5 @@
│   │   │   │  
│   │   │   │  Displaying notes found at file offset 0x000001c8 with length 0x00000024:
│   │   │   │    Owner                 Data size        Description
│   │   │   │    GNU                  0x00000014        NT_GNU_BUILD_ID (unique build ID bitstring)
│   │   │   │ -    Build ID: 438800c00798505fc92d9302ebf4ecfcd5aa5f57
│   │   │   │ +    Build ID: 6364fd6a8cad33df79bd85b47faa3617e13304fe
│   │   │   ├── readelf --wide --debug-dump=info {}
│   │   │   │ @@ -2,60 +2,60 @@
│   │   │   │  
│   │   │   │    Compilation Unit @ offset 0x0:
│   │   │   │     Length:        0x7605 (32-bit)
│   │   │   │     Version:       4
│   │   │   │     Abbrev Offset: 0x0
│   │   │   │     Pointer Size:  8
│   │   │   │   <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
│   │   │   │ -    <c>   DW_AT_producer    : (indirect string, offset: 0x361): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│   │   │   │ +    <c>   DW_AT_producer    : (indirect string, offset: 0x2d1): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│   │   │   │      <10>   DW_AT_language    : 12        (ANSI C99)
│   │   │   │      <11>   DW_AT_name        : (indirect string, offset: 0x1147): generator.c
│   │   │   │ -    <15>   DW_AT_comp_dir    : (indirect string, offset: 0xae): /tmp/package-gem-staging-7f3a2a248718996b858ad69aa865ca5dc46125339c4d6f163bedf10f0b24/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator
│   │   │   │ +    <15>   DW_AT_comp_dir    : (indirect string, offset: 0xe4a): /tmp/package-gem-staging-6d7699fb2e5ccddffbb0ca5b66855e3971f9db4da3984f8ef7845da10088/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator

All of those diffs are from the different build paths.

# Choose a predictable build path; see https://reproducible-builds.org/docs/build-path/
root = ENV["TMP"] || ENV["TMPDIR"] || ENV["TEMP"] || "/tmp"
dir = File.join(root, 'fpm', 'reproducible-builds.org', attributes[:source_date_epoch], path)
FileUtils.mkdir_p(dir, :mode => 0700)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an insecure use of a tempdir?

Copy link
Contributor Author

@dankegel dankegel Jun 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I asked for review on the mailing list!

Until something like https://wiki.debian.org/ReproducibleBuilds/BuildPathProposal lands, this is needed to avoid differences in the embedded DW_AT_comp_dir field in the native extension; here's diffoscope's output between two runs of 'fpm -s gem -t deb json':

├── data.tar.gz
│   ├── data.tar
│   │   ├── ./var/lib/gems/2.3.0/extensions/x86_64-linux/2.3.0/json-2.1.0/json/ext/generator.so
│   │   │   ├── readelf --wide --notes {}
│   │   │   │ @@ -1,5 +1,5 @@
│   │   │   │  
│   │   │   │  Displaying notes found at file offset 0x000001c8 with length 0x00000024:
│   │   │   │    Owner                 Data size        Description
│   │   │   │    GNU                  0x00000014        NT_GNU_BUILD_ID (unique build ID bitstring)
│   │   │   │ -    Build ID: 438800c00798505fc92d9302ebf4ecfcd5aa5f57
│   │   │   │ +    Build ID: 6364fd6a8cad33df79bd85b47faa3617e13304fe
│   │   │   ├── readelf --wide --debug-dump=info {}
│   │   │   │ @@ -2,60 +2,60 @@
│   │   │   │  
│   │   │   │    Compilation Unit @ offset 0x0:
│   │   │   │     Length:        0x7605 (32-bit)
│   │   │   │     Version:       4
│   │   │   │     Abbrev Offset: 0x0
│   │   │   │     Pointer Size:  8
│   │   │   │   <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
│   │   │   │ -    <c>   DW_AT_producer    : (indirect string, offset: 0x361): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│   │   │   │ +    <c>   DW_AT_producer    : (indirect string, offset: 0x2d1): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│   │   │   │      <10>   DW_AT_language    : 12        (ANSI C99)
│   │   │   │      <11>   DW_AT_name        : (indirect string, offset: 0x1147): generator.c
│   │   │   │ -    <15>   DW_AT_comp_dir    : (indirect string, offset: 0xae): /tmp/package-gem-staging-7f3a2a248718996b858ad69aa865ca5dc46125339c4d6f163bedf10f0b24/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator
│   │   │   │ +    <15>   DW_AT_comp_dir    : (indirect string, offset: 0xe4a): /tmp/package-gem-staging-6d7699fb2e5ccddffbb0ca5b66855e3971f9db4da3984f8ef7845da10088/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I 'solved' this by also sensing BUILD_PATH_PREFIX_MAP and using the old secure path if it's set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove temporary_directory() and --build-path-prefix-map,
and go back to the original code once BUILD_PATH_PREFIX_MAP is implemented by old compilers.

# different output on successive runs.
Find.find(installdir) do |path|
if path =~ /.*(gem_make.out|Makefile)$/
logger.info("Removing %s" % path)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about also logging the reason for removing these files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

@wasosa
Copy link

wasosa commented Jun 22, 2017

Nice getting the ball rolling on reproducible builds here.

# Obey Reproducible Builds spec, see https://reproducible-builds.org/
# If set, individual package formats should set all timestamps to it
# and be extra-careful to avoid randomness in output.
if ENV.include?('SOURCE_DATE_EPOCH')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the :environment_variable feature in Clamp to achieve this. Can you refactor this to move this setting into the command line options instead?

(Note, what I am suggesting will still work for you with environment variables, it just is configured and declared in a different (and documented) way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return Stud::Temporary.directory("package-#{type}-staging")
else
# Choose a predictable build path; see https://reproducible-builds.org/docs/build-path/
root = ENV["TMP"] || ENV["TMPDIR"] || ENV["TEMP"] || "/tmp"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fpm already has a setting for this: --workdir WORKDIR

We can add an environment variable if you believe you need one, but --workdir is intended for this purpose, at least, as far as I can tell.

Dan Kegel added 2 commits June 22, 2017 14:29
Use BUILD_PATH_PREFIX_MAP to avoid predictable build directory
Use clamp instead of ENV
Fix copy-n-paste that caused rspec failure.
@dankegel
Copy link
Contributor Author

I think all the issues raised during review have been resolved.
On Ubuntu 17.04, rspec is passing as well as it did before my changes
(filed #1357 for the one pre-existing failure).

@dankegel
Copy link
Contributor Author

Travis is unhappy with this PR because it's running on Ubuntu 12.04.
The --sort option wasn't added to gnu tar until circa Ubuntu 16.04.

I guess the test should skip if tar doesn't support --sort ...

Dan Kegel added 7 commits June 23, 2017 09:04
… builds

Needed e.g. on mac, where system ar is old, but gnu ar is available as 'gar' via 'brew install binutils'.

Eventually this commit can be reverted, once all ar's in existence default to no timestamps.

(cf. Homebrew/homebrew-core#14860 but don't wait for that to land)
system("#{ar_cmd.join(' ')} foo.ar /dev/null && env LC_TIME=C TZ=GMT ar tv foo.ar | grep 1970 > /dev/null")
FileUtils.rm_f("foo.ar")
if $?.exitstatus != 0
skip("This system doesn't seem to have an ar that can omit timestamps")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be in FPM, not the tests. Otherwise a user could request a repeatable-build and think fpm is doing the right thing but the ar isn't capable and we end up lying to the user, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in fpm, in the ar_cmd and tar_cmd functions. Unfortunately, those don't pass back the boolean. I should refactor that so I can avoid duplicating the test.

:environment_variable => "SOURCE_DATE_EPOCH"

option "--build-path-prefix-map", "BUILD_PATH_PREFIX_MAP",
"If SOURCE_DATE_EPOCH is given without BUILD_PATH_PREFIX_MAP, " \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description seems to be about what happens if you do not provide this flag, but not what happens if you do provide it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because I wrote it before I figured out what it'd do :-) I'll add that soon. Bit hard to test without a compiler that supports it, so maybe I'll have to build one.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, makes sense. Is this purely about compilers? Which compiler(s)? Would it be useful to instruct users on which software this affected? (gcc, for example, but not javac perhaps?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set of folks who need to know are small, and anyone who invokes this feature will probably know to consult the doc at https://reproducible-builds.org/ in case of trouble.
This tool itself can't keep track of the state of play in which other tools support reproducible builds, and the doc already links to that site; adding much more detail probably wouldn't be worth it.

# Get an array containing the recommended 'ar' command for this platform
# and the recommended options to quickly create/append to an archive
# without timestamps or uids (if possible).
def ar_cmd
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to default to calling ar in a way that doesn't use timestamps or uids. This is a behavior change,but perhaps not a problem.

However, if the user is asking for repeatable builds (with whatever flags) on a system who's ar is not capable of doing this, then this ar_cmd will fall back to one that is not repeatable.

My main goal; If a user is asking for deterministic builds, and the system cannot produce it, we should abort and error. Looking at this ar_cmd it is unclear if that is what is happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dankegel
Copy link
Contributor Author

I have yet to implement true support for BUILD_PATH_PREFIX_MAP (which, I think, involves appending our work directory to that environment variable). Once I do that, and folks are happy, I'll squash and submit a clean MR.

@dankegel
Copy link
Contributor Author

Closing in favor of #1360

@dankegel dankegel closed this Jun 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support reproducible packages
4 participants