New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
If SOURCE_DATE_EPOCH is set, produce identical output every time #1356
Conversation
…tly fixes jordansissel#1232 Some work will be required in every package format handler; for now, all that's been done is the minimum needed to achieve reproducibility when running the canonical example fpm -s gem -t deb json See https://reproducible-builds.org/specs/source-date-epoch/ and https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables for why this is an environment variable, and not a commandline option.
| @@ -2,4 +2,4 @@ | |||
|
|
|||
| * Package created with FPM. | |||
|
|
|||
| -- <%= maintainer %> <%= Time.now.strftime("%a, %d %b %Y %T %z") %> | |||
| -- <%= maintainer %> <%= (if attributes[:source_date_epoch].nil? then Time.now() else Time.at(attributes[:source_date_epoch].to_i) end).strftime("%a, %d %b %Y %T %z") %> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"%a is locale-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%a ceased being locale-specific in ruby 1.9....?
https://idiosyncratic-ruby.com/57-what-the-time.html says
"%a | Weekday Name (Abbreviated)
Locale-independent (English) three-letter name of weekday"
Verified:
$ export DEBEMAIL=dank@kegel.com
$ export DEBFULLNAME="Dan Kegel"
$ export SOURCE_DATE_EPOCH=123
$ LANG=de_DE.UTF-8
$ date
Do 22. Jun 15:42:00 PDT 2017
$ ruby -Ilib bin/fpm --verbose -s gem -t deb json
$ dpkg-deb -x *.deb tmp
$ zmore tmp/usr/share/doc/rubygem-json/changelog.gz
rubygem-json (2.1.0) whatever; urgency=medium
* Package created with FPM.
-- Dan Kegel <dank@kegel.com> Wed, 31 Dec 1969 16:02:03 -0800
lib/fpm/package.rb
Outdated
| return Stud::Temporary.directory("package-#{type}-staging") | ||
| else | ||
| # Choose a predictable build path; see https://reproducible-builds.org/docs/build-path/ | ||
| root = ENV["TMP"] || ENV["TMPDIR"] || ENV["TEMP"] || "/tmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems somewhat regrettable (ie. it's not actually "predictable" as the comment claims IMHO) but I can see how there is no real workaround for fpm as it cannot, for example, guarantee the availability of, say, /tmp/buildd/whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fpm already has a setting for this: --workdir WORKDIR
We can add an environment variable if you believe you need one, but --workdir is intended for this purpose, at least, as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--workdir, alas, only sets TMP; it doesn't prevent the nondeterministic suffix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make --workdir do what you need, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not be understanding this scenario, btw. Temporary directories, work directories, etc, are only need to exist during fpm's run -- when fpm exits, they should be automatically removed.
Can you help me understand what about temporary paths is important to this 'repeatable build' goal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you run the command "fpm -s gem -t deb json" twice in a row, it generates two different debs.
In the reproducible builds use case, we want it to generate two identical debs, regardless of who
runs the build when or where, as long as the OS is the same release / version / architecture.
(Otherwise e.g. clicking 'Force Rebuild' in buildbot generates failed uploads because reprepro rejects
the new deb with the same version and different checksum.)
One particular problem is that the C toolchain often embeds the absolute path to the build
directory in the binary, say, in the DW_AT_comp_dir field.
Until we teach the compiler to obey BUILD_PATH_PREFIX_MAP,
we can work around this by using a deterministic build path
( see https://reproducible-builds.org/docs/build-path/ ).
So, this has nothing to do with leaving a clean build machine after running fpm, but everything to do with bit-for-bit identical output packages given bit-for-bit identical input packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that helps a bit. Are there known casess with your fpm -s gem -t deb json example where temporary directories (by file content or file path) are showing up somehow in the resulting .deb?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the example from the next hunk down shows it. Here it is with more context:
$ fpm -s gem -t deb json; mkdir run1; mv *.deb run1
$ sleep 1
$ fpm -s gem -t deb json; mkdir run2; mv *.deb run2
$ diffoscope run1/rubygem-run2/json_2.1.0_amd64.debrubygem-json_2.1.0_amd64.deb run2/rubygem-run2/json_2.1.0_amd64.debrubygem-json_2.1.0_amd64.deb
...
├── data.tar.gz
│ ├── data.tar
│ │ ├── ./var/lib/gems/2.3.0/extensions/x86_64-linux/2.3.0/json-2.1.0/json/ext/generator.so
│ │ │ ├── readelf --wide --notes {}
│ │ │ │ @@ -1,5 +1,5 @@
│ │ │ │
│ │ │ │ Displaying notes found at file offset 0x000001c8 with length 0x00000024:
│ │ │ │ Owner Data size Description
│ │ │ │ GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
│ │ │ │ - Build ID: 438800c00798505fc92d9302ebf4ecfcd5aa5f57
│ │ │ │ + Build ID: 6364fd6a8cad33df79bd85b47faa3617e13304fe
│ │ │ ├── readelf --wide --debug-dump=info {}
│ │ │ │ @@ -2,60 +2,60 @@
│ │ │ │
│ │ │ │ Compilation Unit @ offset 0x0:
│ │ │ │ Length: 0x7605 (32-bit)
│ │ │ │ Version: 4
│ │ │ │ Abbrev Offset: 0x0
│ │ │ │ Pointer Size: 8
│ │ │ │ <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
│ │ │ │ - <c> DW_AT_producer : (indirect string, offset: 0x361): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│ │ │ │ + <c> DW_AT_producer : (indirect string, offset: 0x2d1): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│ │ │ │ <10> DW_AT_language : 12 (ANSI C99)
│ │ │ │ <11> DW_AT_name : (indirect string, offset: 0x1147): generator.c
│ │ │ │ - <15> DW_AT_comp_dir : (indirect string, offset: 0xae): /tmp/package-gem-staging-7f3a2a248718996b858ad69aa865ca5dc46125339c4d6f163bedf10f0b24/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator
│ │ │ │ + <15> DW_AT_comp_dir : (indirect string, offset: 0xe4a): /tmp/package-gem-staging-6d7699fb2e5ccddffbb0ca5b66855e3971f9db4da3984f8ef7845da10088/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator
All of those diffs are from the different build paths.
lib/fpm/package.rb
Outdated
| # Choose a predictable build path; see https://reproducible-builds.org/docs/build-path/ | ||
| root = ENV["TMP"] || ENV["TMPDIR"] || ENV["TEMP"] || "/tmp" | ||
| dir = File.join(root, 'fpm', 'reproducible-builds.org', attributes[:source_date_epoch], path) | ||
| FileUtils.mkdir_p(dir, :mode => 0700) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an insecure use of a tempdir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I asked for review on the mailing list!
Until something like https://wiki.debian.org/ReproducibleBuilds/BuildPathProposal lands, this is needed to avoid differences in the embedded DW_AT_comp_dir field in the native extension; here's diffoscope's output between two runs of 'fpm -s gem -t deb json':
├── data.tar.gz
│ ├── data.tar
│ │ ├── ./var/lib/gems/2.3.0/extensions/x86_64-linux/2.3.0/json-2.1.0/json/ext/generator.so
│ │ │ ├── readelf --wide --notes {}
│ │ │ │ @@ -1,5 +1,5 @@
│ │ │ │
│ │ │ │ Displaying notes found at file offset 0x000001c8 with length 0x00000024:
│ │ │ │ Owner Data size Description
│ │ │ │ GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
│ │ │ │ - Build ID: 438800c00798505fc92d9302ebf4ecfcd5aa5f57
│ │ │ │ + Build ID: 6364fd6a8cad33df79bd85b47faa3617e13304fe
│ │ │ ├── readelf --wide --debug-dump=info {}
│ │ │ │ @@ -2,60 +2,60 @@
│ │ │ │
│ │ │ │ Compilation Unit @ offset 0x0:
│ │ │ │ Length: 0x7605 (32-bit)
│ │ │ │ Version: 4
│ │ │ │ Abbrev Offset: 0x0
│ │ │ │ Pointer Size: 8
│ │ │ │ <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
│ │ │ │ - <c> DW_AT_producer : (indirect string, offset: 0x361): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│ │ │ │ + <c> DW_AT_producer : (indirect string, offset: 0x2d1): GNU C11 5.4.0 20160609 -mtune=generic -march=x86-64 -g -O2 -fstack-protector-strong -fPIC -fstack-protector-strong
│ │ │ │ <10> DW_AT_language : 12 (ANSI C99)
│ │ │ │ <11> DW_AT_name : (indirect string, offset: 0x1147): generator.c
│ │ │ │ - <15> DW_AT_comp_dir : (indirect string, offset: 0xae): /tmp/package-gem-staging-7f3a2a248718996b858ad69aa865ca5dc46125339c4d6f163bedf10f0b24/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator
│ │ │ │ + <15> DW_AT_comp_dir : (indirect string, offset: 0xe4a): /tmp/package-gem-staging-6d7699fb2e5ccddffbb0ca5b66855e3971f9db4da3984f8ef7845da10088/var/lib/gems/2.3.0/gems/json-2.1.0/ext/json/ext/generator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I 'solved' this by also sensing BUILD_PATH_PREFIX_MAP and using the old secure path if it's set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove temporary_directory() and --build-path-prefix-map,
and go back to the original code once BUILD_PATH_PREFIX_MAP is implemented by old compilers.
lib/fpm/package/gem.rb
Outdated
| # different output on successive runs. | ||
| Find.find(installdir) do |path| | ||
| if path =~ /.*(gem_make.out|Makefile)$/ | ||
| logger.info("Removing %s" % path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about also logging the reason for removing these files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
|
Nice getting the ball rolling on reproducible builds here. |
lib/fpm/package.rb
Outdated
| # Obey Reproducible Builds spec, see https://reproducible-builds.org/ | ||
| # If set, individual package formats should set all timestamps to it | ||
| # and be extra-careful to avoid randomness in output. | ||
| if ENV.include?('SOURCE_DATE_EPOCH') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the :environment_variable feature in Clamp to achieve this. Can you refactor this to move this setting into the command line options instead?
(Note, what I am suggesting will still work for you with environment variables, it just is configured and declared in a different (and documented) way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/fpm/package.rb
Outdated
| return Stud::Temporary.directory("package-#{type}-staging") | ||
| else | ||
| # Choose a predictable build path; see https://reproducible-builds.org/docs/build-path/ | ||
| root = ENV["TMP"] || ENV["TMPDIR"] || ENV["TEMP"] || "/tmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fpm already has a setting for this: --workdir WORKDIR
We can add an environment variable if you believe you need one, but --workdir is intended for this purpose, at least, as far as I can tell.
Use BUILD_PATH_PREFIX_MAP to avoid predictable build directory Use clamp instead of ENV Fix copy-n-paste that caused rspec failure.
9100240
to
4609726
Compare
|
I think all the issues raised during review have been resolved. |
|
Travis is unhappy with this PR because it's running on Ubuntu 12.04. I guess the test should skip if tar doesn't support --sort ... |
… builds Needed e.g. on mac, where system ar is old, but gnu ar is available as 'gar' via 'brew install binutils'. Eventually this commit can be reverted, once all ar's in existence default to no timestamps. (cf. Homebrew/homebrew-core#14860 but don't wait for that to land)
spec/fpm/package/deb_spec.rb
Outdated
| system("#{ar_cmd.join(' ')} foo.ar /dev/null && env LC_TIME=C TZ=GMT ar tv foo.ar | grep 1970 > /dev/null") | ||
| FileUtils.rm_f("foo.ar") | ||
| if $?.exitstatus != 0 | ||
| skip("This system doesn't seem to have an ar that can omit timestamps") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be in FPM, not the tests. Otherwise a user could request a repeatable-build and think fpm is doing the right thing but the ar isn't capable and we end up lying to the user, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is in fpm, in the ar_cmd and tar_cmd functions. Unfortunately, those don't pass back the boolean. I should refactor that so I can avoid duplicating the test.
| :environment_variable => "SOURCE_DATE_EPOCH" | ||
|
|
||
| option "--build-path-prefix-map", "BUILD_PATH_PREFIX_MAP", | ||
| "If SOURCE_DATE_EPOCH is given without BUILD_PATH_PREFIX_MAP, " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description seems to be about what happens if you do not provide this flag, but not what happens if you do provide it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because I wrote it before I figured out what it'd do :-) I'll add that soon. Bit hard to test without a compiler that supports it, so maybe I'll have to build one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, makes sense. Is this purely about compilers? Which compiler(s)? Would it be useful to instruct users on which software this affected? (gcc, for example, but not javac perhaps?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The set of folks who need to know are small, and anyone who invokes this feature will probably know to consult the doc at https://reproducible-builds.org/ in case of trouble.
This tool itself can't keep track of the state of play in which other tools support reproducible builds, and the doc already links to that site; adding much more detail probably wouldn't be worth it.
| # Get an array containing the recommended 'ar' command for this platform | ||
| # and the recommended options to quickly create/append to an archive | ||
| # without timestamps or uids (if possible). | ||
| def ar_cmd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to default to calling ar in a way that doesn't use timestamps or uids. This is a behavior change,but perhaps not a problem.
However, if the user is asking for repeatable builds (with whatever flags) on a system who's ar is not capable of doing this, then this ar_cmd will fall back to one that is not repeatable.
My main goal; If a user is asking for deterministic builds, and the system cannot produce it, we should abort and error. Looking at this ar_cmd it is unclear if that is what is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
…mented, and unavailable
|
I have yet to implement true support for BUILD_PATH_PREFIX_MAP (which, I think, involves appending our work directory to that environment variable). Once I do that, and folks are happy, I'll squash and submit a clean MR. |
|
Closing in favor of #1360 |
Partly fixes #1232
Some work will be required in every package format handler;
for now, all that's been done is the minimum needed to achieve
reproducibility when running the canonical example:
See https://reproducible-builds.org/specs/source-date-epoch/
and https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables
for why this is an environment variable, and not a commandline option.