Implement reproducible gem -> deb conversion (#1360)
* Add option --source-date-epoch-default and implement for deb output.

This is the first step towards supporting bit-for-bit identical
output files given identical inputs.

Alas, Apple's ar is not too good at reading gnu ar archives,
so always use ar_cmd to find ar.

* deb: remove lines duplicated in a tragic merge conflict

Probably introduced by 62d0060 and not removed by 500f0c0

* Add options --source-date-epoch-from-changelog and --gem-stagingdir to support bit-for-bit reproducible gem -> deb conversion

In those cases where we can get the release date out of the changelog,
use it; otherwise fall back to the value given by SOURCE_DATE_EPOCH aka --source-date-epoch-default.

--gem-stagingdir is a bit of a kludge, only needed because no
compiler supports https://reproducible-builds.org/specs/build-path-prefix-map/ yet.
Could have been global option, but not sure any other package handler
invokes compilers?  Could hoist it up later.

Also:
- Defer initializing staging_path so subclasses can sneak in new value
- gem: remove build files

* gem: handle a few more gem changelog variants

* gem: also remove mkmf.log; lets ffi, kgio, raindrops, and ruby-ldap build reproducibly.

* deb: don't expect diffoscope to be installed in /usr/bin.  Lets it be found on mac.

* gem: document new options
dankegel authored and jordansissel committed Jul 20, 2017
1 parent 488863b commit 14c4819
Showing 9 changed files with 383 additions and 41 deletions.
52 changes: 52 additions & 0 deletions docs/source/gem.rst
@@ -182,3 +182,55 @@ Nice, eh? Now, let's show what happens after these packages are installed::
You can put these .deb files in your apt repo (assuming you have a local apt
repo, right?) and easily install them with 'apt-get' like: 'apt-get install
rubygem-cucumber' and expect dependencies to work nicely.

Deterministic output
--------------------

If you convert a gem to a deb twice, you'll get different output even though the inputs didn't change:

% fpm -s gem -t deb json
% mkdir run1; mv *.deb run1
% sleep 1
% fpm -s gem -t deb json
% mkdir run2; mv *.deb run2
% cmp run1/*.deb run2/*.deb
run1/rubygem-json_2.1.0_amd64.deb run2/rubygem-json_2.1.0_amd64.deb differ: byte 124, line 4
This can be a pain if you're uploading packages to an apt repository
that refuses re-uploads that differ in content, or if you're trying
to verify that packages have not been tampered with.
There are several sources of nondeterminism; run 'diffoscope run1/*.deb run2/*.deb' if you
want the gory details. See http://reproducible-builds.org for the whole story.

To remove nondeterminism due to differing timestamps,
use the option --source-date-epoch-from-changelog; that will use the timestamp from
the gem's changelog.

In case the gem doesn't have a standard changelog (and most don't, alas),
use --source-date-epoch-default to set a default integer Unix timestamp.
(This will also be read from the environment variable SOURCE_DATE_EPOCH if set.)

Gems that include native extensions may have nondeterministic output
because of how the extensions get built (at least until fpm and
compilers finish implementing the reproducible-builds.org
recommendations). If this happens, use the option --gem-stagingdir=/tmp/foo
so that the gem is built in the same directory on every run.

For instance, picking the timestamp 1234 seconds after the Unix epoch:

% fpm -s gem -t deb --source-date-epoch-default=1234 --gem-stagingdir=/tmp/foo json
% mkdir run1; mv *.deb run1
% sleep 1
% fpm -s gem -t deb --source-date-epoch-default=1234 --gem-stagingdir=/tmp/foo json
% mkdir run2; mv *.deb run2
% cmp run1/*.deb run2/*.deb
% dpkg-deb -c run1/*.deb
...
-rw-rw-r-- 0/0 17572 1969-12-31 16:20 ./var/lib/gems/2.3.0/gems/json-2.1.0/CHANGES.md
% date --date @1234
Wed Dec 31 16:20:34 PST 1969

If the files still differ after you use those three options,
you may have found a bug; we might not have plugged all the sources
of nondeterminism yet. As of this writing, these options are only
implemented for reading gems and writing debs, and only verified
to produce identical output when run twice on the same Linux system.
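
The fallback order described in this section (changelog date first, then --source-date-epoch-default, which itself can come from SOURCE_DATE_EPOCH) can be sketched in Ruby. This is a minimal illustration under stated assumptions, not fpm's code: `resolve_source_date_epoch` and its keyword arguments are hypothetical names, and in fpm itself the environment variable is wired up through the option definition rather than read by a helper like this.

```ruby
# Hypothetical helper (not part of fpm): pick the timestamp to stamp on
# generated files, in the order the documentation describes.
def resolve_source_date_epoch(changelog_epoch: nil, default_epoch: nil, env: ENV)
  # 1. A release date detected in the gem's changelog wins.
  # 2. Otherwise use the value given via --source-date-epoch-default.
  # 3. Otherwise fall back to the SOURCE_DATE_EPOCH environment variable.
  value = changelog_epoch || default_epoch || env["SOURCE_DATE_EPOCH"]
  return nil if value.nil?

  # The value is an integer Unix timestamp; reject anything else loudly.
  Integer(value.to_s, 10)
end

epoch = resolve_source_date_epoch(default_epoch: "1234", env: {})
puts Time.at(epoch).utc if epoch
```

Passing a plain Hash as `env` keeps the sketch easy to try without touching the real environment.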
14 changes: 14 additions & 0 deletions lib/fpm/command.rb
@@ -233,6 +233,20 @@ def help(*args)
"copying, downloading, etc. Roughly any scratch space fpm needs to build " \
"your package.", :default => Dir.tmpdir

option "--source-date-epoch-from-changelog", :flag,
"Use release date from changelog as timestamp on generated files to reduce nondeterminism. " \
"Experimental; only implemented for gem so far. ",
:default => false

option "--source-date-epoch-default", "SOURCE_DATE_EPOCH_DEFAULT",
"If no release date otherwise specified, use this value as timestamp on generated files to reduce nondeterminism. " \
"Reproducible build environments such as dpkg-dev and rpmbuild set this via the environment variable SOURCE_DATE_EPOCH " \
"to the integer unix timestamp to use in generated archives, " \
"and expect tools like fpm to use it as a hint to avoid nondeterministic output. " \
"This is a Unix timestamp, i.e. number of seconds since 1 Jan 1970 UTC. " \
"See https://reproducible-builds.org/specs/source-date-epoch ",
:environment_variable => "SOURCE_DATE_EPOCH"

parameter "[ARGS] ...",
"Inputs to the source package type. For the 'dir' type, this is the files" \
" and directories you want to include in the package. For others, like " \
2 changes: 1 addition & 1 deletion lib/fpm/package.rb
@@ -175,8 +175,8 @@ def initialize
@directories = []
@attrs = {}

staging_path
build_path
# Don't initialize staging_path just yet; do it lazily so a subclass can get a word in.
end # def initialize

# Get the 'type' for this instance.
60 changes: 38 additions & 22 deletions lib/fpm/package/deb.rb
@@ -243,7 +243,7 @@ def extract_info(package)
build_path("control").tap do |path|
FileUtils.mkdir(path) if !File.directory?(path)
# Unpack the control tarball
safesystem("ar p #{package} control.tar.gz | tar -zxf - -C #{path}")
safesystem(ar_cmd[0] + " p #{package} control.tar.gz | tar -zxf - -C #{path}")

control = File.read(File.join(path, "control"))

@@ -340,7 +340,7 @@ def parse_depends(data)

def extract_files(package)
# Find out the compression type
compression = `ar t #{package}`.split("\n").grep(/data.tar/).first.split(".").last
compression = `#{ar_cmd[0]} t #{package}`.split("\n").grep(/data.tar/).first.split(".").last
case compression
when "gz"
datatar = "data.tar.gz"
@@ -358,7 +358,7 @@ def extract_files(package)
end

# unpack the data.tar.{gz,bz2,xz} from the deb package into staging_path
safesystem("ar p #{package} #{datatar} " \
safesystem(ar_cmd[0] + " p #{package} #{datatar} " \
"| tar #{compression} -xf - -C #{staging_path}")
end # def extract_files

@@ -387,6 +387,22 @@ def output(output_path)
end
end

if attributes[:source_date_epoch].nil? and not attributes[:source_date_epoch_default].nil?
attributes[:source_date_epoch] = attributes[:source_date_epoch_default]
end
if attributes[:source_date_epoch] == "0"
logger.error("Alas, ruby's Zlib::GzipWriter does not support setting an mtime of zero. Aborting.")
raise "#{name}: source_date_epoch of 0 not supported."
end
if not attributes[:source_date_epoch].nil? and not ar_cmd_deterministic?
logger.error("Alas, could not find an ar that can handle -D option. Try installing recent gnu binutils. Aborting.")
raise "#{name}: ar is insufficient to support source_date_epoch."
end
if not attributes[:source_date_epoch].nil? and not tar_cmd_supports_sort_names_and_set_mtime?
logger.error("Alas, could not find a tar that can set mtime and sort. Try installing recent gnu tar. Aborting.")
raise "#{name}: tar is insufficient to support source_date_epoch."
end

attributes.fetch(:deb_systemd_list, []).each do |systemd|
name = File.basename(systemd, ".service")
dest_systemd = staging_path("lib/systemd/system/#{name}.service")
@@ -414,24 +430,6 @@ def output(output_path)
end
end

write_control_tarball

# Tar up the staging_path into data.tar.{compression type}
case self.attributes[:deb_compression]
when "gz", nil
datatar = build_path("data.tar.gz")
compression = "-z"
when "bzip2"
datatar = build_path("data.tar.bz2")
compression = "-j"
when "xz"
datatar = build_path("data.tar.xz")
compression = "-J"
else
raise FPM::InvalidPackageConfiguration,
"Unknown compression type '#{self.attributes[:deb_compression]}'"
end

# There are two changelogs that may appear:
# - debian-specific changelog, which should be archived as changelog.Debian.gz
# - upstream changelog, which should be archived as changelog.gz
@@ -442,6 +440,9 @@ def output(output_path)
mkdir_p(File.dirname(dest_changelog))
File.new(dest_changelog, "wb", 0644).tap do |changelog|
Zlib::GzipWriter.new(changelog, Zlib::BEST_COMPRESSION).tap do |changelog_gz|
if not attributes[:source_date_epoch].nil?
changelog_gz.mtime = attributes[:source_date_epoch].to_i
end
if attributes[:deb_changelog]
logger.info("Writing user-specified changelog", :source => attributes[:deb_changelog])
File.new(attributes[:deb_changelog]).tap do |fd|
@@ -461,6 +462,9 @@ def output(output_path)
if attributes[:deb_upstream_changelog]
File.new(dest_upstream_changelog, "wb", 0644).tap do |changelog|
Zlib::GzipWriter.new(changelog, Zlib::BEST_COMPRESSION).tap do |changelog_gz|
if not attributes[:source_date_epoch].nil?
changelog_gz.mtime = attributes[:source_date_epoch].to_i
end
logger.info("Writing user-specified upstream changelog", :source => attributes[:deb_upstream_changelog])
File.new(attributes[:deb_upstream_changelog]).tap do |fd|
chunk = nil
@@ -533,13 +537,19 @@ def output(output_path)
end

args = [ tar_cmd, "-C", staging_path, compression ] + data_tar_flags + [ "-cf", datatar, "." ]
if tar_cmd_supports_sort_names_and_set_mtime? and not attributes[:source_date_epoch].nil?
# Use gnu tar options to force deterministic file order and timestamp
args += ["--sort=name", ("--mtime=@%s" % attributes[:source_date_epoch])]
# gnu tar obeys GZIP environment variable with options for gzip; -n = forget original filename and date
args.unshift({"GZIP" => "-9n"})
end
safesystem(*args)

# pack up the .deb, which is just an 'ar' archive with 3 files
# the 'debian-binary' file has to be first
File.expand_path(output_path).tap do |output_path|
::Dir.chdir(build_path) do
safesystem("ar", "-qc", output_path, "debian-binary", "control.tar.gz", datatar)
safesystem(*ar_cmd, output_path, "debian-binary", "control.tar.gz", datatar)
end
end
end # def output
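
The changelog hunks above pin the gzip header timestamp by assigning `mtime` before any data is written. A minimal sketch of why that matters follows; `gzip_with_mtime` is a hypothetical helper written for illustration, not a method in fpm. It compresses the same bytes twice with the same timestamp and compares the results.

```ruby
require "stringio"
require "zlib"

# Hypothetical helper (not part of fpm): gzip +data+ in memory and stamp
# the gzip header with +epoch+ instead of the current time.
def gzip_with_mtime(data, epoch)
  io = StringIO.new(String.new) # String.new yields an empty binary string
  gz = Zlib::GzipWriter.new(io, Zlib::BEST_COMPRESSION)
  # mtime must be assigned before the first write, because the header
  # is emitted as soon as data is written.
  gz.mtime = epoch.to_i
  gz.write(data)
  gz.close
  io.string
end

first  = gzip_with_mtime("rubygem-json changelog\n", "1234")
second = gzip_with_mtime("rubygem-json changelog\n", "1234")
puts first == second
```

Bytes 4 through 7 of a gzip stream hold the modification time as a little-endian integer, so `first.byteslice(4, 4).unpack("V").first` should read back the chosen timestamp. The diff rejects a timestamp of "0" outright, with the stated reason that Zlib::GzipWriter does not support setting an mtime of zero.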
@@ -693,6 +703,12 @@ def write_control_tarball

args = [ tar_cmd, "-C", control_path, "-zcf", controltar,
"--owner=0", "--group=0", "--numeric-owner", "." ]
if tar_cmd_supports_sort_names_and_set_mtime? and not attributes[:source_date_epoch].nil?
# Force deterministic file order and timestamp
args += ["--sort=name", ("--mtime=@%s" % attributes[:source_date_epoch])]
# gnu tar obeys GZIP environment variable with options for gzip; -n = forget original filename and date
args.unshift({"GZIP" => "-9n"})
end
safesystem(*args)
end
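
The argument list built above can be isolated into a small function to make its shape easier to see. This is a sketch under assumptions: `deterministic_tar_args` is a hypothetical name, and the code only builds the argument vector without invoking tar. It mirrors the diff in appending GNU tar's `--sort=name` and `--mtime=@EPOCH` and in prepending an environment hash that sets `GZIP=-9n`.

```ruby
# Hypothetical helper (not part of fpm): build the argument vector for a
# control tarball, adding the determinism flags only when an epoch is given.
def deterministic_tar_args(tar_cmd, source_dir, tarball, epoch)
  args = [tar_cmd, "-C", source_dir, "-zcf", tarball,
          "--owner=0", "--group=0", "--numeric-owner", "."]
  unless epoch.nil?
    # GNU tar options: stable member order and one fixed timestamp.
    args += ["--sort=name", "--mtime=@#{epoch}"]
    # A leading Hash is treated as extra environment by Kernel#system;
    # -n tells gzip not to store the original name and timestamp.
    args.unshift({ "GZIP" => "-9n" })
  end
  args
end

p deterministic_tar_args("tar", "/tmp/control", "/tmp/control.tar.gz", "1234")
```

Because the result starts with an environment Hash when an epoch is set, it can be splatted straight into `system(*args)`, which is the same calling convention the diff relies on with `safesystem(*args)`.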

101 changes: 101 additions & 0 deletions lib/fpm/package/gem.rb
@@ -47,6 +47,22 @@ class FPM::Package::Gem < FPM::Package

option "--version-bins", :flag, "Append the version to the bins", :default => false

option "--stagingdir", "STAGINGDIR",
"The directory where fpm installs the gem temporarily before conversion. " \
"Normally a random subdirectory of workdir."

# Override parent method
def staging_path(path=nil)
@gem_staging_path ||= attributes[:gem_stagingdir] || Stud::Temporary.directory("package-#{type}-staging")
@staging_path = @gem_staging_path

if path.nil?
return @staging_path
else
return File.join(@staging_path, path)
end
end # def staging_path

def input(gem)
# 'gem' is the name of the rubygem we should unpack.
path_to_gem = download_if_necessary(gem, version)
@@ -231,6 +247,21 @@ def install_to_staging(gem_path)
FileUtils.mv("#{bin_path}/#{bin}", "#{bin_path}/#{bin}-#{self.version}")
end
end

if attributes[:source_date_epoch_from_changelog?]
detect_source_date_from_changelog(installdir)
end

# Remove generated Makefile, gem_make.out, and mkmf.log files, if any; they
# are not needed, and may contain generated paths that cause
# different output on successive runs.
Find.find(installdir) do |path|
if path =~ /(gem_make\.out|Makefile|mkmf\.log)$/
logger.info("Removing no longer needed file %s to reduce nondeterminism" % path)
File.unlink(path)
end
end

end # def install_to_staging

# Sanitize package name.
@@ -239,5 +270,75 @@ def install_to_staging(gem_path)
def fix_name(name)
return [attributes[:gem_package_name_prefix], name].join("-")
end # def fix_name

# Regular expression to accept a gem changelog line, and store date & version, if any, in named capture groups.
# Supports formats suggested by http://keepachangelog.com and https://github.com/tech-angels/vandamme
# as well as other similar formats that actually occur in the wild.
# Build it in pieces for readability, and allow version and date in either order.
# Whenever you change this, add a row to the test case in spec/fpm/package/gem_spec.rb.
# Don't even try to handle dates that lack four-digit years.
# Building blocks:
P_RE_LEADIN = '^[#=]{0,3}\s?'
P_RE_VERSION_ = '[\w\.-]+\.[\w\.-]+[a-zA-Z0-9]'
P_RE_SEPARATOR = '\s[-=/(]?\s?'
P_RE_DATE1 = '\d{4}-\d{2}-\d{2}'
P_RE_DATE2 = '\w+ \d{1,2}(?:st|nd|rd|th)?,\s\d{4}'
P_RE_DATE3 = '\w+\s+\w+\s+\d{1,2},\s\d{4}'
P_RE_DATE = "(?<date>#{P_RE_DATE1}|#{P_RE_DATE2}|#{P_RE_DATE3})"
P_RE_URL = '\(https?:[-\w/.%]*\)' # In parens, per markdown
P_RE_GTMAGIC = '\[\]' # github magic version diff, per chandler
P_RE_VERSION = "\\[?(?:Version |v)?(?<version>#{P_RE_VERSION_})\\]?(?:#{P_RE_URL}|#{P_RE_GTMAGIC})?"
# The final RE's:
P_RE_VERSION_DATE = "#{P_RE_LEADIN}#{P_RE_VERSION}#{P_RE_SEPARATOR}#{P_RE_DATE}"
P_RE_DATE_VERSION = "#{P_RE_LEADIN}#{P_RE_DATE}#{P_RE_SEPARATOR}#{P_RE_VERSION}"

# Detect the release date; if found, store it in attributes[:source_date_epoch]
def detect_source_date_from_changelog(installdir)
name = self.name.sub("rubygem-", "") + "-" + self.version
changelog = nil
datestr = nil
r1 = Regexp.new(P_RE_VERSION_DATE)
r2 = Regexp.new(P_RE_DATE_VERSION)

# Changelog doesn't have a standard name, so check all common variations
# Sort this list using LANG=C, i.e. caps first
[
"CHANGELIST",
"CHANGELOG", "CHANGELOG.asciidoc", "CHANGELOG.md", "CHANGELOG.rdoc", "CHANGELOG.rst", "CHANGELOG.txt",
"CHANGES", "CHANGES.md", "CHANGES.txt",
"ChangeLog", "ChangeLog.md", "ChangeLog.txt",
"Changelog", "Changelog.md", "Changelog.txt",
"changelog", "changelog.md", "changelog.txt",
].each do |changelogname|
path = File.join(installdir, "gems", name, changelogname)
if File.exist?(path)
changelog = path
File.open path do |file|
file.each_line do |line|
if line =~ /#{Regexp.escape(self.version)}/
[r1, r2].each do |r|
if r.match(line)
datestr = $~[:date]
break
end
end
end
end
end
end
end
if datestr
date = Date.parse(datestr)
sec = date.strftime("%s")
attributes[:source_date_epoch] = sec
logger.debug("Gem %s has changelog date %s, setting source_date_epoch to %s" % [name, datestr, sec])
elsif changelog
logger.debug("Gem %s changelog %s did not have recognizable date for release %s" % [name, changelog, self.version])
else
logger.debug("Gem %s did not have changelog with recognized name" % [name])
# FIXME: check rubygems.org?
end
end # detect_source_date_from_changelog

public(:input, :output)
end # class FPM::Package::Gem
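
To see what the changelog patterns accept, the building blocks can be exercised outside fpm. The sketch below copies the constants from the diff verbatim; `changelog_date` is a hypothetical helper and the sample lines are made up for illustration, not taken from any real gem. It returns the captured date string, or nil when neither ordering matches.

```ruby
require "date"

# Building blocks copied from the diff above.
P_RE_LEADIN    = '^[#=]{0,3}\s?'
P_RE_VERSION_  = '[\w\.-]+\.[\w\.-]+[a-zA-Z0-9]'
P_RE_SEPARATOR = '\s[-=/(]?\s?'
P_RE_DATE1     = '\d{4}-\d{2}-\d{2}'
P_RE_DATE2     = '\w+ \d{1,2}(?:st|nd|rd|th)?,\s\d{4}'
P_RE_DATE3     = '\w+\s+\w+\s+\d{1,2},\s\d{4}'
P_RE_DATE      = "(?<date>#{P_RE_DATE1}|#{P_RE_DATE2}|#{P_RE_DATE3})"
P_RE_URL       = '\(https?:[-\w/.%]*\)'
P_RE_GTMAGIC   = '\[\]'
P_RE_VERSION   = "\\[?(?:Version |v)?(?<version>#{P_RE_VERSION_})\\]?(?:#{P_RE_URL}|#{P_RE_GTMAGIC})?"
P_RE_VERSION_DATE = "#{P_RE_LEADIN}#{P_RE_VERSION}#{P_RE_SEPARATOR}#{P_RE_DATE}"
P_RE_DATE_VERSION = "#{P_RE_LEADIN}#{P_RE_DATE}#{P_RE_SEPARATOR}#{P_RE_VERSION}"

# Hypothetical helper (not part of fpm): return the date string captured
# from one changelog line, trying version-first and then date-first.
def changelog_date(line)
  [Regexp.new(P_RE_VERSION_DATE), Regexp.new(P_RE_DATE_VERSION)].each do |re|
    match = re.match(line)
    return match[:date] if match
  end
  nil
end

# Made-up sample lines in a few of the supported layouts.
[
  "## 2.1.0 (2017-04-18)",
  "2017-04-18 / 2.1.0",
  "### Version 1.2.3 - April 18th, 2017",
  "Fixed a bug in the parser",
].each do |line|
  puts "#{line.inspect} -> #{changelog_date(line).inspect}"
end

puts Date.parse(changelog_date("## 2.1.0 (2017-04-18)"))
```

The method in the diff additionally requires the line to mention the gem's own version before trying either pattern, which this sketch leaves out to keep the matching step in focus.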
