Skip to content

Make the build of extensions reproducible#2481

Closed
nlewo wants to merge 4 commits intoruby:masterfrom
nlewo:master
Closed

Make the build of extensions reproducible#2481
nlewo wants to merge 4 commits intoruby:masterfrom
nlewo:master

Conversation

@nlewo
Copy link
Copy Markdown

@nlewo nlewo commented Nov 20, 2018

Description:

When a package contains extension, the build is not reproducible due to temporary filepaths in some files which are not required at run time. This PR removes all of these files to make the build reproducible (https://reproducible-builds.org).

This PR has been tested on several gems, such as msgpack, strptime, yajl-ruby and all fluentd ruby dependencies.

See also the issue https://bugs.ruby-lang.org/issues/15304.

cc @nobu


Tasks:

  • Describe the problem / feature
  • Write tests
  • Write code to solve the problem
  • Get code review from coworkers / friends

I will abide by the code of conduct.

The mkmf.log file can contain name of temporary files used at build
time which makes the extension build non reproducible [1]. So, this
file has to be removed at package installation.

The generated (by the mkmf library) Makefile contains the `mkmf.log` file in
the `distclean` target, but it considers the file is in the build
directory while it has been moved to the installation dir.

This patch just deletes this file moving in order to be able to
distclean it.

[1] https://reproducible-builds.org/
@nlewo nlewo changed the title Make the build of extension reproducible Make the build of extensions reproducible Nov 20, 2018
@nlewo
Copy link
Copy Markdown
Author

nlewo commented Nov 20, 2018

Tests are failing because they are checking the existence of the files deleted by this PR:/ I would prefer to wait for feedback before fixing these tests.

Comment thread lib/rubygems/ext/builder.rb Outdated
@segiddins
Copy link
Copy Markdown
Contributor

Can you try to add a test for making extension builds “reproducible”, as well as documenting what exactly that means?

This removes files generated at build time that are no longer
required. The main purposes is to allow reproducible build but this
also saves disk space.

Indeed, the `Makefile` and `mkmf.log` are not reproducbile since they
contain temporary filepaths.
The Makefile output can still be viewed when the `verbose` flag is enable.
@nlewo
Copy link
Copy Markdown
Author

nlewo commented Nov 21, 2018

Basically, a build is reproducible if the build process always creates the same result from the same source code. https://reproducible-builds.org/ is providing more (and better) explanations.
@segiddins Where should I document that? In the first commit maybe?

I don't have any experience with Ruby so it's really time consuming for me to write a test;(

But a test for this PR should

  1. install a gem package,
  2. get the hash of all generated files
  3. install the same gem package again
  4. compare the hash of generated files with previous hashes

I started to make current tests working.

@MSP-Greg
Copy link
Copy Markdown
Contributor

@nlewo

Thanks for your work on this.

a build is reproducible if the build process always creates the same result

I'm not sure if there's agreement on what the meaning of 'the same result' is. If one looks at

https://reproducible-builds.org/docs/definition/

the critical phrase is 'any party can recreate bit-by-bit identical copies of all specified artifacts'

In addition - 'The artifacts of a build are the parts of the build results that are the desired primary output.'

A packaged gem (which is a single file) should be reproducible. As to a 'gem install', I'm not sure if files like mkmf.log, gem_make.out, and Makefile would be considered 'primary output' or 'specified artifacts'...

@nlewo
Copy link
Copy Markdown
Author

nlewo commented Nov 28, 2018

@MSP-Greg The point is distributions (at least ArchLinux and NixOS) are using gem install to create their packages which should be reproducible. There are also Docker users who could use gem install at image build time (these images should also be reproducible to be referenced with the Docker Digest id).
I think it would be more convenient if gem install could produce a reproducible output, otherwise, everybody would have to implement their own cleaning phases after a gem install. Note this is not what we have to do with the python setup.py install since some patchs have been recently merged.

Also, the mkmf.log and gem_make.out files are not required since their contents can be get by the user (in case of failure or if the verbose flags is enable) on stdout. I however don't know how the Makefile can be useful after the installation phase.

@MSP-Greg
Copy link
Copy Markdown
Contributor

@nlewo

So, the assumption here is that an external application is determining whether a gem install is correct by essentially hashing every file/folder created by gem install?

If so, and given that the 'byproducts / non-primary output / non-runtime' files may be helpful for some users, should a option be added to gem install regarding these items? I don't see a reason why whether it's opt-in or opt-out would be an issue...

@nlewo
Copy link
Copy Markdown
Author

nlewo commented Nov 28, 2018

So, the assumption here is that an external application is determining whether a gem install is correct by essentially hashing every file/folder created by gem install?

This is how you compare two builds - of the same gem - to know if it it is reproducible.

Regarding opt-in/opt-out, by default, the output of gem install should be reproducible. A flag such as debug could make it non-reproducible (because it could add debug/log files in the output path).

@MSP-Greg
Copy link
Copy Markdown
Contributor

I think --debug is already in use, maybe call the option --dirty or --nonrepro?

compare two builds

Since gem build exists, maybe refer to these as 'installs'...

As long as there's an option, I think it's a good idea...

@nlewo
Copy link
Copy Markdown
Author

nlewo commented Nov 28, 2018

compare two builds

Sorry, I was not enough clear! I was talking from the distribution packager point of view: when a distribution builds a ruby package, it uses gem install. So, from the gem point of view it is a installation while from the distribution point of view it is a build.

I don't understand why we should add a --dirty flag. What is the problem to remove the files (mkmf.log, gem_make.out and Makefile) as proposed by this PR?
From my point of view, mkmf.log and gem_make.out are useless. Regarding the Makefile, nobody affirms they are useful once the installation is done.

@MSP-Greg
Copy link
Copy Markdown
Contributor

I don't understand why we should add a --dirty flag.

We're changing previous behavior by deleting these files that are a byproduct of compiling, which I'm okay with.

If a user finds an issue with installing a gem with a particular OS/compiler, having these files may be helpful. Hence, the --dirty flag...

@nlewo
Copy link
Copy Markdown
Author

nlewo commented Dec 4, 2018

@MSP-Greg In the use case you are talking about (debugging an issue), the --debug flag would be suitable. Moreover, if this flag already exists, it would be easier for me to implement that without having to add a new flag (I know nothing about ruby:/).
So the idea would be to generate these files if the --debug flag is enable.
Would you be ok with that or do you prefer a --dirty flag?

@hsbt
Copy link
Copy Markdown
Member

hsbt commented Jan 4, 2019

I never approve your current approach for supporting the reproducible build. Because of this makes difficult to break the compatiblity.

Please consider to store existed logs to new or other directory.

@hsbt
Copy link
Copy Markdown
Member

hsbt commented Jan 17, 2019

No feedback with a couple of weeks. Please open another pull-request in the next time.

@nobu
Copy link
Copy Markdown
Member

nobu commented Apr 22, 2026

mkmf.log files are for development purpose.
I'm curious why rubygems installs them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants