Reproducible builds #21
Comments
Hi, this has become a nuisance for me as well, so I dug in a bit. The build timestamp (which also contains the R version and operating system) ends up as an additional entry in the DESCRIPTION file, and it does not seem to be present anywhere else. I could not find a reference to the operating system anywhere else either. As for the source directory/library directory references, for the packages I surveyed at least, they could be found in the .so, .a, ... binary files. They are not stamps added by R, but come with the debug symbols that R adds by default. It does not seem to be possible to remove those debug symbols with a flag (see https://stackoverflow.com/questions/9607155/make-gcc-put-relative-filenames-in-debug-information), so the best option IMO might be to invoke I do not guarantee this will make the builds reproducible, but it should address the two issues you pointed out, without having to acquire a lock. If this sounds good to you, I'll try to do a proof-of-concept with an additional reproducibility test (I hope it will pass!) sometime during the week.
Hi Hadrien, It's not just the compiler adding the debug symbols. When you take a checksum of all the files in the installed package, you will see that the checksums of some .rdb/.rdx files vary as well. I was able to load one of these files in R and see that it had references to the library directory. These checksums become identical when you keep the If after this, we still want to strip the debug symbols, we can add a default Makevars file with the appropriate flags.
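The checksum comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script used in this thread: the function names `checksum_tree` and `differing_files` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(install_a, install_b):
    """Return relative paths whose contents differ or that exist in only one tree."""
    a = checksum_tree(install_a)
    b = checksum_tree(install_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Running `differing_files` on two installs of the same package made from different directories should list exactly the files that carry embedded paths or timestamps, such as the .rdb/.rdx files mentioned here.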
Ok got it, I was wrong. Do you have this issue resolved internally? I just looked at whether I could find any path in the output files (like, grep -R ...), and I could only find them in the debug symbols of the .so files, so I thought "problem solved!". I don't know how this info ends up in the .rdb/.rdx files, but actually even if I remove the debug symbols in the .so files, there are still a few bits that differ in the ELF header, for whatever reason. So, to have a reproducible build for things that ultimately go into a container layer, built-timestamp, R_MAKEVARS_USER and the package path (e.g., R CMD INSTALL ) must be constant.
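For reference, the path search mentioned above can be approximated with a raw byte scan. This is only a sketch under assumptions: `files_containing` is a hypothetical helper, and it behaves like a recursive grep that lists matching files. A scan like this can miss paths stored in compressed or serialized files, which would be consistent with the paths in the .rdb/.rdx files only showing up after loading them in R.

```python
from pathlib import Path


def files_containing(root, needle):
    """List files under root whose raw bytes contain needle (similar to grep -R -l)."""
    needle_bytes = needle.encode()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Read as bytes so binary files such as .so objects are searched too.
        if path.is_file() and needle_bytes in path.read_bytes():
            hits.append(path.relative_to(root).as_posix())
    return hits
```

Calling it with the installed package directory and the build directory prefix returns the files that still mention that prefix in plain bytes.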
Resolved as much as possible in 5bb812b. See full commit message for details and caveats.
I've noticed that in openSUSE RPMs, and apparently also in Fedora RPMs, the builds are not reproducible, so the tricks here haven't made their way into R or the build systems. I haven't checked Debian yet. I did notice that https://salsa.debian.org/reproducible-builds/diffoscope/commit/4d31312 is adding analysis of R packages, especially the files which embed timestamps and paths. Is there any ongoing effort to have R support reproducible builds?
It is not clear from your message whether you are building with bazel. This project is an extension to the bazel build system. These rules should have reproducible builds, at least from R 3.4 onwards. If you are building outside of bazel, use at least R 3.6, give the
Hiya @siddharthab, I am referring to the general problem of R reproducible builds, where bazel appears to be trail-blazing.
openSUSE/build-compare#34 takes the opposite approach to what you have done here: it ignores the specific items that change in every build, so that they don't replace the existing 'identical' build artifacts.
I thought staged installs in R 3.6 solved the problem of hard-coded paths. But I suppose the stage directory itself is not constant. R will simply need to accept a user setting as the stage directory prefix to get complete reproducibility. I suppose it can be brought up on the r-devel mailing list.
Built packages are stamped with the build timestamp, the source directory, and the library directory of the installation. This is especially bothersome with Docker images, as different layers are created with each build.
The build timestamp can be fixed to an empty string with the --built-timestamp flag to R CMD INSTALL. For the rest, we need to build and install in a constant directory, which means fixing a /tmp path for a package, and acquiring a lock on that path so that builds in other workspaces do not interfere with this build.
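The lock-then-install idea could look roughly like the sketch below. It is an illustration under assumptions, not what this project implements: the helper names, the lock file location, and the use of `fcntl.flock` (Unix only) are all hypothetical choices. The only R-specific parts are the `--built-timestamp=STAMP` and `--library=LIB` options of R CMD INSTALL.

```python
import fcntl
import subprocess
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def install_command(pkg_src, library_dir, stamp):
    """Build the R CMD INSTALL argument list with a fixed Built: timestamp."""
    return [
        "R", "CMD", "INSTALL",
        f"--built-timestamp={stamp}",
        f"--library={library_dir}",
        pkg_src,
    ]


def install_reproducibly(pkg_src, library_dir, lock_path, stamp, runner=subprocess.run):
    """Run the install while holding the lock that guards the fixed directory."""
    with exclusive_lock(lock_path):
        return runner(install_command(pkg_src, library_dir, stamp), check=True)
```

The `runner` parameter exists only so the command can be inspected or replaced in a dry run. With the default `subprocess.run`, R must be on the PATH, and `pkg_src` and `library_dir` would be the constant /tmp paths chosen for the package.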