Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RocksDB does not build in a reproducible way #7035

Closed
ottok opened this issue Jun 26, 2020 · 10 comments
Closed

RocksDB does not build in a reproducible way #7035

ottok opened this issue Jun 26, 2020 · 10 comments
Assignees

Comments

@ottok
Copy link
Contributor

ottok commented Jun 26, 2020

Expected behavior

All open source software should build in a reproducible manner. See https://reproducible-builds.org/

This improves security. This also makes builds faster for users of ccache and alike, since builds that are reproducible are also able to re-use previously built objects.

Actual behavior

RocksDB does not build in a reproducible manner. Two consecutive builds produce binaries with different contents, even though the code is exactly the same and the build environment is the same.

Steps to reproduce the behavior

Checkout a fresh RocksDB into directory "1" and build it. Checkout a fresh RocksDB into directory "2" and build it (in the same way). Run diff -r -U 0 1 2 and notice that the directories differ and they did not produce identical things.

This is what I see on my computer (I am building RocksDB in the MariaDB Server context):

diff -r -U 0 server.build1/builddir/storage/rocksdb/build_version.cc server.build2/builddir/storage/rocksdb/build_version.cc
--- server.build1/builddir/storage/rocksdb/build_version.cc     2020-06-26 16:04:54.234052273 +0300
+++ server.build2/builddir/storage/rocksdb/build_version.cc     2020-06-26 20:55:14.552438467 +0300
@@ -4 +4 @@
-const char* rocksdb_build_git_date = "rocksdb_build_git_date:@2020-06-26 13:01:45@";
+const char* rocksdb_build_git_date = "rocksdb_build_git_date:@2020-06-26 17:52:13@";
Binary files server.build1/builddir/storage/rocksdb/CMakeFiles/rocksdblib.dir/build_version.cc.o and server.build2/builddir/storage/rocksdb/CMakeFiles/rocksdblib.dir/build_version.cc.o differ
Binary files server.build1/builddir/storage/rocksdb/ha_rocksdb.so and server.build2/builddir/storage/rocksdb/ha_rocksdb.so differ
Binary files server.build1/builddir/storage/rocksdb/librocksdblib.a and server.build2/builddir/storage/rocksdb/librocksdblib.a differ
Binary files server.build1/builddir/storage/rocksdb/mariadb-ldb and server.build2/builddir/storage/rocksdb/mariadb-ldb differ
Binary files server.build1/builddir/storage/rocksdb/mysql_ldb and server.build2/builddir/storage/rocksdb/mysql_ldb differ
Binary files server.build1/builddir/storage/rocksdb/sst_dump and server.build2/builddir/storage/rocksdb/sst_dump differ

Solution

Stop adding a timestamp into build_version.cc. Storing the git commit id should be enough. Storing a timestamp is useless since the code is always the same even though one set of files in the same commit is seconds or an hour older. Just remove this and it will be fixed.

See also

This is related to #6276 which was closed without issue being solved.

@ottok ottok changed the title RocksDB does not build in a reproducible RocksDB does not build in a reproducible way Jun 26, 2020
@adamretter
Copy link
Collaborator

@ottok Would you like to send a PR that addresses this?

@ottok
Copy link
Contributor Author

ottok commented Jun 29, 2020

I looked at the code but I did not understand it well enough to be able to change it properly. I can help testing a patch if somebody creates one.

@ottok
Copy link
Contributor Author

ottok commented Dec 9, 2020

Filed this downstream as http://bugs.debian.org/976985 to hopefully attract more eyes on this issue.

You can at any time review the latest status at https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/mariadb-10.5.html

@mrambacher
Copy link
Contributor

I have some ideas on how we could potentially fix this issue. I will take a look.

@mrambacher mrambacher self-assigned this Dec 21, 2020
@mrambacher
Copy link
Contributor

mrambacher commented Dec 21, 2020

I have some ideas on how to fix this issue but need guidance on what the community wants. Currently, the build_version contains two strings:

  • "rocksdb_build_git_sha" -> The SHA obtained by "git rev-parse HEAD"
  • "rocksdb_build_compile_date -> The date at which this library was built.

The latter field is the one causing the non-reproducible builds.

My suggestion is to do the following:
For "compile date", this value will be one of the following:

  • If the current branch has no modifications (e.g., a clean branch), it will be the "date" of the last check-in on that branch (git log -1 --date=format:"%Y-%m-%d %T" --format="%ad")
  • If the current branch has modifications, the date will be the date of the build (current behavior).

Additionally, we should add "rocksb_build_git_branch". This will be the name of the branch being built (git status --short -b)

@ottok
Copy link
Contributor Author

ottok commented Dec 21, 2020

Personally I would always use the git commit ID and only it, and not allow builds that have uncommitted changes, as they cannot be tracked in any way. Also, if you want traceability, supporting reproducible builds should be a priority for you :)

But if you want to support these other options as well and find value in branch names and dates too, then what you suggested is OK.

@mrambacher
Copy link
Contributor

If you do not allow builds with uncommitted changes, there will have to be some switches so that developers can do a build to address issues and add features. Though I understand it would be nice for a "production build" to not allow uncommitted changes.

@adamretter
Copy link
Collaborator

adamretter commented Dec 21, 2020

@mrambacher I think all of your suggestions sound correct to me.

I also like the general idea of adding rocksb_build_git_branch, however the git command you suggested won't work for tags which is generally what I build RocksJava releases from. Instead could you please use the following: git symbolic-ref -q --short HEAD || git describe --tags --exact-match. I believe that works for branches and tags.

@ottok
Copy link
Contributor Author

ottok commented Feb 1, 2021

Thanks @mrambacher. I will report back to you in a couple of weeks if RocksDB (and MariaDB) is really reproducible now with these changes.

@mrambacher
Copy link
Contributor

@ottok FWIW, I did not get the libraries to be identical on separate builds, but did not track down further what the issue was. Libraries are still different, even if the input source files are the same. Please let me know if there are any further issues.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this issue Mar 5, 2021
Summary:
Closes facebook#7035

Changed how build_version.cc was generated:
- Included the GIT tag/branch in the build_version file
- Changed the "Build Date" to be:
      - If the GIT branch is "clean" (no changes), the date of the last git commit
      - If the branch is not clean, the current date
 - Added APIs to access the "build information", rather than accessing the strings directly.

The build_version.cc file is now regenerated whenever the library objects are rebuilt.

Verified that the built files remain the same size across builds on a "clean build" and the same information is reported by sst_dump --version

Pull Request resolved: facebook#7866

Reviewed By: pdillinger

Differential Revision: D26086565

Pulled By: mrambacher

fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants