This repository has been archived by the owner. It is now read-only.

luajit has build-CPU-dependent compilation output #110

Closed
bmwiedemann opened this issue Apr 1, 2020 · 12 comments

Comments

@bmwiedemann

While working on reproducible builds for openSUSE, I found that
our bcc package contained binaries produced by luajit that varied depending on the build machine's CPU features.

See https://reproducible-builds.org/ for why this matters.

I traced this to the JIT_F_SSE4_2 flag, which makes lj_str_hash.c use lj_str_hash_opt as an alternate hash function.
In other places, the easiest way to guarantee reproducible output is to sort iterations over hash keys by key value.

See also on this topic: https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/hash
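To illustrate the sorting suggestion above, here is a minimal sketch, not luajit code. It models a chained hash table whose iteration order follows bucket placement, and uses `zlib.crc32` and `zlib.adler32` purely as stand-ins for a default and a CPU-optimized hash function. The key names and the bucket count are made up for the example.

```python
import zlib


def bucket_order(keys, hash_fn, nbuckets=8):
    """Return keys in the order a simple chained hash table would iterate them."""
    buckets = [[] for _ in range(nbuckets)]
    for key in keys:
        buckets[hash_fn(key.encode()) % nbuckets].append(key)
    return [key for bucket in buckets for key in bucket]


# Hypothetical keys; any set of strings works.
keys = ["syntax", "keyword", "contained", "nextgroup", "skipwhite"]

# Iteration order depends on the hash function, so these two may differ.
order_a = bucket_order(keys, zlib.crc32)
order_b = bucket_order(keys, zlib.adler32)

# Sorting by key value removes the dependence on the hash function.
stable_a = sorted(order_a)
stable_b = sorted(order_b)
```

Whatever the two raw orders turn out to be, the sorted lists are identical, which is the property a generator script needs before it writes its output.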

@siddhesh
Collaborator

siddhesh commented Apr 2, 2020

I'm curious to know how you'd get reproducible output with luajit at all; its output can vary not only based on the script it compiles but also based on the inputs the script processes. I reckon string hashing ought to be the least of your concerns there.

@bmwiedemann
Author

"based on the inputs the script processes"

Reproducible builds means that it is possible to get the same output from the same inputs (aka determinism), anytime and anywhere.

When you run luajit compilation, it already produces the same .o files from the same .lua files on different machines, as long as all of them have SSE4.2 or none of them do. At least that holds for bcc.lua.
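One way to check that claim is to compare content digests of the files produced by two builds. The sketch below is only an illustration: the function names are hypothetical, and it assumes each build's outputs have been collected into its own directory.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def differing_files(dir_a, dir_b):
    """List relative paths that are missing on one side or differ in content."""
    files_a, files_b = list_files(dir_a), list_files(dir_b)
    differing = []
    for rel in sorted(files_a | files_b):
        if rel not in files_a or rel not in files_b:
            differing.append(rel)
        elif file_digest(Path(dir_a) / rel) != file_digest(Path(dir_b) / rel):
            differing.append(rel)
    return differing
```

An empty result for the output directories of an SSE4.2 build and a non-SSE4.2 build would mean the two builds are bit-for-bit identical.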

@siddhesh
Collaborator

siddhesh commented Apr 2, 2020

I don't think I've understood the bcc use case. Could you leave detailed instructions for me to understand and evaluate the issue over the weekend? Thanks.

@bmwiedemann
Author

I couldn't attach it here for some reason, so I uploaded it to
https://www.zq1.de/~bernhard/temp/bcc.lua.txt

This is somehow produced from
https://github.com/iovisor/bcc/tree/master/src/lua/bcc

I expect that to just be a normal .lua file that can be compiled.

Here is the diff from compiling on different CPUs (using qemu-kvm -cpu kvm64 for the non-SSE4.2 case):
http://rb.zq1.de/compare.factory-20200402/bcc-compare.out

@bmwiedemann
Author

I found another victim of this issue:
When building openSUSE's neovim package, it calls

/usr/bin/luajit /home/abuild/rpmbuild/BUILD/neovim-0.4.3/scripts/genvimvim.lua /home/abuild/rpmbuild/BUILD/neovim-0.4.3/src/nvim /home/abuild/rpmbuild/BUILD/neovim-0.4.3/build/runtime/syntax/vim/generated.vim /home/abuild/rpmbuild/BUILD/neovim-0.4.3/build/funcs_data.mpack

That produces CPU-type-dependent ordering variations in generated.vim.
There are also such ordering variations in generated binaries.

@siddhesh
Collaborator

Would a flag to disable this be helpful, perhaps in the build so that you have a way to always build moonjit that gives reproducible results for such cases?

We should probably also look at whether interning all strings into a hash table is worth it; I haven't looked too closely at it TBH, I just assumed that it's useful and improved upon a patch from openresty/luajit2.

@bmwiedemann
Author

bmwiedemann commented Apr 14, 2020

In other places we use existence of the SOURCE_DATE_EPOCH environment variable as a flag that a reproducible build is wanted.

https://reproducible-builds.org/docs/source-date-epoch/ documents its contents and use.
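As a rough sketch of that convention (the function names are hypothetical, not from any existing tool): a build script can treat the mere presence of `SOURCE_DATE_EPOCH` as a request for reproducibility, and use its value, an integer number of seconds since the Unix epoch, in place of the current time.

```python
import os
import time


def reproducible_build_requested(environ=None):
    """True if SOURCE_DATE_EPOCH is present in the environment."""
    env = os.environ if environ is None else environ
    return "SOURCE_DATE_EPOCH" in env


def build_timestamp(environ=None):
    """Use SOURCE_DATE_EPOCH when set, otherwise fall back to the current time."""
    env = os.environ if environ is None else environ
    value = env.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())
```

Passing the environment as a parameter is only a convenience for testing; real build tools typically read the process environment directly.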

If you have patches for moonjit, I can test them.

@siddhesh
Collaborator

I've pushed a new flag LUAJIT_ENABLE_REPRODUCIBLE_BUILDS that should disable string hash microarchitecture autodetection. If any other customisations are needed for reproducible builds, they should go under this flag.
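Conceptually, the effect of such a flag can be sketched as follows. This is an illustration of the decision only, not the moonjit implementation; the function name and the returned labels are made up.

```python
def select_string_hash(cpu_has_sse42, reproducible_build):
    """Pick which string hash to use.

    With reproducible builds enabled, CPU autodetection is skipped so that
    every machine uses the same hash function and gets the same ordering.
    """
    if reproducible_build:
        return "default"
    return "optimized" if cpu_has_sse42 else "default"
```

With the flag on, the result no longer depends on `cpu_has_sse42`, which is exactly what makes the output independent of the build machine.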

Sorry about the delay, but I hope this works for you. I intended to add a PR first so that I can get your feedback, but out of habit I did a git push to master :/

@bmwiedemann
Author

bmwiedemann commented May 25, 2020

Some extra documentation on how to enable it would be good. I think it is

make LUAJIT_ENABLE_REPRODUCIBLE_BUILDS=1 ...

The comment in src/Makefile is misleading

# Reproducible builds.  Enable this option if you need output to be the same
# across CPUs.
#XCFLAGS += -DLUAJIT_ENABLE_REPRODUCIBLE_BUILDS

because that would provide it as a define to the compiler where it is not used.

I just finished test-building bcc and neovim and both come out reproducible with this moonjit patch and make LUAJIT_ENABLE_REPRODUCIBLE_BUILDS=1 ...

siddhesh added a commit that referenced this issue May 25, 2020
@siddhesh
Collaborator

Fixed comment, thanks!

@bmwiedemann
Author

Again no PR? IMHO it should be

-#LUAJIT_ENABLE_REPRODUCIBLE_BUILDS=0
+#LUAJIT_ENABLE_REPRODUCIBLE_BUILDS=1

@siddhesh
Collaborator

I just pushed it in since it was a comment fixup. Here's a PR to fix it up the way you ought to like it. Let me know if that looks good to you :)

bmwiedemann added a commit to bmwiedemann/moonjit that referenced this issue May 25, 2020
without every distribution packager reading Makefile comments
and enabling the right option.

The variable is documented at
https://reproducible-builds.org/specs/source-date-epoch/
and its existence is a strong indicator for a desire of
reproducible builds.

Related moonjit#110
bmwiedemann added a commit to bmwiedemann/moonjit that referenced this issue May 26, 2020
without every distribution packager reading Makefile comments
and enabling the right option.

The variable is documented at
https://reproducible-builds.org/specs/source-date-epoch/
and its existence is a strong indicator for a desire of
reproducible builds.

Related moonjit#110
Fixes moonjit#123

Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
siddhesh pushed a commit that referenced this issue Jun 12, 2020
without every distribution packager reading Makefile comments
and enabling the right option.

The variable is documented at
https://reproducible-builds.org/specs/source-date-epoch/
and its existence is a strong indicator for a desire of
reproducible builds.

Related #110
Fixes #123

Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>