Build not reproducible #2102

Closed
utkarsh2102 opened this issue May 14, 2020 · 13 comments

Comments

@utkarsh2102
Contributor

Whilst working on the Reproducible Builds front, I found that polybar doesn't build reproducibly.

Here are the CI logs: https://salsa.debian.org/debian/polybar/-/jobs/664513

It'd be nice if we could have a fix for this :)

@patrick96
Member

I can't really tell from that log what exactly is different in between builds.
I have just now done three clean builds and the md5sum of all files that are part of the install are identical.

The only files that differ are the cmake log files, doc/doctrees/environment.pickle, and the xpp generated proto header files. For the xpp files, the only difference is the order of components inside the code and the resulting polybar executable is not impacted by that.
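
For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch, assuming two install trees on disk. The function names are hypothetical, and MD5 is used only to mirror the md5sum check described above.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the MD5 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.md5(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(root_a, root_b):
    """Return sorted relative paths that are missing from one tree or differ in content."""
    a, b = tree_digests(root_a), tree_digests(root_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling differing_files() on the install directories of two clean builds (the directory names are up to you) lists the files that are not bit-for-bit identical.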

I don't really have the time to do a lot of work here, but patches are welcome, and if you can point to what exactly is different on the salsa build server, it would certainly make tracking down reproducibility issues a lot easier.

@samueloph
Contributor

Let me try to chime in with a little more information. I'm trying to debug the issue, but there might be more than one thing involved.

I'm attaching the diffoscope output from our CI at Salsa. The job builds polybar twice with different host configurations (timezone, directory locations, locale...) and runs diffoscope against the resulting .debs:
polybar_diffoscope.log
Build logs of both runs can be seen here:
https://salsa.debian.org/debian/polybar/-/jobs/755275

I'm attaching this log here in case someone wants to investigate, but beware of the following:

  • Polybar's build can't be reproducible if the target "userconfig" gets run (we disable that on Debian and ship an example config file instead).
  • Debian's build of polybar uses specific flags (especially hardening ones) and doesn't call the userconfig target. The different build flags are usually not related to the issue, but anyone debugging this should be aware of them.
  • Another service that we have on Debian that performs reproducibility checks (https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/polybar.html) is currently not able to build the package in a timely manner and times out. This leads me to believe that something else could be wrong as these machines are prepared for heavy builds.

I'll ask for help to investigate why the builds are failing on our side and will keep you posted on updates if I have any.

@patrick96
Member

From the diffoscope output, it looks like some things live at a different address, otherwise the assembly code seems exactly the same.

In the .rodata segment there is something interesting though:

   0x0025c270 436f6d70 696c6572 20666c61 67733a20 Compiler flags: 
   0x0025c280 2d67202d 4f32202d 66646562 75672d70 -g -O2 -fdebug-p
   0x0025c290 72656669 782d6d61 703d2f74 6d702f72 refix-map=/tmp/r
   0x0025c2a0 6570726f 74657374 2e576656 62617a2f eprotest.WfVbaz/
-  0x0025c2b0 636f6e73 745f6275 696c645f 70617468 const_build_path
-  0x0025c2c0 2f636f6e 73745f62 75696c64 5f706174 /const_build_pat
-  0x0025c2d0 683d2e20 2d667374 61636b2d 70726f74 h=. -fstack-prot
-  0x0025c2e0 6563746f 722d7374 726f6e67 202d5766 ector-strong -Wf
-  0x0025c2f0 6f726d61 74202d57 6572726f 723d666f ormat -Werror=fo
-  0x0025c300 726d6174 2d736563 75726974 79202d57 rmat-security -W
-  0x0025c310 64617465 2d74696d 65202d44 5f464f52 date-time -D_FOR
-  0x0025c320 54494659 5f534f55 5243453d 32202d57 TIFY_SOURCE=2 -W
-  0x0025c330 616c6c20 2d576578 74726120 2d577065 all -Wextra -Wpe
-  0x0025c340 64616e74 6963200a 00000000 00000000 dantic .........
+  0x0025c2b0 6275696c 642d6578 70657269 6d656e74 build-experiment
+  0x0025c2c0 2d312f62 75696c64 2d657870 6572696d -1/build-experim
+  0x0025c2d0 656e742d 313d2e20 2d667374 61636b2d ent-1=. -fstack-
+  0x0025c2e0 70726f74 6563746f 722d7374 726f6e67 protector-strong
+  0x0025c2f0 202d5766 6f726d61 74202d57 6572726f  -Wformat -Werro
+  0x0025c300 723d666f 726d6174 2d736563 75726974 r=format-securit
+  0x0025c310 79202d57 64617465 2d74696d 65202d44 y -Wdate-time -D
+  0x0025c320 5f464f52 54494659 5f534f55 5243453d _FORTIFY_SOURCE=
+  0x0025c330 32202d57 616c6c20 2d576578 74726120 2 -Wall -Wextra 
+  0x0025c340 2d577065 64616e74 6963200a 00000000 -Wpedantic .....

This seems to suggest polybar is built with different compiler flags, once with:

Compiler flags: -g -O2 -fdebug-prefix-map=/tmp/reprotest.WfVbaz/const_build_path/const_build_path=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wpedantic

and once with

Compiler flags: -g -O2 -fdebug-prefix-map=/tmp/reprotest.WfVbaz/build-experiment-1/build-experiment-1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wpedantic

In particular the -fdebug-prefix-map flag is different. This also explains why most of the .rodata segment is different and is the reason why some of the referenced addresses in the .text segment are different (because they point into the .rodata segment).
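
As a quick sanity check of the reading above, the hex words in the unchanged rows of the dump can be decoded back into text. A minimal Python sketch (the helper name is hypothetical; the rows are copied from the dump above):

```python
def decode_hexdump_row(row):
    """Decode one 'ADDRESS w1 w2 w3 w4 ...' row into the 16 bytes it encodes."""
    fields = row.split()
    # fields[0] is the address; the next four fields are 4-byte hex words.
    # Anything after that (the ASCII column) is ignored.
    return bytes.fromhex("".join(fields[1:5]))


rows = [
    "0x0025c270 436f6d70 696c6572 20666c61 67733a20",
    "0x0025c280 2d67202d 4f32202d 66646562 75672d70",
    "0x0025c290 72656669 782d6d61 703d2f74 6d702f72",
    "0x0025c2a0 6570726f 74657374 2e576656 62617a2f",
]
text = b"".join(decode_hexdump_row(r) for r in rows).decode("ascii")
print(text)  # Compiler flags: -g -O2 -fdebug-prefix-map=/tmp/reprotest.WfVbaz/
```

The shared rows end exactly where the build directory begins, which matches the observation that the two builds diverge at the path inside the prefix-map flag.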

Why are different build flags used when testing for reproducibility?

@samueloph
Contributor

samueloph commented Jun 7, 2020

@patrick96 Sorry, I didn't mention that one of the issues, this one with -fdebug-prefix-map, can be addressed by using -ffile-prefix-map=OLD=NEW[0]. I've tried that, but then the diff increased by 1MB (which is very weird). I created a new thread on our reproducible-builds list to discuss the FTBFS (I might create another one to discuss the -fdebug issue): https://alioth-lists.debian.net/pipermail/reproducible-builds/Week-of-Mon-20200601/012377.html

I'm going to attach the diff from using -ffile-prefix-map=OLD=NEW here. My plan for debugging is to first clear up the build failure, then understand why using -ffile-prefix-map=OLD=NEW increases the diff, and finally sort out any other reproducibility issues.

polybar_fdebug_diffoscope.log
build logs:
https://salsa.debian.org/debian/polybar/-/jobs/757418

[0] https://reproducible-builds.org/docs/build-path/

@patrick96
Member

In the log you attached, the compiler flags are still different:

/usr/bin/c++ -g -O2 -ffile-prefix-map=/tmp/reprotest.eSkW5c/const_build_path/const_build_path=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wpedantic

vs

/usr/bin/c++ -g -O2 -ffile-prefix-map=/tmp/reprotest.eSkW5c/build-experiment-1/build-experiment-1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wpedantic

This time it's just the -ffile-prefix-map argument that is different, so I don't really know what that achieved.

The difference in diff size could be because some of the most frequently referenced data has changed position. It's still the addresses referenced by the assembly that produce most of the diff.

@samueloph
Contributor

samueloph commented Jun 8, 2020

@patrick96 So the main reason we use -fdebug-prefix-map, or alternatively -ffile-prefix-map, is so that we can also generate debug symbol packages. Those are available in a different repository and contain debug symbols for Polybar that people can use with gdb to generate meaningful stack traces.

You can read a little bit more about it here:
https://wiki.debian.org/HowToGetABacktrace
https://wiki.debian.org/AutomaticDebugPackages#Implementation

What -ffile-prefix-map does is let one use that feature without hardcoding the build path into the binaries. The flag takes an "OLD" and a "NEW" value: in our case the OLD value is the absolute location of the build files, which differs between builds, while the NEW value is set to "."
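
To illustrate the OLD=NEW idea, here is a minimal Python sketch of the rewrite as described above. It is a simplified model, not GCC's implementation; the function name and the src/main.cpp file are only illustrative, while the directories are the ones from the build logs.

```python
def apply_prefix_map(path, old, new):
    """Replace a leading OLD with NEW; leave paths outside OLD untouched."""
    if path.startswith(old):
        return new + path[len(old):]
    return path


# The two build directories used by the CI job.
dir_a = "/tmp/reprotest.eSkW5c/const_build_path/const_build_path"
dir_b = "/tmp/reprotest.eSkW5c/build-experiment-1/build-experiment-1"

recorded_a = apply_prefix_map(dir_a + "/src/main.cpp", dir_a, ".")
recorded_b = apply_prefix_map(dir_b + "/src/main.cpp", dir_b, ".")
print(recorded_a == recorded_b)  # both become ./src/main.cpp
```

The recorded paths end up identical even though the OLD values differ, which is the point of the flag. The flag text itself still differs between the builds, though.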

Considering the diff that we have, I believe it's likely that the reproducibility issue is coming from cmake or gcc. I'm now verifying whether the issue comes from CMake's RPATH, as described at https://reproducible-builds.org/docs/deterministic-build-systems/#cmake-notes

So the summary of the current situation is that the issue might be in the build tools rather than in Polybar's source, which is not unusual for reproducible builds.

I will keep you updated on any progress made.

@patrick96
Member

patrick96 commented Jun 8, 2020

Thanks for your efforts :)

Why do you pass different directories to those flags between the two builds? I thought a reproducible build only needs to be reproducible in the same build directory.

@samueloph
Contributor

samueloph commented Jun 8, 2020

@patrick96 We aim to have reproducible builds even with varying build directories, so our tooling is set up to build in different locations. Those build logs contain the output of both builds, so an easy way to identify where the second build starts is to look for the other build directory.

Thank you for your help as well!

@patrick96
Member

I just remembered that you may not actually know this, but polybar saves the compiler flags in the executable so that it can print them as part of polybar -vvv. This is done here.

And a lot of the differences between the binaries are because there are different compiler flags. So it is impossible to reproducibly build polybar in different build directories if you pass the build directory as part of some compiler flag.

@samueloph
Contributor

samueloph commented Jun 9, 2020

@patrick96 that explains why I was still seeing the build path on the diffs :)

We can patch that function on Debian to redact the build path to fix it.
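
A minimal sketch of what such a redaction could look like, written in Python for illustration only. This is not the actual Debian patch (and polybar itself is C++); the function name and the placeholder text are assumptions.

```python
import re

# Matches the value part of -fdebug-prefix-map=OLD=NEW and -ffile-prefix-map=OLD=NEW.
_PREFIX_MAP = re.compile(r"(-f(?:debug|file)-prefix-map=)\S+")


def redact_build_path(flags, placeholder="<redacted>"):
    """Replace the OLD=NEW value of prefix-map flags so the build path is not embedded."""
    return _PREFIX_MAP.sub(lambda m: m.group(1) + placeholder, flags)


flags_a = "-g -O2 -fdebug-prefix-map=/tmp/reprotest.WfVbaz/const_build_path/const_build_path=. -fstack-protector-strong"
flags_b = "-g -O2 -fdebug-prefix-map=/tmp/reprotest.WfVbaz/build-experiment-1/build-experiment-1=. -fstack-protector-strong"
print(redact_build_path(flags_a) == redact_build_path(flags_b))  # True
```

With the path redacted, the string that ends up in the executable no longer depends on the build directory.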

Alright, so the only unidentified diff left is the one in the assembly instruction locations.

EDIT: I reread your comment and now realize that the diff in .rodata (build path) might, of course, be causing the assembly location diff. I'm checking that now.

@samueloph
Contributor

Confirmed: after removing the flags from -vvv, the only diff we get is in NT_GNU_BUILD_ID, which is a known issue: https://tests.reproducible-builds.org/debian/issues/unstable/build_id_differences_only_issue.html

Since it doesn't look like there's a "direct" reproducibility issue in polybar's source[0], I believe this issue could be closed, but it's up to you.

FWIW I just uploaded Polybar 3.4.3-2 with these fixes to Debian unstable; the only diff we will get is the NT_GNU_BUILD_ID one.

What are your thoughts on this @utkarsh2102?

[0] The two issues we found were the generation of the config file by the userconfig target, which we disabled on Debian, and the embedding of the build flags for the -vvv output, which causes a diff with the flags we use on Debian and which we also fixed with a patch.

@patrick96
Member

How is it possible that only the build id differs? I thought the build id is just a hash over the rest of the file.

I'm closing this for now. Feel free to ping me, if you identify reproducibility issues that polybar is responsible for.

@samueloph
Contributor

@patrick96 I also thought that was weird, then I found out that this happens because the other parts which were causing the build id to differ were stripped out of this binary.

We first build polybar with debugging information (-g, and that's where -ffile-prefix-map comes in) to generate the debug symbols package, then we strip that information out to get the regular package (but the build id stays the same). In this case I can spot the actual diff in the dbgsym package that we have, though it doesn't tell me much and will require further investigation.
https://salsa.debian.org/snippets/446 (this is a diff with a patch applied to address the RPATH issue )
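
A toy Python model of the build-then-strip sequence described above may make this clearer. It does not parse real ELF files; the link() and strip() helpers and the byte strings are hypothetical stand-ins, assuming only that the build id is computed before stripping and kept afterwards.

```python
import hashlib


def link(code, debug_info):
    """Toy 'link' step: the build id is a hash over the whole unstripped output."""
    build_id = hashlib.sha1(code + debug_info).hexdigest()
    return {"code": code, "debug": debug_info, "build_id": build_id}


def strip(binary):
    """Toy 'strip' step: drop the debug info but keep the build id already recorded."""
    return {"code": binary["code"], "build_id": binary["build_id"]}


code = b"identical machine code"
stripped_a = strip(link(code, b"debug info mentioning /tmp/build-a"))
stripped_b = strip(link(code, b"debug info mentioning /tmp/build-b"))

print(stripped_a["code"] == stripped_b["code"])          # True
print(stripped_a["build_id"] == stripped_b["build_id"])  # False
```

In this model the stripped outputs differ only in the build id, because the id still reflects debug information that is no longer present in the stripped file.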

The build path might still be getting embedded in the binary; I'm investigating that. So far nothing indicates an issue on polybar's side. I appreciate the help you gave me :)
