New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make iPXE builds reproducible #234
Conversation
I just had the thought of maybe making it so that I could specify BUILD_ID_CMD externally instead of replacing it like I do here. Let me know if you'd like me to do that instead and I can change the PR. |
There has been several that has proposed similar things, all ignoring the issue of that Id should be unique. I think there is issues/prs on GitHub, but also on the mailing list that you might want to read. Also read the comment about the id that you removed in this pr. |
@NiKiZe I admit to not checking issues or mailing list and thats definitely something I'll do. I did read the comment and pondered it. I laid out my thoughts in the commit message where I think this should be fine. That being said I'll check out the other attempts at this. |
I just looked at #82, that implementation hashes the target (which doesn't exist yet?), mine hashes all the input files. This should be good enough imo. If all the inputs are the same and the build is reproducible then you'd get a bit for bit copy of the binary and that would be no different than what build_id is trying to protect against I think. |
I guess if the requirement truly is that iPXE should not be reproducible I'd like to know a bit more of the why, which I'd use to update the BUILD_ID_CMD comment. My reading of the original commit is that NICs may offer up the same iPXE ROM binary multiple times and someone wants to be able to tell the difference between this offering up of the same binary vs another one? The Makeifle.housekeeping comment says
which means what exactly? |
Going through the mailing list I find: https://lists.ipxe.org/pipermail/ipxe-devel/2020-May/007031.html
https://lists.ipxe.org/pipermail/ipxe-devel/2020-May/007033.html
I'll see if my approach ends up taking account of DEBUG= and other make invocation settings. And https://lists.ipxe.org/pipermail/ipxe-devel/2020-May/007038.html
Which I'll happily add to the commit message and likely also Makefile comment. |
Ideally I'd just patch the resulting .tmp file, but I only know of nix's patchelf which likely doesn't work on non-elf files. I'll look into binutils. |
I found this thread https://www.mail-archive.com/ipxe-devel@lists.ipxe.org/msg06874.html So, embed, debug, config etc all need to be part of the id. Could the BUILD_ID_CMD be kept, add additional explanations to the comment (not remove) to make it clear about the requirements, and reduce the risk of it being switched to something that could cause issues |
Yep I think keeping BUILD_ID_CMD and expanding the comment is something I'll do. |
The requirement for uniqueness arises because we may be linking multiple different output binaries from the same set of compiled objects within a single build run. For example: As of commit 8290a9551 I think that including |
Definitions in So I think this is the first version of this long-running suggestion that would actually work. |
src/Makefile.housekeeping
Outdated
@@ -1187,7 +1186,7 @@ $(BIN)/version.%.o : core/version.c $(MAKEDEPS) $(GIT_INDEX) | |||
$(BIN)/%.tmp : $(BIN)/version.%.o $(BLIB) $(MAKEDEPS) $(LDSCRIPT) | |||
$(QM)$(ECHO) " [LD] $@" | |||
$(Q)$(LD) $(LDFLAGS) -T $(LDSCRIPT) $(TGT_LD_FLAGS) $< $(BLIB) -o $@ \ | |||
--defsym _build_id=`$(BUILD_ID_CMD)` \ | |||
--defsym _build_id=$(shell sha1sum $^|sort|cksum|awk '{printf "0x%08x", $$1}') \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix this to avoid exceeding 80 characters, and I'm happy to merge this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix this too before resubmitting updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is very similar to my #82, there is a difference: the sha1sum $^
will include the filenames in output - if those are absolute pathnames(are they?), it would mean that a binary built in /home/bernhard
would be different from yours built from identical sources.
and for some reason I still get 4 random bytes in the binaries - maybe because openSUSE's .o
files are from LTO and gcc's intermediate representations have randomness. I'll retest without LTO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is very similar to my #82, there is a difference: the
sha1sum $^
will include the filenames in output - if those are absolute pathnames(are they?), it would mean that a binary built in/home/bernhard
would be different from yours built from identical sources.
Good point. @mmlb could you please find a way to omit filenames from the resulting hash value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooops didn't see this until now. Let me check on this. I don't think they're absolute filenames though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope these are all relative paths. I checked by running trying out:
diff --git a/src/Makefile.housekeeping b/src/Makefile.housekeeping
index 83665f83..2414fbcf 100644
--- a/src/Makefile.housekeeping
+++ b/src/Makefile.housekeeping
@@ -1201,6 +1201,7 @@ $(BIN)/version.%.o : core/version.c $(MAKEDEPS) $(GIT_INDEX)
#
$(BIN)/%.tmp : $(BIN)/version.%.o $(BLIB) $(MAKEDEPS) $(LDSCRIPT)
$(QM)$(ECHO) " [LD] $@"
+ sha1sum $^| sort | cksum | awk '{printf "0x%08x", $$1}'
$(Q)$(LD) $(LDFLAGS) -T $(LDSCRIPT) $(TGT_LD_FLAGS) $< $(BLIB) -o $@ \
--defsym _build_id=$(shell $(BUILD_ID_CMD)) \
--defsym _build_timestamp=$(BUILD_TIMESTAMP) \
and running make bin-x86_64-efi/ipxe.efi
(in src/) and from my home doing make -C /home/manny/cloned/ipxe/ipxe/src/ bin-x86_64-efi/ipxe.efi
which gave me:
make: Entering directory '/home/manny/cloned/ipxe/ipxe/src'
make: /bin/echo: No such file or directory
make: /bin/echo: No such file or directory
[VERSION] bin-x86_64-efi/version.ipxe.efi.o
[LD] bin-x86_64-efi/ipxe.efi.tmp
sha1sum bin-x86_64-efi/version.ipxe.efi.o bin-x86_64-efi/blib.a Makefile Makefile.housekeeping .echocheck arch/x86_64/Makefile arch/x86/Makefile arch/x86_64/Makefile.efi arch/x86/Makefile.efi Makefile.efi bin-x86_64-efi/.fnrec.state scripts/efi.lds| sort | cksum | awk '{printf "0x%08x", $1}'
0x54378e64 [FINISH] bin-x86_64-efi/ipxe.efi
rm bin-x86_64-efi/version.ipxe.efi.o
make: Leaving directory '/home/manny/cloned/ipxe/ipxe/src'
Yep from my reading of the Makefile I think these configs all make it into the input hash. |
I'll check the _build_id with |
@mcb30 Are you ok with the SOURCE_DATE_EPOCH like this or should I also change the ifdef (SOURCE_DATE_EPOCH)
BUILD_TIMESTAMP=$(SOURCE_DATE_EPOC)
else
ifne ($(shell git ...),)
BUILD_TIMESTAMP=$(shell git...)
else
BUILD_TIMESTAMP=$(shell date %s)
endif
endif or just not bother with checking with git? |
I'm fine with the If you want to allow for reproducible builds without having to define Thanks. |
I had good results with master...bmwiedemann:rb - only ipxe.iso left with timestamps |
Which iso tool do you end up using? I ran xorriso and get:
Welp I spoke too soon, there's still a diff between 2 builds. I'll try to track it down. |
@bmwiedemann Thanks. I've merged f4fcd000a from that series, since that commit at least seems uncontroversial and finalised.
@mmlb FYI: |
Using random values the reproducibility of iPXE, which apart from SOURCE_DATE_EPOCH is quite good afaics. Hashing the inputs should accomplish the same goal that the random value was used for. If any of the inputs change we'll get a different sha1 hash which will give a different cksum and thus a different _build_id. If two independant binaries happen to have the same input hashes I'd argue they are in fact the same binary. The way I do the hashing is not quite complete, the final linked binary could in theory give different outputs for the same input. I don't think thats very plausible and not necessary to worry about. This actually can happen very commonly. If building for multiple ROMs which are built from the same blib.a, the output ends up differing due to different arguments passed to LD. These arguments are not directly being factored into the sha1sum. The hash/build_id is still unique due to including version.o in the sha1 input which contains the ROM unique ROM name.
When using $(shell ), make will first invoke run the BUILD_ID_CMD and then have the value defined when calling $(LD). This means we get to see the _build_id when building with make V=1. Previously the build_id was figured out as a subshell command run during the recipe execution with out being able to see the build_id itself.
src/Makefile.housekeeping
Outdated
ifdef SOURCE_DATE_EPOCH | ||
BUILD_TIMESTAMP := $(SOURCE_DATE_EPOCH) | ||
else ifdef GITVERSION | ||
BUILD_TIMESTAMP := $(git log -1 --pretty=%ct) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the broken use of $(git ...)
instead of $(shell git ...)
and pushed as 9d000c9fd
Hmm I caught the missing shell invocation too maybe I forgot to push though. 👍 |
Hey @mcb30, I noticed some follow up commits dealing with FreeBSD differences, glad those were caught quickly! |
I was seeing reproducibility from 2 sources, recording the build timestamp and injecting random numbers for BUILD_ID. I've fixed both upstream in ipxe/ipxe#234 Signed-off-by: Manuel Mendez <mmendez@equinix.com>
To get an reproducible --- ipxe-1.20.1+git20210129.f4fcd000.orig/src/util/genfsimg
+++ ipxe-1.20.1+git20210129.f4fcd000/src/util/genfsimg
@@ -276,7 +276,7 @@ if [ -n "${ISOIMG}" ] ; then
echo "${0}: cannot find a suitable mkisofs or equivalent" >&2
exit 1
fi
- "${MKISOFS}" -quiet -volid "iPXE" -preparer "iPXE build system" \
+ "${MKISOFS}" --set_all_file_dates "2007110814511300" --modification-date=2007110814511300 -quiet -volid "iPXE" -preparer "iPXE build system" \
-appid "iPXE - Open Source Network Boot Firmware" \ |
@bmwiedemann The Also, the patch should probably use Would you be able to submit a (separate) PR that addresses those issues? Thanks. |
There you are: #252 |
I was seeing reproducibility from 2 sources, recording the build timestamp and injecting random numbers for BUILD_ID. I've fixed both upstream in ipxe/ipxe#234 Signed-off-by: Manuel Mendez <mmendez@equinix.com>
These 2 commits seem to be all that is needed to make iPXE builds reproducible. I'd love to have reproducible iPXE builds so that I could verify CI builds against local ones for https://github.com/tinkerbell/boots.