Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make iPXE builds reproducible #234

Closed
wants to merge 3 commits into from
Closed

Make iPXE builds reproducible #234

wants to merge 3 commits into from

Conversation

mmlb
Copy link
Contributor

@mmlb mmlb commented Jan 28, 2021

These 2 commits seem to be all that is needed to make iPXE builds reproducible. I'd love to have reproducible iPXE builds so that I could verify CI builds against local ones for https://github.com/tinkerbell/boots.

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

I just had the thought of maybe making it so that I could specify BUILD_ID_CMD externally instead of replacing it like I do here. Let me know if you'd like me to do that instead and I can change the PR.

@NiKiZe
Copy link
Contributor

NiKiZe commented Jan 28, 2021

There has been several that has proposed similar things, all ignoring the issue of that Id should be unique.

I think there is issues/prs on GitHub, but also on the mailing list that you might want to read.

Also read the comment about the id that you removed in this pr.

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

@NiKiZe I admit to not checking issues or mailing list and thats definitely something I'll do. I did read the comment and pondered it. I laid out my thoughts in the commit message where I think this should be fine.

That being said I'll check out the other attempts at this.

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

I just looked at #82, that implementation hashes the target (which doesn't exist yet?), mine hashes all the input files. This should be good enough imo. If all the inputs are the same and the build is reproducible then you'd get a bit for bit copy of the binary and that would be no different than what build_id is trying to protect against I think.

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

I guess if the requirement truly is that iPXE should not be reproducible I'd like to know a bit more of the why, which I'd use to update the BUILD_ID_CMD comment. My reading of the original commit is that NICs may offer up the same iPXE ROM binary multiple times and someone wants to be able to tell the difference between this offering up of the same binary vs another one?

The Makeifle.housekeeping comment says

Must be unique for each $(BIN)/%.tmp, even within the same build run

which means what exactly? make bin-x86_64-efi/ipxe.efi bin-x86_64-efi/ipxe.efi should build twice and each should have a different _build_id? Why would this matter if one is going to be immediage overwritten with the last built one?

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

Going through the mailing list I find:

https://lists.ipxe.org/pipermail/ipxe-devel/2020-May/007031.html

The build ID is supposed to be collision-free across all ROMs that might
ever end up installed in the same system. It doesn't just disambiguate
targets within a single build; it also disambiguates different builds.

https://lists.ipxe.org/pipermail/ipxe-devel/2020-May/007033.html

Different build configurations (e.g. config/*.h setting, or even
DEBUG=... options specified on the make command line) also count as
needing different build IDs. Basically: anything that could possibly
affect the runtime memory layout is relevant.

It should really be a hash of the object that results from the link.

I'll see if my approach ends up taking account of DEBUG= and other make invocation settings.

And https://lists.ipxe.org/pipermail/ipxe-devel/2020-May/007038.html

BUILD_TIMESTAMP has a weaker requirement than BUILD_ID: it gets used
only as a means to automatically select the newest version of iPXE if
multiple iPXE drivers are loaded concurrently in a UEFI system. It's
already rounded down to the nearest minute when used for that purpose.

For BUILD_TIMESTAMP (but not BUILD_ID), I am happy with using the commit
date as extracted from git.

Which I'll happily add to the commit message and likely also Makefile comment.

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

Ideally I'd just patch the resulting .tmp file, but I only know of nix's patchelf which likely doesn't work on non-elf files. I'll look into binutils.

@NiKiZe
Copy link
Contributor

NiKiZe commented Jan 28, 2021

I found this thread https://www.mail-archive.com/ipxe-devel@lists.ipxe.org/msg06874.html

So, embed, debug, config etc all need to be part of the id.
Not saying what you suggest is missing any of that.
But this has come up several times, 4-5 at least.

Could the BUILD_ID_CMD be kept, add additional explanations to the comment (not remove) to make it clear about the requirements, and reduce the risk of it being switched to something that could cause issues

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

Yep I think keeping BUILD_ID_CMD and expanding the comment is something I'll do.

@mcb30
Copy link
Member

mcb30 commented Jan 28, 2021

The Makeifle.housekeeping comment says

Must be unique for each $(BIN)/%.tmp, even within the same build run

which means what exactly? make bin-x86_64-efi/ipxe.efi bin-x86_64-efi/ipxe.efi should build twice and each should have a different _build_id? Why would this matter if one is going to be immediage overwritten with the last built one?

The requirement for uniqueness arises because we may be linking multiple different output binaries from the same set of compiled objects within a single build run. For example: make bin/8086100e.rom bin/10ec8139.rom. These must not have the same build ID, even though they are built from the same blib.a archive (containing all possible binary objects), because the set of linked-in objects (determined solely by TGT_LD_FLAGS) will differ.

As of commit 8290a9551 I think that including $(BIN)/%.version.o within the hash input (as you do) is probably sufficient, because TGT_LD_FLAGS is essentially constructed from the build name, which is included within $(BIN)/%.version.o as the build_name string.

@mcb30
Copy link
Member

mcb30 commented Jan 28, 2021

EMBED= affects the contents of $(BIN)/embedded.o and hence $(BLIB)

DEBUG= affects the set of objects that form $(BLIB)

Definitions in config/*.h would affect the relevant objects that form $(BLIB)

So I think this is the first version of this long-running suggestion that would actually work.

@@ -1187,7 +1186,7 @@ $(BIN)/version.%.o : core/version.c $(MAKEDEPS) $(GIT_INDEX)
$(BIN)/%.tmp : $(BIN)/version.%.o $(BLIB) $(MAKEDEPS) $(LDSCRIPT)
$(QM)$(ECHO) " [LD] $@"
$(Q)$(LD) $(LDFLAGS) -T $(LDSCRIPT) $(TGT_LD_FLAGS) $< $(BLIB) -o $@ \
--defsym _build_id=`$(BUILD_ID_CMD)` \
--defsym _build_id=$(shell sha1sum $^|sort|cksum|awk '{printf "0x%08x", $$1}') \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this to avoid exceeding 80 characters, and I'm happy to merge this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this too before resubmitting updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is very similar to my #82, there is a difference: the sha1sum $^ will include the filenames in output - if those are absolute pathnames(are they?), it would mean that a binary built in /home/bernhard would be different from yours built from identical sources.
and for some reason I still get 4 random bytes in the binaries - maybe because openSUSE's .o files are from LTO and gcc's intermediate representations have randomness. I'll retest without LTO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is very similar to my #82, there is a difference: the sha1sum $^ will include the filenames in output - if those are absolute pathnames(are they?), it would mean that a binary built in /home/bernhard would be different from yours built from identical sources.

Good point. @mmlb could you please find a way to omit filenames from the resulting hash value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops didn't see this until now. Let me check on this. I don't think they're absolute filenames though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope these are all relative paths. I checked by running trying out:

diff --git a/src/Makefile.housekeeping b/src/Makefile.housekeeping
index 83665f83..2414fbcf 100644
--- a/src/Makefile.housekeeping
+++ b/src/Makefile.housekeeping
@@ -1201,6 +1201,7 @@ $(BIN)/version.%.o : core/version.c $(MAKEDEPS) $(GIT_INDEX)
 #
 $(BIN)/%.tmp : $(BIN)/version.%.o $(BLIB) $(MAKEDEPS) $(LDSCRIPT)
 	$(QM)$(ECHO) "  [LD] $@"
+	sha1sum $^| sort | cksum | awk '{printf "0x%08x", $$1}'
 	$(Q)$(LD) $(LDFLAGS) -T $(LDSCRIPT) $(TGT_LD_FLAGS) $< $(BLIB) -o $@ \
 		--defsym _build_id=$(shell $(BUILD_ID_CMD)) \
 		--defsym _build_timestamp=$(BUILD_TIMESTAMP) \
 

and running make bin-x86_64-efi/ipxe.efi (in src/) and from my home doing make -C /home/manny/cloned/ipxe/ipxe/src/ bin-x86_64-efi/ipxe.efi which gave me:

make: Entering directory '/home/manny/cloned/ipxe/ipxe/src'
make: /bin/echo: No such file or directory
make: /bin/echo: No such file or directory
  [VERSION] bin-x86_64-efi/version.ipxe.efi.o
  [LD] bin-x86_64-efi/ipxe.efi.tmp
sha1sum bin-x86_64-efi/version.ipxe.efi.o bin-x86_64-efi/blib.a Makefile Makefile.housekeeping .echocheck arch/x86_64/Makefile arch/x86/Makefile arch/x86_64/Makefile.efi arch/x86/Makefile.efi Makefile.efi bin-x86_64-efi/.fnrec.state scripts/efi.lds| sort | cksum | awk '{printf "0x%08x", $1}'
0x54378e64  [FINISH] bin-x86_64-efi/ipxe.efi
rm bin-x86_64-efi/version.ipxe.efi.o
make: Leaving directory '/home/manny/cloned/ipxe/ipxe/src'

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

EMBED= affects the contents of $(BIN)/embedded.o and hence $(BLIB)

DEBUG= affects the set of objects that form $(BLIB)

Definitions in config/*.h would affect the relevant objects that form $(BLIB)

Yep from my reading of the Makefile I think these configs all make it into the input hash.

@mmlb
Copy link
Contributor Author

mmlb commented Jan 28, 2021

The Makeifle.housekeeping comment says

Must be unique for each $(BIN)/%.tmp, even within the same build run

which means what exactly? make bin-x86_64-efi/ipxe.efi bin-x86_64-efi/ipxe.efi should build twice and each should have a different _build_id? Why would this matter if one is going to be immediage overwritten with the last built one?

The requirement for uniqueness arises because we may be linking multiple different output binaries from the same set of compiled objects within a single build run. For example: make bin/8086100e.rom bin/10ec8139.rom. These must not have the same build ID, even though they are built from the same blib.a archive (containing all possible binary objects), because the set of linked-in objects (determined solely by TGT_LD_FLAGS) will differ.

As of commit 8290a95 I think that including $(BIN)/%.version.o within the hash input (as you do) is probably sufficient, because TGT_LD_FLAGS is essentially constructed from the build name, which is included within $(BIN)/%.version.o as the build_name string.

I'll check the _build_id with make V=1 ..., which should be different and add confidence to uniqueness.

@mmlb
Copy link
Contributor Author

mmlb commented Jan 29, 2021

@mcb30 Are you ok with the SOURCE_DATE_EPOCH like this or should I also change the date call to first attempting to check with git? Something like

ifdef (SOURCE_DATE_EPOCH)
BUILD_TIMESTAMP=$(SOURCE_DATE_EPOC)
else
ifne ($(shell git ...),)
BUILD_TIMESTAMP=$(shell git...)
else
BUILD_TIMESTAMP=$(shell date %s)
endif
endif

or just not bother with checking with git?

@mcb30
Copy link
Member

mcb30 commented Jan 29, 2021

@mcb30 Are you ok with the SOURCE_DATE_EPOCH like this or should I also change the date call to first attempting to check with git? Something like

I'm fine with the SOURCE_DATE_EPOCH patch as-is.

If you want to allow for reproducible builds without having to define SOURCE_DATE_EPOCH then feel free to extract something sensible from git, whatever that might be. (https://reproducible-builds.org/docs/source-date-epoch/ seems to suggest git log -1 --pretty=%ct). The build system already defines GITVERSION iff building from a git tree, so you might want to use that as the test.

Thanks.

@bmwiedemann
Copy link
Contributor

I had good results with master...bmwiedemann:rb - only ipxe.iso left with timestamps

@mmlb
Copy link
Contributor Author

mmlb commented Jan 29, 2021

I had good results with master...bmwiedemann:rb - only ipxe.iso left with timestamps

Which iso tool do you end up using? I ran xorriso and get:

GNU xorriso 1.5.2 : RockRidge filesystem manipulator, libburnia project.

xorriso : NOTE : Environment variable SOURCE_DATE_EPOCH encountered with value 315532800

Welp I spoke too soon, there's still a diff between 2 builds. I'll try to track it down.

@mcb30
Copy link
Member

mcb30 commented Jan 29, 2021

I had good results with master...bmwiedemann:rb - only ipxe.iso left with timestamps

@bmwiedemann Thanks. I've merged f4fcd000a from that series, since that commit at least seems uncontroversial and finalised.

Which iso tool do you end up using? I ran xorriso and get:

@mmlb FYI: xorriso will be used only if nothing quieter can be found. (I couldn't find a way to get xorriso to not spit out a startup banner, even when -quiet is used.)

@mmlb mmlb requested a review from mcb30 January 29, 2021 16:35
Using random values the reproducibility of iPXE, which apart from
SOURCE_DATE_EPOCH is quite good afaics. Hashing the inputs should
accomplish the same goal that the random value was used for. If
any of the inputs change we'll get a different sha1 hash which
will give a different cksum and thus a different _build_id.

If two independant binaries happen to have the same input hashes
I'd argue they are in fact the same binary.

The way I do the hashing is not quite complete, the final linked
binary could in theory give different outputs for the same input.
I don't think thats very plausible and not necessary to worry about.
This actually can happen very commonly. If building for multiple ROMs
which are built from the same blib.a, the output ends up differing due
to different arguments passed to LD. These arguments are not directly
being factored into the sha1sum. The hash/build_id is still unique
due to including version.o in the sha1 input which contains the ROM
unique ROM name.
When using $(shell ), make will first invoke run the BUILD_ID_CMD
and then have the value defined when calling $(LD). This means we
get to see the _build_id when building with make V=1. Previously
the build_id was figured out as a subshell command run during the
recipe execution with out being able to see the build_id itself.
ifdef SOURCE_DATE_EPOCH
BUILD_TIMESTAMP := $(SOURCE_DATE_EPOCH)
else ifdef GITVERSION
BUILD_TIMESTAMP := $(git log -1 --pretty=%ct)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the broken use of $(git ...) instead of $(shell git ...) and pushed as 9d000c9fd

@mcb30
Copy link
Member

mcb30 commented Jan 30, 2021

Merged (with assorted fixes to avoid build breakage) as 9d000c9fd, bc4979e2c, and d8dc06fbf

Thanks!

@mcb30 mcb30 closed this Jan 30, 2021
@mmlb
Copy link
Contributor Author

mmlb commented Jan 31, 2021

Hmm I caught the missing shell invocation too maybe I forgot to push though. 👍

@mmlb mmlb deleted the reproducible-builds branch January 31, 2021 23:37
@mmlb
Copy link
Contributor Author

mmlb commented Feb 1, 2021

Hey @mcb30, I noticed some follow up commits dealing with FreeBSD differences, glad those were caught quickly!

mmlb added a commit to mmlb/tinkerbell-boots that referenced this pull request Feb 1, 2021
I was seeing reproducibility from 2 sources, recording the build
timestamp and injecting random numbers for BUILD_ID. I've fixed
both upstream in ipxe/ipxe#234

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
@bmwiedemann
Copy link
Contributor

To get an reproducible ipxe.iso with xorriso I had to patch syslinux isohybrid MBR ID generation https://github.com/distropatches/syslinux/tree/isohybrid and

--- ipxe-1.20.1+git20210129.f4fcd000.orig/src/util/genfsimg
+++ ipxe-1.20.1+git20210129.f4fcd000/src/util/genfsimg
@@ -276,7 +276,7 @@ if [ -n "${ISOIMG}" ] ; then
        echo "${0}: cannot find a suitable mkisofs or equivalent" >&2
        exit 1
     fi
-    "${MKISOFS}" -quiet -volid "iPXE" -preparer "iPXE build system" \
+    "${MKISOFS}" --set_all_file_dates "2007110814511300" --modification-date=2007110814511300 -quiet -volid "iPXE" -preparer "iPXE build system" \
            -appid "iPXE - Open Source Network Boot Firmware" \

@mcb30
Copy link
Member

mcb30 commented Feb 8, 2021

@bmwiedemann The --set_all_file_dates and --modification-date options seem not to be supported by mkisofs or genisoimage, so you'd have to test for support before using them.

Also, the patch should probably use SOURCE_DATE_EPOCH to construct the date, and override the default behaviour only if SOURCE_DATE_EPOCH is specified.

Would you be able to submit a (separate) PR that addresses those issues?

Thanks.

@bmwiedemann
Copy link
Contributor

There you are: #252

mmlb added a commit to mmlb/tinkerbell-boots that referenced this pull request Jun 29, 2021
I was seeing reproducibility from 2 sources, recording the build
timestamp and injecting random numbers for BUILD_ID. I've fixed
both upstream in ipxe/ipxe#234

Signed-off-by: Manuel Mendez <mmendez@equinix.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants