gokr-rebuild-kernel fails with "Exec format error" on OSX 10.11.6 #88

Closed
BinaryPaean opened this issue May 3, 2018 · 12 comments

Comments

@BinaryPaean

BinaryPaean commented May 3, 2018

Output on my machine:

 ~  gokr-rebuild-kernel
2018/05/03 13:44:20 building docker container for kernel compilation
Sending build context to Docker daemon  6.246MB
Step 1/9 : FROM debian:stretch
 ---> 8626492fecd3
Step 2/9 : RUN apt-get update && apt-get install -y crossbuild-essential-arm64 bc libssl-dev bison flex
 ---> Using cache
 ---> 74ed838c00a2
Step 3/9 : COPY gokr-build-kernel /usr/bin/gokr-build-kernel
 ---> Using cache
 ---> 0f49723b3ee6
Step 4/9 : COPY 0001-expose-UART0-ttyAMA0-on-GPIO-14-15-disable-UART1-tty.patch /usr/src/0001-expose-UART0-ttyAMA0-on-GPIO-14-15-disable-UART1-tty.patch
 ---> Using cache
 ---> 597e1d8d4ff7
Step 5/9 : COPY 0001-Revert-add-index-to-the-ethernet-alias.patch /usr/src/0001-Revert-add-index-to-the-ethernet-alias.patch
 ---> Using cache
 ---> 00e9560c90d8
Step 6/9 : RUN echo 'builduser:x:501:20:nobody:/:/bin/sh' >> /etc/passwd &&     chown -R 501:20 /usr/src
 ---> Using cache
 ---> 8a0235e1d81c
Step 7/9 : USER builduser
 ---> Using cache
 ---> 0b81d685fb70
Step 8/9 : WORKDIR /usr/src
 ---> Using cache
 ---> 9857ba1b6bf8
Step 9/9 : ENTRYPOINT /usr/bin/gokr-build-kernel
 ---> Using cache
 ---> 05be206b1152
Successfully built 05be206b1152
Successfully tagged gokr-rebuild-kernel:latest
2018/05/03 13:44:21 compiling kernel
/bin/sh: 1: /usr/bin/gokr-build-kernel: Exec format error
2018/05/03 13:44:23 docker build: exit status 2

From the output it looks like somehow the "make" commands are blowing up.

Using:

  • go 1.10.1
  • docker 17.05.0-ce_1
  • docker-machine 0.13.0
  • "default" docker image from docker-machine.

I have made the following modifications (which is why I was trying to manually rebuild the kernel in the first place):

diff --git a/cmd/gokr-build-kernel/build.go b/cmd/gokr-build-kernel/build.go
index 7453821..01750bb 100644
--- a/cmd/gokr-build-kernel/build.go
+++ b/cmd/gokr-build-kernel/build.go
@@ -216,6 +216,21 @@ CONFIG_GPIO_PCA953X=y
 CONFIG_GPIO_PCA953X_IRQ=y
 CONFIG_GPIO_MAX77620=y
 
+##
+## file: drivers/w1/Kconfig
+##
+CONFIG_W1=y
+
+##
+## file: drivers/w1/masters/Kconfig
+##
+CONFIG_W1_MASTER_GPIO=y
+
+##
+## file: drivers/w1/slaves/Kconfig
+##
+CONFIG_W1_SLAVE_THERM=y
+
 ##
 ## file: drivers/gpu/drm/Kconfig
 ##
diff --git a/config.txt b/config.txt
index 3c23727..b7299c2 100644
--- a/config.txt
+++ b/config.txt
@@ -5,3 +5,4 @@ enable_uart=0
 
 device_tree=rpi-3-b.dtb
 kernel=vmlinuz
+dtoverlay=w1-gpio,gpiopin=4
@stapelberg
Contributor

I guess this is because you compiled gokr-build-kernel for darwin instead of linux.

Could you try GOOS=linux go install github.com/gokrazy/kernel/cmd/gokr-build-kernel and confirm whether gokr-rebuild-kernel works afterwards please?
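
For context, "Exec format error" is what Linux reports when asked to run a file it does not recognize as an executable, such as a Mach-O binary built for darwin. The following is a minimal sketch, assuming only the well-known magic numbers, of how one could check a binary's format before copying it into the container. binaryFormat is a hypothetical helper and not part of gokrazy.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// binaryFormat is a hypothetical helper (not part of gokrazy). It classifies
// an executable by the magic number in its first four bytes.
func binaryFormat(header []byte) string {
	if len(header) < 4 {
		return "unknown"
	}
	magic := header[:4]
	switch {
	case bytes.Equal(magic, []byte{0x7f, 'E', 'L', 'F'}):
		return "ELF" // the format Linux expects
	case bytes.Equal(magic, []byte{0xcf, 0xfa, 0xed, 0xfe}):
		return "Mach-O" // 64-bit little-endian, e.g. darwin/amd64
	case bytes.Equal(magic, []byte{0xce, 0xfa, 0xed, 0xfe}):
		return "Mach-O" // 32-bit little-endian
	default:
		return "unknown"
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: binfmt <path-to-binary>")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	// Only the first four bytes are needed for the check.
	header := make([]byte, 4)
	if _, err := io.ReadFull(f, header); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(binaryFormat(header))
}
```

Running this against the gokr-build-kernel binary in the Go bin directory would be expected to report Mach-O for a darwin build and ELF after rebuilding with GOOS=linux.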

@BinaryPaean
Author

That was it; thanks!

I didn't realize that the gokr-build-kernel binary was injected into the docker image. Perhaps it makes sense to have gokr-rebuild-kernel trigger the on-demand build of gokr-build-kernel (yo, dawg I heard you like builds...) with the GOOS set for the docker target?
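
A rough sketch of that idea, assuming the go tool is on PATH: the helper below prepares "go install" with GOOS=linux appended to the environment. linuxInstallCmd is a hypothetical name, and this is not necessarily how gokr-rebuild-kernel handles it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// linuxInstallCmd is a hypothetical helper. It prepares (but does not run)
// "go install <pkg>" with GOOS=linux appended to the current environment.
// When an environment key appears more than once, os/exec uses the last
// value, so an existing GOOS setting is overridden.
func linuxInstallCmd(pkg string) *exec.Cmd {
	cmd := exec.Command("go", "install", pkg)
	cmd.Env = append(os.Environ(), "GOOS=linux")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := linuxInstallCmd("github.com/gokrazy/kernel/cmd/gokr-build-kernel")
	// Print what would be executed; call cmd.Run() to actually build.
	fmt.Println(cmd.Args, cmd.Env[len(cmd.Env)-1])
}
```

The sketch only constructs the command so that it can be inspected; a real tool would call cmd.Run() and check the returned error before building the Docker image.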

@BinaryPaean
Author

BinaryPaean commented May 5, 2018

Although unfortunately this leads to a new error. After compiling the kernel, I get:

... many lines of kernel build omitted...
  AR      built-in.o
  LD      vmlinux.o
  MODPOST vmlinux.o
  KSYM    .tmp_kallsyms1.o
  KSYM    .tmp_kallsyms2.o
  LD      vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
  OBJCOPY arch/arm64/boot/Image
  GZIP    arch/arm64/boot/Image.gz
2018/05/04 23:59:02 open /tmp/buildresult/vmlinuz: permission denied
2018/05/04 16:59:30 docker build: exit status 1

I'm not sure if that is blowing up "in" docker or "out" of docker, but seems bizarre either way.

stapelberg added a commit that referenced this issue May 5, 2018
gokr-build-kernel is used in a linux docker container, even when rebuilding the
kernel from another operating system (e.g. darwin).

related to #88
@stapelberg
Contributor

gokr-rebuild-kernel creates a temporary directory and a Dockerfile which runs as a user with the same uid/gid inside the Docker container that you’re running the tool with outside the Docker container. The temporary directory is mounted as a volume for /tmp/buildresult.

The assumption is that writing files under the same uid/gid should work across the Docker boundary, but perhaps that isn’t true on macOS?
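
To make the mechanism concrete, here is a simplified sketch, not the actual gokr-rebuild-kernel code, of the two pieces described above: a passwd entry carrying the host uid/gid (the same shape as the line in step 6/9 of the build log) and a "docker run" invocation that mounts a host directory at /tmp/buildresult. The helper names and the /tmp/gokr-example path are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os/user"
)

// passwdEntry is a hypothetical helper that formats an /etc/passwd line for
// a "builduser" account with the given numeric ids.
func passwdEntry(uid, gid string) string {
	return fmt.Sprintf("builduser:x:%s:%s:nobody:/:/bin/sh", uid, gid)
}

// dockerRunArgs is a hypothetical helper that builds the arguments for
// "docker run", mounting hostDir as /tmp/buildresult inside the container.
func dockerRunArgs(hostDir, image string) []string {
	return []string{
		"run", "--rm",
		"--volume", hostDir + ":/tmp/buildresult",
		image,
	}
}

func main() {
	// Look up the uid/gid of the user running the tool on the host.
	u, err := user.Current()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(passwdEntry(u.Uid, u.Gid))
	// /tmp/gokr-example stands in for the temporary directory.
	fmt.Println(dockerRunArgs("/tmp/gokr-example", "gokr-rebuild-kernel"))
}
```

With uid 501 and gid 20, passwdEntry yields the same line that appears in the build log, which is why files written inside the container are expected to carry the host user's ownership.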

As an immediate workaround, you could delete the "--rm" argument in kernel.go’s Docker call and manually copy arch/arm64/boot/Image and arch/arm64/boot/dts/broadcom/bcm2837-rpi-3-b.dtb.

@BinaryPaean
Author

OK, I think I can work around this by sharing the UID and GID into the docker image with the command, based on this. I hadn't realized that docker volumes were totally transparent with respect to file ownership bits, but now that I've thought about it for a hot minute it makes some sense.
Thanks for the help. I'll try to bundle the OSX fixes into a pull request when I'm done.

@stapelberg
Contributor

Unfortunately, the link seems dead, and neither Google Cache nor the wayback machine has it. Maybe you could summarize what you gathered from that link?

Note that I already committed 8da8b7a, which should solve the GOOS=linux part of this issue.

@BinaryPaean
Author

BinaryPaean commented May 14, 2018

8da8b7a looks like the fix I ended up with, except that, to avoid leaving a wrong-architecture binary in the host's GOPATH, I was dynamically building it on each run of gokr-rebuild-kernel and writing the output directly. Your solution is shorter, simpler, and avoids frequent re-compiles, though.

It looks like the guy just revamped the blog and broke all the permalinks.

The new write-up is here. Unfortunately the original fix I tried (the one proposed in the top comment) didn't work. The basic idea was to additionally mount the host's /etc/passwd and /etc/group into the docker container as read-only volumes. I'm not fully sure WHY this didn't work, but poking around in the docker image showed the volumes weren't shadowing the default files.

Instead I ended up modifying the Dockerfile generation somewhat to reuse the host UID and GID even with a GID clash. This is mixed in with some tidying I did as I was going through the code for my own understanding, but the result was a separate "dockerfile.go":

package main

import (
	"path/filepath"
	"strings"
	"text/template"
)

const dockerFileContents = `
FROM debian:stretch

ENV KERNEL_SOURCE_PATH /usr/src/{{ .KernelPath}}
RUN apt-get update && apt-get install -y crossbuild-essential-arm64 bc libssl-dev bison flex curl

COPY gokr-build-kernel /usr/bin/gokr-build-kernel
{{- range $idx, $path := .Patches }}
COPY {{ $path }} /usr/src/{{ $path }}
{{- end }}
RUN groupadd -r -o -g {{.GID}} builduser && \
		useradd -r -m -u {{.UID}} -g {{.GID}} builduser && \
		chown -R {{.UID}}:{{.GID}} /usr/src /tmp

USER builduser
RUN curl --output /tmp/{{.KernelArchive}} {{.KernelURL}}
WORKDIR /usr/src
RUN tar -xvf /tmp/{{.KernelArchive}}
ENTRYPOINT /usr/bin/gokr-build-kernel
`

// On OSX, volumes require root to access despite the host UID:GID;
// the output directory thus gets chown'd before use.

type dockerArgs struct {
	UID           string
	GID           string
	Patches       []string
	KernelURL     string
	KernelArchive string
	KernelPath    string
}

func newDockerArgs(uid string, gid string, patches []string, kernelURL string) *dockerArgs {
	d := dockerArgs{UID: uid, GID: gid, Patches: patches, KernelURL: kernelURL}
	d.KernelArchive = filepath.Base(d.KernelURL)
	d.KernelPath = string(filepath.Separator) + strings.TrimSuffix(d.KernelArchive, ".tar.xz")
	return &d
}

var dockerFileTmpl = template.Must(template.New("dockerfile").Parse(dockerFileContents))
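
As a usage illustration, the sketch below renders a shortened stand-in for the template above and derives the archive and directory names from a kernel URL. The names exampleTmpl, exampleArgs, render, and archiveAndDir are hypothetical, and the example.invalid URL is a placeholder. It uses path.Base rather than filepath.Base, since URLs always use forward slashes, and omits the leading separator because the template already puts a slash before the kernel path.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
	"text/template"
)

// exampleTmpl is a shortened stand-in for the Dockerfile template. The
// "{{-" markers trim the preceding newline so no blank lines are emitted.
const exampleTmpl = `FROM debian:stretch
{{- range .Patches }}
COPY {{ . }} /usr/src/{{ . }}
{{- end }}
RUN groupadd -r -o -g {{ .GID }} builduser
`

type exampleArgs struct {
	GID     string
	Patches []string
}

// render executes the shortened template with the given arguments.
func render(args exampleArgs) (string, error) {
	tmpl, err := template.New("dockerfile").Parse(exampleTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, args); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// archiveAndDir derives the archive file name and the unpacked directory
// name from a kernel URL, assuming a ".tar.xz" archive.
func archiveAndDir(kernelURL string) (archive, dir string) {
	archive = path.Base(kernelURL)
	return archive, strings.TrimSuffix(archive, ".tar.xz")
}

func main() {
	out, err := render(exampleArgs{GID: "20", Patches: []string{"a.patch", "b.patch"}})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)

	archive, dir := archiveAndDir("https://example.invalid/linux-4.16.7.tar.xz")
	fmt.Println(archive, dir)
}
```

Each patch produces one COPY line, and the GID is substituted into the groupadd call, so the generated Dockerfile varies with the host user and the patch list.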

Sorry for the delay and the lack of a PR so far, but after fixing up the OSX parts I still ran into an even thornier problem with writing the image to the SD card on OSX. OSX does not support setcap, and even when running as root with the volume unmounted, the gokr-packer command was failing due to some kind of "volume in use" error.

This is part of a work project, so for the sake of schedule I have fallen back to a standard Pi userland to get things working, but I still plan to circle back around to cleaning up my commits and serving them up in a branch you can merge, cherry pick, or avoid.

@stapelberg
Contributor

Thanks for following up. Do I understand correctly that the minimal fix is to extend the chown command to not only cover /usr/src, but also /tmp?

@BinaryPaean
Author

BinaryPaean commented May 14, 2018

Also to change from echo'ing a string into passwd to using useradd and groupadd with specific flags. The latter in particular allows for a GID conflict that will still "succeed" with permissions.

@stapelberg
Contributor

While useradd/groupadd seems cleaner, I’m having some trouble seeing why it would be necessary. We need the uid to be present in passwd so that “USER builduser” works, but otherwise they are used only numerically, I thought.

Anyway, I’ll need to reproduce this.

Regarding the “volume in use”, have a look at gokrazy/gokrazy#22, and specifically the referenced gokrazy/gokrazy#14 (comment)

@BinaryPaean
Author

I didn't investigate the error thoroughly; my "test cycle" was about 2 hours while the kernel rebuilt to find out what part would fail (using a 2011 MacBook Air). Part of moving to useradd/groupadd was to get more useful error messages if that was the faulty step. Similarly, when I moved the kernel fetch step into the docker image with curl, it was to clean that code out of gokr-rebuild-kernel but also to cache the docker layer post-download to speed up testing.

I discovered that on my machine the host GID conflicted with one in the docker instance. This may or may not have been "the problem", but those flags for groupadd allow for the conflicting GID, while I'm not totally sure of the outcome of just writing to the passwd file.
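
One way to check for such a clash ahead of time is to scan the container's /etc/group for the host GID. The following is a minimal sketch under that assumption; groupForGID is a hypothetical helper, and the sample group data is illustrative rather than copied from a real image.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// groupForGID is a hypothetical helper. It scans text in /etc/group format
// (name:password:gid:members) and returns the name of the group that
// already uses gid, if any.
func groupForGID(groupFile, gid string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(groupFile))
	for sc.Scan() {
		fields := strings.Split(sc.Text(), ":")
		if len(fields) >= 3 && fields[2] == gid {
			return fields[0], true
		}
	}
	return "", false
}

func main() {
	// Illustrative sample data, not copied from a real image.
	sample := "root:x:0:\ndialout:x:20:\nusers:x:100:\n"
	if name, ok := groupForGID(sample, "20"); ok {
		fmt.Printf("gid 20 is already used by group %q\n", name)
	}
}
```

If a clash is found, groupadd needs its -o (non-unique) flag to create a second group with the same GID, which is what the Dockerfile template above passes.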

@stapelberg
Contributor

Finally had a chance to look into this. Commit c16e76a was required to make things work with Docker 18.06.0-ce-mac70 (26399) on macOS 10.11.6. Note that I didn’t have any trouble regarding user/group ids.

Also note that it might be way quicker to spin up a VM in your cloud of choice and build there. For some reason, building the kernel takes way longer on macOS than on Linux.
