Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize heredocs 🎉 #2589

Merged
merged 2 commits into from
Feb 10, 2022
Merged

Stabilize heredocs 🎉 #2589

merged 2 commits into from
Feb 10, 2022

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented Jan 31, 2022

Fixes #2574.

The only relevant commits here are 351c15c and 07a6011 (the rest are the commits from #2442, since I didn't particularly enjoy the idea of trying to resolve merge conflicts after new tests got added there).

Essentially, we just remove the dfheredoc tag, and combine all the files in together now that we don't need the separation. Additionally, I've also updated the docs for what I assume will probably be a dockerfile 1.4.0 release? Not sure if that's needed, so I'm happy if that patch gets dropped.

Let me know if there's anything else needed.

@thaJeztah
Copy link
Member

Thanks!

Looks like this needs a rebase 😅

I guess we want to have #2442 merged as well before "stable", correct? (@tonistiigi ?)

@jedevc
Copy link
Member Author

jedevc commented Jan 31, 2022

Woops, thanks - I did rebase, but only onto origin/master which wasn't pointing to the same as upstream/master 🤦

Definitely would argue for #2442 first (though ultimately up to you all)

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other PR is merged now so this can be rebased.

You don't necessarily need to merge everything into the same files. Especially the tests where the files are already huge and hard to manage. Where additional complexity was needed before just to have this functionality under build tags merging is ok.

@tonistiigi tonistiigi added this to the v0.10.0 milestone Feb 1, 2022
@jedevc
Copy link
Member Author

jedevc commented Feb 1, 2022

Have kept the separate test files, but merged in the parts that enabled the parsing of heredocs.

@tonistiigi
Copy link
Member

CI error unrelated but somewhat interesting https://gist.github.com/tonistiigi/6edbcfbe9dd09488cf956b2df704a792 . @AkihiroSuda @crazy-max Possibly related to the containerd update?

@crazy-max
Copy link
Member

CI error unrelated but somewhat interesting https://gist.github.com/tonistiigi/6edbcfbe9dd09488cf956b2df704a792 . @AkihiroSuda @crazy-max Possibly related to the containerd update?

For the stack I'm not sure it's linked to containerd as it happens for the containerd-1.4 worker. It also happened in the previous build but this time in TestDiffSingleDeleteLayerOnMerge test: https://github.com/moby/buildkit/runs/5028528981?check_suite_focus=true#step:6:927 but was not able to repro yet.

@tonistiigi
Copy link
Member

Hmm. It failed again the same way on this PR and I've never seen it anywhere else. I don't see how it could be related though.

@tonistiigi
Copy link
Member

Looks like the panic might be somehow related to containerd/go-cni#76 . It is indeed a new codepath 39f6b4e#diff-2af9be1e48878c8f36db87230b5f840ffcc2f2f7cf5c753ecc081b8b4c9e2eb9 that we just vendored. Any ideas @mikebrow ?

@crazy-max
Copy link
Member

We encountered the same one while merging #2606 on master branch.

image

=== CONT  TestIntegration/TestMultiStageCaseInsensitive/worker=oci/frontend=builtin
2022/02/07 21:10:34 stderr: /usr/bin/buildkitd
2022/02/07 21:10:34 > startCmd 2022-02-07 21:10:19.078179364 +0000 UTC m=+167.675667420 [buildkitd --oci-worker=true --containerd-worker=false --oci-worker-gc=false --oci-worker-labels=org.mobyproject.buildkit.worker.sandbox=true --config=/tmp/bktest_config1688217650/buildkitd.toml --root /tmp/bktest_buildkitd234413752 --addr unix:///tmp/bktest_buildkitd234413752/buildkitd.sock --debug]
2022/02/07 21:10:34 time="2022-02-07T21:10:19Z" level=info msg="auto snapshotter: using overlayfs"
2022/02/07 21:10:34 panic: runtime error: invalid memory address or nil pointer dereference
2022/02/07 21:10:34 [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xf2c6ec]
2022/02/07 21:10:34 
2022/02/07 21:10:34 goroutine 21 [running]:
2022/02/07 21:10:34 github.com/containernetworking/cni/pkg/invoke.(*DefaultExec).FindInPath(0x0, {0xc0008b5538, 0x0}, {0xc000972bf0, 0x7f386d7ee750, 0x58})
2022/02/07 21:10:34 	<autogenerated>:1 +0x2c
2022/02/07 21:10:34 github.com/containernetworking/cni/libcni.(*CNIConfig).addNetwork(0xc0003f4a00, {0x151db78, 0xc000044068}, {0xc0008b51d0, 0xc}, {0xc0008b51c0, 0x5}, 0xc000978220, {0x0, 0x0}, ...)
2022/02/07 21:10:34 	/src/vendor/github.com/containernetworking/cni/libcni/api.go:395 +0x13a
2022/02/07 21:10:34 github.com/containernetworking/cni/libcni.(*CNIConfig).AddNetworkList(0xc0006324b0, {0x151db78, 0xc000044068}, 0xc000956d80, 0x0)
2022/02/07 21:10:34 	/src/vendor/github.com/containernetworking/cni/libcni/api.go:422 +0x10b
2022/02/07 21:10:34 github.com/containerd/go-cni.(*Network).Attach(0xc000974a50, {0x151db78, 0xc000044068}, 0x0)
2022/02/07 21:10:34 	/src/vendor/github.com/containerd/go-cni/namespace.go:33 +0x6d
2022/02/07 21:10:34 github.com/containerd/go-cni.asynchAttach({0x151db78, 0xc000044068}, 0x0, 0x0, 0x0, 0x0, 0x0)
2022/02/07 21:10:34 	/src/vendor/github.com/containerd/go-cni/cni.go:165 +0x85
2022/02/07 21:10:34 created by github.com/containerd/go-cni.(*libcni).attachNetworks
2022/02/07 21:10:34 	/src/vendor/github.com/containerd/go-cni/cni.go:177 +0xb3
2022/02/07 21:10:34 > stopped 2022-02-07 21:10:19.177142347 +0000 UTC m=+167.774630403 exit status 2 2
2022/02/07 21:10:34 stdout: /usr/bin/buildkitd

https://github.com/moby/buildkit/runs/5099972572?check_suite_focus=true#step:6:1362

@tonistiigi
Copy link
Member

@dmcgowan @AkihiroSuda @mikebrow Ideas? Seems to be somehow related to containerd/go-cni#76

@thaJeztah
Copy link
Member

FWIW; I did a quick post on the CNCF slack to ask @mikebrow if he had any ideas https://cloud-native.slack.com/archives/CGEQHPYF4/p1644366840395359

From there;

trying to figure out why finding a plugin in a directory would cause a seg fault
best guess I have is the paths it is looking in causes the seg fault
or maybe the default exec isn't mapped right?

I haven't looked closely myself, but if anyone on this PR has ideas, feel free to continue the chat there as well 😅 (I'm just trying to get some visibility on the issue)

@mikebrow
Copy link

mikebrow commented Feb 9, 2022

checking over other possible causes we updated to github.com/containernetworking/plugins v1.0.1 .. https://github.com/containerd/containerd/blob/main/go.mod#L23 (I should've mentioned the reason I noted this is your log showed you are still on plugins v0.9.1 https://github.com/moby/buildkit/runs/5044909831?check_suite_focus=true#step:6:260)

after upgrading the cni api to v1.0.1.. (which is really still 1.0.0 as far as what the supported config version is) https://github.com/containerd/go-cni/blame/main/go.mod#L4

I also note the cni config you guys have here seems .. out of date and actually might not be supported anymore in v1.0.0 +
https://github.com/moby/buildkit/blob/master/hack/fixtures/cni.json#L2 says "cniVersion": "0.4.0",

I think that should be 1.0.0 now and read more like this:
https://github.com/containerd/containerd/blob/main/script/setup/install-cni#L36-L69

@crazy-max
Copy link
Member

@jedevc we have pushed some changes, can you rebase?

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
fuweid added a commit to fuweid/go-cni that referenced this pull request Feb 10, 2022
Before this patch, the package doesn't init CNIConfig.exec field. It is
lazy init by CNIConfig.ensureExec() during AddNetworkList. After
enhancement [1] , there are always more than two goroutines(one for
loopback and other one for eth0) to call ensureExec(). Go runtime
doesn't ensure that value-assignment is atomic, and then it is possible
to use nil and panic, like [2].

> Programs that modify data being simultaneously accessed by multiple
> goroutines must serialize such access.
>
> from https://golang.org/ref/mem Advice section.

I think it should be fixed in github.com/containernetworking/cni library
but we need to delivery containerd release 1.6 version in time. I would
like to init CNIConfig like what the `ensureExec()` [3] does instead of lazy
init.

------------

How to reproduce it:

```golang
package main

import "sync"

type Exec interface {
	FindInPath(v int)
}

type rawExec struct {
	v int
}

func (r *rawExec) FindInPath(v int) {
	r.v += v
}

type cniConfig struct {
	exec Exec
}

func (c *cniConfig) addNetwork(v int) {
	if c.exec == nil {
		c.exec = &rawExec{}
	}
	c.exec.FindInPath(v)
}

func main() {
	c := &cniConfig{}

	for i := 0; i < 100000; i++ {
		c.exec = nil // reset

		var wg sync.WaitGroup
		for j := 0; j <= 10; j++ {
			wg.Add(1)
			go func(k int) {
				defer wg.Done()

				c.addNetwork(k)
			}(j)
		}
		wg.Wait()
	}
}
```

And run it

```bash
➜  /tmp go version
go version go1.17.3 linux/amd64

➜  /tmp go run main.go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x455a40]

goroutine 264932 [running]:
main.(*rawExec).FindInPath(0x0, 0x0)
	/tmp/main.go:14
main.(*cniConfig).addNetwork(...)
	/tmp/main.go:25
main.main.func1(0x0)
	/tmp/main.go:40 +0xb6
created by main.main
	/tmp/main.go:37 +0xb8
exit status 2
➜  /tmp

```

Root Cause:

The panic meesage is `main.(*rawExec).FindInPath(0x0, 0x0)` which means
that the interface `exec` var has the underly type `rawExec` but its
value is zero `FindInPath(0x0`, like what [2] mentioned.

```golang-assembly
// go tool compile -S ./main.go
//
"".(*cniConfig).addNetwork STEXT size=148 args=0x10 locals=0x20 funcid=0x0
	...
	...

	# (AX) is cniConfig address -> checkout c.exec == nil
        0x0014 00020 (./main.go:22)     CMPQ    (AX), $0
        0x0018 00024 (./main.go:22)     JNE     95

	# if c.exec == nil
	#
	# Since the go interface is struct iface
	#
	# https://github.com/golang/go/blob/release-branch.go1.17/src/runtime/runtime2.go#L202
	#
	# type iface struct {
    	# 	tab  *itab
    	#	data unsafe.Pointer
	# }
	#
	# The value-assignment will be two instructions(one for tab and
	# the other one for data.
        0x001a 00026 (./main.go:21)     MOVQ    AX, "".c+40(SP)
        0x001f 00031 (./main.go:25)     MOVQ    BX, ""..autotmp_4+16(SP)
	#
	# AX will be rawExec type information
	#
	#   	type."".rawExec SRODATA size=120
	#		...
        #	rel 24+8 t=1 runtime.memequal64·f+0
        #	rel 32+8 t=1 runtime.gcbits.+0
        #	rel 40+4 t=5 type..namedata.*main.rawExec-+0
        #	rel 44+4 t=5 type.*"".rawExec+0
        #	rel 48+8 t=1 type..importpath."".+0
        #	rel 56+8 t=1 type."".rawExec+96
        #	rel 80+4 t=5 type..importpath."".+0
        #	rel 96+8 t=1 type..namedata.v-+0
        #	rel 104+8 t=1 type.int+0
	#
        0x0024 00036 (./main.go:23)     LEAQ    type."".rawExec(SB), AX
        0x002b 00043 (./main.go:23)     PCDATA  $1, $0
        0x002b 00043 (./main.go:23)     CALL    runtime.newobject(SB)
	#
	# CX will be itab and get FindInPath TEXT address.
	#
	# 	go.itab.*"".rawExec,"".Exec SRODATA dupok size=32
	#		...
        # 	rel 0+8 t=1 type."".Exec+0
        #	rel 8+8 t=1 type.*"".rawExec+0
        #	rel 24+8 t=-32767 "".(*rawExec).FindInPath+0
	#
        0x0030 00048 (./main.go:23)     LEAQ    go.itab.*"".rawExec,"".Exec(SB), CX
        0x0037 00055 (./main.go:23)     MOVQ    "".c+40(SP), DX
        0x003c 00060 (./main.go:23)     MOVQ    CX, (DX)
        0x003f 00063 (./main.go:23)     PCDATA  $0, $-2
        0x003f 00063 (./main.go:23)     CMPL    runtime.writeBarrier(SB), $0
        0x0046 00070 (./main.go:23)     JNE     78
        0x0048 00072 (./main.go:23)     MOVQ    AX, 8(DX)
        0x004c 00076 (./main.go:23)     JMP     87
        0x004e 00078 (./main.go:23)     LEAQ    8(DX), DI
        0x0052 00082 (./main.go:23)     CALL    runtime.gcWriteBarrier(SB)
        0x0057 00087 (./main.go:25)     PCDATA  $0, $-1
        0x0057 00087 (./main.go:25)     MOVQ    DX, AX
        0x005a 00090 (./main.go:25)     MOVQ    ""..autotmp_4+16(SP), BX

	# if c.exec != nil and try to call FindInPath
	#
	# (AX) is cniConfig
	#
        0x005f 00095 (./main.go:25)     MOVQ    (AX), CX
	#
	# 8(AX) is rawExec address (receiver)
	#
        0x0062 00098 (./main.go:25)     MOVQ    8(AX), AX
	#
	# 24(CX) is the FindInPath TEXT address.
	#
        0x0066 00102 (./main.go:25)     MOVQ    24(CX), CX
        0x006a 00106 (./main.go:25)     PCDATA  $1, $1
        0x006a 00106 (./main.go:25)     CALL    CX
        0x006c 00108 (./main.go:26)     MOVQ    24(SP), BP
        0x0071 00113 (./main.go:26)     ADDQ    $32, SP
        0x0075 00117 (./main.go:26)     RET
	...
	...
```

Since there are multiple goroutines to check `c.exec == nil`, if the
lazy-init only assigns iface.tab and the FindInPath TEXT address,
interface's value is still nil and the FindInPath will use the
interface's value as method receiver. It will panic if try to read the
value from receiver.

Reference:

[1] containerd#76
[2] moby/buildkit#2589 (comment)
[3] https://github.com/containernetworking/cni/blob/1694fd7b57e0176a98a12823a5ffc03337fdc152/libcni/api.go#L182

Fixes: containerd#76

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@fuweid
Copy link
Contributor

fuweid commented Feb 10, 2022

Try to fix it in go-cni containerd/go-cni#82

@tonistiigi tonistiigi merged commit a9c6a48 into moby:master Feb 10, 2022
@thaJeztah
Copy link
Member

Whoop!

And thanks, @fuweid!

@jedevc jedevc deleted the heredoc-stabilize branch February 11, 2022 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dockerfile: bring heredocs out of labs
7 participants