Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid memory address or nil pointer dereference #481

Closed
tedyu opened this issue Apr 16, 2020 · 14 comments · Fixed by #482
Closed

invalid memory address or nil pointer dereference #481

tedyu opened this issue Apr 16, 2020 · 14 comments · Fixed by #482

Comments

@tedyu
Copy link
Contributor

tedyu commented Apr 16, 2020

What happend:

Apr 16 19:03:00.175064 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: 2020-04-16T19:03:00Z [verbose] Add: openshift-console:downloads-dc9498995-p4vgn:unknownUID:(openshift-sdn):eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/proc/20411/ns/net"}],"ips":[{"version":"4","interface":0,"address":"10.129.0.18/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.129.0.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}
Apr 16 19:03:00.177321 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: panic: runtime error: invalid memory address or nil pointer dereference
Apr 16 19:03:00.177321 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa62075]
Apr 16 19:03:00.177321 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: goroutine 1 [running]:
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: panic(0x11ed400, 0x1e95e80)
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/lib/golang/src/runtime/panic.go:722 +0x2c2 fp=0xc000243300 sp=0xc000243270 pc=0x433e32
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: runtime.panicmem(...)
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/lib/golang/src/runtime/panic.go:199
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: runtime.sigpanic()
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/lib/golang/src/runtime/signal_unix.go:394 +0x3ec fp=0xc000243330 sp=0xc000243300 pc=0x448b5c
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: github.com/intel/multus-cni/vendor/k8s.io/api/core/v1.(*Pod).GetObjectKind(0x0, 0xc000243378, 0x4e4ec1)
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         <autogenerated>:1 +0x5 fp=0xc000243338 sp=0xc000243330 pc=0xa62075
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/reference.GetReference(0xc00013bce0, 0x1512ba0, 0x0, 0x4edb6b, 0xc0003e84e0, 0x115ef60)
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/src/multus-cni/gopath/src/github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/reference/ref.go:50 +0x73 fp=0xc0002434a8 sp=0xc000243338 pc=0xf1c073
Apr 16 19:03:00.177545 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/record.(*recorderImpl).generateEvent(0xc0002d0440, 0x1512ba0, 0x0, 0x0, 0xbf9e48d90a6f6fc5, 0x4e12d0a6c3, 0x1ead580, 0x135b538, 0x6, 0x1363a2a, ...)
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/src/multus-cni/gopath/src/github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/record/event.go:257 +0x5d fp=0xc000243590 sp=0xc0002434a8 pc=0xf9d7cd
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/record.(*recorderImpl).Event(0xc0002d0440, 0x1512ba0, 0x0, 0x135b538, 0x6, 0x1363a2a, 0xe, 0xc0003da140, 0x19)
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/src/multus-cni/gopath/src/github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/record/event.go:287 +0xc2 fp=0xc000243608 sp=0xc000243590 pc=0xf9dc32
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/record.(*recorderImpl).Eventf(0xc0002d0440, 0x1512ba0, 0x0, 0x135b538, 0x6, 0x1363a2a, 0xe, 0x135ebcb, 0x9, 0xc0000e44c0, ...)
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/src/multus-cni/gopath/src/github.com/intel/multus-cni/vendor/k8s.io/client-go/tools/record/event.go:291 +0xca fp=0xc000243660 sp=0xc000243608 pc=0xf9dd1a
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: github.com/intel/multus-cni/k8sclient.(*ClientInfo).Eventf(...)
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/src/multus-cni/gopath/src/github.com/intel/multus-cni/k8sclient/k8sclient.go:89
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: main.delegateAdd(0x0, 0x0, 0xc0002d0480, 0x0, 0xc00004001b, 0x4, 0xc000178d00, 0xc000308120, 0xc000244560, 0xf, ...)
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/src/multus-cni/gopath/src/github.com/intel/multus-cni/multus/multus.go:362 +0xed1 fp=0xc000243970 sp=0xc000243660 pc=0x10c6cd1
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: main.cmdAdd(0xc0002264d0, 0x0, 0x0, 0xc0002d0480, 0xc0000b4000, 0x5, 0x5, 0x0)
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]:         /usr/src/multus-cni/gopath/src/github.com/intel/multus-cni/multus/multus.go:522 +0x829 fp=0xc000243c48 sp=0xc000243970 pc=0x10c9859
Apr 16 19:03:00.177973 ci-op-nqdgf-m-0.c.openshift-gce-devel-ci.internal crio[1370]: main.main.func1(0xc0002264d0, 0xc000041a88, 0x5)

What you expected to happen:
Logging event shouldn't crash.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
See masters-journal in:
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.5/798/artifacts/e2e-gcp/nodes/

Environment:

  • Multus version
    image path and image ID (from 'docker images')
  • Kubernetes version (use kubectl version):
  • Primary CNI for Kubernetes cluster:
  • OS (e.g. from /etc/os-release):
Apr 16 18:45:21.945896 localhost kernel: Linux version 4.18.0-147.5.1.el8_1.x86_64 (mockbuild@x86-vm-07.build.eng.bos.redhat.com) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4) (GCC)) #1 SMP Tue Jan 14 15:50:19 UTC 2020
  • File of '/etc/cni/net.d/'
  • File of '/etc/cni/multus/net.d'
  • NetworkAttachment info (use kubectl get net-attach-def -o yaml)
  • Target pod yaml info (with annotation, use kubectl get pod <podname> -o yaml)
  • Other log outputs (if you use multus logging)
@tedyu
Copy link
Contributor Author

tedyu commented Apr 17, 2020

It seems Pod parameter may be nil - see the following from delegateAdd().

		podUID := "unknownUID"
		if pod != nil {

However, GetReference() checks obj against nil at the beginning:

func GetReference(scheme *runtime.Scheme, obj runtime.Object) (*v1.ObjectReference, error) {
	if obj == nil {
		return nil, ErrNilObject
	}

@tedyu
Copy link
Contributor Author

tedyu commented Apr 17, 2020

Looking at staging/src/k8s.io/client-go/tools/reference/ref.go from k8s, the following code:

        var listMeta metav1.Common
        objectMeta, err := meta.Accessor(obj)
        if err != nil {
                listMeta, err = meta.CommonAccessor(obj)

is not in vendor/k8s.io/client-go/tools/reference/ref.go

Probably the golang version is a hurdle to adopting newer k8s release for multus .

@s1061123

Do you have some suggestion how vendor/k8s.io/client-go/tools/reference/ref.go should evolve ?

Thanks

@tedyu
Copy link
Contributor Author

tedyu commented Apr 17, 2020

fix-get-ref.txt

@tedyu
Copy link
Contributor Author

tedyu commented Apr 17, 2020

fix-get-ref.txt is the application of the following from k8s

commit 399d09ce4ad1728fcecdce09503a07cf1bfecef6
Fix GetReference function

to vendor/k8s.io/client-go/tools/reference/ref.go

But it seems with the above patch, obj.GetObjectKind() would be called without obj being changed (compared to what multus has now). The panic would still likely occur.

@tedyu
Copy link
Contributor Author

tedyu commented Apr 17, 2020

One option I can think of is to add recovery around the calls to Eventf:

	if delegate.Name != "" {
		kubeClient.Eventf(pod, v1.EventTypeNormal, "AddedInterface", "Add %s %v from %s", rt.IfName, ips, delegate.Name)

@s1061123
Copy link
Member

@tedyu Thank you for your report and analysis! Let me try to fix that.

@s1061123
Copy link
Member

Now trying to update lib to v1.18.0, including vendored packages....

@tedyu
Copy link
Contributor Author

tedyu commented Apr 17, 2020

k8s 1.18 is on go 1.13
multus is on go 1.12

Not sure about the impact for upgrading golang version.

I ran multus test suite which passed. Not sure if the kubeClient.Eventf() call is exercised.
If you can tell me how to retrieve test output for multus/multus_test.go, I will dig deeper.

Thanks

@s1061123
Copy link
Member

@tedyu Currently we don't test the kubeClient.Eventf() because it invokes just Eventf of client-go, so I'm going to add some test in its fix.

BTW, in intel/multus-cni uses go1.13 to build multus-cni. Which one do you mention (that builds with go1.12)? OpenShift or some?

@s1061123
Copy link
Member

oh, I got it. you mentioned that container build (Dockerfile)...

@tedyu
Copy link
Contributor Author

tedyu commented Apr 17, 2020

w.r.t. golang version, I saw the following in go.mod:

go 1.12

I took a look at vendor/k8s.io/api/core/v1/generated.pb.go but didn't find (*Pod).GetObjectKind
In k8s, I checked ./staging/src/k8s.io/api/core/v1/generated.pb.go which doesn't have GetObjectKind either.

Please publish your test so that we know the crash can be reproduced locally.

Thanks

@s1061123
Copy link
Member

@tedyu thank you for your advice! sure. I'll add test first.

@tedyu
Copy link
Contributor Author

tedyu commented Apr 19, 2020

Contemplating possible cause for nil dereference, that may happen if Pod.GetObjectKind tries to dereference some pointer.

Looking at GetObjectKind methods in k8s, one example is the following:

diff --git a/staging/src/k8s.io/apimachinery/pkg/runtime/embedded.go b/staging/src/k8s.io/apimachinery/pkg/runtime/embedded.go
index 7251e65f6e0..bdbf1f9fee7 100644
--- a/staging/src/k8s.io/apimachinery/pkg/runtime/embedded.go
+++ b/staging/src/k8s.io/apimachinery/pkg/runtime/embedded.go
@@ -29,7 +29,12 @@ type encodable struct {
        versions []schema.GroupVersion
 }

-func (e encodable) GetObjectKind() schema.ObjectKind { return e.obj.GetObjectKind() }
+func (e encodable) GetObjectKind() schema.ObjectKind {
+       if e.obj == nil {
+               return schema.EmptyObjectKind
+       }
+       return e.obj.GetObjectKind()
+}
 func (e encodable) DeepCopyObject() Object {
        out := e
        out.obj = e.obj.DeepCopyObject()

Not sure if this applies to Pod.GetObjectKind since I haven't found its body.

@s1061123
Copy link
Member

I am not sure but GetReference() invokes GetObjectKind() here, so it might be the one.

https://github.com/kubernetes/client-go/blob/9bea17718df8234c2556ffc1ea7416c13effc844/tools/reference/ref.go#L50

So the candidate flows seems to be
clientInfo.Eventf() -> recorder.Eventf() -> recorder.generateEvent() -> ref.GetReference()`

s1061123 added a commit to s1061123/multus-cni that referenced this issue Apr 20, 2020
To troubleshooting k8snetworkplumbingwg#481, adds error message for further analysis.
s1061123 added a commit that referenced this issue Apr 21, 2020
To troubleshooting #481, adds error message for further analysis.
dougbtv pushed a commit that referenced this issue Jun 18, 2020
To troubleshooting #481, adds error message for further analysis.
dougbtv pushed a commit that referenced this issue Jun 18, 2020
To troubleshooting #481, adds error message for further analysis.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants