Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOB-K8S-023: kubelet crash due to improperly handled errors #81135

Open
cji opened this issue Aug 8, 2019 · 15 comments
Open

TOB-K8S-023: kubelet crash due to improperly handled errors #81135

cji opened this issue Aug 8, 2019 · 15 comments

Comments

@cji
Copy link
Member

@cji cji commented Aug 8, 2019

This issue was reported in the Kubernetes Security Audit Report

Description
The kubelet will periodically poll a directory for its disk usage with the GetDirDiskUsage function. To do this, it parses the STDOUT of the ionice command. If there is an error when reading from STDOUT, the error is logged, but execution continues (Figure 2). Due to this continuation, STDOUT is parsed as an empty string, then indexed (Figure 3), resulting in an out-of-bounds (OOB) panic (Figure 1).

E0320 19:31:54.493854    6450 fs.go:591] Failed to read from stdout for cmd [ionice -c3 nice -n 19 du -s /var/lib/docker/overlay2/bbfc9596c0b12fb31c70db5ffdb78f47af303247bea7b93eee2cbf9062e307d8/diff] - read |0: bad file descriptor
panic: runtime error: index out of range

goroutine 289 [running]:
k8s.io/kubernetes/vendor/github.com/google/cadvisor/fs.GetDirDiskUsage(0xc001192c60, 0x5e, 0x1bf08eb000, 0x1, 0x0, 0xc0011a7188)
    /workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/fs/fs.go:600 +0xa86
k8s.io/kubernetes/vendor/github.com/google/cadvisor/fs.(*RealFsInfo).GetDirDiskUsage(0xc000bdbb60, 0xc001192c60, 0x5e, 0x1bf08eb000, 0x0, 0x0, 0x0)
    /workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/fs/fs.go:565 +0x89
k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/common.(*realFsHandler).update(0xc000ee7560, 0x0, 0x0)
    /workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/common/fsHandler.go:82 +0x36a
k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/common.(*realFsHandler).trackUsage(0xc000ee7560)
    /workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/common/fsHandler.go:120 +0x13b
created by
k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/common.(*realFsHandler).Start
    /workspace/anago-v1.13.4-beta.0.55+c27b913fddd1a6/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/google/cadvisor/container/common/fsHandler.go:142 +0x3f

Figure 25.1: Stacktrace of a kubelet crash resulting from a bad file descriptor.

stdoutb, souterr := ioutil.ReadAll(stdoutp)
if souterr != nil {
	klog.Errorf("Failed to read from stdout for cmd %v - %v", cmd.Args, souterr)
}

Figure 25.2: Only the error is logged, execution flow is not affected by the error.

usageInKb, err := strconv.ParseUint(strings.Fields(stdout)[0], 10, 64)

Figure 25.3: stdout is indexed, even if it is empty.

Additionally, if the command produces no output for any reason, the command will also fail due to an empty string being indexed.

Exploit Scenario
The ionice command fails to execute as expected, resulting in a kubelet crash.

Recommendation
Short term, ensure stdout is validated before attempting to parse the output.

Long term, improve unit testing to cover failures of dependent tooling.

Anything else we need to know?:

See #81146 for current status of all issues created from these findings.

The vendor gave this issue an ID of TOB-K8S-023 and it was finding 27 of the report.

The vendor considers this issue Low Severity.

To view the original finding, begin on page 70 of the Kubernetes Security Review Report

Environment:

  • Kubernetes version: 1.13.4
@tedyu
Copy link
Contributor

@tedyu tedyu commented Aug 8, 2019

The referenced code seems to come from certain vendor.

@joelsmith
Copy link
Contributor

@joelsmith joelsmith commented Aug 8, 2019

/sig node
/area security
/kind bug
/remove-kind feature

@zouyee
Copy link
Member

@zouyee zouyee commented Aug 8, 2019

/area cadvisor

@liggitt liggitt changed the title kubelet crash due to improperly handled errors TOB-K8S-023: kubelet crash due to improperly handled errors Aug 8, 2019
@tallclair
Copy link
Member

@tallclair tallclair commented Aug 12, 2019

/cc @dashpole

@dashpole
Copy link
Contributor

@dashpole dashpole commented Aug 12, 2019

This is no longer relevant since we now walk the filesystem directly: https://github.com/google/cadvisor/blob/master/fs/fs.go#L552

@dashpole
Copy link
Contributor

@dashpole dashpole commented Aug 12, 2019

We should make sure this same problem doesn't exist with the kubelet's empty-dir monitoring code, which still uses du: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/fs/fs.go#L59

@tedyu
Copy link
Contributor

@tedyu tedyu commented Aug 12, 2019

@dashpole
Code from cadvisor calculates bytes and inodes.
For kubelet, would int64Amount of the quantity store the value for bytes ?

thanks

@dashpole
Copy link
Contributor

@dashpole dashpole commented Aug 21, 2019

/assign @dashpole

@fejta-bot
Copy link

@fejta-bot fejta-bot commented Nov 19, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@joelsmith
Copy link
Contributor

@joelsmith joelsmith commented Nov 20, 2019

/remove-lifecycle stale

@fejta-bot
Copy link

@fejta-bot fejta-bot commented Feb 18, 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@tedyu
Copy link
Contributor

@tedyu tedyu commented Feb 18, 2020

@dashpole
If you're busy, I can try to tackle this issue.

@tedyu
Copy link
Contributor

@tedyu tedyu commented Feb 18, 2020

After obtaining the bytes using code from https://github.com/google/cadvisor/blob/master/fs/fs.go#L552 , I convert the bytes with following code:

	used, err := resource.ParseQuantity(strconv.FormatInt(int64(bytes), 10))

	used.Format = resource.BinarySI

Is the above correct ?

@fejta-bot
Copy link

@fejta-bot fejta-bot commented Mar 19, 2020

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@cji
Copy link
Member Author

@cji cji commented Mar 24, 2020

/lifecycle frozen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

9 participants
You can’t perform that action at this time.