proposal of coredump detector #1311

Closed
wants to merge 2 commits into
base: master
from


CaoShuFeng (Member) commented Nov 1, 2017

This proposal implements core dump isolation in a Kubernetes cluster.
It works as an add-on that can be deployed into a Kubernetes cluster.

A demo of this design has been implemented and is available here.

With this add-on deployed, when a core dump happens in a pod, users can get the core dump information with a kubectl command:
[screenshot: coredump-metadata]

and check the quota:
[screenshot: coredump-quota]

Partial-fix: kubernetes/kubernetes#48787
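The description does not spell out the data the add-on stores or how its quota is enforced. As a minimal sketch, assuming a hypothetical CoredumpInfo record whose fields come from the core_pattern specifiers (%P, %p, %e, %t) plus the owning pod and container, and a hypothetical byteQuota tracker, a size-based per-namespace quota (the "2Gi of core dump files in namespace A" case raised in the review) could be accounted like this. It is an illustration, not the controller from the demo.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// CoredumpInfo is a hypothetical shape for the per-dump metadata the add-on
// could expose; the field names are illustrative, not the proposal's schema.
type CoredumpInfo struct {
	Namespace  string    `json:"namespace"`
	Pod        string    `json:"pod"`
	Container  string    `json:"container"`
	HostPID    int       `json:"hostPID"`    // core_pattern %P
	PID        int       `json:"pid"`        // core_pattern %p
	Executable string    `json:"executable"` // core_pattern %e
	Time       time.Time `json:"time"`       // core_pattern %t
	SizeBytes  int64     `json:"sizeBytes"`
}

// byteQuota (hypothetical) tracks bytes of stored core files per namespace
// against a per-namespace limit: a size-based quota rather than a count quota.
type byteQuota struct {
	limit map[string]int64
	used  map[string]int64
}

func newByteQuota() *byteQuota {
	return &byteQuota{limit: map[string]int64{}, used: map[string]int64{}}
}

// Admit records the dump and returns true if it fits in the namespace's
// remaining budget; namespaces without a limit are treated as unlimited.
func (q *byteQuota) Admit(d CoredumpInfo) bool {
	limit, ok := q.limit[d.Namespace]
	if ok && q.used[d.Namespace]+d.SizeBytes > limit {
		return false
	}
	q.used[d.Namespace] += d.SizeBytes
	return true
}

func main() {
	const gi = int64(1) << 30
	q := newByteQuota()
	q.limit["A"] = 2 * gi // "allow 2Gi of core dump files in namespace A"

	d := CoredumpInfo{Namespace: "A", Pod: "web-0", Container: "app",
		HostPID: 4242, PID: 1, Executable: "server",
		Time: time.Unix(1509494400, 0).UTC(), SizeBytes: 1536 << 20}
	fmt.Println(q.Admit(d)) // 1.5Gi fits in 2Gi: true
	fmt.Println(q.Admit(d)) // a second 1.5Gi would exceed 2Gi: false

	out, _ := json.Marshal(d)
	fmt.Println(string(out))
}
```

A count-based limit such as count/coredumps.coredump would instead compare the number of stored records against the limit; tracking bytes is what the resource.Quantity question in the review is about.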

# coredump-controller
Currently, CRDs in Kubernetes do not support quota, so we deploy a controller to work as

derekwaynecarr (Member) commented Nov 1, 2017

The plan is to support this soon, once we can have a shared cache with garbage collection.

So in your case, we would simply set a quota on count/coredumps.coredump.

CaoShuFeng (Member) commented Nov 2, 2017

Hi, I will update this according to the new feature.

Does quota for CRDs support resource.Quantity?
For example: "we allow 2Gi of core dump files in namespace A",
rather than only: "we allow 200 core dump files in namespace A".

CaoShuFeng (Member) commented Dec 4, 2017

/cc @vishh for suggestions.

CaoShuFeng (Member) commented Dec 13, 2017

/assign @dchen1107

To determine whether a core file was generated by a process in a k8s container, we
override the /proc/sys/kernel/core_pattern kernel parameter on the kubelet node:
```
|/coredump/coredump-detector -P=%P -p=%p -e=%e -t=%t -c=/coredump/config --log_dir=/coredump/
```

lucab commented Dec 19, 2017

Which mount namespace would this /coredump path belong to? In particular, are you assuming that it will be in the pod-app mount namespace, in the kubelet mount namespace, or in the kernel-init/host one?

CaoShuFeng (Member) commented Dec 19, 2017

The kernel does not support namespaces for core dumps, so this would be the kernel-init/host namespace.
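The pipe-style core_pattern line above can be sketched in Go. This is a minimal sketch, not the demo's coredump-detector: it assumes the specifier meanings documented in core(5) (%P is the PID in the initial PID namespace, %p the PID in the process's own PID namespace, %e the executable name, %t the dump time in seconds since the Epoch) and that the kernel splits the text after "|" on spaces into arguments. The storage directory coreDir and the file-naming scheme are hypothetical. Because the helper runs on the host, /coredump/... resolves in the host mount namespace.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// coreDir is a hypothetical host directory for stored core files.
const coreDir = "/coredump/cores"

// dumpArgs holds the values the kernel substitutes into core_pattern.
type dumpArgs struct {
	HostPID int    // %P: PID in the initial PID namespace
	PID     int    // %p: PID in the dumping process's own PID namespace
	Exe     string // %e: executable (comm) name
	Time    int64  // %t: dump time, seconds since the Epoch
	Config  string // -c: detector config file
	LogDir  string // --log_dir: detector log directory
}

// corePattern builds the pipe-style core_pattern value from the proposal.
// Writing it to /proc/sys/kernel/core_pattern requires root on the node.
func corePattern(detector, config, logDir string) string {
	return fmt.Sprintf("|%s -P=%%P -p=%%p -e=%%e -t=%%t -c=%s --log_dir=%s",
		detector, config, logDir)
}

// parseArgs parses the arguments the kernel passes to the pipe helper.
// Go's flag package accepts both -name and --name forms.
func parseArgs(args []string) (dumpArgs, error) {
	var a dumpArgs
	fs := flag.NewFlagSet("coredump-detector", flag.ContinueOnError)
	fs.IntVar(&a.HostPID, "P", 0, "PID in the initial PID namespace")
	fs.IntVar(&a.PID, "p", 0, "PID in the process's own PID namespace")
	fs.StringVar(&a.Exe, "e", "", "executable name")
	fs.Int64Var(&a.Time, "t", 0, "dump time in seconds since the Epoch")
	fs.StringVar(&a.Config, "c", "", "config file path")
	fs.StringVar(&a.LogDir, "log_dir", "", "log directory")
	err := fs.Parse(args)
	return a, err
}

// saveCore streams the core image from r into coreDir and returns its path.
func saveCore(a dumpArgs, r io.Reader) (string, error) {
	// %e is a comm name and may contain '/', so sanitize it for a file name.
	name := fmt.Sprintf("core.%s.%d.%d",
		strings.ReplaceAll(a.Exe, "/", "_"), a.HostPID, a.Time)
	path := filepath.Join(coreDir, name)
	f, err := os.Create(path)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(f, r); err != nil {
		f.Close()
		return "", err
	}
	return path, f.Close()
}

func main() {
	a, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// The kernel writes the core image to the helper's standard input.
	if _, err := saveCore(a, os.Stdin); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Registering the helper would then mean writing the string returned by corePattern to /proc/sys/kernel/core_pattern as root on each node, for example from a privileged DaemonSet pod; that deployment detail is an assumption, not something the proposal text above specifies.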

CaoShuFeng (Member) commented Jan 22, 2018

@luxas Do you have any suggestions about this?

When a core dump happens, the Linux kernel calls coredump-detector and passes the core
dump file to it on standard input.
coredump-detector will:
* access the Docker API and determine which container the core dump comes from

vikaschoudhary16 (Member) commented Jan 22, 2018

s/docker/runtime

CaoShuFeng (Member) commented Jan 22, 2018

Good suggestion.
Kubelet now supports multiple container runtimes, so I need to update this to support different container runtimes too.

CaoShuFeng added some commits Nov 1, 2017

CaoShuFeng force-pushed the CaoShuFeng:coredump-daemonset branch from 93a9b85 to b003587 on Jan 30, 2018

k8s-ci-robot (Contributor) commented Jan 30, 2018

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: CaoShuFeng
We suggest the following additional approver: brendandburns

Assign the PR to them by writing /assign @brendandburns in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

fejta-bot commented Apr 30, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot commented May 30, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot commented Aug 28, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot commented Sep 27, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot commented Nov 21, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot (Contributor) commented Nov 21, 2018

@fejta-bot: Closed this PR.

