
Proposal: add pod lifecycle event generator for kubelet #12802

Merged
merged 1 commit into from Feb 2, 2016

Conversation


@yujuhong yujuhong commented Aug 17, 2015

No description provided.

@googlebot googlebot added the cla: yes label Aug 17, 2015

@k8s-bot k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 96d3e54.

@yujuhong yujuhong force-pushed the yujuhong:kubelet_proposal branch 2 times, most recently from bf40ff2 to 00fcbb0 Aug 17, 2015

@k8s-bot k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit bf40ff2.


@brendandburns brendandburns commented Aug 17, 2015


@k8s-bot k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 00fcbb0.

@yujuhong yujuhong force-pushed the yujuhong:kubelet_proposal branch from 00fcbb0 to 6d21bdc Aug 17, 2015

@k8s-bot k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 6d21bdc.

@yujuhong yujuhong force-pushed the yujuhong:kubelet_proposal branch from 6d21bdc to 7637190 Aug 17, 2015
@yujuhong yujuhong force-pushed the yujuhong:kubelet_proposal branch from 7637190 to 919dbf6 Aug 17, 2015

@k8s-bot k8s-bot commented Aug 17, 2015

GCE e2e build/test failed for commit 7637190.


@k8s-bot k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 919dbf6.

- Probing: Now that pod workers only wake up on events, we need to
rethink how liveness/readiness probing should be implemented. One
solution is to create a separate probing component that can be
plugged in to PLEG to generate corresponding pod lifecycle

ironcladlou (Contributor) commented Aug 19, 2015

This is cool. I like the idea of a plugin to present the probe outcomes as events.
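One way to read the probing-plugin idea above: the prober would push synthetic lifecycle events into the same stream PLEG feeds pod workers from, so an unhealthy probe wakes a worker just like a container event does. A minimal Go sketch, with every type, field, and event name invented here for illustration (not the actual kubelet API):

```go
package main

import "fmt"

// Hypothetical event types mirroring the proposal's PodLifeCycleEventType;
// the names are illustrative only.
type PodLifeCycleEventType string

const ContainerUnhealthy PodLifeCycleEventType = "ContainerUnhealthy"

type PodLifecycleEvent struct {
	PodID string
	Type  PodLifeCycleEventType
}

// prober sketches a pluggable probing component: on a failed probe it emits
// an event into the channel that (in this sketch) PLEG drains for pod workers.
type prober struct {
	events chan PodLifecycleEvent
}

func (p *prober) probe(podID string, healthy bool) {
	if !healthy {
		p.events <- PodLifecycleEvent{PodID: podID, Type: ContainerUnhealthy}
	}
}

func main() {
	p := &prober{events: make(chan PodLifecycleEvent, 1)}
	p.probe("pod-1", false)
	e := <-p.events
	fmt.Println(e.PodID, e.Type) // pod-1 ContainerUnhealthy
}
```

The design point is that the pod worker never polls probe state; it only reacts to events, whatever component generated them.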


@ironcladlou ironcladlou commented Aug 19, 2015

This would also enhance the apparent responsiveness of UIs in some cases (openshift/origin#3781). As just one example in OpenShift, a deployment's outcome is tied closely to the status of a Pod, and by proxy its containers; but no matter how fast the deployment container returns, the apparent running time can be inflated artificially by up to 10 seconds depending on where we land in the pod polling.

### Bootstrapping/restart
Upon restart, PLEG needs to perform a reslit to retrieve all container

dchen1107 (Member) commented Aug 20, 2015

Please indicate this is the kubelet restart case here

dchen1107 (Member) commented Aug 20, 2015

s/reslit/relist

dchen1107 (Member) commented Aug 20, 2015

Docker restart is another case we have to detect and handle here too. You need to detect when the event stream is disconnected. When docker is starting up, it might send you tons of events for old containers.
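The docker-restart concern above boils down to: once the event stream disconnects, stale or replayed events can no longer be trusted, so the safe move is a full relist to resynchronize. A toy Go sketch of that fallback, with all names (containerRuntime, consume, relist) invented for illustration:

```go
package main

import "fmt"

// containerRuntime stands in for a runtime the kubelet can always relist.
type containerRuntime struct {
	containers []string
}

func (r *containerRuntime) relist() []string { return r.containers }

// consume drains runtime events until the stream "disconnects" (channel
// closed, e.g. docker restarting), then falls back to a full relist rather
// than trusting any events replayed for old containers.
func consume(events <-chan string, r *containerRuntime) []string {
	var seen []string
	for {
		ev, ok := <-events
		if !ok {
			return append(seen, r.relist()...)
		}
		seen = append(seen, ev)
	}
}

func main() {
	r := &containerRuntime{containers: []string{"c1", "c2"}}
	events := make(chan string, 1)
	events <- "started:c1"
	close(events) // simulate docker restart dropping the stream
	fmt.Println(consume(events, r))
}
```

A real implementation would also debounce the flood of old-container events docker emits while starting back up; the relist gives a consistent snapshot to reconcile against.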


@dchen1107 dchen1107 commented Aug 20, 2015

@yujuhong On the way back home, I realized that cadvisor only supports 4 types of events: containerCreation, containerDeletion, OOM, and OOMKilled, and none of them is the key event you want, or sufficient for you to make any decision:

  • containerCreation doesn't mean we can start a container.
  • containerDeletion is only fired when container GC removes a dead container.
  • The only termination event you have is OOMKilled, fired when a container reaches its cgroup's memory limit.
  • When the system OOMs and kills a random process, you won't have container information. This is a known issue in cAdvisor, but I don't think it is easy to fix.
  • For other failures that cause the termination of a container, there is no event generated.

From the above, you can see we pretty much cannot rely on cAdvisor's event stream at all, and I don't think there is an easy way to add those missing events to cadvisor. Here is the code defining the events:
https://github.com/google/cadvisor/blob/2a022a4a74f4f994c1ae7684de7655aaa0a898ba/info/v1/container.go#L521

Let's just stick with docker events for now. For the rkt case, we could invoke the relist operation all the time until they have event stream support.

cc/ @kubernetes/goog-node

@dchen1107 dchen1107 added the sig/node label Aug 20, 2015

@thereallukl thereallukl commented Aug 20, 2015

Adding such a strong dependency on cadvisor to provide lifecycle information might block adding new container technologies in the future (one might want to add, for example, lxc or qemu), and a dependency of those drivers on something that is just a metering component does not sound very good.

That being said, I prefer option 1 as the source of PLEG events (each runtime provides events by itself).


@vishh vishh commented Aug 20, 2015

I believe the intention is to abstract out container runtimes using the cadvisor interface. Since cadvisor is already trying to handle container runtimes as first-class entities, it seems natural to use it as an abstraction layer.
+1 to @dchen1107's arguments against using raw cadvisor events. Maybe we can extend cadvisor to support docker events?


@yujuhong yujuhong commented Aug 20, 2015

This would also enhance the apparent responsiveness of UIs in some cases (openshift/origin#3781). As just one example in OpenShift, a deployment's outcome is tied closely to the status of a Pod, and by proxy its containers; but no matter how fast the deployment container returns, the apparent running time can be inflated artificially by up to 10 seconds depending on where we land in the pod polling.

This should definitely improve the response latency provided that kubelet is not already overwhelmed.


@dchen1107 dchen1107 commented Aug 20, 2015

Maybe we can extend cadvisor to support docker events?

I am not sure this is a good idea, but I could be convinced. Docker is one implementation of a container runtime; rkt is another. To extend cadvisor to support docker events, we would have to complicate cadvisor to make it composable too.


@vishh vishh commented Aug 20, 2015

We will have to abstract out events from multiple container runtimes in kubelet anyway. I assume your argument is whether cAdvisor is the right abstraction layer?


@yujuhong yujuhong commented Aug 20, 2015

Some clarifications, since we (@vishh, @dchen1107, and I) discussed this offline :-)

@yujuhong On the way back home, I realized that cadvisor only supports 4 types of events: containerCreation, containerDeletion, OOM, and OOMKilled, and none of them is the key event you want, or sufficient for you to make any decision:

containerCreation doesn't mean we can start a container.

containerCreation means that we tried to start a container. It may fail to start because of other reasons (image, etc), but we'll receive a containerDeletion in those cases.

@vishh brought up a valid point that this would make kubelet interpret "container starts" differently than it does today. I agree with that, but personally I think this does not affect how kubelet works internally, and is not that critical.

containerDeletion is only fired when container GC removes a dead container.
The only termination event you have is OOMKilled, fired when a container reaches its cgroup's memory limit.
When the system OOMs and kills a random process, you won't have container information. This is a known issue in cAdvisor, but I don't think it is easy to fix.
For other failures that cause the termination of a container, there is no event generated.

The conclusion from the discussion is that cAdvisor would send out a Deletion event if a container terminates, because cgroups are cleaned up when pid 1 exits (thanks @vishh for pointing that out). I can verify this, since I have tested my prototype by running docker stop on containers; kubelet reacts to that immediately.

From the above, you can see we pretty much cannot rely on cAdvisor's event stream at all, and I don't think there is an easy way to add those missing events to cadvisor. Here is the code defining the events:
https://github.com/google/cadvisor/blob/2a022a4a74f4f994c1ae7684de7655aaa0a898ba/info/v1/container.go#L521

Let's just stick with docker events for now. For the rkt case, we could invoke the relist operation all the time until they have event stream support.


@yujuhong yujuhong commented Aug 20, 2015

We will have to abstract out events from multiple container runtimes in kubelet anyways. I assume your argument is whether cAdvisor is the right abstraction layer?

I agree that we need an abstraction for events across different container runtimes.

cadvisor provides the bare-minimum event stream for kubelet to work, and we can move faster with it for now. That's why I chose cadvisor as a starting point. I think it's worth giving it a try since it should be easy to swap out the container event watcher.

That said, if the consensus is to use docker events, I'd be okay with that.
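The "easy to swap out the container event watcher" point suggests PLEG depending only on a narrow interface, so a cadvisor-backed watcher could later be replaced by a docker-events-backed one. A hypothetical Go sketch of that seam (the interface and all names are invented here, not kubelet's actual code):

```go
package main

import "fmt"

// ContainerEvent is an illustrative, runtime-agnostic container event.
type ContainerEvent struct {
	ContainerID string
	Kind        string // e.g. "creation", "deletion"
}

// ContainerEventWatcher is the hypothetical seam: PLEG would depend only on
// this interface, not on cadvisor or docker directly.
type ContainerEventWatcher interface {
	Watch() <-chan ContainerEvent
}

// fakeWatcher stands in for a cadvisor- or docker-backed implementation.
type fakeWatcher struct{ events []ContainerEvent }

func (w *fakeWatcher) Watch() <-chan ContainerEvent {
	ch := make(chan ContainerEvent, len(w.events))
	for _, e := range w.events {
		ch <- e
	}
	close(ch)
	return ch
}

func main() {
	var w ContainerEventWatcher = &fakeWatcher{events: []ContainerEvent{
		{ContainerID: "abc", Kind: "creation"},
	}}
	for e := range w.Watch() {
		fmt.Println(e.ContainerID, e.Kind)
	}
}
```

Swapping the event source then means providing another Watch implementation; the PLEG side is untouched.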


@yifan-gu yifan-gu commented Aug 20, 2015

I think we need to define the event stream interface in kubelet. Whether we implement it using cadvisor or the docker event stream should not matter.

ContainerStopped PodLifeCycleEventType = "ContainerStopped"
NetworkSetupCompleted PodLifeCycleEventType = "NetworkSetupCompleted"
NetworkFailed PodLifeCycleEventType = "NetworkFailed"
)

ejemba commented Oct 15, 2015

Regarding issue #10288, I think you should add another high-level PodLifeCycleEventType, "ContainerDied"; otherwise we will have the same problem, even with your new system.
Considering what events docker is capable of firing (cf. https://docs.docker.com/reference/api/docker_remote_api_v1.17/#monitor-docker-s-events), this PodLifeCycleEventType won't be inexact.

yujuhong (Author, Member) commented Oct 16, 2015

@ejemba this list is not complete, but it includes all the types we use for now. We can augment this whenever we need :)
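The snippet under discussion defines runtime-agnostic event types; a runtime-specific watcher would then translate its native events (e.g. docker statuses) into them. A toy Go sketch of that translation, with the mapping and the ContainerStarted name purely illustrative:

```go
package main

import "fmt"

type PodLifeCycleEventType string

const (
	// ContainerStarted is assumed here for illustration; only
	// ContainerStopped appears in the quoted snippet above.
	ContainerStarted PodLifeCycleEventType = "ContainerStarted"
	ContainerStopped PodLifeCycleEventType = "ContainerStopped"
)

type PodLifecycleEvent struct {
	PodID string
	Type  PodLifeCycleEventType
}

// translate maps a docker-style status string onto the runtime-agnostic
// types; the second return value reports whether the status is recognized.
func translate(podID, dockerStatus string) (PodLifecycleEvent, bool) {
	switch dockerStatus {
	case "start":
		return PodLifecycleEvent{podID, ContainerStarted}, true
	case "die", "stop":
		return PodLifecycleEvent{podID, ContainerStopped}, true
	}
	return PodLifecycleEvent{}, false
}

func main() {
	if e, ok := translate("pod-1", "die"); ok {
		fmt.Println(e.PodID, e.Type) // pod-1 ContainerStopped
	}
}
```

Because pod workers consume only the translated types, the same interface serves docker, rkt, or a cadvisor-backed watcher.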


@zhengguoyong zhengguoyong commented Nov 6, 2015

👍


@dchen1107 dchen1107 commented Dec 2, 2015

@yujuhong Can you update this based on our current roadmap, design, and implementation?


@k8s-bot k8s-bot commented Dec 2, 2015

GCE e2e test build/test passed for commit 919dbf6.

@yujuhong yujuhong force-pushed the yujuhong:kubelet_proposal branch from 919dbf6 to 4faba2a Jan 12, 2016

@k8s-bot k8s-bot commented Jan 12, 2016

GCE e2e build/test failed for commit 4faba2a.

@yujuhong yujuhong force-pushed the yujuhong:kubelet_proposal branch 2 times, most recently from 501ced3 to f3f3d6f Jan 12, 2016
@yujuhong yujuhong force-pushed the yujuhong:kubelet_proposal branch from f3f3d6f to b8b532b Jan 12, 2016

@yujuhong yujuhong commented Jan 12, 2016

@yujuhong Can you update this based on our current roadmap, design, and implementation?

@dchen1107, I updated the proposal. Please take a look, thanks!

@k8s-teamcity-mesosphere k8s-teamcity-mesosphere commented on b8b532b Jan 12, 2016

TeamCity OSS :: Kubernetes Mesos :: 4 - Smoke Tests Build 11313 outcome was SUCCESS
Summary: Tests passed: 1, ignored: 212 Build time: 00:08:36


@k8s-bot k8s-bot commented Jan 12, 2016

GCE e2e test build/test passed for commit 501ced3.


@k8s-bot k8s-bot commented Jan 12, 2016

GCE e2e test build/test passed for commit b8b532b.


@eparis eparis commented Feb 1, 2016

@k8s-bot test this issue: #IGNORE

Tests have been pending for 24 hours


@k8s-bot k8s-bot commented Feb 2, 2016

GCE e2e test build/test passed for commit b8b532b.


@dchen1107 dchen1107 commented Feb 2, 2016

LGTM

@dchen1107 dchen1107 added the lgtm label Feb 2, 2016

@k8s-github-robot k8s-github-robot commented Feb 2, 2016

Automatic merge from submit-queue

k8s-github-robot added a commit that referenced this pull request Feb 2, 2016
Auto commit by PR queue bot
@k8s-github-robot k8s-github-robot merged commit 635cf58 into kubernetes:master Feb 2, 2016
5 checks passed
Jenkins GCE e2e 221 tests run, 105 skipped, 0 failed.
Details
Jenkins unit/integration 2653 tests run, 9 skipped, 0 failed.
Details
Submit Queue Queued to run github e2e tests a second time.
Details
cla/google All necessary CLAs are signed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@yujuhong yujuhong deleted the yujuhong:kubelet_proposal branch Feb 23, 2016
xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016