Event Manager #3329

Closed
lavalamp opened this issue Jan 8, 2015 · 8 comments
Labels
area/introspection · kind/design · priority/awaiting-more-evidence · sig/api-machinery

Comments

@lavalamp (Member) commented Jan 8, 2015

Now that we have various components producing events, we need to start processing them. Our current policy of a 2 day TTL isn't going to scale.

So, I want us to build an event manager.

The basic sketch is that it reads events from the cluster, processes them, and archives them.

a. Reading events:

  1. It runs in a pod as part of your kubernetes cluster.
  2. It uses the "kubernetes" service to contact the master.
  3. It lists & then watches all events (see the sketch after this list).
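
A minimal sketch of (a.1)-(a.3), assuming today's client-go packages for the in-cluster config and event API (the package paths and helpers are assumptions, not part of this proposal):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// (a.1) + (a.2): running inside a pod, the in-cluster config resolves
	// the "kubernetes" service to reach the master.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// (a.3) list all existing events, across every namespace...
	list, err := client.CoreV1().Events(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("listed %d events\n", len(list.Items))

	// ...then watch from the resource version the list returned.
	w, err := client.CoreV1().Events(metav1.NamespaceAll).Watch(ctx, metav1.ListOptions{
		ResourceVersion: list.ResourceVersion,
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		fmt.Printf("%s: %v\n", ev.Type, ev.Object)
	}
}
```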

b. Process events

  1. Idea: compression. If you see N events identical except for timestamp, transform that into one event with N timestamps (see the sketch after this list).
  2. Alternatively, do the same but for sequences of events. Like if a pod is in a pull -> fail -> pull... loop, make a meta-event and start storing timestamps.
  3. Main purpose: Actively manage the events stored in the kubernetes system, with the goal of maximizing usefulness while storing only a sane number of events.
  4. Idea: provide scriptable/configurable hooks. For example, it would be great for an admin to make a policy here that deletes a pod after N failures to start. Or, up a level, to delete a replication controller that's making crashlooping pods. Or delete a pod if it fails to schedule after 30 minutes. Etc.
  5. Idea: provide scriptable/configurable alerts. As above, but email/page an admin/owner instead of deleting something.
  6. Idea: upon reading an event, update the object's status information to reflect the new information.
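
A minimal sketch of the compression in (b.1), assuming events that are "identical except for timestamp" can be keyed by object, reason, and message (that key choice is an assumption, not something settled here):

```go
package main

import (
	"fmt"
	"time"
)

// eventKey identifies events that are identical except for their timestamps.
// Exactly which fields belong in the key is an open design question.
type eventKey struct {
	Namespace string
	Object    string // e.g. "Pod/nginx-abc12"
	Reason    string
	Message   string
}

// compressedEvent is one logical event carrying every observed timestamp.
type compressedEvent struct {
	Count      int
	Timestamps []time.Time
}

// compressor folds a stream of raw events into compressed ones.
type compressor struct {
	seen map[eventKey]*compressedEvent
}

func newCompressor() *compressor {
	return &compressor{seen: make(map[eventKey]*compressedEvent)}
}

// Observe records one raw occurrence of the event identified by k.
func (c *compressor) Observe(k eventKey, at time.Time) {
	ce, ok := c.seen[k]
	if !ok {
		ce = &compressedEvent{}
		c.seen[k] = ce
	}
	ce.Count++
	ce.Timestamps = append(ce.Timestamps, at)
}

func main() {
	c := newCompressor()
	k := eventKey{Namespace: "default", Object: "Pod/nginx-abc12", Reason: "FailedPull", Message: "image not found"}
	for i := 0; i < 3; i++ {
		c.Observe(k, time.Now())
	}
	fmt.Printf("%q seen %d times\n", k.Reason, c.seen[k].Count)
}
```

The sequence idea in (b.2) would key on a short run of reasons instead of a single one, but the bookkeeping is otherwise the same.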

c. Archive events

  1. Store the events in a DB of some sort for offline analysis. Ideally the latency between an event getting added to k8s and getting stored in the DB is low (see the sketch below).
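
A minimal sketch of (c.1), assuming the destination is anything that accepts newline-delimited JSON (a real deployment would pick a queryable store; the record shape here is illustrative):

```go
package main

import (
	"bufio"
	"encoding/json"
	"os"
	"time"
)

// archivedEvent is the record persisted for offline analysis.
type archivedEvent struct {
	Namespace string    `json:"namespace"`
	Object    string    `json:"object"`
	Reason    string    `json:"reason"`
	Message   string    `json:"message"`
	Timestamp time.Time `json:"timestamp"`
}

// archiver streams events to a sink as JSON lines, one per event.
type archiver struct {
	w   *bufio.Writer
	enc *json.Encoder
}

func newArchiver(f *os.File) *archiver {
	w := bufio.NewWriter(f)
	return &archiver{w: w, enc: json.NewEncoder(w)}
}

func (a *archiver) Archive(e archivedEvent) error {
	if err := a.enc.Encode(e); err != nil {
		return err
	}
	// Flush per event to keep the k8s-to-DB latency low; batching would
	// trade that latency for throughput.
	return a.w.Flush()
}

func main() {
	a := newArchiver(os.Stdout)
	_ = a.Archive(archivedEvent{
		Namespace: "default",
		Object:    "Pod/nginx-abc12",
		Reason:    "FailedScheduling",
		Message:   "no nodes available",
		Timestamp: time.Now(),
	})
}
```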

d. Misc open questions

  1. Run one per namespace or one per cluster? I'd prefer the latter, but we'd have to be cognizant of not leaking information between namespaces.

We can also explore the idea of kubernetes writing events to a non-etcd (or separate etcd) DB in the first place. Events are the primary source of write load on etcd right now. It's good for the moment in that it's exposing some bugs in our use of etcd, but in the long term it's probably more efficient for us to use a different storage mechanism for events.

@davidopp (Member) commented Jan 8, 2015

How do we decide what information to put into PodStatus/NodeStatus and what information to put into discrete events? I believe there is some kind of homomorphism here because in the limit we could

  • embed events in the PodStatus/NodeStatus and remove Event as a discrete API type; or on the other extreme
  • remove PodStatus/NodeStatus as API types and make them virtual objects that the client constructs by observing events

Also, on the archiving question -- could/should we just archive the whole etcd transaction log? (I assume etcd is structured as some kind of log that contains every mutating operation, perhaps with some kind of periodic compaction) This could be useful for post-hoc debugging and would give us archiving of events for free (if you were only interested in events when reading back, you'd skip all the non-event mutations).

@lavalamp (Member, Author) commented Jan 8, 2015

How do we decide what information to put into PodStatus/NodeStatus and what information to put into discrete events?

XStatus should have the entire current state of X; an event about X tells you only one detail. I think there is room for both. I added b.6. above to capture the idea of updating status based on incoming events. It's not clear if that's actually the best thing to do, but it's worth talking about.

Also, on the archiving question -- could/should we just archive the whole etcd transaction log?

This is a fair point-- perhaps we should solve archival globally. But it would be good to store it in a searchable/queryable format.

@derekwaynecarr (Member) commented via email

I prefer that we solve archival globally, and if we use events as the first use case, that is fine for me.

@dchen1107 (Member)

#2298 too

We plan to move XStatus (at least PodStatus) computation to the Kubelet level, which means keeping XStatus as an API type is necessary.

@davidopp (Member) commented Jan 8, 2015

One possible model for Status vs. Events is the following (based on a similar system I worked on recently). I think this is similar to what you were saying.

A component that starts with no information can construct the entire state of the cluster by observing the current values of the Status objects in etcd. If it later becomes backlogged or goes down for a long time, it can come back up and reconstruct the state this way. On the other hand, Event lifetimes are bounded (either by an explicit TTL, or the compaction interval of the transaction log, or the retention policy of the system that archives the transaction log, or whatever). So Events should only be used to convey information that is not mission-critical (for example, you specifically would not want to delegate conversion of Events into Status to an event manager that runs on top of Kubernetes, since losing some Events would corrupt the representation of the cluster's state; having Kubelet and API server compute Status is safest).

This also provides a design guideline for components, which in the previous project we described as "edge-based" vs. "level-based"; the former rely on seeing every object transition, while the latter can figure out what to do based only on their observation of the current Statuses and whatever private state they have stashed away. Unfortunately, by the time we understood this distinction we were already building edge-based components, so we just pretended the store's transaction log would be archived long enough that even if an edge-based component was down (or fell behind) for a long time it would always be able to catch up. This isn't a good/safe assumption to make.
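
A hedged sketch of that distinction, with purely hypothetical Status and Event shapes: the level-based loop can rebuild its picture by re-listing current Statuses at any time, while the edge-based loop silently diverges if it misses transitions that have already aged out.

```go
package main

import "fmt"

// Status is the full current state of an object (hypothetical shape).
type Status struct {
	Name  string
	Ready bool
}

// Event is a single observed transition (hypothetical shape).
type Event struct {
	Name   string
	Reason string
}

// levelBasedSync decides what to do purely from the current Statuses.
// A component written this way can restart from nothing: it just re-lists.
func levelBasedSync(statuses []Status) {
	for _, s := range statuses {
		if !s.Ready {
			fmt.Printf("level-based: %s is not ready, act on it\n", s.Name)
		}
	}
}

// edgeBasedSync acts on each transition as it arrives. If the stream is
// interrupted for longer than events are retained, the missed edges are
// gone and this component's view of the world diverges.
func edgeBasedSync(events <-chan Event) {
	for e := range events {
		fmt.Printf("edge-based: %s -> %s\n", e.Name, e.Reason)
	}
}

func main() {
	levelBasedSync([]Status{{Name: "pod-a", Ready: false}, {Name: "pod-b", Ready: true}})

	ch := make(chan Event, 2)
	ch <- Event{Name: "pod-a", Reason: "FailedPull"}
	ch <- Event{Name: "pod-a", Reason: "BackOff"}
	close(ch)
	edgeBasedSync(ch)
}
```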

@goltermann added the kind/design label on Jan 14, 2015
@goltermann added the priority/backlog label on Jan 28, 2015
@dchen1107 (Member)

We extracted the work items required for the v1 release and filed them separately. Lowering the priority of this to P3 to unblock v1.

@dchen1107 added the priority/awaiting-more-evidence label and removed the priority/backlog label on Feb 3, 2015
@davidopp added the team/control-plane and sig/api-machinery labels and removed the team/master label on Aug 22, 2015
@saad-ali removed their assignment on Mar 17, 2016
@smarterclayton (Contributor)

This is highly theoretical and in practice isn't an urgent issue. Please re-open if you disagree.

@falenn commented Jun 1, 2017 via email
