Proposal for revocation severity levels and context #47

THS-on · 2021-06-08T09:49:40Z

This is the proposal for adding severity levels and context to revocation events that are send by Keylime.

This is useful if other systems need to make more informed decisions about the state of an agent. This is currently not possible with Keylime.

I've tried to design the proposed implementation in such a way that (hopefully) most of components don't need heavy modifications.

Signed-off-by: Thore Sommer <mail@thson.de>

46_revocation_severity_and_context.md

mpeters · 2021-06-18T15:32:56Z

46_revocation_severity_and_context.md

+Events can a specified context. Those can be either a static string or an JSON
+object that contains more information event. This is optional and it is assumed
+that two events from the same agent with the same event id have the same
+severity and are the same event.


I don't think we can assume they are for the same event right? It's possible to have the same failure one hour, recover and have a similar failing again the next hour right? Or am I missing something.

I think my wording for recoverable vs irrecoverable events is not precise enough. What I meant with recoverable was that Keylime can still operate and do (parts of) remote attestation. Is would case for an IMA failure but not if the qoute cannot be validated (and retries failed).

In my opinion you cannot really recover from a revocation event because in most cases that just means that a window for attack was open and the system might be compromised.
If it was determined that window was not used maybe the events can deleted manually or reset the agent by removing and adding it to the verifier again.

The main idea behind that is, that the receiver of the revocation events does not get a message every time when the failed agent is polled if it was already notified.
Another idea was that our decision is mainly based on the id and context is more for deeper analysis if needed. Also the level is only assigned based on the id not the context.

But I think if the context part is heavily used i could make sense to make it not optional and consider two events with the same id but different context as two separate events.

46_revocation_severity_and_context.md

mpeters · 2021-06-18T15:39:14Z

46_revocation_severity_and_context.md

+validating further. The state of the agent after that should be
+`QUOTE_FAILED_IRRECOVERABLE`.
+
+The highest severity is 0 and the higher the number the number the lower the


While I like the idea of being flexible, I'm not sure it's necessary. I can't imagine real world scenarios with more than a handful of severity levels otherwise it will just be too operationally confusing. Also, I can see folks making mistakes in configuration because 0 is high and something like 10 is low. How about using something like the syslog severity text lables? It's well known and covers all kinds of scenarios:

emerg

alert

crit

err

warning

notice

info

debug

I could even see simplifying it more and merging emerg, alert, and crit into a single value like "crit"

Yes, I can see that this might be too confusing and may lead to errors. Labels are more descriptive than numbers.

The main issue that I have with static labels is that you might use a label for a unintended purposes because you run out of labels. This will not be the case in most environments, but for us we want to use it for presorting events with a pretty fine granularity and I don't know if the proposed labels can represent accurately.

Another idea would be to use labels but make them configurable in the keylime.conf. Those labels would be strictly ordered by severity and are static during operation, but you still have the flexibility to add custom labels.

I've changed the proposal to include user specified labels instead of using numbers.

Signed-off-by: Thore Sommer <mail@thson.de>

maugustosilva · 2021-06-18T16:50:51Z

46_revocation_severity_and_context.md

+severity. This makes it easy to add less sever levels without the need to
+increase the level of the highest severity.
+
+A new table `revocation_events` is introduced to save the already sent


The existence of a new table worries me, to be honest. Would this table continuously grow within the lifetime of an agent? In other words, are we expecting multiple lines with the same agent id?

Yes we would record every event that was generated for a agent during its lifetime.
This allows us to do two things. First can provide an API entry point for all the revocation events that have been generated for other systems. Second we want to check if we already sent an event. For example if a static PCR is failing it will in most cases always fail again and we want to be only notified once. Always resending the event would cause potentially a lot of redundant traffic.

We could put that information into a JSON object and store it string encoded in one single entry (if the database allows arbitrary length for fields), but that would make checking if we send a event already always a linear action.
Another approach is to only store the latest or the one with the current highest severity, but this will remove the option to check if a event was sent and a event log for an agent must be recorded by a component outside of Keylime.

This proposal would be breaking a key assumption on Keylime databases so far: these are not "historical", i.e., they do not accumulate data about an agent and, if destroyed, it can be quickly rebuilt from the "ground truth", which is the current state requested from such agents. This is design point made before my time as one of the keylime maintainers, but I must say that it is a point worth preserving. This being said, I would vote against this enhancement based due to the need for this change.

If changing that is not an option, what is your opinion on just saving the highest severity level and sending a event only if it has a higher severity? This would not require an extra table and cloud be more easily restored by requesting the state from the agent. Now the clients receiving the events can only make decisions on based on the severity level but I think in most cases this will be still a valuable improvement.

I think that it will be very useful to have some kind of API (either over REST API or Python module) such that external systems can get more information about all revocations events out of Keylime. Those systems can than implement the record keeping outside of Keylime and relay the necessary information to the relevant systems.
(We are already implementing a bridge between Keylime and our system for the tenant functionality, so we could move that out of Keylime and into our bridge.)

My short answer would be yes, but I believe your original proposal, the "provenance" of the component/stage where the failure happened is tremendously useful on itself. After several months using Keylime in Production, I became a firm believer in the need to "decouple" the revocation from the core of keylime (the current implementation is not particularly good, flexible or robust) and adopting a more configurable approach (e.g., execute a given command in case an agent fails, passing all the appropriate parameters) is the way to move forward.

I would propose that I keep the tracking where a event was generated and the user specified rules in this proposal but restructure the event sending part to just use the labels to demonstrate some aspects of the new gained capabilities of Keylime.

For the redesign of the revocation mechanism of Keylime we then open a new enhancement proposal which uses the added capabilities from this proposal as a base. (Here I would propose some kind of plugin API where the current revocation mechanism is just a plugin and can be swapped out for other mechanisms.)

I agree with @maugustosilva here in that we shouldn't have keylime keep track of events. Having an extra status field about the last failure, etc attached to the verifier table would be good. Having an extra table that just grows over time would not be. We do need to tackle the idea of an event log and how to store/query/search/maintain that long term, but that's better suited as APIs to existing external systems like adapters for elasticsearch or similar.

maugustosilva · 2021-06-18T16:52:41Z

46_revocation_severity_and_context.md

+`component.[sub_component].event`.
+
+* `component`: Name of the part of Keylime where the event was generated.
+  * Current would be: `quote_validation`, `pcr_validation`, `measured_boot`,


This is a really really really good idea, and IMO should be implemented first (probably with an extra column on the main agent table) in an self-contained manner.

Signed-off-by: Thore Sommer <mail@thson.de>

THS-on · 2021-06-23T11:16:58Z

@maugustosilva I've now split the tagging part for events and the severity levels for events into two distinct parts.

Signed-off-by: Thore Sommer <mail@thson.de>

Proposal for revocation severity levels and context

c330d06

Signed-off-by: Thore Sommer <mail@thson.de>

THS-on force-pushed the revocation_events branch from 0136dd5 to c330d06 Compare June 8, 2021 09:50

This was referenced Jun 8, 2021

Revocation severity levels and context #48

Open

Increase flexibilty of IMA entry parsing and validation keylime/keylime#687

Merged

maugustosilva self-requested a review June 17, 2021 20:10

mpeters requested changes Jun 18, 2021

View reviewed changes

fix wording in proposal 46

e35f69f

Signed-off-by: Thore Sommer <mail@thson.de>

maugustosilva reviewed Jun 18, 2021

View reviewed changes

THS-on added 2 commits June 19, 2021 13:40

Clarify how rules are added.

11f4ba7

Signed-off-by: Thore Sommer <mail@thson.de>

Use labels instead of numbers for severity levels

304f4c6

Signed-off-by: Thore Sommer <mail@thson.de>

THS-on requested a review from maugustosilva June 23, 2021 11:17

THS-on force-pushed the revocation_events branch from 5db1e83 to 03deadf Compare June 23, 2021 21:03

Restructure proposal to split the tagging part from the severity levels

ce8f508

Signed-off-by: Thore Sommer <mail@thson.de>

THS-on force-pushed the revocation_events branch from 03deadf to ce8f508 Compare June 23, 2021 21:08

mpeters approved these changes Jun 24, 2021

View reviewed changes

mpeters merged commit cd754e5 into keylime:master Jun 24, 2021

THS-on mentioned this pull request Feb 14, 2022

Enhanced Event Logging #62

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for revocation severity levels and context #47

Proposal for revocation severity levels and context #47

THS-on commented Jun 8, 2021

mpeters Jun 18, 2021

THS-on Jun 18, 2021 •

edited

Loading

mpeters Jun 18, 2021

THS-on Jun 18, 2021

THS-on Jun 22, 2021

maugustosilva Jun 18, 2021

THS-on Jun 18, 2021

maugustosilva Jun 21, 2021

THS-on Jun 22, 2021

maugustosilva Jun 22, 2021

THS-on Jun 22, 2021

mpeters Jun 23, 2021

maugustosilva Jun 18, 2021

THS-on commented Jun 23, 2021

Proposal for revocation severity levels and context #47

Proposal for revocation severity levels and context #47

Conversation

THS-on commented Jun 8, 2021

Choose a reason for hiding this comment

THS-on Jun 18, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

THS-on commented Jun 23, 2021

THS-on Jun 18, 2021 •

edited

Loading