-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add remediation status to the rule_evaluation_status table #1094
Conversation
going to point out some places where I could use some feedback as comments. |
@@ -201,6 +201,8 @@ CREATE TABLE entity_policies ( | |||
|
|||
create type eval_status_types as enum ('success', 'failure', 'error', 'skipped', 'pending'); | |||
|
|||
create type remediation_status_types as enum ('success', 'failure', 'error', 'skipped', 'not_supported'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these states make sense? Success, failure and error are hopefully self-explanatory, not_supported might also be quite clear. Skipped is supposed to mean that the remediation wasn't run because the evaluation passed in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I thought the remediation skipped could mean one of the following:
- The rule is actually not applicable e.g. "skipped because the check was also skipped" or "this rule applies to repositories with golang codebases and this applies to
npm
". - The rule was marked by the user as something that should be skipped (future functionality).
Seeing "Skipped" for a rule that already passed is actually confusing for me. Perhaps we should have yet another state "no remediation needed" or simply ""
(since the user needs no visual queue since the rule is indeed passing).
I also wonder about the "not_supported" status. We only pass that via the noop
remediator. Not supported would tell me that a remediation was tried but it's of the wrong driver or the wrong provider. Maybe it should be not_available
instead? Indicating that there simply is not a remediation avaiable.
I also wonder if we need a pending state for remediations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of an empty string.
About the pending remediation, currently I don't see a use-case for this, a policy can be pending, but only if there are no rule status rows. As soon as they are, at least one remediation status other than pending will be available.
database/query/policy_status.sql
Outdated
last_updated = NOW() | ||
eval_details = $7, | ||
remediation_status = $8, | ||
remediation_details = COALESCE($9, rule_evaluation_status.remediation_details), -- don't overwrite details already set with NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, this might be a bug, I need to check if we overwrite an error back to nothing if the remediation first fails and then succeeds..sorry, I'll check this case tomorrow, this might not need to be coalesced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make me think that details shouldn't be nullable. If there are no details we simply should set it to an empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, so this is a remainder of how I refactored the code, initially I didn't have remediation_status
at all as I was trying to minimise the amount of new columns, but that proved confusing. I just never reverted the nullable details.
I agree, I'll switch them to non-nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
case errorStatus: | ||
return "❌ Error" | ||
case skippedStatus: | ||
return "✅ Skipped" // success mark because we didn't need to remediate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently skipped and success are very visually similar, I'll be glad to get some help on how we can distinguish them better? Suggestions on unicode character or colours welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't like the "getEvalStatusText
section. In reality, that status comes up for quite valid reasons:
- The rule is actually not applicable
- The rule was marked by the user as something that should be skipped (future functionality).
So the visual queue should be smooth and something that a user actually also skips.
Perhaps "⬜ Skipped" or "▫️ Skipped" is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about ⏭️ for skipping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For skipped remediation I currently use an empty string, I agree with @JAORMX that a skipped remediation on evaluation success should just not be reported. I'm fine changing the evaluation skipped icon to forward (currently I used the stop-button)
case errorStatus: | ||
return "❌ Error" | ||
case skippedStatus: | ||
return "✅ Skipped" // success mark because we didn't need to remediate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't like the "getEvalStatusText
section. In reality, that status comes up for quite valid reasons:
- The rule is actually not applicable
- The rule was marked by the user as something that should be skipped (future functionality).
So the visual queue should be smooth and something that a user actually also skips.
Perhaps "⬜ Skipped" or "▫️ Skipped" is better.
@@ -201,6 +201,8 @@ CREATE TABLE entity_policies ( | |||
|
|||
create type eval_status_types as enum ('success', 'failure', 'error', 'skipped', 'pending'); | |||
|
|||
create type remediation_status_types as enum ('success', 'failure', 'error', 'skipped', 'not_supported'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I thought the remediation skipped could mean one of the following:
- The rule is actually not applicable e.g. "skipped because the check was also skipped" or "this rule applies to repositories with golang codebases and this applies to
npm
". - The rule was marked by the user as something that should be skipped (future functionality).
Seeing "Skipped" for a rule that already passed is actually confusing for me. Perhaps we should have yet another state "no remediation needed" or simply ""
(since the user needs no visual queue since the rule is indeed passing).
I also wonder about the "not_supported" status. We only pass that via the noop
remediator. Not supported would tell me that a remediation was tried but it's of the wrong driver or the wrong provider. Maybe it should be not_available
instead? Indicating that there simply is not a remediation avaiable.
I also wonder if we need a pending state for remediations?
database/query/policy_status.sql
Outdated
last_updated = NOW() | ||
eval_details = $7, | ||
remediation_status = $8, | ||
remediation_details = COALESCE($9, rule_evaluation_status.remediation_details), -- don't overwrite details already set with NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make me think that details shouldn't be nullable. If there are no details we simply should set it to an empty string.
internal/engine/eval_status.go
Outdated
} else if err != nil { | ||
return db.RemediationStatusTypesError | ||
} | ||
return db.RemediationStatusTypesSuccess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: style wise it could look more readable in the following format:
switch err != nil {
case errors.Is(err, evalerrors.ErrRemediateFailed):
return db.RemediationStatusTypesFailure
case errors.Is(err, evalerrors.ErrRemediationSkipped):
return db.RemediationStatusTypesSkipped
case errors.Is(err, evalerrors.ErrRemediationNotSupported):
return db.RemediationStatusTypesNotSupported
default:
return db.RemediationStatusTypesError
}
}
return db.RemediationStatusTypesSuccess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion taken (with a modification because this way the success is actually unreachable which surprised me as well)
After the evaluation and potentail remediation, we store the remediation result into the database. There are three fields: - remediation_status - a summary - remediation_details - pretty much err.String() - remediation_last_updated - timestamp
To be able to send the remediation detail to the clients, we extend the RuleEvaluationStatus protobuf message with the status, details and timestamp.
Extends the Eval() function so that in addition to the remediation error, the evaluation error is also returned so that it can be stored in the database. New errors are also added to be able to distinguish between errors that must be reported in detail to the user and errors that don't have to be (remediation is not supported for this rule type and/or was not even attempted)
Extends the CLI tool to display the remediation status in a new column
@JAORMX thank you for the quick review, the current version just resolves a conflict in the protobuf files. There are no other changes. |
Adds the remediation status to the rule_evaluation_status table. Even though
the table might change and be split soon when we add support for the alerts, I
think it still makes sense to include the result in the meantime so we can
iterate faster and have a working code to refactor already.
Adds the same fields to the RuleEvaluationStatus protobuf message to be sent
to the clients and fills those fields as appropriate based on errors received
from the remediator.
I'm not happy with the unit test coverage, I would like to extend the test for
the handleEntityEvent method, but at the same time I'd prefer to do it after we
at least agree that this patch is going in the right direction.
Fixes: #1075