Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add remediation status to the rule_evaluation_status table #1094

Merged
merged 4 commits into from
Oct 4, 2023

Conversation

jhrozek
Copy link
Contributor

@jhrozek jhrozek commented Oct 3, 2023

Adds the remediation status to the rule_evaluation_status table. Even though
the table might change and be split soon when we add support for the alerts, I
think it still makes sense to include the result in the meantime so we can
iterate faster and have a working code to refactor already.

Adds the same fields to the RuleEvaluationStatus protobuf message to be sent
to the clients and fills those fields as appropriate based on errors received
from the remediator.

I'm not happy with the unit test coverage, I would like to extend the test for
the handleEntityEvent method, but at the same time I'd prefer to do it after we
at least agree that this patch is going in the right direction.

Fixes: #1075

@jhrozek
Copy link
Contributor Author

jhrozek commented Oct 3, 2023

going to point out some places where I could use some feedback as comments.

@@ -201,6 +201,8 @@ CREATE TABLE entity_policies (

create type eval_status_types as enum ('success', 'failure', 'error', 'skipped', 'pending');

create type remediation_status_types as enum ('success', 'failure', 'error', 'skipped', 'not_supported');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these states make sense? Success, failure and error are hopefully self-explanatory, not_supported might also be quite clear. Skipped is supposed to mean that the remediation wasn't run because the evaluation passed in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought the remediation skipped could mean one of the following:

  • The rule is actually not applicable e.g. "skipped because the check was also skipped" or "this rule applies to repositories with golang codebases and this applies to npm".
  • The rule was marked by the user as something that should be skipped (future functionality).

Seeing "Skipped" for a rule that already passed is actually confusing for me. Perhaps we should have yet another state "no remediation needed" or simply "" (since the user needs no visual queue since the rule is indeed passing).

I also wonder about the "not_supported" status. We only pass that via the noop remediator. Not supported would tell me that a remediation was tried but it's of the wrong driver or the wrong provider. Maybe it should be not_available instead? Indicating that there simply is not a remediation avaiable.

I also wonder if we need a pending state for remediations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of an empty string.
About the pending remediation, currently I don't see a use-case for this, a policy can be pending, but only if there are no rule status rows. As soon as they are, at least one remediation status other than pending will be available.

last_updated = NOW()
eval_details = $7,
remediation_status = $8,
remediation_details = COALESCE($9, rule_evaluation_status.remediation_details), -- don't overwrite details already set with NULL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, this might be a bug, I need to check if we overwrite an error back to nothing if the remediation first fails and then succeeds..sorry, I'll check this case tomorrow, this might not need to be coalesced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make me think that details shouldn't be nullable. If there are no details we simply should set it to an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, so this is a remainder of how I refactored the code, initially I didn't have remediation_status at all as I was trying to minimise the amount of new columns, but that proved confusing. I just never reverted the nullable details.

I agree, I'll switch them to non-nullable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case errorStatus:
return "❌ Error"
case skippedStatus:
return "✅ Skipped" // success mark because we didn't need to remediate
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently skipped and success are very visually similar, I'll be glad to get some help on how we can distinguish them better? Suggestions on unicode character or colours welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't like the "⚠️ Skipped" that I used above in the getEvalStatusText section. In reality, that status comes up for quite valid reasons:

  • The rule is actually not applicable
  • The rule was marked by the user as something that should be skipped (future functionality).

So the visual queue should be smooth and something that a user actually also skips.

Perhaps "⬜ Skipped" or "▫️ Skipped" is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about ⏭️ for skipping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For skipped remediation I currently use an empty string, I agree with @JAORMX that a skipped remediation on evaluation success should just not be reported. I'm fine changing the evaluation skipped icon to forward (currently I used the stop-button)

case errorStatus:
return "❌ Error"
case skippedStatus:
return "✅ Skipped" // success mark because we didn't need to remediate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't like the "⚠️ Skipped" that I used above in the getEvalStatusText section. In reality, that status comes up for quite valid reasons:

  • The rule is actually not applicable
  • The rule was marked by the user as something that should be skipped (future functionality).

So the visual queue should be smooth and something that a user actually also skips.

Perhaps "⬜ Skipped" or "▫️ Skipped" is better.

@@ -201,6 +201,8 @@ CREATE TABLE entity_policies (

create type eval_status_types as enum ('success', 'failure', 'error', 'skipped', 'pending');

create type remediation_status_types as enum ('success', 'failure', 'error', 'skipped', 'not_supported');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought the remediation skipped could mean one of the following:

  • The rule is actually not applicable e.g. "skipped because the check was also skipped" or "this rule applies to repositories with golang codebases and this applies to npm".
  • The rule was marked by the user as something that should be skipped (future functionality).

Seeing "Skipped" for a rule that already passed is actually confusing for me. Perhaps we should have yet another state "no remediation needed" or simply "" (since the user needs no visual queue since the rule is indeed passing).

I also wonder about the "not_supported" status. We only pass that via the noop remediator. Not supported would tell me that a remediation was tried but it's of the wrong driver or the wrong provider. Maybe it should be not_available instead? Indicating that there simply is not a remediation avaiable.

I also wonder if we need a pending state for remediations?

last_updated = NOW()
eval_details = $7,
remediation_status = $8,
remediation_details = COALESCE($9, rule_evaluation_status.remediation_details), -- don't overwrite details already set with NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make me think that details shouldn't be nullable. If there are no details we simply should set it to an empty string.

database/query/policy_status.sql Show resolved Hide resolved
internal/controlplane/handlers_policy.go Show resolved Hide resolved
} else if err != nil {
return db.RemediationStatusTypesError
}
return db.RemediationStatusTypesSuccess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: style wise it could look more readable in the following format:

    switch err != nil {
        case errors.Is(err, evalerrors.ErrRemediateFailed):
            return db.RemediationStatusTypesFailure
	case errors.Is(err, evalerrors.ErrRemediationSkipped):
            return db.RemediationStatusTypesSkipped
	case errors.Is(err, evalerrors.ErrRemediationNotSupported):
            return db.RemediationStatusTypesNotSupported
	default:
            return db.RemediationStatusTypesError
	}
    }

    return db.RemediationStatusTypesSuccess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion taken (with a modification because this way the success is actually unreachable which surprised me as well)

JAORMX
JAORMX previously approved these changes Oct 4, 2023
After the evaluation and potentail remediation, we store the remediation
result into the database. There are three fields:
    - remediation_status - a summary
    - remediation_details - pretty much err.String()
    - remediation_last_updated - timestamp
To be able to send the remediation detail to the clients, we extend the
RuleEvaluationStatus protobuf message with the status, details and
timestamp.
Extends the Eval() function so that in addition to the remediation
error, the evaluation error is also returned so that it can be stored in
the database.

New errors are also added to be able to distinguish between errors that
must be reported in detail to the user and errors that don't have to be
(remediation is not supported for this rule type and/or was not even
attempted)
Extends the CLI tool to display the remediation status in a new column
@jhrozek
Copy link
Contributor Author

jhrozek commented Oct 4, 2023

@JAORMX thank you for the quick review, the current version just resolves a conflict in the protobuf files. There are no other changes.

@jhrozek jhrozek merged commit f3249bf into mindersec:main Oct 4, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add remediation result status and timestamp to the rule_evaluation_status table
3 participants