Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement the rest of alerts #1228

Merged
merged 15 commits into from
Oct 20, 2023
Merged

feat: implement the rest of alerts #1228

merged 15 commits into from
Oct 20, 2023

Conversation

rdimitrov
Copy link
Member

@rdimitrov rdimitrov commented Oct 18, 2023

The following PR includes:

  • The rest of the alerts implementation
  • Implementation of the security-advisory alert type
  • There are two separate templates trying to give a different message based on if remediations are enabled as well or not.
  • Updated all existing rule types to support alerts
  • Revisited the alert trigger conditions to take into account the remediation status and type
  • Revisited the actions evaluation and the related interfaces

Top of my mind topics that need feedback and/or are still draft:

  • Used default severity statuses for most rule types, do comment if some need to be updated.
  • I'm definitely unsure about the query for upserting to the alert details table (def no sql guru). The idea behind my change was to avoid overwriting the existing row if there's already a row with that id and we are coming with a skipped status. I've tried one approach but it was failing with no rows found when there was such a conflict.
  • Change the default behaviour for alerts to be always on by default.

Marking as WIP until I have tested this properly and addressed the pending issues.

Fixes: #1213
Fixes: #1214

Screenshots of the alert in action. It was closed automatically after the remediation PR was merged.
image

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
@rdimitrov rdimitrov self-assigned this Oct 18, 2023
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
@rdimitrov
Copy link
Member Author

I've also made an excel sheet trying to visualise the different use cases I implemented for the conditions that trigger an alarm action along with its desired state - Scenarios for Alert execution

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code looks pretty good. The only two things were that naming of the GetType/GetParentType were a bit confusing and I think we should consider exposing an interface over exposing the evalParams struct to the engine..

return alert.actionType
}

// SubType returns the action subtype of the remediation engine
func (_ *Alert) SubType() string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions: 1. since we're returning the module constant, why not just use the module constant in the code? 2. Why is the method named SubType? (we have ParentType and SubType..but no Type?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree I didn't liked the naming that much too. I'll think of something that makes more sense 😃

The reason for parent and sub-type(should have been child I guess?) is the following. In the actions engine we have a list of actions and their type can be remediate or alerts. Then all of them can have types on their own - different variations like rest, pull_request, advisory and so on, so thus why parent and sub-type.
The idea of having a sub-type method in the actions interface is to be able to act on it, i.e. do not alert if action of type remediation and sub-type pull_request was successful and it seemed better to have that as part of the actions interface.

// Type returns the action type of the security-advisory engine
func (alert *Alert) Type() interfaces.ActionType {
// ParentType returns the action type of the security-advisory engine
func (alert *Alert) ParentType() interfaces.ActionType {
return alert.actionType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the methods be called something like ActionType and AlertType/AlertEngineType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I addressed this in the comment above 👍

@@ -242,10 +240,6 @@ func (r *RuleTypeEngine) Eval(ctx context.Context, ent protoreflect.ProtoMessage
result, err := r.rdi.Ingest(ctx, ent, ruleParams)
if err != nil {
evalParams.EvalErr = fmt.Errorf("error ingesting data: %w", err)
evalParams.ActionsErr = evalerrors.ActionsError{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block was moved to createEvalStatusParams right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or getDefaultResult ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, getDefaultResult. The idea is that the actionsErr value should come only from the DoActions() method by the Actions engine.

// any existing meta entry since we don't upsert in case of conflict while skipping
m, err := json.Marshal(&map[string]any{})
if err != nil {
logger.Error().Err(err).Msg("error marshaling empty json.RawMessage")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we store on error? Should we explicitly set m to an empty message? (Or are we relying on the db default?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering that since it's not a variable that we try to marshal it is safe to assume that it shouldn't fail unless there's a problem with the json.Marshal. As for the default value, I guess I had to make the alert tables column accepting nil values so I don't have to ensure this.

@@ -258,12 +252,6 @@ func (r *RuleTypeEngine) Actions(
ruleDef, ruleParams map[string]any,
evalParams *EvalStatusParams,
) {
// Skip actions in case ingesting failed during evaluation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what handles this case now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be handled by the actions engine now.

And it was actually not correct, mainly because of the way evaluations work now - the end status of an evaluation can come from both the ingest part and the eval part.

This is a bit implicit as it's hidden behind errors, some generic, some custom. For example, depending on the ingest type, ingest can err out due to a failed ingesting (which makes sense for actions skipping), but also with ErrEvaluationFailed which has to be processed by the actions.

I was thinking to address that by having ingesting do ingestions only and the eval part be the one that sets the custom Error type but it's out of the PR scope and it implies updating big portions of the code.

This change makes evaluation more explicit and also sets clear boundaries - eval is for eval and shouldn't decide if the action engine should be skipped. With this we channel every rule evaluation result (either coming from ingest or eval) to the actions and let the actions engine decide if what needs to be skipped - whether it's a specific action or even all actions.

metadata = sqlc.arg(metadata)::jsonb,
status = CASE WHEN $2 != 'skipped' THEN $2 ELSE rule_details_alert.status END,
details = CASE WHEN $2 != 'skipped' THEN $3 ELSE rule_details_alert.details END,
metadata = CASE WHEN $2 != 'skipped' THEN sqlc.arg(metadata)::jsonb ELSE rule_details_alert.metadata END,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the SQL part that you asked for some extra eyes on? It looks good to me (tests would be nice)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is this one, yes 👍 I haven't seen any issue so far, but it's good to hear that from someone else too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me at least.

@@ -18,7 +18,6 @@ import (
"context"
"errors"
"fmt"
engif "github.com/stacklok/mediator/internal/engine/interfaces"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -175,7 +175,7 @@ func (_ *Alert) SubType() string {

// GetOnOffState returns the alert action state read from the profile
func (_ *Alert) GetOnOffState(p *pb.Profile) interfaces.ActionOpt {
return interfaces.ActionOptFromString(p.Alert)
return interfaces.ActionOptFromString(p.Alert, interfaces.ActionOptOn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great defaults.

logger := zerolog.Ctx(ctx)
params := &paramsSA{}
_ = evalParams
switch entity := entity.(type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely not for this PR, but this sounds like something that should be extracted into an interface so you wouldn't have to code the typeswich here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added that as an item in our profile design improvement document 👍

}
params.Summary = summaryStr.String()

var descriptionStr strings.Builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice for some reason I never realised that strings.Builder implements raw io.Writer and wasn't using it for template expansion even though I could..

@rdimitrov rdimitrov changed the title WIP: feat: implement the rest of alerts feat: implement the rest of alerts Oct 19, 2023
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for renaming the interface. As long as you plan to address exposing the struct to the actions in a subsequent PR, I'm happy with this version.

Also, great work overall getting this feature from design, to database changes to implementation. It's really nice to see it all come together.

@rdimitrov rdimitrov merged commit ab74356 into mindersec:main Oct 20, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants