Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement an actions engine #1192

Merged
merged 11 commits into from
Oct 15, 2023
Merged

feat: implement an actions engine #1192

merged 11 commits into from
Oct 15, 2023

Conversation

rdimitrov
Copy link
Member

The following PR includes the following changes:

  • Introduces a new actions engine that acts as a wrapper for actions such as remediation and alerts. All of this is pluggable so it should be fairly easy to implement additional types of both parent and child action types which act based on given rule evaluation status.
  • Introduces the initial implementation of an alert engine.
  • Updated the current entity handler to use an EvalStatusParams object that is being used to carry whatever is necessary for rule evaluations and for processing actions
  • So far we have the following action types - alert and remediate where each action can have implementation specific child-type actions. By child-type action I mean - alerts of type security-advisory or slack, remediations of type rest or pull-request

The following PR does not include the following changes:

  • Implementing the actual handling for alert processing - triggering, closing. This will be a separate PR.

Fixes: #1182
Fixes: #1191

Note: I'll remove the WIP once I update the tests and also have tested it hands-on properly.

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
@rdimitrov rdimitrov changed the title feat: implement an actions engine WIP: feat: implement an actions engine Oct 13, 2023
@rdimitrov rdimitrov self-assigned this Oct 13, 2023
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the engine changes are quite clean, left a bunch of comments

internal/engine/actions/remediate/rest/rest_test.go Outdated Show resolved Hide resolved
internal/engine/eval_status.go Outdated Show resolved Hide resolved
internal/engine/eval_status.go Show resolved Hide resolved
internal/engine/interfaces/interface.go Outdated Show resolved Hide resolved
internal/engine/interfaces/interface.go Outdated Show resolved Hide resolved
Copy link
Contributor

@JAORMX JAORMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes make sense to me and I like this direction.

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
@jhrozek
Copy link
Contributor

jhrozek commented Oct 13, 2023

I like the code. I'm sure we will do some additional refactoring once we start adding the actions, but the most important thing is that the engine code is quite clean and overall the design is extensible.

Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Copy link
Contributor

@JAORMX JAORMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know when this is mo longer WIP

@rdimitrov
Copy link
Member Author

Let me know when this is mo longer WIP

It should be okay, but I'd like to try test this throughout the weekend and have it merged either tomorrow or Monday morning. I'll remove the WIP and request feedback once it's ready.

With the latest changes, I've updated the approach so now:

  • Actions are only responsible for performing an action with a desired outcome (can be omitted depending on the action implementation, of course). They shouldn't have any notion of priority or order of their execution in relation to other actions.
  • Actions Engine is the one that is responsible for handling the proper order of evaluating actions. It is the only one that has notion of all actions and their types and it's its responsibility to prioritise that.

Anyway, do share if you have something you want me to address in the meantime so we don't lose time on that 👍

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not approving only because the PR is marked as a WIP, but I think the code is quite clean and the existing REST remediations still seem to work.
I'll forward-port my existing PRs atop yours so that the rebasing is easier. Just let me know when your PR is non-WIP.

internal/engine/actions/actions.go Outdated Show resolved Hide resolved
// Store provides all functions to execute db queries and transactions
type Store interface {
Querier
ExtendQuerier
CheckHealth() error
BeginTransaction() (*sql.Tx, error)
GetQuerierWithTransaction(tx *sql.Tx) Querier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't then GetQuerierWithTransation also return ExtendQuerier? I understand the motivation for using ExtendQuerier but I wonder if we could make Store behave consistently and in the end make Querier into something that's only used inside internal/db..
(if you agree, please file a card and do this in a subsequent patch)

remEngine.Type(): remEngine,
alertEngine.Type(): alertEngine,
},
actionsOnOff: map[engif.ActionType]engif.ActionOpt{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we need this since we already get get remEngine through the actions key? Or does setting actionsOnOff here separately take advantage of passing in the profile?
(not a nack, I'm trying to understand the code better)

Copy link
Member Author

@rdimitrov rdimitrov Oct 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it was with a separate method called from a private struct field of the rule type engine and I stumbled upon that being a problem when I applied these changes to the medev tool. That made me rethink this and so I decided that since the on/off states are an integral part of the actions engine, it makes more sense to expect that at the point of creating the engine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, that's enough of an explanation. If you do end up amending the PR just add a comment, otherwise let's leave that as it is


// shouldRemediate returns the action command for remediation taking into account previous evaluations
func shouldRemediate(prevEvalFromDb db.ListRuleEvaluationsByProfileIdRow, evalErr error) engif.ActionCmd {
_ = prevEvalFromDb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we passing the parameters here but not using them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to leave this in case we want to add conditions to processing a remediation, i.e. maybe skip remediation if the remediate type is PR, the evalErr is failing and we already have the last remediate status set to "success" (meaning we created a PR remediation, it's just not merged yet). That might not be the best example, but something in that sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, this should be well thought through sо perhaps it is not a goal for now. I guess it's better to remove it for now and we can always add something like that later if needed 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's leave it for merging this PR and revisit during the week. I wouldn't like to block the PR any longer. If you do plan on updating this PR, just add a comment. The extra method doesn't hurt.

// shouldAlert returns the action command for alerting taking into account previous evaluations
func shouldAlert(prevEvalFromDb db.ListRuleEvaluationsByProfileIdRow, evalErr error, remErr error) engif.ActionCmd {
// Start simple without taking into account the remediation status
_ = remErr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for future extensibility?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial thought in case I think of some clever way for using it without overcomplicating this part. I think it's right to remove it for now since I've decided to go with the simplest approach for now. I gave a thought of creating a model where all statuses are taken into account but it got messy very quickly so I'll leave that to be a future task.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whatever you prefer, I was really just trying to understand the code I'm reviewing better.

// Case 1 - Evaluation has PASSED, but the Alert was ON.
if db.EvalStatusTypesSuccess == newEval && db.AlertStatusTypesOn == prevAlert {
// We should turn it OFF.
return engif.ActionCmdOff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context of alerts, does Off mean "delete the alert" ? I'm confused between Off and DoNothing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or "turn off for this particular run" ? Maybe extend the comments for the type and its values..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to introduce a command that is passed to each action that explicitly states the expected behaviour and respectively the result of this action run. In this case I mean that I want this action to turn off the alert which if handled successfully by the action means that the alert status that will be stored in the database at the end of this evaluation cycle is going to be off too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks for the explanation

internal/engine/actions/actions.go Outdated Show resolved Hide resolved
errors.Is(evalErr, enginerr.ErrEvaluationSkipSilently) ||
// TODO: (radoslav) Discuss this with Jakub and decide if we want to skip remediation too
// rule evaluation had no error, skip action if actionType IS NOT alert
(evalErr == nil && actionType != alert.ActionType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. I can't think of a reason to remediate if the evaluation passed (the remediations can be quite API-call intensive), moreover calling the remediation all the time would render the remediation timestamps useless (we want to record how often we call remediations and when was the last time we called one)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of something like - the evaluation passed, see if you have a PR opened for that (if rem type is PR for example) and if so, close it automatically since it's no longer relevant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nice and then it would be the remediate engine closing the PR. This is not implemented in the PR remediator's current instance, but it's really a nice idea. Currently I'm thinking that the only place that can decide this is the remediator itself - e.g. REST remediations probably shouldn't be called over and over with passing eval[1] but PRs might in order to close them.

Maybe the remediator should have a two methods, one for non-nil and one for nil eval?

Would you like to file an issue to explore this further or should I? I wouldn't like to forget this detail, but I think it's out of scope for this PR and the initial pass on the PR remediator as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue - #1201 and described an untested idea of how this might be implemented. It's just thoughts based on the alert use cases but it should work for this too I think 👍

internal/engine/eval_status.go Outdated Show resolved Hide resolved
internal/engine/executor.go Outdated Show resolved Hide resolved
pbuild *providers.ProviderBuilder,
) (*Remediator, error) {
if actionType == "" {
return nil, fmt.Errorf("action type cannot be empty")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this explicitly test if the action type is different than remediate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought behind it was if a sub-action might be reused at some point by another action. For example the alert action might reuse the rest type for creating a rest call alerting a webhook somewhere?

@jhrozek
Copy link
Contributor

jhrozek commented Oct 14, 2023

@rdimitrov I forward-ported the branch protection remediations atop your branch and the only issue I found is https://github.com/jhrozek/mediator/commit/e9a8c35f44dc5ab966333c2960eabc7adf4fc389 on policy initialisation . Feel free to pull it into your branch if you agree with the fix.

rdimitrov and others added 2 commits October 15, 2023 01:05
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
@rdimitrov rdimitrov changed the title WIP: feat: implement an actions engine feat: implement an actions engine Oct 14, 2023
@jhrozek
Copy link
Contributor

jhrozek commented Oct 15, 2023

Ship it!

@rdimitrov rdimitrov merged commit 9f3b3eb into mindersec:main Oct 15, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants