-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement an actions engine #1192
Conversation
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the engine changes are quite clean, left a bunch of comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes make sense to me and I like this direction.
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
I like the code. I'm sure we will do some additional refactoring once we start adding the actions, but the most important thing is that the engine code is quite clean and overall the design is extensible. |
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when this is mo longer WIP
It should be okay, but I'd like to try test this throughout the weekend and have it merged either tomorrow or Monday morning. I'll remove the WIP and request feedback once it's ready. With the latest changes, I've updated the approach so now:
Anyway, do share if you have something you want me to address in the meantime so we don't lose time on that 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not approving only because the PR is marked as a WIP, but I think the code is quite clean and the existing REST remediations still seem to work.
I'll forward-port my existing PRs atop yours so that the rebasing is easier. Just let me know when your PR is non-WIP.
internal/db/store.go
Outdated
// Store provides all functions to execute db queries and transactions | ||
type Store interface { | ||
Querier | ||
ExtendQuerier | ||
CheckHealth() error | ||
BeginTransaction() (*sql.Tx, error) | ||
GetQuerierWithTransaction(tx *sql.Tx) Querier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't then GetQuerierWithTransation
also return ExtendQuerier
? I understand the motivation for using ExtendQuerier
but I wonder if we could make Store
behave consistently and in the end make Querier
into something that's only used inside internal/db
..
(if you agree, please file a card and do this in a subsequent patch)
remEngine.Type(): remEngine, | ||
alertEngine.Type(): alertEngine, | ||
}, | ||
actionsOnOff: map[engif.ActionType]engif.ActionOpt{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we need this since we already get get remEngine
through the actions
key? Or does setting actionsOnOff
here separately take advantage of passing in the profile?
(not a nack, I'm trying to understand the code better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously it was with a separate method called from a private struct field of the rule type engine and I stumbled upon that being a problem when I applied these changes to the medev tool. That made me rethink this and so I decided that since the on/off states are an integral part of the actions engine, it makes more sense to expect that at the point of creating the engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, that's enough of an explanation. If you do end up amending the PR just add a comment, otherwise let's leave that as it is
|
||
// shouldRemediate returns the action command for remediation taking into account previous evaluations | ||
func shouldRemediate(prevEvalFromDb db.ListRuleEvaluationsByProfileIdRow, evalErr error) engif.ActionCmd { | ||
_ = prevEvalFromDb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we passing the parameters here but not using them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to leave this in case we want to add conditions to processing a remediation, i.e. maybe skip remediation if the remediate type is PR, the evalErr is failing and we already have the last remediate status set to "success" (meaning we created a PR remediation, it's just not merged yet). That might not be the best example, but something in that sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, this should be well thought through sо perhaps it is not a goal for now. I guess it's better to remove it for now and we can always add something like that later if needed 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's leave it for merging this PR and revisit during the week. I wouldn't like to block the PR any longer. If you do plan on updating this PR, just add a comment. The extra method doesn't hurt.
// shouldAlert returns the action command for alerting taking into account previous evaluations | ||
func shouldAlert(prevEvalFromDb db.ListRuleEvaluationsByProfileIdRow, evalErr error, remErr error) engif.ActionCmd { | ||
// Start simple without taking into account the remediation status | ||
_ = remErr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this for future extensibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my initial thought in case I think of some clever way for using it without overcomplicating this part. I think it's right to remove it for now since I've decided to go with the simplest approach for now. I gave a thought of creating a model where all statuses are taken into account but it got messy very quickly so I'll leave that to be a future task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatever you prefer, I was really just trying to understand the code I'm reviewing better.
// Case 1 - Evaluation has PASSED, but the Alert was ON. | ||
if db.EvalStatusTypesSuccess == newEval && db.AlertStatusTypesOn == prevAlert { | ||
// We should turn it OFF. | ||
return engif.ActionCmdOff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context of alerts, does Off mean "delete the alert" ? I'm confused between Off and DoNothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or "turn off for this particular run" ? Maybe extend the comments for the type and its values..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to introduce a command that is passed to each action that explicitly states the expected behaviour and respectively the result of this action run. In this case I mean that I want this action to turn off the alert which if handled successfully by the action means that the alert status that will be stored in the database at the end of this evaluation cycle is going to be off too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for the explanation
errors.Is(evalErr, enginerr.ErrEvaluationSkipSilently) || | ||
// TODO: (radoslav) Discuss this with Jakub and decide if we want to skip remediation too | ||
// rule evaluation had no error, skip action if actionType IS NOT alert | ||
(evalErr == nil && actionType != alert.ActionType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. I can't think of a reason to remediate if the evaluation passed (the remediations can be quite API-call intensive), moreover calling the remediation all the time would render the remediation timestamps useless (we want to record how often we call remediations and when was the last time we called one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of something like - the evaluation passed, see if you have a PR opened for that (if rem type is PR for example) and if so, close it automatically since it's no longer relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, nice and then it would be the remediate engine closing the PR. This is not implemented in the PR remediator's current instance, but it's really a nice idea. Currently I'm thinking that the only place that can decide this is the remediator itself - e.g. REST remediations probably shouldn't be called over and over with passing eval[1] but PRs might in order to close them.
Maybe the remediator should have a two methods, one for non-nil and one for nil eval?
Would you like to file an issue to explore this further or should I? I wouldn't like to forget this detail, but I think it's out of scope for this PR and the initial pass on the PR remediator as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened an issue - #1201 and described an untested idea of how this might be implemented. It's just thoughts based on the alert use cases but it should work for this too I think 👍
pbuild *providers.ProviderBuilder, | ||
) (*Remediator, error) { | ||
if actionType == "" { | ||
return nil, fmt.Errorf("action type cannot be empty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this explicitly test if the action type is different than remediate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought behind it was if a sub-action might be reused at some point by another action. For example the alert action might reuse the rest type for creating a rest call alerting a webhook somewhere?
@rdimitrov I forward-ported the branch protection remediations atop your branch and the only issue I found is https://github.com/jhrozek/mediator/commit/e9a8c35f44dc5ab966333c2960eabc7adf4fc389 on policy initialisation . Feel free to pull it into your branch if you agree with the fix. |
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
…ow if no rows are found
Ship it! |
The following PR includes the following changes:
EvalStatusParams
object that is being used to carry whatever is necessary for rule evaluations and for processing actionsalert
andremediate
where each action can have implementation specific child-type actions. By child-type action I mean - alerts of typesecurity-advisory
orslack
, remediations of typerest
orpull-request
The following PR does not include the following changes:
Fixes: #1182
Fixes: #1191
Note: I'll remove the WIP once I update the tests and also have tested it hands-on properly.