-
Notifications
You must be signed in to change notification settings - Fork 168
Add support for hook priorities (ordering) #695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
…optimizations Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com> pq usage optimization (less append calls) Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
operator/config/crd/bases/execution.securecodebox.io_scancompletionhooks.yaml
Outdated
Show resolved
Hide resolved
operator/crds/execution.securecodebox.io_scancompletionhooks.yaml
Outdated
Show resolved
Hide resolved
Thanks for the PR 🚀 I'm currently wondering if there is a way to simplify this behavior, as having the Prio applied to both I'd propose to split the The prio would then only be used for the |
Awesome feedback @J12934! This was the discussion I was looking for 😀
I completely agree that this PR may potentially add unnecessary complexity to the flow of a secureCodeBox scan. I am however, wondering which specific feature from this PR you find adds too much complexity. I'm doubting the flexibility of your proposed solution a bit.
Adding yet another field (e.g. I'm unsure where the complexity you are referring to actually arises. With my initially proposed implementation, the flow is like this:
For the code, it would be optimal if we disregarded the requirement that Resulting in the following flow:
The difference between |
Yup was thinking about
Would you then start all ReadOnly Hooks the first ReadAndWrite hook of the same prio at the same time? The ReadOnly hooks would then potentially get different finding when the ReadAndWrite hook was faster then the finding download of the ReadOnly hook. (I know this is kind of a edge case in this case as it would only happen when somebody manually configures a Readonly and ReadAndWrite hook to have the same prio, was just curios if I'm understanding this correctly) Might still be worth to keep the ReadOnly after ReadAndWrite hook ordering to avoid these race conditions, I'd understand the flow to happen like the following: Basically the same as it is today just running in multiple "stages" (one "stage" per prio configured, first stage would be all hooks with the highest prio number) |
Interesting case 🤔 . That would indeed have been undefined behavior in my last proposal.
I think that would cover most edges and be as close to what we have today while still really flexible. Sounds good to me! Shall I update this PR with your latest proposal? |
That would be awesome 👍 |
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
@J12934 could you give this a review? |
Will try to review this tomorrow. Took a short look at the docs or already, the diagram is really nice 👍 |
Hi, The code is already working great 👍 What i think would be a better data structure would be to a "list of list". The length of the outer list would be equal to the number of "colums" in the execution diagram of the hooks (in this example 5), the inner lists then contain the actual hooks which can always be executed in parralel (RW hooks are always the only entry in their lists). This nested list would then be generated once and attached to the scan status.
For the diagram this would generate the following list (nested lists in yaml kinda hard too read 😬) apiVersion: execution.securecodebox.io/v1
kind: Scan
metadata:
name: nmap-localhost
namespace: default
spec:
parameters:
- localhost
scanType: nmap
status:
orderedHookStatuses:
- - hookName: rw-0
priority: 2
state: Pending
type: ReadAndWrite
- - hookName: rw-1
priority: 2
state: Pending
type: ReadAndWrite
- - hookName: ro-0
priority: 2
state: Pending
type: ReadOnly
- hookName: ro-1
priority: 2
state: Pending
type: ReadOnly
- - hookName: ro-2
priority: 1
state: Pending
type: ReadOnly
- hookName: ro-3
priority: 1
state: Pending
type: ReadOnly
- - hookName: rw-2
priority: 0
state: Pending
type: ReadAndWrite I've wanted to try out how hard this nested list is to generate and went ahead in fully implementing it, pushed it to a experimental fork: https://github.com/EndPositive/secureCodeBox/compare/hook-priorities...secureCodeBox:experiment/hook-prio-refactor With this change the reconciler now only has to get the active group of hooks and work on them. This also achieves the goal the the only reason the reconciler has to differentiate between RO & RW hooks is to pass in the update arguments :) Let me know what you think 🙏 |
It already has side-effects (i.e. cluster updates) so it's more consistent to update in-place. Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com> # Conflicts: # operator/crds/execution.securecodebox.io_scans.yaml # operator/go.mod
Hi @J12934 👋,
That was a very reasonable concern. It also doesn't help that golang's heap implementation is not that mature. It's a bummer that most of the original code won't be used, but it's better like this 💪. I really like the new solution you came up with. It seems very simple and nicely splits up the controller from logic. I also really like the tests framework, I didn't know you had already used this in another recent MR. I've merged your experimental branch into this MR and added some changes. One is related to pointers and the other to logging. I've also now merged main back into here. Take a quick look and I think its ready? I have more awesome ideas for priority use cases 😏. Note: DCO signoff is missing from your 'experimental' commits. |
Awesome, triggered the CI. PR should be close to be "mergable". What I'm not thinking about if this is considered a breaking change as the chage to the status fields would cause the upgraded operator from properly executing any scans which were in progress during the operator upgrade. Could we make some easy tweks to the operator to make that not the case? |
Please pause merging this. I'm not convinced yet using another data structure for priorities is necessary and without scalability downsides. |
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Can you elaborate what the scalability issues on the approach would be? I don't think it is possible to implement this without some sort of hook status field on the scan. Without it the operator would have look at the state of the cluster, in this case which hook jobs where already completed to infer which hooks have been completed and which are still pending. This would work for some cases but would break if somebody configures their hooks jobs to have a |
My scalability concern was: if you have thousands or more of priority classes, we need an efficient way to process them and the proper, understandable abstraction for such a problem. Whereas you can mimic this with the pre-sorted list and some specific logic to find the top of the list, that would be less standard. I just discussed this with @EndPositive, and it seems our misunderstanding stems from the initial implementation approach with the priority queue. Instead of a priority queue of hooks, a priority queue of classes would've been more suitable. The class struct would contain an ordered list of RW hooks and a list of RO hooks and have a priority (e.g., field). This would be clean, the logic for handling the classes will and should be outside the priority queue implementation. However, it seems I overestimated the number of priority classes we foresee. At least, your diagram and the current applications of SCB seem more geared towards a handful of hooks. Please go ahead. |
…ile upgrading Signed-off-by: Jop Zitman <jop.zitman@secura.com>
@J12934 in 05c3464 I tried implementing a migration mechanism for the old fields. I think it covers all cases except where a scan was executing |
Hi again 👋 Finally found the time to test this PR with the new migration code. Tested it by starting off with 3.3.1 operator and crds and installing 50 update field hook which each add a new attribute to the finding: for value in {1..50}
do
helm upgrade --install "ufh-$value" secureCodeBox/update-field-hook --set attribute.name="attributes.foo-$value" --set attribute.value="$value"
done Then I upgraded the crds and operator while the hooks were executed. Upgrade went seamlessly and all hooks were executed properly. (Confirmed and checked that the findings had all 50 new attributes set) With that I think the PR is ready to be merged 🚀 Post Merge we should probably add priority helm values for all hook helm charts, so that you can easily upgrade / configure the prio. |
No objections from my side (but I haven't reviewed or tested in-depth, relying on your judgement here :) ). |
Awesome @J12934, really nice way to test. No objections as far as the operator code is concerned. However, I know that some hooks (e.g. DD) implement their own types for our CRD's. We should probably test/verify those before making a release. |
Just ran into an issue where I wanted to delete a Scan created with pre ordering. I'll take a look.
|
Unfortunately can't set the field to `nil` as Kubernetes complains about missing fields priority and type (on non existing elements...) Signed-off-by: Jop Zitman <jop.zitman@secura.com>
@J12934, just pushed a fix for Scan objects which were marked |
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
ab00c6c added hook prio field in values.yaml |
Hi @EndPositive. I am ready to merge this, but after merging your other hook PR, this is giving some conflicts. All the values.yaml changes are easy to resolve, but the conflicts in the hook_reconciler are complex enough that I would prefer if you could quickly resolve them, so that I don't introduce any new bugs in the process. Can you quickly resolve them? |
@malexmave done. I think you should run integration test & helm-docs again. |
Signed-off-by: Jop Zitman <jop.zitman@secura.com> # Conflicts: # hooks/cascading-scans/values.yaml # hooks/finding-post-processing/values.yaml # hooks/generic-webhook/values.yaml # hooks/notification/values.yaml # hooks/persistence-defectdojo/values.yaml # hooks/persistence-elastic/values.yaml # hooks/update-field/values.yaml # operator/controllers/execution/scans/hook_reconciler.go
Description
This PR, if applied, schedules hooks based on their priority. Hooks with higher priority are ran before hooks with a lower priority. Hooks with equal priority are scheduled according to the previously defined orderings (i.e. ReadAndWrite hooks first in serial, ReadOnly hooks after in parallel). All hooks have a default priority of
0
.If current revision is merged, there will be no changes in scheduling behavior yet.
A Documentation PR has also been made secureCodeBox/documentation#131.
Motivation
The described behavior can be useful when multiple different hooks have been deployed.
Examples:
Checklist
npm test
runs for the whole project.