Add support for hook priorities (ordering) #695

EndPositive · 2021-10-01T12:55:58Z

Description

This PR, if applied, schedules hooks based on their priority. Hooks with a higher priority are run before hooks with a lower priority. Hooks with equal priority are scheduled according to the previously defined ordering (i.e. ReadAndWrite hooks first, in serial, then ReadOnly hooks in parallel). All hooks have a default priority of 0.
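The ordering rule described above can be illustrated with a small sort. This is only a sketch under assumptions: the `Hook` struct and the `orderByPriority` helper are hypothetical names for illustration and are not taken from the operator code in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// HookType mirrors the two hook kinds named in the description.
type HookType string

const (
	ReadAndWrite HookType = "ReadAndWrite"
	ReadOnly     HookType = "ReadOnly"
)

// Hook is a hypothetical, simplified stand-in for a ScanCompletionHook.
type Hook struct {
	Name     string
	Type     HookType
	Priority int // 0 when not set, matching the default priority
}

// orderByPriority returns a sorted copy: higher priority first and, within
// one priority, ReadAndWrite hooks before ReadOnly hooks.
func orderByPriority(hooks []Hook) []Hook {
	ordered := append([]Hook(nil), hooks...)
	sort.SliceStable(ordered, func(i, j int) bool {
		if ordered[i].Priority != ordered[j].Priority {
			return ordered[i].Priority > ordered[j].Priority
		}
		return ordered[i].Type == ReadAndWrite && ordered[j].Type == ReadOnly
	})
	return ordered
}

func main() {
	hooks := []Hook{
		{Name: "notification", Type: ReadOnly, Priority: 0},
		{Name: "defectdojo", Type: ReadOnly, Priority: 1},
		{Name: "update-field", Type: ReadAndWrite, Priority: 1},
	}
	for _, h := range orderByPriority(hooks) {
		fmt.Printf("%s (%s, priority %d)\n", h.Name, h.Type, h.Priority)
	}
}
```

The sort only fixes the order in which hooks are considered; whether hooks of one priority actually run in serial or in parallel is a separate scheduling decision.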

If the current revision is merged, scheduling behavior does not change yet, since every hook keeps the default priority of 0.

A documentation PR has also been opened: secureCodeBox/documentation#131.

Motivation

The described behavior can be useful when multiple different hooks have been deployed.

Examples:

The update-field hook (R&W) must update a field before the finding is imported into DefectDojo.
Findings must first be imported into DefectDojo (RO) before a notification is sent.
A data processing hook must run before any other hook.

Checklist

Test your changes as thoroughly as possible before you commit them. Preferably, automate your test by unit/integration tests.
Make sure npm test runs for the whole project.
Make codeclimate checks happy

operator/apis/execution/v1/scan_types.go

operator/config/crd/bases/execution.securecodebox.io_scancompletionhooks.yaml

operator/config/crd/bases/execution.securecodebox.io_scans.yaml

operator/crds/execution.securecodebox.io_scancompletionhooks.yaml

operator/controllers/execution/scans/hook_reconciler.go

J12934 · 2021-10-05T15:49:53Z

Thanks for the PR 🚀
Hook ordering / prioritization is a topic we've discussed many times, but we never got around to actually implementing it.
This already looks conceptually like a good way to tackle the problem.

I'm currently wondering if there is a way to simplify this behavior, as having the Prio applied to both ReadOnly and ReadAndWrite hooks feels extremely complicated with the differences between serial & parallel execution.

I'd propose to split the ReadOnly hooks into two categories: one which runs before the ReadAndWrite hooks and one which runs after them. (Either configured via a third type, or via a new attribute.)

The prio would then only be used for the ReadAndWrite hooks, and the code would not have to deal (or at least not more than it already does) with orchestrating the hooks while respecting both the prios and the serial / parallel execution order.

EndPositive · 2021-10-05T17:38:10Z

Awesome feedback @J12934! This was the discussion I was looking for 😀

I'm currently wondering if there is a way to simplify this behavior, as having the Prio applied to both ReadOnly and ReadAndWrite hooks feels extremely complicated with the differences between serial & parallel execution.

I completely agree that this PR may add unnecessary complexity to the flow of a secureCodeBox scan. I am, however, wondering which specific feature of this PR you find adds too much complexity. I have some doubts about the flexibility of your proposed solution.

I'd propose to split the ReadOnly hooks into two categories: one which runs before the ReadAndWrite hooks and one which runs after them. The prio would then only be used for the ReadAndWrite hooks, and the code would not have to deal (or at least not more than it already does) with orchestrating the hooks while respecting both the prios and the serial / parallel execution order.

Adding yet another field (e.g. order) for defining the order of ReadOnly hooks relative to ReadAndWrite hooks adds complexity while reducing flexibility. As stated in the OP, I had a dream for a future implementation without requirements on the relative order between ReadAndWrite and ReadOnly hooks. For example, one cannot currently implement a ReadOnly -> ReadAndWrite -> ReadOnly -> ReadAndWrite flow. One also cannot create a flow of ReadOnly -> ReadOnly hooks with required orderings (e.g. first upload findings to a persistence provider, and only then send an email to management with the persistence link).

I'm unsure where the complexity you are referring to actually arises. With my initially proposed implementation, the flow is like this:

Retrieve all hooks
Run all ReadAndWrite hooks according to priority. Are not allowed to run in parallel.
Run all ReadOnly hooks according to priority. Are allowed to run in parallel.

For the code, it would be optimal if we dropped the requirement that ReadAndWrite hooks must run before ReadOnly hooks. This logic could simply be implemented with priorities in the hook helm charts (i.e. set the priority of the currently implemented ReadAndWrite hooks to 1 and that of the ReadOnly hooks to 0).

Resulting in the following flow:

Retrieve all hooks
Run all hooks according to priority. Are allowed to run in parallel if NOT ReadAndWrite hook.

The difference between ReadAndWrite and ReadOnly hooks would simply be their permissions on MinIO while not allowing multiple ReadAndWrite hooks simultaneously (preventing possible data loss).

J12934 · 2021-10-06T16:20:46Z

Yup, I was thinking about ReadOnly -> ReadAndWrite -> ReadOnly -> ReadAndWrite use cases too, but thought we'd be able to skip these as they seem more complicated than I'd like them to be.

Resulting in the following flow:
Retrieve all hooks
Run all hooks according to priority. Are allowed to run in parallel if NOT ReadAndWrite hook.

Would you then start all ReadOnly hooks and the first ReadAndWrite hook of the same prio at the same time? The ReadOnly hooks would then potentially get different findings when the ReadAndWrite hook was faster than the finding download of the ReadOnly hook. (I know this is kind of an edge case, as it would only happen when somebody manually configures a ReadOnly and a ReadAndWrite hook to have the same prio; I was just curious whether I'm understanding this correctly.)

It might still be worth keeping the ReadOnly-after-ReadAndWrite hook ordering to avoid these race conditions. I'd understand the flow to happen like the following:

Basically the same as it is today just running in multiple "stages" (one "stage" per prio configured, first stage would be all hooks with the highest prio number)

EndPositive · 2021-10-07T07:34:38Z

Would you then start all ReadOnly hooks and the first ReadAndWrite hook of the same prio at the same time?

Interesting case 🤔 . That would indeed have been undefined behavior in my last proposal.

Basically the same as it is today just running in multiple "stages" (one "stage" per prio configured, first stage would be all hooks with the highest prio number)

I think that would cover most edges and be as close to what we have today while still really flexible. Sounds good to me!

Shall I update this PR with your latest proposal?

J12934 · 2021-10-07T08:11:27Z

Shall I update this PR with your latest proposal?

That would be awesome 👍

EndPositive · 2021-10-11T11:31:08Z

@J12934 could you give this a review?

J12934 · 2021-10-11T15:48:32Z

Will try to review this tomorrow.

Took a short look at the docs already, the diagram is really nice 👍

J12934 · 2021-10-12T21:11:46Z

Hi,
took a quite intensive dive into the PR today 😀

The code is already working great 👍
One thing I didn't like was the PriorityQueue part. I don't think that's the best match for the problem, as it always outputs only a single hook, which makes it impossible to properly map the entire required logic where RO hooks are run in parallel. As it can't completely handle this logic, the prioritization logic is now split across two locations (the PriorityQueue and hook_reconciler.go).

What I think would be a better data structure is a "list of lists". The length of the outer list would be equal to the number of "columns" in the execution diagram of the hooks (in this example 5); the inner lists then contain the actual hooks, which can always be executed in parallel (RW hooks are always the only entry in their lists). This nested list would then be generated once and attached to the scan status.

                                 Priority 2                                          Priority 1                    Priority 0
    +-------------------------------------------------------------------+     +----------------------+      +----------------------+
    |    +--------------+       +--------------+       +--------------+ |     |    +--------------+  |      |    +--------------+  | 
    | -> | ReadAndWrite |------>| ReadAndWrite |------>|   ReadOnly   | |     | -> |   ReadOnly   |  | ---> | -> | ReadAndWrite |  |
    |    +--------------+       +--------------+  |    +--------------+ |     |    +--------------+  |      |    +--------------+  |
--> |                                             |                     | --> |                      |      +----------------------+
    |                                             |    +--------------+ |     |    +--------------+  |
    |                                             +--->|   ReadOnly   | |     | -> |   ReadOnly   |  |
    |                                                  +--------------+ |     |    +--------------+  |
    +-----------+-------------------------------------------------------+     +----------------------+

For the diagram this would generate the following list (nested lists in YAML are kinda hard to read 😬):

apiVersion: execution.securecodebox.io/v1
kind: Scan
metadata:
  name: nmap-localhost
  namespace: default
spec:
  parameters:
    - localhost
  scanType: nmap
status:
  orderedHookStatuses:
    - - hookName: rw-0
        priority: 2
        state: Pending
        type: ReadAndWrite
    - - hookName: rw-1
        priority: 2
        state: Pending
        type: ReadAndWrite
    - - hookName: ro-0
        priority: 2
        state: Pending
        type: ReadOnly
      - hookName: ro-1
        priority: 2
        state: Pending
        type: ReadOnly
    - - hookName: ro-2
        priority: 1
        state: Pending
        type: ReadOnly
      - hookName: ro-3
        priority: 1
        state: Pending
        type: ReadOnly
    - - hookName: rw-2
        priority: 0
        state: Pending
        type: ReadAndWrite
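To make the "list of lists" concrete, here is a minimal sketch of how such groups could be built from a flat list of hooks. It is not the code from the linked experimental branch; the `HookStatus` struct and the `groupHooks` function are hypothetical and only mirror the field names visible in the YAML above.

```go
package main

import (
	"fmt"
	"sort"
)

// HookStatus is a hypothetical struct mirroring the YAML fields above.
type HookStatus struct {
	HookName string
	Priority int
	Type     string // "ReadAndWrite" or "ReadOnly"
	State    string
}

// groupHooks builds the nested list: priorities in descending order; within
// a priority every ReadAndWrite hook forms its own single-entry group,
// followed by one group holding all ReadOnly hooks of that priority.
func groupHooks(hooks []HookStatus) [][]HookStatus {
	byPrio := map[int][]HookStatus{}
	for _, h := range hooks {
		byPrio[h.Priority] = append(byPrio[h.Priority], h)
	}
	prios := make([]int, 0, len(byPrio))
	for p := range byPrio {
		prios = append(prios, p)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(prios)))

	var groups [][]HookStatus
	for _, p := range prios {
		var readOnly []HookStatus
		for _, h := range byPrio[p] {
			if h.Type == "ReadAndWrite" {
				groups = append(groups, []HookStatus{h})
			} else {
				readOnly = append(readOnly, h)
			}
		}
		if len(readOnly) > 0 {
			groups = append(groups, readOnly)
		}
	}
	return groups
}

func main() {
	hooks := []HookStatus{
		{HookName: "ro-0", Priority: 2, Type: "ReadOnly", State: "Pending"},
		{HookName: "rw-0", Priority: 2, Type: "ReadAndWrite", State: "Pending"},
		{HookName: "ro-2", Priority: 1, Type: "ReadOnly", State: "Pending"},
		{HookName: "rw-1", Priority: 2, Type: "ReadAndWrite", State: "Pending"},
		{HookName: "rw-2", Priority: 0, Type: "ReadAndWrite", State: "Pending"},
		{HookName: "ro-1", Priority: 2, Type: "ReadOnly", State: "Pending"},
		{HookName: "ro-3", Priority: 1, Type: "ReadOnly", State: "Pending"},
	}
	for i, group := range groupHooks(hooks) {
		names := []string{}
		for _, h := range group {
			names = append(names, h.HookName)
		}
		fmt.Printf("group %d: %v\n", i, names)
	}
}
```

For the seven hooks of the diagram this yields five groups, matching the five columns: two single ReadAndWrite groups, the two priority-2 ReadOnly hooks, the two priority-1 ReadOnly hooks, and the final ReadAndWrite hook.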

I wanted to try out how hard this nested list is to generate and went ahead and fully implemented it, pushed to an experimental fork: EndPositive/secureCodeBox@hook-priorities...secureCodeBox:experiment/hook-prio-refactor
Ordering Code is here: https://github.com/secureCodeBox/secureCodeBox/blob/experiment/hook-prio-refactor/operator/utils/orderedhookgroups.go

With this change the reconciler now only has to get the active group of hooks and work on them. This also achieves the goal that the only reason the reconciler has to differentiate between RO & RW hooks is to pass in the update arguments :)
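Picking the "active group" from the nested list could look roughly like the sketch below. This is not the implementation from the linked branch; `currentGroup` is a hypothetical helper, and the state name "Completed" is an assumption (only "Pending" and "InProgress" appear in this conversation). Failed hooks are not handled here.

```go
package main

import "fmt"

// HookStatus is a hypothetical, reduced status entry.
type HookStatus struct {
	HookName string
	State    string // e.g. "Pending", "InProgress"; "Completed" is assumed
}

// currentGroup returns the first group that still contains a hook which has
// not completed. The boolean is false once every group is done.
func currentGroup(groups [][]HookStatus) ([]HookStatus, bool) {
	for _, group := range groups {
		for _, h := range group {
			if h.State != "Completed" {
				return group, true
			}
		}
	}
	return nil, false
}

func main() {
	groups := [][]HookStatus{
		{{HookName: "rw-0", State: "Completed"}},
		{{HookName: "ro-0", State: "InProgress"}, {HookName: "ro-1", State: "Pending"}},
		{{HookName: "rw-2", State: "Pending"}},
	}
	if group, ok := currentGroup(groups); ok {
		for _, h := range group {
			fmt.Printf("active: %s (%s)\n", h.HookName, h.State)
		}
	}
}
```

Because later groups are never inspected while an earlier group is unfinished, the serial and parallel constraints are encoded entirely in how the groups were built.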

Let me know what you think 🙏

EndPositive · 2021-10-22T19:06:14Z

What I'm now thinking about is whether this is considered a breaking change, as the change to the status fields would prevent the upgraded operator from properly executing any scans which were in progress during the operator upgrade. Could we make some easy tweaks to the operator so that this is not the case?

@J12934 in 05c3464 I tried implementing a migration mechanism for the old fields. I think it covers all cases except where a scan was executing ReadOnly hooks and a ReadOnly hook was executed but the Job has been cleaned up. In that case, the new status field for that hook would get stuck on InProgress.

J12934 · 2021-10-29T12:22:14Z

Hi again 👋

Finally found the time to test this PR with the new migration code.
Looks great 👍

Tested it by starting off with the 3.3.1 operator and CRDs and installing 50 update-field hooks, each of which adds a new attribute to the finding:

for value in {1..50}
do
    helm upgrade --install "ufh-$value" secureCodeBox/update-field-hook --set attribute.name="attributes.foo-$value" --set attribute.value="$value"
done

Then I upgraded the CRDs and the operator while the hooks were being executed.

Upgrade went seamlessly and all hooks were executed properly. (Confirmed and checked that the findings had all 50 new attributes set)

With that I think the PR is ready to be merged 🚀
Any objections?

Post Merge we should probably add priority helm values for all hook helm charts, so that you can easily upgrade / configure the prio.

malexmave · 2021-10-29T12:33:43Z

No objections from my side (but I haven't reviewed or tested in-depth, relying on your judgement here :) ).

EndPositive · 2021-10-29T14:57:16Z

Awesome @J12934, really nice way to test.

No objections as far as the operator code is concerned. However, I know that some hooks (e.g. DD) implement their own types for our CRDs. We should probably test/verify those before making a release.

EndPositive · 2021-11-01T11:03:32Z

Just ran into an issue when I wanted to delete a Scan created before the ordering change. I'll take a look.

2021-11-01T11:21:58.077Z	ERROR	controllers.execution.Scan	Failed to run Scan Finalizer	{"error": "Scan.execution.securecodebox.io \"nmap-example\" is invalid: [status.readAndWriteHookStatus.priority: Required value, status.readAndWriteHookStatus.type: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
github.com/secureCodeBox/secureCodeBox/operator/controllers/execution/scans.(*ScanReconciler).Reconcile
	/workspace/controllers/execution/scans/scan_controller.go:85
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99

2021-11-01T11:00:17.380Z	ERROR	controller-runtime.manager.controller.scan	Reconciler error	{"reconciler group": "execution.securecodebox.io", "reconciler kind": "Scan", "name": "nmap-example", "namespace": "default", "error": "Scan.execution.securecodebox.io \"nmap-example\" is invalid: [status.readAndWriteHookStatus.priority: Required value, status.readAndWriteHookStatus.type: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:302
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99

EndPositive · 2021-11-01T13:09:24Z

@J12934, just pushed a fix for Scan objects which were marked Done and were created before this change. Can you test again with the same method, but also remove all Scans after updating? Once the operator detected deletion of a Scan resource, it tried to remove the finalizer, which requires that the ReadAndWriteHookStatus is valid. I.e. we have to check on deletion whether the field has already been migrated.

EndPositive · 2021-11-01T14:28:06Z

ab00c6c added hook prio field in values.yaml

malexmave · 2021-11-10T11:48:04Z

Hi @EndPositive. I am ready to merge this, but after merging your other hook PR, this is giving some conflicts. All the values.yaml changes are easy to resolve, but the conflicts in the hook_reconciler are complex enough that I would prefer if you could quickly resolve them, so that I don't introduce any new bugs in the process. Can you quickly resolve them?

EndPositive · 2021-11-10T12:41:16Z

@malexmave done. I think you should run the integration tests & helm-docs again.

Signed-off-by: Jop Zitman <jop.zitman@secura.com> # Conflicts: # hooks/cascading-scans/values.yaml # hooks/finding-post-processing/values.yaml # hooks/generic-webhook/values.yaml # hooks/notification/values.yaml # hooks/persistence-defectdojo/values.yaml # hooks/persistence-elastic/values.yaml # hooks/update-field/values.yaml # operator/controllers/execution/scans/hook_reconciler.go

Add support for hook priorities

5edd163

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

EndPositive force-pushed the hook-priorities branch 3 times, most recently from b6b7c90 to 8257f4d Compare October 1, 2021 22:49

Introduce heap based priority queue and use reference everywhere for …

6d6fa10

…optimizations Signed-off-by: Jop Zitman <jop.zitman@secura.com>

EndPositive force-pushed the hook-priorities branch from 8257f4d to 6d6fa10 Compare October 1, 2021 23:48

Jop Zitman added 3 commits October 2, 2021 13:09

pq usage optimization (less append calls)

b90a567

Signed-off-by: Jop Zitman <jop.zitman@secura.com> pq usage optimization (less append calls) Signed-off-by: Jop Zitman <jop.zitman@secura.com>

More "global" error handling

27d5d96

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

Fix log msg condition

120137f

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

sanmai-NL suggested changes Oct 5, 2021

rseedorff added this to In progress in secureCodeBox v3 via automation Oct 6, 2021

rseedorff added architecture Architecture changes hook Implement or update a hook labels Oct 6, 2021

Jop Zitman added 3 commits October 9, 2021 17:30

Reimplement hook priorities

a0f2c92

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

Split up processHook function

0c12033

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

Update priority description

bd6219d

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

EndPositive requested a review from J12934 October 11, 2021 11:31

EndPositive mentioned this pull request Oct 11, 2021

Add hook priority docs secureCodeBox/documentation#131

Closed

EndPositive marked this pull request as ready for review October 11, 2021 17:01

J12934 added 2 commits October 12, 2021 23:00

Refactor hook prio handling to use a pre calculated list

9833851

Remove prints

1ff2dfb

Update HookStatuses by reference instead.

168e6d3

It already has side-effects (i.e. cluster updates) so it's more consistent to update in-place. Signed-off-by: Jop Zitman <jop.zitman@secura.com>

secureCodeBox v3 automation moved this from Done to In progress Oct 22, 2021

EndPositive mentioned this pull request Oct 22, 2021

Add ability to configure which hooks to run per scan #757

Merged

3 tasks

Jop Zitman added 2 commits October 22, 2021 20:58

Merge remote-tracking branch 'upstream/main' into hook-priorities

e4d19ee

Implement migration mechanism for scans that were processing hooks wh…

05c3464

…ile upgrading Signed-off-by: Jop Zitman <jop.zitman@secura.com>

Add migrations to scans which had already finished.

71293e1

Unfortunately can't set the field to `nil` as Kubernetes complains about missing fields priority and type (on non existing elements...) Signed-off-by: Jop Zitman <jop.zitman@secura.com>

EndPositive force-pushed the hook-priorities branch from bedf0b6 to 71293e1 Compare November 1, 2021 13:07

Add priority field to Hook charts

ab00c6c

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

EndPositive mentioned this pull request Nov 1, 2021

Hook priority and selectors documentation secureCodeBox/documentation#152

Merged

J12934 previously approved these changes Nov 10, 2021

secureCodeBox v3 automation moved this from In progress to Reviewer approved Nov 10, 2021

EndPositive dismissed J12934’s stale review via 844e687 November 10, 2021 12:39

secureCodeBox v3 automation moved this from Reviewer approved to To Review Nov 10, 2021

EndPositive force-pushed the hook-priorities branch from 844e687 to 96c1ff8 Compare November 10, 2021 13:21

J12934 approved these changes Nov 10, 2021

secureCodeBox v3 automation moved this from To Review to Reviewer approved Nov 10, 2021

J12934 merged commit d39435f into secureCodeBox:main Nov 10, 2021

secureCodeBox v3 automation moved this from Reviewer approved to Done Nov 10, 2021

J12934 mentioned this pull request Nov 17, 2021

Update Java Model for CRD #824

Merged

J12934 moved this from Done to counter in secureCodeBox v3 Nov 19, 2021

EndPositive deleted the hook-priorities branch November 22, 2021 11:10
