Add support for hook priorities (ordering) #695

Merged
merged 21 commits into from
Nov 10, 2021

@EndPositive (Contributor) commented Oct 1, 2021

Description

This PR, if applied, schedules hooks based on their priority. Hooks with a higher priority are run before hooks with a lower priority. Hooks with equal priority are scheduled according to the previously defined ordering (ReadAndWrite hooks first, serially; ReadOnly hooks afterwards, in parallel). All hooks have a default priority of 0.

If the current revision is merged, scheduling behavior will not change yet, since every hook defaults to priority 0.
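The ordering described above can be sketched as a single stable sort. This is a minimal, hypothetical illustration, not the PR's code: the `Hook` type, its fields, and `orderHooks` are assumptions, and the sketch only yields a flat order (it does not model ReadOnly hooks running in parallel).

```go
package main

import (
	"fmt"
	"sort"
)

// HookType distinguishes the two hook kinds discussed in this PR.
type HookType string

const (
	ReadAndWrite HookType = "ReadAndWrite"
	ReadOnly     HookType = "ReadOnly"
)

// Hook is a hypothetical, simplified stand-in for a hook definition.
type Hook struct {
	Name     string
	Type     HookType
	Priority int // 0 when not set, matching the default described above
}

// orderHooks returns a copy of hooks sorted by descending priority.
// Within one priority, ReadAndWrite hooks come before ReadOnly hooks;
// the stable sort keeps the input order for everything else.
func orderHooks(hooks []Hook) []Hook {
	out := append([]Hook(nil), hooks...)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Priority != out[j].Priority {
			return out[i].Priority > out[j].Priority
		}
		return out[i].Type == ReadAndWrite && out[j].Type == ReadOnly
	})
	return out
}

func main() {
	// With every priority left at 0, the result is the existing order:
	// ReadAndWrite hooks first, then ReadOnly hooks.
	hooks := []Hook{
		{"notify", ReadOnly, 0},
		{"update-field", ReadAndWrite, 0},
		{"defectdojo", ReadOnly, 0},
	}
	for _, h := range orderHooks(hooks) {
		fmt.Println(h.Priority, h.Type, h.Name)
	}
}
```

Because all defaults are 0, only the type tiebreaker applies, which is why merging this revision alone leaves scheduling unchanged.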

A Documentation PR has also been made secureCodeBox/documentation#131.

Motivation

The described behavior can be useful when multiple different hooks have been deployed.

Examples:

  1. The update-field hook must update a field before the finding is imported into DefectDojo (R&W).
  2. First import findings into DefectDojo (RO), and only then send a notification.
  3. A data-processing hook that runs before any other hook.

Checklist

  • Test your changes as thoroughly as possible before you commit them. Preferably, automate your test by unit/integration tests.
  • Make sure npm test runs for the whole project.
  • Make codeclimate checks happy

Signed-off-by: Jop Zitman <jop.zitman@secura.com>
@EndPositive force-pushed the hook-priorities branch 3 times, most recently from b6b7c90 to 8257f4d, October 1, 2021 22:49
…optimizations

Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Jop Zitman added 3 commits October 2, 2021 13:09
Signed-off-by: Jop Zitman <jop.zitman@secura.com>

pq usage optimization (less append calls)

Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Resolved review threads: operator/apis/execution/v1/scan_types.go (1 thread); operator/controllers/execution/scans/hook_reconciler.go (6 outdated threads)
@J12934 (Member) commented Oct 5, 2021

Thanks for the PR 🚀
Hook ordering / prioritization has been a topic we've discussed many times, but we never got around to actually implementing it.
This already looks conceptually like a good way to tackle the problem.

I'm currently wondering if there is a way to simplify this behavior, as having the Prio applied to both ReadOnly and ReadAndWrite hooks feels extremely complicated with the differences between serial & parallel execution.

I'd propose splitting the ReadOnly hooks into two categories: one which runs before the ReadAndWrite hooks and one which runs after them (configured either via a third type or via a new attribute).

The prio would then only be used for the ReadAndWrite hooks, and the code would not have to deal (or at least not more than it already does) with orchestrating the hooks while respecting both the prios and the serial / parallel execution order.

@EndPositive (Contributor Author)

Awesome feedback @J12934! This was the discussion I was looking for 😀

> I'm currently wondering if there is a way to simplify this behavior, as having the Prio applied to both ReadOnly and ReadAndWrite hooks feels extremely complicated with the differences between serial & parallel execution.

I completely agree that this PR may add unnecessary complexity to the flow of a secureCodeBox scan. I am, however, wondering which specific feature of this PR you find adds too much complexity. I also have some doubts about the flexibility of your proposed solution.

> I'd propose to split the ReadOnly hooks in two categories one which runs before ReadAndWrite and one which runs after them. The prio would then only be used for the ReadAndWrite hooks and the code would not have to deal (or at least not more than it already does) with orchestrating the hook while respecting both the prio's and the serial / parallel execution order.

Adding yet another field (e.g. order) to define the order of ReadOnly hooks relative to ReadAndWrite hooks adds complexity while reducing flexibility. As stated in the OP, I had in mind a future implementation without requirements on the relative order between ReadAndWrite and ReadOnly hooks. For example, one currently cannot implement a ReadOnly -> ReadAndWrite -> ReadOnly -> ReadAndWrite flow. Nor can one create a ReadOnly -> ReadOnly flow with a required ordering (e.g. first upload findings to a persistence provider, and only then send an email with the persistence link to management).

I'm unsure where the complexity you are referring to actually arises. With my initially proposed implementation, the flow is like this:

  1. Retrieve all hooks
  2. Run all ReadAndWrite hooks according to priority. They may not run in parallel.
  3. Run all ReadOnly hooks according to priority. They may run in parallel.

For the code, it would be optimal if we dropped the requirement that ReadAndWrite hooks run before ReadOnly hooks. That ordering could simply be expressed with priorities in the hook Helm charts (i.e. set the priority of the currently implemented ReadAndWrite hooks to 1 and of the ReadOnly hooks to 0).

Resulting in the following flow:

  1. Retrieve all hooks
  2. Run all hooks according to priority. Hooks may run in parallel unless they are ReadAndWrite hooks.

The difference between ReadAndWrite and ReadOnly hooks would then simply be their permissions on MinIO, plus the rule that multiple ReadAndWrite hooks never run simultaneously (preventing possible data loss).
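The second flow, where priority alone decides the order, could look roughly like the sketch below. It is a hypothetical illustration (the `Hook` type and `scheduleBatches` are assumptions, not code from this PR): hooks are sorted by descending priority, adjacent ReadOnly hooks of the same priority share a parallel batch, and each ReadAndWrite hook gets a batch of its own. The example gives ReadAndWrite hooks priority 1 and ReadOnly hooks priority 0, which reproduces today's behavior as proposed above.

```go
package main

import (
	"fmt"
	"sort"
)

type HookType string

const (
	ReadAndWrite HookType = "ReadAndWrite"
	ReadOnly     HookType = "ReadOnly"
)

// Hook is a hypothetical, simplified hook definition.
type Hook struct {
	Name     string
	Type     HookType
	Priority int
}

// scheduleBatches orders hooks by descending priority only. Each returned
// batch may run in parallel: a ReadAndWrite hook always forms its own batch,
// while consecutive ReadOnly hooks with the same priority are merged.
// Hooks with equal priority keep their input order, so the relative order of
// ReadOnly and ReadAndWrite hooks inside one priority is not defined here.
func scheduleBatches(hooks []Hook) [][]string {
	sorted := append([]Hook(nil), hooks...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})

	var batches [][]string
	for i, h := range sorted {
		prev := i - 1
		if h.Type == ReadOnly && prev >= 0 &&
			sorted[prev].Type == ReadOnly && sorted[prev].Priority == h.Priority {
			last := len(batches) - 1
			batches[last] = append(batches[last], h.Name)
			continue
		}
		batches = append(batches, []string{h.Name})
	}
	return batches
}

func main() {
	hooks := []Hook{
		{"rw-a", ReadAndWrite, 1},
		{"ro-a", ReadOnly, 0},
		{"ro-b", ReadOnly, 0},
		{"rw-b", ReadAndWrite, 1},
	}
	fmt.Println(scheduleBatches(hooks))
}
```

Running `main` prints `[[rw-a] [rw-b] [ro-a ro-b]]`. Note that nothing in this scheme says whether a ReadOnly hook and a ReadAndWrite hook with the same priority start together.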

@rseedorff added this to In progress in secureCodeBox v3 via automation Oct 6, 2021
@rseedorff added the labels architecture (Architecture changes) and hook (Implement or update a hook) Oct 6, 2021
@J12934 (Member) commented Oct 6, 2021

Yup, I was thinking about ReadOnly -> ReadAndWrite -> ReadOnly -> ReadAndWrite use cases too, but thought we'd be able to skip these, as they seem more complicated than I'd like them to be.

> Resulting in the following flow:
> 1. Retrieve all hooks
> 2. Run all hooks according to priority. Are allowed to run in parallel if NOT ReadAndWrite hook.

Would you then start all ReadOnly hooks and the first ReadAndWrite hook of the same prio at the same time? The ReadOnly hooks could then get different findings if the ReadAndWrite hook finished before the ReadOnly hook downloaded the findings. (I know this is an edge case, as it would only happen when somebody manually configures a ReadOnly and a ReadAndWrite hook with the same prio; I was just curious whether I'm understanding this correctly.)

It might still be worth keeping the ReadOnly-after-ReadAndWrite ordering to avoid these race conditions. I'd understand the flow to work like this:

[Diagram: proposed hook execution flow]

Basically the same as today, just running in multiple "stages" (one stage per configured prio; the first stage contains all hooks with the highest prio number).
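The staged execution described above can be sketched as follows, again as a hypothetical illustration rather than the operator's code: `runStaged` and the `run` callback (standing in for creating a hook Job and waiting for it) are assumptions. Each distinct priority forms one stage, highest first; inside a stage the ReadAndWrite hooks run one at a time, and only then are the stage's ReadOnly hooks started together, so a ReadOnly hook never reads findings that a ReadAndWrite hook of the same priority is still changing.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type HookType string

const (
	ReadAndWrite HookType = "ReadAndWrite"
	ReadOnly     HookType = "ReadOnly"
)

// Hook is a hypothetical, simplified hook definition.
type Hook struct {
	Name     string
	Type     HookType
	Priority int
}

// runStaged runs one stage per distinct priority, highest priority first.
// Inside a stage, ReadAndWrite hooks run sequentially; afterwards all ReadOnly
// hooks of the stage run concurrently. The next stage only starts once every
// hook of the current stage has returned.
func runStaged(hooks []Hook, run func(Hook)) {
	byPrio := map[int][]Hook{}
	for _, h := range hooks {
		byPrio[h.Priority] = append(byPrio[h.Priority], h)
	}
	prios := make([]int, 0, len(byPrio))
	for p := range byPrio {
		prios = append(prios, p)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(prios)))

	for _, p := range prios {
		var readOnly []Hook
		for _, h := range byPrio[p] {
			if h.Type == ReadAndWrite {
				run(h) // serial: no other hook runs at the same time
			} else {
				readOnly = append(readOnly, h)
			}
		}
		var wg sync.WaitGroup
		for _, h := range readOnly {
			wg.Add(1)
			go func(h Hook) {
				defer wg.Done()
				run(h)
			}(h)
		}
		wg.Wait() // stage barrier
	}
}

func main() {
	var mu sync.Mutex
	var order []string
	runStaged([]Hook{
		{"ro-low", ReadOnly, 0},
		{"ro-a", ReadOnly, 1},
		{"rw-a", ReadAndWrite, 1},
		{"ro-b", ReadOnly, 1},
	}, func(h Hook) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, h.Name)
	})
	// rw-a always comes first and ro-low last; ro-a and ro-b may swap.
	fmt.Println(order)
}
```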

@EndPositive (Contributor Author)

> Would you then start all ReadOnly hooks and the first ReadAndWrite hook of the same prio at the same time?

Interesting case 🤔. That would indeed have been undefined behavior in my last proposal.

> Basically the same as it is today just running in multiple "stages" (one "stage" per prio configured, first stage would be all hooks with the highest prio number)

I think that would cover most edge cases and stay as close as possible to what we have today while still being really flexible. Sounds good to me!

Shall I update this PR with your latest proposal?

@J12934 (Member) commented Oct 7, 2021

> Shall I update this PR with your latest proposal?

That would be awesome 👍

Jop Zitman added 3 commits October 9, 2021 17:30
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
Signed-off-by: Jop Zitman <jop.zitman@secura.com>
@EndPositive (Contributor Author)

@J12934 could you give this a review?

@J12934 (Member) commented Oct 11, 2021

Will try to review this tomorrow.

I already took a short look at the docs; the diagram is really nice 👍

@EndPositive marked this pull request as ready for review October 11, 2021 17:01
@J12934 (Member) commented Oct 12, 2021

Hi,
took quite an intensive dive into the PR today 😀

The code is already working great 👍
One thing I don't like is the PriorityQueue part. I don't think it's the best match for the problem, as it always outputs only a single hook, which makes it impossible to properly map the required logic where RO hooks run in parallel. Because it can't completely handle this logic, the prioritization logic is now split across two locations (the PriorityQueue and hook_reconciler.go).

What I think would be a better data structure is a "list of lists". The length of the outer list would equal the number of "columns" in the hook execution diagram (5 in this example); the inner lists contain the actual hooks, which can always be executed in parallel (RW hooks are always the only entry in their list). This nested list would be generated once and attached to the scan status.

                                 Priority 2                                          Priority 1                    Priority 0
    +-------------------------------------------------------------------+     +----------------------+      +----------------------+
    |    +--------------+       +--------------+       +--------------+ |     |    +--------------+  |      |    +--------------+  | 
    | -> | ReadAndWrite |------>| ReadAndWrite |------>|   ReadOnly   | |     | -> |   ReadOnly   |  | ---> | -> | ReadAndWrite |  |
    |    +--------------+       +--------------+  |    +--------------+ |     |    +--------------+  |      |    +--------------+  |
--> |                                             |                     | --> |                      |      +----------------------+
    |                                             |    +--------------+ |     |    +--------------+  |
    |                                             +--->|   ReadOnly   | |     | -> |   ReadOnly   |  |
    |                                                  +--------------+ |     |    +--------------+  |
    +-----------+-------------------------------------------------------+     +----------------------+

For the diagram, this would generate the following list (nested lists in YAML are kinda hard to read 😬):

apiVersion: execution.securecodebox.io/v1
kind: Scan
metadata:
  name: nmap-localhost
  namespace: default
spec:
  parameters:
    - localhost
  scanType: nmap
status:
  orderedHookStatuses:
    - - hookName: rw-0
        priority: 2
        state: Pending
        type: ReadAndWrite
    - - hookName: rw-1
        priority: 2
        state: Pending
        type: ReadAndWrite
    - - hookName: ro-0
        priority: 2
        state: Pending
        type: ReadOnly
      - hookName: ro-1
        priority: 2
        state: Pending
        type: ReadOnly
    - - hookName: ro-2
        priority: 1
        state: Pending
        type: ReadOnly
      - hookName: ro-3
        priority: 1
        state: Pending
        type: ReadOnly
    - - hookName: rw-2
        priority: 0
        state: Pending
        type: ReadAndWrite

I wanted to find out how hard this nested list is to generate, so I went ahead, fully implemented it, and pushed it to an experimental branch: EndPositive/secureCodeBox@hook-priorities...secureCodeBox:experiment/hook-prio-refactor
The ordering code is here: https://github.com/secureCodeBox/secureCodeBox/blob/experiment/hook-prio-refactor/operator/utils/orderedhookgroups.go
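A simplified version of how such a "list of lists" could be generated is sketched below. It is a hypothetical illustration, not the contents of orderedhookgroups.go: the `Hook` type and `orderedHookGroups` are assumptions. Hooks are grouped by descending priority; within a priority, each ReadAndWrite hook becomes a group of its own, followed by a single group holding all ReadOnly hooks of that priority.

```go
package main

import (
	"fmt"
	"sort"
)

type HookType string

const (
	ReadAndWrite HookType = "ReadAndWrite"
	ReadOnly     HookType = "ReadOnly"
)

// Hook is a hypothetical, simplified hook definition.
type Hook struct {
	Name     string
	Type     HookType
	Priority int
}

// orderedHookGroups builds a "list of lists": each inner slice is a group of
// hooks that may run in parallel, and groups run one after another.
func orderedHookGroups(hooks []Hook) [][]Hook {
	sorted := append([]Hook(nil), hooks...)
	// Highest priority first; equal priorities keep their input order.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})

	var groups [][]Hook
	for i := 0; i < len(sorted); {
		prio := sorted[i].Priority
		var readOnly []Hook
		// Walk all hooks of this priority: ReadAndWrite hooks get a group of
		// their own, ReadOnly hooks are collected into one shared group.
		for ; i < len(sorted) && sorted[i].Priority == prio; i++ {
			if sorted[i].Type == ReadAndWrite {
				groups = append(groups, []Hook{sorted[i]})
			} else {
				readOnly = append(readOnly, sorted[i])
			}
		}
		if len(readOnly) > 0 {
			groups = append(groups, readOnly)
		}
	}
	return groups
}

func main() {
	// The hooks from the diagram in this comment.
	hooks := []Hook{
		{"rw-0", ReadAndWrite, 2},
		{"rw-1", ReadAndWrite, 2},
		{"ro-0", ReadOnly, 2},
		{"ro-1", ReadOnly, 2},
		{"ro-2", ReadOnly, 1},
		{"ro-3", ReadOnly, 1},
		{"rw-2", ReadAndWrite, 0},
	}
	for i, group := range orderedHookGroups(hooks) {
		names := make([]string, 0, len(group))
		for _, h := range group {
			names = append(names, h.Name)
		}
		fmt.Println(i, names)
	}
}
```

With the diagram's hooks, `main` prints five groups, `0 [rw-0]`, `1 [rw-1]`, `2 [ro-0 ro-1]`, `3 [ro-2 ro-3]` and `4 [rw-2]`, matching the `orderedHookStatuses` example.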

With this change, the reconciler only has to get the active group of hooks and work on it. This also achieves the goal that the only reason the reconciler has to differentiate between RO and RW hooks is to pass in the update arguments :)
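Picking the active group could be as simple as finding the first group that still has an unfinished hook. This is a hypothetical sketch: `HookStatus` is reduced to a name and a state, `Pending` and `InProgress` are states that appear in this thread, and `Completed` is an assumed name for a finished hook.

```go
package main

import "fmt"

// HookState values: Pending and InProgress appear in this PR;
// Completed is an assumed name for a finished hook.
type HookState string

const (
	Pending    HookState = "Pending"
	InProgress HookState = "InProgress"
	Completed  HookState = "Completed"
)

// HookStatus is a trimmed-down, hypothetical version of an entry in
// status.orderedHookStatuses.
type HookStatus struct {
	HookName string
	State    HookState
}

// currentHookGroup returns the index of the first group that still contains
// a hook which is not Completed, or -1 when all groups are done. Later groups
// are not started until this one has finished, which preserves the ordering.
func currentHookGroup(groups [][]HookStatus) int {
	for i, group := range groups {
		for _, h := range group {
			if h.State != Completed {
				return i
			}
		}
	}
	return -1
}

func main() {
	groups := [][]HookStatus{
		{{"rw-0", Completed}},
		{{"ro-0", InProgress}, {"ro-1", Completed}},
		{{"rw-2", Pending}},
	}
	fmt.Println(currentHookGroup(groups)) // 1
}
```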

Let me know what you think 🙏

It already has side-effects  (i.e. cluster updates) so it's more consistent to update in-place.

Signed-off-by: Jop Zitman <jop.zitman@secura.com>
secureCodeBox v3 automation moved this from Done to In progress Oct 22, 2021
@EndPositive (Contributor Author) commented Oct 22, 2021

> What I'm now wondering is whether this is considered a breaking change, as the change to the status fields would prevent the upgraded operator from properly executing any scans which were in progress during the operator upgrade. Could we make some easy tweaks to the operator so that this is not the case?

@J12934 in 05c3464 I tried implementing a migration mechanism for the old fields. I think it covers all cases except where a scan was executing ReadOnly hooks and a ReadOnly hook was executed but the Job has been cleaned up. In that case, the new status field for that hook would get stuck on InProgress.
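A migration along these lines might look like the sketch below. It is hypothetical: the legacy field names (`ReadAndWriteHookStatus`, `ReadOnlyHookStatuses`) and the shape of `HookStatus` are assumptions based on this thread, not the operator's actual types. Legacy hooks are mapped to priority 0, every ReadAndWrite hook gets its own group, and all ReadOnly hooks share one final group, mirroring the pre-priority execution order. States are copied as-is, so a ReadOnly hook whose Job was already cleaned up would keep its old InProgress state, which is the edge case mentioned above.

```go
package main

import "fmt"

type HookType string

const (
	ReadAndWrite HookType = "ReadAndWrite"
	ReadOnly     HookType = "ReadOnly"
)

// HookStatus is a hypothetical status entry; Priority and Type are the fields
// the new schema requires.
type HookStatus struct {
	HookName string
	State    string
	Priority int
	Type     HookType
}

// legacyStatus holds the pre-priority status fields (names assumed).
type legacyStatus struct {
	ReadAndWriteHookStatus []HookStatus
	ReadOnlyHookStatuses   []HookStatus
}

// migrateLegacyStatus converts legacy statuses into ordered groups. Every
// legacy hook gets priority 0 and its Type filled in; ReadAndWrite hooks each
// form their own group and all ReadOnly hooks share the final group.
func migrateLegacyStatus(old legacyStatus) [][]HookStatus {
	var groups [][]HookStatus
	for _, s := range old.ReadAndWriteHookStatus {
		s.Priority, s.Type = 0, ReadAndWrite
		groups = append(groups, []HookStatus{s})
	}
	var readOnly []HookStatus
	for _, s := range old.ReadOnlyHookStatuses {
		s.Priority, s.Type = 0, ReadOnly
		readOnly = append(readOnly, s)
	}
	if len(readOnly) > 0 {
		groups = append(groups, readOnly)
	}
	return groups
}

func main() {
	old := legacyStatus{
		ReadAndWriteHookStatus: []HookStatus{{HookName: "ufh-1", State: "Completed"}},
		ReadOnlyHookStatuses: []HookStatus{
			{HookName: "notify", State: "Pending"},
			{HookName: "dd", State: "InProgress"},
		},
	}
	for i, g := range migrateLegacyStatus(old) {
		fmt.Println(i, g)
	}
}
```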

@J12934 (Member) commented Oct 29, 2021

Hi again 👋

Finally found the time to test this PR with the new migration code.
Looks great 👍

I tested it by starting with the 3.3.1 operator and CRDs and installing 50 update-field hooks, each of which adds a new attribute to the finding:

for value in {1..50}
do
    helm upgrade --install "ufh-$value" secureCodeBox/update-field-hook --set attribute.name="attributes.foo-$value" --set attribute.value="$value"
done

Then I upgraded the crds and operator while the hooks were executed.

Upgrade went seamlessly and all hooks were executed properly. (Confirmed and checked that the findings had all 50 new attributes set)

With that I think the PR is ready to be merged 🚀
Any objections?

Post Merge we should probably add priority helm values for all hook helm charts, so that you can easily upgrade / configure the prio.

@malexmave (Member)

No objections from my side (but I haven't reviewed or tested in-depth, relying on your judgement here :) ).

@EndPositive (Contributor Author) commented Oct 29, 2021

Awesome @J12934, really nice way to test.

No objections as far as the operator code is concerned. However, I know that some hooks (e.g. DefectDojo) implement their own types for our CRDs. We should probably test/verify those before making a release.

@EndPositive (Contributor Author) commented Nov 1, 2021

I just ran into an issue when trying to delete a Scan that was created before the ordering change. I'll take a look.

2021-11-01T11:21:58.077Z	ERROR	controllers.execution.Scan	Failed to run Scan Finalizer	{"error": "Scan.execution.securecodebox.io \"nmap-example\" is invalid: [status.readAndWriteHookStatus.priority: Required value, status.readAndWriteHookStatus.type: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
github.com/secureCodeBox/secureCodeBox/operator/controllers/execution/scans.(*ScanReconciler).Reconcile
	/workspace/controllers/execution/scans/scan_controller.go:85
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99

2021-11-01T11:00:17.380Z	ERROR	controller-runtime.manager.controller.scan	Reconciler error	{"reconciler group": "execution.securecodebox.io", "reconciler kind": "Scan", "name": "nmap-example", "namespace": "default", "error": "Scan.execution.securecodebox.io \"nmap-example\" is invalid: [status.readAndWriteHookStatus.priority: Required value, status.readAndWriteHookStatus.type: Required value]"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:302
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99

Unfortunately, the field can't be set to `nil`, as Kubernetes complains about the missing fields priority and type (on non-existing elements...)

Signed-off-by: Jop Zitman <jop.zitman@secura.com>
@EndPositive (Contributor Author) commented Nov 1, 2021

@J12934, I just pushed a fix for Scan objects that were marked Done and were created before this change. Can you test again with the same method, but also delete all Scans after updating? Once the operator detected the deletion of a Scan resource, it tried to remove the finalizer, which requires the ReadAndWriteHookStatus to be valid. That is, on deletion we have to check whether the field has already been migrated.
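The deletion-path check could be sketched like this. Everything here is hypothetical (the `scanStatus` fields, `needsMigration`, and `prepareForDeletion`); it only illustrates the check described above: on deletion, a Scan whose status has not been migrated yet is migrated first, before the finalizer is removed and the object is written back, while an already migrated status is left alone.

```go
package main

import "fmt"

// scanStatus is a hypothetical, heavily reduced Scan status holding both the
// legacy fields and the new ordered field. Hook names stand in for full
// status entries; all field names are assumptions.
type scanStatus struct {
	ReadAndWriteHookStatus []string   // legacy
	ReadOnlyHookStatuses   []string   // legacy
	OrderedHookStatuses    [][]string // new
}

// needsMigration reports whether the status still only carries legacy data.
func needsMigration(s scanStatus) bool {
	return len(s.OrderedHookStatuses) == 0 &&
		(len(s.ReadAndWriteHookStatus) > 0 || len(s.ReadOnlyHookStatuses) > 0)
}

// prepareForDeletion is meant to run before the finalizer is removed and the
// Scan is written back. It migrates a legacy status once and reports whether
// it did so; an already migrated status is left untouched.
func prepareForDeletion(s *scanStatus) bool {
	if !needsMigration(*s) {
		return false
	}
	for _, name := range s.ReadAndWriteHookStatus {
		s.OrderedHookStatuses = append(s.OrderedHookStatuses, []string{name})
	}
	if len(s.ReadOnlyHookStatuses) > 0 {
		ro := append([]string(nil), s.ReadOnlyHookStatuses...)
		s.OrderedHookStatuses = append(s.OrderedHookStatuses, ro)
	}
	return true
}

func main() {
	// A Done Scan created before the change: only legacy fields are set.
	s := scanStatus{
		ReadAndWriteHookStatus: []string{"ufh-1"},
		ReadOnlyHookStatuses:   []string{"notify"},
	}
	migrated := prepareForDeletion(&s)
	fmt.Println(migrated, s.OrderedHookStatuses) // true [[ufh-1] [notify]]
	fmt.Println(prepareForDeletion(&s))          // false: already migrated
}
```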

Signed-off-by: Jop Zitman <jop.zitman@secura.com>
@EndPositive (Contributor Author)

ab00c6c added hook prio field in values.yaml

@J12934 previously approved these changes Nov 10, 2021
secureCodeBox v3 automation moved this from In progress to Reviewer approved Nov 10, 2021
@malexmave (Member)

Hi @EndPositive. I am ready to merge this, but after merging your other hook PR, this is giving some conflicts. All the values.yaml changes are easy to resolve, but the conflicts in the hook_reconciler are complex enough that I would prefer if you could quickly resolve them, so that I don't introduce any new bugs in the process. Can you quickly resolve them?

secureCodeBox v3 automation moved this from Reviewer approved to To Review Nov 10, 2021
@EndPositive (Contributor Author) commented Nov 10, 2021

@malexmave done. I think you should run integration test & helm-docs again.

Signed-off-by: Jop Zitman <jop.zitman@secura.com>

# Conflicts:
#	hooks/cascading-scans/values.yaml
#	hooks/finding-post-processing/values.yaml
#	hooks/generic-webhook/values.yaml
#	hooks/notification/values.yaml
#	hooks/persistence-defectdojo/values.yaml
#	hooks/persistence-elastic/values.yaml
#	hooks/update-field/values.yaml
#	operator/controllers/execution/scans/hook_reconciler.go
secureCodeBox v3 automation moved this from To Review to Reviewer approved Nov 10, 2021
@J12934 merged commit d39435f into secureCodeBox:main Nov 10, 2021
secureCodeBox v3 automation moved this from Reviewer approved to Done Nov 10, 2021
@J12934 mentioned this pull request Nov 17, 2021
@J12934 moved this from Done to counter in secureCodeBox v3 Nov 19, 2021
@EndPositive deleted the hook-priorities branch November 22, 2021 11:10