
Large deployment pipelines with multiple children results in huge pipeline body #6159

Open
titirigaiulian opened this issue Nov 5, 2020 · 23 comments
Labels
no-lifecycle · pipelines/executions · sig/none (Issues that do not fall under scope of any SIG)

Comments

@titirigaiulian

Issue Summary:

For large k8s deployment pipelines with multiple parent levels, the pipeline body grows exponentially, causing high load on the DB.

Description:

Use case

Some of our teams have large deployments of tightly coupled components across multiple services, clusters, regions, and environments. The deployment is structured as follows:
Level 1: Generic deployment pipeline for one service
Level 2: Deployment pipeline for multiple coupled services
Level 3: Deployment for one region
Level 4: Deploy full env
Level 5: Full deploy: dev->stage->prod

Each Level 1 child pipeline has its own bake + deploy stages so it can be executed independently.
This flow helps us create wave deployments.

The full deployment evolves to a tree like:

[image: tree diagram of the nested deployment pipeline levels]
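
Concretely, every non-leaf level is just a pipeline whose stages are Pipeline stages that trigger the next level down. A minimal, illustrative sketch of a Level 3 "region" pipeline fanning out to two Level 2 pipelines (not an exact pipeline definition; names and config IDs are placeholders and unrelated fields are omitted):

```json
{
  "name": "deploy-region-us-east-1",
  "application": "myapp",
  "stages": [
    {
      "refId": "1",
      "requisiteStageRefIds": [],
      "type": "pipeline",
      "name": "Deploy service group A (Level 2)",
      "application": "myapp",
      "pipeline": "<level-2-pipeline-config-id>",
      "waitForCompletion": true
    },
    {
      "refId": "2",
      "requisiteStageRefIds": ["1"],
      "type": "pipeline",
      "name": "Deploy service group B (Level 2)",
      "application": "myapp",
      "pipeline": "<another-level-2-pipeline-config-id>",
      "waitForCompletion": true
    }
  ]
}
```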

Steps to Reproduce:

Problem 1:

Each leaf of the tree (a Level 1 pipeline) produces k8s manifests after deployment and appends them to its outputs. In the pipeline outputs I can see the manifests multiplied under different keys: outputs.createdArtifacts, optionalArtifacts, etc.
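
Schematically, this is the shape we see in a leaf execution (key names as above, values trimmed; exact artifact types vary), with the same manifest content repeated under several output keys:

```json
{
  "outputs": {
    "manifests": ["<full rendered manifest>"],
    "createdArtifacts": [
      { "name": "myservice", "reference": "<the same manifest content>" }
    ],
    "optionalArtifacts": [
      { "name": "myservice", "reference": "<the same manifest content>" }
    ]
  }
}
```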

Problem 2:

Each child pipeline receives the entire parent pipeline payload as its trigger. For the leaf pipeline this results in 5 levels of nested trigger -> parentPipeline.
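
Schematically, the trigger of a leaf (Level 1) execution ends up embedding every ancestor execution, each of which also carries its own stages and outputs. A heavily trimmed sketch (key names follow the trigger -> parentPipeline chain described above; names are placeholders):

```json
{
  "name": "deploy-one-service (Level 1 leaf)",
  "trigger": {
    "type": "pipeline",
    "parentPipeline": {
      "name": "deploy-coupled-services (Level 2)",
      "trigger": {
        "type": "pipeline",
        "parentPipeline": {
          "name": "deploy-region (Level 3)",
          "trigger": {
            "type": "pipeline",
            "parentPipeline": {
              "name": "deploy-env (Level 4)",
              "trigger": {
                "type": "pipeline",
                "parentPipeline": { "name": "full-deploy dev->stage->prod (Level 5)" }
              }
            }
          }
        }
      }
    }
  }
}
```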

Neither problem causes much trouble on its own, but when both are hit, the pipelines become unmanageable.

Additional Details:

Both issues are currently in progress:

#5909

spinnaker/orca#3986

Alternative

Introduce a flag in the PipelineStage context, e.g. skipDownstreamOutput, which can be set at the desired parent level and clears the outputs propagated from the child pipeline. This ends up splitting the tree into smaller sub-trees and manageable pipelines without losing the ability to have large orchestrations and modular pipelines.
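
As a sketch of what that could look like on the Pipeline stage that should act as the output boundary (skipDownstreamOutput is the name proposed here; the exact flag name and placement may differ in a final implementation, and other fields are omitted):

```json
{
  "refId": "1",
  "type": "pipeline",
  "name": "Deploy region us-east-1",
  "application": "myapp",
  "pipeline": "<child-pipeline-config-id>",
  "waitForCompletion": true,
  "skipDownstreamOutput": true
}
```

The stage would still run and monitor the child pipeline, but with the flag set it would not copy the child's outputs back into the parent execution, so each sub-tree keeps its outputs to itself.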

@Hertog-PJ

I can confirm that we run into this exact same issue. The proposed fix would be really helpful.

@emagana-zz

Salesforce use cases are also affected by this issue. We have a short/medium-term solution that compresses the body payload; PR on its way. Thank you!

@Nirmalyasen

Somewhere around the 1.20 release, the number of artifacts produced by a Deploy Manifest stage increased significantly. Combined with the output context carried from one child pipeline to the next, this is bloating the execution context. This has two effects:

  1. The size of the payload stored in the database
  2. The size of the payload being downloaded by UI

This is becoming a huge performance issue that needs to be addressed!!!

@titirigaiulian
Author

@Nirmalyasen the issue started for us after upgrading from 1.20.6, so I think this performance regression was introduced somewhere in 1.21 or 1.22, but I didn't manage to find a specific commit.
@emagana compressing sounds like a good idea. Just curious, do you have any numbers for the JSON size before and after compression?
We did a quick workaround by cleaning the outputs after the parent execution, behind a feature flag, but this is only useful if you do not use outputs from previous executions.

@titirigaiulian
Author

titirigaiulian commented Dec 17, 2020

Further investigations so far:
It can also affect the performance of single pipelines with large manifests, e.g. baking a 1 MB manifest + deploying it results in >15 MB of data (a rough tally follows below).
There is an open issue #6159 for replacing full manifests with coordinates, but even with the full manifest in the context there are still a lot of duplicated manifests.

I can see the baked manifest exists 6 times in the execution for a simple bake-and-deploy pipeline, under different keys:

  • Bake stage:
    • context.artifacts.reference
    • context.resolvedExpectedArtifacts.boundArtifact.reference
    • outputs.artifacts.reference
    • outputs.resolvedExpectedArtifacts.boundArtifact.reference
  • Deploy manifest stage:
    • context.optionalArtifacts.reference
    • outputs.optionalArtifacts.reference

The full manifest also exists several times:

  • context.manifests
  • context.kato.tasks.resultObjects.manifests
  • context.outputs.manifests
  • outputs.manifests
  • outputs.outputs.manifests
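
As a rough sanity check against the >15 MB figure above (assuming a 1 MB rendered manifest): base64 encoding adds roughly a third, so the six base64 artifact copies alone are ~8 MB, and the additional four to five plain copies of the full manifest add ~4-5 MB, i.e. ~12-13 MB of duplicated manifest data before the rest of the stage context is even counted.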

Some conclusions:

  • Some keys are duplicated under context and outputs for the base64 artifact produced after bake
  • For the full manifests, outputs.manifests exists at the top level, but it is also duplicated in context and once again nested under outputs (???). I don't think this is the desired behavior

Could the actual value of a large baked manifest be replaced with a reference?
Even with the full manifests in the pipeline, it could be manageable with some cleanup of this duplication.
cc @maggieneterval

P.S. I see there is a CleanupArtifactsTask in the deploy stage; it seems it's not doing its job :)

@Nirmalyasen

The other thing I noticed is that outputs.manifests (in a Deploy Manifest stage) has an annotation, kubectl.kubernetes.io/last-applied-configuration, that is a copy of the applied manifest. So the outputs.manifests attribute almost doubles in size by itself! Is that the result of a kubectl version upgrade?
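
For context, this is standard kubectl apply behaviour rather than something Spinnaker-specific: the applied object gets an annotation containing a JSON-serialized copy of the full manifest that was applied, so any output that captures the live object carries the manifest twice. Illustrative shape (trimmed):

```json
{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "myservice",
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"name\":\"myservice\"},\"spec\":{\"replicas\":3}}"
    }
  },
  "spec": { "replicas": 3 }
}
```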

@dbrougham

This looks related to something we are seeing. It results in large manifests being stored, which makes the whole system sluggish, especially with large pipeline hierarchies. Can't wait to see this in a release!

@amuraru

amuraru commented Feb 17, 2021

+1 impacted by this as well! thanks @titirigaiulian

@dancb10

dancb10 commented Feb 25, 2021

+1

@titirigaiulian
Author

Added another PR to control the default behaviour and override it at the pipeline level for exceptions. Will also add some docs once we have this merged.

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale

@spinnakerbot

This issue is tagged as 'stale' and hasn't been updated in 45 days, so we are tagging it as 'to-be-closed'. It will be closed in 45 days unless updates are made. If you want to remove this label, comment:

@spinnakerbot remove-label to-be-closed

@hugohenley

+1

@dbyron-sf
Contributor

@spinnakerbot remove-label to-be-closed

@spinnakerbot

This issue is tagged as 'stale' and hasn't been updated in 45 days, so we are tagging it as 'to-be-closed'. It will be closed in 45 days unless updates are made. If you want to remove this label, comment:

@spinnakerbot remove-label to-be-closed

@dbyron-sf
Contributor

@spinnakerbot remove-label to-be-closed

@dbyron-sf
Contributor

@spinnakerbot remove-label stale

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale

@spinnakerbot

This issue is tagged as 'stale' and hasn't been updated in 45 days, so we are tagging it as 'to-be-closed'. It will be closed in 45 days unless updates are made. If you want to remove this label, comment:

@spinnakerbot remove-label to-be-closed

@dbrougham

@spinnakerbot remove-label stale

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale

@spinnakerbot

This issue is tagged as 'stale' and hasn't been updated in 45 days, so we are tagging it as 'to-be-closed'. It will be closed in 45 days unless updates are made. If you want to remove this label, comment:

@spinnakerbot remove-label to-be-closed

@spinnakerbot

This issue is tagged as 'to-be-closed' and hasn't been updated in 45 days, so we are closing it. You can always reopen this issue if needed.

CharlieTLe added a commit to CharlieTLe/orca that referenced this issue May 25, 2022
The way that the monitor pipeline task worked before was that it only
stored the _status_ of the child pipeline when the child was running
(see file before the commit
[here](https://github.com/spinnaker/orca/pull/3902/files#diff-4c694b67271c7d9436d5af695f160f0b60c7ad7753a0ba1eb9aea76d7ba4e851R128-R130)).
After the commit was merged, the MonitorPipelineTask was storing the
context of the child pipeline on every update regardless of the child
pipeline’s status ([see line in
commit](https://github.com/spinnaker/orca/blob/20e2b9ea61037e7fc8eb59c817dc5d33f4718af3/orca-front50/src/main/groovy/com/netflix/spinnaker/orca/front50/tasks/MonitorPipelineTask.groovy#L96)).

This should address the issue where large nested pipelines, especially those
whose child pipelines generate a lot of outputs (Deploy Manifest), negatively
impact the performance of Spinnaker.

See: spinnaker/spinnaker#6159
mergify bot added a commit to spinnaker/orca that referenced this issue May 31, 2022
CharlieTLe added a commit to CharlieTLe/orca that referenced this issue Jun 1, 2022
@jasonmcintosh jasonmcintosh reopened this Apr 6, 2023