Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deploy stage failing with "NOT_STARTED" status on synthetic before stage #5257

Closed
louisjimenez opened this issue Dec 19, 2019 · 2 comments · Fixed by spinnaker/orca#3359
Closed
Assignees

Comments

@louisjimenez
Copy link
Contributor

Issue Summary:

Post merge of #3333 pipelines with a deploy stage are failing.

It appears that the synthetic stage that is intended to run prior to the deploy stage fails to start and is left with the status NOT_STARTED. After a bit of debugging, it seems this is tied to the requisiteStageRefIds array containing the id of an earlier stage in the pipeline. I tested this against 1.17 and found that the same pipeline generates an empty requisiteStageRefIds array for the synthetic stage. In that case, only the parent deploy stage lists the prior stage in its requisiteStageRefIds and the pipeline executes correctly.

Here is the sample JSON for what the synthetic stage looks like on master:

stages:
[
  {
    id: "01DWFT60K9N5000JR1BMCVB79A",
    refId: "DEPLOY<1",
    type: "createServerGroup",
    name: "Deploy in us-east1",
    status: "NOT_STARTED",
    context: {},
    outputs: { },
    tasks: [ ],
    syntheticStageOwner: "STAGE_BEFORE",
    parentStageId: "01DWFT5V5FSZ1A06RNAQW8194V",
    requisiteStageRefIds: ["1"]
  }
]

And for how it appears on 1.17

stages: [
  {},
  {
    id: "01DWFNKQ5X0557J8WH2SDEAV5X",
    refId: "DEPLOY<1",
    type: "createServerGroup",
    name: "Deploy in us-east1",
    startTime: 1576778980594,
    endTime: 1576779057754,
    status: "SUCCEEDED",
    context: {},
    outputs: { },
    tasks: [],
    syntheticStageOwner: "STAGE_BEFORE",
    parentStageId: "01DWFNKPFBJ0MKY3PQX2ZZ1K1Q",
    requisiteStageRefIds: [ ]
  },
]

The NOT_STARTED status for the synthetic stage causes the parent deploy stage to be marked as TERMINAL and the execution to fail. The orca log reflects this with:

ERROR 247858 --- [ handlers-19] c.n.s.o.q.handler.CompleteStageHandler : [] Unhandled condition for stage 01DWFPSBYCR01SK1YZPY6JN3B6 of com.netflix.spinnaker.orca.pipeline.model.Execution@af536b3a.id, marking as TERMINAL. syntheticStatuses=[NOT_STARTED], taskStatuses=[SUCCEEDED], planningStatus=[], afterStageStatuses=[]

Cloud Provider(s):

GCE, EC2

Environment:

Master

Steps to Reproduce:

This can be reproduced by creating a pipeline with a deploy stage that targets gce or ec2. A simple pipeline for reproducing:

{
  "appConfig": {},
  "lastModifiedBy": "anonymous",
  "limitConcurrent": true,
  "parallel": true,
  "stageCounter": 2,
  "stages": [
    {
      "clusters": [
        {
          "account": "my-account",
          "application": "my-app",
          "associatePublicIpAddress": true,
          "authScopes": [
            "cloud.useraccounts.readonly",
            "devstorage.read_only",
            "logging.write",
            "monitoring.write"
          ],
          "automaticRestart": true,
          "availabilityZones": {
            "us-east1": [
              "us-east1-b"
            ]
          },
          "backendServiceMetadata": [],
          "canIpForward": false,
          "capacity": {
            "desired": 1,
            "max": 1,
            "min": 1
          },
          "cloudProvider": "gce",
          "disableTraffic": false,
          "disks": [
            {
              "sizeGb": 10,
              "type": "pd-ssd"
            }
          ],
          "distributionPolicy": {
            "zones": []
          },
          "freeFormDetails": "",
          "imageSource": "priorStage",
          "instanceMetadata": {
            "load-balancer-names": "my-load-balancer",
            "startup-script": "sudo apt-get update && sudo apt-get install apache2 -y"
          },
          "instanceType": "f1-micro",
          "labels": {},
          "loadBalancers": [
            "my-load-balancer"
          ],
          "minCpuPlatform": "",
          "network": "default",
          "onHostMaintenance": "MIGRATE",
          "preemptible": false,
          "provider": "gce",
          "region": "us-east1",
          "regional": false,
          "selectZones": false,
          "serviceAccountEmail": "default",
          "stack": "my-stack",
          "strategy": "",
          "subnet": "",
          "tags": [],
          "targetSize": 1,
          "zone": "us-east1-b"
        }
      ],
      "name": "Deploy",
      "refId": "DEPLOY",
      "requisiteStageRefIds": [
        "1"
      ],
      "type": "deploy"
    },
    {
      "cloudProvider": "gce",
      "cloudProviderType": "gce",
      "cluster": "gce-test",
      "credentials": "my-account",
      "moniker": {
        "app": "my-app",
        "cluster": "gce-test",
        "stack": "test"
      },
      "name": "Find Image from Cluster",
      "onlyEnabled": true,
      "refId": "1",
      "regions": [
        "us-east1"
      ],
      "requisiteStageRefIds": [],
      "selectionStrategy": "LARGEST",
      "type": "findImage"
    }
  ],
  "triggers": [],
  "updateTs": "1576778769493"
}
@pdelagrave
Copy link

@spinnakerbot assign-issue @pdelagrave

@spinnakerbot
Copy link

User @pdelagrave cannot be assigned in this repo (spinnaker/spinnaker). Are they in the correct org/team?

pdelagrave pushed a commit to pdelagrave/orca that referenced this issue Jan 10, 2020
Fixes spinnaker/spinnaker#5257

The `Stage` constructor was keeping `refId`, `startTimeExpiry` and `requisiteStageRefIds` from the context map argument in its internal context map. This caused issues when a parent stage's context was used to create a child stage, as these entries aren't expected to be inherited. This wasn't a problem for a long time because these values were accidentally removed by the side effect of `StartStageHandler#shouldSkip`. `shouldSkip()` uses `objectMapper.convertValue()` on a stage context expecting to have a cloned map returned. A long standing design/documentation issue with Jackson made `convertValue()` return the input parameter as is instead of converting/cloning it. This made every `Stage` passing through `StartStageHandler.handle()` having their context map indirectly 'corrected' and persisted.
An upcoming upgrade to Spring Boot 2.2 comes with Jackson 2.10 where `objectMapper.convertValue()` has a new behaviour and would make `shouldSkip()` fail to silently 'correct' the stages context map.
FasterXML/jackson-databind#2220

`BakeStageSpec` was validating `context['refId']` on the generated contexts but the contexts aren't Stages yet; once a context is made into a Stage and added to the graph (`graph.add`) it will have its `refId` generated based on the parent refId; which isn't under test in that spec.
mergify bot added a commit to spinnaker/orca that referenced this issue Jan 21, 2020
Fixes spinnaker/spinnaker#5257

The `Stage` constructor was keeping `refId`, `startTimeExpiry` and `requisiteStageRefIds` from the context map argument in its internal context map. This caused issues when a parent stage's context was used to create a child stage, as these entries aren't expected to be inherited. This wasn't a problem for a long time because these values were accidentally removed by the side effect of `StartStageHandler#shouldSkip`. `shouldSkip()` uses `objectMapper.convertValue()` on a stage context expecting to have a cloned map returned. A long standing design/documentation issue with Jackson made `convertValue()` return the input parameter as is instead of converting/cloning it. This made every `Stage` passing through `StartStageHandler.handle()` having their context map indirectly 'corrected' and persisted.
An upcoming upgrade to Spring Boot 2.2 comes with Jackson 2.10 where `objectMapper.convertValue()` has a new behaviour and would make `shouldSkip()` fail to silently 'correct' the stages context map.
FasterXML/jackson-databind#2220

`BakeStageSpec` was validating `context['refId']` on the generated contexts but the contexts aren't Stages yet; once a context is made into a Stage and added to the graph (`graph.add`) it will have its `refId` generated based on the parent refId; which isn't under test in that spec.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
KathrynLewis pushed a commit to KathrynLewis/orca that referenced this issue Jan 31, 2021
Fixes spinnaker/spinnaker#5257

The `Stage` constructor was keeping `refId`, `startTimeExpiry` and `requisiteStageRefIds` from the context map argument in its internal context map. This caused issues when a parent stage's context was used to create a child stage, as these entries aren't expected to be inherited. This wasn't a problem for a long time because these values were accidentally removed by the side effect of `StartStageHandler#shouldSkip`. `shouldSkip()` uses `objectMapper.convertValue()` on a stage context expecting to have a cloned map returned. A long standing design/documentation issue with Jackson made `convertValue()` return the input parameter as is instead of converting/cloning it. This made every `Stage` passing through `StartStageHandler.handle()` having their context map indirectly 'corrected' and persisted.
An upcoming upgrade to Spring Boot 2.2 comes with Jackson 2.10 where `objectMapper.convertValue()` has a new behaviour and would make `shouldSkip()` fail to silently 'correct' the stages context map.
FasterXML/jackson-databind#2220

`BakeStageSpec` was validating `context['refId']` on the generated contexts but the contexts aren't Stages yet; once a context is made into a Stage and added to the graph (`graph.add`) it will have its `refId` generated based on the parent refId; which isn't under test in that spec.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants