add cancel buildrun support #809

Merged

Conversation

gabemontero
Member

Changes

Fixes #54

/kind feature

This PR still needs automated testing, so I have marked it WIP, but the approach I've taken is far enough along that feedback from the community makes sense.

Most notably, for the minimal required API change, I chose to duplicate the pattern established with Tekton TaskRuns.

Namely, buildRun.spec.status == BuildRunCancelled, to mirror taskRun.spec.status == TaskRunCancelled

There is possibly some more condition-related work we want to consider. My prototype currently produces these results:

$ oc get br,tr
NAME                                                  SUCCEEDED   REASON             STARTTIME   COMPLETIONTIME
buildrun.shipwright.io/buildpack-nodejs-build-q8mlw   False       TaskRunCancelled   12m         12m

NAME                                                    SUCCEEDED   REASON             STARTTIME   COMPLETIONTIME
taskrun.tekton.dev/buildpack-nodejs-build-q8mlw-6qc42   False       TaskRunCancelled   12m         12m
$

Admittedly, an EP could be warranted. If need be, I can retroactively do that. But this seemed simple enough that perhaps we can avoid those additional cycles. But we'll see how the community responds to this, and we'll go from there.

Lastly, I have a shipwright-io/cli branch that I'll push as a PR soon as well (though until this merges, I have to manually vendor in the changes from here).

But you can test this as well via kubectl patch buildrun <buildrun-name> --patch '{"spec": {"status":"BuildRunCancelled"}}' --type=merge
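
For readers who want to send the same request programmatically, here is a minimal sketch of building that merge patch body in Go. It assumes the field name and value from the prototype described above (`spec.status` set to `BuildRunCancelled`); the type `cancelPatch` and the function `buildCancelPatch` are hypothetical names used only for illustration, not part of the Shipwright API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cancelPatch models only the part of the BuildRun spec that a cancel
// request touches. Hypothetical type, for illustration only.
type cancelPatch struct {
	Spec struct {
		Status string `json:"status"`
	} `json:"spec"`
}

// buildCancelPatch returns the JSON merge patch body for a cancel request.
func buildCancelPatch() ([]byte, error) {
	var p cancelPatch
	// Value taken from the prototype in this PR description.
	p.Spec.Status = "BuildRunCancelled"
	return json.Marshal(p)
}

func main() {
	body, err := buildCancelPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting bytes are what a client would send with a merge patch content type, which is the same thing the kubectl command does with `--type=merge`.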

NOTE: commit dc99af5 from #808 was needed in my local testing on top of openshift. I've included it here for now, but once that PR merges, it should go away here.

https://github.com//pull/808

Submitter Checklist

  • Includes tests if functionality changed/was added
  • Includes docs if changes are user-facing
  • Set a kind label on this PR
  • Release notes block has been filled in, or marked NONE

See the contributor guide for details on coding conventions, GitHub and Prow interactions, and the code review process.

Release Notes

Support for cancelling an active BuildRun has been added.

@openshift-ci openshift-ci bot added release-note Label for when a PR has specified a release note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 14, 2021
@SaschaSchwarze0 SaschaSchwarze0 added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jun 14, 2021
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Hi @gabemontero, long-wished feature. :-)

Similar to us translating the TaskRun status TaskRunTimeout into BuildRunTimeout, we should do the same with TaskRunCancelled into BuildRunCanceled (and make it American ;-) ).

BuildRun documentation will need an update.

Your YAML change looks huge. Are you running the correct version of controller-gen?

@sbose78
Member

sbose78 commented Jun 14, 2021

Usage looks straightforward to me, thank you! Not too firm on the need for an enhancement proposal as long as everyone votes up on the API modification.

@gabemontero
Member Author

> Hi @gabemontero, long-wished feature. :-)
>
> Similar to us translating the TaskRun status TaskRunTimeout into BuildRunTimeout, we should do the same with TaskRunCancelled into BuildRunCanceled (and make it American ;-) ).

:-) sounds good @SaschaSchwarze0 ... I'll take that as a +1

> BuildRun documentation will need an update.

Agreed ... will add that along with tests

> Your YAML change looks huge. Are you running the correct version of controller-gen?

$ controller-gen --version
Version: v0.5.0

which lines up with https://github.com/shipwright-io/build/blob/main/HACK.md per the recent updates from you and @imjasonh

not sure what is up ... I'll see about re-running make generate ... as I add tests / doc

@gabemontero
Member Author

> Usage looks straightforward to me, thank you! Not too firm on the need for an enhancement proposal as long as everyone votes up on the API modification.

cool thanks @sbose78

and fyi everyone I know I need to add godoc to get the verify job to pass

@gabemontero
Member Author

the supporting cli feature to wrap the patch needed to cancel buildruns is ^^ shipwright-io/cli#22

@gabemontero
Member Author

ok I've pushed commits to

I'll move onto tests next.

Then circle back to see if there was any input on my conditions note in the description, and go from there.

Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Some small suggestions.

docs/buildrun.md Outdated

To cancel a `BuildRun` that's currently executing, update its status to mark it as canceled.

Whe you cancel a `BuildRun`, the underlying `TaskRun` is marked as canceled per the [Tekton cancel `TaskRun` feature](https://github.com/tektoncd/pipeline/blob/main/docs/taskruns.md).
Member

Suggested change
Whe you cancel a `BuildRun`, the underlying `TaskRun` is marked as canceled per the [Tekton cancel `TaskRun` feature](https://github.com/tektoncd/pipeline/blob/main/docs/taskruns.md).
When you cancel a `BuildRun`, the underlying `TaskRun` is marked as canceled per the [Tekton cancel `TaskRun` feature](https://github.com/tektoncd/pipeline/blob/main/docs/taskruns.md#cancelling-a-taskrun).

docs/buildrun.md Outdated
| Unknown | Running | No | The BuildRun has been validate and started to perform its work. |
| True | Succeeded | Yes | The BuildRun Pod is done. |
| Unknown | TaskRunCancelled | No | The user requested the BuildRun to be canceled. This results in the BuildRun controller requesting the TaskRun be canceled. Cancellation has not been done yet. |
Member

Our status reason should also be "BuildRunCanceled". The code change might simply be extending

case v1beta1.TaskRunReasonTimedOut:
	reason = "BuildRunTimeout"
	message = fmt.Sprintf("BuildRun %s failed to finish within %s",
		buildRun.Name,
		taskRun.Spec.Timeout.Duration,
	)

A nuance here would be to set "BuildRunCanceling" once we notice that the user sent us a request to cancel until the actual TaskRun got canceled and then set "BuildRunCanceled". But that's an edge case differentiation that I do not insist on.
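
The translation suggested here can be sketched as a small, self-contained Go function. This is only an illustration under assumptions: the reason strings are the ones quoted in this thread, plain string constants stand in for the Tekton and Shipwright types, and `translateReason` is a hypothetical helper rather than the reconciler's actual code. Passing unknown reasons through unchanged is also an assumption.

```go
package main

import "fmt"

// Reason strings as quoted in this thread. Plain constants are used here
// instead of the real Tekton/Shipwright types to keep the sketch standalone.
const (
	taskRunReasonTimedOut  = "TaskRunTimeout"
	taskRunReasonCancelled = "TaskRunCancelled"
	buildRunReasonTimeout  = "BuildRunTimeout"
	buildRunReasonCanceled = "BuildRunCanceled"
)

// translateReason maps a TaskRun condition reason to the BuildRun reason
// shown to the user. Hypothetical helper; reasons without a mapping are
// returned unchanged (an assumption of this sketch).
func translateReason(taskRunReason string) string {
	switch taskRunReason {
	case taskRunReasonTimedOut:
		return buildRunReasonTimeout
	case taskRunReasonCancelled:
		return buildRunReasonCanceled
	default:
		return taskRunReason
	}
}

func main() {
	for _, r := range []string{"TaskRunTimeout", "TaskRunCancelled", "Succeeded"} {
		fmt.Printf("%s -> %s\n", r, translateReason(r))
	}
}
```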

Member Author

OK so you concur with my speculation in this PR's description, and that the current form of the prototype is incomplete wrt what I showed there wrt TaskRunCancelled being shown.

Yes as I evolve this PR beyond WIP, assuming there are not dissenters to the API, I'll do this.

docs/buildrun.md Outdated
@@ -140,6 +160,7 @@ The following table illustrates the different states a BuildRun can have under i
| False | ServiceAccountNotFound | Yes | The referenced service account was not found in the cluster. |
| False | BuildRegistrationFailed | Yes | The related Build in the BuildRun is on a Failed state. |
| False | BuildNotFound | Yes | The related Build in the BuildRun was not found. |
| False | TaskRunCancelled | Yes | The BuildRun and underlying TaskRun were canceled successfully. |
Member

Same here, should be "BuildRunCanceled".


const (
// BuildRunSpecStatusCanceled indicates that the user wants to cancel the BuildRun,
// if not already cancelled or terminated
Member

Suggested change
// if not already cancelled or terminated
// if not already canceled or terminated

;-)

return c != nil && c.GetStatus() == corev1.ConditionTrue
}

// IsCanceled returns true if the BuildRun's spec status is set to Cancelled state.
Member

Suggested change
// IsCanceled returns true if the BuildRun's spec status is set to Cancelled state.
// IsCanceled returns true if the BuildRun's spec status is set to BuildRunCanceled state.

Path: path,
Value: value,
}}
data, _ := json.Marshal(payload)
Member

Suggested change
data, _ := json.Marshal(payload)
data, err := json.Marshal(payload)
if err != nil {
	return err
}

Just in case we eventually have some static code analyzer that complains about ignored errors.
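
A minimal, standalone sketch of the suggested error handling follows. It assumes the payload is a JSON Patch style list of operations, as the `Path` and `Value` fields above hint; the type `patchOperation`, the function `marshalPatch`, and the `replace` operation on `/spec/status` are hypothetical choices for illustration, not taken from the PR's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOperation is a hypothetical stand-in for the payload element in the
// reviewed code; only Path and Value are visible in the diff above.
type patchOperation struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// marshalPatch builds a single-operation patch and returns the marshal
// error to the caller instead of discarding it.
func marshalPatch(op, path, value string) ([]byte, error) {
	payload := []patchOperation{{Op: op, Path: path, Value: value}}
	data, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	// The operation and path are assumptions made for this example.
	data, err := marshalPatch("replace", "/spec/status", "TaskRunCancelled")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```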

@gabemontero
Member Author

fyi I pushed some unit tests and code changes along the move from citing TaskRunCancelled to citing BuildRunCanceled but I would consider all that still "in progress" for reviewers

but I wanted to confirm what I have in my branch passed in CI as well.

I'm also looking at updating the integration tests today.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 21, 2021
Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Adding some feedback I got from our UX folks on the API design.

pkg/apis/build/v1alpha1/buildrun_types.go Outdated Show resolved Hide resolved
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2021
@gabemontero
Member Author

hmmm the verify job is still complaining saying to run make generate-crds even after I ran make generate-crds .... perhaps this formatting annoyance is turning into a blocker ... diving in

@gabemontero gabemontero changed the title WIP - add cancel buildrun support add cancel buildrun support Jun 28, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 28, 2021
@gabemontero
Member Author

OK, I've progressed things along enough to remove the WIP

Remaining items:

  • still sorting out the name of the new buildRun.spec field with @SaschaSchwarze0 and the UX engineers on his side; we are going to change it to something else, we just need to get final consensus
  • I believe I've made the requisite updates for your other comments @SaschaSchwarze0
  • even with v0.5.0 of controller-gen the make generate-crds is not honoring the line breaks when I run it locally; ultimately this fails the verify job's git diff when it runs on its github actions ubuntu VM ... not sure how I'm going to deal with this yet
  • I've got the integration tests updated, but I can't run integration tests locally yet (centers around the openshift extra security, non-docker fedora setup I have which I think I'll have to finally punt on ;-)), so I may have to short term debug those via CI runs

All that said, and with minimally some subsequent commit squashing, PTAL :-)

@gabemontero
Member Author

OK, I've progressed things along enough to remove the WIP

Remaining items:

* still sorting out the name of the new `buildRun.spec` field with @SaschaSchwarze0 and the UX engineers on his side; we are going to change it to something else, we just need to get final consensus

* I believe I've made the requisite updates for your other comments @SaschaSchwarze0

* even with v0.5.0 of `controller-gen` the `make generate-crds` is not honoring the line breaks when I run it locally; ultimately this fails the verify job's `git diff` when it runs on its github actions ubuntu VM ... not sure how I'm going to deal with this yet

* I've got the integration tests updated, but I can't run integration tests locally yet (centers around the openshift extra security, non-docker fedora setup I have which I think I'll have to finally punt on ;-)), so I may have to short term debug those via CI runs

the integration tests have passed

All that said, and with minimally some subsequent commit squashing, PTAL :-)

pkg/reconciler/buildrun/buildrun.go Show resolved Hide resolved
pkg/reconciler/buildrun/buildrun.go Show resolved Hide resolved
pkg/apis/build/v1alpha1/buildrun_types.go Outdated Show resolved Hide resolved
Comment on lines 254 to 271
succeededCondition := buildRun.Status.GetCondition(buildv1alpha1.Succeeded)
if buildRun.IsCanceled() && lastTaskRun.IsCancelled() && (succeededCondition == nil || succeededCondition.Reason != buildv1alpha1.BuildRunSpecStatusCanceled) {
	if updateErr := resources.UpdateConditionWithFalseStatus(ctx, r.client, buildRun, "the BuildRun is marked canceled.", buildv1alpha1.BuildRunSpecStatusCanceled); updateErr != nil {
		return reconcile.Result{}, updateErr
	}
	return reconcile.Result{}, nil
}
Member

My gut feeling is that this code should not be here; rather, the later code that calls UpdateBuildRunUsingTaskRunCondition should handle this case. This would also make sure that a generated service account gets deleted when a BuildRun is canceled (that scenario is maybe worth an integration test). Though I am not sure what it means for the completion date handling (does a canceled TaskRun have a completionDate?).

Beside that, I searched for the code that sets the BuildRun status to status=Unknown and reason=BuildRunCanceled? Basically when we are reacting to an update for BuildRunCanceled in the spec and patch the TaskRun, but this is still running.

Member Author

OK circling back to this @SaschaSchwarze0

First, to answer your question, yes, task run cancelling sets a completion date. See

Now, as it turns out, I originally had this call later in the path here, but ran into headaches when I started adding unit tests, and then rationalized that short circuiting processing sooner made sense.

That said, I'll take another pass and look at moving to where the calls to UpdateBuildRunUsingTaskRunCondition are and see how things unravel and then how I can subsequently sort them out.

For your "Beside that.." point, I'll circle back to that, after I look into ^^ and I am ready to provide my update to my #809 (comment) .... I have an understanding of what is going on with that, but before I dive into it, I want to see where I land with ^^ and then level set.

Member Author

In the interim, the full taskrun yaml with what I currently have implementation wise:

  apiVersion: tekton.dev/v1beta1
  kind: TaskRun
  metadata:
    annotations:
      pipeline.tekton.dev/release: v0.22.0
    creationTimestamp: "2021-06-30T16:07:55Z"
    generateName: buildpack-nodejs-build-htk8q-
    generation: 2
    labels:
      app.kubernetes.io/managed-by: tekton-pipelines
      build.shipwright.io/generation: "1"
      build.shipwright.io/name: buildpack-nodejs-build
      buildrun.shipwright.io/generation: "1"
      buildrun.shipwright.io/name: buildpack-nodejs-build-htk8q
      clusterbuildstrategy.shipwright.io/generation: "1"
      clusterbuildstrategy.shipwright.io/name: buildpacks-v3
    name: buildpack-nodejs-build-htk8q-88qsq
    namespace: ggmtest
    ownerReferences:
    - apiVersion: shipwright.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: BuildRun
      name: buildpack-nodejs-build-htk8q
      uid: b8993808-520c-4a67-8e6e-1ef154239ee0
    resourceVersion: "172515"
    uid: 29c563ad-b95d-4577-af85-7cf666952df0
  spec:
    params:
    - name: shp-output-image
      value: docker.io/gmontero/sample-nodejs:latest
    - name: shp-source-root
      value: /workspace/source
    - name: CONTEXT_DIR
      value: source-build
    - name: shp-source-context
      value: /workspace/source/source-build
    serviceAccountName: pipeline
    status: TaskRunCancelled
    taskSpec:
      params:
      - default: Dockerfile
        description: Path to the Dockerfile
        name: DOCKERFILE
        type: string
      - default: .
        description: The root of the code
        name: CONTEXT_DIR
        type: string
      - description: The URL of the image that the build produces
        name: shp-output-image
        type: string
      - description: The context directory inside the source directory
        name: shp-source-context
        type: string
      - description: The source directory
        name: shp-source-root
        type: string
      results:
      - description: The digest of the image
        name: shp-image-digest
      - description: The compressed size of the image
        name: shp-image-size
      - description: The commit SHA of the cloned source.
        name: shp-source-default-commit-sha
      steps:
      - args:
        - --url
        - https://github.com/shipwright-io/sample-nodejs
        - --target
        - $(params.shp-source-root)
        - --result-file-commit-sha
        - $(results.shp-source-default-commit-sha.path)
        command:
        - /ko-app/git
        image: docker.io/gmontero/git-77f36d96a091f12d11365ad77da28c66@sha256:9e8bb16e0432a354a10f3caf351093b23028c34013ebcdba41261be7f154216e
        name: source-default
        resources: {}
        securityContext:
          runAsGroup: 1000
          runAsUser: 1000
      - args:
        - -c
        - |
          chown -R "1000:1000" /workspace/source && chown -R "1000:1000" /tekton/home && chown -R "1000:1000" /cache && chown -R "1000:1000" /layers
        command:
        - /bin/bash
        image: docker.io/paketobuildpacks/builder:full
        name: prepare
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 65Mi
        securityContext:
          capabilities:
            add:
            - CHOWN
          runAsUser: 0
        volumeMounts:
        - mountPath: /cache
          name: cache-dir
        - mountPath: /layers
          name: layers-dir
      - args:
        - -app=/workspace/source/$(inputs.params.CONTEXT_DIR)
        - -cache-dir=/cache
        - -layers=/layers
        - $(params.shp-output-image)
        command:
        - /cnb/lifecycle/creator
        image: docker.io/paketobuildpacks/builder:full
        name: build-and-push
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 65Mi
        securityContext:
          runAsGroup: 1000
          runAsUser: 1000
        volumeMounts:
        - mountPath: /cache
          name: cache-dir
        - mountPath: /layers
          name: layers-dir
      volumes:
      - name: cache-dir
      - name: layers-dir
      workspaces:
      - name: source
    timeout: 0s
    workspaces:
    - emptyDir: {}
      name: source
  status:
    completionTime: "2021-06-30T16:08:00Z"
    conditions:
    - lastTransitionTime: "2021-06-30T16:08:00Z"
      message: TaskRun "buildpack-nodejs-build-htk8q-88qsq" was cancelled
      reason: TaskRunCancelled
      status: "False"
      type: Succeeded
    podName: buildpack-nodejs-build-htk8q-88qsq-pod-ff8fk
    startTime: "2021-06-30T16:07:55Z"
    steps:
    - container: step-source-default
      imageID: docker.io/gmontero/git-77f36d96a091f12d11365ad77da28c66@sha256:9e8bb16e0432a354a10f3caf351093b23028c34013ebcdba41261be7f154216e
      name: source-default
      terminated:
        exitCode: 1
        finishedAt: "2021-06-30T16:08:00Z"
        reason: TaskRunCancelled
        startedAt: "2021-06-30T16:07:58Z"
    - container: step-prepare
      imageID: docker.io/paketobuildpacks/builder@sha256:e3462130656ff77b7d64f0c1660e3116d70074a0568918f1a0898dc687bf9087
      name: prepare
      terminated:
        exitCode: 1
        finishedAt: "2021-06-30T16:08:00Z"
        reason: TaskRunCancelled
        startedAt: "2021-06-30T16:07:59Z"
    - container: step-build-and-push
      imageID: docker.io/paketobuildpacks/builder@sha256:e3462130656ff77b7d64f0c1660e3116d70074a0568918f1a0898dc687bf9087
      name: build-and-push
      terminated:
        exitCode: 1
        finishedAt: "2021-06-30T16:08:00Z"
        reason: TaskRunCancelled
        startedAt: "2021-06-30T16:07:59Z"
    taskSpec:
      params:
      - default: Dockerfile
        description: Path to the Dockerfile
        name: DOCKERFILE
        type: string
      - default: .
        description: The root of the code
        name: CONTEXT_DIR
        type: string
      - description: The URL of the image that the build produces
        name: shp-output-image
        type: string
      - description: The context directory inside the source directory
        name: shp-source-context
        type: string
      - description: The source directory
        name: shp-source-root
        type: string
      results:
      - description: The digest of the image
        name: shp-image-digest
      - description: The compressed size of the image
        name: shp-image-size
      - description: The commit SHA of the cloned source.
        name: shp-source-default-commit-sha
      steps:
      - args:
        - --url
        - https://github.com/shipwright-io/sample-nodejs
        - --target
        - $(params.shp-source-root)
        - --result-file-commit-sha
        - $(results.shp-source-default-commit-sha.path)
        command:
        - /ko-app/git
        image: docker.io/gmontero/git-77f36d96a091f12d11365ad77da28c66@sha256:9e8bb16e0432a354a10f3caf351093b23028c34013ebcdba41261be7f154216e
        name: source-default
        resources: {}
        securityContext:
          runAsGroup: 1000
          runAsUser: 1000
      - args:
        - -c
        - |
          chown -R "1000:1000" /workspace/source && chown -R "1000:1000" /tekton/home && chown -R "1000:1000" /cache && chown -R "1000:1000" /layers
        command:
        - /bin/bash
        image: docker.io/paketobuildpacks/builder:full
        name: prepare
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 65Mi
        securityContext:
          capabilities:
            add:
            - CHOWN
          runAsUser: 0
        volumeMounts:
        - mountPath: /cache
          name: cache-dir
        - mountPath: /layers
          name: layers-dir
      - args:
        - -app=/workspace/source/$(inputs.params.CONTEXT_DIR)
        - -cache-dir=/cache
        - -layers=/layers
        - $(params.shp-output-image)
        command:
        - /cnb/lifecycle/creator
        image: docker.io/paketobuildpacks/builder:full
        name: build-and-push
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 65Mi
        securityContext:
          runAsGroup: 1000
          runAsUser: 1000
        volumeMounts:
        - mountPath: /cache
          name: cache-dir
        - mountPath: /layers
          name: layers-dir
      volumes:
      - name: cache-dir
      - name: layers-dir
      workspaces:
      - name: source

Member Author

and the buildrun yaml:

  apiVersion: shipwright.io/v1alpha1
  kind: BuildRun
  metadata:
    creationTimestamp: "2021-06-30T16:07:55Z"
    generateName: buildpack-nodejs-build-
    generation: 2
    labels:
      build.shipwright.io/generation: "1"
      build.shipwright.io/name: buildpack-nodejs-build
    name: buildpack-nodejs-build-htk8q
    namespace: ggmtest
    resourceVersion: "172510"
    uid: b8993808-520c-4a67-8e6e-1ef154239ee0
  spec:
    buildRef:
      name: buildpack-nodejs-build
    status: BuildRunCanceled
    timeout: 0s
  status:
    buildSpec:
      output:
        credentials:
          name: push-secret
        image: docker.io/gmontero/sample-nodejs:latest
      source:
        contextDir: source-build
        url: https://github.com/shipwright-io/sample-nodejs
      strategy:
        kind: ClusterBuildStrategy
        name: buildpacks-v3
    completionTime: "2021-06-30T16:08:00Z"
    conditions:
    - lastTransitionTime: "2021-06-30T16:08:00Z"
      message: the BuildRun is marked canceled.
      reason: BuildRunCanceled
      status: "False"
      type: Succeeded
    latestTaskRunRef: buildpack-nodejs-build-htk8q-88qsq
    startTime: "2021-06-30T16:07:55Z"

Member Author

> Beside that, I searched for the code that sets the BuildRun status to status=Unknown and reason=BuildRunCanceled? Basically when we are reacting to an update for BuildRunCanceled in the spec and patch the TaskRun, but this is still running.

OK I have the move to UpdateBuildRunUsingTaskRunCondition working locally and will push that up in a bit.

On this portion of the comment, I may need some elaboration, though I have a notion of what you may want.

When I run kubectl get br -o yaml -w when running a buildrun, the Succeeded condition remains with status Unknown until the taskrun fails / succeeds / cancels ... i.e. the taskrun reaches a terminal state.

The current code hence does not bother with setting the status of the Succeeded condition when it first sees that the user patched the buildrun with cancel. Hence it "stays" unknown. I believe that is because we just use the taskRun condition's status, and it stays unknown. i.e. https://github.com/shipwright-io/build/blob/main/pkg/reconciler/buildrun/resources/conditions.go#L106

A quick yaml snippet:

  conditions:
  - lastTransitionTime: "2021-06-30T16:55:15Z"
    message: Not all Steps in the Task have finished executing
    reason: Running
    status: Unknown
    type: Succeeded

Then when it observes the taskrun has been processed, it sets the succeeded condition to false.

Now, I can be explicit around making sure this holds. I'll be including that in my change, with a comment that we can discuss in this PR about whether to keep it or not. Among other things, we can provide a message that is more precise doing it this way.
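
The state handling discussed in this exchange can be sketched as a small decision function. This is a sketch of the behavior being discussed, not the merged reconciler code: the `condition` type and the `succeededCondition` function are hypothetical, and the status and reason strings are the ones mentioned in this thread (Unknown with Running while the build runs, Unknown with BuildRunCanceled once a cancel was requested but the TaskRun is still running, False with BuildRunCanceled once the TaskRun is canceled).

```go
package main

import "fmt"

// condition is a hypothetical, simplified stand-in for the Succeeded
// condition on a BuildRun.
type condition struct {
	Status string
	Reason string
}

// succeededCondition derives the condition from two observations:
// whether the user requested a cancel, and whether the TaskRun has
// actually reached the canceled state.
func succeededCondition(cancelRequested, taskRunCancelled bool) condition {
	switch {
	case cancelRequested && taskRunCancelled:
		// Terminal state: the cancel has completed.
		return condition{Status: "False", Reason: "BuildRunCanceled"}
	case cancelRequested:
		// Cancel requested, TaskRun not yet terminal.
		return condition{Status: "Unknown", Reason: "BuildRunCanceled"}
	default:
		// No cancel requested; the build is still running.
		return condition{Status: "Unknown", Reason: "Running"}
	}
}

func main() {
	fmt.Println(succeededCondition(false, false))
	fmt.Println(succeededCondition(true, false))
	fmt.Println(succeededCondition(true, true))
}
```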

Member Author

^^ change, plus the cancelled build run autogen sa integration test, pushed

@gabemontero
Member Author

gabemontero commented Jun 29, 2021

OK I've crafted an ubuntu controller-gen container that allowed me to run make generate / make generate-crds and the commit looks good wrt line breaks

the verify job just passed

@imjasonh bumping controller-gen from 0.5.0 to 0.6.0 not surprisingly led to the need for a pretty massive k8s related go mod bump that I didn't want to couple with this PR, so I punted on that for now. I'll see about revisiting that once I get over the hump here

I also got the update to the controller's role so we can patch PRs back up per prior discussion with @SaschaSchwarze0 so that is sorted out

but overall, I'm putting a

/hold

while I next sort through some oddities wrt my manual testing (I'll be curious to see if they arise in the CI testing) ... maybe related to @SaschaSchwarze0 comment about ordering of my changes in the reconciler, but TBH I have not had the cycles to sufficiently focus on it. When I can next circle back from openshift to shipwright, that is my next step.

@gabemontero gabemontero force-pushed the cancel-buildrun branch 2 times, most recently from 34010aa to bd79905 Compare July 5, 2021 03:05
@gabemontero
Member Author

just saw #829 from @SaschaSchwarze0 .... looks very related to the functional test pain I've recently noted here.

FWIW, the tekton namespace events from my latest debug:

2021-07-05T03:12:56.2419796Z GGMGGM event type Warning reason FailedScheduling note 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
2021-07-05T03:12:56.2422473Z GGMGGM event type Normal reason SuccessfulCreate note Created pod: tekton-pipelines-controller-558cc574b7-5v9gs
2021-07-05T03:12:56.2425129Z GGMGGM event type Normal reason ScalingReplicaSet note Scaled up replica set tekton-pipelines-controller-558cc574b7 to 1
2021-07-05T03:12:56.2427214Z GGMGGM event type Warning reason FailedScheduling note 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
2021-07-05T03:12:56.2429243Z GGMGGM event type Normal reason SuccessfulCreate note Created pod: tekton-pipelines-webhook-575b9bcd9f-lwg2z
2021-07-05T03:12:56.2431314Z GGMGGM event type Normal reason ScalingReplicaSet note Scaled up replica set tekton-pipelines-webhook-575b9bcd9f to 1
2021-07-05T03:12:56.2433601Z GGMGGM event type Warning reason FailedGetResourceMetric note failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
2021-07-05T03:13:02.2506033Z GGMGGM event type Warning reason FailedComputeMetricsReplicas note invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

hence the pod did not get scheduled.

Going to revert my debug, see how the next test run goes, and then go back on vacation for today.

@gabemontero
Member Author

OK just 1 of my new tests failed with an intermittent update conflict:

"Operation cannot be fulfilled on taskruns.tekton.dev \"buildrun-test-build-222-rbfbk\": the object has been modified; please apply your changes to the latest version and try again",

I'll add some redundancy to the update in question when I'm officially back to work tomorrow.

thanks @SaschaSchwarze0 for otherwise fixing CI

@gabemontero gabemontero force-pushed the cancel-buildrun branch 2 times, most recently from 9227e9f to 9860e1c Compare July 7, 2021 14:31
docs/buildrun.md Outdated Show resolved Hide resolved
docs/buildrun.md Outdated Show resolved Hide resolved
@gabemontero
Member Author

all green CI @coreydaley @SaschaSchwarze0

commits squashed

I believe all comments have received either code updates from me or responses from me declining the change

PTAL

pkg/apis/build/v1alpha1/buildrun_types.go Outdated Show resolved Hide resolved
deploy/crds/shipwright.io_buildruns.yaml Outdated Show resolved Hide resolved
pkg/apis/build/v1alpha1/buildrun_types.go Show resolved Hide resolved
make generate-crds via ubuntu
add is cancelled check, task run patch, to buildrun reconciler
unit and integration tests
doc changes
@coreydaley
Member

It looks good to me, but I would like @SaschaSchwarze0 to take one last look also.

@gabemontero
Member Author

> It looks good to me, but I would like @SaschaSchwarze0 to take one last look also.

thanks and agreed @coreydaley

@gabemontero
Member Author

still all green on ci e2e's @SaschaSchwarze0 @coreydaley after fixing the remaining oversight on the field rename that @coreydaley caught

@SaschaSchwarze0
Member

SaschaSchwarze0 commented Jul 8, 2021

I'll make sure I take a final look this week. Will need to evaluate various scenarios against your code.

@coreydaley
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 14, 2021
@gabemontero
Member Author

bump @SaschaSchwarze0 - any idea if you'll get back to cancel builds PR review this week?

Member

@SaschaSchwarze0 SaschaSchwarze0 left a comment

Nice, went through some edge cases and everything was working fine. :-)

/approve

@openshift-ci
Contributor

openshift-ci bot commented Jul 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SaschaSchwarze0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 15, 2021
@openshift-merge-robot openshift-merge-robot merged commit 5d8fb41 into shipwright-io:main Jul 15, 2021
@gabemontero gabemontero deleted the cancel-buildrun branch July 15, 2021 12:09
@gabemontero
Member Author

> Nice, went through some edge cases and everything was working fine. :-)
>
> /approve

Awesome - thanks !

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Label for when a PR has specified a release note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cancelling running/in flight BuildRun
6 participants