Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1792416: fix(pipeline-tasks): Cancelled tasks #3637

Merged

Conversation

abhinandan13jan
Copy link
Contributor

@abhinandan13jan abhinandan13jan commented Dec 3, 2019

Fixes: https://jira.coreos.com/browse/ODC-2423
Pipeline Stopped at stage1, Task Push

Screenshot from 2019-12-06 19-04-07
Screenshot from 2019-12-06 18-48-47

Pipeline Stopped at stage 2, build-1 and build-2 in progress

Screenshot from 2019-12-06 19-04-23
Screenshot from 2019-12-06 18-50-54
Screencast from 12-06-2019 07_04_56 PM

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 3, 2019
@openshift-ci-robot openshift-ci-robot added the component/dev-console Related to dev-console label Dec 3, 2019
Copy link
Contributor

@andrewballantyne andrewballantyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, I like the de-duplication of the pipelineFilterReducer. Just one tiny change.

}
if (conditions.length === 0) return null;

const condition = pipelineRun.status.conditions.find((c) => c.type === 'Succeeded');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reuse conditions

Suggested change
const condition = pipelineRun.status.conditions.find((c) => c.type === 'Succeeded');
const condition = conditions.find((c) => c.type === 'Succeeded');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, missed that. Thanks. Updated

@andrewballantyne
Copy link
Contributor

Ah dang, this isn't directly related to what you have done here... but we need to consider this scenario.

PipelineRunFailsAndContinues

After the PipelineRun shows failure, the container continues running until all Parallel tasks are completed.

Copy link
Contributor

@andrewballantyne andrewballantyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the individual tasks should be the individual task states. If they didn't run due to cancellation, then they are cancelled. If they stopped mid way due to cancellation, i'd also say cancelled instead of failed for these tasks.

From the Pipeline Slack room. I think we should follow this. The CLI makes the distinction as well:

$ tkn pipelinerun describe mock-app-pipeline-4xpg7b
Name:           mock-app-pipeline-4xpg7b
Namespace:      andrew
Pipeline Ref:   mock-app-pipeline

Status
STARTED          DURATION     STATUS
25 seconds ago   22 seconds   Failed(PipelineRunCancelled)

Message
PipelineRun "mock-app-pipeline-4xpg7b" was cancelled (TaskRun "mock-app-pipeline-4xpg7b-install-deps-hkdlc" was cancelled)

Resources
No resources

Params
No params

Taskruns
NAME                                          TASK NAME      STARTED          DURATION   STATUS
mock-app-pipeline-4xpg7b-install-deps-hkdlc   install-deps   25 seconds ago   ---        Failed(TaskRunCancelled)

The Pipeline has Failed due to Cancelation, as is the TaskRun.

In this case, all of these should be cancelled:

image

@andrewballantyne
Copy link
Contributor

PipelineRunFailsAndContinues

I wonder if our logic for showing Cancelled tasks is if it doesn't exist in the PLR or it has failed / cancelled. That way, despite this being cancelled, there is still a running parallel task. The logs page show this, the CLI shows this, and if we don't, we'll get the gif above. I think we need to address this at the same time.

@abhinandan13jan
Copy link
Contributor Author

@andrewballantyne Updated bars, tooltip and Visualization accordingly

Copy link
Contributor

@andrewballantyne andrewballantyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some tests.

Also could you log a bug for the parallel cancel issue since you didn't address it here? We'll want to keep track of it and fix it at a later date.

@@ -18,7 +18,7 @@ interface StatusIconProps {

export const StatusIcon: React.FC<StatusIconProps> = ({ status, ...props }) => {
switch (status) {
case runStatus['In Progress']:
case (runStatus['In Progress'], runStatus.Running):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case (runStatus['In Progress'], runStatus.Running):
case runStatus['In Progress']:
case runStatus.Running:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

Comment on lines +287 to +288
taskStatus[runStatus.Failed] > 0 || taskStatus[runStatus.Cancelled] > 0
? (taskStatus[runStatus.Cancelled] +=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just changed the logic and the tests didn't fail? Perhaps we need more / updated tests. Please look into it @abhinandan13jan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing tests and I've added a few more. This was mainly because of the || the existing tests din't fail. It just added the area for the function. Also, the existing tests were considering Cancelled+Failed and not Cancelled and Failed individually as it wasn't the case before.

}
return pipeline.latestRun.status.succeededCondition;
};

export const pipelineRunStatus = (pipelineRun): string => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write tests for this? I don't see any and this logic is starting to really warrant a few tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests for this

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 10, 2019
@@ -3,7 +3,9 @@ import { Pipeline, PipelineRun } from '../utils/pipeline-augment';
export enum DataState {
IN_PROGRESS = 'In Progress',
SUCCESS = 'Completed Successfully',
CANCELLED = 'Cancelled',
CANCELLED1 = 'Cancelled at stage1',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You renamed the enum property but didn't catch all uses - you're breaking in the TS build

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

const sumFailedTaskStatus = (status: TaskStatus): number => status.Failed + status.Cancelled;
const sumFailedTaskStatus = (status: TaskStatus): number => status.Failed;
const sumCancelledTaskStatus = (status: TaskStatus): number => status.Cancelled;
const sumSuccededTaskStatus = (status: TaskStatus): number => status.Succeeded;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const sumSuccededTaskStatus = (status: TaskStatus): number => status.Succeeded;
const sumSucceededTaskStatus = (status: TaskStatus): number => status.Succeeded;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

const reducerOutput = pipelineRunFilterReducer(mockPipelineRuns[0]);
expect(reducerOutput).toBe('-');
});
it('2. Pipelinerun with empty conditions array', () => {
it('3. Pipelinerun with empty status object', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should avoid numbering the tests like this imo. This is just painful for maintenance and will look really off if someone failed to properly update all the numbers.

Since you have to make edits to your work for my other comments, could you clean up these and remove the numbers from each of these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed numbering

return null;
const conditions = _.get(pipelineRun, ['status', 'conditions'], []);
const isCancelled = conditions.find((c) =>
['PipelineRunCancelled', 'TaskRunCancelled'].some((cancel) => cancel === c.reason),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice this before, a PLR will show the reason for a fail being a TaskRun was cancelled? We don't cancel TaskRuns... that is weird. You sure this correct?

I don't see this in the CLI:

$ tkn pipelinerun describe mock-app-pipeline-o259xj
Name:           mock-app-pipeline-o259xj
Namespace:      andrew
Pipeline Ref:   mock-app-pipeline

Status
STARTED        DURATION    STATUS
1 minute ago   9 seconds   Failed(PipelineRunCancelled)

Message
PipelineRun "mock-app-pipeline-o259xj" was cancelled (TaskRun "mock-app-pipeline-o259xj-install-deps-k29ts" was cancelled)

Resources
No resources

Params
No params

Taskruns
NAME                                          TASK NAME      STARTED        DURATION   STATUS
mock-app-pipeline-o259xj-install-deps-k29ts   install-deps   1 minute ago   ---        Failed(TaskRunCancelled)

Status is ${status}(${reason})

STARTED        DURATION    STATUS
1 minute ago   9 seconds   Failed(PipelineRunCancelled)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because, the RunStatus function is usable over both PipelineRuns and TaskRuns... The status block for both are same for all statuses. Except for cancellation, where the reason differs.

When we cancel PipelineRuns, the running tasks are marked as Cancelled in the PLR YAML

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you don't pass always a PipelineRun? Sometimes you pass TaskRuns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the tooltip and the bars use the function for getting the status of TR. Perhaps a more generic name would avoid the confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name is pipelineRunStatus and it accepts a pipelineRun. Please correct the name and I will log a refactor ticket so we can use types in this file (it's imported by the place the types are, so it's circular if we try to use the types).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewballantyne I completely agree with you. However, there needs a lot of refactoring in the status area. The Visualization uses different set of functions and the Bars and Tooltips uses another set. We would need all of these to point to the same function just the way we have for Actions. Can we address this as a separate concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. There are fundamental issues with that file that predates this work. Although increased misuse happened with this work, it was already in place when this work was started. Since we need to correct a lot of things around Pipelines already, I'm okay with this being delayed to another PR.

@abhinandan13jan
Copy link
Contributor Author

https://jira.coreos.com/browse/ODC-2509 is the ticket to address parallel tasks on cancellation.

Comment on lines -247 to +249
specificPipelineData.pipelineRuns[DataState.CANCELLED],
specificPipelineData.pipelineRuns[DataState.CANCELLED1],
specificPipelineData.pipelineRuns[DataState.CANCELLED2],
specificPipelineData.pipelineRuns[DataState.CANCELLED3],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we needed to add CANCELLED2 and CANCELLED3 to this test data, but it appears to have not adjusted the intent of the test (which is get the latest run). I think it's okay to leave this change.

@andrewballantyne
Copy link
Contributor

/retest

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 11, 2019
ListFilterId.Running,
ListFilterId.Failed,
ListFilterId.Cancelled,
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we have a util which can be used for filters and runFilters in PipelineRuns.tsx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are a lot of things that need to be touched up in Pipelines. I'll add this to the Tech Debt @invincibleJai. I've logged https://jira.coreos.com/browse/ODC-2553 to cover it

@abhinandan13jan
Copy link
Contributor Author

/test e2e-gcp-console

@invincibleJai
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinandan13jan, andrewballantyne, invincibleJai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 59ade4c into openshift:master Dec 14, 2019
@spadgett spadgett added this to the v4.4 milestone Dec 16, 2019
@andrewballantyne
Copy link
Contributor

/retitle Bug 1792416: fix(pipeline-tasks): Cancelled tasks

@openshift-ci-robot openshift-ci-robot changed the title fix(pipeline-tasks): Cancelled tasks Bug 1792416: fix(pipeline-tasks): Cancelled tasks Jan 17, 2020
@openshift-ci-robot
Copy link
Contributor

@abhinandan13jan: All pull requests linked via external trackers have merged. Bugzilla bug 1792416 has been moved to the MODIFIED state.

In response to this:

Bug 1792416: fix(pipeline-tasks): Cancelled tasks

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andrewballantyne
Copy link
Contributor

/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@andrewballantyne: Bugzilla bug 1792416 is in an unrecognized state (MODIFIED) and will not be moved to the MODIFIED state.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andrewballantyne
Copy link
Contributor

/cherry-pick release-4.3

@openshift-cherrypick-robot

@andrewballantyne: new pull request created: #3993

In response to this:

/cherry-pick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/dev-console Related to dev-console lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants