Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAG Workflow with Workflow resource #18827

Merged
merged 3 commits into from Jan 31, 2016

Conversation

@sdminonne
Copy link
Contributor

sdminonne commented Dec 17, 2015

@erictune, @soltysh As discussed here. Here it is the workflow proposal with the Workflow resource. It should improve composability and scheduling. It's heavily based on @derekwaynecarr Initializers #17305 (for dependency resource).
@bgrant0607, @mikedanese
@EricMountain-1A, @pmorie if you've some bandwidth

@googlebot googlebot added the cla: yes label Dec 17, 2015
@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Dec 17, 2015

Labelling this PR as size/L

@k8s-bot

This comment has been minimized.

Copy link

k8s-bot commented Dec 17, 2015

GCE e2e test build/test passed for commit b692120.

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Dec 17, 2015

The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label?

@erictune

This comment has been minimized.

Copy link
Member

erictune commented Dec 18, 2015

Note to myself: when I review this I need to first review the comments on #17787

@erictune

This comment has been minimized.

Copy link
Member

erictune commented Dec 19, 2015

The ExistenceDependencies of #17305 would check that an object exists. But for a workflow, you need to check that it has also reach a certain state, such as a job being complete.

// If this value is nil, the default grace period will be used instead.
// Set this value longer than the expected cleanup time for your workflow.
// If downstream resources (job, pod, etc.) define their TerminationGracePeriodSeconds
// the biggest is taken.

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

What does "the biggest is taken" mean? Are you saying this value is defaulted from the max value of TerminationGracePeriodSeconds of all contained objects?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

About ExistenceDependencies (more generally about ObjectDependencies). Sure #17305 is about existence but the specific behavior is going to be customized via the controller ControllerRef (if I'm not wrong).

I would simply modify field name in ObjectDependencies

type ObjectDependencies {
   ...
   DependenciesRef []ObjectReference `json:"dependenciesRef,omitempty"` // modified
   ControllerRef *ObjectReference `json:"controllerRef,omitempty"`
   ...
}

and the ControllerRef is going to be assigned to a controller that check termination of the resources in Dependencies.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

About "the biggest is taken", yes, I'm going to rephrase it.

TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
// Steps contains the vertices of the workflow graph.
Steps []WorkflowStep `json:"steps,omitempty"`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

An array implies order, but, just to clarify, I think there is no meaning to the order, right? The only order is the DAG. Call out in comment.

Also, I'm not sure, but would it make sense to use a map here instead? A map would not imply order. And you could get rid of WorkflowStep.Id and use the map key for that purpose?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

Right, I'll change this to a map. Thanks

// Set this value longer than the expected cleanup time for your workflow.
// If downstream resources (job, pod, etc.) define their TerminationGracePeriodSeconds
// the biggest is taken.
TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

Add:

        ActiveDeadlineSeconds *int64 `json:"activeDeadlineSeconds,omitempty"`

since job and pod have this already.

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

👍 for ActiveDeadlineSeconds which defines total run time, but I'm not sure about TerminationGracePeriodSeconds, can you shed some light on the idea for it?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 22, 2015

Author Contributor

Sure: in a workflowStep we can have basically two things:

  1. a JobTemplate which will create a Job
  2. a reference to an already instantiated Object (another Workflow a remote ubernetes resource, etc)

For case 1 we need to handle Graceful termination, and adding this field will set the termination period for all subresources.

This comment has been minimized.

Copy link
@erictune

erictune Dec 26, 2015

Member

Can't the PodSpec inside a Job (or JobTemplate, once we define it) define the TerminationGracePeriodSeconds for the Pod, making this unnecessary?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Jan 4, 2016

Author Contributor

@erictune, I'm ok to remove the TerminationGracePeriodSeconds.

// Spec contains the job specificaton that should be run in this Workflow.
// Only one between External and Spec can be set.
Spec JobSpec `json:"jobSpec,omitempty"`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

I'd call this "jobTemplate" since it is a template for a Job type.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

ok

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

If this is meant to support generic objects in the future we'll have to come up with the idea of having some generalized template which will be able to hold different types of objects to be created. Please add a paragraph in the future evolution about such thing.

This comment has been minimized.

Copy link
@erictune

erictune Dec 26, 2015

Member

If we do hold generic objects, then there will also need to be an object-agnostic way to determine if the created object is 'completed'.

// External contains a reference to another schedulable resource.
// Only one between ExternalRef and Spec can be set.
ExternalRef api.ObjectReference `json:"externalRef,omitempty"`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

Say I want to make an ExternalRef to a Job. How do I do that? If I create a Job, then it will run. But I don't want it to run before it's turn in the workflow.

One way to do that is to introduce a JobTemplate resource that looks like a Job but does not do anything. I think that is a reasonable approach, but please document that if it is the chosen approach. Or maybe you have something in mind with Initializers?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

In the current proposal the user can do two things to specify the task in the step:

  1. via JobSpec (JobTemplate as you're proposing) here: in this way you're going to create a Job when dependencies are satisfied.
  2. via ExternalRef: in this way the workflow step may depend on something like another workflow or something else (to address this comment).

So the basic workflow use case is to have a graph of Jobs built via JobTemplates if someone wants to compose workflows or use them in a cross-cluster environment it must go through ExternalRef.

Obviously in ExternalRef one could reference an already running Job or ScheduledJob.
@erictune does this answer to the question, or am I missing something?

```go
type ObjectDependencies struct {
...
ExistenceDependencies []ObjectReference `json:"existenceDependencies,omitempty"`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

Needs to depend on Completion, not existence, right?
Call it CompletionDependencies instead? Explain how it checks for Completion of an arbitrary Kind. Maybe all kinds have to support a subresource that checks their completeness?

By the way, waiting for completion is related to #1899. Maybe scan that issue for ideas.

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

Shouldn't depedench be dependant on the policy? Which means it not necessarily will be completion, but might be otherwise defined. Unless I miss something here. Additionally, if this is meant to support other objects in the future I'd vote for something more generic.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Jan 7, 2016

Author Contributor

@erictune about completion detection and subresource. At the beginning this proposal
will support Job (via jobTemplate) and ObjectReference to compose with other Workflow.
Job already support job/status and my plan is to add workflow/status. I plan to add also
Condition to WorflowStatus.

Iterating over all condition of status is not enough to detect completion?
I'm new to sub-resource so feel free to educate or to point out an example I can have a look.

}
```

* `dependencies.controllerRef`: will contain the policy to trigger current `WorkflowStep`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

In the struct above, ControllerRef is an ObjectReference but below you talk about it like it is a string. Please explain.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

Many apologies for this. It's not clear. ControllerRef is going to be a Controller not a string.
So the user can choose the controller to trigger the current step. Obviously, the controller must be already implemented. For workflow we should implement at least two controllers:

  1. AllDependenciesRunToCompletion: triggers current step when all the dependencies are completed
  2. AtLeastOneDependencyRunToCompletion: triggers current step when at least one dependency is completed.

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

Right that answers some of my questions. This needs to be in the proposal. Generally I'd like to see an interface which defines how these controllers will look like and a word or two about provided implementations.
Moreover how this applies to policy? Is it the same? In my understanding - yes, if so it's not clear in the proposal. If not - what's the purpose of the two, how they differ?

This comment has been minimized.

Copy link
@erictune

erictune Dec 26, 2015

Member

What is the use case for some deps not running to completion?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Jan 4, 2016

Author Contributor

@erictune I'm taking back this. I was sure I found a use-case where we were using an OR between predecessors but I cannot find anymore. Eventually, I'll add the controller later on.

* `dependencies.controllerRef`: will contain the policy to trigger current `WorkflowStep`

For `Workflow` basic scenario 2 controllers should be implemented: `AllDependenciesRunToCompletion`,
`AtLeastOneDependencyRunToCompletion`. This approach permits implementation of other kinds of triggering

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

I can't tell if AllDependenciesRunToCompletion is a string constant, or an object name, or something else.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

See here

```go
// WorkflowStatus contains the current status of the Workflow
type WorkflowStatus struct {
Statuses []WorkflowStepStatus `json:statuses`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

If we ever let someone update the WorkflowSpec to add a step, then the Steps and Statuses arrays will be different lengths. If you use maps for each, then that problem is mitigated?

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

Do we want to allow modifying the workflow once it started?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

Yes, we do. My understanding (which comes from our internal use of workflow) is that workflow is a pretty complex sequence of tasks. A user should have a way to modify it, even reassigning it (feature quite common in commercial tools) see. Since the creation is quite a heavy process.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

@erictune right. I'm going to modify to have maps.

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

What happens if I modify a dependency that has been satisfied already to one that is not? Does all modifications trigger verification of deps, at least those modified?

This comment has been minimized.

Copy link
@erictune

erictune Jan 8, 2016

Member

Good point.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Jan 8, 2016

Author Contributor

As quickly discussed with @soltysh ( please confirm or complain :) ) over IRC: dependencies of a step should be updatable only if the step didn't already start.

This comment has been minimized.

Copy link
@soltysh

soltysh Jan 8, 2016

Contributor

👍

// WorkflowStepStatus contains the status of a WorkflowStep
type WorkflowStepStatus struct {
// Job contains the status of Job for a WorkflowStep
JobStatus Job `json:"jobsStatus,omitempty"`

This comment has been minimized.

Copy link
@erictune

erictune Dec 19, 2015

Member

The Workflow controller is going to create a Job object from the template in the WorkflowStep. And that Job will have status. So, I don't know that there is any value in copying it here.

What might be valuable is to have the status contain the name of (or an objectReference to) the Job object that it created, so the user can check there for status.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 21, 2015

Author Contributor

OK. I'm going to keep only ObjectReferences for status

@sdminonne

This comment has been minimized.

Copy link
Contributor Author

sdminonne commented Dec 19, 2015

@erictune thanks for the review, I'm waiting for other feedbacks (if any) and then I'll modify this according to your comments.

@soltysh

This comment has been minimized.

Copy link
Contributor

soltysh commented Dec 21, 2015

I did quick review, I'm planning to go through it once again later today. Sorry @sdminonne it's taking this long, my hands are pretty full already :/

@sdminonne

This comment has been minimized.

Copy link
Contributor Author

sdminonne commented Dec 21, 2015

@soltysh no problem :) thanks for the review.

* As a user I want to be able to define a workflow.
* As a user I want to compose workflows.
* As a user I want to delete a workflow (eventually cascading to running _tasks_).
* As a user I want to debug a workflow (ability to track failure).

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

How about more concrete examples such as: I want to create a Job B that should depend on Job A, etc. instead of giving these high level ones?

This comment has been minimized.

Copy link
@sdminonne

sdminonne Dec 22, 2015

Author Contributor

ok

In order to implement `Workflow`, one needs to introduce the concept of _dependency_ between resources.
Dependecies are _edges_ of the graph.
_Dependency_ is introduced by the [initializers proposal #17305](https://github.com/kubernetes/kubernetes/pull/17305), and we would like to use this as the foundation for workflows.
An _initializer_ is a dynamically registered object which implements a custom policy.

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

I just read about initializers and I'm not sure what you mean by this policy. Mind explaining?

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

If the policy is meant to define deps how you imagine doing that?

similar to the `ScheduledJob` [#11980](https://github.com/kubernetes/kubernetes/pull/11980) for `Job`.
If the scheduled job is able
to support [various resources](https://github.com/kubernetes/kubernetes/pull/11980#discussion_r46729699)
`Workflow` will benefit from the _schedule_ functionality of `Job`.

This comment has been minimized.

Copy link
@soltysh

soltysh Dec 21, 2015

Contributor

I'd move this into future evolution section.

@soltysh

This comment has been minimized.

Copy link
Contributor

soltysh commented Dec 21, 2015

One more question, how one checks current state of a workflow, iow. which deps have been met already and which not?

sdminonne added 2 commits Dec 10, 2015
@erictune

This comment has been minimized.

Copy link
Member

erictune commented Jan 15, 2016

I assume you plan to implement a Workflow Controller in e.g. pkg/controller/workflow. But I don't see the word controller anywhere in the doc. If you intend there to be a controller, I suggest you have a section about it, including some pseudo-code explaining what the controller does. If you don't intend such a controller, then please say so.

At the time of writing, to defer execution there are two discussions in the community:
[#17305](https://github.com/kubernetes/kubernetes/pull/17305): an
_initializer_ is a dynamically registered object which permits to select a custom controller
to be applied to a resource. The controller verifies the dependencies.

This comment has been minimized.

Copy link
@erictune

erictune Jan 15, 2016

Member

I cannot tell if you are just pointing out the similarity to #17305 or if you are saying that Workflow will be implemented entirely via #17305, or that it will partly use #17305.

The WorkflowSpec has a nested JobSpec. Creating a nested object is not the same as creating a bare object. I don't see anything in #17305 about handling nested objects. Therefore, at a minimum, I think that there would need to be a Workflow Controller that extracts the nested objects and creates them as top-level objects. You could either have the Workflow Controller create them all immediately, and then use #17305 to sequence their start. Or you could just use the Workflow Controller to create each nested object (step) only when it is ready. I had assumed that you would do the latter, but rereading this section I'm not not sure what you intend to do.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Jan 25, 2016

Author Contributor

@erictune: I'm going to supply will contain the description of the controller. The controller won't use #17305 (even if mechanism could be similar, I argue), at the beginning I thought I could re-use the same mechanism or at least share some code. Unluckily I won't be able to work on both and apparently #17305 is taking longer. I'll remove any reference to #17305.

This comment has been minimized.

Copy link
@erictune

erictune Jan 25, 2016

Member

Okay. Sounds good. Happy to revisit it when #17305 is done.

To detect run to completion for the resource inside the graph the resource needs to implement
in `status` the slice of `condition`s. [See](../../docs/devel/api-conventions.md#objects)
and [#7856](https://github.com/kubernetes/kubernetes/issues/7856).

This comment has been minimized.

Copy link
@erictune

erictune Jan 15, 2016

Member

Please call out exactly what condition indicates completion.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Jan 25, 2016

Author Contributor

ok.

type WorkflowStep struct {
// JobTemplate contains the job specificaton that should be run in this Workflow.
// Only one between externalRef and jobTemplate can be set.
JobTemplate JobSpec `json:"jobTemplate,omitempty"`

This comment has been minimized.

Copy link
@erictune

erictune Jan 15, 2016

Member

Need pointers if this is a union.

This comment has been minimized.

Copy link
@sdminonne

sdminonne Jan 25, 2016

Author Contributor

ok, thanks

@sdminonne

This comment has been minimized.

Copy link
Contributor Author

sdminonne commented Jan 25, 2016

@erictune @soltysh thanks for the reviews and sorry for the silence on this.
I've worked on your feedback and in the next hours I'll supply a new version of the proposal.

@sdminonne

This comment has been minimized.

Copy link
Contributor Author

sdminonne commented Jan 25, 2016

@soltysh

  1. About Workflow update and discussion we had on IRC: I'm going to add to the proposal.
  2. About the modification for kubectl: I'll modify kubectl to include usual CRUD plus describe command: but the modification is going to be quite basic due to the limiation of an ASCII interface. Obviously workflow tools tends to have fancy UI (to represent the graph) but this will be out of the proposal (I'll clarify this point).
@sdminonne

This comment has been minimized.

Copy link
Contributor Author

sdminonne commented Jan 29, 2016

@erictune @soltysh I'm posting a new version. Several modifications:

  1. Postponing execution paragraph is removed. In the Controller paragraph I describe how the detection of a Job or of a Workflow is detected. Reference to #17305 has been removed.
  2. Detecting run to completion paragraph is removed but more details are given discussing API and controller
  3. Controller section is added.
  4. Added some info Workflow updating
  5. kubectl section is added
  6. Code formatting removed
@sdminonne sdminonne force-pushed the sdminonne:workflow_resource branch from 77471d2 to e8cf1dc Jan 29, 2016
@k8s-bot

This comment has been minimized.

Copy link

k8s-bot commented Jan 29, 2016

GCE e2e test build/test passed for commit e8cf1dc.

@k8s-teamcity-mesosphere

This comment has been minimized.

Copy link

k8s-teamcity-mesosphere commented on e8cf1dc Jan 29, 2016

TeamCity OSS :: Kubernetes Mesos :: 4 - Smoke Tests Build 13453 outcome was SUCCESS
Summary: Tests passed: 1, ignored: 213 Build time: 00:04:27

@k8s-bot

This comment has been minimized.

Copy link

k8s-bot commented Jan 30, 2016

GCE e2e test build/test passed for commit e8cf1dc.

@erictune

This comment has been minimized.

Copy link

erictune commented on docs/proposals/workflow.md in e8cf1dc Jan 31, 2016

Workflows should probably time-out, in which case they can also be Failed.

Also, if a Job of a workflow Failed, is it retried? I assume not. So then Workflow is failed in that case too.

@erictune

This comment has been minimized.

Copy link

erictune commented on docs/proposals/workflow.md in e8cf1dc Jan 31, 2016

s/job/workflow/

@erictune

This comment has been minimized.

Copy link
Member

erictune commented Jan 31, 2016

lgtm. minor comments that you can address later.

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jan 31, 2016

Automatic merge from submit-queue

k8s-github-robot added a commit that referenced this pull request Jan 31, 2016
Auto commit by PR queue bot
@k8s-github-robot k8s-github-robot merged commit a81fa2f into kubernetes:master Jan 31, 2016
4 of 5 checks passed
4 of 5 checks passed
Submit Queue PR does not have 'ok-to-merge' label
Details
Jenkins GCE e2e 219 tests run, 102 skipped, 0 failed.
Details
Jenkins unit/integration 2631 tests run, 9 skipped, 0 failed.
Details
cla/google All necessary CLAs are signed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@sdminonne

This comment has been minimized.

Copy link
Contributor Author

sdminonne commented Jan 31, 2016

@erictune thanks!

@sdminonne sdminonne deleted the sdminonne:workflow_resource branch Feb 1, 2016
@sdminonne sdminonne restored the sdminonne:workflow_resource branch Feb 1, 2016
@ravilr

This comment has been minimized.

Copy link
Contributor

ravilr commented Oct 16, 2016

@sdminonne @erictune @soltysh the proposal in this merged PR isn't found in the repo now. Is the thinking that this could be implemented outside of the core api using ThirdPartyResources? Thanks.

@sdminonne

This comment has been minimized.

Copy link
Contributor Author

sdminonne commented Oct 16, 2016

@ravilr right. I'm implementing the controller using ThirdPartyResources as an external controller.

xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.