
Job controller proposal #11746

Merged: 1 commit into kubernetes:master on Aug 17, 2015

Conversation

@soltysh (Contributor) commented Jul 23, 2015

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project, in which case you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@k8s-bot commented Jul 23, 2015

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain to ixdy.

@mikedanese (Member)

@davidopp @bprashanth #1624

@nikhiljindal (Contributor)

/sub

@alex-mohr (Contributor)

FWIW, I'd be curious as to whether you've thought about how much of what you need to implement for Jobs is specific to them, or whether it'd be useful to split out some of the functionality into a separate standalone object. That is, instead of Job -> Pod, perhaps Job -> {ForeverPod, RunToCompletionPod, Pet} -> Pod(s) would also make sense?

@erictune (Member)

When jobSpec.taskCount > 1, the pods must be doing something different. So, they must be getting their instructions on what to do differently from something external to the PodSpec, such as a remote service, or shared volume. In other words, different pods of the same job are working on different bits of data -- call those different work units (shards in map-reduce terminology).

I think users care about which pods are doing which work units. For example, if there are repeated failures on the same work unit, users usually want to know about that so they can debug. Or they want to know which work units are taking the longest so they can change their sharding scheme. But the JobScheduler is oblivious to what work unit a pod does. Does this make it less useful to users? Or are those things handled by a different component. If the latter, I'm having trouble envisioning how it all fits together, so maybe you could present an example (pseudocode) of a map-reduce master that uses Job as a building block?

@erictune (Member)

How would you envision running a map-reduce which has a master which starts a variable number of workers (e.g based on the size of the input files or other heuristics)? A Job of size 1 that runs a master, and then the master starts the workers as a Job with a computed size, and the master does not exit with success until all the work is done?

How are separate map, shuffle and reduce stages implemented? A Mapper Job followed by a Shuffler Job followed by a Reducer Job? Can I make those phases overlapping, while still respecting a total resource or pod count quota? What does that look like?

@soltysh (Contributor Author) commented Jul 24, 2015

@alex-mohr can you elaborate more on your Job -> {ForeverPod, RunToCompletionPod, Pet} -> Pod(s) suggestion? My understanding of the three objects you've mentioned between a Job and a Pod is that they are basically the same: in other words, all of them represent a job that will be run to completion, and how long it actually runs depends entirely on the author of the Job. The only one that might stand out, imho, would be ForeverPod, which could be implemented with the current ReplicationController as well, but again that depends on the Job author, who might create a Job without any constraints that ends up running forever.

@googlebot

CLAs look good, thanks!

@soltysh (Contributor Author) commented Jul 24, 2015

@erictune I guess you're referring to a discussion @davidopp and @bgrant0607 had in #1624 (starting from here). Let's discuss that topic in depth during the next community hangout, as was agreed in #1624.

## Motivation
Jobs are needed for executing multi-pod computation to completion; a good example
here would be the ability to implement any type of batch oriented tasks or a MapReduce
or Hadoop style workload.
Review comment (Member):

Please remove "or a MapReduce or Hadoop style workload"

@alex-mohr (Contributor)

@soltysh I was getting at whether Pod is the building block you want for Job, or whether there's an intermediate Thing (regardless of what we actually call it) that might be useful either independently or in other contexts. See the last paragraph of #1624 (comment)

@erictune (Member)

@alex-mohr
I'd been thinking the same thing. I'd been calling it an AtLeastOncePod when I talked about it to myself.
Its spec just contains a PodSpec, and its status is just a summary of the Pod(s) created and their intermediate and final outcomes.

@soltysh (Contributor Author) commented Jul 24, 2015

@erictune @alex-mohr my understanding of the Job is that it will complement the ReplicationController: the latter is responsible for running an app/task/whatever your image does forever, whereas a Job runs to completion. Those runs will be represented by an intermediate object from which you'll actually be getting status (in my proposal it's called the JobExecution, but that's debatable). This maps to the Borg analogy in the following way:

| k8s | Borg |
| --- | --- |
| Job | Job |
| JobExecution | Task |
| Pod | Attempt |

Additionally, I agree that adding the ability to assign a "virtual amount" of work per job execution would be valuable. I'll update my proposal after the weekend, if you don't mind.

}

// JobExec represents the current state of a single execution of a Job.
type JobExec struct {
Review comment (Member):

This name shouldn't be abbreviated (so JobExecution). Task is also appropriate. Is there any opposition to using that?

Review comment (Member):

Task might deserve to be a top-level API object in its own right.

Review comment (Member):

We should also implement ActiveDeadlineSeconds on pods (attempts) if there is not already a way to do that.

Review comment (Member):

Task is extremely overloaded in cluster management, but I think it's a good choice here.

Can you explain what you mean by making it a top-level object? As-is it can presumably be incorporated into other types, if that's what you were after?

I think we do already have ActiveDeadlineSeconds on pods: https://github.com/GoogleCloudPlatform/kubernetes/blob/affba42a0520ecf6bab040fb7971284ef9bf450a/pkg/kubelet/kubelet.go#L1358
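For illustration, a minimal sketch of how that per-pod deadline is expressed, assuming the existing v1 API types; the pod definition and values below are made up, not part of this thread:

```go
// Sketch only: ActiveDeadlineSeconds on PodSpec bounds how long a pod
// (one "attempt") may stay active before the kubelet terminates it.
func examplePodWithDeadline() Pod {
	deadline := int64(600) // arbitrary 10-minute bound, for illustration
	return Pod{
		Spec: PodSpec{
			RestartPolicy:         RestartPolicyNever,
			ActiveDeadlineSeconds: &deadline,
			Containers: []Container{
				{Name: "worker", Image: "example/worker:latest"},
			},
		},
	}
}
```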

Review comment (Member):

By "top-level object", I mean it should have a dedicated REST path in the apiserver and storage path in etcd (like replication controller, node, pod, endpoint, see this), not be embedded in the JobStatus like it is in the current revision of this proposal. It also makes sense that Task would have it's own Spec and Status.

Review comment (Member):

What would a task resource add compared to pods?

Review comment (Contributor Author):

My intention was to replicate the behavior of ReplicationController, but with run-once pods in mind. Which means I agree with Brian about not creating a top-level object for JobExecution. The only difference between JobExecution and Pods is that the former groups a certain number of Pods, but that does not deserve its own object, imho.

Review comment (Member):

I agree.

@erictune (Member)

other than the one suggestion, LGTM.

@erictune added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Aug 17, 2015
@erictune (Member)

I'm sure there will be updates to this as you implement, so happy to merge this now.

@soltysh (Contributor Author) commented Aug 17, 2015

@erictune let me change MaxParallelism to Parallelism as you suggested and let's merge it.

@soltysh (Contributor Author) commented Aug 17, 2015

@erictune changed MaxParallelism to Parallelism; now it's ready for merge. Thank you!
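For context, a minimal sketch of how the renamed field sits next to the other fields touched in this review; only Parallelism, Completions, and Selector come from the thread, while the remaining field and the concrete types are assumptions:

```go
// Illustrative sketch only; see the proposal document for the authoritative definition.
type JobSpec struct {
	// Parallelism is the maximum number of pods the job may run in
	// parallel at any point in time.
	Parallelism *int

	// Completions is the desired number of successfully finished pods the
	// job should be run with. Defaults to 1.
	Completions *int

	// Selector is a label query over pods that should match the pod count.
	// (The selector type shown here is an assumption.)
	Selector map[string]string

	// Template describes the pod that will be created when executing the job.
	// (Included by analogy with ReplicationController.)
	Template *PodTemplateSpec
}
```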

@soltysh (Contributor Author) commented Aug 17, 2015

Once we have this in, I'll update the ScheduledJob proposal (#11980) to match the API proposed here.

@soltysh (Contributor Author) commented Aug 17, 2015

Fixed Travis failure.

// job should be run with. Defaults to 1.
Completions *int

// Selector is a label query over pods that should match the pod count.
Review comment (Member):

This comment is out of date.

@bgrant0607 (Member)

I don't think we need further iterations on the proposal at this point. It's pretty minimalistic.

You might need to rebase in order to make Shippable pass.

@soltysh (Contributor Author) commented Aug 17, 2015

I'll update that comment. I just did a rebase to make Travis happy; I'll look into Shippable as well.

@bgrant0607 (Member)

I don't see any difference in your shippable.yml from that at master/HEAD, so I just restarted Shippable in case it was a random failure.

@soltysh (Contributor Author) commented Aug 17, 2015

I've updated the Selector comment; hopefully Shippable will like me more now.

@mikedanese (Member)

Shippable failure is caused by a github outage. https://status.github.com/messages

@roberthbailey (Contributor)

I kicked shippable to get a green status prior to merging.

roberthbailey added a commit that referenced this pull request Aug 17, 2015
@roberthbailey merged commit 19bb04f into kubernetes:master on Aug 17, 2015
@soltysh deleted the job_controller_proposal branch on August 18, 2015 at 12:05
@soltysh (Contributor Author) commented Aug 18, 2015

@pmorie to answer your questions:

> There's no field on JobStatus which indicates the overall status of the job -- I think this would be much easier to reason about than an array of JobConditions; any thoughts?

Still, you'll be searching for the particular condition JobSucceeded; if that value is True then you're all set.
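A hypothetical helper illustrating that check; the JobCondition field names used here are assumptions, not taken from the proposal:

```go
// jobSucceeded scans the job's conditions for a JobSucceeded condition
// reported as True, which is how a finished job is recognized.
func jobSucceeded(status JobStatus) bool {
	for _, c := range status.Conditions {
		if c.Type == JobSucceeded && c.Status == ConditionTrue {
			return true
		}
	}
	return false
}
```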

> Is there a distinct condition for a job which has been created, but for which no pods have yet been scheduled?
> Is there a distinct condition for a job which has currently executing worker pods? (I asked about this above, including here for completeness)

Nope for both. This is the difference between phases and conditions. There are no direct phases you can observe a job to be in, in other words no state machine. Conditions, as stated here, "...represent the latest available observations of an object's current state...". There's an issue regarding that topic I recommend reading.

> Is it worth giving some preliminary treatment to detecting overall failure of a job?

Can you elaborate on it a bit? Do you mean something like prematurely killing a job if we know it won't reach the desired Completions?

@soltysh mentioned this pull request on Aug 19, 2015
## Motivation

Jobs are needed for executing multi-pod computation to completion; a good example
here would be the ability to implement any type of batch oriented tasks.
Review comment (Member):

Should remove "any" because workflow DAGs or graphs are not supported.
