Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow parallelism > completions #18876

Merged
merged 1 commit into from
Jan 25, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion api/swagger-spec/v1beta1.json
Original file line number Diff line number Diff line change
Expand Up @@ -4025,7 +4025,7 @@
"completions": {
"type": "integer",
"format": "int32",
"description": "Completions specifies the desired number of successfully finished pods the job should be run with. Defaults to 1. More info: http://releases.k8s.io/HEAD/docs/user-guide/jobs.md"
"description": "Completions specifies the desired number of successfully finished pods the job should be run with. Setting to nil means that the success of any pod signals the success of all pods, and allows parallelism to have any positive value. Setting to 1 means that parallelism is limited to 1 and the success of that pod signals the success of the job."
},
"activeDeadlineSeconds": {
"type": "integer",
Expand Down
6 changes: 3 additions & 3 deletions docs/api-reference/extensions/v1beta1/definitions.html
Original file line number Diff line number Diff line change
Expand Up @@ -3633,7 +3633,7 @@ <h3 id="_v1beta1_jobspec">v1beta1.JobSpec</h3>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">completions</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Completions specifies the desired number of successfully finished pods the job should be run with. Defaults to 1. More info: <a href="http://releases.k8s.io/HEAD/docs/user-guide/jobs.md">http://releases.k8s.io/HEAD/docs/user-guide/jobs.md</a></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Completions specifies the desired number of successfully finished pods the job should be run with. Setting to nil means that the success of any pod signals the success of all pods, and allows parallelism to have any positive value. Setting to 1 means that parallelism is limited to 1 and the success of that pod signals the success of the job.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">false</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">integer (int32)</p></td>
<td class="tableblock halign-left valign-top"></td>
Expand Down Expand Up @@ -4573,8 +4573,8 @@ <h3 id="_any">any</h3>
</div>
<div id="footer">
<div id="footer-text">
Last updated 2016-01-18 17:24:36 UTC
Last updated 2016-01-19 18:09:13 UTC
</div>
</div>
</body>
</html>
</html>
95 changes: 68 additions & 27 deletions docs/user-guide/jobs.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,8 @@ Documentation for other releases can be found at
- [Writing a Job Spec](#writing-a-job-spec)
- [Pod Template](#pod-template)
- [Pod Selector](#pod-selector)
- [Parallelism and Completions](#parallelism-and-completions)
- [Parallel Jobs](#parallel-jobs)
- [Controlling Parallelism](#controlling-parallelism)
- [Handling Pod and Container Failures](#handling-pod-and-container-failures)
- [Job Patterns](#job-patterns)
- [Alternatives](#alternatives)
Expand Down Expand Up @@ -103,7 +104,7 @@ Run the example job by downloading the example file and then running this comman

```console
$ kubectl create -f ./job.yaml
jobs/pi
job "pi" created
```

Check on the status of the job using this command:
Expand All @@ -113,16 +114,17 @@ $ kubectl describe jobs/pi
Name: pi
Namespace: default
Image(s): perl
Selector: app=pi
Parallelism: 2
Selector: app in (pi)
Parallelism: 1
Completions: 1
Labels: <none>
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
Start Time: Mon, 11 Jan 2016 15:35:52 -0800
Labels: app=pi
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
No volumes.
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
1m 1m 1 {job } SuccessfulCreate Created pod: pi-z548a

FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: pi-dtn4q
```

To view completed pods of a job, use `kubectl get pods --show-all`. The `--show-all` will show completed pods too.
Expand All @@ -141,7 +143,7 @@ that just gets the name from each pod in the returned list.
View the standard output of one of the pods:

```console
$ kubectl logs pi-aiw0a
$ kubectl logs $pods
3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989380952572010654858632788659361533818279682303019520353018529689957736225994138912497217752834791315155748572424541506959508295331168617278558890750983817546374649393192550604009277016711390098488240128583616035637076601047101819429555961989467678374494482553797747268471040475346462080466842590694912933136770289891521047521620569660240580381501935112533824300355876402474964732639141992726042699227967823547816360093417216412199245863150302861829745557067498385054945885869269956909272107975093029553211653449872027559602364806654991198818347977535663698074265425278625518184175746728909777727938000816470600161452491921732172147723501414419735685481613611573525521334757418494684385233239073941433345477624168625189835694855620992192221842725502542568876717904946016534668049886272327917860857843838279679766814541009538837863609506800642251252051173929848960841284886269456042419652850222106611863067442786220391949450471237137869609563643719172874677646575739624138908658326459958133904780275901
```

Expand Down Expand Up @@ -184,27 +186,66 @@ Also you should not normally create any pods whose labels match this selector, e
via another Job, or via another controller such as ReplicationController. Otherwise, the Job will
think that those pods were created by it. Kubernetes will not stop you from doing this.

### Parallelism and Completions
### Parallel Jobs

There are three main types of jobs:

1. Non-parallel Jobs
- normally only one pod is started, unless the pod fails.
- job is complete as soon as Pod terminates successfully.
1. Parallel Jobs with a *fixed completion count*:
- specify a non-zero positive value for `.spec.completions`
- the job is complete when there is one successful pod for each value in the range 1 to `.spec.completions`.
- **not implemented yet:** each pod passed a different index in the range 1 to `.spec.completions`.
1. Parallel Jobs with a *work queue*:
- do not specify `.spec.completions`
- the pods must coordinate with themselves or an external service to determine what each should work on
- each pod is independently capable of determining whether or not all its peers are done, thus the entire Job is done.
- when _any_ pod terminates with success, no new pods are created.
- once at least one pod has terminated with success and all pods are terminated, then the job is completed with success.
- once any pod has exited with success, no other pod should still be doing any work or writing any output. They should all be
in the process of exiting.

For a Non-parallel job, you can leave both `.spec.completions` and `.spec.parallelism` unset. When both are
unset, both are defaulted to 1.

For a Fixed Completion Count job, you should set `.spec.completions` to the number of completions needed.
You can set `.spec.parallelism`, or leave it unset and it will default to 1.

For a Work Queue Job, you must leave `.spec.completions` unset, and set `.spec.parallelism` to
a non-negative integer.

For more information about how to make use of the different types of job, see the [job patterns](#job-patterns) section.

By default, a Job is complete when one Pod runs to successful completion.

A single Job object can also be used to control multiple pods running in
parallel. There are several different [patterns for running parallel
jobs](#job-patterns).
#### Controlling Parallelism

The requested parallelism (`.spec.parallelism`) can be set to any non-negative value.
If it is unspecified, it defaults to 1.
If it is specified as 0, then the Job is effectively paused until it is increased.

A job can be scaled up using the `kubectl scale` command. For example, the following
command sets `.spec.parallelism` of a job called `myjob` to 10:

```console
$ kubectl scale --replicas=$N jobs/myjob
job "myjob" scaled
```

You can also use the `scale` subresource of the Job resource.

With some of these patterns, you can suggest how many pods should run
concurrently by setting `.spec.parallelism` to the number of pods you would
like to have running concurrently. This number is a suggestion. The number
running concurrently may be lower or higher for a variety of reasons. For
example, it may be lower if the number of remaining completions is less, or as
the controller is ramping up, or if it is throttling the job due to excessive
failures. It may be higher for example if a pod is gracefully shutdown, and
the replacement starts early.
Actual parallelism (number of pods running at any instant) may be more or less than requested
parallelism, for a variety or reasons:

If you do not specify `.spec.parallelism`, then it defaults to `.spec.completions`.
- For Fixed Completion Count jobs, the actual number of pods running in parallel will not exceed the number of
remaining completions. Higher values of `.spec.parallelism` are effectively ignored.
- For work queue jobs, no new pods are started after any pod has succeded -- remaining pods are allowed to complete, however.
- If the controller has not had time to react.
- If the controller failed to create pods for any reason (lack of ResourceQuota, lack of permission, etc),
then there may be fewer pods than requested.
- The controller may throttle new pod creation due to excessive previous pod failures in the same Job.
- When a pod is gracefully shutdown, it make take time to stop.

Depending on the pattern you are using, you will either set `.spec.completions`
to 1 or to the number of units of work (see [Job Patterns] for an explanation).

## Handling Pod and Container Failures

Expand Down
20 changes: 13 additions & 7 deletions examples/job/work-queue-2/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,8 @@ using above links. Then build the image:
$ docker build -t job-wq-2 .
```

### Push the image

For the [Docker Hub](https://hub.docker.com/), tag your app image with
your username and push to the Hub with the below commands. Replace
`<username>` with your Hub username.
Expand All @@ -194,6 +196,9 @@ docker tag job-wq-2 <username>/job-wq-2
docker push <username>/job-wq-2
```

You need to push to a public repository or [configure your cluster to be able to access
your private repository](../../../docs/user-guide/images.md).

If you are using [Google Container
Registry](https://cloud.google.com/tools/container-registry/), tag
your app image with your project ID, and push to GCR. Replace
Expand All @@ -220,7 +225,6 @@ spec:
selector:
matchLabels:
app: job-wq-2
completions: 1
parallelism: 2
template:
metadata:
Expand Down Expand Up @@ -263,17 +267,19 @@ Now wait a bit, then check on the job.
$ ./kubectl describe jobs/job-wq-2
Name: job-wq-2
Namespace: default
Image(s): gcr.io/causal-jigsaw-637/job-wq-2
Image(s): gcr.io/exampleproject/job-wq-2
Selector: app in (job-wq-2)
Parallelism: 2
Completions: 1
Completions: Unset
Start Time: Mon, 11 Jan 2016 17:07:59 -0800
Labels: app=job-wq-2
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Pods Statuses: 1 Running / 0 Succeeded / 0 Failed
No volumes.
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
1m 1m 1 {job } SuccessfulCreate Created pod: job-wq-2-7r7b2
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
33s 33s 1 {job-controller } Normal SuccessfulCreate Created pod: job-wq-2-lglf8


$ kubectl logs pods/job-wq-2-7r7b2
Worker with sessionID: bbd72d0a-9e5c-4dd6-abf6-416cc267991f
Expand Down
1 change: 0 additions & 1 deletion examples/job/work-queue-2/job.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@ spec:
selector:
matchLabels:
app: job-wq-2
completions: 1
parallelism: 2
template:
metadata:
Expand Down
2 changes: 1 addition & 1 deletion pkg/apis/extensions/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -481,7 +481,7 @@ type JobSpec struct {
Parallelism *int `json:"parallelism,omitempty"`

// Completions specifies the desired number of successfully finished pods the
// job should be run with. Defaults to 1.
// job should be run with. When unset, any pod exiting signals the job to complete.
Completions *int `json:"completions,omitempty"`

// Optional duration in seconds relative to the startTime that the job may be active
Expand Down
13 changes: 9 additions & 4 deletions pkg/apis/extensions/v1beta1/defaults.go
Original file line number Diff line number Diff line change
Expand Up @@ -120,12 +120,17 @@ func addDefaultingFuncs(scheme *runtime.Scheme) {
obj.Labels = labels
}
}
if obj.Spec.Completions == nil {
completions := int32(1)
obj.Spec.Completions = &completions
// For a non-parallel job, you can leave both `.spec.completions` and
// `.spec.parallelism` unset. When both are unset, both are defaulted to 1.
if obj.Spec.Completions == nil && obj.Spec.Parallelism == nil {
obj.Spec.Completions = new(int32)
*obj.Spec.Completions = 1
obj.Spec.Parallelism = new(int32)
*obj.Spec.Parallelism = 1
}
if obj.Spec.Parallelism == nil {
obj.Spec.Parallelism = obj.Spec.Completions
obj.Spec.Parallelism = new(int32)
*obj.Spec.Parallelism = 1
}
},
func(obj *HorizontalPodAutoscaler) {
Expand Down