API Reference

Packages

kubeflow.org/v1

Package v1 is the v1 version of the API.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Definitions

JobCondition

JobCondition describes the state of the job at a certain point.

Appears In: JobStatus
Field Description

type JobConditionType

Type of job condition.

status ConditionStatus

Status of the condition, one of True, False, Unknown.

reason string

The reason for the condition’s last transition.

message string

A human readable message indicating details about the transition.

lastUpdateTime Time

The last time this condition was updated.

lastTransitionTime Time

Last time the condition transitioned from one status to another.
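
As a rough illustration of how these fields are usually consumed, the sketch below looks up a condition by type and checks whether it currently holds. It assumes the job status has been loaded as a plain dictionary, for example from JSON returned by the API server. The helper names `find_condition` and `has_condition` and the sample condition types `Running` and `Succeeded` are illustrative assumptions, not names defined by this reference.

```python
from typing import Optional


def find_condition(status: dict, cond_type: str) -> Optional[dict]:
    """Return the last condition of the given type, or None if absent."""
    for cond in reversed(status.get("conditions") or []):
        if cond.get("type") == cond_type:
            return cond
    return None


def has_condition(status: dict, cond_type: str) -> bool:
    """True only when the condition exists and its status is the string "True"."""
    cond = find_condition(status, cond_type)
    return cond is not None and cond.get("status") == "True"


# Hypothetical status payload; the condition type names are assumptions.
sample_status = {
    "conditions": [
        {"type": "Running", "status": "False", "reason": "JobSucceeded",
         "message": "Job is no longer running."},
        {"type": "Succeeded", "status": "True", "reason": "JobSucceeded",
         "message": "Job completed successfully."},
    ]
}

print(has_condition(sample_status, "Succeeded"))
print(has_condition(sample_status, "Running"))
```

Note that the condition status is compared as a string because it is one of True, False, or Unknown, not a boolean.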

JobConditionType (string)

JobConditionType defines the possible condition types reported in JobStatus.

Appears In: JobCondition

JobStatus

JobStatus represents the current observed state of the training Job.

Appears In:
Field Description

conditions JobCondition array

Conditions is an array of current observed job conditions.

replicaStatuses object (keys:ReplicaType, values:ReplicaStatus)

ReplicaStatuses is a map from ReplicaType to ReplicaStatus that specifies the status of each replica.

startTime Time

Represents the time when the job was acknowledged by the job controller. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.

completionTime Time

Represents the time when the job was completed. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.

lastReconcileTime Time

Represents the last time the job was reconciled. It is not guaranteed to be set in happens-before order across separate operations. It is represented in RFC3339 form and is in UTC.
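
Because the time fields are RFC3339 strings in UTC, a client can derive the job's run time from startTime and completionTime. The following is a minimal sketch, assuming the timestamps are serialized with whole-second precision and a trailing `Z` (for example `2024-01-01T00:00:00Z`). The helper names `parse_time` and `job_duration` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed wire format: whole seconds with a literal "Z" suffix for UTC.
RFC3339_UTC = "%Y-%m-%dT%H:%M:%SZ"


def parse_time(value: str) -> datetime:
    """Parse an RFC3339 UTC timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, RFC3339_UTC).replace(tzinfo=timezone.utc)


def job_duration(status: dict) -> Optional[timedelta]:
    """Return completionTime - startTime, or None if either field is unset."""
    start = status.get("startTime")
    end = status.get("completionTime")
    if not start or not end:
        return None
    return parse_time(end) - parse_time(start)


# Hypothetical status payload.
finished = {
    "startTime": "2024-01-01T00:00:00Z",
    "completionTime": "2024-01-01T00:05:30Z",
}
print(job_duration(finished))
```

A job that is still running has no completionTime, so the helper returns None rather than guessing an end time.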

ReplicaSpec

ReplicaSpec is a description of the replica.

Appears In:
Field Description

replicas integer

Replicas is the desired number of replicas of the given template. If unspecified, defaults to 1.

template PodTemplateSpec

Template is the object that describes the pod that will be created for this replica. The RestartPolicy in PodTemplateSpec will be overridden by the RestartPolicy in ReplicaSpec.

restartPolicy RestartPolicy

Restart policy for all replicas within the job. One of Always, OnFailure, Never, and ExitCode. Defaults to Never.
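
The defaults described in the field table can be mirrored on the client side when inspecting a spec that omits them. The sketch below is one way to do this, assuming the replica spec is a plain dictionary. It follows the field descriptions above (replicas defaults to 1, restartPolicy defaults to Never) and is not the operator's own defaulting code. The names `with_defaults` and `ALLOWED_RESTART_POLICIES` are hypothetical.

```python
ALLOWED_RESTART_POLICIES = ("Always", "OnFailure", "Never", "ExitCode")


def with_defaults(replica_spec: dict) -> dict:
    """Return a copy of the replica spec with documented defaults filled in."""
    spec = dict(replica_spec)  # shallow copy so the input is not mutated
    if spec.get("replicas") is None:
        spec["replicas"] = 1  # documented default when unspecified
    if not spec.get("restartPolicy"):
        spec["restartPolicy"] = "Never"  # default per the field description
    if spec["restartPolicy"] not in ALLOWED_RESTART_POLICIES:
        raise ValueError(f"unsupported restartPolicy: {spec['restartPolicy']!r}")
    return spec


print(with_defaults({}))
print(with_defaults({"replicas": 3, "restartPolicy": "OnFailure"}))
```

An explicit `replicas: 0` is preserved, since only a missing value is treated as unspecified.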

ReplicaStatus

ReplicaStatus represents the current observed state of the replica.

Appears In: JobStatus
Field Description

active integer

The number of actively running pods.

succeeded integer

The number of pods which reached phase Succeeded.

failed integer

The number of pods which reached phase Failed.
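
Since JobStatus holds one ReplicaStatus per ReplicaType, a common task is to total the counters across all replica types. The following is a small sketch, assuming the replicaStatuses map has been loaded as a dictionary of dictionaries. The function name `summarize_replicas` and the replica type names `Master` and `Worker` are illustrative, because each operator defines its own set of ReplicaTypes.

```python
def summarize_replicas(replica_statuses: dict) -> dict:
    """Sum the active, succeeded, and failed counts over all replica types."""
    totals = {"active": 0, "succeeded": 0, "failed": 0}
    for replica_status in replica_statuses.values():
        for field in totals:
            # Missing or null counters are treated as zero.
            totals[field] += replica_status.get(field) or 0
    return totals


# Hypothetical replicaStatuses payload.
sample_replica_statuses = {
    "Master": {"succeeded": 1},
    "Worker": {"active": 2, "failed": 1},
}
print(summarize_replicas(sample_replica_statuses))
```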

ReplicaType (string)

ReplicaType represents the type of the replica. Each operator needs to define its own set of ReplicaTypes.

Appears In: JobStatus

RestartPolicy (string)

RestartPolicy describes how the replicas should be restarted. Only one of the following restart policies may be specified. If none of the following policies is specified, the default one is RestartPolicyAlways.

Appears In: ReplicaSpec

RunPolicy

RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.

Appears In:
Field Description

cleanPodPolicy CleanPodPolicy

CleanPodPolicy defines the policy to kill pods after the job completes. Defaults to Running.

ttlSecondsAfterFinished integer

TTLSecondsAfterFinished is the TTL to clean up jobs. Cleanup may take up to an extra ReconcilePeriod seconds, since reconciliation runs periodically. Defaults to infinite.

activeDeadlineSeconds integer

Specifies the duration in seconds, relative to the startTime, that the job may be active before the system tries to terminate it. The value must be a positive integer.

backoffLimit integer

Optional number of retries before marking this job failed.

schedulingPolicy SchedulingPolicy

SchedulingPolicy defines the policy related to scheduling, e.g. gang scheduling.
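
The two time-based fields above can be read as simple comparisons against the job's timestamps. The sketch below illustrates that reading under stated assumptions: `deadline_exceeded` and `ttl_expired` are hypothetical helpers, an unset activeDeadlineSeconds is taken to mean no deadline, and an unset ttlSecondsAfterFinished is taken to mean the job is never cleaned up (the documented "infinite" default). The controller's actual checks may differ, for instance in how boundary values are handled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def deadline_exceeded(start_time: datetime,
                      active_deadline_seconds: Optional[int],
                      now: datetime) -> bool:
    """True once the job has been active for activeDeadlineSeconds or longer."""
    if active_deadline_seconds is None:
        return False  # no deadline configured
    if active_deadline_seconds <= 0:
        raise ValueError("activeDeadlineSeconds must be a positive integer")
    return now - start_time >= timedelta(seconds=active_deadline_seconds)


def ttl_expired(completion_time: Optional[datetime],
                ttl_seconds_after_finished: Optional[int],
                now: datetime) -> bool:
    """True once a finished job has outlived its TTL; an unset TTL never expires."""
    if completion_time is None or ttl_seconds_after_finished is None:
        return False
    return now - completion_time >= timedelta(seconds=ttl_seconds_after_finished)


# Hypothetical timestamps for illustration.
start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
later = start + timedelta(minutes=10)

print(deadline_exceeded(start, 300, later))
print(ttl_expired(start, None, later))
```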

SchedulingPolicy

SchedulingPolicy encapsulates various scheduling policies of the distributed training job, for example minAvailable for gang-scheduling.

Appears In: RunPolicy
Field Description

minAvailable integer