This repository has been archived by the owner on Sep 12, 2023. It is now read-only.

Common job controller library #5

Merged: 6 commits into kubeflow:master on Apr 4, 2019

Conversation

richardsliu
Contributor

@richardsliu richardsliu commented Apr 4, 2019

Part 2 of kubeflow/training-operator#960

These are the common library functions copied from tf-operator. The mapping is as follows:

tf-operator/pkg/control                         -->  common/job_controller
tf-operator/pkg/common/jobcontroller            -->  common/job_controller
tf-operator/pkg/common/util/<version>/testutil  -->  common/test_util/<version>
tf-operator/pkg/common/logger                   -->  common/util
tf-operator/pkg/common/util                     -->  common/util

Also added a test_job to represent generic training jobs. Originally all the tests were written using tfjob, but this repo cannot take a dependency on tf-operator.

The other files are auto-generated for the testjob custom resource.



Member

@terrytangyuan terrytangyuan left a comment


LGTM. Just some nitpicks.
/lgtm

// e.g. 15s, 30s, 60s, 120s...
ReconcilerSyncLoopPeriod metav1.Duration

// Enable gang scheduling by kube-arbitrator
Member


kube-arbitrator -> kube-batch


}

// When a pod is updated, figure out what tfjob/s manage it and wake them up.
Member


tfjob can be changed to a more generic name here
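
For illustration, the pod-update step described in the comment above can be sketched without naming any particular job type. This is a minimal sketch, not the code in this PR: `OwnerRef`, `Pod`, and `controllerKey` are hypothetical, trimmed-down stand-ins, and the job kind is passed in as a parameter rather than hard-coded.

```go
package main

import "fmt"

// OwnerRef is a hypothetical, trimmed-down stand-in for an owner reference.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// Pod is a hypothetical, trimmed-down stand-in for a pod object.
type Pod struct {
	Namespace string
	Name      string
	Owners    []OwnerRef
}

// controllerKey returns the "namespace/name" key of the pod's controlling
// owner when that owner is of the given kind. The boolean is false when the
// pod is not managed by a job of that kind.
func controllerKey(pod Pod, kind string) (string, bool) {
	for _, ref := range pod.Owners {
		if ref.Controller && ref.Kind == kind {
			return pod.Namespace + "/" + ref.Name, true
		}
	}
	return "", false
}

func main() {
	// A plain slice stands in for the controller's work queue.
	queue := []string{}

	pod := Pod{
		Namespace: "default",
		Name:      "demo-worker-0",
		Owners:    []OwnerRef{{Kind: "TestJob", Name: "demo", Controller: true}},
	}

	// On a pod update, look up the managing job and enqueue it for a sync.
	if key, ok := controllerKey(pod, "TestJob"); ok {
		queue = append(queue, key)
	}
	fmt.Println(queue)
}
```

Because the kind is a parameter, the same lookup works for any job type built on the common library.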

logger := commonutil.LoggerForPod(pod, jc.Controller.GetAPIGroupVersionKind().Kind)

if job == nil {
// If this is a TFJob pod
Member


TFJob -> more generic name

}

// ConvertTFJobToUnstructured uses JSON to convert TFJob to Unstructured.
func ConvertTFJobToUnstructured(testJob *testjobv1.TestJob) (*unstructured.Unstructured, error) {
Member


Is this function required?
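
For context on what a JSON-based conversion like this does, here is a minimal sketch using only the standard library. It is not the function from this PR: `TestJob` is a hypothetical stand-in for `testjobv1.TestJob`, `toUnstructuredObject` is an illustrative name, and the `apiVersion` value is assumed. The resulting `map[string]interface{}` is the shape that `unstructured.Unstructured` holds in its `Object` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TestJob is a hypothetical, trimmed-down stand-in for testjobv1.TestJob.
type TestJob struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   map[string]string `json:"metadata"`
}

// toUnstructuredObject round-trips the typed object through JSON to obtain a
// generic map keyed by the JSON field names.
func toUnstructuredObject(job *TestJob) (map[string]interface{}, error) {
	data, err := json.Marshal(job)
	if err != nil {
		return nil, err
	}
	var obj map[string]interface{}
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, err
	}
	return obj, nil
}

func main() {
	job := &TestJob{
		APIVersion: "kubeflow.org/v1", // assumed value, for illustration only
		Kind:       "TestJob",
		Metadata:   map[string]string{"name": "demo"},
	}
	obj, err := toUnstructuredObject(job)
	if err != nil {
		panic(err)
	}
	// Nested objects decode as map[string]interface{} as well.
	meta := obj["metadata"].(map[string]interface{})
	fmt.Println(obj["kind"], meta["name"])
}
```

Such a helper is typically only needed where a test has to hand a typed object to a dynamic client or an informer that works on unstructured data.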


// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +resource:path=tfjob
Member


TFJob -> job in this file


// DefaultPortName is name of the port used to communicate between PS and
// workers.
DefaultPortName = "tfjob-port"
Member


TFJob -> job in this file

@richardsliu
Contributor Author

Fixed the comments, please take a look again.

@richardsliu
Contributor Author

/unassign @gaocegege

@richardsliu
Contributor Author

/assign @gaocegege

@terrytangyuan
Member

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit c2e45f5 into kubeflow:master Apr 4, 2019
georgkaleido pushed a commit to georgkaleido/common that referenced this pull request Jun 9, 2022
Co-authored-by: Paul Angerer <dabauxi@users.noreply.github.com>