This repo contains the libraries for writing custom job operators such as tf-operator and pytorch-operator. To write a custom operator, users need to complete the following three steps:
- Write a custom controller that implements the controller interface, such as the TestJobController, and instantiate a testJobController object:
    testJobController := TestJobController{
        ...
    }
- Instantiate a JobController struct object and pass in the custom controller written in step 1 as a parameter:
    jobController := JobController{
        Controller: testJobController,
        Config:     v1.JobControllerConfiguration{EnableGangScheduling: false},
        Recorder:   recorder,
    }
- Within your main reconcile loop, call the JobController.ReconcileJobs method:
    reconcile(...) {
        // Your main reconcile loop.
        ...
        jobController.ReconcileJobs(...)
        ...
    }
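The three steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the ControllerInterface, Job, JobControllerConfiguration, and JobController types below are simplified, hypothetical stand-ins defined locally so the example compiles on its own. They are not this repo's actual definitions; the real interface in interface.go has more methods, and the real ReconcileJobs takes different arguments.

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a hypothetical, minimal job type used only in this sketch.
type Job struct {
	Name     string
	Replicas int
}

// ControllerInterface is a simplified stand-in for the interface a custom
// controller must implement. The method set here is an assumption.
type ControllerInterface interface {
	ControllerName() string
	GetJob(key string) (*Job, error)
}

// JobControllerConfiguration is a stand-in for the Config field shown above.
type JobControllerConfiguration struct {
	EnableGangScheduling bool
}

// JobController is a stand-in for the library struct that wraps a custom controller.
type JobController struct {
	Controller ControllerInterface
	Config     JobControllerConfiguration
}

// ReconcileJobs is a stand-in for the library entrypoint: it looks the job up
// through the custom controller and reports the action it would take.
func (jc *JobController) ReconcileJobs(key string) (string, error) {
	job, err := jc.Controller.GetJob(key)
	if err != nil {
		return "", fmt.Errorf("%s: reconcile %q: %w", jc.Controller.ControllerName(), key, err)
	}
	return fmt.Sprintf("%s: ensure %d replicas for %s",
		jc.Controller.ControllerName(), job.Replicas, job.Name), nil
}

// Step 1: a custom controller that implements the interface.
type TestJobController struct {
	jobs map[string]*Job
}

func (c *TestJobController) ControllerName() string { return "test-operator" }

func (c *TestJobController) GetJob(key string) (*Job, error) {
	job, ok := c.jobs[key]
	if !ok {
		return nil, errors.New("job not found")
	}
	return job, nil
}

func main() {
	// Step 1: instantiate the custom controller.
	testJobController := &TestJobController{
		jobs: map[string]*Job{"default/demo": {Name: "demo", Replicas: 2}},
	}

	// Step 2: pass it to the JobController.
	jobController := JobController{
		Controller: testJobController,
		Config:     JobControllerConfiguration{EnableGangScheduling: false},
	}

	// Step 3: call ReconcileJobs from the reconcile loop (a single pass here).
	msg, err := jobController.ReconcileJobs("default/demo")
	if err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println(msg)
}
```

The point of the split is that the operator-specific behavior lives in the custom controller, while the shared reconcile logic lives in JobController and reaches the custom code only through the interface.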
Note that this repo is still under construction; API compatibility is not guaranteed at this point.
The API files are located under job_controller/api/v1:
- constants.go: the constants such as label keys.
- interface.go: the interfaces to be implemented by custom controllers.
- controller.go: the main JobController that contains the ReconcileJobs API method to be invoked by the user. This is the entrypoint of the JobController logic.
The rest of the code under the job_controller/ folder contains the core logic for the JobController to work, such as creating and managing worker pods, services, etc.