This repository has been archived by the owner on Dec 30, 2020. It is now read-only.

Generalizing to other job schedulers #161

Open
jakirkham opened this issue Jun 20, 2019 · 6 comments

Comments

@jakirkham

As there are many different job schedulers used on HPC, it would be interesting to know to what extent the work here could be generalized to other job schedulers to cover more use cases. For instance, what would it take to get this to work on SGE or LSF or some other arbitrary job scheduler? Would it be possible to parameterize things a bit? To what extent is this tied to SLURM specifically? Thanks in advance for your thoughts. 🙂

@sashayakovtseva
Contributor

Hello @jakirkham,

The only parts tied specifically to Slurm are red-box and, to a lesser extent, the virtual-kubelet provider. The core logic lives in red-box, which implements the WorkloadManager interface, and the remaining components use that interface to communicate. So if anyone wants to extend this, a new WorkloadManager implementation (a new red-box) is the way to go :)
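
To make the extension point concrete, here is a minimal Go sketch of the idea: callers depend only on an interface, and each scheduler supplies its own implementation. The method names (`Submit`, `Status`), the `fakeWLM` type, and the `runJob` helper are all hypothetical and chosen for illustration; the real method set is whatever workload.proto defines.

```go
package main

import (
	"errors"
	"fmt"
)

// WorkloadManager is a hypothetical, simplified stand-in for the interface
// discussed above. The real one is defined by workload.proto.
type WorkloadManager interface {
	Submit(script string) (jobID string, err error)
	Status(jobID string) (string, error)
}

// fakeWLM is an in-memory backend used only for illustration. A real
// backend would call out to Slurm, SGE, LSF, and so on.
type fakeWLM struct {
	name string
	jobs map[string]string
	next int
}

func newFakeWLM(name string) *fakeWLM {
	return &fakeWLM{name: name, jobs: make(map[string]string)}
}

// Submit records the job as pending and returns a backend-scoped ID.
func (f *fakeWLM) Submit(script string) (string, error) {
	if script == "" {
		return "", errors.New("empty job script")
	}
	f.next++
	id := fmt.Sprintf("%s-%d", f.name, f.next)
	f.jobs[id] = "PENDING"
	return id, nil
}

// Status looks up the recorded state of a previously submitted job.
func (f *fakeWLM) Status(jobID string) (string, error) {
	st, ok := f.jobs[jobID]
	if !ok {
		return "", fmt.Errorf("unknown job %q", jobID)
	}
	return st, nil
}

// runJob is the scheduler-agnostic caller: it only sees the interface.
func runJob(wlm WorkloadManager, script string) (string, error) {
	id, err := wlm.Submit(script)
	if err != nil {
		return "", err
	}
	st, err := wlm.Status(id)
	if err != nil {
		return "", err
	}
	return id + " " + st, nil
}

func main() {
	// The same caller works unchanged against two different backends.
	for _, name := range []string{"slurm", "sge"} {
		out, err := runJob(newFakeWLM(name), "#!/bin/sh\nhostname\n")
		fmt.Println(out, err)
	}
}
```

Supporting another scheduler in this sketch means adding one more type that satisfies `WorkloadManager`; `runJob` does not change.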

@pisarukv
Contributor

Potentially, the operator can work with any WLM. The only thing you need to do is implement a gRPC server that conforms to our workload.proto spec, and use your implementation instead of red-box (which is really just the workload.proto implementation for SLURM).

@bauerm97

@jakirkham Thanks for stopping by! I just want to say that we'd be more than happy to work with the community and accept contributions that enable other WLMs in this architecture.

@dgruber

dgruber commented Jun 20, 2019

Great discussion. Just wondering whether a generic implementation on top of an open standard like DRMAA would be useful for that -> https://github.com/dgruber/drmaa

@pisarukv
Contributor

pisarukv commented Jun 20, 2019

Actually, we have taken a look at DRMAA. The second version (DRMAA2) looks perfect for us, but it does not seem to be widely used. DRMAA v1 seems to be missing some features that are important for us. For example, it is very important for us to be able to get information about WLM partitions (queues) and the resources they have. At the moment I'm not sure whether that is possible with the first version.

@dgruber

dgruber commented Jun 20, 2019

Yeah, agreed. Adoption could be better. I started a generic implementation of DRMAA2 in Go (https://github.com/dgruber/drmaa2os). An initial CLI wrapper for Slurm exists (https://github.com/dgruber/drmaa2os/tree/master/pkg/jobtracker/slurmcli). It could serve as a starting point, and it certainly deserves more attention. Contributions welcome!


5 participants