
Resource allocation + GPU allocation #266

Open
uriafranko opened this issue Aug 18, 2023 · 2 comments
Assignee: JoshTanke
Labels: enhancement (New feature or request), runner (Feature for the runner backend)

Comments

@uriafranko

Hey guys, love your work :)

I'm trying to figure out how I can allocate the relevant resources for each flow.

For example, I would like to allocate 2 CPUs and 8 GB of memory to one flow, and 4 CPUs, 16 GB of memory, and 1 GPU to another.

In both Pulumi and Ray clusters you can specify these per worker. Any idea how we can modify Buildflow to support GPUs and custom resource allocation?

@JoshTanke
Collaborator

Hey Uria, I appreciate the kind words! 🙂

We allocate resources at the processor level. For example, you can set the num_cpus to use per replica inside the pipeline decorator, like @app.pipeline(..., num_cpus=2). Here are the docs on this.
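
To make that concrete, here's a rough sketch; the Flow import and the source/sink below are placeholders for whatever you're already using, and the only piece taken from above is the num_cpus option:

```python
from buildflow import Flow  # placeholder import; match your actual setup

app = Flow()

# Placeholders -- swap in the source/sink primitives you already have.
my_source = ...
my_sink = ...

# Reserve 2 CPUs for each replica of this processor.
@app.pipeline(source=my_source, sink=my_sink, num_cpus=2)
def my_processor(element):
    return element
```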

It sounds like you have 2 separate workflows you want to set up? Flows are essentially just a container type in our framework, so you could either model it as 2 flows or as 2 processors in the same flow. Combining them into the same flow lets them share compute resources, which is usually more cost effective (they will still both autoscale independently of each other). A sketch of the combined option follows below.
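
Continuing the sketch above, two processors sharing one flow, each with its own reservation, could look roughly like this (sources/sinks are still placeholders):

```python
# Two processors in the same flow, each with its own CPU reservation.
@app.pipeline(source=my_source, sink=my_sink, num_cpus=2)
def light_processor(element):
    return element

@app.pipeline(source=my_source, sink=my_sink, num_cpus=4)
def heavy_processor(element):
    return element
```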

I don't think we have exposed the memory and GPU options yet, but that should be trivial to add. I can get a PR together this weekend!

Aside: I'm happy to hop on a call this next week if you'd like to dive into any more specifics! Would love to hear about your use case to see how we can best support it 🙂

@JoshTanke JoshTanke added enhancement New feature or request runner Feature for the runner backend labels Aug 19, 2023
@JoshTanke JoshTanke self-assigned this Aug 19, 2023
@JoshTanke
Collaborator

JoshTanke commented Aug 19, 2023

#267 exposes the Ray options. I'll need to sync with Caleb about adding support in the autoscaler before I land it, but you can install the temporary change by pointing pip at the expose-gpu-and-memory branch: pip install git+https://github.com/launchflow/buildflow.git@expose-gpu-and-memory
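
Assuming the new options mirror Ray's names (num_gpus, and memory specified in bytes), usage should look roughly like the sketch below; check #267 for the actual signatures.

```python
# Hypothetical shape only: assumes the exposed options reuse Ray's names
# (num_gpus, and memory in bytes).
@app.pipeline(
    source=my_source,
    sink=my_sink,
    num_cpus=4,
    num_gpus=1,
    memory=16 * 1024**3,  # 16 GiB; a scheduling hint for Ray, not a hard limit (see below)
)
def gpu_processor(element):
    return element
```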

One thing to watch out for: Ray's memory option is just for its own internal scheduler and will not enforce that your processor only uses that amount of memory (see the Ray docs on this). If you start hitting OOM errors with this option set, you're most likely using more memory than you told the Ray scheduler you would (we hit this a bunch early on).

I would personally recommend not setting the memory option if you can avoid it, since errors in the Ray scheduler can be really hard to diagnose. Once it comes time to deploy, you can control memory usage by changing the machine type you use for each worker in your Ray cluster, and then let the scheduler keep track of memory usage for you.

If you're running locally, you can start your Ray cluster with specific resource limits: ray start --head --num-gpus=3
