Labels: mct2611 added bug ("Something that is supposed to be working; but isn't") and triage on Apr 8, 2024; c21 added P2 ("Important issue, but not time-critical") and removed triage on May 15, 2024.
What happened + What you expected to happen
Hi, I want to use the cluster's CPU resources to run the Stable Diffusion batch inference demo; I do not have any GPUs. My understanding is that through the Ray framework, CPUs can also be used to execute inference tasks.
I set up a Ray cluster across two WSL instances: WSL A has 12 CPUs and serves as the head node, and WSL B has 12 CPUs and serves as the worker node. Running the 'ray status' command shows:
======== Autoscaler status: 2024-03-22 00:59:14.244899 ========
Node status
Active:
1 node_88349db0fa0ccd3086db2f5a4c79ab9a527acb4aca4c023cb8120c8b
1 node_5cb133607c13b47fa48631b86114996f49a7ced083a5bcbeafbc20b8
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
Usage:
0.0/24.0 CPU
0B/43.54GiB memory
0B/21.04GiB object_store_memory
Demands:
(no resource demands)
Then I ran the Stable Diffusion batch inference demo, setting the pipeline and device parameters to 'cpu' as the script below shows, and set num_cpus=16. In my understanding, the Ray cluster should then use 16 of the 24 CPUs to run the task. However, it raised this error:
(autoscaler +6s) Error: No available node types can fulfill resource request {'CPU': 16.0}. Add suitable node types to this cluster to resolve this issue.
It only works when I set num_cpus <= 12 (WSL A's total CPU count), and then only one of the two nodes executes the task.
The documentation says num_cpus is the number of CPUs to reserve for each parallel map worker, and concurrency is the number of Ray workers to use concurrently. So I tried concurrency=2 and num_cpus=8, thinking 2*8=16 CPUs might work. However, the same error occurred again during inference.
So my question is: how can I make use of the CPU resources across the cluster to execute one inference task?
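For intuition on the error above: each map worker is an actor that Ray schedules onto a single node, so its num_cpus request must fit within one node's capacity, while concurrency controls how many such actors run in parallel. A toy model of that placement rule (an illustration only, not Ray's actual scheduler):

```python
def fits_on_some_node(request_cpus, node_cpus):
    """A single actor's CPU request must fit entirely on one node."""
    return any(request_cpus <= n for n in node_cpus)

def place_actors(concurrency, request_cpus, node_cpus):
    """Greedily place `concurrency` actors, each needing `request_cpus` CPUs,
    onto nodes with the given free CPU counts. Returns node indices or None."""
    free = list(node_cpus)
    placements = []
    for _ in range(concurrency):
        for i, f in enumerate(free):
            if request_cpus <= f:
                free[i] -= request_cpus
                placements.append(i)
                break
        else:
            return None  # some actor cannot be placed on any node
    return placements

nodes = [12, 12]  # the two WSL nodes, 12 CPUs each

# num_cpus=16: no single node has 16 CPUs, so the request can never be met,
# even though the cluster total is 24.
print(fits_on_some_node(16, nodes))   # False
# num_cpus=8, concurrency=2: one 8-CPU actor per node, 16 CPUs in use total.
print(place_actors(2, 8, nodes))      # [0, 1]
# num_cpus=16 with any concurrency: placement fails outright.
print(place_actors(1, 16, nodes))     # None
```

Under this model, a per-actor request of 16 CPUs can never be satisfied by two 12-CPU nodes, while splitting the work into two 8-CPU actors can span both nodes.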
Versions / Dependencies
Ray 2.9.3
Python 3.10.12
WSL2
Reproduction script
model_id = "stabilityai/stable-diffusion-2-1"
prompt = "a photo of an astronaut riding a horse on mars"

import ray
import ray.data
import pandas as pd
import numpy as np

ds = ray.data.from_pandas(pd.DataFrame([prompt], columns=['prompt']))

class PredictCallable:
    def __init__(self, model_id: str, revision: str = None):
        from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
        import torch
        self.pipe = StableDiffusionPipeline.from_pretrained(model_id, revision=revision)
        self.pipe.scheduler = DPMSolverMultistepScheduler.from_config(self.pipe.scheduler.config)
        self.pipe = self.pipe.to("cpu")

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        import torch
        import numpy as np
        # Set a different seed for every image in the batch
        self.pipe.generator = [
            torch.Generator(device="cpu").manual_seed(i) for i in range(len(batch))
        ]
        images = self.pipe(list(batch["prompt"])).images
        return {"images": np.array(images, dtype=object)}

preds = ds.map_batches(
    PredictCallable,
    fn_constructor_kwargs=dict(model_id=model_id),
    concurrency=1,
    num_cpus=16,
    batch_size=1,
    batch_format='pandas',
)
results = preds.take_all()
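A side note on the repro: the dataset contains a single prompt, so with batch_size=1 there is only one batch to hand out, and only one worker can ever receive work regardless of how many are running. A toy sketch of round-robin batch fan-out (an illustration of the idea, not Ray's actual dispatch logic):

```python
def fan_out(num_batches, concurrency):
    """Assign batches to workers round-robin; returns per-worker batch counts."""
    counts = [0] * concurrency
    for b in range(num_batches):
        counts[b % concurrency] += 1
    return counts

# One prompt with batch_size=1 -> one batch: only worker 0 is ever busy.
print(fan_out(1, 2))   # [1, 0]
# Eight prompts -> both workers receive work.
print(fan_out(8, 2))   # [4, 4]
```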
Issue Severity
High: It blocks me from completing my task.