Description
Currently, pipeline modules are moved to the preferred compute device during __call__. This is reasonable, as they stay there as long as the user keeps passing the same torch_device across calls.
However, in multi-GPU model-serving scenarios, it could be useful to move each pipeline to a dedicated device during or immediately after instantiation. This would make it possible to create, say, 8 different pipelines and move each one to a different GPU, as sketched below. Doing it this way could save CPU memory while preparing the service.
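For instance, assuming the proposed to method existed, a serving setup could instantiate one pipeline per GPU in turn (a minimal sketch; model_id is the same placeholder used in the snippets below):

from diffusers import StableDiffusionPipeline

# Hypothetical: relies on the proposed pipeline.to() method.
# Each pipeline is moved to its GPU right after loading, so CPU memory
# only needs to hold one pipeline's weights at a time.
pipes = [
    StableDiffusionPipeline.from_pretrained(model_id).to(f"cuda:{i}")
    for i in range(8)  # one pipeline per GPU
]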
Currently, the workaround to achieve the same is to perform a call with dummy data immediately after instantiation.
Describe the solution you'd like
Ideally, the following should work:
pipe = StableDiffusionPipeline.from_pretrained(model_id).to("cuda:1")
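A minimal sketch of how such a method could be implemented, assuming the pipeline stores its sub-modules as plain attributes (an assumption about diffusers internals; the actual module registration may differ):

import torch

class DiffusionPipeline:
    # ... existing pipeline code ...

    def to(self, torch_device):
        # Move every attribute that is a torch.nn.Module to the target
        # device; non-module components (tokenizer, scheduler) stay as-is.
        for value in vars(self).values():
            if isinstance(value, torch.nn.Module):
                value.to(torch_device)
        # Return self so the call can be chained, mirroring nn.Module.to.
        return self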
Describe alternatives you've considered
Current workaround (a dummy one-step call moves the modules to the device as a side effect):

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(model_id)
_ = pipe(["cat"], num_inference_steps=1, torch_device="cuda:1")
Another alternative would be to pass the device to the initializer. This could be done in addition to adding a to method, but I believe it's not necessary, as to is familiar enough to PyTorch users.
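For comparison, the initializer-based alternative might look like this (the torch_device keyword is hypothetical):

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_device="cuda:1")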
Additional context
See discussion in this Slack thread.