Thank you for taking the time to submit an enhancement request to the LocalAI project.
Based on your description, it seems that adding the --tensor-parallel-size setting to the vllm backend configuration through a YAML file would indeed be a useful enhancement. This would allow users to leverage multiple GPUs for inference, which is an essential capability for handling larger workloads.
I will pass your suggestion along to the project maintainers. They will review it and decide whether or not to implement it. You may not receive a direct reply, but rest assured, your request has been noted and will be considered.
Remember, LocalAI is a community-driven project, so if you would like to contribute to its development, please feel free to join the community and help out.
Thank you once again for your feedback, and we look forward to seeing more contributions from you!
Is your feature request related to a problem? Please describe.
No similar issues were found.
Describe the solution you'd like
I would like it to be possible to set --tensor-parallel-size from the model YAML, similar to gpu_memory_utilization and trust_remote_code in the vllm backend.
https://docs.vllm.ai/en/latest/models/engine_args.html
The current settings only allow inference on a single GPU; this option is needed to shard a model across multiple GPUs.
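For illustration, a model config might look like the sketch below. The tensor_parallel_size field is the proposed addition; its exact name and placement are assumptions based on how the existing vllm options (gpu_memory_utilization, trust_remote_code) are exposed, and the model name is a placeholder.

```yaml
# Hypothetical LocalAI model config (e.g. models/my-model.yaml).
# tensor_parallel_size is the *proposed* field; gpu_memory_utilization
# and trust_remote_code are options the vllm backend already accepts.
name: my-model
backend: vllm
parameters:
  model: mistralai/Mistral-7B-Instruct-v0.2  # placeholder model
trust_remote_code: true
gpu_memory_utilization: 0.90
tensor_parallel_size: 2  # proposed: shard the model across 2 GPUs
```

With such a field, LocalAI could pass the value straight through to vLLM as its --tensor-parallel-size engine argument (documented at the link above), the same way the existing options map onto vLLM's engine args.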