Drop in support for Distributed Llama for more speed and larger LLM support #6599
KodeMunkie started this conversation in Ideas
Replies: 0 comments
Feature request use case:
Distributed inference with drop-in support for https://github.com/b4rtaz/distributed-llama
If a drop-in replacement is possible, or a similar toolkit could be used, I would like to run >70B models by distributing inference across multiple machines, which should also give better performance where network bandwidth allows.
A number of people, like me, have old machines and Raspberry Pi 4/5s lying around, each capable of roughly 1-3 TOPS and with 8-32 GB of RAM. This would let most home users run, locally, a class of LLM that typically requires cloud-level compute resources.
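To put rough numbers on that, here is a back-of-the-envelope sketch of how many machines it might take just to hold the weights of a 70B model. Everything in it is an assumption for illustration, not anything taken from Distributed Llama: the helper names `weight_bytes` and `min_nodes` are made up, the weights are assumed to be 4-bit quantized and split evenly across nodes, only 75% of each node's RAM is assumed usable, and KV cache, activations and network overhead are ignored.

```python
import math


def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes (weights only)."""
    return params_billion * 1e9 * bits_per_weight / 8


def min_nodes(
    params_billion: float,
    bits_per_weight: float,
    ram_gb_per_node: float,
    usable_fraction: float = 0.75,  # assumption: the rest is left for the OS
) -> int:
    """Smallest number of identical nodes whose usable RAM can hold the
    weights, assuming they are split evenly across the nodes."""
    usable_bytes = ram_gb_per_node * 1e9 * usable_fraction
    return math.ceil(weight_bytes(params_billion, bits_per_weight) / usable_bytes)


if __name__ == "__main__":
    # 70B parameters at an assumed 4 bits per weight, for a few RAM sizes.
    for ram_gb in (8, 16, 32):
        print(f"{ram_gb} GB per node -> at least {min_nodes(70, 4, ram_gb)} nodes")
```

Under those assumptions a 70B model needs about 35 GB for weights alone, so it would not fit on any single machine in the 8-32 GB range but could fit across a handful of them. A real toolkit may impose its own constraints on node count and memory layout, so treat this only as a lower bound.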
N.B. If this is too much of a refactoring effort, please leave it, or weigh the cost-benefit against more worthwhile features (there are plenty of others that would be useful to me instead :) ).