Drop-in support for Distributed Llama for more speed and larger LLM support #6599

KodeMunkie · 2024-10-30T21:41:14Z


Feature request use case:

Distributing inference with drop-in support for https://github.com/b4rtaz/distributed-llama

If a drop-in replacement is possible (or a similar toolkit could be used), I would like to run >70B models by distributing inference across multiple machines, which would give better performance where network bandwidth allows.

Many people like me have old machines and Raspberry Pi 4/5 boards lying around, each capable of roughly 1-3 TOPS and with 8-32 GB of RAM. Pooling them would let most home users run, locally, a grade of LLM that typically requires cloud-level compute resources.
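As a rough illustration of why pooling helps, the back-of-envelope sketch below estimates how many nodes of a given RAM size would be needed just to hold a 70B model's weights. It assumes 4-bit quantized weights, an even split of weights across nodes, and a hypothetical 80% of each node's RAM being usable; it ignores the KV cache, activations, and any per-node duplication, so real requirements will be higher. The helper names are hypothetical and this is not how Distributed Llama itself sizes a cluster.

```python
import math


def total_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weights-only model size in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def weight_gb_per_node(n_params: float, bits_per_weight: float, n_nodes: int) -> float:
    """Weights held by each node, assuming an even split (an assumption)."""
    return total_weight_gb(n_params, bits_per_weight) / n_nodes


def min_nodes(n_params: float, bits_per_weight: float,
              ram_gb_per_node: float, usable_fraction: float = 0.8) -> int:
    """Smallest node count whose usable RAM covers the weights.

    usable_fraction is a hypothetical allowance for the OS and runtime;
    KV cache and activations are NOT included in this estimate.
    """
    usable_gb = ram_gb_per_node * usable_fraction
    return math.ceil(total_weight_gb(n_params, bits_per_weight) / usable_gb)


if __name__ == "__main__":
    # 70B parameters at an assumed 4 bits per weight is about 35 GB of weights.
    for ram in (8, 16, 32):
        print(f"{ram} GB nodes: {min_nodes(70e9, 4, ram)}")
    # Prints:
    # 8 GB nodes: 6
    # 16 GB nodes: 3
    # 32 GB nodes: 2
```

Even under these optimistic assumptions, a 70B model does not fit on a single 8 GB board, but a handful of them could hold the weights between them.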

N.b. if this is too much of a refactoring effort, please leave it, or weigh the cost-benefit against more worthwhile features (there are plenty of others that would be just as useful to me :) )

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Drop in support for Distributed Llama for more speed and larger LLM support #6599

Uh oh!

Uh oh!

KodeMunkie Oct 30, 2024

Feature request use case:

Replies: 0 comments

KodeMunkie
Oct 30, 2024