
[Feature Request]: Continuous batching #857

@agunapal

Description


Does torchchat plan to support asynchronous requests and continuous batching?

Continuous batching is a common strategy for getting higher tokens/second out of the available compute: new requests are admitted into the running batch as soon as earlier sequences finish, instead of waiting for the whole batch to drain.
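
A minimal sketch of that scheduling idea, independent of torchchat (all names here are hypothetical, and the decode step is a stub standing in for a batched forward pass):

```python
import asyncio
from dataclasses import dataclass, field

MAX_BATCH = 8
EOS = -1

@dataclass
class Sequence:
    prompt: list[int]
    generated: list[int] = field(default_factory=list)
    done: asyncio.Event = field(default_factory=asyncio.Event)

def fake_decode_step(batch: list[Sequence]) -> list[int]:
    # Stand-in for one batched forward pass producing one token per sequence.
    return [EOS if len(s.generated) >= 4 else len(s.generated) for s in batch]

async def scheduler(queue: asyncio.Queue) -> None:
    running: list[Sequence] = []
    while True:
        # Admit waiting requests into any free batch slots.
        while len(running) < MAX_BATCH and not queue.empty():
            running.append(queue.get_nowait())
        if running:
            for seq, tok in zip(running, fake_decode_step(running)):
                seq.generated.append(tok)
            # Evict finished sequences so new requests can take their slots
            # immediately, rather than after the whole batch completes.
            for seq in [s for s in running if s.generated[-1] == EOS]:
                seq.done.set()
            running = [s for s in running if not s.done.is_set()]
        await asyncio.sleep(0)  # yield so callers can enqueue more work

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(scheduler(queue))
    requests = [Sequence(prompt=list(range(n + 1))) for n in range(12)]
    for r in requests:
        queue.put_nowait(r)
    await asyncio.gather(*(r.done.wait() for r in requests))
    task.cancel()
    print([len(r.generated) for r in requests])

asyncio.run(main())
```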

We could specify the batch size n as a parameter, and torchchat would, behind the scenes, send n prompts of varying lengths asynchronously, for example:

```
python3 torchchat.py generate llama3 --prompt "write me a story about a boy and his bear" --batch_size 8
```
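
To illustrate what such a flag could mean behind the scenes (the `--batch_size` flag does not exist in torchchat today, and `Engine.generate` below is a stub, not an existing torchchat API): the n prompts are submitted concurrently, and the engine is free to admit each one into its running batch as soon as a slot opens.

```python
import asyncio

class Engine:
    # Stub engine; a real one would do continuously batched decoding.
    async def generate(self, prompt: str) -> str:
        await asyncio.sleep(0.01 * len(prompt))  # longer prompts take longer
        return prompt.upper()

async def main() -> None:
    prompts = [f"prompt {'x' * n}" for n in range(8)]  # varying lengths
    engine = Engine()
    # Fan out all prompts at once; gather preserves input order.
    outputs = await asyncio.gather(*(engine.generate(p) for p in prompts))
    for out in outputs:
        print(out)

asyncio.run(main())
```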
