Roadmap May 2023 #1220
-
If I'm understanding the idea of `llama_state` correctly -- that it would allow multiple "inference threads" from a single loaded model -- then it definitely seems worth implementing, since it opens up a lot of possibilities. Is the idea that we can get a lot of the same gains by just quickly swapping out stored contexts? A lot of LLM applications benefit from having multiple instances that can build on one another, or different instances that receive diverse queries. I haven't started using it yet, because I've been waiting for someone to post an example. It's a bit hard for me to parse the API.
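For illustration, here is a purely hypothetical sketch of what such an API could look like -- none of these functions exist in llama.cpp; the names (`llama_new_state`, `llama_eval_with_state`) are invented just to show the "one model, many states" idea:

```c
// Hypothetical sketch only - llama_state was never merged; all names here
// (llama_new_state, llama_eval_with_state) are made up for illustration.
// The idea: one set of read-only weights shared by several independent
// inference states (each with its own KV cache and logits).

llama_model * model = llama_load_model("ggml-model.bin");   // hypothetical

llama_state * chat_state = llama_new_state(model);          // hypothetical
llama_state * summ_state = llama_new_state(model);          // hypothetical

// The two states evaluate independently while sharing the weights,
// so no second copy of the model is loaded into memory:
llama_eval_with_state(model, chat_state, chat_tokens, n_chat, n_past_chat, n_threads);
llama_eval_with_state(model, summ_state, summ_tokens, n_summ, n_past_summ, n_threads);
```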
-
Do we have model weight support for EnCodec? If yes, could you please tell me how to build the ggml-model.bin?
-
## High-prio
**Refactoring pass**

There is a lot of code duplication in `ggml.c` which can probably be simplified with a good set of macros. The goal is to keep the code size manageable while avoiding "macro hell" (see the sketch below).
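As a rough illustration of the direction (the macro name and ops here are made up, not actual `ggml.c` code), a single template macro can generate the repetitive per-type loops instead of hand-writing each one:

```c
// Illustrative only - not actual ggml code. One template macro generates
// the per-op vector loops that are currently duplicated by hand.
#define GGML_DEFINE_VEC_UNARY_OP(name, type, expr)             \
    static void name(const int n, type * y, const type * x) { \
        for (int i = 0; i < n; ++i) {                         \
            y[i] = (expr);                                    \
        }                                                     \
    }

// one definition per op, instead of a hand-written copy of the loop each time
GGML_DEFINE_VEC_UNARY_OP(ggml_vec_relu_f32, float, x[i] > 0.0f ? x[i] : 0.0f)
GGML_DEFINE_VEC_UNARY_OP(ggml_vec_neg_f32,  float, -(x[i]))
```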
**Optimize the AVX / AVX2 implementations of the quantization methods and add WASM SIMD**

Make sure we have optimal implementations for these instruction sets.
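For a flavor of the work involved (a minimal sketch, not ggml's actual kernels), here is the first step of block-wise quantization -- finding the maximum absolute value of a 32-float block -- written with AVX2 intrinsics:

```c
// Sketch only - not ggml's actual code. Computes max|x| over a block of
// 32 floats using AVX2, which is the first step of block quantization.
#include <immintrin.h>

static float block_absmax_avx2(const float * x) {
    const __m256 sign_mask = _mm256_set1_ps(-0.0f); // only the sign bit set
    __m256 acc = _mm256_setzero_ps();

    for (int i = 0; i < 32; i += 8) {
        __m256 v = _mm256_loadu_ps(x + i);
        // andnot clears the sign bit, giving |v|
        acc = _mm256_max_ps(acc, _mm256_andnot_ps(sign_mask, v));
    }

    // horizontal max of the 8 lanes
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 m  = _mm_max_ps(lo, hi);
    m = _mm_max_ps(m, _mm_movehl_ps(m, m));
    m = _mm_max_ss(m, _mm_shuffle_ps(m, m, 0x55));
    return _mm_cvtss_f32(m);
}
```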
**Apply the new integer quantization methods to whisper.cpp**

Will backport the latest `ggml` version to `whisper.cpp` and add support for quantized models. Will also update all WASM examples to be able to run with the quantized models.

*Update:* whisper.cpp v1.4.0 has been released. It includes integer quantization and GPU support via cuBLAS - all thanks to the great work done here.
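For reference, here is a simplified scalar sketch of the block-wise 4-bit scheme in the spirit of ggml's Q4_0 (assumptions: a plain `float` scale instead of ggml's fp16, and sequential rather than ggml's actual packing order):

```c
// Simplified sketch in the spirit of Q4_0 - not the exact ggml format.
#include <math.h>
#include <stdint.h>

#define QK 32  // elements per block

typedef struct {
    float   d;        // scale
    uint8_t qs[QK/2]; // two 4-bit values per byte
} block_q4;

static void quantize_block(const float * x, block_q4 * y) {
    // find the value with the largest magnitude, keeping its sign
    float max = 0.0f, amax = 0.0f;
    for (int i = 0; i < QK; ++i) {
        if (fabsf(x[i]) > amax) { amax = fabsf(x[i]); max = x[i]; }
    }

    const float d  = max / -8.0f;                 // maps values into [0, 15] after the +8 offset
    const float id = d != 0.0f ? 1.0f/d : 0.0f;
    y->d = d;

    for (int i = 0; i < QK/2; ++i) {
        int q0 = (int)(x[2*i + 0]*id + 8.5f);
        int q1 = (int)(x[2*i + 1]*id + 8.5f);
        if (q0 > 15) q0 = 15; if (q0 < 0) q0 = 0;
        if (q1 > 15) q1 = 15; if (q1 < 0) q1 = 0;
        y->qs[i] = (uint8_t)(q0 | (q1 << 4));
    }
}
```

Each 32-float block collapses to 20 bytes here (4-byte scale + 16 bytes of nibbles), versus 128 bytes in f32.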
**Add support for "batch inference"**

Recently, the bert.cpp project (by @skeskinen) demonstrated BERT inference using `ggml`. This model gains a lot from batch inference, which is currently not supported by `ggml`. We will extend all operators to support it. The `bert.cpp` example will serve as a playground to achieve this.

*Update:* batched forward passes have been demonstrated in the baby-llama example (thanks to @xaedes Implement backward passes for llama with small training llama from scratch example #1360). It will be great to apply the demonstrated approach to `bert.cpp` and `whisper.cpp`'s beam-search decoding in order to gain extra speed-up. A sketch of the packing idea is shown below.
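The core idea can already be sketched with the existing 2-D ops (a minimal sketch, assuming the `ggml` API as of mid-2023): pack the batch as extra columns of the input tensor, so one `ggml_mul_mat` call processes all batch elements in a single graph evaluation:

```c
// Sketch of batching via column packing, using the ggml API circa mid-2023.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int n_embd  = 256;
    const int n_batch = 8; // 8 inputs evaluated together

    struct ggml_tensor * W = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_embd);
    struct ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_batch);

    // y has shape [n_embd, n_batch]: one output column per batch element,
    // computed in a single mul_mat instead of n_batch separate passes
    struct ggml_tensor * y = ggml_mul_mat(ctx, W, x);

    struct ggml_cgraph gf = ggml_build_forward(y);
    ggml_graph_compute(ctx, &gf);

    ggml_free(ctx);
    return 0;
}
```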
**Implement inference of new models**

There are already some very interesting models that should be supported by `ggml`:

- **Bark**
  There is huge interest in adding `ggml` support for this model (see speeding up inference suno-ai/bark#30 (comment)). The main blocker seems to be the dependency on Facebook's EnCodec codec. Still not sure how difficult it would be, but this codec is probably another model that we should try to support via `ggml`.

I'll use this section to add a note regarding new model implementations by contributors - I recommend always trying to add a very basic example implementation to the ggml repo. Having a basic example there would make long-term support much easier.
**Proof-of-concept for 100% inference on the GPU**

The goal is to make a demonstration of the idea discussed in Add GPU support to ggml #914. Very preliminary work has been started in ggml : cgraph export/import/eval example + GPU support ggml#108. Will try to get a working example using the MNIST inference.

*Update:* the MNIST inference on Apple Silicon GPU using Metal is now fully demonstrated: ggml : cgraph export/import/eval example + GPU support ggml#108 -- this is the way
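Schematically, the flow demonstrated there looks like this (a fragment, assuming the `ggml_graph_export` / `ggml_graph_import` API introduced in ggml#108):

```c
// Exporter side (CPU): build the graph for the model output and save it.
struct ggml_cgraph gf = ggml_build_forward(output);
ggml_graph_export(&gf, "mnist.ggml");

// Importer side (e.g. a Metal-backed runner): load and evaluate the graph
// without needing the original model-construction code.
struct ggml_context * ctx_data = NULL;
struct ggml_context * ctx_eval = NULL;

struct ggml_cgraph gfi = ggml_graph_import("mnist.ggml", &ctx_data, &ctx_eval);
ggml_graph_compute(ctx_eval, &gfi);
```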
## Low-prio
**Project ggml : improve threading implementation**

Better utilization of the available CPU resources via improved thread management. There have been a few efforts during April, but they remained in the "background" - need to put more focus this time. A toy sketch of the general direction is shown below.
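One common direction (a toy sketch, not ggml's actual scheduler) is a persistent worker pool that spin-waits on a generation counter, avoiding the cost of spawning and joining threads for every graph evaluation:

```c
// Toy sketch of a persistent worker pool - not ggml's implementation.
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define N_WORKERS 4

static atomic_int  g_generation = 0; // bumped by the main thread to publish work
static atomic_int  g_n_done     = 0; // workers report completion here
static atomic_bool g_stop       = false;

static void do_work(int ith) {
    printf("worker %d: processing its slice\n", ith);
}

static void * worker(void * arg) {
    const int ith = (int)(intptr_t)arg;
    int seen = 0;
    while (!atomic_load(&g_stop)) {
        const int gen = atomic_load(&g_generation);
        if (gen == seen) continue; // spin-wait for a new generation of work
        seen = gen;
        do_work(ith);
        atomic_fetch_add(&g_n_done, 1);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[N_WORKERS];
    for (int i = 0; i < N_WORKERS; ++i) {
        pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i);
    }

    // threads are created once and reused across evaluations
    for (int iter = 0; iter < 3; ++iter) {
        atomic_store(&g_n_done, 0);
        atomic_fetch_add(&g_generation, 1);         // publish one round of work
        while (atomic_load(&g_n_done) < N_WORKERS); // wait for all workers
    }

    atomic_store(&g_stop, true);
    for (int i = 0; i < N_WORKERS; ++i) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}
```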
**Having second thoughts about adding llama_state**

Our experience: we added `whisper_state` in `whisper.cpp` with this PR: Added whisper state + default state on the whisper_context whisper.cpp#523. However, I haven't seen a lot of use of it, and at the same time it doubled the C API. Let me know if you think this is worth implementing.

ref: IMPORTANT: Introduce C-style API - Major Refactoring #370 (comment)
**Add 3-bit integer quantization**

It has been shown that 2-bit integer quantization does not look really useful for anything: Q2 and Q3 quantization #1004. In this case, we can probably add 3-bit integer quantization. Probably not "officially supported", but rather in a state where we can run experiments and see if we can find some application. A packing sketch is shown below.
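The awkward part of Q3 is that 3 bits do not divide a byte evenly; eight values pack exactly into three bytes. A small sketch of the packing arithmetic only (this is not an actual ggml format):

```c
// Illustration of 3-bit packing arithmetic - not an actual ggml format.
#include <stdint.h>

// pack eight 3-bit values (0..7) into exactly three bytes
static void pack8_q3(const uint8_t v[8], uint8_t out[3]) {
    uint32_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits |= (uint32_t)(v[i] & 0x7) << (3*i);
    }
    out[0] = (uint8_t)(bits);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)(bits >> 16);
}

static void unpack8_q3(const uint8_t in[3], uint8_t v[8]) {
    const uint32_t bits = in[0] | ((uint32_t)in[1] << 8) | ((uint32_t)in[2] << 16);
    for (int i = 0; i < 8; ++i) {
        v[i] = (bits >> (3*i)) & 0x7;
    }
}
```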
**Add "training" support to ggml**

There is an interesting ongoing effort to add "training" support to `ggml`: How to fine tune it? ggml#8 (comment). It would be really impressive if this actually works. There might be conflicts with the refactoring pass - need to coordinate with @xaedes.

*Update:* this has been successfully completed and there is now a simple example demonstrating baby-LLaMA training: https://github.com/ggerganov/llama.cpp/blob/master/examples/baby-llama/baby-llama.cpp#L759-L770
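For a taste of the API (a minimal sketch, assuming ggml's experimental `ggml_set_param` / `ggml_opt` interface as of mid-2023), fitting a small linear map with the built-in ADAM optimizer:

```c
// Sketch using ggml's experimental training API circa mid-2023:
// minimize sum((W*x - t)^2) over the trainable parameter W.
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * W = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 8);
    ggml_set_param(ctx, W); // mark W as trainable

    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * t = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    ggml_set_f32(x, 1.0f);
    ggml_set_f32(t, 0.5f);

    // loss = sum((W*x - t)^2) -- a scalar tensor
    struct ggml_tensor * y    = ggml_mul_mat(ctx, W, x);
    struct ggml_tensor * loss = ggml_sum(ctx, ggml_sqr(ctx, ggml_sub(ctx, y, t)));

    // run the built-in ADAM optimizer; it builds the backward graph internally
    enum ggml_opt_result res = ggml_opt(ctx, ggml_opt_default_params(GGML_OPT_ADAM), loss);

    ggml_free(ctx);
    return res == GGML_OPT_OK ? 0 : 1;
}
```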