Why do I want input state instead of the "traditional model" #16

nedtwigg · 2024-04-15T20:47:53Z

From your docs:

The structure of an effective Transformer LLM call is:

System Prompt

Preamble

Few-shot examples

Question

With statecraft and your SSM, we instead simply have:

Inputted state (with problem context, initial instructions, textbooks, and few-shot examples all baked in)

Short question

So the goal of this new framing is to take:

System Prompt
Preamble
Few-shot examples

And mush them all into a "state". So now instead of having to pass all the prompt / few-shot examples around, I can just pass the state.

Pros: cheaper/faster in terms of compute, you can hide proprietary prompts and examples, you can use statespace-specific "warmup" techniques that might not be possible with transformers, e.g. closed-loop warmup
Cons: you can't look under the hood of the prompts that created the state
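
To make the "pass the state instead of the prompt" idea concrete, here is a toy sketch of a scalar linear recurrence. It is not Statecraft's API or a real SSM: the constants `A` and `B`, the `embed` function, and the `run` helper are all hypothetical, chosen only to show that resuming from a cached state gives the same result as reprocessing the whole prefix.

```python
from typing import Iterable

# Hypothetical recurrence parameters for a toy model: h_t = A * h_{t-1} + B * x_t
A = 0.9
B = 0.1


def embed(token: str) -> float:
    """Toy deterministic 'embedding' (an assumption, not a real tokenizer)."""
    return float(sum(ord(c) for c in token) % 17)


def run(tokens: Iterable[str], state: float = 0.0) -> float:
    """Feed tokens through the recurrence, starting from the given state."""
    for tok in tokens:
        state = A * state + B * embed(tok)
    return state


prefix = "system prompt preamble few-shot examples".split()
question = "short question".split()

# Process the long prefix once and keep only the resulting state.
cached_state = run(prefix)

# Later calls resume from the cached state and only process the question.
from_cache = run(question, state=cached_state)

# The traditional flow reprocesses the prefix every time.
from_scratch = run(prefix + question)

print(from_cache == from_scratch)
```

The saving comes from the second call touching only the question tokens. The trade-off is the one listed under Cons: the cached number alone does not reveal which prefix produced it.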

Something Docker did well was the "Dockerfile" vs "Docker image" distinction. The Docker image is the binary recording of a machine state; the Dockerfile is the set of instructions that creates the image. So you can understand an image by looking at its Dockerfile, and you can hack a Dockerfile to get a new image.

If I understand correctly, Statecraft has an analog to "docker image", but it is missing the analogous "dockerfile". Do I understand the system correctly?

koayon · 2024-04-15T23:35:40Z

Hey @nedtwigg, great question!

> So the goal of this new framing is to take:
>
> System Prompt
> Preamble
> Few-shot examples
>
> And mush them all into a "state". So now instead of having to pass all the prompt / few-shot examples around, I can just pass the state.
>
> Pros: cheaper/faster in terms of compute, you can hide proprietary prompts and examples, you can use statespace-specific "warmup" techniques that might not be possible with transformers, e.g. closed-loop warmup

Yes that's exactly correct, I love the way you frame it here! 🙌

> If I understand correctly, Statecraft has an analog to "docker image", but it is missing the analogous "dockerfile". Do I understand the system correctly?

I really like the Docker analogy; perhaps I should lean into this more!

You're correct that the state acts as the Docker image in this case. As for the Dockerfile, each state comes with a metadata JSON file, which contains the name of the model that was used to create it (and hence the configuration) as well as the prompt that was used to create the state (or a url_reference to the text that was used for the prompt, if the prompt is long).
This acts as the Dockerfile analogue, which can be used to recreate the image. The metadata also contains a plain-text description for any other notes on the state's creation or intended usage.
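
As a rough illustration of what such a metadata file might hold, here is a minimal sketch. The class name `StateMetadata`, the field names `model_name`, `prompt`, and `description`, the example values, and the "exactly one of prompt or url_reference" check are assumptions made for this example; only `url_reference` is named above, and the real schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StateMetadata:
    """Hypothetical 'Statefile' record describing how a state was built."""

    model_name: str                      # model used to create the state
    prompt: Optional[str] = None         # inline prompt, for short prompts
    url_reference: Optional[str] = None  # pointer to the text, for long prompts
    description: str = ""                # free-text notes on creation or usage

    def __post_init__(self) -> None:
        # Assumed rule for this sketch: record the prompt inline or by reference.
        if (self.prompt is None) == (self.url_reference is None):
            raise ValueError("provide exactly one of prompt or url_reference")


meta = StateMetadata(
    model_name="example-ssm-model",  # placeholder, not a real model name
    prompt="You are a helpful assistant.",
    description="Toy example state.",
)

# Write the metadata as JSON, then read it back.
serialized = json.dumps(asdict(meta), indent=2)
restored = StateMetadata(**json.loads(serialized))
print(serialized)
```

With a record like this next to the state, anyone can see which model and prompt produced it, and can rebuild or modify the state by rerunning the prompt.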

Perhaps I should align the messaging with Docker and call this object the Statefile or similar. That's an interesting point 💡

Yeah, you have a great understanding. Thanks again for your question!
