Why do I want input state instead of the "traditional model" #16

Open
nedtwigg opened this issue Apr 15, 2024 · 1 comment

Comments

@nedtwigg

From your docs:

The structure of an effective Transformer LLM call is:

  • System Prompt
  • Preamble
  • Few-shot examples
  • Question

With statecraft and your SSM, we instead simply have:

  • Inputted state (with problem context, initial instructions, textbooks, and few-shot examples all baked in)
  • Short question

So the goal of this new framing is to take:

  • System Prompt
  • Preamble
  • Few-shot examples

And mush them all into a "state". So now instead of having to pass all the prompt / few-shot examples around, I can just pass the state.

Pros: cheaper/faster in terms of compute, you can hide proprietary prompts and examples, you can use statespace-specific "warmup" techniques that might not be possible with transformers, e.g. closed-loop warmup
Cons: you can't look under the hood of the prompts that created the state
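
To make the "pass the state instead of the prompt" idea concrete, here is a minimal sketch using a toy linear recurrence. It is not the Statecraft API or a real SSM: `DECAY`, `GAIN`, `step`, and `run` are hypothetical names chosen for illustration, and tokens are plain floats. It only shows the property the framing relies on, namely that a recurrent model can resume from a saved state and reach the same result as re-reading the whole prefix.

```python
from typing import List, Optional, Sequence

# Toy constants for a diagonal linear recurrence (hypothetical values,
# standing in for a real state-space model's learned parameters).
DECAY: List[float] = [0.9, 0.5]
GAIN: List[float] = [0.1, 0.5]


def step(state: Sequence[float], token: float) -> List[float]:
    """Advance the toy state by one token: s' = decay * s + gain * token."""
    return [a * s + b * token for a, s, b in zip(DECAY, state, GAIN)]


def run(tokens: Sequence[float], state: Optional[Sequence[float]] = None) -> List[float]:
    """Feed tokens through the recurrence, starting from `state` (zeros if omitted)."""
    current = list(state) if state is not None else [0.0] * len(DECAY)
    for token in tokens:
        current = step(current, token)
    return current


# Stand-ins for "system prompt + preamble + few-shot examples" and the question.
prefix = [1.0, 2.0, 3.0, 4.0]
question = [5.0, 6.0]

# Traditional model: every call re-processes the prefix and the question.
full_state = run(prefix + question)

# Input-state model: process the prefix once, save the state, then resume
# from it using only the short question.
cached_state = run(prefix)
resumed_state = run(question, state=cached_state)

# Both routes perform the same sequence of updates, so the states match.
assert resumed_state == full_state
```

In this sketch the second call touches only the two question tokens, which is where the compute saving comes from. The cached state also does not expose the prefix that produced it, which is the "can't look under the hood" downside.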

Something Docker did well was the "Dockerfile" vs "Docker image" distinction. The Docker image is the binary recording of a machine state; the Dockerfile is the set of instructions that creates the image. So you can understand an image by looking at its Dockerfile, and you can hack a Dockerfile to get a new image.

If I understand correctly, Statecraft has an analog to "docker image", but it is missing the analogous "dockerfile". Do I understand the system correctly?

@koayon
Owner

koayon commented Apr 15, 2024

Hey @nedtwigg, great question!

> So the goal of this new framing is to take:
>
>   • System Prompt
>   • Preamble
>   • Few-shot examples
>
> And mush them all into a "state". So now instead of having to pass all the prompt / few-shot examples around, I can just pass the state.
>
> Pros: cheaper/faster in terms of compute, you can hide proprietary prompts and examples, you can use statespace-specific "warmup" techniques that might not be possible with transformers, e.g. closed-loop warmup

Yes that's exactly correct, I love the way you frame it here! 🙌

> If I understand correctly, Statecraft has an analog to "docker image", but it is missing the analogous "dockerfile". Do I understand the system correctly?

I really like the Docker analogy; perhaps I should lean into this more!

You're correct that the state acts as the Docker image in this case. As for the Dockerfile, each state comes with a metadata JSON file containing the name of the model that was used to create it (and hence its configuration) and the prompt that was used to create the state (or a url_reference to the prompt text, if the prompt is long).
This metadata acts as the Dockerfile analogue: it records what is needed to recreate the state, and it also contains a plain-text description for any other notes on the state's creation or intended usage.
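
As a rough sketch of what such a metadata file might hold, the snippet below models it as a small Python dataclass that round-trips through JSON. Only `url_reference` is a name taken from the description above; the class name `StateMetadata`, the other field names, the example values, and the rule that at least one of `prompt` or `url_reference` must be present are all assumptions for illustration, not Statecraft's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StateMetadata:
    """Hypothetical 'Statefile': the recipe that accompanies a saved state."""

    model_name: str                      # model used to build the state
    description: str                     # plain-text notes on creation or intended usage
    prompt: Optional[str] = None         # prompt text, when short enough to inline
    url_reference: Optional[str] = None  # pointer to the prompt text, when it is long

    def __post_init__(self) -> None:
        # Assumed rule: without either source the state could not be recreated.
        if self.prompt is None and self.url_reference is None:
            raise ValueError("provide a prompt or a url_reference")

    def to_json(self) -> str:
        """Serialise the metadata to a JSON string."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "StateMetadata":
        """Rebuild the metadata from a JSON string produced by to_json."""
        return cls(**json.loads(text))


# Placeholder values; none of these refer to a real model or state.
meta = StateMetadata(
    model_name="example-ssm-model",
    description="Warm-up state for short grammar questions.",
    prompt="You are a careful grammar tutor. Example: ...",
)

restored = StateMetadata.from_json(meta.to_json())
assert restored == meta
```

Reading the JSON tells you which model and prompt produced a state, much as a Dockerfile explains an image, and editing the prompt before regenerating the state is the counterpart of hacking a Dockerfile to get a new image.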

Perhaps I should align the messaging with Docker and call this object the Statefile or similar. That's an interesting point 💡

Yeah, you have a great understanding. Thanks again for your question!
