Support Chat Mode #662

Closed
thomasahle opened this issue Mar 16, 2024 · 46 comments

Comments

@thomasahle
Collaborator

Hopefully the new LM backend will allow us to make better use of models that are trained for "Chat".
Below is a good example of how even good models like GPT-3.5 currently have trouble understanding the basic DSPy format:
Screenshot 2024-03-15 at 6 46 18 PM

Right now we use chat mode as if it was completion mode.
We send:

messages: [
{"from": "user", "message": "guidance, input0, output0, input1, output1, input2"}
]

And expect the agent to reply with

{"from": "agent", "message": "output2"}

A better use of the Chat APIs would be to send

messages: [
{"from": "system", "message": guidance},
{"from": "user", "message": input0},
{"from": "agent", "message": output0},
{"from": "user", "message": input1},
{"from": "agent", "message": output2},
{"from": "user", "message": input2},
]

That is, we simulate a previous chat, where the agent always replied with the output in the format we expect.
This teaches the agent not to start its message with "OK! Let me get to it!" or repeat the template as in the gpt-3.5 screenshot above.

Also, using the system message for the guidance should help avoid prompt injection attacks.
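
For illustration, a minimal sketch of how demos could be flattened into such a message list (using OpenAI-style role/content keys; the helper and the field formatting are hypothetical, not current DSPy code):

def demos_to_messages(guidance, demos, new_inputs):
    # System turn carries the signature/guidance; each demo becomes one
    # user/assistant exchange; the final user turn holds only the new inputs.
    messages = [{"role": "system", "content": guidance}]
    for demo_inputs, demo_outputs in demos:
        messages.append({"role": "user", "content": demo_inputs})
        messages.append({"role": "assistant", "content": demo_outputs})
    messages.append({"role": "user", "content": new_inputs})
    return messages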

@okhat
Collaborator

okhat commented Mar 16, 2024

Totally agreed but I'd love for this to be more data-driven.

Either: (a) meta prompt engineering for popular models + easy addition of new adapters for new LMs if needed, or (b) automatic exploration of a new LM on standard tasks to automatically establish the patterns that "work" for that LM.

Do you have a way that fixes the sql_query example you had?

@okhat
Collaborator

okhat commented Mar 16, 2024

Also, I wonder to what extent the behavior you saw is because "Follow the following format." does not explicitly say "Complete the unfilled fields in accordance with the following format." Basically, the instruction is slightly misleading for chat models.

@CyrusOfEden
Collaborator

My sense is that interleaving inputs/outputs as a default would be a footgun because I would assume all outputs depend on all inputs, and the LLM doesn’t have access to this.

Right now our focus is using LiteLLM for broad support + moving over all dsp code into DSPy.

I’d love to tackle something like this when we look at the current Template usage and how that’s currently responsible for going from Example => prompt, and offering users some more flexibility with how an LLM gets called with an example.

@thomasahle
Collaborator Author

thomasahle commented Mar 17, 2024

@CyrusOfEden I'm not sure what you mean by "interleaving inputs/outputs". This is already how DSPy works, no?

I think you misunderstood what I mean by (input_i, output_i).
I'm talking about an entire example/demo. Not two fields from the same demo.

@meditans
Contributor

meditans commented Mar 17, 2024

I feel the problem here is, most of the time, the positioning of the user end token. Say you have a prompt template that wraps the user message in [INST] and [/INST], as in Mixtral. You would have:

[INST]
...

input1: ...
input2: ...
output:[/INST]

???

It seems to me that the model is led to believe that the user turn is done (and the user has written a complete template, albeit with an empty output). It would be more correct to say:

[INST]
...

input1: ...
input2: ...[/INST]

output: ???

explicitly leaving the output field to the assistant (it suffices to create a {"role": "assistant", "content":"output:"} message).
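
Concretely, a sketch of the message list for the second layout (field names are illustrative; whether the final assistant turn is continued rather than closed depends on the client, e.g. Anthropic-style prefill or a local chat template that supports continuing the last message):

messages = [
    {"role": "user", "content": "...\n\ninput1: ...\ninput2: ..."},
    # Partial assistant turn: [/INST] now closes the user turn above, and the
    # model continues generating right after "output:".
    {"role": "assistant", "content": "output:"},
]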

Also, I wonder to what extent the behavior you saw is because "Follow the following format." does not explicitly say "Complete the unfilled fields in accordance with the following format." Basically, the instruction is slightly misleading for chat models.

@okhat I have done this experiment by writing a mini-version of Predict myself, with the prompt you are suggesting. I still have the same problem @thomasahle demonstrated in his initial post. This reinforces my belief that the token position is to blame. The version I proposed works instead.

Totally agreed but I'd love for this to be more data-driven.

I don't know precisely what you have in mind, but it seems to me that fixing the semantics of the multi-turn user-assistant conversation is orthogonal to the concern of wording the prompt differently.

@thomasahle
Collaborator Author

thomasahle commented Mar 17, 2024

@meditans I suppose Mixtral is not a "chat model" but an "instruction model".

What Omar says about having the framework automatically find the best prompting would of course be great.
But if Mixtral can be shown to work better in 90% of cases with @meditans' token placement, then I'd be more than happy to just have that built into the Mixtral LM class.

We may also note that others have thought about how best to do few-shot prompting with chat models. Such as

@meditans
Contributor

meditans commented Mar 17, 2024

@thomasahle you are right, I am using a (local, quantized) chat finetune of Mixtral, not baseline Mixtral.

In fact, the langchain page you proposed is quite close to what I'm saying here (essentially the same thing).

@CyrusOfEden
Collaborator

@CyrusOfEden I'm not sure what you mean by "interleaving inputs/outputs". This is already how DSPy works, no?

I think you misunderstood what I mean by (input_i, output_i).

I'm talking about an entire example/demo. Not two fields from the same demo.

I see now and this makes sense to me! I thought it was inputs/outputs not examples/demos :)

@mitchellgordon95
Contributor

+1 to the problem @thomasahle is describing. I am also seeing it on gemini-1.0-pro.

And +1 to @meditans: the root of the problem is that special tokens for conversational formatting are being added to the prompt without anyone really thinking about it.

I like @thomasahle's proposed solution of just formatting the few-shots in chat mode. The only downside I see is that it will no longer be possible to force the model to follow a specific prefix for the rationale. But this can probably be solved with some prompt engineering. Something like:

Follow the following format.

Question: ${question}
Rationale: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}

Repeat the user's message verbatim, and then finish the example.

@mitchellgordon95
Contributor

mitchellgordon95 commented Mar 19, 2024

Regardless of whether we do meta prompting or not, we will need to update the LM interface and template class to support chat formatting as a special case, since most LLM providers do not expose which special tokens they use to do chat formatting and only allow it through the API. This could probably be done during the LiteLLM integration.

And since we're going to do that, I think it would be good to just put a default chat-style format that works ok for most models, while structuring the code in such a way that meta optimization can be added easily later. My intuition is that default prompts just need to be "good enough" to bootstrap a few good traces, and as long as that works people won't really care about how good the default prompt format is or care to optimize it for their particular model.

@meditans
Contributor

I think it would be good to just put a default chat-style format that works ok for most models

When you say "default chat-style format", what do you have in mind? I'm not sure whether you're referring to the wording or to the structure of the payload that most API providers and local servers use.

@meditans
Contributor

Also, regardless, could we leave an escape hatch for the user to provide a function that builds the arguments to send to the LLM? Then one could just use the default one or provide tweaks.

@okhat
Collaborator

okhat commented Mar 19, 2024

Adding few-shot examples in chat turns will probably not fix the fact that most programs will need to bootstrap starting from zero-shot prompts. But major +1 to any exploration of how to get most chat models to reliably understand that we want them to complete the missing fields.

@okhat
Collaborator

okhat commented Mar 19, 2024

Btw I suspect this is easy. It's not happening right now just because no one ever tried :D. We've been using the same template since 2022 before RLHF and chat models (i.e., since text-davinci-002). The DSPy optimizers help make this less urgent than it would be otherwise because most models learn to do things properly with compiling, but ideally zero-shot (unoptimized) usage works reliably too. That will lead to better optimization.

@okhat
Collaborator

okhat commented Mar 19, 2024

@isaacbmiller This is a great self-contained exploration. Can we do this for 3-4 diverse chat models?

@KCaverly
Collaborator

Just catching up on this. It may be helpful for folks to take a look at the new Template class; it should contain all the TemplateV2/TemplateV3 functionality.

Additionally, all functionality for generating a prompt and passing it to the LM is contained within the new Backends themselves. We've already got a TemplateBackend which should match current functionality, along with a JSON backend which returns the content as JSON directly.

We could always create a separate version of the Template which returns the Signature + Examples as a series of ChatML messages, which we then pass, instead of a prompt, directly to the LiteLLM model.

Currently to call the LMs we do this:

# Generate Example
example = Example(demos=demos, **kwargs)

# Initialize and call template
# prompt is generated as a string
template = Template(signature)
prompt = template(example)

# Pass through language model provided
result = self.lm(prompt=prompt, **config)

It would be pretty straightforward to do something like this instead.
For the BaseLM abstraction we could make both messages and prompt optional, and ensure that one or the other, but not both, is provided.

# Generate Example
example = Example(demos=demos, **kwargs)

# Initialize and call template
# messages is generated as a [{"role": "...", "content": "..."}]
template = ChatTemplate(signature)
messages = template(example)

# Pass through language model provided
result = self.lm(messages=messages, **config)

@CyrusOfEden and I have chatted about this in the past; not sure how we should separate out Templates vs Backends. Each Backend will need a Template of some kind to format prompts, but each Backend can leverage a variety of Templates, so it's not quite one-to-one.

We should be fairly close to landing the new Backend framework in main, and then I think this is a great next step.

@thomasahle
Collaborator Author

thomasahle commented Mar 19, 2024

I think it's an interesting idea to support multiple different Templates.
I assume the code you wrote would all be inside Predict, so the user never actually has to call self.lm(...).
Maybe a Predict can have a template, similar to how it has a signature.
Then we can even have a TemplateOptimizer that optimizes the template the same way SignatureOptimizer optimizes the signature.

Then your code would look like this:

template = self.template_type(self.signature)
messages = template(example)

@CyrusOfEden and @KCaverly would this fit into the refactor?

@thomasahle
Collaborator Author

thomasahle commented Mar 20, 2024

Some more examples of LMs being unable to understand the basic format:
claude-3-opus-20240229:
Screenshot 2024-03-20 at 4 27 03 PM
gpt-4:
Screenshot 2024-03-20 at 4 27 12 PM
gpt-3.5-turbo:
Screenshot 2024-03-20 at 4 27 08 PM
gpt-3.5-turbo-instruct:
Screenshot 2024-03-20 at 4 30 10 PM

@thomasnormal

Relevant discussion: #420

@conradlee

I've been running into this problem as well using typed predictors.

@okhat I think your suggestion of substituting "Follow the following format." with "Complete the unfilled fields in accordance with the following format." if a chat model is used would go a long way in the short run, but in the long run an approach that adapts the prompting technique to each model would be ideal.

I'll note that the problem is especially bad with the TypedChainOfThought predictor, because of the way this one mixes structured output with unstructured 'Think it through step by step'. This leads the model to produce bits of unstructured text where DSPy expects a structured output.

@KCaverly
Collaborator

FWIW - the new backend system would allow you to provide your own templates and supports chat mode. If you have a fuller example you can share, I would be keen to test it out and see if there are any improvements.

@isaacbmiller
Collaborator

@KCaverly Is there a better way to pass a template than through a config option?

@KCaverly
Collaborator

I've been working on it here: #717.

So far, I'm passing it during generation. The backend has a default template argument that can be overridden in modules when the backend is called. This would allow us to either pass templates dynamically as the Module evolves, or set one at the Module level and pass it through, etc.

@isaacbmiller
Collaborator

That looks great. I will take an in-depth look later today.

Should I switch to building off of that branch?

@thomasahle
Collaborator Author

TIL you can "prefill" the responses from agents in both Claude and GPT: https://docs.anthropic.com/claude/docs/prefill-claudes-response

import anthropic

client = anthropic.Anthropic(
    # defaults to os.environ.get("ANTHROPIC_API_KEY")
    api_key="my_api_key",
)
message = client.messages.create(
    model="claude-2.1",
    max_tokens=1000,
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": "Please extract the name, size, price, and color from this product description and output it within a JSON object.\n\n<description>The SmartHome Mini is a compact smart home assistant available in black or white for only $49.99. At just 5 inches wide, it lets you control lights, thermostats, and other connected devices via voice or app—no matter where you place it in your home. This affordable little hub brings convenient hands-free control to your smart devices.\n</description>"
        },
        {
            "role": "assistant",
            "content": "{"
        }
    ]
)
print(message.content)

This means we can still use "prefixes" when using the chat api.

Also, moving the first output-variable to the "agent side" is probably better than what we do now, putting it at the end of the user side. Similar to @meditans' comment about Mixtral.

Does this fit into your new template system @KCaverly?

@KCaverly
Collaborator

If you take a look at the JSONBackend, we do something very similar. For JSON-mode models, we prompt the model to complete an incomplete JSON object, as opposed to rewriting it from scratch. Additionally, all of the demo objects are shown in completed JSON format, which hopefully helps enforce the appropriate schema as well.
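
Roughly the shape of prompt that approach yields (an illustrative sketch, not the exact JSONBackend output):

# Demos appear as completed JSON objects; the final object is left unfinished,
# so the model's natural continuation fills in the remaining fields.
prompt = (
    '{"question": "What is 2 + 2?", "answer": "4"}\n'
    '{"question": "What is 3 + 5?", "answer": "8"}\n'
    '{"question": "What is 7 + 6?", "answer": '
)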

@Josephrp

I think it's an interesting idea to support multiple different Template's.

I also think so, because I use a lot of models and every single one has a different template; the problem is compounded by DSPy. But the good news is that we can probably organise custom templates in a special .contrib folder, so that templates that naturally have to be written for any new model (or task!) can also be pushed upstream.

@thomasahle
Collaborator Author

If you take a look at the JSONBackend, we do something very similar. For json mode models, we prompt the model to complete the json as an incomplete object, as opposed to rewriting it from scratch. Additionally all of the demo objects are also shown in the completed JSON format which hopefully helps enforce the appropriate schema as well.

Where should I look? In https://github.com/KCaverly/dspy/blob/f9c1adf837f1384fca60ed71dd2f32db47969746/dspy/modeling/templates/json.py#L29 it seems like everything gets stuffed into the user message.

But I believe this prefill trick should also be used by the text template/backend, if the backend API is chat.

@KCaverly
Collaborator

Everything is currently stuffed into one message, but instead of providing the question and asking for a JSON response, we send an incomplete JSON object and ask the model to complete it. Not quite prefilling, but kinda similar.

@thomasahle
Collaborator Author

Sending an incomplete object can work well if the model understands it's supposed to complete it. Prefilling makes this easier for chat api models.

I'm not saying you always have to use prefilling, just asking if I'll be able to make a template that works this way?

I should probably pull your code and try it out 😁

@KCaverly
Collaborator

I think it should work, would be a great test.

@Serjobas

Serjobas commented Mar 29, 2024

#701

This PR should've added this functionality.

@thomasahle
Collaborator Author

@Serjobas What do you mean?

@okhat
Collaborator

okhat commented Mar 30, 2024

Btw @isaacbmiller @thomasahle @KCaverly I confirmed that "Follow the format below. Start your completions where the supplied fields end." communicates what we intended more clearly here with chat models, though obviously pre-filling appears to be a better approach in conjunction with this.

What we need IMO is a way to have a breaking version of DSPy (where all old caches and everything else will not work anymore) and a sustained version, until we make a major release.

One way to do that is to have a flag, e.g. dspy.settings.configure(mode=2024) or something like that.

@KCaverly
Collaborator

@okhat that sounds good to me. Currently, the backend-refactor branch is built to be completely backwards compatible, with the only breaking changes surrounding versioning on openai and others for litellm, and litellm is set up as an optional extra. If a backend is configured, we would prioritize and use the new backend structure; otherwise it would operate the same as the old method.

Maybe we would want some sort of deprecation message in the interim, pointing to new documentation on the backend?

@thomasahle
Collaborator Author

Btw @isaacbmiller @thomasahle @KCaverly I confirmed that "Follow the format below. Start your completions where the supplied fields end." communicates what we intended more clearly here with chat models, though obviously pre-filling appears to be a better approach in conjunction with this.

What we need IMO is a way to have a breaking version of DSPy (where all old caches and everything else will not work anymore) and a sustained version, until we make a major release.

What do you think about #717 's approach of doing

dspy.settings.configure(backend=TemplateBackend(ChatTemplate()))

That wouldn't break any existing code / notebooks.

I think the modified default guidelines you mention could be helpful, but as you say, they break existing caches.
I also agree that we might need to have a "breaking changes" release at one point. Maybe mode=2024 is a good approach for this, or maybe we can use feature flags, like configure(enable_backends=True).

But the changes needed for this particular issue don't seem to require breaking anything, if we keep the default backend as an exact copy of the current behavior.

@flexorRegev

flexorRegev commented Apr 9, 2024

Sorry for jumping in pretty late to this discussion, but I think I have some ideas, I really want to use them, and I'd like to understand how I can contribute.
The reality I see with chat models is that what usually works best at making them follow instructions in a few-shot, task-specific setting is a format like this.
In a zero-shot manner:
System: """|Task description|
|Output formatting requirements|"""
User: """|User input|"""

In a few-shot manner:
System: """|Task description|
|Output formatting requirements|"""
User: """|example User input1|"""
Assistant: """|example output1|"""
User: """|example User input2|"""
Assistant: """|example output2|"""
...
User: """|User input|"""

I think this makes the most sense given how prompts are formatted today: keep the signature + output formatting in the system prompt of the model and the examples as the chat conversation.
What would be the correct way to support that?
I'd be glad to contribute an example/PR if any revision is needed, because from what I'm seeing this boosts performance and instruction following by quite a bit on the chat models I'm working with.
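
For reference, a minimal sketch of sending a payload structured like the few-shot layout above straight through LiteLLM (illustrative only, not the DSPy backend API):

import litellm

messages = [
    {"role": "system", "content": "Task description\nOutput formatting requirements"},
    {"role": "user", "content": "example user input 1"},
    {"role": "assistant", "content": "example output 1"},
    {"role": "user", "content": "actual user input"},
]
# LiteLLM mirrors the OpenAI response shape.
response = litellm.completion(model="gpt-3.5-turbo", messages=messages, temperature=0)
print(response.choices[0].message.content)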

@okhat @thomasahle your thoughts?

@CyrusOfEden
Collaborator

@flexorRegev exactly! Support for this is landing in backend-refactor

@ryanh-ai

Is there a workaround identified for this until the new backend is completed? I am running into this repeatedly with GPT4

@ryanh-ai

ryanh-ai commented Apr 28, 2024

Follow up here, with dspy.OpenAI models, I was able to make the below work:

gpt4_turbo = dspy.OpenAI(model='gpt-4-turbo-2024-04-09',
                         system_prompt="Follow the format below. Start your completions where the supplied fields end.",
                         model_type='chat')

Not all models have the system_prompt parameter, so I may try to weave this in with an assertion instead of cutting into the base code, given y'all are working on backend-refactor.

@wullli

wullli commented May 2, 2024

@meditans @thomasahle I'm not sure if this is related to the mentioned repetition of the prompt, since I don't know which model client you used. In my case, I think there is a mistake with the output handling in the HFModel class. For example, when I load a model that has the MistralForCausalLM architecture, the attribute drop_prompt_from_output will be set to False. However, looking at the docs, we can see that the model actually returns a transformers.modeling_outputs.CausalLMOutputWithPast. So of course the prompt will be included.

I don't understand how the current try-except block can be used to decide which kind of output (with or without past) it is. The type of the output should probably be checked.

@derenrich

Follow up here, with dspy.OpenAI models, I was able to make the below work:

gpt4_turbo = dspy.OpenAI(model='gpt-4-turbo-2024-04-09',
                         system_prompt="Follow the format below. Start your completions where the supplied fields end.",
                         model_type='chat')

Not all models have the system_prompt parameter, so I may try to weave this in with an assertion instead of cutting into the base code, given y'all are working on backend-refactor.

This workaround doesn't work for me (when using gpt-4o). It still outputs "Reasoning:" at the start of its reasoning when doing CoT.

Support for chat models is pretty critical given the limitations on instruct models (e.g. there is no GPT-4 instruct model).

@firoz47

firoz47 commented May 22, 2024

Hey, did you find any workaround? I am having the same issue.

@conradlee

My workaround is to just use a TypedPredictor rather than the TypedChainOfThought predictor, and then include fields in my schemas called 'reasoning' whose descriptions ask for a CoT-style justification of the output. Put these fields earlier in your data structure -- it only helps if the reasoning precedes the fields that contain the important output.

This way the prompt and expected output is no longer a confusing mix of unstructured and structured text -- instead, it's all structured.

As I primarily use OpenAI's models, I really wish that function calling were supported, similar to how the Instructor package does it. The typed predictors seem so similar in spirit to the Instructor approach, it's just that Instructor handles the communication of the expected output format much better. It would be interesting to see someone build an optimization layer on top of instructor.
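
A minimal sketch of that workaround, assuming the DSPy 2.4 typed-predictor API; the signature and field names are illustrative:

from pydantic import BaseModel, Field
import dspy

class Verdict(BaseModel):
    # Reasoning comes first so the CoT-style justification precedes the answer.
    reasoning: str = Field(description="Step-by-step justification for the answer")
    answer: str = Field(description="The final answer")

class AnswerQuestion(dspy.Signature):
    """Answer the question."""
    question: str = dspy.InputField()
    verdict: Verdict = dspy.OutputField()

predictor = dspy.TypedPredictor(AnswerQuestion)
# result = predictor(question="What is the capital of France?").verdict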

@mikeedjones
Contributor

Interesting paper on how function calling affects downstream performance in some models released this week: https://arxiv.org/pdf/2408.02442

@okhat
Collaborator

okhat commented Sep 24, 2024

(blob below copy-pasted here since I'm closing related issues)

Thanks for opening this! We released DSPy 2.5 yesterday. I think the new dspy.LM and the underlying dspy.ChatAdapter will probably resolve this problem.

Here's the (very short) migration guide, it should typically take you 2-3 minutes to change the LM definition and you should be good to go: https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb

Please let us know if this resolves your issue. I will close for now but please feel free to re-open if the problem persists.

@okhat okhat closed this as completed Sep 24, 2024