Support Chat Mode #662
Totally agreed, but I'd love for this to be more data-driven. Either: (a) meta prompt engineering for popular models + easy addition of new adapters for new LMs if needed, or (b) automatic exploration of a new LM on standard tasks to automatically establish the patterns that "work" for that LM. Do you have a way that fixes the sql_query example you had?
Also, I wonder to what extent the behavior you saw is because "Follow the following format." does not explicitly say "Complete the unfilled fields in accordance with the following format." Basically, the instruction is slightly misleading for chat models.
My sense is that interleaving inputs/outputs as a default would be a footgun, because I would assume all outputs depend on all inputs, and the LLM doesn't have access to this. Right now our focus is using LiteLLM for broad support + moving all dsp code into DSPy. I'd love to tackle something like this when we look at the current Template usage and how it's currently responsible for going from Example => prompt, and offer users some more flexibility in how an LLM gets called with an example.
@CyrusOfEden I'm not sure what you mean by "interleaving inputs/outputs". This is already how DSPy works, no? I think you misunderstood what I mean by (input_i, output_i).
I feel the problem here is, most of the time, the positioning of the user-end token. Say you have a prompt template that wraps the user message in
It seems to me that the model is led to believe that the user turn is done (and that the user has written a complete template, albeit with an empty output). It would be more correct to say:
explicitly leaving the output field to the assistant (it suffices to create a
@okhat I have done this experiment by writing a mini-version of
I don't know precisely what you have in mind, but it seems to me that fixing the semantics of the multi-turn user-assistant conversation is orthogonal to the concern of wording the prompt differently.
@meditans I suppose mixtral is not a "chat model" but an "instruction model". What Omar says about having the framework automatically find the best prompting would of course be great. We may also note that others have thought about how best to do few-shot prompting with chat models. Such as
@thomasahle you are right, I am using a (local, quantized) chat finetune of mixtral, not baseline mixtral. In fact, the langchain page you proposed is quite close to what I'm saying here (essentially the same thing). |
I see now and this makes sense to me! I thought it was inputs/outputs not examples/demos :) |
+1 to the problem @thomasahle is describing. I am also seeing it on gemini-1.0-pro. And +1 to @meditans: the root of the problem is that special tokens for conversational formatting are being added to the prompt without anyone really thinking about it. I like @thomasahle's proposed solution of just formatting the few-shots in chat mode. The only downside I see is that it will no longer be possible to force the model to follow a specific prefix for the rationale. But this can probably be solved with some prompt engineering. Something like
Regardless of whether we do meta prompting or not, we will need to update the LM interface and template class to support chat formatting as a special case, since most LLM providers do not expose which special tokens they use for chat formatting and only allow it through the API. This could probably be done during the LiteLLM integration. And since we're going to do that, I think it would be good to ship a default chat-style format that works OK for most models, while structuring the code so that meta optimization can be added easily later. My intuition is that default prompts just need to be "good enough" to bootstrap a few good traces; as long as that works, people won't really care how good the default prompt format is, or care to optimize it for their particular model.
When you say "default chat-style format", what do you have in mind? I struggle to understand whether you're referring to the wording or to the structure of the payload most API providers and local servers use.
Also, regardless, could we leave an escape hatch for the user to provide a function that builds the arguments to send to the LLM? Then one could just use the default one or provide tweaks.
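Such an escape hatch could be as simple as a user-supplied builder callable. A minimal sketch of the idea; `default_builder`, `chat_builder`, and `call_lm` are made-up names for illustration, not existing DSPy API:

```python
from typing import Callable

# Default builder: classic single-string completion prompt.
def default_builder(prompt: str) -> dict:
    return {"prompt": prompt}

# User override: wrap the same prompt as a chat payload instead.
def chat_builder(prompt: str) -> dict:
    return {"messages": [{"role": "user", "content": prompt}]}

def call_lm(lm: Callable[..., str], prompt: str,
            builder: Callable[[str], dict] = default_builder) -> str:
    # The builder alone decides the shape of the arguments the LM receives.
    return lm(**builder(prompt))

# A stub LM that just reports which keyword argument it was called with:
stub = lambda **kw: sorted(kw)[0]
```

The point of the design is that tweaking how an LLM is called becomes a one-argument change at the call site, with the default preserved everywhere else.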
Adding few-shot examples in chat turns will probably not fix the fact that most programs will need to bootstrap starting from zero-shot prompts. But major +1 to any exploration of how to get most chat models to reliably understand that we want them to complete the missing fields.
Btw I suspect this is easy. It's not happening right now just because no one ever tried :D. We've been using the same template since 2022 before RLHF and chat models (i.e., since text-davinci-002). The DSPy optimizers help make this less urgent than it would be otherwise because most models learn to do things properly with compiling, but ideally zero-shot (unoptimized) usage works reliably too. That will lead to better optimization. |
@isaacbmiller This is a great self-contained exploration. Can we do this for 3-4 diverse chat models?
Just catching up on this. It may be helpful for folks to take a look at the new Template class; it should contain all the TemplateV2/TemplateV3 functionality. Additionally, all functionality for generating a prompt and passing it to the LM is contained within the new Backends themselves. We've already got a
We could always create a separate version of the

Currently, to call the LMs we do this:

```python
# Generate Example
example = Example(demos=demos, **kwargs)

# Initialize and call template
# prompt is generated as a string
template = Template(signature)
prompt = template(example)

# Pass through language model provided
result = self.lm(prompt=prompt, **config)
```

It would be pretty straightforward to do something like this instead:

```python
# Generate Example
example = Example(demos=demos, **kwargs)

# Initialize and call template
# messages is generated as a [{"role": "...", "content": "..."}] list
template = ChatTemplate(signature)
messages = template(example)

# Pass through language model provided
result = self.lm(messages=messages, **config)
```

@CyrusOfEden and I have chatted about this in the past; not sure how we should separate out Templates vs Backends. Each Backend will need a
We should be fairly close on landing the new Backend framework in Main, and then I think this is a great next step.
I think it's an interesting idea to support multiple different
Then your code would look like this:

```python
template = self.template_type(self.signature)
messages = template(example)
```

@CyrusOfEden and @KCaverly would this fit into the refactor?
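A rough sketch of how such a swappable `template_type` could behave; the `TextTemplate`, `ChatTemplate`, and `Predict` classes below are hypothetical stand-ins, not the actual refactor code:

```python
class TextTemplate:
    """Renders an example into a single completion-style prompt string."""
    def __init__(self, signature): self.signature = signature
    def __call__(self, example):
        return f"{self.signature}\n{example}"

class ChatTemplate:
    """Renders an example into a chat messages payload."""
    def __init__(self, signature): self.signature = signature
    def __call__(self, example):
        return [{"role": "system", "content": self.signature},
                {"role": "user", "content": example}]

class Predict:
    template_type = TextTemplate  # swappable per class or per instance
    def __init__(self, signature): self.signature = signature
    def render(self, example):
        template = self.template_type(self.signature)
        return template(example)

p = Predict("question -> answer")
chatty = Predict("question -> answer")
chatty.template_type = ChatTemplate  # opt in to chat formatting
```

Because the template is looked up at call time, existing modules keep the string-prompt behavior unless a user explicitly swaps in a chat-style template.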
Relevant discussion: #420
I've been running into this problem as well using typed predictors. @okhat I think your suggestion of substituting
I'll note that the problem is especially bad with the
FWIW, the new backend system would allow you to provide your own templates, and it supports chat mode. If you have a fuller example you can share, I would be keen to test it out and see if there are any improvements.
@KCaverly Is there a better way to pass a template than through a config option? |
I've been working on it here: #717. So far, I'm passing it during generation. The backend has a default template argument that can be overridden in modules when the backend is called, which would allow us to either pass templates dynamically as the Module evolves, or set one at the Module level and pass it through, etc.
That looks great. I will take an in-depth look later today. Should I switch to building off of that branch? |
TIL you can "prefill" the responses from agents in both Claude and GPT: https://docs.anthropic.com/claude/docs/prefill-claudes-response

```python
import anthropic

client = anthropic.Anthropic(
    # defaults to os.environ.get("ANTHROPIC_API_KEY")
    api_key="my_api_key",
)
message = client.messages.create(
    model="claude-2.1",
    max_tokens=1000,
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": "Please extract the name, size, price, and color from this product description and output it within a JSON object.\n\n<description>The SmartHome Mini is a compact smart home assistant available in black or white for only $49.99. At just 5 inches wide, it lets you control lights, thermostats, and other connected devices via voice or app—no matter where you place it in your home. This affordable little hub brings convenient hands-free control to your smart devices.\n</description>"
        },
        {
            "role": "assistant",
            "content": "{"
        }
    ]
)
print(message.content)
```

This means we can still use "prefixes" when using the chat API. Also, moving the first output variable to the "agent side" is probably better than what we do now (putting it at the end of the user side), similar to @meditans' comment about Mixtral. Does this fit into your new template system @KCaverly?
If you take a look at the JSONBackend, we do something very similar. For JSON-mode models, we prompt the model to complete an incomplete JSON object, as opposed to rewriting it from scratch. Additionally, all of the demo objects are shown in the completed JSON format, which hopefully helps enforce the appropriate schema as well.
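A minimal sketch of that incomplete-object idea; the helper name and field names are invented, not the actual JSONBackend code. The trick is to serialize the input fields, then cut the object open so the model's continuation supplies the outputs:

```python
import json

def incomplete_json_prompt(inputs: dict, output_fields: list) -> str:
    """Serialize the inputs, then dangle the first output key so the
    model completes the object instead of rewriting it from scratch."""
    filled = json.dumps(inputs)  # e.g. '{"question": "2+2?"}'
    # Drop the closing brace and reopen the object at the first output field:
    return filled[:-1] + f', "{output_fields[0]}": '

prompt = incomplete_json_prompt({"question": "2+2?"}, ["answer"])
# The model is asked to continue: {"question": "2+2?", "answer":
```

Combined with the chat API's assistant prefill, this continuation can even be placed on the assistant side of the final turn.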
I also think so, because I use a lot of models and every single one has a different template; the problem is compounded by DSPy. But the good news is that we can probably organise custom templates in a special .contrib folder, so that templates that naturally have to be written for any new model (or task!) can also be pushed upstream.
Where should I look? In https://github.com/KCaverly/dspy/blob/f9c1adf837f1384fca60ed71dd2f32db47969746/dspy/modeling/templates/json.py#L29 it seems like everything gets stuffed into the user message. But this prefill trick should actually be used by the text template/backend too, I believe, if the backend API is chat.
Everything is currently stuffed into one message, but instead of providing the question and asking for a JSON response, we send an incomplete JSON object and ask the model to complete it. Not quite prefilling, but kind of similar.
Sending an incomplete object can work well if the model understands it's supposed to complete it; prefilling makes this easier for chat API models. I'm not saying you always have to use prefilling, just asking whether I'll be able to make a template that works this way? I should probably pull your code and try it out 😁
I think it should work; it would be a great test.
This PR should've added this functionality. |
@Serjobas What do you mean?
Btw @isaacbmiller @thomasahle @KCaverly I confirmed that "Follow the format below. Start your completions where the supplied fields end." communicates what we intended more clearly to chat models, though obviously pre-filling appears to be a better approach in conjunction with this. What we need, IMO, is a way to have a breaking version of DSPy (where all old caches and everything else will no longer work) and a sustained version, until we make a major release. One way to do that is to have a flag, e.g.
@okhat that sounds good to me. Currently, the backend-refactor branch is built to be completely backwards compatible, with the only breaking changes surrounding versioning on openai and others for litellm, and litellm is set up as an optional extra. If a backend is configured, we prioritize and use the new backend structure; otherwise it operates the same as the old method. Maybe we would want some sort of deprecation message in the interim, pointing to new documentation on the backend?
What do you think about #717's approach of doing

```python
dspy.settings.configure(backend=TemplateBackend(ChatTemplate()))
```

That wouldn't break any existing code or notebooks. I think the modified default guidelines you mention could be helpful, but, as you say, they break existing caches. The changes needed for this particular issue don't seem to require breaking anything, provided we keep the default backend an exact copy of the current behavior.
Sorry for jumping in pretty late to this discussion, but I think I have some ideas, and I really want to use them and understand how I can contribute.
in a few-shot manner: I think this makes the most sense if I look at the formatting of prompts today, in the sense of keeping the signature + output formatting in the system prompt of the model and the examples as a chat conversation.
@okhat @thomasahle your thoughts?
@flexorRegev exactly! Support for this is landing in backend-refactor |
Is there a workaround identified for this until the new backend is completed? I am running into this repeatedly with GPT-4.
Follow up here, with
Not all models have the system_prompt parameter, so I may try to weave this in with an assertion instead of cutting into the base code, given y'all are working on
@meditans @thomasahle I'm not sure if this is related to the mentioned repetition of the prompt, since I don't know which model client you used. In my case, I think there is a mistake with the output for the
I don't understand how the current try-except block can be used to decide which kind of output (with or without past) it is; the type of the output should probably be checked.
This workaround doesn't work for me (when using gpt-4o); it still outputs "Reasoning:" at the start of its reasoning when doing CoT. Support for chat models is pretty critical given the limitations on instruct models (e.g., there is no GPT-4 instruct model).
Hey, did you find any workaround? I am having the same issue.
My workaround is to just use a TypedPredictor rather than the TypedChainOfThoughtPredictor, and then include fields in my schemas called 'reasoning' whose descriptions ask for a CoT-style justification of the output. Put these fields earlier in your data structure: it will only help if the reasoning precedes the fields which contain the important output. This way the prompt and expected output are no longer a confusing mix of unstructured and structured text; instead, it's all structured. As I primarily use OpenAI's models, I really wish that function calling were supported, similar to how the Instructor package does it. The typed predictors seem so similar in spirit to the Instructor approach; it's just that Instructor handles the communication of the expected output format much better. It would be interesting to see someone build an optimization layer on top of Instructor.
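A minimal sketch of that reasoning-first schema trick. The class and field names here are invented; in DSPy you would declare this as a pydantic model for a TypedPredictor, but a plain dataclass shows the ordering idea:

```python
from dataclasses import dataclass, fields

@dataclass
class Verdict:
    # Reasoning comes FIRST, so the model writes its CoT-style
    # justification before committing to the answer fields below.
    reasoning: str    # "think step by step about the evidence"
    label: str        # the important output, conditioned on the reasoning
    confidence: float

# The declared field order is what the model sees in the output format:
order = [f.name for f in fields(Verdict)]
```

Because generation is left-to-right, placing `reasoning` before `label` is what lets the justification actually influence the answer; the reverse order would produce a post-hoc rationalization.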
Interesting paper, released this week, on how function calling affects downstream performance in some models: https://arxiv.org/pdf/2408.02442
(Blob below copy-pasted here since I'm closing related issues.) Thanks for opening this! We released DSPy 2.5 yesterday. I think the new dspy.LM and the underlying dspy.ChatAdapter will probably resolve this problem. Here's the (very short) migration guide; it should typically take you 2-3 minutes to change the LM definition, and you should be good to go: https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb Please let us know if this resolves your issue. I will close for now, but please feel free to re-open if the problem persists.
Hopefully the new LM backend will allow us to make better use of models that are trained for "chat".
Below is a good example of how even good models like GPT-3.5 currently have trouble understanding the basic DSPy format:
Right now we use chat mode as if it were completion mode.
We send:
And expect the agent to reply with
A better use of the Chat APIs would be to send
That is, we simulate a previous chat, where the agent always replied with the output in the format we expect.
This teaches the agent not to start its message with "OK! Let me get to it!" or repeat the template as in the GPT-3.5 screenshot above.
Also, using the system message for the guidance should help avoid prompt injection attacks.
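The simulated-chat idea can be sketched as follows. This is a hypothetical payload: the guidance wording and demo fields are assumptions for illustration, not the actual DSPy template:

```python
# Guidance lives in the system message; each demo becomes a fabricated
# user/assistant exchange, so by the time the real question arrives the
# assistant "has already been" answering in the expected field format.
guidance = (
    "Given the field `question`, produce the field `answer`.\n"
    "Complete the unfilled fields in accordance with the format:\n"
    "Answer: <short answer>"
)

chat = [{"role": "system", "content": guidance}]
for q, a in [("What color is the sky?", "Answer: blue"),
             ("What is 2 + 2?", "Answer: 4")]:
    chat += [{"role": "user", "content": f"Question: {q}"},
             {"role": "assistant", "content": a}]
# The real question goes last, as a fresh user turn:
chat.append({"role": "user", "content": "Question: What is the capital of France?"})
```

Because every prior assistant turn is a bare `Answer: ...` line, the model has no precedent for prefacing its reply with chatter or re-echoing the template.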