Lack of Guidance on Optimizing/Finetuning ReAct Agent with Few-shot Examples #703
Agents have not been the priority. But they're no different from other programs:

```python
import dspy

# Define some models.
gpt3 = dspy.OpenAI('gpt-3.5-turbo-0125', max_tokens=1000)
colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.configure(lm=gpt3, rm=colbert)

# Declare the agent.
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])

# Try it in zero-shot mode.
agent(question="what is 1+1?")

# See what happened in the final N prompts.
gpt3.inspect_history(n=1)

# Get some data to optimize with.
from dspy.datasets import HotPotQA

dataset = HotPotQA(train_seed=1, train_size=200, eval_seed=2023, dev_size=500, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

# Let's optimize.
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, max_bootstrapped_demos=2, max_labeled_demos=0, num_candidate_programs=5, num_threads=8)
compiled_agent = tp.compile(agent, trainset=trainset[:50], valset=trainset[50:150])

# Now you can use the compiled agent.
compiled_agent(question="how many storeys are in the castle that David Gregory inherited?")
```

Hope this helps.
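As a hedged follow-up sketch (not part of the original reply): once compilation finishes, it can be useful to look at the bootstrapped trajectories in the prompts and to persist the optimized program. `save` and `load` are standard `dspy.Module` methods, though how the demos are stored internally may vary across DSPy versions.

```python
# Hedged sketch: inspect and persist the compiled agent.

# Look at the last few prompts to see the bootstrapped ReAct trajectories in context.
gpt3.inspect_history(n=3)

# Persist the optimized program so it can be reloaded later without re-compiling.
compiled_agent.save("compiled_react_agent.json")

# Later / elsewhere: rebuild the same architecture and load the optimized state.
agent2 = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])
agent2.load("compiled_react_agent.json")
```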
Thanks for the quick response @okhat! Perhaps I first need to give a more comprehensive explanation of the ReAct agent I intend to optimize. The objective of this agent is to navigate within a mobile phone app (or any screen in general). As such, the agent integrates the following functionalities (tools):

How would the DSPy framework optimize for this specific task? The screen description and proposed action are dynamically constructed. From my understanding, the LLM should see the whole ReAct cycle as few-shot examples, rather than just a question and an answer (as in your HotPotQA example); hence, question/answer pairs alone won't be sufficient. Currently, I use LangChain and Mixtral 8x7B for this purpose, with a customized ReAct prompt and a few hand-written trajectories. I therefore wonder whether I can switch to the DSPy framework, for exactly the reasons mentioned in the FAQ section (https://dspy-docs.vercel.app/docs/faqs).
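A hedged sketch of what a screen-navigation ReAct agent could look like in DSPy: the tool names, signature fields, and stub implementations below are hypothetical, and the callable-based `tools=` API assumes a recent DSPy version (the version current when this issue was filed expected Retrieve-like tool modules instead), so treat this as an illustration rather than a drop-in answer.

```python
import dspy

# Placeholder device layer: in a real agent these would call into the phone/emulator.
def read_screen() -> str:
    """Return a textual description of the current screen."""
    return "Home screen with icons: Settings, Messages, Camera"  # stub value

def tap(element_id: str) -> str:
    """Tap a UI element by id and return the new screen description."""
    return f"Tapped '{element_id}'; screen changed."  # stub value

# The signature mirrors the task; the field names here are hypothetical.
navigator = dspy.ReAct("instruction -> final_state", tools=[read_screen, tap])

# Optimization then proceeds as in the HotPotQA snippet above: supply
# dspy.Example(instruction=..., final_state=...).with_inputs('instruction') items
# plus a metric, and let the optimizer bootstrap complete ReAct trajectories.
```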
The current ReAct documentation lacks clear instructions on optimizing or finetuning a ReAct agent using few-shot examples. Neither the main ReAct documentation (ReAct Docs) nor the examples documentation (Examples Docs) provides sufficient guidance in this regard. It's essential to understand that for the ReAct agent to effectively learn from few-shot examples, the complete ReAct cycle (Question, Action, Action Input, Observation) should be encapsulated within these examples.
The example provided in the documentation, such as:

```python
qa_pair = dspy.Example(question="This is a question?", answer="This is an answer.")
```

does not demonstrate the correct way to optimize or finetune a ReAct agent with few-shot examples.
Could someone please provide a clear example demonstrating the correct approach to optimizing or finetuning a ReAct agent, particularly with few-shot examples? This would greatly benefit users seeking to leverage ReAct effectively.
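One possible reading of the request, sketched under assumptions: a hand-written trajectory could in principle be expressed as a `dspy.Example` whose fields span the full ReAct cycle, although (as the maintainer's reply above shows) the usual DSPy route is to let an optimizer such as `BootstrapFewShotWithRandomSearch` generate these trajectories from plain question/answer pairs. The field names below are hypothetical, and whether `dspy.ReAct` consumes hand-written trajectory demos directly depends on the DSPy version.

```python
import dspy

# Hedged illustration only (field names are hypothetical, not an official DSPy schema):
# a hand-written demo that captures one full ReAct step rather than a bare Q/A pair.
react_demo = dspy.Example(
    question="Open the Wi-Fi settings.",
    thought="I should look at the screen first to find the Settings app.",
    action="read_screen",
    action_input="",
    observation="Home screen with icons: Settings, Messages, Camera",
    answer="Wi-Fi settings opened.",
).with_inputs("question")

# In practice, BootstrapFewShotWithRandomSearch builds such trajectories automatically
# from plain (question, answer) examples by running the agent and keeping traces that
# pass the metric, which is why qa_pair-style examples suffice as optimizer inputs.
```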