text-davinci-001: majority_vote_ function does not handle cases where pred[prediction_field] is a list of str instead of str #47
Comments
Thanks a lot for catching this! Do you have an example of where that might happen?
Hmm, thanks! We do already do CoT+SC with dsp.majority in the intro notebook. Here's a simplified version.
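A rough sketch of that simplified version, loosely based on the intro notebook (the template names and sampling parameters here are assumptions, not the exact notebook code):

```python
import dsp

# qa_template_with_CoT / qa_template are assumed to be dsp Templates
# defined elsewhere, as in the intro notebook.
def QA_predict(example, sc=True):
    if sc:
        # Sample several chain-of-thought completions, then keep the
        # answer that the plurality of completions agree on.
        example, completions = dsp.generate(
            qa_template_with_CoT, n=20, temperature=0.7
        )(example, stage='qa')
        completions = dsp.majority(completions)
    else:
        example, completions = dsp.generate(qa_template)(example, stage='qa')
    return example.copy(answer=completions.answer)
```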
The only difference I see is that we extract the answer at the end. Is there any other difference?
Maybe you're saying we have some regression. The same code from the intro notebook no longer works on the latest branch?
I think I know what the problem is. I guess either specifying that the code does not support certain GPT models, or adding code to handle the different LM behaviors, could work. If you want to go down the second path, the
If your PR fixes this, I'm happy to merge it. But I'm unclear on what you mean by text-davinci-001 returning a list. Don't all models return a list of strings (i.e., a list of completions)? Are you saying that text-davinci-001 returns a list of lists?
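To make the shapes in question concrete (the values below are invented for illustration):

```python
# Asking any model for n completions yields a list of strings, one per sample:
completions = ["Paris", "Paris", "Lyon"]

# The report here is that a single prediction field can itself hold a
# list of strings, i.e., a list of lists once completions are stacked:
pred_prediction_field = ["Paris", "Paris, France"]  # List[str], not str
```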
Please don't merge just yet. As I mentioned, the
I'm on a $5 budget for OpenAI calls, so I'm afraid I won't be able to help you test more cases.
Thanks for reporting this and helping with it. We haven't been able to reproduce it, unfortunately. cc: @lawliet19189
Hi @okhat, I believe the problem may come from the template's extract step. The function takes an example, and overwrites the fields it manages to extract. In this case, the template fails to extract the answer, so the answer that was already set on the example survives untouched. In datasets like SQuAD, that pre-set answer is a list of str rather than a str. My guess comes from the following observation. Here is a code snippet from my CS224U homework:

```python
test_case = dev_exs[56]  # an entry of the dev set of SQuAD
print("Question:", test_case.question)
print("Expected answer:", test_case.answer)
print("Generated answer:", few_shot_openqa_custom(test_case, 3).answer)
```

And this is the output:
If you want, I can write a PR to set output fields that extract does not overwrite back to None.
Thanks @tsunrise! I strongly recommend not passing a populated .answer field to your program. Otherwise you risk unintended leakage one way or another. Luckily, your evaluation function seems to expect a string and will raise an exception, I believe, if you try feeding it a list, so hopefully this ensures no leakage in your specific case.

If you remove that field before calling generate, dsp.generate will invoke the LLM as many times as needed to make sure all output fields are produced. A PR that unsets output fields that are not overwritten by extract will be highly welcome, although the correct semantics in this case seem ambiguous: what does the user expect when some of the output fields are already set?

In general, always make sure to pass only the questions (or example objects with only questions set) to your program. (Or, at least, unset the output values you expect dsp.generate to produce, when there are multiple output fields.)
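A minimal sketch of that advice, assuming dsp.Example accepts keyword fields as in the intro notebook (few_shot_openqa_custom is the helper from the snippet earlier in this thread):

```python
import dsp

# Pass only the input field, so dsp.generate has to produce (and extract)
# the answer itself instead of inheriting the dataset's list-valued one.
inputs_only = dsp.Example(question=test_case.question)
prediction = few_shot_openqa_custom(inputs_only, 3)
print("Generated answer:", prediction.answer)
```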
Sounds good! I have created a PR for that.
Thanks for opening this; it got fundamentally resolved in v2 (DSPy). Closing.
Link to the code snippet in question:
https://github.com/stanfordnlp/dsp/blob/12d7f1106f2f524be161d35865a81e0ea016e929/dsp/primitives/predict.py#L211
Expected behavior:
The function should work with `pred[prediction_field]` of type `Union[List[str], str]`.

Current behavior:
The function raises an error when calling `normalize_text` on `pred[prediction_field]` when it is of type `List[str]`.
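For concreteness, one way such a guard could look (a sketch only, not the actual dsp code; normalize_prediction is a hypothetical helper and normalize_text here is a stand-in for dsp's normalizer):

```python
from typing import List, Union

def normalize_text(text: str) -> str:
    # Stand-in for dsp's normalizer: lowercase and collapse whitespace.
    return " ".join(text.lower().split())

def normalize_prediction(value: Union[List[str], str]) -> str:
    # Accept either a bare string or a list of candidate strings
    # (e.g., a SQuAD-style multi-reference answer field) and
    # normalize the first one.
    if isinstance(value, list):
        value = value[0] if value else ""
    return normalize_text(value)

assert normalize_prediction("  Paris ") == "paris"
assert normalize_prediction(["Paris", "Paris, France"]) == "paris"
```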