-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Fix error in Evaluate with display_table=True with outputs that cannot be converted to dict #1682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix error in Evaluate with display_table=True with outputs that cannot be converted to dict #1682
Conversation
| # Create a program that extracts entities from text and returns them as a list, | ||
| # rather than returning a Predictor() wrapper. This is done intentionally to test | ||
| # the case where the program does not output a dictionary-like object because | ||
| # Evaluate() has failed for this case in the past | ||
| lambda text: TypedPredictor("text: str -> entities: List[str]")(text=text).entities, | ||
| dspy.Example(text="United States", entities=["United States"]).with_inputs("text"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@okhat This appears to be a valid setup for DSPy evaluation - i.e. all other parts of evaluation work with such a lambda except for displaying table output. Let me know if there's a strong reason to disallow this and require all DSPy modules / programs to return Prediction() objects
| # rather than returning a Predictor() wrapper. This is done intentionally to test | ||
| # the case where the program does not output a dictionary-like object because | ||
| # Evaluate() has failed for this case in the past | ||
| lambda text: TypedPredictor("text: str -> entities: List[str]")(text=text).entities, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case fails on main:
...
test_evaluate.py:163:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../dspy/evaluate/evaluate.py:229: in __call__
data = [
../../dspy/evaluate/evaluate.py:230: in <listcomp>
merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
d1 = Example({'text': 'United States', 'entities': ['United States']}) (input_keys={'text'}), d2 = ['United States']
def merge_dicts(d1, d2) -> dict:
merged = {}
for k, v in d1.items():
if k in d2:
merged[f"example_{k}"] = v
else:
merged[k] = v
> for k, v in d2.items():
E AttributeError: 'list' object has no attribute 'items'
../../dspy/evaluate/evaluate.py:286: AttributeError
================================================================================= warnings summary =================================================================================
../../../miniconda3/envs/default/lib/python3.10/site-packages/pydantic/_internal/_config.py:291
/Users/corey.zumar/miniconda3/envs/default/lib/python3.10/site-packages/pydantic/_internal/_config.py:291: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.9/migration/
warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)
../../../miniconda3/envs/default/lib/python3.10/site-packages/wandb/env.py:16
/Users/corey.zumar/miniconda3/envs/default/lib/python3.10/site-packages/wandb/env.py:16: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils.util import strtobool
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================= short test summary info ==============================================================================
FAILED test_evaluate.py::test_evaluate_display_table[True-True-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[True-False-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[True-1-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[False-True-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[False-False-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[False-1-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
dspy/evaluate/evaluate.py
Outdated
|
|
||
| data = [ | ||
| merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset | ||
| dict(example) | {"prediction": prediction, "correct": score} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @dbczumar ! Everything looks great to me, besides the following questions.
Is the moving of prediction under a key called "prediction" intentional? What if the prediction has many fields? How will that get displayed? I imagine it breaks the niceness of columns, unless the table displayer is smart and shows some kind of sub-columns.
I do like getting rid of merge_dicts which is not needed in 3.9+.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@okhat Got it - makes perfect sense :). For dict-like outputs, I've restored the preexsting behavior. For non-dict-like outputs, I think including the full value in a predictions column makes sense.
| data = [ | ||
| merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset | ||
| ( | ||
| merge_dicts(example, prediction) | {"correct": score} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely presuppose that merge_dicts is there for legacy reasons, and that example | prediction may have the same effect, is that false? (it may be false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, i see, merge_dicts does some kind of disambiguation instead of overwriting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, exactly
If I write a DSPy module that returns an object that is not coercable to
dict, I cannot currently callEvaluate()on the module withdisplay_table=Truebecausedisplay_table()assumes that all outputs are dictionaries. This PR fixes the issue and adds a test.