Fix error in Evaluate with display_table=True with outputs that cannot be converted to dict #1682

dbczumar · 2024-10-23T19:52:29Z

If I write a DSPy module that returns an object that is not coercable to dict, I cannot currently call Evaluate() on the module with display_table=True because display_table() assumes that all outputs are dictionaries. This PR fixes the issue and adds a test.

Signed-off-by: dbczumar <corey.zumar@databricks.com>

dbczumar · 2024-10-23T19:58:55Z

tests/evaluate/test_evaluate.py

+            # Create a program that extracts entities from text and returns them as a list,
+            # rather than returning a Predictor() wrapper. This is done intentionally to test
+            # the case where the program does not output a dictionary-like object because
+            # Evaluate() has failed for this case in the past
+            lambda text: TypedPredictor("text: str -> entities: List[str]")(text=text).entities,
+            dspy.Example(text="United States", entities=["United States"]).with_inputs("text"),


@okhat This appears to be a valid setup for DSPy evaluation - i.e. all other parts of evaluation work with such a lambda except for displaying table output. Let me know if there's a strong reason to disallow this and require all DSPy modules / programs to return Prediction() objects

dbczumar · 2024-10-23T20:19:08Z

tests/evaluate/test_evaluate.py

+            # rather than returning a Predictor() wrapper. This is done intentionally to test
+            # the case where the program does not output a dictionary-like object because
+            # Evaluate() has failed for this case in the past
+            lambda text: TypedPredictor("text: str -> entities: List[str]")(text=text).entities,


This case fails on main:

... test_evaluate.py:163: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../dspy/evaluate/evaluate.py:229: in __call__ data = [ ../../dspy/evaluate/evaluate.py:230: in <listcomp> merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ d1 = Example({'text': 'United States', 'entities': ['United States']}) (input_keys={'text'}), d2 = ['United States'] def merge_dicts(d1, d2) -> dict: merged = {} for k, v in d1.items(): if k in d2: merged[f"example_{k}"] = v else: merged[k] = v > for k, v in d2.items(): E AttributeError: 'list' object has no attribute 'items' ../../dspy/evaluate/evaluate.py:286: AttributeError ================================================================================= warnings summary ================================================================================= ../../../miniconda3/envs/default/lib/python3.10/site-packages/pydantic/_internal/_config.py:291 /Users/corey.zumar/miniconda3/envs/default/lib/python3.10/site-packages/pydantic/_internal/_config.py:291: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.9/migration/ warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning) ../../../miniconda3/envs/default/lib/python3.10/site-packages/wandb/env.py:16 /Users/corey.zumar/miniconda3/envs/default/lib/python3.10/site-packages/wandb/env.py:16: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives from distutils.util import strtobool -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============================================================================= short test summary info ============================================================================== FAILED test_evaluate.py::test_evaluate_display_table[True-True-program_with_example1] - AttributeError: 'list' object has no attribute 'items' FAILED test_evaluate.py::test_evaluate_display_table[True-False-program_with_example1] - AttributeError: 'list' object has no attribute 'items' FAILED test_evaluate.py::test_evaluate_display_table[True-1-program_with_example1] - AttributeError: 'list' object has no attribute 'items' FAILED test_evaluate.py::test_evaluate_display_table[False-True-program_with_example1] - AttributeError: 'list' object has no attribute 'items' FAILED test_evaluate.py::test_evaluate_display_table[False-False-program_with_example1] - AttributeError: 'list' object has no attribute 'items' FAILED test_evaluate.py::test_evaluate_display_table[False-1-program_with_example1] - AttributeError: 'list' object has no attribute 'items'

okhat · 2024-10-23T20:27:58Z

dspy/evaluate/evaluate.py


        data = [
-            merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset
+            dict(example) | {"prediction": prediction, "correct": score}


Thanks so much @dbczumar ! Everything looks great to me, besides the following questions.

Is the moving of prediction under a key called "prediction" intentional? What if the prediction has many fields? How will that get displayed? I imagine it breaks the niceness of columns, unless the table displayer is smart and shows some kind of sub-columns.

I do like getting rid of merge_dicts which is not needed in 3.9+.

@okhat Got it - makes perfect sense :). For dict-like outputs, I've restored the preexsting behavior. For non-dict-like outputs, I think including the full value in a predictions column makes sense.

Signed-off-by: dbczumar <corey.zumar@databricks.com>

okhat · 2024-10-24T00:32:19Z

dspy/evaluate/evaluate.py

        data = [
-            merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset
+            (
+                merge_dicts(example, prediction) | {"correct": score}


I vaguely presuppose that merge_dicts is there for legacy reasons, and that example | prediction may have the same effect, is that false? (it may be false)

oh, i see, merge_dicts does some kind of disambiguation instead of overwriting

Yeah, exactly

dbczumar added 4 commits October 23, 2024 12:23

fix

c7d0339

Signed-off-by: dbczumar <corey.zumar@databricks.com>

fix

11aca74

Signed-off-by: dbczumar <corey.zumar@databricks.com>

fix

4a1bc0b

Signed-off-by: dbczumar <corey.zumar@databricks.com>

fix

9053c54

Signed-off-by: dbczumar <corey.zumar@databricks.com>

dbczumar commented Oct 23, 2024

View reviewed changes

dbczumar requested a review from okhat October 23, 2024 19:59

dbczumar commented Oct 23, 2024

View reviewed changes

okhat reviewed Oct 23, 2024

View reviewed changes

dbczumar added 2 commits October 23, 2024 17:22

fix

6c0c177

Signed-off-by: dbczumar <corey.zumar@databricks.com>

fix

ced68ca

Signed-off-by: dbczumar <corey.zumar@databricks.com>

okhat reviewed Oct 24, 2024

View reviewed changes

okhat merged commit a68f2d9 into stanfordnlp:main Oct 24, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix error in Evaluate with display_table=True with outputs that cannot be converted to dict #1682

Fix error in Evaluate with display_table=True with outputs that cannot be converted to dict #1682

Uh oh!

dbczumar commented Oct 23, 2024

Uh oh!

dbczumar Oct 23, 2024

Uh oh!

dbczumar Oct 23, 2024

Uh oh!

okhat Oct 23, 2024

Uh oh!

dbczumar Oct 24, 2024

Uh oh!

okhat Oct 24, 2024

Uh oh!

okhat Oct 24, 2024

Uh oh!

dbczumar Oct 24, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix error in Evaluate with display_table=True with outputs that cannot be converted to dict #1682

Fix error in Evaluate with display_table=True with outputs that cannot be converted to dict #1682

Uh oh!

Conversation

dbczumar commented Oct 23, 2024

Uh oh!

dbczumar Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

dbczumar Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

okhat Oct 23, 2024

Choose a reason for hiding this comment

Uh oh!

dbczumar Oct 24, 2024

Choose a reason for hiding this comment

Uh oh!

okhat Oct 24, 2024

Choose a reason for hiding this comment

Uh oh!

okhat Oct 24, 2024

Choose a reason for hiding this comment

Uh oh!

dbczumar Oct 24, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants