@dbczumar
Collaborator

If I write a DSPy module that returns an object that is not coercible to dict, I currently cannot call Evaluate() on the module with display_table=True, because display_table() assumes that all outputs are dictionaries. This PR fixes the issue and adds a test.

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Comment on lines +129 to +134
# Create a program that extracts entities from text and returns them as a list,
# rather than returning a Predictor() wrapper. This is done intentionally to test
# the case where the program does not output a dictionary-like object because
# Evaluate() has failed for this case in the past
lambda text: TypedPredictor("text: str -> entities: List[str]")(text=text).entities,
dspy.Example(text="United States", entities=["United States"]).with_inputs("text"),
Collaborator Author


@okhat This appears to be a valid setup for DSPy evaluation - i.e. all other parts of evaluation work with such a lambda except for displaying table output. Let me know if there's a strong reason to disallow this and require all DSPy modules / programs to return Prediction() objects

@dbczumar dbczumar requested a review from okhat October 23, 2024 19:59
# rather than returning a Predictor() wrapper. This is done intentionally to test
# the case where the program does not output a dictionary-like object because
# Evaluate() has failed for this case in the past
lambda text: TypedPredictor("text: str -> entities: List[str]")(text=text).entities,
Collaborator Author


This case fails on main:

...
test_evaluate.py:163:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../dspy/evaluate/evaluate.py:229: in __call__
    data = [
../../dspy/evaluate/evaluate.py:230: in <listcomp>
    merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

d1 = Example({'text': 'United States', 'entities': ['United States']}) (input_keys={'text'}), d2 = ['United States']

    def merge_dicts(d1, d2) -> dict:
        merged = {}
        for k, v in d1.items():
            if k in d2:
                merged[f"example_{k}"] = v
            else:
                merged[k] = v

>       for k, v in d2.items():
E       AttributeError: 'list' object has no attribute 'items'

../../dspy/evaluate/evaluate.py:286: AttributeError
================================================================================= warnings summary =================================================================================
../../../miniconda3/envs/default/lib/python3.10/site-packages/pydantic/_internal/_config.py:291
  /Users/corey.zumar/miniconda3/envs/default/lib/python3.10/site-packages/pydantic/_internal/_config.py:291: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.9/migration/
    warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)

../../../miniconda3/envs/default/lib/python3.10/site-packages/wandb/env.py:16
  /Users/corey.zumar/miniconda3/envs/default/lib/python3.10/site-packages/wandb/env.py:16: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
    from distutils.util import strtobool

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================= short test summary info ==============================================================================
FAILED test_evaluate.py::test_evaluate_display_table[True-True-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[True-False-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[True-1-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[False-True-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[False-False-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
FAILED test_evaluate.py::test_evaluate_display_table[False-1-program_with_example1] - AttributeError: 'list' object has no attribute 'items'
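The failure above can be reproduced without DSPy. The following is a minimal sketch, not the library's code: the first loop of merge_dicts is copied from the traceback, while the body of the second loop (including the "pred_" prefix) and the return statement are assumptions, since the traceback cuts off at the failing line. A plain dict stands in for the dspy.Example.

```python
# Reconstruction of merge_dicts from the traceback. Only the first loop and the
# `for k, v in d2.items()` line are visible there; the rest is an assumption.
def merge_dicts(d1, d2) -> dict:
    merged = {}
    for k, v in d1.items():
        # `k in d2` also works on a list, so a list gets past this loop.
        if k in d2:
            merged[f"example_{k}"] = v
        else:
            merged[k] = v

    for k, v in d2.items():  # raises AttributeError when d2 is a list
        if k in d1:
            merged[f"pred_{k}"] = v  # assumed prefix, not shown in the traceback
        else:
            merged[k] = v
    return merged


# Plain dict standing in for the dspy.Example used by the failing test.
example = {"text": "United States", "entities": ["United States"]}
# What the lambda program returns: a bare list rather than a Prediction.
prediction = ["United States"]

try:
    merge_dicts(example, prediction)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Note that the membership test in the first loop succeeds silently on a list, which is why the error only surfaces at the second loop.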


data = [
merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset
dict(example) | {"prediction": prediction, "correct": score}
Collaborator


Thanks so much @dbczumar! Everything looks great to me, apart from the following questions.

Is the moving of prediction under a key called "prediction" intentional? What if the prediction has many fields? How will that get displayed? I imagine it breaks the niceness of columns, unless the table displayer is smart and shows some kind of sub-columns.

I do like getting rid of merge_dicts which is not needed in 3.9+.

Collaborator Author


@okhat Got it - makes perfect sense :). For dict-like outputs, I've restored the preexisting behavior. For non-dict-like outputs, I think including the full value in a prediction column makes sense.
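The two-way behavior described in this comment could look roughly like the sketch below. It is an illustration under assumptions, not the merged code: is_dictlike, merge_with_prefixes, and build_row are hypothetical names, the "pred_" prefix is assumed, and plain dicts stand in for dspy.Example and Prediction objects. The "prediction" key follows the diff line quoted earlier in this thread. It needs Python 3.9+ for the dict | operator.

```python
def is_dictlike(obj) -> bool:
    # Hypothetical check: anything exposing a callable .items() counts as dict-like.
    return callable(getattr(obj, "items", None))


def merge_with_prefixes(example: dict, prediction: dict) -> dict:
    # Stand-in for merge_dicts: keys present on both sides are prefixed
    # instead of one value overwriting the other.
    merged = {}
    for k, v in example.items():
        merged[f"example_{k}" if k in prediction else k] = v
    for k, v in prediction.items():
        merged[f"pred_{k}" if k in example else k] = v
    return merged


def build_row(example, prediction, score) -> dict:
    if is_dictlike(prediction):
        # Dict-like output: one table column per prediction field, as before.
        row = merge_with_prefixes(dict(example), dict(prediction.items()))
    else:
        # Any other output: keep the whole value in a single column.
        row = dict(example) | {"prediction": prediction}
    return row | {"correct": score}


example = {"text": "United States", "entities": ["United States"]}
list_row = build_row(example, ["United States"], True)
dict_row = build_row(example, {"entities": ["USA"]}, False)
print(list_row)
print(dict_row)
```

With this shape, multi-field predictions keep their per-field columns, and only outputs that cannot be iterated as key/value pairs fall back to the single column.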

Signed-off-by: dbczumar <corey.zumar@databricks.com>
data = [
merge_dicts(example, prediction) | {"correct": score} for _, example, prediction, score in predicted_devset
(
merge_dicts(example, prediction) | {"correct": score}
Collaborator


I vaguely presume that merge_dicts is there for legacy reasons and that example | prediction would have the same effect. Is that wrong? (It may well be.)

Collaborator


oh, i see, merge_dicts does some kind of disambiguation instead of overwriting

Collaborator Author


Yeah, exactly
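The difference discussed in this thread, overwriting versus disambiguation, can be seen on a small made-up example. This is a sketch only: the second half of merge_dicts (the "pred_" prefix and the return) is assumed, and the question/answer data is invented for illustration. The | operator on dicts requires Python 3.9+.

```python
def merge_dicts(d1: dict, d2: dict) -> dict:
    merged = {}
    for k, v in d1.items():
        if k in d2:
            merged[f"example_{k}"] = v
        else:
            merged[k] = v
    for k, v in d2.items():
        if k in d1:
            merged[f"pred_{k}"] = v  # assumed prefix
        else:
            merged[k] = v
    return merged


# Invented data: the gold answer and the predicted answer share the key "answer".
example = {"question": "Capital of France?", "answer": "Paris"}
prediction = {"answer": "Lyon", "rationale": "guess"}

# Plain union: the right-hand value wins, so the gold answer is lost.
overwritten = example | prediction

# merge_dicts: both values survive under prefixed keys.
disambiguated = merge_dicts(example, prediction)

print(overwritten)
print(disambiguated)
```

For an evaluation table this matters because the expected and predicted values for the same field usually need to be shown side by side.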

@okhat okhat merged commit a68f2d9 into stanfordnlp:main Oct 24, 2024
4 checks passed