Feat/port tests to dspy lm client #1585

mikeedjones · 2024-10-05T16:11:00Z

This PR attempts to port all tests currently using DummyLM to a new DummyLM which inherits from dspy.LM - hence migrating the test suite to use the new ChatAdapter.

It also replicates all current tests which use the renamed DspDummyLM(dsp.LM) into a folder tests/DSP_LM, to be deleted when 2.6 is released and dsp.LM is deprecated.

The tests for the to-be-deprecated MIPRO have not been migrated to use dspy.LM.

I also included a fix for predictors which return Literal types (or any type whose origin does not have a __name__ attribute), via a small tweak to dspy/adapters/chat_adapter.py:get_annotation_name. Without this change, tests/functional/test_functional.py:test_literal.* fail. I can also skip those tests and open a separate PR for the fix.

This PR also adds an autouse fixture which resets dspy.settings to its defaults after each test; without it, some tests were interdependent.


mikeedjones · 2024-10-06T05:35:12Z

Discovered a cute gotcha - if you configure the LM (dspy.settings.configure(lm=lm)) before initializing your modules you get a different prompt than if you configure your LM after initializing your modules.

    dspy.settings.configure(lm=lm)
    pot = ChainOfThought(BasicQA)

Adds "reasoning" to the output signature.

    pot = ChainOfThought(BasicQA)
    dspy.settings.configure(lm=lm)

Adds "rationale" to the output signature.

This happens because dspy.settings.lm is referenced in ChainOfThought.__init__.
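The ordering effect can be reproduced with a toy model. This is only a sketch: Settings and ChainOfThoughtSketch are hypothetical stand-ins, and the condition used here (whether an LM is set at all) is an assumption, since the real check inside ChainOfThought.__init__ is not shown in this thread.

```python
class Settings:
    """Hypothetical stand-in for dspy.settings."""

    lm = None


settings = Settings()


class ChainOfThoughtSketch:
    """Toy module that reads global settings once, at construction time."""

    def __init__(self):
        # Assumed condition: the real __init__ inspects dspy.settings.lm,
        # but the exact test it applies is not quoted in the thread.
        if settings.lm is not None:
            self.extra_field = "reasoning"
        else:
            self.extra_field = "rationale"


# Module built before the LM is configured: the choice is frozen.
late_configured = ChainOfThoughtSketch()
settings.lm = object()  # stands in for dspy.settings.configure(lm=lm)

# Module built after the LM is configured.
early_configured = ChainOfThoughtSketch()

print(late_configured.extra_field, early_configured.extra_field)
```

The first module keeps its construction-time choice even though the LM is configured afterwards, which is why the two orderings produce different prompts.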

Will raise an issue - maybe it's just something to add to the migration docs. It should be resolved when dsp.LM is deprecated.

mikeedjones · 2024-10-06T06:44:43Z

dspy/adapters/chat_adapter.py

-        args_str = ', '.join(get_annotation_name(arg) for arg in args)
-        return f"{origin.__name__}[{args_str}]"
+        args_str = ", ".join(get_annotation_name(arg) for arg in args)
+        return f"{get_annotation_name(origin)}[{args_str}]"


Change to account for the origin of Literal not having a __name__ attribute.
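For context, a sketch of how a recursive get_annotation_name can avoid touching origin.__name__ directly. Only the two lines in the diff above come from the PR; the rest of the function body is an assumption about its shape, and whether typing.Literal exposes __name__ depends on the Python version.

```python
from typing import Dict, List, Literal, get_args, get_origin


def get_annotation_name(annotation):
    """Render a type annotation as a readable string (illustrative sketch)."""
    origin = get_origin(annotation)
    args = get_args(annotation)
    if origin is None:
        # Plain classes have __name__; some typing special forms and
        # literal values do not, so fall back to str().
        if hasattr(annotation, "__name__"):
            return annotation.__name__
        return str(annotation)
    args_str = ", ".join(get_annotation_name(arg) for arg in args)
    # Recursing on the origin reuses the fallback above instead of
    # assuming that origin.__name__ exists.
    return f"{get_annotation_name(origin)}[{args_str}]"


print(get_annotation_name(Dict[str, List[int]]))
print(get_annotation_name(Literal["a", "b"]))
```

The Literal case no longer raises AttributeError on interpreters where the special form lacks __name__; the origin is rendered through str() instead.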

okhat · 2024-10-06T17:41:21Z

Thanks so much, @mikeedjones !!! This is amazing. Great catch on the issue.

okhat · 2024-10-06T17:44:30Z

dsp/utils/settings.py

-                experimental=False,
-                backoff_time = 10
-            )
+            config = DEFAULT_CONFIG


I like the refactor but there's a subtle risk here. This is now a reference to a global variable, and I fear somehow that we should make a deep copy here instead. (Arguably it's a singleton so who knows, maybe it's OK now, but I'd like to be sure it's a unique object)

ahh good point. Changed that thanks!
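To illustrate the risk discussed above, a small sketch with a stand-in DEFAULT_CONFIG (the two keys are taken from the diff; the helper names are hypothetical). Assigning the global directly shares one dict, so a later mutation leaks into the defaults, while copy.deepcopy gives each consumer its own object.

```python
import copy

# Stand-in for the module-level defaults.
DEFAULT_CONFIG = {"experimental": False, "backoff_time": 10}


def config_by_reference():
    """Returns the global itself: callers share one mutable dict."""
    return DEFAULT_CONFIG


def config_by_copy():
    """Returns an independent copy, including any nested containers."""
    return copy.deepcopy(DEFAULT_CONFIG)


safe = config_by_copy()
safe["backoff_time"] = 99
defaults_after_copy = dict(DEFAULT_CONFIG)  # snapshot: still untouched

shared = config_by_reference()
shared["experimental"] = True
defaults_after_reference = dict(DEFAULT_CONFIG)  # snapshot: now mutated

print(defaults_after_copy, defaults_after_reference)
```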

okhat · 2024-10-06T17:48:17Z

@mikeedjones This is super awesome. I'm tempted to merge as-is but I left a comment to be resolved up about DEFAULT_CONFIG.

More importantly, though, I see that the current tests are checking the output string, e.g. [[ ## name ## ]], which may well change frequently in 2.5 until we have 2.6 which is when we won't adjust the default adapters anymore.

More generally, basically we should test that the adapter's parse of the response is unchanged, not that the string is unchanged, if you know what I mean.

mikeedjones · 2024-10-06T18:55:45Z

Yeah, I agree - I just tried to replicate old tests as closely as I could - maybe worth writing a dummy adapter as well as a dummy LM along with a test suite for ChatAdapter? Also have this issue where the lines passed to DummyLM would have to change as the ChatAdapter changes.

okhat · 2024-10-07T10:36:46Z

tests/dsp_LM/examples/test_baleen.py

+
+# @pytest.mark.slow_test
+# TODO: Find a way to make this test run without openai
+def _test_baleen():


Wait what's this file? Was it there before? Not sure we want this.

From this PR: https://github.com/stanfordnlp/dspy/pull/451/files#diff-a1699b14a8933ad094957aac992107cfac4a4a54a7bd20fb0efca6d0cb751fb1

okhat · 2024-10-07T10:38:15Z

tests/dsp_LM/retrieve/integration_test_pgvectorrm.py

@@ -0,0 +1,94 @@
+"""Instructions:


I think the whole retrieve folder at tests/dsp_LM/retrieve/ probably doesn't need to be under dsp_LM?

tests/dsp_LM/retrieve/test_llama_index_rm.py uses DspyDummyLM but the tests are skipped in the CI.

okhat · 2024-10-07T10:45:31Z

OK just reiterating this looks great to me (except I'm not sure I understand the implications of test_baleen.py) but the parts with hard-coded prompt strings and output strings can't be merged directly, as they would break on any changes to ChatAdapter. And I expect changes to happen in ChatAdapter.

We could introduce a dummy adapter but at that point what are we really testing? In any case, if making a dummy adapter makes sense to you and makes it easy to get this update overall merged, it sounds good to me in the short-to-medium term.

mikeedjones · 2024-10-07T11:44:20Z

Maybe we remove the history comparisons and pass a list of field_name:value dicts into DummyLM which then has some output formatter? When a dev makes changes to ChatAdapter they have to update DummyLM's output formatter, doesn't seem like too much of a lift. Might make a good smoke test that their changes to ChatAdapter are working as intended?
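A rough sketch of the proposal above, assuming a dummy LM that receives field_name:value dicts plus an injected formatter. DummyLMSketch, its constructor arguments and colon_formatter are all hypothetical, and cycling through the answers is a choice made for this sketch, not behaviour taken from the PR.

```python
from typing import Callable, Dict, List


class DummyLMSketch:
    """Toy LM that replays canned field values through a formatter."""

    def __init__(
        self,
        answers: List[Dict[str, str]],
        formatter: Callable[[Dict[str, str]], str],
    ):
        self.answers = answers
        self.formatter = formatter
        self.calls = 0

    def __call__(self, prompt: str) -> List[str]:
        # Pick the next canned answer, wrapping around when exhausted.
        values = self.answers[self.calls % len(self.answers)]
        self.calls += 1
        # The prompt is ignored; only the formatter decides the wire format.
        return [self.formatter(values)]


def colon_formatter(values: Dict[str, str]) -> str:
    """Trivial stand-in formatter; a real test would use the adapter's."""
    return "\n".join(f"{name}: {value}" for name, value in values.items())


lm = DummyLMSketch([{"answer": "Paris"}, {"answer": "Rome"}], colon_formatter)
first, second, third = lm("q1"), lm("q2"), lm("q3")
print(first, second, third)
```

Because the tests only supply field values, a change to the output format touches the formatter alone and not every test case.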

okhat · 2024-10-07T13:29:49Z

Maybe we remove the history comparisons

Sure. Later we can figure out how to test this.

pass a list of field_name:value dicts into DummyLM which then has some output formatter

Hmm, ideally that formatter is grabbed directly from the adapter I guess? Basically the test in essence will just check that parse(format_output_values(values)) == values, right?

I don't want us to maintain two different copies of the same thing for the tests' sake.

Btw I'm totally happy to have us merge this PR without this bit about formatting altogether, then we can discuss that part as a second PR. Your call. It's fine this way too.
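The round-trip check mentioned above, parse(format_output_values(values)) == values, could look roughly like this. Both functions are simplified stand-ins written for this sketch, not ChatAdapter's implementation; only the [[ ## name ## ]] marker style is taken from the thread.

```python
import re
from typing import Dict

# Matches a header line such as "[[ ## answer ## ]]".
HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\]$")


def format_output_values(values: Dict[str, str]) -> str:
    """Stand-in formatter: one header line per field, then its value."""
    return "\n\n".join(
        f"[[ ## {name} ## ]]\n{value}" for name, value in values.items()
    )


def parse(text: str) -> Dict[str, str]:
    """Stand-in parser: collect the lines that follow each header."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            current = match.group(1)
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in fields.items()}


values = {"reasoning": "Paris is the capital of France.", "answer": "Paris"}
round_tripped = parse(format_output_values(values))
print(round_tripped == values)
```

A test of this shape asserts on the parsed values rather than on the raw string, so it keeps passing when the marker syntax changes, as long as the formatter and parser change together.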

mikeedjones · 2024-10-07T14:08:41Z

I guess the formatter for DummyLM would be the inverse of ChatAdapter.parse, so possibly it could reuse the example formatter, I think?

I think we merge, and I'll make a PR removing the history comparison and adding the formatter now, if that's ok?

mikeedjones · 2024-10-07T14:52:20Z

#1595 <- can close this in favor of an update?

Michael Jones added 5 commits October 5, 2024 12:24

feat(dspy): add dummyLM based upon dspy.clients.lm.LM

9c753b7

feat(dspy): move tests using DspyDummyLM to own folder for deprecation

b0d4f2c

feat(dspy): add unqiue base names for all tests by adding __init__.py…

9753f3e

… files

feat(dspy): follow same pattern as other uses of DSPDummyLM

1100658

feat(dspy): update dummyLM completions for ChatAdapter instructions

7de20e6

mikeedjones marked this pull request as draft October 5, 2024 16:11

feat(dspy): revert removal of whitespace

f817dd3

feat(dspy): update remaining tests

476b865

mikeedjones marked this pull request as ready for review October 6, 2024 06:03

mikeedjones changed the title ~~DRAFT: Feat/port tests to dspy lm client~~ Feat/port tests to dspy lm client Oct 6, 2024

Michael Jones added 4 commits October 6, 2024 06:06

feat(dspy): revert changes to MIPRO tests

789b7a1

feat(dspy): clear settings between tests

03d8f2b

feat(dspy): rm print

843d63b

feat(dspy): handle case where type origin doesn't have a __name__ attr

6d7c968

mikeedjones commented Oct 6, 2024

feat(dspy): whitespace

5221ab3

okhat reviewed Oct 6, 2024

Michael Jones added 2 commits October 6, 2024 19:31

deepcopy(DEFAULT_CONFIG) to avoid reference

dafd8ff

deepcopy(DEFAULT_CONFIG) to avoid reference

5e03bc4

okhat reviewed Oct 7, 2024

feat(dspy): remove files where DSPDummyLM not used

53b24c8

Merge branch 'main' into feat/port-tests-to-dspy-lm-client

a9eee35

mikeedjones closed this Oct 8, 2024

mikeedjones mentioned this pull request Oct 11, 2024

Fix TypedPredictor formatting with list output values #1609

Merged
