Template issues when fields are missing from demos #650

Open
thomasahle opened this issue Mar 14, 2024 · 4 comments

@thomasahle
Collaborator

Consider this DSPy program:

    import dspy
    from dspy.teleprompt import LabeledFewShot
    from dspy.utils import DummyLM

    demos = [dspy.Example(input="What is the speed of light?", output="3e8")]
    program = LabeledFewShot(k=len(demos)).compile(
        student=dspy.TypedPredictor("input -> thoughts, output"),
        trainset=[ex.with_inputs("input") for ex in demos],
    )
    dspy.settings.configure(lm=DummyLM(["My thoughts", "Paris"]))
    assert program(input="What is the capital of France?").output == "Paris"

You would expect the output of inspect_history(n=1) to look like:

Given the fields `input`, produce the fields `thoughts`, `output`.

---

Follow the following format.

Input: ${input}
Thoughts: ${thoughts}
Output: ${output}

---

Input: What is the speed of light?
Output: 3e8

---

Input: What is the capital of France?
Thoughts: My thoughts
Output: Paris

Or it could handle the missing "thoughts" field in the labeled data in some other reasonable way.

However, what we get instead is:

Given the fields `input`, produce the fields `thoughts`, `output`.

---

Follow the following format.

Input: ${input}
Thoughts: ${thoughts}
Output: ${output}

---

Input: What is the speed of light?
Output: 3e8

Input: What is the capital of France?
Thoughts: My thoughts
Output:Paris

This has a big problem: the --- separator line between the demo and the final query is missing.
Whatever our solution to "fields missing from examples" is, this shouldn't be it.
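
The layout I would expect can be sketched in a few lines of plain Python. This is only an illustration, not DSPy's template code: `render_block` and `render_prompt` are hypothetical helpers, and demos are plain dicts rather than `dspy.Example` objects. Unset fields are skipped, and every block, including the final query, is joined with a --- separator.

```python
from typing import List, Mapping, Sequence

SEPARATOR = "---"


def render_block(fields: Sequence[str], values: Mapping[str, str]) -> str:
    """Render one example as 'Name: value' lines, skipping fields with no value."""
    lines = [f"{name.capitalize()}: {values[name]}" for name in fields if name in values]
    return "\n".join(lines)


def render_prompt(
    fields: Sequence[str],
    demos: Sequence[Mapping[str, str]],
    query: Mapping[str, str],
) -> str:
    """Render all demos followed by the query, with a separator between blocks."""
    blocks: List[str] = [render_block(fields, demo) for demo in demos]
    blocks.append(render_block(fields, query))
    # The separator goes between *every* pair of blocks, so a demo with a
    # missing field can never run into the block that follows it.
    return f"\n\n{SEPARATOR}\n\n".join(blocks)
```

With the fields `input`, `thoughts`, `output` and the single speed-of-light demo, this keeps the demo and the query in separate blocks even though the demo has no `thoughts`.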

Things get even worse if the last field is missing, rather than a field in the middle.

Consider this DSPy program:

    demos = [dspy.Example(input="What is the speed of light?", output="3e8")]
    program = LabeledFewShot(k=len(demos)).compile(
        student=dspy.TypedPredictor("input -> output, thoughts"),
        trainset=[ex.with_inputs("input") for ex in demos],
    )
    dspy.settings.configure(lm=DummyLM(["My thoughts", "Paris"]))
    assert program(input="What is the capital of France?").output == "Paris"

Here I moved "thoughts" to come after "output".
Now I get this trace:

Given the fields `input`, produce the fields `output`, `thoughts`.

---

Follow the following format.

Input: ${input}
Output: ${output}
Thoughts: ${thoughts}

---

Input: What is the capital of France?
Output: My thoughts
Thoughts:Paris

We see that the example has completely disappeared.
This confused me for quite a while.

@thomasahle added the bug label Mar 14, 2024
@okhat
Collaborator

okhat commented Mar 15, 2024

The first and last fields are required in the current design. This is not a bug, but it can be "behavior to improve" in some way.
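
Until that behavior changes, a small pre-check can flag the affected demos before compiling. The sketch below assumes demos are plain dicts and takes the "first and last fields are required" rule from the comment above at face value; `demos_missing_required` is a hypothetical helper, not part of DSPy.

```python
from typing import List, Mapping, Sequence


def demos_missing_required(
    fields: Sequence[str],
    demos: Sequence[Mapping[str, str]],
) -> List[int]:
    """Return the indices of demos that lack the first or the last field."""
    required = (fields[0], fields[-1])
    return [
        index
        for index, demo in enumerate(demos)
        # A field counts as missing if it is absent or empty.
        if any(not demo.get(name) for name in required)
    ]
```

For the signature `input -> output, thoughts`, the speed-of-light demo is flagged because it has no `thoughts`; for `input -> thoughts, output` it is not, since only a middle field is missing.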

@okhat removed the bug label Mar 15, 2024
@thomasnormal

thomasnormal commented Mar 15, 2024

If it's required, and it doesn't throw an error, then it's just another kind of bug :-)

Can you think of anything that would break if I fixed this to return something like this?

---

Input: What is the speed of light?
Output: 3e8

---

Input: What is the capital of France?
Output: My thoughts
Thoughts: Paris

?

@okhat
Collaborator

okhat commented Mar 25, 2024

@thomasahle Yeah, I wouldn’t make that change. Just pass N/A to the fields you don’t need, or something to that effect. This shouldn’t throw an error: it’s almost guaranteed that some examples will be missing fields with respect to some modules. The behavior in that case is to not show them.

But what we might need is a per-module way to assign explicit labeled data. There, more checks can be enforced.
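
The N/A workaround could look roughly like this. It is a sketch over plain dicts with a hypothetical helper name, `fill_missing_fields`; with DSPy itself the same effect would come from passing the extra keyword (for example `thoughts="N/A"`) when constructing the `dspy.Example`.

```python
from typing import Dict, Mapping, Sequence


def fill_missing_fields(
    demo: Mapping[str, str],
    fields: Sequence[str],
    placeholder: str = "N/A",
) -> Dict[str, str]:
    """Return a copy of the demo with every unset field set to the placeholder."""
    filled = dict(demo)
    for name in fields:
        # setdefault leaves values that are already present untouched.
        filled.setdefault(name, placeholder)
    return filled
```

The trade-off is that the placeholder becomes part of the prompt, so the model sees "Thoughts: N/A" in the demo instead of having the field omitted.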

@thomasnormal

@okhat What I'm suggesting in #650 (comment) is exactly to not show the fields if they are not set.

The current behaviour is this:

Input: What is the speed of light?
Output: 3e8

Input: What is the capital of France?
Thoughts: My thoughts
Output:Paris

That is, the separator --- is missing.
Surely that's never the right thing to do?
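
A check along these lines could catch the missing separator in a prompt string, such as the text shown by inspect_history. It is a sketch that assumes the first field is labeled "Input:" and that blocks are delimited by lines containing only ---; `sections_with_multiple_inputs` is a hypothetical helper, not something DSPy provides.

```python
from typing import List


def sections_with_multiple_inputs(prompt: str, input_prefix: str = "Input:") -> List[int]:
    """Return indices of '---'-delimited sections holding more than one example."""
    sections: List[List[str]] = [[]]
    for line in prompt.splitlines():
        if line.strip() == "---":
            sections.append([])  # a separator line starts a new section
        else:
            sections[-1].append(line)
    # Each section should contain at most one line that starts an example.
    return [
        index
        for index, lines in enumerate(sections)
        if sum(line.startswith(input_prefix) for line in lines) > 1
    ]
```

For a well-formed prompt the result is an empty list; for the current behaviour shown above, the section holding both the demo and the final query is reported.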
