Bootstrapping ScoNe CoT demos with a separate LLM from the main one #260
Hi @okhat!
This is a simple example showing how to use a different (and presumably more powerful) model for bootstrapping examples than the one being used for the central predictions.
The example task is the ScoNe "one scoping negation" category, which is one of the hardest categories in ScoNe. Zero-shot, turbo is at chance. Using turbo to bootstrap full CoT examples did not seem to work well, but using GPT-4 for bootstrapping (a possibility you made me aware of!) took performance all the way north of 85% accuracy (one of my runs was at 93%). This is extremely high, but everything seems to be set up correctly, so this looks like a nice illustration of the power of this strategy.
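For readers skimming the PR, here is a minimal sketch of the pattern, assuming the DSPy OpenAI wrappers and `BootstrapFewShot`'s `teacher_settings` argument; the signature fields, metric, and tiny trainset below are placeholders, not the ones from the notebook:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Student LM: the cheaper model that serves the final predictions.
turbo = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=250)
# Teacher LM: a stronger model used only to bootstrap the CoT demonstrations.
gpt4 = dspy.OpenAI(model="gpt-4", max_tokens=350)

dspy.settings.configure(lm=turbo)


class ScoNeSignature(dspy.Signature):
    """Decide whether the hypothesis follows from the context."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Yes or No")


class ScoNeCoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(ScoNeSignature)

    def forward(self, context, question):
        return self.generate_answer(context=context, question=question)


def exact_match(example, pred, trace=None):
    # Placeholder metric: exact string match on the gold answer.
    return example.answer.strip().lower() == pred.answer.strip().lower()


# Placeholder training examples; the notebook uses the real ScoNe data.
trainset = [
    dspy.Example(context="...", question="...", answer="Yes").with_inputs("context", "question"),
]

# teacher_settings routes demo generation through GPT-4, while the compiled
# program itself still runs on turbo at prediction time.
bootstrapper = BootstrapFewShot(metric=exact_match, teacher_settings=dict(lm=gpt4))
compiled_scone = bootstrapper.compile(ScoNeCoT(), trainset=trainset)
```

The only change from a standard bootstrapping run is the `teacher_settings=dict(lm=gpt4)` argument; everything else stays on the student LM.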
Do let me know if I should make any adjustments to the way the notebook is set up. The hope is that this is code people can copy-paste for their own work (almost none of the DSPy code is even specific to ScoNe).
---Chris