
Add problem solving to evol-instruct #33

Open

walking-octopus opened this issue May 26, 2023 · 3 comments

Comments

@walking-octopus

walking-octopus commented May 26, 2023

Description:

The current performance of Evol-Instruct on math, geometry, and physics problem solving is rather poor. To enhance the overall reasoning and basic math capabilities of WizardLM, I believe more high-school-level physics, algebra, and geometry problems should be present in the dataset. GPT-4 seems to handle such problems mostly fine, so generating this data seems doable; since WizardLM is a much smaller model, it would be quite interesting to see how far it can get.

From what I found, the dataset doesn't contain many physics questions in particular, which tracks with the hallucinated formulas and the inability to reason step by step toward intermediate values. A rough sketch of what this could look like is included below.
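For concreteness, here is a minimal sketch of an extra in-depth evolution step biased toward multi-step quantitative problems. The template wording, the `evolve_quantitative` helper, and the `generate` callable are all hypothetical illustrations, not the actual Evol-Instruct prompts or code:

```python
# Hypothetical extra "in-depth evolving" prompt for Evol-Instruct, biased toward
# multi-step quantitative problems (physics, algebra, geometry).
# The template text below is illustrative only, not the one used by WizardLM.

QUANTITATIVE_EVOLVE_TEMPLATE = """I want you to act as a Prompt Rewriter.
Rewrite the given instruction into a harder, self-contained high-school-level
problem (physics, algebra, or geometry) that requires 2-4 intermediate
calculation steps. Keep it solvable and add only 10 to 20 extra words.

#Given Prompt#:
{instruction}

#Rewritten Prompt#:
"""

def evolve_quantitative(instruction: str, generate) -> str:
    """Apply one quantitative evolution step.

    `generate` is any callable mapping a prompt string to the LLM's completion
    (e.g. a thin wrapper around the ChatGPT API); it is assumed here.
    """
    prompt = QUANTITATIVE_EVOLVE_TEMPLATE.format(instruction=instruction)
    return generate(prompt).strip()


if __name__ == "__main__":
    seed = "A car travels 120 km in 2 hours. What is its average speed?"
    # Trivial stand-in "model" so the sketch runs without an API key.
    fake_llm = lambda p: ("The same car then accelerates uniformly from that "
                          "average speed to 90 km/h over 10 s. How far does it "
                          "travel during the acceleration?")
    print(evolve_quantitative(seed, fake_llm))
```

Presumably such evolved quantitative prompts would be mixed into the existing pool rather than replacing the general ones, so the current instruction-following behavior is preserved.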

@nlpxucan
Owner

Thanks for your valuable suggestion. We found that the skills you mentioned improve when fine-tuning the larger LLaMA model (i.e., 13B). We will continue to think about new ideas to improve these skills.

@walking-octopus
Author

Thank you for the timely response. I'd be interested to see how well the 13B model performs on these questions, but I can't test it myself: I only have 8 GB of RAM and a fairly weak CPU, so I can only play with the model through the Gradio demo or LLaMA.cpp.

Still, I find it fascinating to see how projects like this push the limits of what's possible at such a low parameter count, even attracting the attention of Google and Microsoft (referring to Google's "we have no moat" memo and Microsoft's TinyStories experiment). I wonder whether any meaningful results on this complex task can be achieved at just 7B without training a model from scratch.

@walking-octopus
Author

walking-octopus commented Jun 7, 2023

The newly released WizardLM 13B, whose dataset includes more physics questions, has finally started forming coherent reasoning chains: it correctly does basic calculations, rearranges equations, and solves simple problems about as well as GPT-3.5, which even Guanaco 65B couldn't achieve.

However, interestingly, WizardLM 30B consistently hallucinates an incorrect reasoning chain, giving us snowballing hallucinations that end up with an incorrect answer. Perhaps this can give us some insight into effective scaling and training settings for a given dataset and foundation model.
