Math conf #2

Merged
merged 2 commits into from
Nov 8, 2023

msaroufim (Member) commented Oct 31, 2023

So here we need to decide exactly which difficulty level we want: just opt for 5 everywhere?

Then we can choose whether or not to include chain of thought in the prompt as well.

weiweiy (Contributor) commented Oct 31, 2023

I suggest we create a config for each of the private scenarios; then we can do something similar to this to set a budget and generate the private_sparse_run-config.

weiweiy (Contributor) commented Nov 1, 2023

@msaroufim, can you remind me what level means here again? I'd suggest we opt for either 1 or 3 or 5 and use official sample and chain of thought. In general these are pretty hard math questions. @artidoro what do you think?
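For illustration only, here is a minimal sketch of how run entries for levels 1, 3 and 5 could be enumerated. The subject names follow the MATH dataset. The `math:` entry format and the `use_official_examples` / `use_chain_of_thought` key names are assumptions made for this example, not something confirmed in this PR. `build_math_entries` is a hypothetical helper.

```python
from itertools import product

# Subject names as used in the MATH dataset.
SUBJECTS = [
    "algebra",
    "counting_and_probability",
    "geometry",
    "intermediate_algebra",
    "number_theory",
    "prealgebra",
    "precalculus",
]


def build_math_entries(levels=(1, 3, 5),
                       use_official_examples=True,
                       use_chain_of_thought=True):
    """Return one entry string per (subject, level) pair.

    The key names in the string are assumed for illustration; adapt them
    to whatever the evaluation harness actually expects.
    """
    entries = []
    for subject, level in product(SUBJECTS, levels):
        entries.append(
            f"math:subject={subject},level={level},"
            f"use_official_examples={use_official_examples},"
            f"use_chain_of_thought={use_chain_of_thought}"
        )
    return entries


if __name__ == "__main__":
    for entry in build_math_entries():
        print(entry)
```

With three levels and seven subjects this yields 21 entries. The two flags are kept as independent parameters because a harness may not accept both at once.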

msaroufim (Member, Author) commented Nov 1, 2023

From the paper https://arxiv.org/pdf/2103.03874.pdf - levels follow https://artofproblemsolving.com/wiki/index.php/AoPS_Wiki:Competition_ratings with

  • 5: More difficult AIME problems (10-12), simple proof-based Olympiad-style problems (early JBMO questions, easiest USAJMO 1/4).
  • 1: Problems strictly for beginner, on the easiest elementary school or middle school levels (MOEMS, MATHCOUNTS Chapter, AMC 8 1-20, AMC 10 1-10, AMC 12 1-5, and others that involve standard techniques introduced up to the middle school level), most traditional middle/high school word problems.

And this is how a smaller model like GPT-2 performs as the difficulty increases; the problems are indeed quite hard.

image

@msaroufim msaroufim merged commit 1111ae1 into neurips_eval Nov 8, 2023
3 of 6 checks passed

2 participants