Math conf #2

msaroufim · 2023-10-31T21:48:33Z

So here we need to decide which difficulty level we want. Should we just opt for level 5 everywhere?

Then we can also choose whether or not to include chain of thought in the prompt.

weiweiy · 2023-10-31T22:29:58Z

I suggest we create a config for each of the private scenarios, and then we can do something similar to this to give a budget and generate the private_sparse_run-config.

weiweiy · 2023-11-01T01:04:52Z

@msaroufim, can you remind me what level means here again? I'd suggest we opt for 1, 3, or 5 and use the official samples and chain of thought. In general these are pretty hard math questions. @artidoro, what do you think?
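To make the choice concrete, here is a minimal sketch of how one run config per subject could be enumerated for a chosen level and chain-of-thought setting. The names `MathRunConfig` and `build_configs` and their field names are hypothetical and for illustration only; they are not the format used in this PR. The seven subjects are the ones in the MATH dataset.

```python
from dataclasses import dataclass

# The seven subject areas of the MATH dataset.
SUBJECTS = [
    "algebra",
    "counting_and_probability",
    "geometry",
    "intermediate_algebra",
    "number_theory",
    "prealgebra",
    "precalculus",
]


@dataclass(frozen=True)
class MathRunConfig:
    """Hypothetical container for one run; field names are assumptions."""
    subject: str
    level: int  # MATH difficulty level, 1 (easiest) to 5 (hardest)
    use_chain_of_thought: bool


def build_configs(level: int, use_chain_of_thought: bool = True) -> list[MathRunConfig]:
    """Return one config per subject at the given difficulty level."""
    if level not in range(1, 6):
        raise ValueError(f"level must be between 1 and 5, got {level}")
    return [MathRunConfig(s, level, use_chain_of_thought) for s in SUBJECTS]


if __name__ == "__main__":
    # "Level 5 everywhere", with chain of thought in the prompt.
    for config in build_configs(level=5):
        print(config)
```

Switching to level 1 or 3, or turning chain of thought off, is then a one-argument change rather than an edit to every entry.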

msaroufim · 2023-11-01T04:00:47Z

From the paper https://arxiv.org/pdf/2103.03874.pdf - levels follow https://artofproblemsolving.com/wiki/index.php/AoPS_Wiki:Competition_ratings with

5: More difficult AIME problems (10-12), simple proof-based Olympiad-style problems (early JBMO questions, easiest USAJMO 1/4).
1: Problems strictly for beginner, on the easiest elementary school or middle school levels (MOEMS, MATHCOUNTS Chapter, AMC 8 1-20, AMC 10 1-10, AMC 12 1-5, and others that involve standard techniques introduced up to the middle school level), most traditional middle/high school word problems.

And this is how a smaller model like GPT-2 performs as you increase the difficulty; the problems are indeed quite hard.

Math conf

2f1349e

Merge branch 'neurips_eval' into msaroufim/math

c8e46bc

msaroufim merged commit 1111ae1 into neurips_eval Nov 8, 2023
3 of 6 checks passed
