This repo provides the code, prompts, and the answers we produced for the AAAI 2024 Global Competition on Math Problem Solving and Reasoning.
Our final submitted answer file is at:
/submission/TAL_SAQ6K_EN_prediction.json
Our final result is: {"TAL_SAQ6K_EN_Public_Acc": 45.28, "TAL_SAQ6K_EN_Private_Acc": 46.52}
This README is organized as follows:
- Methods: the main methods used in the competition and the procedure used to obtain the final results.
- Final Submission: location of the submitted answer file.
- Setup: preparation before running the code.
- Usage Demo: steps to run the code quickly.

## Methods
- We first reason with the ChatGLM3-6b-base model using our prompt, which yields an accuracy of 41.96.
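To score a run, the model's free-form completion has to be reduced to a single predicted answer. A common heuristic for chain-of-thought outputs is to take the last number in the completion; the helper below is a hypothetical sketch of that post-processing, not the repo's exact code.

```python
import re

def extract_answer(completion: str):
    """Return the last number in a model completion, or None if there is none.
    Hypothetical post-processing sketch (take-the-last-number heuristic)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None

print(extract_answer("3 bags * 7 apples = 21. The answer is 21."))  # "21"
```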
- We then conducted the fine-tuning experiments in the table below, where:
  - the *10k calculation questions* refer to 10,000 items from the MathGLM arithmetic dataset: [THUDM/MathGLM: Official Pytorch Implementation for MathGLM](https://github.com/THUDM/MathGLM);
  - *8k inference* means 8,000 items from the GSM8K dataset: [openai/grade-school-math](https://github.com/openai/grade-school-math);
  - *20k goat* means 20,000 items from the GOAT dataset: [liutiedong/goat: a Fine-tuned LLaMA that is Good at Arithmetic Tasks](https://github.com/liutiedong/goat).
- Based on the fine-tuned models' results in the table, we vote among those answers together with the answer corresponding to the 41.96 result from the first step: for each question solved multiple times, the answer that occurs most often is chosen as the final answer. After this voting process, the accuracy is 43.19.
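The voting step above is a plain majority vote per question. A minimal sketch (the answer strings are hypothetical, not actual model output):

```python
from collections import Counter

def majority_vote(answers):
    """Pick the most frequent answer across runs/models.

    None entries (failed runs) are ignored; on a tie, the answer
    seen first wins, since Counter preserves insertion order.
    """
    counts = Counter(a for a in answers if a is not None)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical predictions for one question from several models
print(majority_vote(["12", "12", "15", "12", None]))  # "12"
```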
- In addition to this, we use code solving (Program-Aided Language models, PAL: [reasoning-machines/pal: PaL: Program-Aided Language Models (ICML 2023)](https://github.com/reasoning-machines/pal)). We first divided the dataset into 14 classes using the KMeans algorithm, then solved classes 5, 6, 7, 8, 9, 10, 12, and 13 (those suitable for code-based solving), substituting the code-derived answers into the 43.19 answer file.
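The question-clustering step can be sketched with scikit-learn's TF-IDF vectorizer and KMeans. This is an illustrative sketch, not the repo's actual pipeline: the competition run used 14 clusters on the full dataset, while this toy example uses 2 clusters on four made-up questions.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy question set (hypothetical); the competition run clustered
# the full TAL_SAQ6K_EN dataset into 14 classes.
questions = [
    "What is 17 * 24?",
    "Compute 385 / 5.",
    "A triangle has angles 30 and 60 degrees. Find the third angle.",
    "Find the area of a circle with radius 3.",
]

# Represent each question as a TF-IDF vector, then cluster.
features = TfidfVectorizer().fit_transform(questions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # one cluster id per question
```

Clusters whose members look solvable by short programs (arithmetic, unit conversion, etc.) are then routed to the PAL solver.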
- After the above steps, we get the final result:
  {"TAL_SAQ6K_EN_Public_Acc": 45.28, "TAL_SAQ6K_EN_Private_Acc": 46.52}
## Final Submission

After the above methods, we finally submitted the answer file at:
/submission/TAL_SAQ6K_EN_prediction.json
## Setup

- Pull the code repository.
- Go to /chatglm3-6b-base-model/chatglm3-6b-base and pull the ChatGLM3-6b-base model from ModelScope or Hugging Face.
## Usage Demo

- Execute the following script to reason with ChatGLM3-6b-base and our prompt (remember to modify the relevant input and output paths): ./scripts/main.sh
- Run the voting code for the different models: /code/vote/vote.py
- If you want to finetune the ChatGLM model again, you need to clone [ChatGLM3](https://github.com/THUDM/ChatGLM3) first: `cd chatglm3-6b-base-model && git clone https://github.com/THUDM/ChatGLM3.git`. Then, using the data in dataset/finetune/, run the following bash script to finetune the model (remember to modify the max steps and the relevant input and output paths): ./scripts/finetune_lora.sh
- If you want to use the Program-Aided Language models (PAL) approach, just execute the following script (remember to modify the relevant input and output paths): /code/PAL/PAL/scripts/pal_chatglm.py
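The core of the PAL approach is that the model writes a small Python program instead of a final answer, and the program's return value becomes the prediction. A minimal sketch of that execution step, assuming the PAL convention of a `solution()` entry point (the "generated" program below is a hypothetical example, not real model output):

```python
def run_pal_program(program: str):
    """Execute a model-generated PAL program and return solution()'s result."""
    namespace = {}
    exec(program, namespace)        # define solution() in a fresh namespace
    return namespace["solution"]()  # PAL convention: answer is the return value

# Hypothetical model output for a toy word problem
generated = """
def solution():
    # "Tom has 3 bags with 7 apples each. How many apples in total?"
    bags = 3
    apples_per_bag = 7
    return bags * apples_per_bag
"""

print(run_pal_program(generated))  # 21
```

In practice, generated code should be run with a timeout and sandboxing, since model output is untrusted.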
