ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models

This repo provides the source code and data for our paper, ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models (arXiv 2023).

@InProceedings{Chen-ChatCot-2023,
      title = {ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models}, 
      author = {Zhipeng Chen and Kun Zhou and Beichen Zhang and Zheng Gong and Wayne Xin Zhao and Ji-Rong Wen},
      year = {2023},
      eprint = {2305.14323},
      archivePrefix = {arXiv},
      primaryClass = {cs.CL}
}

Data & Code

math/: code for running ChatCoT on the MATH dataset
- demo/: the demonstrations used for in-context learning in the few-shot setting
- math/result/: the result files of the different methods
- math/scripts/: the scripts for running the different methods
- math/ablation: code for the ablation study
- math/self_consistency: code for combining ChatCoT with CoT improvement strategies

The hotpotqa/ folder is organized in the same way as the math/ folder.

Usage

Prepare

Clone the repository and install the required Python packages with pip:

git clone https://github.com/RUCAIBox/ChatCoT.git
cd ChatCoT
pip install -r requirements.txt

Inference

You can run ChatCoT on a sub-task of the MATH dataset with run_turbo_chatcot.sh:

cd math
bash scripts/run_turbo_chatcot.sh

You have to replace YOUR_API_KEY in the code with your OpenAI API key. Note that ChatCoT runs with multi-processing, so you should prepare a list of API keys for the code to run correctly.
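
As a rough illustration of why a list of keys is needed, the sketch below spreads problems across several API keys and processes them in parallel. It is not the repository's implementation: `assign_keys`, `solve`, and `run_all` are hypothetical names, the keys are placeholders, and `solve` is a stand-in that makes no real model call.

```python
from itertools import cycle
from multiprocessing import Pool

# Placeholder keys (assumption): substitute your own OpenAI API keys.
API_KEYS = ["YOUR_API_KEY_1", "YOUR_API_KEY_2"]


def assign_keys(problems, api_keys):
    """Pair each problem with an API key in round-robin order."""
    if not api_keys:
        raise ValueError("at least one API key is required")
    return list(zip(problems, cycle(api_keys)))


def solve(task):
    """Hypothetical worker: a real worker would query the chat model here."""
    problem, api_key = task
    return {"problem": problem, "key_suffix": api_key[-1]}


def run_all(problems, api_keys):
    """Process all problems in parallel, one worker process per API key."""
    tasks = assign_keys(problems, api_keys)
    with Pool(processes=len(api_keys)) as pool:
        return pool.map(solve, tasks)
```

Calling `run_all(problems, API_KEYS)` from under an `if __name__ == "__main__":` guard keeps the process pool safe on platforms that spawn rather than fork. Round-robin assignment is only one possible policy; it keeps the load per key roughly even.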

Evaluate

You can evaluate the results by running eval.sh:

cd math
bash scripts/eval.sh

Results

Main Results

Methods	Algebra	CP	PC	PA	Geometry	IA	NT
CoT	48.10	31.43	21.06	56.60	22.34	18.27	29.07
CoT w/ Tool	35.89	22.57	9.34	40.53	13.57	9.41	19.44
CoT w/ Retri	52.74	32.70	18.86	58.44	29.23	19.93	31.67
ChatCoT	56.11	34.18	23.81	59.24	29.85	19.49	32.59

CP, PC, PA, IA, and NT denote Counting and Probability, Precalculus, Prealgebra, Intermediate Algebra, and Number Theory, respectively.

Methods	HotpotQA
CoT	37.99
CoT w/ Tool	31.42
ChatCoT w/o Feedback	53.79
ChatCoT	59.16

Ablation Study

Methods	PC	Geo	NT
ChatCoT	23.81	29.85	32.59
ChatCoT w/o TK	23.26	29.23	30.56
ChatCoT w/o RATK	19.96	27.35	30.93
ChatCoT w/o MRF	21.61	24.22	32.22

Results of the ablation study. TK, RATK, and MRF denote tool knowledge, retrieval-augmented task knowledge, and the multi-turn reasoning format, respectively, each of which is provided in the early turns of the conversation; "w/o" marks the variant with that component removed.

Combining CoT Improvement Strategies

Methods	CP	NT
CoT	31.43	29.07
CoT + SC	35.23	34.44
ChatCoT	34.18	32.59
ChatCoT + SC	40.08	38.33
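
SC here presumably refers to self-consistency, as the math/self_consistency folder suggests: several reasoning paths are sampled for the same problem and the most frequent final answer is kept. The sketch below shows only the voting step, under that assumption. `majority_vote` is a hypothetical helper, not a function from this repository, and its whitespace-stripping normalization is a simplification.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled final answers.

    Ties are resolved in favor of the answer that was seen first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Light normalization (assumption): strip surrounding whitespace only.
    normalized = [answer.strip() for answer in answers]
    return Counter(normalized).most_common(1)[0][0]


# Final answers from four hypothetical sampled reasoning paths.
sampled = ["12", "15", "12 ", "7"]
voted = majority_vote(sampled)
```

In practice, answers to MATH problems often need stronger normalization (for example, treating equivalent fractions as equal) before voting.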
