In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We compare standard CoT and CCoT prompts to see how conciseness affects response length and correct-answer accuracy, evaluating GPT-3.5 and GPT-4 on a multiple-choice question-and-answer (MCQA) benchmark.
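As a concrete illustration of the two prompt styles, here is a minimal sketch using the OpenAI chat-completions client. The system-prompt strings, the model name, and the sample MCQA item are illustrative placeholders, not the exact prompts used in the experiments (those live in the Prompts directory):

```python
# Minimal sketch of the CoT vs. CCoT comparison. The prompt wording and
# the sample question below are illustrative assumptions, not the exact
# materials used in the experiments.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COT_SYSTEM = "Think step by step and show your reasoning, then give the answer."
CCOT_SYSTEM = "Think step by step, but be concise. Then give the answer."

QUESTION = (
    "Q: What is 15% of 240?\n"
    "A) 24  B) 30  C) 36  D) 48"
)

def ask(system_prompt: str, model: str = "gpt-4") -> str:
    """Send one MCQA item with the given system prompt and return the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    for name, prompt in [("CoT", COT_SYSTEM), ("CCoT", CCOT_SYSTEM)]:
        reply = ask(prompt)
        print(f"{name}: {len(reply.split())} words\n{reply}\n")
```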
CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance. However, on math problems, GPT-3.5 with CCoT incurred a performance penalty of 27.69%. Overall, CCoT led to an average per-token cost reduction of 22.67%.
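To see how response-length savings translate into cost, note that a request's cost is input tokens times the input rate plus output tokens times the output rate, and CCoT shortens only the output. The sketch below uses assumed rates and token counts purely for illustration; it does not reproduce the experiment's actual figures:

```python
# Illustrative cost arithmetic with assumed rates and token counts, not the
# experiment's actual figures. Because CCoT shortens only the output, the
# overall cost savings are smaller than the response-length reduction.
INPUT_RATE = 0.03 / 1000   # hypothetical $ per input token
OUTPUT_RATE = 0.06 / 1000  # hypothetical $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost of one request under the assumed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

cot_cost = request_cost(input_tokens=500, output_tokens=300)
ccot_cost = request_cost(input_tokens=500, output_tokens=154)  # ~48.7% shorter

savings = 1 - ccot_cost / cot_cost
print(f"Cost reduction: {savings:.2%}")
```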
These results have practical implications for AI systems engineers who apply CoT prompt-engineering techniques to solve real-world problems with LLMs. They also provide more general insight for AI researchers studying the emergent behavior of step-by-step reasoning in LLMs.
The repository is organized as follows:

- Source - contains all source code
- Models - contains the model-specific code
- Prompts - contains LLM agent prompt code
- Process - contains the data pre-processing scripts
- Analyze - contains the data analysis scripts
- Exams - contains the test dataset
- Results - contains the high-level test results
- Details - contains the low-level test results
- Logs - contains the experiment event logs
- Plots - contains all data visualizations