The code and data for the paper "Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective".
Code generation aims to understand a problem description and generate the corresponding code snippet. Existing works generally decompose such complex tasks into intermediate steps via prompting strategies such as Chain-of-Thought and its variants. While these studies have achieved some success, their effectiveness is highly dependent on the capabilities of advanced Large Language Models (LLMs) such as GPT-4, particularly in terms of API calls, which significantly limits their practical applicability. Consequently, how to enhance the code generation capabilities of small and medium-scale code LLMs without significantly increasing training costs is an appealing challenge. In this study, we suggest that code comments are the natural logic pivot between natural language and code language, and propose using comments to boost the code generation ability of code LLMs.
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
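As a rough illustration of the logical comment idea, the decoding-time strategy can be thought of as steering the model to first write a comment describing the solution logic before emitting code. The sketch below uses a hypothetical prompt wrapper (build_logical_comment_prompt is not from the released code; the actual template is defined in the inference scripts):

# Minimal sketch of a comment-first prompt wrapper (hypothetical template,
# not the exact one used by MANGO; see infer_humaneval.sh / infer_mbpp.sh for the real setup).
def build_logical_comment_prompt(problem_description: str) -> str:
    # Ask the model to first summarize the solution logic as a code comment,
    # then generate the implementation guided by that comment.
    return (
        problem_description
        + "\n# Let's first write a comment that explains the solution logic step by step,"
        + "\n# then implement the function accordingly.\n"
    )

print(build_logical_comment_prompt('def add(a, b):\n    """Return the sum of a and b."""\n'))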
For more details, please refer to our paper on arXiv.
peft
deepspeed==0.9.3
accelerate==0.21.0
torch==2.0.1
human_eval
tqdm
transformers==4.33.0
tokenizers==0.13.3
datasets
tensorboardx
The raw training data train_data/codem-python.json is from https://github.com/NL2Code/CodeM, and the post-processed data is train_data/python_neg_contrastive.json.
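A quick way to inspect the post-processed contrastive data is to load it and look at its structure; treat this as a sketch, since the exact field names depend on the released file:

import json

# Load the post-processed contrastive training data and peek at its structure.
# The exact field names depend on the released file, so adjust accordingly.
with open("train_data/python_neg_contrastive.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(f"{len(data)} training examples")
print(data[0].keys() if isinstance(data, list) else data.keys())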
The test sets include HumanEval and MBPP, and the test files are located in testsets.
The file testsets/new_humaneval_revised.jsonl is a reconstructed HumanEval test set that unifies the assertion error information.
Train a model via comment contrastive learning
bash train.sh
Run inference with the trained model using the logical comment prompting strategy.
bash infer_humaneval.sh
bash infer_mbpp.sh
Evaluate the output results
output_path= # set your inference result path here
python src/process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt > eval_human.log
evaluate_functional_correctness ${output_path}.jsonl
The revised HumanEval test file is intended for analyzing the error types of model outputs; you can use the following commands together with the script result_statistic.py to obtain the statistics.
output_path= # set your inference result path here
evaluate_functional_correctness ${output_path}.jsonl --problem_file=testsets/new_humaneval_revised.jsonl
python result_statistic.py --path ${output_path}.jsonl_results.jsonl
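As a simplified sketch of the same idea (the actual result_statistic.py may categorize errors differently), the per-sample results written by human_eval can be grouped by their recorded "result" string to get coarse error-type counts:

import json
from collections import Counter

# Group each sample by the "result" string that human_eval records
# (e.g. "passed", "failed: AssertionError", ...). The path below is a placeholder;
# use your ${output_path}.jsonl_results.jsonl file.
counter = Counter()
with open("samples.jsonl_results.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if record["passed"]:
            counter["passed"] += 1
        else:
            # Keep only the leading error description to form coarse error types.
            counter[record["result"].split("\n")[0]] += 1

for error_type, count in counter.most_common():
    print(f"{count:4d}  {error_type}")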
Please cite our work if you find the paper or code helpful.
@misc{chen2024comments,
title={Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective},
author={Yijie Chen and Yijin Liu and Fandong Meng and Yufeng Chen and Jinan Xu and Jie Zhou},
year={2024},
eprint={2404.07549},
archivePrefix={arXiv},
primaryClass={cs.CL}
}