The code and data for the paper "Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective".
Code generation aims to understand a problem description and generate the corresponding code snippet. Existing works generally decompose such complex tasks into intermediate steps via prompting strategies such as Chain-of-Thought and its variants. While these studies have achieved some success, their effectiveness is highly dependent on the capabilities of advanced Large Language Models (LLMs) such as GPT-4, particularly in terms of API calls, which significantly limits their practical applicability. Consequently, how to enhance the code generation capabilities of small and medium-scale code LLMs without significantly increasing training costs is an appealing challenge. In this study, we suggest that code comments are the natural logic pivot between natural language and code language, and propose using comments to boost the code generation ability of code LLMs.
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
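As a rough illustration of the logical comment idea, the decoding-time strategy can be thought of as steering the model to first write a comment describing the solution logic before emitting code. The sketch below uses a hypothetical prompt wrapper (build_logical_comment_prompt is not from the released code; the actual template is defined in the inference scripts):

# Minimal sketch of a comment-first prompt wrapper (hypothetical template,
# not the exact one used by MANGO; see infer_humaneval.sh / infer_mbpp.sh for the real setup).
def build_logical_comment_prompt(problem_description: str) -> str:
    # Ask the model to first summarize the solution logic as a code comment,
    # then generate the implementation guided by that comment.
    return (
        problem_description
        + "\n# Let's first write a comment that explains the solution logic step by step,"
        + "\n# then implement the function accordingly.\n"
    )

print(build_logical_comment_prompt('def add(a, b):\n    """Return the sum of a and b."""\n'))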
For more details, please refer to our paper on arXiv.
peft
deepspeed==0.9.3
accelerate==0.21.0
torch==2.0.1
human_eval
tqdm
transformers==4.33.0
tokenizers==0.13.3
datasets
tensorboardx
The raw training data train_data/codem-python.json is from https://github.com/NL2Code/CodeM, and the post-processed data is train_data/python_neg_contrastive.json.
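A quick way to inspect the post-processed contrastive data is to load it and look at its structure; treat this as a sketch, since the exact field names depend on the released file:

import json

# Load the post-processed contrastive training data and peek at its structure.
# The exact field names depend on the released file, so adjust accordingly.
with open("train_data/python_neg_contrastive.json", "r", encoding="utf-8") as f:
    data = json.load(f)

print(f"{len(data)} training examples")
print(data[0].keys() if isinstance(data, list) else data.keys())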
The test sets include HumanEval and MBPP, and the test files are located in testsets.
The file testsets/new_humaneval_revised.jsonl is a reconstructed HumanEval test set that unifies the assertion error information.
Train a model via comment contrastive learning
bash train.sh
Run inference with the trained model using the logical comment prompting strategy.
bash infer_humaneval.sh
bash infer_mbpp.sh
Evaluate the output results
output_path= # set your inference result path here
python src/process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt > eval_human.log
evaluate_functional_correctness ${output_path}.jsonl
The revised HumanEval test file is intended for analyzing the error types of model outputs; you can use the following commands together with the script result_statistic.py to obtain the statistics.
output_path= # set your inference result path here
evaluate_functional_correctness ${output_path}.jsonl --problem_file=testsets/new_humaneval_revised.jsonl
python result_statistic.py --path ${output_path}.jsonl_results.jsonl
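As a simplified sketch of the same idea (the actual result_statistic.py may categorize errors differently), the per-sample results written by human_eval can be grouped by their recorded "result" string to get coarse error-type counts:

import json
from collections import Counter

# Group each sample by the "result" string that human_eval records
# (e.g. "passed", "failed: AssertionError", ...). The path below is a placeholder;
# use your ${output_path}.jsonl_results.jsonl file.
counter = Counter()
with open("samples.jsonl_results.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if record["passed"]:
            counter["passed"] += 1
        else:
            # Keep only the leading error description to form coarse error types.
            counter[record["result"].split("\n")[0]] += 1

for error_type, count in counter.most_common():
    print(f"{count:4d}  {error_type}")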
Please cite our work if you find the paper or code helpful.
@misc{chen2024comments,
title={Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective},
author={Yijie Chen and Yijin Liu and Fandong Meng and Yufeng Chen and Jinan Xu and Jie Zhou},
year={2024},
eprint={2404.07549},
archivePrefix={arXiv},
primaryClass={cs.CL}
}