CodeGLEU

While CodeBLEU performs well on sentence-to-sentence comparisons or comparisons between small methods (Ren et al., 2020), the score quickly degrades when used to compare large methods or files that were only partially modified.

This stems from a deficit in the original BLEU with regard to text-modification and synthesis tasks. Since BLEU measures n-gram precision, simply repeating the source text yields a high score in monolingual tasks (Napoles et al., 2015): the smaller the changes made by the references, the higher the score of the unmodified source sentence.

In the context of CodeBLEU and code modification, this means models are often rewarded not just for making appropriate changes but also for making as few changes as possible, severely polluting the metric.
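
As a rough, self-contained illustration (assuming NLTK is installed; the example is mine, not taken from the cited papers): a "hypothesis" that is just the unmodified, still-buggy source scores highly against the reference that fixes it, because most tokens survive the change.

```python
# Illustrative only: the unchanged, buggy source scores roughly 0.88 BLEU
# against the reference that fixes it.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

source = "def mean ( xs ) : return sum ( xs ) / len ( xs ) - 1".split()   # buggy
reference = "def mean ( xs ) : return sum ( xs ) / len ( xs )".split()    # fixed

smooth = SmoothingFunction().method1
score = sentence_bleu([reference], source, smoothing_function=smooth)     # "model output" = source
print(f"BLEU of the unchanged source against the reference fix: {score:.2f}")
```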

This deficit was resolved for textual tasks by GLEU (Napoles et al., 2015), which introduces a penalty for n-grams that should have been modified but were not, bringing the metric into alignment with human judgements even on complex grammatical error correction tasks.
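
The same mechanism translates directly to tokenised code. Below is a minimal sketch of the core idea only (my simplification, not the full formula from the papers, which also involves a brevity penalty, multiple n-gram orders and reference sampling): n-grams that the hypothesis shares with the source but that are absent from the reference are subtracted from the match count.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, with multiplicity."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def gleu_precision(source, reference, hypothesis, n):
    """GLEU-style modified n-gram precision: reward overlap with the reference,
    penalise n-grams copied over from the source that the reference removed."""
    s, r, h = ngrams(source, n), ngrams(reference, n), ngrams(hypothesis, n)
    matches = sum((h & r).values())          # hypothesis ∩ reference
    penalty = sum(((h & s) - r).values())    # kept from source, absent from reference
    return max(matches - penalty, 0) / max(sum(h.values()), 1)
```

Applied to the example above, simply echoing the buggy source now loses credit for every n-gram overlapping the part of the source that the reference changed.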

No similarly improved metric exists yet for the evaluation of code synthesis. I am therefore introducing the CodeGLEU metric, which aims to extend CodeBLEU in the same way that GLEU extends BLEU.
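
CodeBLEU aggregates four components (n-gram match, keyword-weighted n-gram match, AST/syntax match and data-flow match) via a weighted sum with default weights of 0.25 each (Ren et al., 2020). CodeGLEU is intended to keep that aggregation and swap each component for a GLEU-style, source-penalised variant; the sketch below shows only the aggregation, with the component scores standing in for the rewrites listed in the TODO:

```python
def combine_codegleu(ngram, weighted_ngram, syntax, dataflow,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four component scores, mirroring CodeBLEU's
    default alpha = beta = gamma = delta = 0.25. Each component is assumed
    to already be a GLEU-style score in [0, 1]."""
    alpha, beta, gamma, delta = weights
    return alpha * ngram + beta * weighted_ngram + gamma * syntax + delta * dataflow
```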

TODO

  • Rewrite ngram_match_score to use GLEU formula
  • Rewrite weighted ngram_match_score to use GLEU formula
  • Rewrite syntax_match_score to use GLEU formula
  • Rewrite dataflow_match_score to use GLEU formula
  • Write Tests
  • Evaluate on suitable dataset
  • Ensure CodeGLEU without penalty == CodeBLEU (see the test sketch after this list)
  • Are 4-grams enough? Minor improvement (+0.02%) in correlation at 10 n-grams vs. 4
  • Check n-gram correlation: all about equivalent, at 0.14 or so
  • Check correlation with (lines changed / total lines): BLEU 0.8, CodeBLEU 0.65, CodeGLEU 0.5 - still bad
  • Limit scoring to only the changed lines / snippets?
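
For the penalty-free equivalence item above, here is a self-contained analogue of the check, reusing the gleu_precision sketch from earlier (the actual CodeGLEU/CodeBLEU API is not pinned down yet): when the hypothesis keeps nothing from the source that the reference removed, the penalty term is zero and the GLEU-style precision collapses to plain clipped n-gram precision.

```python
from collections import Counter

def clipped_precision(reference, hypothesis, n):
    """Plain BLEU-style clipped n-gram precision, no source penalty."""
    r = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    h = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    return sum((h & r).values()) / max(sum(h.values()), 1)

def test_no_penalty_equals_plain_precision():
    source = "if x == None : return 0".split()
    reference = "if x is None : return 0".split()
    hypothesis = "if x is None : return 0".split()   # the ideal fix keeps nothing buggy
    assert gleu_precision(source, reference, hypothesis, 2) == \
        clipped_precision(reference, hypothesis, 2)
```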

Related Works / Citations

Papineni et al., 2002: BLEU: a Method for Automatic Evaluation of Machine Translation
Ren et al., 2020: CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
Napoles et al., 2015: Ground Truth for Grammatical Error Correction Metrics
Napoles et al., 2016: GLEU Without Tuning
