## Main updates
- HumanEval+ and MBPP+ datasets are now on the Hugging Face Hub (see the loading sketch after this list).
- HumanEval+ has been ported to the original HumanEval format, and the release files have a new home.
- You can now use EvalPlus through bigcode-evaluation-harness.
- The Docker image now uses Python 3.10, since some models may generate Python code using the latest syntax, leading to false positives under older Python versions.
- The sanitizer is now merged into the package (a usage sketch follows this list).
- Several improvements and bug fixes to the sanitizer.
- Test suite reduction has moved to `tools`.
- Fixed the `CACHE_DIR` non-existence issue.
- Simplified the format of `eval_results.json` for readability.
- Use the `EVALPLUS_TIMEOUT_PER_TASK` environment variable to set the maximum testing time for each task (see the sketch after this list).
- The timeout per test is set to 0.5s by default.
- Fixed argument validation for `inputgen.py`.
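As a quick illustration of the Hub release, here is a minimal loading sketch; the dataset IDs, the `test` split, and the field name are assumptions for illustration, not confirmed by this release note:

```python
# Minimal loading sketch. The dataset IDs ("evalplus/humanevalplus",
# "evalplus/mbppplus"), the "test" split, and the "task_id" field are
# assumptions for illustration.
from datasets import load_dataset

humanevalplus = load_dataset("evalplus/humanevalplus", split="test")
mbppplus = load_dataset("evalplus/mbppplus", split="test")

print(humanevalplus[0]["task_id"])  # e.g. "HumanEval/0"
```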
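Since the sanitizer now ships inside the package, it can be invoked as a module. A hedged sketch follows; the module path and the `--samples` flag are assumptions based on the package layout:

```python
# Hedged sketch: run the bundled sanitizer over a samples file.
# The module path "evalplus.sanitize" and the "--samples" flag are
# assumptions for illustration.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "evalplus.sanitize", "--samples", "samples.jsonl"],
    check=True,
)
```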
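And a sketch of raising the per-task testing budget via the environment variable; only `EVALPLUS_TIMEOUT_PER_TASK` comes from this release note, while the evaluator module path and CLI flags are assumptions for illustration:

```python
# Sketch: raise the per-task testing cap to 60 seconds via the env var
# named in this release. The evaluator module path and CLI flags below
# are assumptions for illustration.
import os
import subprocess
import sys

env = dict(os.environ, EVALPLUS_TIMEOUT_PER_TASK="60")
subprocess.run(
    [sys.executable, "-m", "evalplus.evaluate",
     "--dataset", "humaneval", "--samples", "samples.jsonl"],
    env=env,
    check=True,
)
```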
## Dataset maintenance
- `HumanEval/32`: fixed the oracle.
## Supported codegen models
- The EvalPlus leaderboard now lists 82 models, including:
  - WizardCoders
  - Stable Code
  - OpenCodeInterpreter
  - Anthropic API
  - Mistral API
  - CodeLlama Instruct
  - Phi-2
  - Solar
  - Dolphin
  - OpenChat
  - CodeMillenials
  - Speechless
  - xdan-l1-chat
  - etc.
PyPI: https://pypi.org/project/evalplus/0.2.1/
Docker Hub: https://hub.docker.com/layers/ganler/evalplus/v0.2.1/images/sha256-2bb315e40ea502b4f47ebf1f93561ef88280d251bdc6f394578c63d90e1825d7