Dataset of Codex-generated tests for the CodaMOSA project, used in the evaluation submitted to ICSE'23.
- `README.md`: this file.
- `final-exp`: the directory containing the results of the "main" CodaMOSA experiments. The results are zipped per benchmark. After you unzip one (or multiple) `BENCHMARK.zip` files, you will have the following directory structure:
  - `TECHNIQUE`: there are seven techniques we report results for. `mosa` and `codex-only` are our two baselines. `codamosa-0.8-uninterp` is the main CodaMOSA; the other techniques are ablations: `codamosa-0.8` uses no uninterpreted statements; `codamosa-0.2-uninterp` uses a lower sampling temperature; `codamosa-0.8-uninterp-no-targeting` samples target functions randomly rather than by coverage; `codamosa-0.8-uninterp-small` adds a small test case to the prompts passed to Codex. Whatever the technique, the data inside is structured as follows:
    - `BENCHMARK-i`: the *i*th run of the configuration on the benchmark `BENCHMARK`, which corresponds to a particular Python module:
      - `codex_generations.py`: (for all CodaMOSA variants + `codex-only`) the raw tests generated by Codex.
      - `codamosa_timeline.csv`: (for all CodaMOSA variants) a CSV where the first column is the time in seconds at which one round of targeted generation ended, and the second column is the cumulative number of Codex-generated test cases that were accepted at that time.
      - `statistics.csv`: all the Pynguin-collected statistics. See the header of the file for the name of each statistic, and the `runtime_variable.py` file in the source code for a detailed description of each.
      - `test_MODULE.py`: the test cases generated at the end of the search that do not throw exceptions.
      - `test_MODULE_failing.py`: the test cases generated at the end of the search that throw exceptions.
- `packages-exp.zip`: this zip file contains the results of a small experiment on the motivating example in CodaMOSA, showing that better performance is achieved when doctests are removed from the documentation of the function under test.