24 May 11:04

e5dd851

v0.1.1 - First PyPI-published release

CleanTest-Agent v0.1.1

First PyPI-published release.

Install

pip install cleantest-agent

This command works as of this release; v0.1.0 was a source-only
release on GitHub.

What changed since v0.1.0

This is a packaging release. No public API or numerical claim has
changed; the pipeline, skills, and Filter 3 model-mode metrics are
identical to v0.1.0. It is cut as a new version because PyPI does not
accept the v0.1.0 sdist and wheel, which were built before the
metadata migration described below, and does not allow re-uploading
files under an existing version number.

Packaging

- Migrated to PEP 639 for license metadata. pyproject.toml now
  declares license = "MIT" and license-files = ["LICENSE"]; the
  legacy License :: OSI Approved :: MIT License classifier is removed.
- Bumped build-backend pin to setuptools >= 77, the first release
  with full PEP 639 support.
- Expanded classifiers with Development Status :: 4 - Beta,
  Intended Audience :: Developers, Intended Audience :: Science/Research,
  and Topic :: Scientific/Engineering :: Artificial Intelligence.
- Broadened keywords with test-generation, skill-md, methods2test,
  tree-sitter, and aho-corasick.
- Added Project URLs: Documentation, Issues, Changelog, and a direct
  link to the v0.1.0 paper PDF asset.
- Tightened the description to a single concrete sentence:
  "Rule-first three-filter pipeline for cleaning unit-test training
  data, packaged as four SKILL.md skills."
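
As a rough illustration of the license-metadata change above, the sketch below checks a parsed [project] table for the PEP 639 shape. The dictionary mirrors only the fields quoted in these notes, and the helper name pep639_problems is hypothetical; it is not part of cleantest-agent.

```python
"""Sketch: sanity-check PEP 639 license metadata in a parsed [project] table."""


def pep639_problems(project: dict) -> list:
    """Return human-readable problems; an empty list means the table looks fine."""
    problems = []
    # PEP 639: `license` is a plain SPDX expression string, not a table.
    if not isinstance(project.get("license"), str):
        problems.append("license should be an SPDX expression string")
    # PEP 639: `license-files` is a list of paths or glob patterns.
    files = project.get("license-files")
    if not (isinstance(files, list) and all(isinstance(f, str) for f in files)):
        problems.append("license-files should be a list of strings")
    # PEP 639 deprecates license classifiers in favour of the SPDX expression.
    for classifier in project.get("classifiers", []):
        if classifier.startswith("License ::"):
            problems.append(f"remove legacy classifier: {classifier}")
    return problems


# Fields as quoted in the release notes (other keys omitted).
project = {
    "license": "MIT",
    "license-files": ["LICENSE"],
    "classifiers": [
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
    ],
}
print(pep639_problems(project))
```

An empty list means none of the three checks fired. A pre-migration table that still uses a license table and the legacy classifier would report one problem per check.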

CD pipeline

- New tag-driven publish workflow at .github/workflows/publish.yml.
  Pushing a v* tag triggers build -> twine check --strict -> upload
  to PyPI via OIDC Trusted Publisher (no API token) -> sigstore
  keyless signing -> attach signed sdist + wheel + sigstore bundles
  to the matching GitHub Release with gh release upload --clobber.
- Operations runbook added at docs/PYPI-PUBLISHING.md: one-time
  Trusted Publisher claim, per-release tagging flow, local dry-run
  checklist, offline sigstore verification command, and the
  yank-vs-delete policy.
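
Because the workflow is triggered by the tag name, a local pre-push check that the tag and the package version agree can catch mistakes early. The following is a minimal sketch under stated assumptions: version_from_tag and tag_matches_version are hypothetical helpers, and fnmatch only approximates GitHub Actions' own glob matching.

```python
from fnmatch import fnmatchcase
from typing import Optional


def version_from_tag(tag: str, pattern: str = "v*") -> Optional[str]:
    """Return the version encoded in a release tag, or None if it does not match."""
    if not fnmatchcase(tag, pattern):
        return None
    version = tag[1:]  # drop the leading "v"
    return version or None  # a bare "v" carries no version


def tag_matches_version(tag: str, package_version: str) -> bool:
    """True when the tag would trigger the workflow and names this version."""
    return version_from_tag(tag) == package_version


print(tag_matches_version("v0.1.1", "0.1.1"))
print(tag_matches_version("release-0.1.1", "0.1.1"))
```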

Documentation

- PyPI-friendly README. The hero image now references
  raw.githubusercontent.com instead of a repo-relative path so it
  renders on the PyPI project page. The Quick Start section leads
  with pip install cleantest-agent and demotes the source-install
  path to a "for development" alternative.
- PyPI version badge added between the CI badge and the Python
  versions badge.

CI / quality

- All flake8 (F401) and mypy (union-attr, return-value) warnings
  flagged on the v0.1.0 push are fixed.
- GitHub Actions matrix runs green on Python 3.10, 3.11, and 3.12.
- actions/checkout, actions/setup-python, and codecov/codecov-action
  bumped to Node.js 24-compatible versions (v5 / v6 / v5).

Verifying the published wheel

To confirm a clean install works:

python -m venv /tmp/cta-verify && source /tmp/cta-verify/bin/activate
pip install cleantest-agent==0.1.1
cleantest --help
python -c "import cleantest_agent; print('ok')"
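
The same check can be scripted from Python using the standard library's importlib.metadata, which reads the installed distribution's metadata without importing the package. This is a small sketch; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("cleantest-agent")
if found is None:
    print("cleantest-agent is not installed in this environment")
elif found != "0.1.1":
    print(f"unexpected version installed: {found}")
else:
    print("cleantest-agent 0.1.1 is installed")
```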

To verify the sigstore signature offline:

pip install sigstore
sigstore verify identity \
    --bundle cleantest_agent-0.1.1-py3-none-any.whl.sigstore.json \
    --cert-identity 'https://github.com/jimmy0717/cleantest-agent/.github/workflows/publish.yml@refs/tags/v0.1.1' \
    --cert-oidc-issuer 'https://token.actions.githubusercontent.com' \
    cleantest_agent-0.1.1-py3-none-any.whl
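
The certificate identity passed above is the repository, the workflow path, and the tag ref joined into one URL. The sketch below rebuilds that string and the argument list for a given artifact; both helper names are hypothetical, and only the flags shown in the command above are used.

```python
def github_cert_identity(repo: str, workflow: str, tag: str) -> str:
    """Build the identity of a workflow run that was triggered by a tag push."""
    return f"https://github.com/{repo}/.github/workflows/{workflow}@refs/tags/{tag}"


def sigstore_verify_argv(
    artifact: str,
    identity: str,
    issuer: str = "https://token.actions.githubusercontent.com",
) -> list:
    """Assemble the `sigstore verify identity` call for one artifact."""
    return [
        "sigstore", "verify", "identity",
        "--bundle", f"{artifact}.sigstore.json",  # bundle sits next to the artifact
        "--cert-identity", identity,
        "--cert-oidc-issuer", issuer,
        artifact,
    ]


identity = github_cert_identity("jimmy0717/cleantest-agent", "publish.yml", "v0.1.1")
argv = sigstore_verify_argv("cleantest_agent-0.1.1-py3-none-any.whl", identity)
print(" ".join(argv))
```

Passing argv to subprocess.run(argv, check=True) would run the verification, assuming sigstore is installed and the bundle has been downloaded alongside the wheel.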

Citation

The bibtex stanza for this release is unchanged from v0.1.0; this
is a packaging release, not a new artefact.

Acknowledgements

Same as v0.1.0. See
https://github.com/jimmy0717/cleantest-agent/releases/tag/v0.1.0.

Git tag: v0.1.1
Date: 2026-05-24

24 May 08:45

jimmy0717

v0.1.0

b8b9498

v0.1.0 - First public release

CleanTest-Agent v0.1.0

First public release.

CleanTest-Agent removes noisy (focal_method, test_case) pairs from
unit-test training corpora such as Methods2Test, ATLAS, and any dataset
that follows the same schema. It is a from-scratch reimplementation of
the CleanTest pipeline (Zhang et al., FSE 2025 Distinguished Paper),
restructured as four composable Agent Skills that drop into CodeBuddy,
Claude Code, Cursor, or any assistant that follows the SKILL.md
protocol.

This release covers the full three-filter pipeline, the fine-tuned
Qwen2.5-Coder-0.5B coverage regressor, the 36-test pytest suite, and
the four shippable skills.

Highlights

- Three-filter pipeline -- syntax (AST + Aho-Corasick over a
  21,954-pattern dictionary), relevance (AST name matching with an
  optional 5-rule LLM reflection step), and coverage (JaCoCo-label
  scan or model-mode regression).
- Hybrid mode beats pure-LLM -- F1 0.965 in under 60 s on a
  500-sample stratified subset of Methods2Test, versus 0.307 / 0.387
  for LLM zero-shot / few-shot baselines that take ~25 minutes each.
- Filter 3 without JaCoCo -- a fine-tuned Qwen2.5-Coder-0.5B
  predicts branch coverage with held-out MAE 0.0309 (~2.6x lower than
  the CodeGPT baseline reported in the original paper).
- Four SKILL.md skills -- cleantest-pipeline,
  cleantest-syntax-filter, cleantest-relevance-filter, and
  cleantest-coverage-filter. Install with make install.
- Cost -- ~$4.5 to clean the full 593,953-sample Methods2Test
  corpus end-to-end with DeepSeek-V4-Flash, versus ~$35-58 for a
  single-LLM-per-sample pipeline.
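
For readers unfamiliar with Aho-Corasick, the sketch below shows why it suits a large pattern dictionary: all patterns are compiled into one automaton and the text is scanned once. It is a generic textbook version, not the package's implementation, and the example patterns are illustrative stand-ins rather than entries from the bundled dictionary.

```python
from collections import deque


def build_automaton(patterns):
    """Compile patterns into a trie with failure links (Aho-Corasick)."""
    goto, fail, out = [{}], [0], [[]]  # node 0 is the root
    for pattern in patterns:
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pattern)
    # Breadth-first pass: a node's failure link points at the longest
    # proper suffix of its path that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            fallback = fail[node]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[child] = goto[fallback].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in a single pass."""
    goto, fail, out = automaton
    node, hits = 0, []
    for index, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            hits.append((index - len(pattern) + 1, pattern))
    return hits


# Illustrative patterns only; the real dictionary holds 21,954 entries.
automaton = build_automaton(["@Ignore", "@Disabled", "Ignore"])
print(find_all("@Ignore public void t() {}", automaton))
```

Scan time grows with the length of the text plus the number of matches, rather than with the number of patterns, which is what makes a dictionary of this size practical.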

Installation

git clone https://github.com/jimmy0717/cleantest-agent.git
cd cleantest-agent
pip install -e ".[dev]"

To install the four skills into the local CodeBuddy directory:

make install

Quick start

# Bundled 5,000-row sample, no API needed:
cleantest --input_csv data/sample_5000.csv --output_dir output/

# With the optional LLM relevance check + reflection:
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
cleantest --input_csv data/sample_5000.csv \
          --output_dir output/ \
          --llm_enhance --reflection

A noise report is written to output/noise_report.json plus a
human-readable Markdown summary.
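
When driving the CLI from a script, the two invocations above can be assembled from one helper. This is a sketch that uses only the flags shown in this section; cleantest_command is a hypothetical name, and whether --reflection depends on --llm_enhance is not assumed either way.

```python
import shlex


def cleantest_command(input_csv, output_dir, llm_enhance=False, reflection=False):
    """Build the argument list for one cleantest run."""
    argv = ["cleantest", "--input_csv", input_csv, "--output_dir", output_dir]
    if llm_enhance:
        argv.append("--llm_enhance")
    if reflection:
        argv.append("--reflection")
    return argv


rules_only = cleantest_command("data/sample_5000.csv", "output/")
with_llm = cleantest_command(
    "data/sample_5000.csv", "output/", llm_enhance=True, reflection=True
)
print(shlex.join(rules_only))
print(shlex.join(with_llm))
```

Passing either list to subprocess.run(argv, check=True) runs the pipeline; the LLM variant also needs OPENAI_API_KEY and OPENAI_BASE_URL set in the environment, as shown above.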

What is included

- cleantest_agent/ -- installable Python package with the
  orchestrator, tree-sitter AST utilities, OpenAI-compatible LLM
  wrapper, and the bundled 21,954-pattern annotation dictionary.
- skills/ -- four SKILL.md skill bundles, each independently usable.
- tests/ -- 36 pytest cases covering each filter, the orchestrator,
  and the report generator.
- experiments/ -- baseline runner with real DeepSeek API calls,
  per-sample predictions for the 500-sample evaluation, and the
  end-to-end Filter 3 training notebook.
- data/sample_5000.csv -- a 5,000-row stratified subset of
  Methods2Test, redistributed under the upstream MIT licence.
- docs/ -- skill distribution guide, code-assistant usage guide,
  Baidu AI Studio training guide, and the hero-image prompt.
- report/ -- the 67-page LaTeX paper (acmart acmlarge,nonacm).
- .github/workflows/ci.yml -- CI matrix on Python 3.10 / 3.11 / 3.12.

Reproducibility

The four numeric claims in the README are reproducible from this
release:

| Claim | How to reproduce |
| --- | --- |
| F1 = 1.000 (rule-based) on the 500-sample subset | `python experiments/run_baselines.py --method rules` |
| F1 = 0.965 (hybrid) on the same subset | `python experiments/run_baselines.py --method hybrid` (requires `OPENAI_API_KEY`) |
| MAE = 0.0309 (Qwen 0.5B Filter 3) | `experiments/main-final.ipynb` end-to-end, or `experiments/results/coverage_run/test_metrics.json` for the archived numbers |
| Wall-clock < 3 min on Methods2Test | `cleantest --input_csv methods2test_full.csv --output_dir out/` after running the upstream Methods2Test export |

The 500-sample stratified subset, the per-sample predictions for every
baseline, and the full Filter 3 training metrics are checked into
experiments/results/.
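
For reference, the two metrics in the table can be computed as follows. This sketch uses the standard definitions on toy data; it does not read the archived result files, whose layout is not described here, and which class counts as positive is left to the caller.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0  # no true positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(targets, predictions):
    """Average absolute gap between predicted and reference values."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


# Toy labels and coverage values, for illustration only.
f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
mae = mean_absolute_error([0.5, 0.8, 1.0], [0.4, 0.9, 1.0])
print(f"F1 = {f1:.3f}, MAE = {mae:.4f}")
```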

Compatibility

- Python 3.10, 3.11, 3.12 (CI-tested).
- macOS, Linux, Windows (tree-sitter wheels available for all three).
- Filter 3 model mode requires PyTorch >= 2.1 and a Hugging Face
  account with access to Qwen/Qwen2.5-Coder-0.5B. The default
  label mode has no extra dependencies.
- Any OpenAI-compatible chat endpoint works for the relevance LLM
  step. The numbers in the paper are from DeepSeek-V4-Flash.

Known limitations

- The annotation dictionary is Java-only. C# / Kotlin / Python test
  corpora work with Filter 1's AST checks but will not benefit from
  the 21,954-pattern Aho-Corasick scan.
- Filter 3 model mode is trained on Java JaCoCo labels; predictions
  on non-Java tests are not validated.
- Reflection on Filter 2 is opt-in and additive. The headline
  F1 = 0.965 was measured without reflection; enabling it produced
  a small recall improvement on a 50-sample borderline subset that
  is too small to claim a corpus-level gain. See report Section 6.2.
- Filter 2 LLM mode currently issues one request per borderline
  sample; batched requests are tracked as a future improvement
  (report Section 9.3).
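
To make the last limitation concrete: grouping borderline samples into fixed-size batches reduces the number of requests from one per sample to one per batch. The sketch below shows only that grouping step; it is not the planned implementation, and the batch size of 8 is an arbitrary illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 50 placeholder sample ids stand in for borderline (focal_method, test_case) pairs.
borderline = list(range(50))
batches = list(batched(borderline, batch_size=8))
print(f"{len(borderline)} samples -> {len(batches)} requests")
```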

Datasets and licences

- Code: MIT.
- data/sample_5000.csv: derivative of Microsoft Methods2Test (MIT),
  redistributed under the same terms; see data/README.md for the
  full attribution.
- The 21,954-pattern annotation dictionary in
  cleantest_agent/data/noise_modifier_fm.txt is reconstructed from
  the public CleanTest replication package (FSE 2025) and is
  redistributed under MIT.

Citation

If you use CleanTest-Agent in academic work, please cite both the
original CleanTest paper and this implementation:

@inproceedings{zhang2025cleantest,
  title     = {Less is More: On the Importance of Data Quality for Unit Test Generation},
  author    = {Zhang, Junwei and Hu, Xing and Gao, Shan and Xia, Xin and Lo, David and Li, Shanping},
  booktitle = {Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (FSE)},
  year      = {2025},
  note      = {Distinguished Paper Award; arXiv:2502.14212}
}

@misc{yang2026cleantestagent,
  title  = {{CleanTest-Agent}: A Multi-Agent Skill-Orchestrated System for Unit Test Training Data Quality Assurance},
  author = {Yang, Yong},
  year   = {2026},
  howpublished = {\url{https://github.com/jimmy0717/cleantest-agent}}
}

Acknowledgements

- Zhang et al. for the original CleanTest definitions and the public
  replication package.
- Microsoft for releasing Methods2Test under MIT.
- The Qwen team for releasing Qwen2.5-Coder-0.5B under Apache 2.0.
- Reviewers in the Software Requirements Analysis and System Design
  course at the School of Software, Beihang University, whose
  feedback shaped the final structure of the report and the skills.

What's next

Tracked for a follow-up release (see report/main.tex Section 9.3
for the full list):

- Batched Filter 2 LLM requests, expected to reduce wall-clock by
  approximately 5x on the borderline subset.
- Multi-language Filter 1 (extend the Aho-Corasick dictionary beyond
  Java).
- A cleantest-eval skill that takes a labelled validation slice and
  reports precision / recall / F1 directly inside the assistant.

Issues and pull requests are welcome -- see CONTRIBUTING.md for
the development workflow.

Full diff: this is the first public release; there is no prior tag.
Git tag: v0.1.0
Date: 2026-05-24
