Respect --tmpdir across annotate (diamond) and predict (miniprot) steps#55

Merged
nextgenusfs merged 2 commits into main from respect-tmpdir-in-annotate on Apr 22, 2026

@nextgenusfs
Owner

Closes #52.

Summary

The --tmpdir CLI option was not honored for all intermediate files. Two distinct places hard-coded /tmp, causing FileNotFoundError on nodes where /tmp is not writable (HPC/slurm, containers) even when the user supplied --tmpdir.

This PR threads the user-supplied tmp directory end-to-end through both the annotate diamond searches and the predict miniprot alignment, and makes create_tmpdir robust against accidental deletion of a user-supplied volume.

Root cause

1. annotate → swissprot_blast / merops_blast → diamond_blast

diamond_blast defaulted to tmpdir="/tmp" and its two wrappers did not accept a tmpdir kwarg at all. annotate.py never forwarded args.tmpdir, so diamond intermediates were always written to /tmp/diamond_<uuid> — exactly the traceback in #52.

2. predict → align_proteins

align_proteins defaulted to tmpdir="/tmp" and wrote mini_tmp = os.path.join(tmpdir, f"{uuid.uuid1()}.miniprot.out"). predict.py never passed a tmpdir, so miniprot output hit /tmp regardless of --tmpdir. Same class of bug, different tool.

3. create_tmpdir semantics

When the user passed any --tmpdir other than /tmp, create_tmpdir returned the directory verbatim — meaning the downstream shutil.rmtree(tmp_dir) in predict.py would wipe the user-supplied volume. It also blocked concurrent runs sharing a scratch volume.

Changes

funannotate2/search.py

  • diamond_blast(..., tmpdir="/tmp") → diamond_blast(..., tmpdir=None) with fallback to tempfile.gettempdir() (honors $TMPDIR).
  • swissprot_blast(..., tmpdir=None) and merops_blast(..., tmpdir=None) — new kwarg, forwarded to diamond_blast.
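
The fallback and forwarding pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in funannotate2/search.py: resolve_tmpdir, diamond_out_path and swissprot_blast_sketch are hypothetical names, and the diamond_<uuid> file name is assumed from the path reported in #52.

```python
import os
import tempfile
import uuid


def resolve_tmpdir(tmpdir=None):
    """Return the caller's tmpdir, or the platform default when None is given."""
    # tempfile.gettempdir() consults $TMPDIR, $TEMP and $TMP before platform defaults
    return tmpdir if tmpdir is not None else tempfile.gettempdir()


def diamond_out_path(tmpdir=None):
    """Hypothetical helper: build the path for a diamond intermediate file."""
    return os.path.join(resolve_tmpdir(tmpdir), f"diamond_{uuid.uuid4()}")


def swissprot_blast_sketch(query, cpus=1, tmpdir=None):
    """Hypothetical wrapper: forwards tmpdir unchanged instead of dropping it."""
    return diamond_out_path(tmpdir=tmpdir)
```

With tmpdir=None as the default, existing callers that never pass the kwarg keep working, while callers that do pass it get their directory all the way down to the diamond output path.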

funannotate2/annotate.py

  • swissprot_blast(Proteins, cpus=args.cpus) → swissprot_blast(Proteins, cpus=args.cpus, tmpdir=tmp_dir).
  • merops_blast(Proteins, cpus=args.cpus) → merops_blast(Proteins, cpus=args.cpus, tmpdir=tmp_dir).

funannotate2/align.py

  • Added import tempfile.
  • align_proteins(..., tmpdir="/tmp") → align_proteins(..., tmpdir=None) with fallback to tempfile.gettempdir().
  • align_transcripts(..., tmpdir="/tmp") → align_transcripts(..., tmpdir=None) for API parity (the kwarg was and remains unused internally: gapmm2 writes directly to output=evidence; docstring updated to explain).

funannotate2/predict.py

  • Pass tmpdir=tmp_dir to both align_transcripts(...) and align_proteins(...).

funannotate2/utilities.py

  • create_tmpdir now always creates a unique <base>_<uuid> subdirectory inside the supplied volume (previously only for /tmp). Falls back to tempfile.gettempdir() when no volume is supplied (was CWD). This prevents the shutil.rmtree hazard on user-supplied volumes and makes concurrent runs safe on a shared scratch directory.
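
A minimal sketch of the new create_tmpdir semantics, assuming a signature of (outdir, base); the real function in funannotate2/utilities.py may differ in its parameter names and details, so create_tmpdir_sketch is a hypothetical stand-in.

```python
import os
import shutil
import tempfile
import uuid


def create_tmpdir_sketch(outdir=None, base="tmp"):
    """Create and return a unique <base>_<uuid> subdirectory (hypothetical signature)."""
    # Fall back to the platform temp dir (honors $TMPDIR) when no volume is supplied
    volume = os.path.abspath(outdir) if outdir else tempfile.gettempdir()
    tmp_dir = os.path.join(volume, f"{base}_{uuid.uuid4()}")
    os.makedirs(tmp_dir)
    return tmp_dir


if __name__ == "__main__":
    volume = tempfile.mkdtemp()  # stands in for a user-supplied scratch volume
    run_a = create_tmpdir_sketch(volume, base="predict")
    run_b = create_tmpdir_sketch(volume, base="predict")
    shutil.rmtree(run_a)  # cleanup of one run
    # The volume itself and the concurrent run's directory are untouched
    print(os.path.isdir(volume), os.path.isdir(run_b))
    shutil.rmtree(volume)
```

Because cleanup only ever sees the unique subdirectory, a later shutil.rmtree cannot reach the volume the user passed in, and two runs pointed at the same scratch directory do not share a path.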

Behavior change for users

  • funannotate2 annotate --tmpdir /scratch/mydir now writes /scratch/mydir/<base>_<uuid>/... — previously it would write directly into /scratch/mydir and could later delete it.
  • When --tmpdir is omitted, temporary files now land in tempfile.gettempdir() (which honors $TMPDIR), not a hardcoded /tmp.
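
For reference, the $TMPDIR behavior of tempfile.gettempdir() can be checked with a small standalone snippet like the one below. default_tmp_for is a hypothetical helper written for this illustration; it resets the standard library's cached value (tempfile.tempdir) because gettempdir() only reads the environment on first use.

```python
import os
import tempfile


def default_tmp_for(env_tmpdir):
    """Return what tempfile.gettempdir() resolves to with $TMPDIR set to env_tmpdir."""
    old_env = os.environ.get("TMPDIR")
    old_cache = tempfile.tempdir
    os.environ["TMPDIR"] = env_tmpdir
    tempfile.tempdir = None  # clear the module-level cache so $TMPDIR is re-read
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the previous cache and environment
        tempfile.tempdir = old_cache
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()  # any existing, writable directory
    print(default_tmp_for(scratch) == scratch)
    os.rmdir(scratch)
```

Note that gettempdir() skips candidates it cannot write to, so an unwritable or missing $TMPDIR falls through to the next candidate rather than raising.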

Tests

All new tests extend existing test modules — no new test files created.

  • tests/unit/test_utilities.py::TestCreateTmpdir — 5 tests covering unique-subdir creation, absolute paths, $TMPDIR fallback, and subdir isolation from the user-supplied volume.
  • tests/unit/test_search_comprehensive.py::TestDiamondTmpdirPlumbing — 4 new tests that mock subprocess.Popen and assert diamond's --out argument is under the caller-supplied tmpdir; plus two existing assertions updated for the new tmpdir=None kwarg on swissprot_blast/merops_blast.
  • tests/unit/test_predict.py::TestAlignProteinsTmpdir — 2 new tests that mock runSubprocess and assert the stdout path passed to miniprot is inside the caller-supplied tmpdir (explicit case) or inside tempfile.gettempdir() (fallback case).
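
The mocking approach used by the diamond tests can be illustrated with a self-contained sketch. run_diamond_sketch is a stand-in written for this example, not funannotate2's diamond_blast, and the exact command line the real function builds is an assumption; only the idea of patching subprocess.Popen and inspecting the --out argument comes from the description above.

```python
import os
import subprocess
import tempfile
import uuid
from unittest import mock


def run_diamond_sketch(query, db, tmpdir=None):
    """Stand-in for a diamond wrapper (hypothetical, for illustration only)."""
    tmpdir = tmpdir if tmpdir is not None else tempfile.gettempdir()
    out = os.path.join(tmpdir, f"diamond_{uuid.uuid4()}")
    cmd = ["diamond", "blastp", "--query", query, "--db", db, "--out", out]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.communicate()
    return out


def test_out_is_under_caller_tmpdir():
    # Patch Popen so diamond is never actually executed
    with mock.patch("subprocess.Popen") as popen:
        popen.return_value.communicate.return_value = (b"", b"")
        run_diamond_sketch("proteins.fa", "uniprot.dmnd", tmpdir="/scratch/job")
    cmd = popen.call_args[0][0]  # first positional argument: the command list
    out = cmd[cmd.index("--out") + 1]
    assert os.path.dirname(out) == "/scratch/job"
```

Asserting on the command list rather than on files written keeps the test independent of whether diamond is installed or /scratch/job exists.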

Results

$ pytest tests/unit/test_utilities.py tests/unit/test_search_comprehensive.py tests/unit/test_predict.py -q
... 41 passed

$ pytest tests/ -q
196 passed, 2 failed, 61 skipped

The 2 failures (test_funannotate_cli::test_version_command, test_help_command) are pre-existing environment-specific failures — subprocess.run("python -m funannotate2 --version") returns 127 in the integration harness on this machine. Verified identical on git stash (before any change in this PR).

Non-goals / out of scope

  • abinitio.py's folder="/tmp" kwargs — these wrap work in tempfile.TemporaryDirectory() internally, so $TMPDIR is already honored at runtime. Stylistic cleanup only; left for a separate change.
  • evm.py::evm_consensus has a tmpdir="/tmp" default but is not imported or called anywhere else in the package (dead code).
  • Changing gapmm2 behavior to actually consume the tmpdir kwarg — gapmm2 writes directly to the caller-supplied output path; a staging tmp is out of scope.

Rollback

All changes are confined to funannotate2/{search,annotate,align,predict,utilities}.py and three existing unit-test modules. All added kwargs default to None, so the public API is backwards compatible.


Pull Request opened by Augment Code with guidance from the PR author

Jon Palmer added 2 commits April 21, 2026 23:05
…ent variable in temporary file handling

Agent-Id: agent-46564fa7-02ab-4480-8cd2-5657e9e74535
…ent variable in temporary file handling

Agent-Id: agent-46564fa7-02ab-4480-8cd2-5657e9e74535
@nextgenusfs nextgenusfs marked this pull request as ready for review April 22, 2026 06:20
@nextgenusfs nextgenusfs merged commit f1b2e44 into main Apr 22, 2026
6 checks passed

Successfully merging this pull request may close these issues.

diamond tmp - No such file or directory: '/tmp/diamond...