Skip to content

Suppress repetitive GeneMark license banner in training/predict logs#56

Merged
nextgenusfs merged 1 commit intomainfrom
suppress-genemark-log-spam
Apr 22, 2026
Merged

Suppress repetitive GeneMark license banner in training/predict logs#56
nextgenusfs merged 1 commit intomainfrom
suppress-genemark-log-spam

Conversation

@nextgenusfs
Copy link
Copy Markdown
Owner

Problem

During funannotate2 train (and predict), gmhmme3 is invoked once per contig. Each invocation prints a two-line license banner to both stdout and stderr:

GeneMark.hmm eukaryotic 3
16 days remaining in the license period.

runSubprocess captures both streams and forwards them to logger.debug(...), and the file handler is set to DEBUG. Result: the log file gets flooded with duplicate banners (one block per contig — typically dozens to thousands).

Example from a user's log:

[Apr 22 08:53 AM] gmhmme3 -f gtf -k 0.03 -m gmhmm.mod -o g_272.fa.gtf g_272.fa
[Apr 22 08:53 AM] GeneMark.hmm eukaryotic 3
16 days remaining in the license period.

GeneMark.hmm eukaryotic 3
16 days remaining in the license period.

[Apr 22 08:53 AM] gmhmme3 -f gtf -k 0.03 -m gmhmm.mod -o g_947.fa.gtf g_947.fa
...

Fix

  1. funannotate2/utilities.py — add an opt-in output_filter=None kwarg to runSubprocess. When set, the success path delegates output handling to output_filter(logfile, stdout, stderr) instead of the default logoutput(logfile.debug, process). Default behavior is unchanged for every existing caller. Error path (CalledProcessError) is untouched.

  2. funannotate2/abinitio.py — add a module-level _GENEMARK_BANNER_LOGGED flag and a _genemark_output_filter helper. The helper splits each stream into banner lines (matching GeneMark.hmm eukaryotic or days remaining in the license period) vs. other content; the banner is logged at debug once per process, then suppressed on subsequent calls. Non-banner content (real warnings/errors) continues to go through log.debug. Both gmhmme3 call sites (run_genemark and the evaluation loop's tool == "genemark" branch) now pass output_filter=_genemark_output_filter.

Preserved behavior

  • The per-contig gmhmme3 -f gtf -k 0.03 ... command line is still logged (that one is useful for debugging; only the banner is suppressed).
  • Any non-banner gmhmme3 output (real warnings/errors) is still logged at debug.
  • Command failures (CalledProcessError) still log full stdout/stderr at error level — unchanged.
  • No behavioral change for Augustus / SNAP / GlimmerHMM / BUSCO or any other tool.

Verification

  • python3 -m py_compile funannotate2/utilities.py funannotate2/abinitio.py — both compile clean.
  • Ad-hoc behavior test against a stub logger:
    • First call with banner content → 1 debug emission, flag flips to True.
    • Second identical call → still 1 (banner suppressed).
    • Third call with banner + extra content → 3 (banner still not re-emitted, non-banner lines forwarded).
    • Empty / None inputs handled without raising.

Risk notes

  • Module-level flag lives for the Python process. Current call sites are serial per-contig in the main process. If a future caller moves gmhmme3 into a multiprocessing worker (fork), each worker would log the banner once — still a massive reduction.
  • Banner detection is substring-based. If GeneMark ever changes the banner wording significantly, a new line would fall through to normal debug logging (degrades to current behavior — non-fatal).

Pull Request opened by Augment Code with guidance from the PR author

Agent-Id: agent-4d333a81-18f1-408d-8560-0c846ac09121
@nextgenusfs nextgenusfs marked this pull request as ready for review April 22, 2026 17:47
@nextgenusfs nextgenusfs merged commit a340328 into main Apr 22, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant