Version: GQC 1.1.0 (also reproduced against main at GQC/stats.py:869)
Summary
readbench --strs crashes with ZeroDivisionError: division by zero in stats.write_read_str_stats when the included STR set contains a runlength for which no reads pass the qualification filters (alignment span / soft-clip / 5 bp flank checks).
Traceback
File ".../GQC/readbench.py", line 176, in main
stats.write_read_str_stats('trinuc', trinucstats, outputfiles, args)
File ".../GQC/stats.py", line 869, in write_read_str_stats
accrate = correctbylength[refstrlength]/(correctbylength[refstrlength] + lengtherrorbylength[refstrlength] + complexbylength[refstrlength] + flankbylength[refstrlength])
ZeroDivisionError: division by zero
Root cause
In errors.assess_str_read_coverage_require_flank (errors.py:376-377), every STR in the included BED is registered in strdict before read tallying:
runlength = end - start
if runlength not in strdict:
strdict[runlength] = {}
If zero reads at any STR with that runlength survive the filters at lines 401-434 (reference_start/reference_end checks, soft-clip requirement, <5 bp flank check), strdict[runlength] remains {}.
Then in stats.write_read_str_stats the inner loop over stats[refstrlength].keys() never runs, so correctbylength, lengtherrorbylength, complexbylength, and flankbylength all stay at 0, and the per-row division on line 869 fails.
Note: even when reads are present, this can still fire if the only category seen for a given length is HET, since HET is excluded from the denominator.
Reproduction
Run readbench --strs against a BAM with sparse coverage at a particular STR locus. We hit it on a low-coverage diploid mapping where one trinuc STR (TNR_7289_ATT at chr21_MATERNAL:47134596-47134638, runlength=42) had zero qualifying reads. The same crash also occurred on chr7_MATERNAL (trinuc) and on a genome-wide --rerun (tetranuc) in the same dataset.
Suggested fix
Guard the division and skip the row when there's no data:
for refstrlength in sorted(correctbylength.keys()):
denom = (correctbylength[refstrlength]
+ lengtherrorbylength[refstrlength]
+ complexbylength[refstrlength]
+ flankbylength[refstrlength])
if denom == 0:
continue
accrate = correctbylength[refstrlength] / denom
lfh.write(...)
Alternatively, write the row with accrate=NA so empty buckets are still represented in the output.
Environment
- GQC 1.1.0, installed via
uvx --from gqc readbench
- Python 3.14
- Linux x86_64
Bug patched and issue text generated with the help of claude opus 4.7
Version: GQC 1.1.0 (also reproduced against
mainatGQC/stats.py:869)Summary
readbench --strscrashes withZeroDivisionError: division by zeroinstats.write_read_str_statswhen the included STR set contains arunlengthfor which no reads pass the qualification filters (alignment span / soft-clip / 5 bp flank checks).Traceback
Root cause
In
errors.assess_str_read_coverage_require_flank(errors.py:376-377), every STR in the included BED is registered instrdictbefore read tallying:If zero reads at any STR with that
runlengthsurvive the filters at lines 401-434 (reference_start/reference_endchecks, soft-clip requirement,<5 bpflank check),strdict[runlength]remains{}.Then in
stats.write_read_str_statsthe inner loop overstats[refstrlength].keys()never runs, socorrectbylength,lengtherrorbylength,complexbylength, andflankbylengthall stay at 0, and the per-row division on line 869 fails.Note: even when reads are present, this can still fire if the only category seen for a given length is
HET, sinceHETis excluded from the denominator.Reproduction
Run
readbench --strsagainst a BAM with sparse coverage at a particular STR locus. We hit it on a low-coverage diploid mapping where one trinuc STR (TNR_7289_ATTatchr21_MATERNAL:47134596-47134638, runlength=42) had zero qualifying reads. The same crash also occurred onchr7_MATERNAL(trinuc) and on a genome-wide--rerun(tetranuc) in the same dataset.Suggested fix
Guard the division and skip the row when there's no data:
Alternatively, write the row with
accrate=NAso empty buckets are still represented in the output.Environment
uvx --from gqc readbenchBug patched and issue text generated with the help of claude opus 4.7