Fix wrong denominator in max_missing_an fraction filter (closes #998) #1000

Merged
jonbrenas merged 4 commits into malariagen:master from rehanxt5:GH998-fix-max-missing-an-fraction-denominator on Mar 2, 2026
@rehanxt5 rehanxt5 commented Mar 1, 2026

What

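When max_missing_an is passed as a float, the missingness fraction was computed by dividing the missing-allele count by an (the number of called alleles) instead of the total number of alleles. Because the called count is never larger than the total, the computed fraction was too high whenever any alleles were missing, which made the filter more restrictive than intended.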

Why it's wrong

A SNP with exactly 5% missing data in 100 diploid samples (10 missing out of 200 total alleles) would compute 10 / 190 ≈ 0.0526 and fail a max_missing_an=0.05 threshold, even though 5% ≤ 5% should pass. Once more than half of the alleles are missing, the old formula produces values greater than 1.0, which is not a valid fraction.
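To make the arithmetic concrete, here is a minimal sketch of the two formulas in plain Python. The function names are hypothetical and exist only for illustration; they are not part of the library.

```python
def missing_fraction_old(n_missing, n_total):
    # Old behaviour: divide by the number of *called* alleles.
    n_called = n_total - n_missing
    return n_missing / n_called


def missing_fraction_new(n_missing, n_total):
    # Fixed behaviour: divide by the *total* number of alleles.
    return n_missing / n_total


n_total = 100 * 2  # 100 diploid samples

old = missing_fraction_old(10, n_total)  # 10 / 190
new = missing_fraction_new(10, n_total)  # 10 / 200

print(old <= 0.05)  # False: the site is wrongly rejected
print(new <= 0.05)  # True: 5% missing passes a 5% threshold

# With 150 of 200 alleles missing, the old formula exceeds 1.0.
print(missing_fraction_old(150, n_total))  # 3.0
print(missing_fraction_new(150, n_total))  # 0.75
```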

Change

Before:

an_missing = (ds_out.sizes["samples"] * ds_out.sizes["ploidy"]) - an
if isinstance(max_missing_an, float):
    an_missing_frac = an_missing / an  # wrong: called alleles as denominator
    loc_missing = an_missing_frac <= max_missing_an

After:

an_total = ds_out.sizes["samples"] * ds_out.sizes["ploidy"]
an_missing = an_total - an
if isinstance(max_missing_an, float):
    an_missing_frac = an_missing / an_total  # correct: total alleles as denominator
    loc_missing = an_missing_frac <= max_missing_an
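The corrected logic can also be exercised without a dataset. The sketch below is an assumption-laden stand-in: it uses a plain NumPy array for an instead of the xarray dataset in the PR, the helper name missing_an_mask is hypothetical, and the integer branch (comparing the missing count directly to the threshold) is assumed, since the PR does not show that code.

```python
import numpy as np


def missing_an_mask(an, n_samples, ploidy, max_missing_an):
    """Return a boolean mask of sites that pass the missingness filter.

    Hypothetical helper for illustration only; `an` holds the number
    of called alleles per site.
    """
    an = np.asarray(an)
    an_total = n_samples * ploidy
    an_missing = an_total - an
    if isinstance(max_missing_an, float):
        # Float threshold: compare the fraction of total alleles missing.
        return (an_missing / an_total) <= max_missing_an
    # Integer threshold (assumed behaviour): compare the raw missing count.
    return an_missing <= max_missing_an


# Called-allele counts for four sites in 100 diploid samples.
an = np.array([200, 190, 180, 50])

mask = missing_an_mask(an, n_samples=100, ploidy=2, max_missing_an=0.05)
print(mask.tolist())  # [True, True, False, False]
```

The second site has exactly 5% missing alleles and now passes, which is the case the old denominator rejected.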

I also updated the tests in tests/anoph/test_snp_data.py.

Notes

  • The integer path (max_missing_an as int) is unaffected.
  • The min_minor_ac float path below this uses an (called alleles) as the denominator, which is correct for allele frequency, so that path is unchanged.
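The difference between the two denominators in the notes above can be shown with a small worked example. The counts and variable names here are made up for illustration and do not come from the library or its tests.

```python
# 100 diploid samples, 10 alleles missing at this site.
n_total = 200
n_called = 190
n_missing = n_total - n_called

# Hypothetical count of minor-allele copies among the called alleles.
minor_count = 19

# Missingness describes all alleles at the site, so divide by the total.
missing_frac = n_missing / n_total

# Allele frequency describes only observed alleles, so divide by called.
minor_freq = minor_count / n_called

print(missing_frac)
print(round(minor_freq, 6))
```

Dividing the minor-allele count by the total instead (19 / 200) would understate the frequency, which is why that path keeps an as its denominator.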

closes #998

rehanxt5 commented Mar 2, 2026

@jonbrenas hey 👋 can you please review this

@jonbrenas jonbrenas merged commit e51965a into malariagen:master Mar 2, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

Wrong denominator in max_missing_an fraction filter