avoid error with pandas >=0.22.0 when all rates are missing
This fixes a bug where genes can get P=0.0 when no rates data is available for a gene.
Since pandas 0.22.0, summing a list where all values are NA gives sum=0.0 by
default. If all of a gene's rates are missing, this gives an expected number of
mutations of 0, which results in P=0. This change reverts to the previous pandas
behaviour, where summing a list of NAs results in NA.
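
The behaviour described above can be reproduced with a small sketch. The `rates` table below is a hypothetical two-gene example (not data from the repository), and the snippet assumes pandas >= 0.22.0, where `min_count` is available on `sum`.

```python
import numpy as np
import pandas as pd

# Hypothetical rates table: the second gene has no rate data at all.
rates = pd.DataFrame({
    "non": [1e-6, np.nan],
    "splice_site": [np.nan, np.nan],
})

# Default since pandas 0.22.0: a row containing only NA sums to 0.0.
default_sum = rates.sum(axis=1, skipna=True)

# min_count=1 requires at least one non-NA value per row,
# so a row containing only NA sums to NaN instead of 0.0.
strict_sum = rates.sum(axis=1, skipna=True, min_count=1)
```

Rows that have at least one rate are unaffected by `min_count=1`; only the all-NA rows change from 0.0 to NaN, which is what keeps a gene without rates from getting an expected mutation count of 0.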
jeremymcrae committed Jun 22, 2018
1 parent 2f2a9df commit 5aa8f64
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion mupit/mutation_rates.py
@@ -119,9 +119,14 @@ def get_expected_mutations(rates, male, female):
 
     expected = rates[["hgnc", "chrom"]].copy()
 
+    # account for how different pandas versions sum series with only NA
+    kwargs = {}
+    if pandas.__version__ >= '0.22.0':
+        kwargs = {'min_count': 1}
+
     # get the number of expected mutations, given the number of transmissions
     expected["lof_indel"] = rates["frameshift"] * autosomal
-    expected["lof_snv"] = (rates[["non", "splice_site"]].sum(axis=1, skipna=True)) * autosomal
+    expected["lof_snv"] = (rates[["non", "splice_site"]].sum(axis=1, skipna=True, **kwargs)) * autosomal
     expected["missense_indel"] = (rates["frameshift"] / 9) * autosomal
     expected["missense_snv"] = rates["mis"] * autosomal
     expected["synonymous_snv"] = rates["syn"] * autosomal
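
One caveat about the version check in this change: it compares version strings lexicographically, so a string such as '0.9.0' sorts after '0.22.0' even though it is the older release. A minimal sketch of a numeric comparison follows. The helpers `version_tuple` and `sum_kwargs` are hypothetical names introduced here for illustration and are not part of mupit or pandas.

```python
import re

def version_tuple(version):
    """Parse up to three leading numeric fields, e.g. '0.22.0' -> (0, 22, 0).

    Hypothetical helper; suffixes such as 'rc1' or '.dev0' are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    # Missing fields (e.g. '1.0') are treated as zero.
    return tuple(int(g) if g is not None else 0 for g in match.groups())

def sum_kwargs(pandas_version):
    """Return the extra keyword arguments to pass to sum() for this pandas version."""
    if version_tuple(pandas_version) >= (0, 22, 0):
        return {"min_count": 1}
    return {}
```

In the function above, this would be used as `kwargs = sum_kwargs(pandas.__version__)`. For the pandas releases most users have installed, the string comparison and the numeric one agree; the difference only appears for old releases with a single-digit minor version.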