`func_score_to_gpm` works with scores for amino acid variants #16

caelanradford · 2019-03-29T22:01:38Z

Previously, the function would only be able to make a genotype phenotype map where each barcode variant is put into the map with its measured phenotype, regardless of whether some were actually the same amino acid mutant. Now, the function has the option to make a map where identical amino acid mutants are only put into the map once with their average phenotype.

Adds the aaSubs input to func_score_to_gpm, which allows you to make genotype phenotype maps with the average phenotype of identical amino acid mutants, rather than a genotype phenotype map with the phenotype of each individual barcoded mutant, where identical amino acid mutants could be in the map multiple times with different phenotypes.

jbloom · 2019-03-29T22:15:46Z

@caelanradford: This commit doesn't seem to do what it says it does.

In the new func_scores_to_gpm, it says that aaSubs makes it return the average phenotype of each amino-acid mutant. If that is in fact what it did, it should have a more informative name like avg_variants. But in fact, the code doesn't average anything. In fact, I think it gives the exact same results regardless of whether or not aaSubs is used. There is no averaging of phenotypes for any variants, the only difference appears to be whether the substitutions passed to codonSubsToSeq happen to be in amino-acid or codon form.

caelanradford · 2019-03-29T22:27:39Z

The documentation I wrote is not clear enough on this. You would only pass aaSubs as True if you generated a functional scores dataframe with the input by='aa_substitutions' and used that as the input into func_score_to_gpm. The averaging of the phenotypes of identical amino acid mutants happens earlier, when CodonVariantTable.func_scores is called. This change just lets func_scores_to_gpm use that as an input.

jbloom · 2019-03-29T22:43:29Z

In that case make the documentation clear. The docs need to stand alone and accurately and succinctly explain what that function does. Work on writing docs that do that.

You should explain aaSubs in terms of what it does: read mutations from aa_substitutions rather than codon_substitutions; if you want you can explain this is useful if func_scores is calculated in the way you mention above.

Also, note that in both docstrings you changed in this pull request, you are missing backticks on some references to the args. You also have a lot of typos in one of the docstrings.

jbloom · 2019-03-29T22:44:25Z

Also, you should re-name the pull request as it suffers from the same problem as the doc string: not really accurately describing the changes.

jbloom · 2019-03-29T22:56:21Z

@caelanradford: actually, now that I think about it more, I think all of this is overly complex.

The func_scores_df data frame that is passed to func_score_to_gpm always has an aa_substitutions column regardless of how CodonVariantTable.func_scores is called. So rather than add a confusing new aaSubs option to func_score_to_gpm, just make it always compute the genotypes using the aa_substitutions column rather than the codon_substitutions column. Then it will just work correctly in all of the scenarios you are worried above.

In addition to this, improve the docs for func_score_to_gpm to indicate exactly where func_scores_df comes from (:meth:CodonVariantTable.func_scores) and that the returned genotype-phenotype map is for amino-acid sequences. Finally, include a web link when you refer to the Harms lab package, as in: "returns a genotype-phenotype map from the gpmap <https://github.com/harmslab/gpmap>_ package. The current docs don't link to the package and also give it the wrong name (its gpmap, not gpm).

Gets rid of `by_aa` option and uses amino acid substitutions to generate genotypes for all func_score dataframe inputs

jbloom · 2019-03-30T02:41:40Z

@caelanradford:

Nice job on this and persevering through all the edits and comments.

Just one tiny fine tuning for the docs:

In the doc string, you refer to:

  `CodonVariantTable.func_scores`

In this line

dms_tools2/dms_tools2/codonvarianttable.py

Line 2868 in 515c3da

`CodonVariantTable.func_scores` method), narrowed down to one post

If you instead make it:

   :meth:`CodonVariantTable.from_variant_count_df`

sort of like in this line

dms_tools2/dms_tools2/codonvarianttable.py

Line 337 in 515c3da

We do this using :meth:`CodonVariantTable.from_variant_count_df`.

then you will get a nice hyperlink in the final sphinx rendered docs. This is how the built docs here have hyperlinks when you click on the methods: https://jbloomlab.github.io/dms_tools2/dms_tools2.codonvarianttable.html

You can read more about this here: https://stackoverflow.com/questions/34848253/whats-the-difference-in-behaviour-between-func-and-meth-roles-in-python-sph

You can even build the docs to look yourself if you do make html in ./docs subdirectory and then look in _build/html.

That's the only remaining change I can see, then this should be ready to merge.

caelanradford added 2 commits March 29, 2019 14:30

Fix docs in codonSubsToSeq and func_score_to_gpm

d370cb6

Make docs on func_score_to_gpm clearer

4fdc1c5

Make func_score_to_gpm simpler

515c3da

Gets rid of `by_aa` option and uses amino acid substitutions to generate genotypes for all func_score dataframe inputs

caelanradford changed the title ~~Add option to use average phenotypes of identical amino acid mutants in func_score_to_gpm~~ Make func_score_to_gpm work with functional score dataframes calculated by amino acid substitutions Mar 29, 2019

Add hyperlink to func_score_to_gpm doc

b8f7077

jbloom changed the title ~~Make func_score_to_gpm work with functional score dataframes calculated by amino acid substitutions~~ func_score_to_gpm works with scores for amino acid variants Apr 1, 2019

jbloom merged commit 207f142 into jbloomlab:master Apr 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`func_score_to_gpm` works with scores for amino acid variants #16

`func_score_to_gpm` works with scores for amino acid variants #16

caelanradford commented Mar 29, 2019

jbloom commented Mar 29, 2019

caelanradford commented Mar 29, 2019

jbloom commented Mar 29, 2019

jbloom commented Mar 29, 2019

jbloom commented Mar 29, 2019

jbloom commented Mar 30, 2019 •

edited

Loading

func_score_to_gpm works with scores for amino acid variants #16

func_score_to_gpm works with scores for amino acid variants #16

Conversation

caelanradford commented Mar 29, 2019

jbloom commented Mar 29, 2019

caelanradford commented Mar 29, 2019

jbloom commented Mar 29, 2019

jbloom commented Mar 29, 2019

jbloom commented Mar 29, 2019

jbloom commented Mar 30, 2019 • edited Loading

`func_score_to_gpm` works with scores for amino acid variants #16

`func_score_to_gpm` works with scores for amino acid variants #16

jbloom commented Mar 30, 2019 •

edited

Loading