Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

func_score_to_gpm works with scores for amino acid variants #16

Merged
merged 5 commits into from
Apr 1, 2019

Conversation

caelanradford
Copy link
Contributor

Previously, the function would only be able to make a genotype phenotype map where each barcode variant is put into the map with its measured phenotype, regardless of whether some were actually the same amino acid mutant. Now, the function has the option to make a map where identical amino acid mutants are only put into the map once with their average phenotype.

Adds the aaSubs input to func_score_to_gpm, which allows you to make genotype phenotype maps with the average phenotype of identical amino acid mutants, rather than a genotype phenotype map with the phenotype of each individual barcoded mutant, where identical amino acid mutants could be in the map multiple times with different phenotypes.
@jbloom
Copy link
Member

jbloom commented Mar 29, 2019

@caelanradford: This commit doesn't seem to do what it says it does.

In the new func_scores_to_gpm, it says that aaSubs makes it return the average phenotype of each amino-acid mutant. If that is in fact what it did, it should have a more informative name like avg_variants. But in fact, the code doesn't average anything. In fact, I think it gives the exact same results regardless of whether or not aaSubs is used. There is no averaging of phenotypes for any variants, the only difference appears to be whether the substitutions passed to codonSubsToSeq happen to be in amino-acid or codon form.

@caelanradford
Copy link
Contributor Author

The documentation I wrote is not clear enough on this. You would only pass aaSubs as True if you generated a functional scores dataframe with the input by='aa_substitutions' and used that as the input into func_score_to_gpm. The averaging of the phenotypes of identical amino acid mutants happens earlier, when CodonVariantTable.func_scores is called. This change just lets func_scores_to_gpm use that as an input.

@jbloom
Copy link
Member

jbloom commented Mar 29, 2019

In that case make the documentation clear. The docs need to stand alone and accurately and succinctly explain what that function does. Work on writing docs that do that.

You should explain aaSubs in terms of what it does: read mutations from aa_substitutions rather than codon_substitutions; if you want you can explain this is useful if func_scores is calculated in the way you mention above.

Also, note that in both docstrings you changed in this pull request, you are missing backticks on some references to the args. You also have a lot of typos in one of the docstrings.

@jbloom
Copy link
Member

jbloom commented Mar 29, 2019

Also, you should re-name the pull request as it suffers from the same problem as the doc string: not really accurately describing the changes.

@jbloom
Copy link
Member

jbloom commented Mar 29, 2019

@caelanradford: actually, now that I think about it more, I think all of this is overly complex.

The func_scores_df data frame that is passed to func_score_to_gpm always has an aa_substitutions column regardless of how CodonVariantTable.func_scores is called. So rather than add a confusing new aaSubs option to func_score_to_gpm, just make it always compute the genotypes using the aa_substitutions column rather than the codon_substitutions column. Then it will just work correctly in all of the scenarios you are worried above.

In addition to this, improve the docs for func_score_to_gpm to indicate exactly where func_scores_df comes from (:meth:CodonVariantTable.func_scores) and that the returned genotype-phenotype map is for amino-acid sequences. Finally, include a web link when you refer to the Harms lab package, as in: "returns a genotype-phenotype map from the gpmap <https://github.com/harmslab/gpmap>_ package. The current docs don't link to the package and also give it the wrong name (its gpmap, not gpm).

Gets rid of `by_aa` option and uses amino acid substitutions to generate genotypes for all func_score dataframe inputs
@caelanradford caelanradford changed the title Add option to use average phenotypes of identical amino acid mutants in func_score_to_gpm Make func_score_to_gpm work with functional score dataframes calculated by amino acid substitutions Mar 29, 2019
@jbloom
Copy link
Member

jbloom commented Mar 30, 2019

@caelanradford:

Nice job on this and persevering through all the edits and comments.

Just one tiny fine tuning for the docs:

In the doc string, you refer to:

  `CodonVariantTable.func_scores` 

In this line

`CodonVariantTable.func_scores` method), narrowed down to one post

If you instead make it:

   :meth:`CodonVariantTable.from_variant_count_df`

sort of like in this line

We do this using :meth:`CodonVariantTable.from_variant_count_df`.
then you will get a nice hyperlink in the final sphinx rendered docs. This is how the built docs here have hyperlinks when you click on the methods: https://jbloomlab.github.io/dms_tools2/dms_tools2.codonvarianttable.html

You can read more about this here: https://stackoverflow.com/questions/34848253/whats-the-difference-in-behaviour-between-func-and-meth-roles-in-python-sph

You can even build the docs to look yourself if you do make html in ./docs subdirectory and then look in _build/html.

That's the only remaining change I can see, then this should be ready to merge.

@jbloom jbloom changed the title Make func_score_to_gpm work with functional score dataframes calculated by amino acid substitutions func_score_to_gpm works with scores for amino acid variants Apr 1, 2019
@jbloom jbloom merged commit 207f142 into jbloomlab:master Apr 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants