Add hp encoding for proteins #758
…line while writing umi and seq in the temp fasta file
@olgabot @luizirber @taylorreiter Added HP encoding, please review when you are free!
We should also consider #751 before merging this.
I am not sure the coverage is actually calculated correctly. The codecov bot's edit history shows that coverage increased at one point yesterday and has now decreased again, even though all I did was fix a test, add more tests, and merge from master.
Thank you! I'm really enthusiastic about this functionality, and these changes look good to me. I would like to have a chance to dig into them more carefully, but might not have the time in the near future. Questions I have:
Thanks for reviewing and asking great questions!
We need to start thinking more about how to document these things... Adding new things to the JSON signatures should trigger data version bumps (should already have happened for dayhoff). But this is being discussed in #751 and is not really a blocker for this PR.
This PR is very similar to #689, and relatively straightforward to add to the Rust branches.
@luizirber I can add the Rust stuff in a different PR if that's blocking this PR. Let me know if I can directly check out from master and make those changes.
Oh, that's not really blocking the PR. I think #751 is blocking more, but I'm comfortable with this one being merged now and we figure out the rest later. Adding the dayhoff encoding in the Rust branch was a bit annoying because I want to keep #424 with only the Python/CFFI changes, but obviously there were changes on the Rust code too. I ended up implementing it all in the same branch to be sure it worked, and then cherry-picked changes into a new PR #760 (which contains other changes that accumulated in #424 but were not really relevant to that PR). On @ctb's comments about tests: even if we are covering the new lines, I'm not so sure we are taking care of all the behaviors we expect. I'll take a closer look and come up with suggestions soon.
thanks @luizirber
Getting this error in here nf-core/kmermaid#17, at least locally:
```
== This is sourmash version 2.0.0a10.dev119+gf9bd45f. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

computing signatures for files: SRR4238355_subsamp_coding_reads_peptides.fasta
Computing signature for ksizes: [6]
Computing only Dayhoff-encoded protein (and not nucleotide) signatures.
Computing a total of 1 signature(s).
... reading sequences from SRR4238355_subsamp_coding_reads_peptides.fasta
Traceback (most recent call last):
  File "/opt/conda/envs/nfcore-kmermaid-0.1dev/bin/sourmash", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/sourmash/__main__.py", line 83, in main
    cmd(sys.argv[2:])
  File "/opt/conda/envs/nfcore-kmermaid-0.1dev/lib/python3.6/site-packages/sourmash/command_compute.py", line 376, in compute
    notify('... {} {} sequences', filename, n + 1)
UnboundLocalError: local variable 'n' referenced before assignment
```
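For anyone reading along, here is a minimal sketch of how this kind of `UnboundLocalError` can arise and one way to avoid it. It assumes `n` is the loop variable of an `enumerate` over the input records, which the traceback suggests but does not show; `count_sequences` is a hypothetical helper, not code from `command_compute.py`.

```python
def count_sequences(records):
    """Return how many records were seen, even when `records` is empty."""
    # Assumption: the original loop bound `n` only inside the `for`,
    # so an empty input left it undefined. Defining it first avoids that.
    n = -1
    for n, _record in enumerate(records):
        pass  # real code would add each sequence to the signature here
    return n + 1


print(count_sequences([]))                  # empty input no longer raises
print(count_sequences(["MKV", "AGYYG"]))    # two records
```

Starting from `-1` keeps the existing `n + 1` reporting intact, so an empty or unparseable file reports zero sequences instead of crashing.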
Looks great! Had a few suggestions for documentation and comments.
mh.add_protein(b'AGYYG')

if hp:
    assert len(mh.get_mins()) == 1
This is still so funny to me: this sequence is all p, so it only contains one k-mer!
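A rough sketch of why the test expects a single hash: once every residue maps to the same class, every k-mer in the encoded sequence is identical. The grouping below is an assumed hydrophobic/polar split for illustration only and may not match the table in this PR (so the class letter it produces may differ); `hp_encode` and `distinct_kmers` are hypothetical helpers, not sourmash functions.

```python
# Assumed two-class alphabet, for illustration only.
HYDROPHOBIC = set("AFGILMPVWY")
POLAR = set("CDEHKNQRST")


def hp_encode(seq):
    """Map each amino acid to 'h' or 'p' ('X' for anything unrecognized)."""
    encoded = []
    for aa in seq.upper():
        if aa in HYDROPHOBIC:
            encoded.append("h")
        elif aa in POLAR:
            encoded.append("p")
        else:
            encoded.append("X")
    return "".join(encoded)


def distinct_kmers(seq, k):
    """Return the set of distinct k-mers in the encoded sequence."""
    encoded = hp_encode(seq)
    return {encoded[i:i + k] for i in range(len(encoded) - k + 1)}


# All residues of AGYYG share one class here, so only one k-mer survives.
print(len(distinct_kmers("AGYYG", 2)))
# A sequence alternating between classes keeps more than one k-mer.
print(len(distinct_kmers("AKAK", 2)))
```

The k-mer length of 2 is also an assumption; any k up to the sequence length gives a single distinct k-mer for a uniformly encoded sequence.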
Define 'n' before using it
add bam2fasta for requirements
@@ -12,3 +12,4 @@ sphinxcontrib-napoleon
 setuptools_scm
 setuptools_scm_git_archive
 nbsphinx
+bam2fasta
Maybe I should remove this. Should we still keep bam2fasta optional? @olgabot @luizirber
I'll leave this up to @luizirber. I'd personally prefer to keep it there since the functionality will break if it is not installed, but not everyone needs the bam to fasta conversion.
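If the dependency stays optional, one common pattern is to import it lazily and fail with a clear message only when the BAM path is actually used. This is a sketch under that assumption; `load_optional` is a hypothetical helper, and no bam2fasta functions are called because its API is not shown in this thread.

```python
import importlib
import importlib.util


def load_optional(module_name, feature):
    """Import an optional top-level module, or explain how to install it."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; "
            f"install it with `pip install {module_name}`."
        )
    return importlib.import_module(module_name)


# Only callers that need BAM input would trigger the import, e.g.:
# bam2fasta = load_optional("bam2fasta", "BAM to FASTA conversion")
json_module = load_optional("json", "JSON output")  # stdlib, always present
print(json_module.dumps({"ok": True}))
```

Users who never pass a BAM file would not need the package installed, while those who do get an actionable error instead of a bare `ModuleNotFoundError`.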
@olgabot addressed your comments. Please review when you are free.
Thank you!
make test: Did it pass the tests?
make coverage: Is the new code covered?
…without a major version increment. Changing file formats also requires a major version number increment.
…changes were made?