Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] initial refactor of compute command (and associated test module) #734

Merged
merged 4 commits into from
Sep 17, 2019

Conversation

ctb
Copy link
Contributor

@ctb ctb commented Sep 12, 2019

Fixes #733.

Additional refactoring and tests will be the subject of another PR - this one aims to get the code in roughly the right place first!

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

cc @pranathivemuri @luizirber

@ctb
Copy link
Contributor Author

ctb commented Sep 12, 2019

Random questions:

Should we put things in commands_compute.py or command_compute.py?

Is test_sourmash_compute.py better than the more succinct test_compute.py?

@codecov
Copy link

codecov bot commented Sep 12, 2019

Codecov Report

Merging #734 into master will increase coverage by 0.03%.
The diff coverage is 79.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #734      +/-   ##
==========================================
+ Coverage   88.68%   88.71%   +0.03%     
==========================================
  Files          29       30       +1     
  Lines        4604     4617      +13     
  Branches       45       45              
==========================================
+ Hits         4083     4096      +13     
  Misses        519      519              
  Partials        2        2
Impacted Files Coverage Δ
sourmash/commands.py 88.42% <100%> (+3.23%) ⬆️
sourmash/tenx.py 98.95% <100%> (+0.01%) ⬆️
sourmash/command_compute.py 79.04% <79.04%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 628d862...92cde33. Read the comment docs.

@pranathivemuri
Copy link
Contributor

Firstly,
commands_compute.py makes sense to me
test_sourmash_compute.py works

Secondly, I was thinking we would move all the functions out of sourmash compute into a module called compute.py and unit test them. it might be complicated as we pass around the command line args inside the class as a dictionary but we wouldn't have them, we would need separate keywords for them. It shouldn't be that bad. We should keep the compute command inside commands.py so we have all commands in one place.

Lastly, if we have this pattern of making compute a separate file, should we have commands_compute_10x.py? and their corresponding tests

@ctb
Copy link
Contributor Author

ctb commented Sep 12, 2019

OK, in 4d1f0fa I made pathos and pysam optional, using pytest.importorskip.

Assuming tests pass and there aren't any more basic structural things to tackle, I'd like to suggest that this PR get merged soon, and then we can tackle better unit testing and other things with the code that's isolated into these modules. This will minimize conflicts with and from other ongoing work. Thoughts @luizirber?

In response to @pranathivemuri last comment - I'd like to avoid proliferating files as much as possible, so I think we should keep everything associated with making new sketches in one place rather than breaking things out further.

@ctb
Copy link
Contributor Author

ctb commented Sep 12, 2019

(actually, the better (& real) reason for merging now and then working on things in a new PR is that future diffs will be much smaller and reviewers will be able to tell which code was actually modified :)

@pranathivemuri
Copy link
Contributor

I agree, please feel free to merge it then!

@ctb
Copy link
Contributor Author

ctb commented Sep 12, 2019

(still want to have @luizirber sign off on it :)

@ctb ctb changed the title [WIP] initial refactor of compute command (and associated test module) [MRG] initial refactor of compute command (and associated test module) Sep 16, 2019
@ctb
Copy link
Contributor Author

ctb commented Sep 16, 2019

@olgabot if you have time to do a review I'd appreciate it :). There is a failing test that I need to fix, but other than that all is ready.

@luizirber
Copy link
Member

@ctb I added a new commit (92cde33) moving pysam into the functions that use it, and removing from requirements.txt. For those using 10x features, the install instruction is
pip install sourmash[10x] (but probably we will bring pysam and pathos in the bioconda recipe)

Copy link
Collaborator

@olgabot olgabot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! It would be great if the helper functions of compute were also refactored out separately as it makes the code a little confusing to read right now (at least it was for me when I was first exposed to the codebase)

sourmash compute file1.fa file2.fa --merge merged -o file.sig
=> creates one output file file.sig, with all sequences from
file1.fa and file2.fa combined into one signature.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
sourmash compute --input-is-10x --barcodes barcodes.tsv possorted_genome_bam.bam
"""

error("must specify -o with --merge")
sys.exit(-1)

def make_minhashes():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these functions be refactored to separate functions?

@ctb
Copy link
Contributor Author

ctb commented Sep 17, 2019

@olgabot those are excellent refactoring suggestions! but per #734 (comment), the plan is to do them in a separate PR. The one thing git doesn't handle that well is moving large blocks of code around (which is understandable :) and it's quite challenging to review diffs against big reorgs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move commands.compute function -> new file.
4 participants