Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

classify-consensus-blast: expose num_threads #155

Closed
nbokulich opened this issue Jan 21, 2020 · 9 comments
Closed

classify-consensus-blast: expose num_threads #155

nbokulich opened this issue Jan 21, 2020 · 9 comments
Assignees
Labels
diff:1|beginner Only limited knowledge of the languages and platform is required. good first issue Good for newcomers help wanted Extra attention is needed scope:0|this-project No other repositories are impacted. src:forum From the QIIME 2 Forum. time:1|short Should not take very much time to complete (assuming familiarity).

Comments

@nbokulich
Copy link
Member

Improvement Description
Expose num_threads to enable parallelization in classify-consensus-blast

Current Behavior
num_threads is not exposed.

Proposed Behavior
Expose it.

@nbokulich nbokulich added help wanted Extra attention is needed diff:1|beginner Only limited knowledge of the languages and platform is required. good first issue Good for newcomers scope:0|this-project No other repositories are impacted. src:forum From the QIIME 2 Forum. time:1|short Should not take very much time to complete (assuming familiarity). labels Jan 21, 2020
@splaisan
Copy link

please someone implement this, our jobs are taking forever against SILVA due to blast being single threaded. I would do the edit myself if I knew enough of python (sob)

@gregcaporaso gregcaporaso self-assigned this Jan 29, 2020
@gregcaporaso
Copy link
Member

Note: @cherman2 is going to work on this issue.

@thermokarst thermokarst assigned cherman2 and unassigned gregcaporaso Jan 29, 2020
@cherman2
Copy link
Collaborator

cherman2 commented Feb 19, 2020

I see that in past code, there was a num_thread that was set to 1 (num_threads: str=1), in the current version the numThread variable does not exist. I can see in the history that it was removed because "The 'num_threads' parameter had no effect with the mode that 'blastn' is invoked in." Should num_threads be reimplemented to equal 1, and be exposed, or should num_threads be an amount that is decided by the user? If so, what is the max amount of threads the code should allow?
The past code: 9a88406#diff-09b8f70df4e6b546aa7e9361837f99a8

@thermokarst
Copy link
Contributor

Awesome, thanks for the sleuthing, @cherman2! 🔍

Based on the issue/PR you linked to (#77), it is safe to assume that this issue (#155) is not actually something that can be accomplished (sorry @splaisan, our hands are tied), since -num_threads is ignored when the -subject parameter is supplied to the blastn tool.

What do you think about pivoting this issue and turning it into a documentation issue? Specifically, would you be able to add a comment to the source code where the classify_consensus_blast function is defined, explaining why -num_threads isn't exposed? Please include a link to issue #77, if possible. My rationale for this comment/documentation request is simple: hopefully it will prevent us from retreading over this ground (#155, #77) again!

Please let me know if you have any questions, or if I can clarify anything for you. Thanks!

PS - I confirmed that blastn 2.6.0 (which we are currently using in q2-feature-classifier) still emits this warning

cherman2 added a commit to cherman2/q2-feature-classifier that referenced this issue Feb 19, 2020
Here is my documentation for why issue qiime2#155(qiime2#155) is not accomplishable at this time. Issue qiime2#77 (qiime2#77) was fixed by removing -num_threads as it was ignored when subject was specified. Since num_threads is currently not a variable it can not be exposed.
@cherman2
Copy link
Collaborator

@gregcaporaso and @thermokarst. I created a fork of _blast.py with my proposed documentation of the issue. Let me know if there is anything else I need to do on my end!

@thermokarst
Copy link
Contributor

Thanks @cherman2!

Please open up a GitHub Pull Request (second ref) to get your changes ready for review.

Thanks!

@cherman2
Copy link
Collaborator

Done! @thermokarst

@MaestSi
Copy link

MaestSi commented Feb 28, 2020

Hi, is there any chance you could wrap makeblastdb, so that classify-consensus-blast first creates a blast-indexed database starting from the fasta file included in the FeatureData[Sequence] artifact, and then performs the alignment (specifying -db instead of -subject), so that it can be run with multi-threading?
Thanks,
Simone

@thermokarst
Copy link
Contributor

Closed in #157

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
diff:1|beginner Only limited knowledge of the languages and platform is required. good first issue Good for newcomers help wanted Extra attention is needed scope:0|this-project No other repositories are impacted. src:forum From the QIIME 2 Forum. time:1|short Should not take very much time to complete (assuming familiarity).
Projects
None yet
Development

No branches or pull requests

6 participants