set blastn threshold based on the number of cpu counts #62

Closed

@LuisFF LuisFF commented Jul 8, 2021

Hi,

I noticed that blastn (v2.7.1 in particular) fails to run if the number of threads requested exceeds the number of CPUs available on the machine.

stderr:

Traceback (most recent call last):
  File "/usr/local/bin/bakta", line 11, in <module>
    load_entry_point('bakta==1.0.4', 'console_scripts', 'bakta')()
  File "/usr/local/lib/python3.8/dist-packages/bakta-1.0.4-py3.8.egg/bakta/main.py", line 372, in main
    oriCs = ori.predict_oris(genome, contigs_path, bc.FEATURE_ORIC)
  File "/usr/local/lib/python3.8/dist-packages/bakta-1.0.4-py3.8.egg/bakta/features/ori.py", line 39, in predict_oris
    raise Exception(f'blastn error! error code: {proc.returncode}')
Exception: blastn error! error code: 1

log file:

15:27:23.677 - DEBUG - ORI - cmd=['blastn', '-query', '/db/oric.fna', '-subject', '/tmp/tmpmho32og9/contigs.fna', '-culling_limit', '1', '-evalue', '1E-5', '-num_threads', '8', 
'-outfmt', '6 qseqid qstart qend qlen sseqid sstart send length nident sstrand', '-out', '/tmp/tmpmho32og9/ori.blastn.tsv']
15:27:23.910 - DEBUG - ORI - stdout='', stderr='USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
    [-db_hard_mask filtering_algorithm] [-subject subject_input_file]
    [-subject_loc range] [-query input_file] [-out output_file]
    [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]
    [-gapextend extend_penalty] [-perc_identity float_value]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]
    [-min_raw_gapped_score int_value] [-template_type type]
    [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-line_length line_length] [-html]
    [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]
    [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.7.1+

Use '-help' to print detailed descriptions of command line arguments
========================================================================

Error: Argument "num_threads". Illegal value, expected (>=1 and =<4):  `8'
Error:  (CArgException::eConstraint) Argument "num_threads". Illegal value, expected (>=1 and =<4):  `8'
'
15:27:23.910 - WARNING - ORI - oriC failed! blastn-error-code=1

I wonder if it would be okay to add a mechanism that prevents this by capping the thread count passed to blastn at the number of available CPUs. Other tools used in bakta, like diamond, seem to handle this situation much better.
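For illustration only, a minimal sketch of such a cap. The helper name `cap_threads` is hypothetical and not taken from bakta's code, and it assumes that `os.cpu_count()` reports the same limit blastn enforces, which may not hold inside containers with restricted CPU sets.

```python
import os


def cap_threads(requested: int) -> int:
    """Clamp a requested thread count to the range [1, available CPUs].

    Hypothetical helper, not part of bakta. os.cpu_count() can return
    None, in which case we fall back to a single thread.
    """
    available = os.cpu_count() or 1
    return max(1, min(requested, available))


# Build the thread-related part of a blastn command with the capped value.
# '-num_threads' is the flag that appears in the log above.
threads = cap_threads(8)
cmd = ['blastn', '-num_threads', str(threads)]
print(cmd)
```

On the 4-CPU machine from the log, a request for 8 threads would be reduced to 4 under these assumptions, which is within the range blastn accepts.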

Cheers,
Luis

@oschwengers
Owner

Hi @LuisFF, thanks a lot for reporting and many thanks for this PR.
As this could potentially affect all third-party tools, I'd prefer to catch this situation further upstream, in the argument parsing and configuration. It's not much of an issue, and I'll fix it as soon as possible after my vacation ;-)
I'll leave this PR open until this is fixed and refer to it, accordingly.
Thanks again for bringing this up!
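One way the upstream approach could look is sketched below. This is not the fix that was eventually merged; the `--threads` option name and the `parse_threads` function are assumptions for illustration. The idea is to validate the value once during argument parsing so that every third-party tool receives an already-capped thread count.

```python
import argparse
import os


def parse_threads(argv=None) -> int:
    """Parse a thread count and cap it once, before any tool is invoked.

    Hypothetical sketch: the option name and the capping policy are
    assumptions, not bakta's actual implementation.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    parser = argparse.ArgumentParser(prog='example')
    parser.add_argument('--threads', type=int, default=available)
    args = parser.parse_args(argv)
    # Clamp out-of-range values instead of failing later inside a tool.
    return max(1, min(args.threads, available))


# The capped value can then be passed unchanged to blastn, diamond, etc.
print(parse_threads(['--threads', '8']))
```

Capping in one place keeps the individual tool wrappers simple, at the cost of silently changing a user-supplied value; logging the adjustment would be a reasonable addition.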

@oschwengers oschwengers added the bug Something isn't working label Jul 9, 2021
@oschwengers oschwengers self-assigned this Jul 9, 2021
@LuisFF
Author

LuisFF commented Jul 9, 2021

Hi @oschwengers, thanks for the quick response.

Another suggestion from my side is to allow the users to specify an alternative location for AMRFinderPlus databases. This would be extremely helpful when working with container environments. I posted this idea in the discussions https://github.com/oschwengers/bakta/discussions/64 .

Enjoy your vacation!

@oschwengers
Owner

oschwengers commented Jul 14, 2021

Hi @LuisFF ,
this should be solved by #65. Thank you very much again for bringing this up and reporting it!
I'll address the AMRFinderPlus db path idea later and close this for now.

@LuisFF LuisFF deleted the feature/set_blastn_thread_threshold branch October 18, 2022 07:49