BlastN "no high coverage hits" #22

Closed
M-K1 opened this issue Jan 12, 2022 · 2 comments
M-K1 commented Jan 12, 2022

Hello Mike,

I'm using Cenote-Taker2 to identify viral contigs and detect certain virus species. I've been working with your test data to see if I could get your tool working on the server, but I have one problem with the results. With the default parameters, as you advised on the wiki, the organism name is something I can't really work with. You provide the option to run blastn to get a more specific result, but the blast result is always "no high coverage hits", whether I use my own data or your provided test data. I've looked into this problem and came across issue #15, but I still couldn't get BLASTN_INFO to display anything other than that result. I've read in your paper that the pipeline:

marks contigs with at least 90 per cent average nucleotide identity to existing database entries.

The blastn results in the intermediate files show only % identities over 90%, so I am wondering whether I am doing something wrong. Could you elaborate on how Cenote-Taker2 uses blastn?
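For anyone wanting to repeat that inspection of the intermediate files, here is a minimal Python sketch. It assumes the file is tab-separated BLAST output with the query ID in column 1 and percent identity in column 3 (the standard tabular layout); that is an assumption about the intermediate files, not something confirmed in this thread. The function name `identities_by_query` is hypothetical. It reports per-hit identity only and does not reproduce how Cenote-Taker2 itself decides on "high coverage" hits.

```python
def identities_by_query(lines, min_identity=90.0):
    """Group percent identities by query ID, keeping hits >= min_identity.

    Assumes tab-separated BLAST output: column 1 = query ID,
    column 3 = percent identity (an assumption about the file format).
    """
    hits = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        query, pident = fields[0], float(fields[2])
        if pident >= min_identity:
            hits.setdefault(query, []).append(pident)
    return hits
```

To use it on a file, open the file and pass the handle, for example `with open(path) as fh: print(identities_by_query(fh))`, where `path` points at one of the intermediate blastn files.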

My command
python run_cenote-taker2.py -c testcontigs_DNA_ct2.fasta -r test_DNA_ct_3 -p True -m 16 -t 16 --known_strains blast_knowns --blastn_db /lustre/BIF/nobackup/kon001/thesis/Databases/NCBI_NT/nt | tee test_DNA_ct_3_output.log

Log file
test_DNA_ct_3_output.log

Thx in advance,

Matthijs

mtisza1 (Owner) commented Jan 28, 2022

Hi Matthijs,

Thank you for opening this issue.

First, let me apologize for the delay in replying. I've been extremely busy lately, and I've had to temporarily stop replying to Cenote-Taker 2 issues. I will be "back" to quick responses and updates(!) at the end of February.

I looked at your log and I can't see anything funny going on. Based on what you said, BLASTN ran and produced the appropriate alignments. I think the "no high coverage hits" could be occurring if your installation of the krona databases didn't work or if efetch is not properly connecting to the NCBI server.

To check if the krona database is installed, activate the cenote-taker2_env and find any file ending in .blastn_intraspecific.out (e.g. in the DTR_contigs_with_viral_domain directory of your output). Then run a command like this:

ktClassifyBLAST -o test1.tab test_blastn_1ct2.blastn_intraspecific.out

If this doesn't work, you'll have to install/update the krona databases. Change to the main Cenote-Taker2 directory and use these commands (This requires at least 4 CPUs for some reason and will take 20-40 minutes, so please have those resources available):

# Locate the krona directory inside the active conda environment
KRONA_DIR=$( which python | sed 's/bin\/python/opt\/krona/g' )
cd "${KRONA_DIR}"
# Both scripts are run from the krona directory
sh updateTaxonomy.sh
sh updateAccessions.sh
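
As a sanity check once those scripts finish, a minimal Python sketch along these lines could confirm that the taxonomy files were written. The `taxonomy` subdirectory and the file names `taxonomy.tab` and `all.accession2taxid.sorted` are assumptions based on a typical KronaTools layout, and both helper names are hypothetical, so adjust them if your install differs.

```python
import os
import shutil
import sys

# Assumed file names from a typical KronaTools install; verify locally.
EXPECTED_FILES = ("taxonomy.tab", "all.accession2taxid.sorted")


def krona_dir_from_python(python_path):
    """Mirror the sed substitution above: bin/python -> opt/krona."""
    return python_path.replace("bin/python", "opt/krona")


def missing_krona_files(krona_dir, expected=EXPECTED_FILES):
    """Return the expected taxonomy files that are absent or empty."""
    tax_dir = os.path.join(krona_dir, "taxonomy")
    missing = []
    for name in expected:
        path = os.path.join(tax_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Same lookup as `which python`, falling back to the running interpreter
    python_path = shutil.which("python") or sys.executable
    krona_dir = krona_dir_from_python(python_path)
    print("Krona directory:", krona_dir)
    print("Missing or empty:", missing_krona_files(krona_dir) or "none")
```

Run it with the cenote-taker2_env activated so that `python` resolves to the environment's interpreter.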

To check efetch, activate the cenote-taker2_env and input this command:

efetch -db taxonomy -id 133704 -format xml | xtract -pattern Taxon -block "*/Taxon" -tab "\n" -element TaxId,ScientificName,Rank
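
If either check above fails with a "command not found" style error, the tools may simply not be on the PATH of the activated environment. A minimal Python sketch for ruling that out, using only the three command names that appear in this thread (the helper name `missing_tools` is hypothetical):

```python
import shutil

# Command names taken from the checks in this thread.
REQUIRED_TOOLS = ("ktClassifyBLAST", "efetch", "xtract")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

This only confirms that the executables can be located; it does not test the krona databases or the connection to the NCBI server.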

Other explanations are possible, however, so you can also email me a compressed copy of the test run's output directory and I will inspect it.

best,

Mike

M-K1 (Author) commented Feb 17, 2022

Hey Mike,

Thanks for your advice. Running updateTaxonomy.sh and updateAccessions.sh from the cenote-taker2 conda environment allowed me to get correct BLASTN outputs. Another thing I'm wondering about is what the ORGANISM_NAME is based on: I have searched the usual databases and couldn't find a match. Can you tell me what these names are based on?

Thx,

Matthijs
