custom protein database #56

Open
shlomobl opened this issue Jun 16, 2022 · 4 comments
Labels
enhancement New feature or request upgrade required

Comments

@shlomobl

Hi,
I am analyzing multiple bacterial genomes with very little programming knowledge. The way tormes parses and summarizes the results from all the genomes in tabular files is very helpful!
Tormes now has an option to query the genomes with a custom nucleotide database. But what I have is a protein database... is there any way to do this with tormes? Any other suggestions? In the end, I'd really need a genome X protein sort of table...
Thanks!

@nmquijada
Owner

Hi @shlomobl

I am afraid that the current version of tormes only supports custom nucleotide databases for gene search as an integrated option. We have added custom amino acid database searches to the ongoing development version of the tool, which we hope to release after the summer. I will keep you posted.

In the meantime, if you want to use an amino acid database, I can guide you through doing so with blastp, taking advantage of the tormes file hierarchy. Would that be an option for you?
The predicted proteins of your genomes will be in the gene_prediction or annotation directories (depending on the option you used to run the pipeline).

Additionally, you can add those proteins to the database that prokka uses for annotation and then look for them in the annotation results.

@shlomobl
Author

Hi,
Yes, please, I appreciate it!
Especially if results can be summarized in a presence/absence table with all genomes, similar to VFs/AMR.
I guess it is easier to generate a table from BLAST than by adding these genes to annotation?
Thanks!
S.

@nmquijada
Owner

Hi @shlomobl

Sorry for the late reply. Both running a BLAST search and adding the proteins to the annotation files for the analyses are straightforward processes. However, with the latter you also get back the information on the genes you are looking for in the annotation results.

If you would like the results to appear in the tormes report, that would require some expertise with R Markdown, which is used to generate that report. If you don't have experience with it, I would encourage you to wait a bit until we release the next version, which will allow protein databases to be used for direct "blasting".

In the meantime, if you would like to look for some proteins in your dataset with BLAST, you need to make a blast-formatted database first:

makeblastdb -in my_proteins.faa -title my_prot -out my_db/my_prot -dbtype prot -hash_index

Then, you can run BLASTP on the predicted protein file produced by prodigal (and/or annotated with prokka). For instance:

blastp -query tormes_output/annotation/genome_01_annotation/genome_01.faa -db my_db/my_prot -out blastp_output.txt -max_target_seqs 1000 -culling_limit <culling limit to be used (>1)> -evalue 1e-25 -num_threads <num of CPUs> -outfmt "6 qseqid sseqid length qstart qend sstart send mismatch gaps pident evalue bitscore slen"
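To get results for all genomes, the single-genome command above can be repeated in a loop. The following is only a sketch under assumptions that the thread does not confirm: that every genome follows the same `<name>_annotation/<name>.faa` layout as the example, and that one output file per genome is wanted. The function name, the `BLASTP_BIN` override variable and the `<name>.blastp.tsv` output naming are hypothetical, and the `-culling_limit` option is left out here; add it back with your chosen value if you need it.

```shell
#!/usr/bin/env bash
# Sketch: run the blastp search once per genome in a tormes annotation directory.
# Assumed layout (taken from the single example above): <annotation_dir>/<name>_annotation/<name>.faa
# blastp_all_genomes, BLASTP_BIN and the output file naming are hypothetical.

blastp_all_genomes() {
    local annotation_dir=$1 db=$2 out_dir=$3 threads=${4:-1}
    local blastp_bin=${BLASTP_BIN:-blastp}   # lets you point at another blastp binary
    local faa genome

    mkdir -p "$out_dir" || return 1

    for faa in "$annotation_dir"/*_annotation/*.faa; do
        [ -e "$faa" ] || continue            # the glob matched nothing
        genome=$(basename "$faa" .faa)
        "$blastp_bin" -query "$faa" -db "$db" \
            -out "$out_dir/$genome.blastp.tsv" \
            -max_target_seqs 1000 -evalue 1e-25 -num_threads "$threads" \
            -outfmt "6 qseqid sseqid length qstart qend sstart send mismatch gaps pident evalue bitscore slen"
    done
}
```

A call could look like `blastp_all_genomes tormes_output/annotation my_db/my_prot blastp_results 4`, which would write one tab-separated file per genome into `blastp_results`.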

#you can add a header line with the field names to the file, for instance (GNU sed syntax):
sed -i '1i qseqid\tsseqid\tlength\tqstart\tqend\tsstart\tsend\tmismatch\tgaps\tpident\tevalue\tbitscore\tslen' blastp_output.txt
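For the genome X protein presence/absence table requested earlier in this thread, the tabular BLAST outputs can be summarized with awk. This is a minimal sketch, not part of tormes: it assumes one output file per genome named `<genome>.blastp.tsv` (a hypothetical naming), the same column order as the `-outfmt` string above, and illustrative identity and coverage thresholds that you should adapt to your proteins. The function name `presence_absence_table` is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: genome x protein presence/absence table from per-genome blastp outputs.
# Assumed columns (from the -outfmt string above):
#   1 qseqid  2 sseqid  3 length  4 qstart  5 qend  6 sstart  7 send
#   8 mismatch  9 gaps  10 pident  11 evalue  12 bitscore  13 slen
# Usage: presence_absence_table <min_pident> <min_subject_coverage_%> <files...>

presence_absence_table() {
    local min_pident=$1 min_cov=$2
    shift 2
    awk -F'\t' -v min_pident="$min_pident" -v min_cov="$min_cov" '
        function genome_name(path) {
            sub(/.*\//, "", path)              # drop the directory part
            sub(/\.blastp\.tsv$/, "", path)    # drop the assumed suffix
            return path
        }
        BEGIN {
            # register every input file so genomes without hits still get a row
            n_genomes = ARGC - 1
            for (i = 1; i < ARGC; i++) genomes[i] = genome_name(ARGV[i])
        }
        FNR == 1 { genome = genome_name(FILENAME) }
        $1 == "qseqid" { next }                # skip an optional header line
        NF < 13 || $13 + 0 <= 0 { next }       # skip malformed lines
        {
            # percentage of the database protein covered by the alignment
            cov = 100 * ($7 - $6 + 1) / $13
            if ($10 + 0 >= min_pident + 0 && cov >= min_cov + 0) {
                if (!($2 in seen)) { seen[$2] = 1; proteins[++n_prot] = $2 }
                hit[genome, $2] = 1
            }
        }
        END {
            printf "genome"
            for (j = 1; j <= n_prot; j++) printf "\t%s", proteins[j]
            printf "\n"
            for (i = 1; i <= n_genomes; i++) {
                printf "%s", genomes[i]
                for (j = 1; j <= n_prot; j++)
                    printf "\t%d", (((genomes[i], proteins[j]) in hit) ? 1 : 0)
                printf "\n"
            }
        }
    ' "$@"
}
```

For example, `presence_absence_table 80 80 blastp_results/*.blastp.tsv > presence_absence.tsv` would mark a protein as present (1) in a genome when at least one hit has 80% identity or more and covers 80% or more of the database protein. Proteins without a passing hit in any genome do not appear as columns in this sketch.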

As I said, I hope we can release the next version soon.
I hope this helps in the meantime and you can do some searches of proteins of your interest!

Best,
Narciso

@shlomobl
Author

shlomobl commented Oct 11, 2022 via email
