too few hits #454
We don't (/can't) generate similar k-mers for nucleotide searches, so the sensitivity parameter doesn't really affect anything in this case. You could try reducing the k-mer size a bit; that might let more hits pass through the prefilter.
Thanks a lot for your response. I tried reducing it with "-k 5", which improved things a bit, but then got an error when trying "-k 4".
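To see why a smaller k-mer size can let more hits through, here is a toy Python sketch that only counts exact shared k-mers between a query and a diverged target. This is an assumption made for illustration: MMseqs2's real prefilter is more elaborate than exact matching. Both sequences are made up; the target carries a substitution at every 6th residue.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a: str, b: str, k: int) -> int:
    """Number of distinct exact k-mers that a and b have in common."""
    return len(kmers(a, k) & kmers(b, k))


# Hypothetical sequences: target = query with X at every 6th residue.
query = "ACDEFGHIKLMNPQRSTVWY"
target = "ACDEFXHIKLMXPQRSTXWY"

for k in (6, 5, 4):
    print(k, shared_kmers(query, target, k))
```

With k = 6 the diverged target shares no k-mer at all, while k = 5 and k = 4 recover 3 and 6 shared k-mers. The flip side is that shorter k-mers also match by chance far more often, so the prefilter gets slower and noisier.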
Sorry, you are right, I misread the initial thread. Maybe try reducing the
My query is a reverse transcriptase (RT) domain. For instance
My target is a plant genome. The genome is too big to share (1 Gb). It is from here: https://www.ncbi.nlm.nih.gov/Traces/wgs/?val=APLD01.
I get over 37k hits, but I also split the genome db creation from the search:
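The exact commands from this comment are not preserved. As an assumption, a split workflow could use the standard MMseqs2 modules createdb, search and convertalis, which are roughly the steps easy-search runs for you. The Python sketch below only builds the argument lists; the database names queryDB, targetDB and resultDB are hypothetical, and the extra options are the ones from the original easy-search command.

```python
import subprocess


def split_search_commands(query_fa, target_fa, out_m8, tmp_dir, extra=()):
    """Build mmseqs argument lists for a createdb / search / convertalis run.

    Database names are hypothetical placeholders.
    """
    return [
        ["mmseqs", "createdb", query_fa, "queryDB"],
        ["mmseqs", "createdb", target_fa, "targetDB"],
        ["mmseqs", "search", "queryDB", "targetDB", "resultDB", tmp_dir, *extra],
        ["mmseqs", "convertalis", "queryDB", "targetDB", "resultDB", out_m8],
    ]


def run(commands):
    """Execute the commands in order; stops on the first non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


cmds = split_search_commands(
    "prt.fa", "genome.fa", "out.m8", "/tmp/", ["-s", "7", "--max-seqs", "100000"]
)
for c in cmds:
    print(" ".join(c))
```

Calling run(cmds) executes them in order (this requires mmseqs on the PATH). One practical benefit of the split is that the large genome database only has to be created once and can be reused across searches.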
OK, maybe I oversimplified the question, because I was actually counting merged hits with a minimum length of 520 nt (chosen empirically, a bit less than the RT domain size).
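The merging procedure isn't spelled out in the thread. One plausible reading (an assumption) is: normalise each hit's target coordinates, merge overlapping intervals per contig, and count merged regions of at least 520 nt. A minimal Python sketch of that, which ignores strand and pools hits from all queries; the contig names and coordinates are made up:

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def merge_hits(hits: Iterable[tuple[str, int, int]], max_gap: int = 0) -> dict[str, list[tuple[int, int]]]:
    """Merge hit intervals per target contig.

    Each hit is (contig, start, end) in 1-based inclusive coordinates.
    start > end (e.g. a minus-strand hit) is normalised first. Intervals
    separated by at most max_gap nucleotides are joined.
    """
    by_contig: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for contig, start, end in hits:
        by_contig[contig].append((min(start, end), max(start, end)))

    merged: dict[str, list[tuple[int, int]]] = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        out = [list(intervals[0])]
        for s, e in intervals[1:]:
            if s <= out[-1][1] + max_gap + 1:  # overlapping or within max_gap
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[contig] = [(s, e) for s, e in out]
    return merged


def count_long_regions(merged: dict[str, list[tuple[int, int]]], min_len: int = 520) -> int:
    """Count merged regions whose inclusive length is at least min_len nt."""
    return sum(1 for regions in merged.values() for s, e in regions if e - s + 1 >= min_len)


hits = [("c1", 100, 400), ("c1", 350, 700), ("c1", 2000, 1800), ("c2", 1, 600)]
merged = merge_hits(hits)
print(merged)                      # {'c1': [(100, 700), (1800, 2000)], 'c2': [(1, 600)]}
print(count_long_regions(merged))  # 2
```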
The ORF extraction step might be a bit too naive to deal well with a plant genome. I guess BLAST is able to extract longer fragments for some reason, but I don't know how.
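To illustrate how naive ORF extraction could shorten hits, here is a toy stop-to-stop splitter for one forward frame. It reflects an assumption about what "naive" means here and is not MMseqs2's actual extraction code; the min_nt value of 90 and both sequences are hypothetical. A single in-frame stop codon cuts a 903-nt stretch into two 450-nt ORFs, each below the 520-nt threshold, so an alignment confined to one ORF cannot reach that length.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq: str, frame: int, min_nt: int) -> list[tuple[int, int]]:
    """Stop-free stretches of one forward reading frame.

    Returns 0-based half-open (start, end) intervals, excluding the stop
    codon, that are at least min_nt long. Reverse-strand frames would be
    handled the same way on the reverse complement (omitted here).
    """
    seq = seq.upper()
    orfs = []
    start = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if i - start >= min_nt:
                orfs.append((start, i))
            start = i + 3
    end = frame + (len(seq) - frame) // 3 * 3  # last complete codon boundary
    if end - start >= min_nt:
        orfs.append((start, end))
    return orfs


domain = "GCT" * 150 + "TAA" + "GCT" * 150  # 903 nt with one in-frame stop
intact = "GCT" * 301                        # 903 nt without a stop

print(stop_to_stop_orfs(domain, frame=0, min_nt=90))  # [(0, 450), (453, 903)]
print(stop_to_stop_orfs(intact, frame=0, min_nt=90))  # [(0, 903)]
```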
Thanks a lot for your help. See you soon with another question; I am still exploring.
Dear Soeding lab members,
I was trying to use mmseqs for a tblastn-like analysis.
With a given amino-acid query DB and nucleotide target DB, blastall and BLAST+ give approximately 11k and 14k hits (e-value 0.001), respectively.
To speed this process up, I wanted to use easy-search.
At first, I was getting only about 500-700 hits with default settings and varying sensitivity.
I then tried out the "--max-seqs" option:
--max-seqs INT Maximum results per query sequence allowed to pass the prefilter (affects sensitivity) [0.000]
It seems that by default the maximum number of sequences is set to 300. After switching to --max-seqs 100000, I now get 2.5k hits.
mmseqs easy-search prt.fa genome.fa out /tmp/ -s 7 --max-seqs 100000
That's much better, but still far from BLAST.
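In a simplified model (an assumption, not MMseqs2's actual prefilter code), --max-seqs acts as a per-query cap: only the N best-scoring candidates go on to alignment. Against a 1 Gb genome split into many ORFs, a query with more genuine homologous loci than the cap would lose all but the top N. The Python sketch below shows that cap; the query and target IDs and the scores are hypothetical.

```python
import heapq
from collections import defaultdict


def cap_per_query(candidates, max_seqs):
    """Keep at most max_seqs targets per query, highest prefilter score first.

    candidates: iterable of (query_id, target_id, score) tuples.
    """
    by_query = defaultdict(list)
    for query, target, score in candidates:
        by_query[query].append((score, target))
    return {
        query: [target for _, target in heapq.nlargest(max_seqs, scored)]
        for query, scored in by_query.items()
    }


# Hypothetical scores: one query with five candidate target ORFs.
cands = [("rt1", f"orf{i}", 10 * i) for i in range(1, 6)]
print(cap_per_query(cands, max_seqs=3))  # {'rt1': ['orf5', 'orf4', 'orf3']}
```

In this toy, raising max_seqs to 5 returns all five targets, which qualitatively matches what raising --max-seqs did here; candidates that never pass the prefilter at all are not affected by the cap.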
Would you have any suggestion to address this discrepancy?
Thanks a lot for your help !