The two fasta files depicted below are identical except for the deflines:
pass.fasta
>zzsomething
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
>zzsomethingelse
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
fail.fasta
>ucsomething
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
>ucsomethingelse
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
Running easy-cluster on these two files:
# rm was run, but commented out for placing on github
# rm -rf ./tmp
mmseqs \
  easy-cluster \
  fail.fasta \
  'mmseqs2_fail' \
  ./tmp \
  --threads 24

# rm was run, but commented out for placing on github
# rm -rf ./tmp
mmseqs \
  easy-cluster \
  pass.fasta \
  'mmseqs2_pass' \
  ./tmp \
  --threads 24
results in the correct output for mmseqs2_pass_cluster.tsv:
mmseqs2_pass_cluster.tsv
zzsomethingelse	zzsomethingelse
zzsomethingelse	zzsomething
but removes the leading 'uc' from the deflines in mmseqs2_fail_cluster.tsv:
mmseqs2_fail_cluster.tsv
somethingelse	somethingelse
somethingelse	something
This seems to be the case for any defline that starts with 'uc'.
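One way to check for altered IDs in other datasets is to compare the deflines of the input FASTA against the IDs in the cluster TSV. The following is a minimal sketch, not part of MMseqs2: the function name `ids_missing_from_input` is hypothetical, and it assumes the TSV is tab-separated and that the ID is the first whitespace-delimited token of each defline.

```shell
#!/bin/sh
# Hypothetical helper: print IDs that occur in the cluster TSV
# but do not occur among the deflines of the input FASTA.
ids_missing_from_input() {
  fasta="$1"
  tsv="$2"
  work=$(mktemp -d)

  # Assumption: the ID is the first whitespace-delimited token after ">".
  grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$work/input_ids"

  # Assumption: the TSV is tab-separated; collect the IDs from every column.
  tr '\t' '\n' < "$tsv" | sort -u > "$work/output_ids"

  # comm -13 keeps only the lines that are unique to the second file.
  comm -13 "$work/input_ids" "$work/output_ids"

  rm -rf "$work"
}

# Usage (file names taken from the report above):
#   ids_missing_from_input fail.fasta mmseqs2_fail_cluster.tsv
```

An empty result means every ID in the TSV was found in the input. Any ID that is printed was changed somewhere between input and output.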
The FASTA files also have duplicate defline entries, where one of the duplicates doesn't contain a sequence:
mmseqs2_fail_all_seqs.fasta
>somethingelse
>ucsomethingelse
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
>ucsomething
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
mmseqs2_pass_all_seqs.fasta
>zzsomethingelse
>zzsomethingelse
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
>zzsomething
MPELRRVLANGVELNVALCGSGPAVLLLHGFPHTWELWTDVMADLSGRYRVIAPDLRGFGASGRAASGYDAGTLAEDAAALLAALGVSSATVVGIDAGTAPAFLLALRHPGLVRRLVVMESLLGRLPGAEDFLAEGPPWWFGFHSAAPSLAETVLEGHEAAYVDWFLSAGTLGDGVRPALRDAFVRAYTGRQALSCAFSYYRALPKSAVQIEQAVATARLTVPTMALGARPVGAALERQLRPVTDDLTGHVIDDCGHIIPLHRPHALLALLHPFLAGEDAKAA
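The header-only entries in the `*_all_seqs.fasta` files can be listed with a short script. This is only a sketch: the function name `headers_without_sequence` is hypothetical, and it assumes a header counts as "without sequence" when the next line is another header or the end of the file.

```shell
#!/bin/sh
# Hypothetical helper: print every header line that is immediately followed
# by another header line (or by end of file), i.e. a header with no sequence.
headers_without_sequence() {
  awk '
    /^>/ {
      if (prev_is_header) print prev   # previous header had no sequence lines
      prev = $0
      prev_is_header = 1
      next
    }
    { prev_is_header = 0 }             # any non-header line counts as sequence
    END { if (prev_is_header) print prev }
  ' "$1"
}

# Usage (file name taken from the report above):
#   headers_without_sequence mmseqs2_fail_all_seqs.fasta
```

For the two output files shown above, this would print `>somethingelse` and `>zzsomethingelse` respectively.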
https://gist.github.com/chasemc/c0cccd804ac0ff78291e43ae10837c42
https://gist.github.com/chasemc/d8157a581c833406f15442e8b9ee4e81
Conda installed: MMseqs2 Version: 13.45111
Happy to give system info if needed
This is beyond me, but it seems it might stem from here:
MMseqs2/src/commons/Util.cpp
Line 131 in f651879
Is there an option to skip checking/removing these identifiers?
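Until there is such an option, a possible workaround is to prepend a placeholder to every defline before clustering and strip it from the results afterwards. This is an untested sketch: it assumes the 'uc' removal only applies at the very start of the identifier (which the output above suggests but which has not been checked against the source), the `id_` prefix and both function names are hypothetical, and the TSV is assumed to be tab-separated.

```shell
#!/bin/sh
# Hypothetical placeholder; assumed safe because it does not begin with "uc".
PREFIX="id_"

# Prepend the placeholder to every defline of a FASTA file (written to stdout).
add_prefix() {
  sed "s/^>/>${PREFIX}/" "$1"
}

# Remove the placeholder from every column of a tab-separated cluster file.
strip_prefix_tsv() {
  awk -v p="$PREFIX" 'BEGIN { FS = OFS = "\t" }
    {
      for (i = 1; i <= NF; i++)
        if (index($i, p) == 1) $i = substr($i, length(p) + 1)
      print
    }' "$1"
}

# Usage, reusing the command from the report above:
#   add_prefix fail.fasta > fail_prefixed.fasta
#   mmseqs easy-cluster fail_prefixed.fasta 'mmseqs2_fail' ./tmp --threads 24
#   strip_prefix_tsv mmseqs2_fail_cluster.tsv > mmseqs2_fail_cluster_fixed.tsv
```

The same prefix would still be present in the `*_all_seqs.fasta` and `*_rep_seq.fasta` outputs, so those would need a matching `sed` pass on their header lines.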
I'll update #565 when we have a solution.