Problem with clustering of fasta consisting of duplicates #180

Closed
matveykolesnik opened this issue Mar 28, 2019 · 1 comment

@matveykolesnik

Expected Behavior

Clustering completes and produces a cluster.

Current Behavior

Clustering fails with an error.

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.
mmseqs createdb test.faa test.db
mmseqs cluster test.db test.clu tmp --min-seq-id 0.9
test.faa is attached to the gist.

MMseqs Output (for bugs)

Please make sure to also post the complete output of MMseqs. You can use gist.github.com for large output.
https://gist.github.com/matveykolesnik/43a90e7404e11881c29e2c80d79c5fec

Context

Providing context helps us come up with a solution and improve our documentation for the future.
I am using MMseqs2 to cluster protein sequences in an automated pipeline, and I ran into a problem: MMseqs2 fails to process FASTA files that consist entirely of repeats of the same sequence, like this:

>record1
MALYNISEKILTTLEKTSFTIERLQERYDLQEAIKKNIDIVAPGCLVISEEFSDWEDSRR
>record2
MALYNISEKILTTLEKTSFTIERLQERYDLQEAIKKNIDIVAPGCLVISEEFSDWEDSRR
...
It might be worth adding handling for such FASTA files.
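For anyone who wants to reproduce this without downloading the gist, a FASTA of the same shape can be generated locally. This is a minimal sketch, not the exact test.faa from the gist: the record names (record1, record2, ...) and the record count of 10 are hypothetical choices, and the sequence is the one shown in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a FASTA in which every record carries the
# same protein sequence. The record count (10) is an arbitrary assumption.
set -euo pipefail

seq_aa="MALYNISEKILTTLEKTSFTIERLQERYDLQEAIKKNIDIVAPGCLVISEEFSDWEDSRR"
n_records=10

: > test.faa   # create or truncate the output file
for i in $(seq 1 "$n_records"); do
  # Each record: a ">recordN" header line followed by the sequence line
  printf '>record%d\n%s\n' "$i" "$seq_aa" >> test.faa
done

# Sanity check: count headers and distinct sequence lines
echo "records: $(grep -c '^>' test.faa)"
echo "distinct sequences: $(grep -v '^>' test.faa | sort -u | wc -l)"
```

The generated test.faa can then be passed to the createdb and cluster commands from the reproduction steps, using a freshly created, empty tmp folder.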

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
  • Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version:
@martin-steinegger
Member

Same error as #181
