Clustering using a batch system

Hi,

I am trying to use MMseqs2 to cluster a large protein database by splitting the work into batch jobs. I followed the search example from https://github.com/soedinglab/mmseqs2/wiki#how-to-run-mmseqs2-on-multiple-servers-using-batch-systems, searching each batch of the database against the whole database. I then merge the per-batch search results and try to compute clusters from the merged results with the `clust` subcommand. Here is my script:
```
$MMSEQS createdb $INFASTA $DB
$MMSEQS splitdb $DB ${DB}_split --split $NUM_SPLITS

for i in $(ls ${DB}_split_*_$NUM_SPLITS) ; do
      $MMSEQS search $i $DB ${i}_search tmp
done

$MMSEQS mergedbs ${DB}_split_0_${NUM_SPLITS}_search ${DB}_search $(awk 'BEGIN {for (i=1;i < '$NUM_SPLITS';i++) printf("'$DB'_split_%d_'$NUM_SPLITS'_search ", i);}')

$MMSEQS clust ${DB} ${DB}_search ${DB}_clust 
```
`mmseqs clust` fails with the error `Sequence db size != result db size`.
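
To narrow this down, the entry counts of the two databases can be compared. The sketch below assumes that the error refers to the number of entries in each database, and that this equals the number of lines in the corresponding `.index` file; I have not verified that this is exactly what `clust` checks. `count_entries` and `compare_db_sizes` are hypothetical helper names, not MMseqs2 commands.

```shell
# Hypothetical helper: count the entries of an MMseqs2 database,
# assuming one line per entry in its .index file.
count_entries() {
    wc -l < "$1.index" | tr -d ' '
}

# Hypothetical helper: print both entry counts and return success
# only if the sequence database and the result database match.
compare_db_sizes() {
    seq_n=$(count_entries "$1")
    res_n=$(count_entries "$2")
    echo "sequence db: $seq_n entries, result db: $res_n entries"
    [ "$seq_n" -eq "$res_n" ]
}
```

With the variables from the script above, this would be called as `compare_db_sizes "$DB" "${DB}_search"`. If the merged result database has fewer entries than `$DB`, the merge step is the likely place where entries are lost.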

Is there a way to combine the search results into one result database, or to compute clusters for each database batch and then merge them? Or is there any other way to do clustering on a batch system (without MPI)?

## Your Environment
Linux CentOS.
MMseqs2 Release 14-7e284: https://github.com/soedinglab/MMseqs2/releases/download/14-7e284/mmseqs-linux-avx2.tar.gz

