2. Command line options

Jeremy edited this page Mar 31, 2025 · 10 revisions

General parameters

  • --help

Show help.

  • -w / --workdir <path>, default: /path/to/lagoon-mcl/wordir

Path to pipeline working directory.

  • --projectName <str>, default: lagoon-mcl

Project name. Used to name the working directory.

  • --outdir <path>, default: results

Directory containing LAGOON-MCL results.

Resources

  • --max_cpus <int>, default: 200

Maximum number of CPUs that can be used by a process.
If the maximum number of CPUs is exceeded, pipeline execution is killed.

  • --max_memory <float.GB>, default: 750.GB

Maximum amount of RAM that can be used by a process.
If the maximum RAM is exceeded, pipeline execution is killed.

  • --max_time <int.h>, default: 350.h

Maximum execution time for a process.
If the maximum time is exceeded, pipeline execution is killed.

  • -resume

To be specified when restarting the pipeline after a problem during execution. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it stopped.

Input parameters

Fasta files

  • --fasta <file>, default: null

Path to fasta files.
If several fasta files are used, the double quotes (") around the path are mandatory.

--fasta "path/to/your/fasta/*.fasta"

Annotation files

  • --annotation, default: ${projectDir}/data-test/annotationsheet.csv

If you provide annotation files (functions, taxonomies, ...), they can be given to workflows with this parameter. A file must be supplied for each annotation type, and each file must contain two columns, the first column containing the sequence identifier and the second column the annotation (e.g. Pfam).

--annotation_files 

ℹ️ See Input files for more information on the format of the file.
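As an illustration of the two-column layout described above, here is a minimal Python sketch that reads one annotation file into a dictionary. It is not part of LAGOON-MCL. The function name `read_annotations` and the tab delimiter are assumptions made for the example; check the Input files page for the separator the pipeline actually expects.

```python
import csv
import io
from collections import defaultdict


def read_annotations(handle, delimiter="\t"):
    """Read a two-column annotation file: sequence identifier, annotation.

    The delimiter is an assumption for this sketch, not a documented default.
    """
    annotations = defaultdict(list)
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            # Skip empty or incomplete lines instead of failing.
            continue
        seq_id, label = row[0].strip(), row[1].strip()
        # A sequence may carry several annotations of the same type.
        annotations[seq_id].append(label)
    return dict(annotations)


# Hypothetical file content: sequence identifier, then a Pfam accession.
example = io.StringIO("seq1\tPF00001\nseq2\tPF00002\nseq1\tPF00003\n")
parsed = read_annotations(example)
```

Running a check like this on each file before launching the pipeline is a quick way to spot rows with a missing column.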

Scan database

Pfam

If you don't have a functional annotation, LAGOON-MCL can scan the Pfam database with MMseqs2.

  • --scan_pfam, default: true

True: LAGOON-MCL uses MMseqs2 to scan Pfam. False: LAGOON-MCL does not scan Pfam.

  • --pfam_path, default: ${projectDir}/database/pfamDB

Path to Pfam database. It must have been created and indexed with MMseqs2.

  • --pfam_name, default: pfamDB

Database name.

AlphaFold database

  • --alphafold_path, default: ${projectDir}/database/alphafoldDB

  • --alphafold_name, default: alphafoldDB

  • --uniprot, default: ${projectDir}/database/uniprot_function.json

Sequence Similarity Network

Sequences are aligned using Diamond BLASTp.

  • --alignment_file <file>, default: null

If you already have an alignment file for your sequences, you can specify it with this option. If this option is not null, the BLASTp alignment will not be performed.

For the file format, please refer to the input section.

--alignment_file path/to/your/alignment/alignment.tsv

  • --sensitivity, default: very-sensitive

Alignment sensitivity mode. You can choose between: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, ultra-sensitive.

  • --matrix, default: BLOSUM62

Score matrix. You can use: BLOSUM45, BLOSUM50, BLOSUM80, BLOSUM90, PAM250, PAM70, PAM30.

  • --diamond_evalue, default: 0.001

Maximum expected value (e-value) for an alignment to be reported.

ℹ️ For more information on how to use Diamond BLASTp's specific parameters, please consult the documentation.

Clustering

The network is clustered using the Markov Clustering (MCL) algorithm.

  • --I <list>, default: 1.4,2

The inflation parameter is MCL's main means of influencing clustering granularity. It is generally chosen in a range between 1.2 and 5.0. 1.2 will produce coarse clustering. 5.0 will produce fine clustering.

  • --max_weight, default: 200

The e-value of each alignment is used as the weight of the corresponding edge. It is transformed into its negative base-10 logarithm (-log10(evalue)). This parameter sets the maximum weight an edge can take.
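The weight transformation can be sketched in a few lines of Python. This is an illustration only, not the pipeline's code: it assumes the weight is the negative base-10 logarithm of the e-value, capped at `--max_weight`, and the function name `evalue_to_weight` is hypothetical.

```python
import math


def evalue_to_weight(evalue, max_weight=200.0):
    """Return -log10(evalue), capped at max_weight (assumed transformation)."""
    if evalue <= 0.0:
        # An e-value of 0 would give an infinite weight, so use the cap directly.
        return max_weight
    return min(-math.log10(evalue), max_weight)


# A moderate hit, a very strong hit, and an e-value reported as exactly 0.
weights = [evalue_to_weight(e) for e in (1e-5, 1e-250, 0.0)]
```

Under this assumption the cap matters mostly for very strong hits: without it, an e-value reported as 0 could not be given a finite weight.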

  • --cluster_size, default: 3

Minimum cluster size. Only clusters containing at least this many sequences are kept.
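The size filter can be pictured with the short Python sketch below. It is not the pipeline's implementation. It assumes the threshold is inclusive (a cluster with exactly `--cluster_size` members is kept) and that clusters are written one per line with tab-separated members, as MCL does in label mode. The names `parse_clusters` and `filter_clusters` are hypothetical.

```python
def parse_clusters(lines):
    """Split each non-empty line into its tab-separated cluster members."""
    return [line.rstrip("\n").split("\t") for line in lines if line.strip()]


def filter_clusters(clusters, min_size=3):
    """Keep clusters with at least min_size members (inclusive, by assumption)."""
    return [members for members in clusters if len(members) >= min_size]


# Hypothetical MCL-style output: one cluster per line.
raw = ["a\tb\tc\td\n", "e\tf\n", "g\th\ti\n", "j\n"]
kept = filter_clusters(parse_clusters(raw), min_size=3)
```

With the default value of 3, singletons and pairs are dropped, and only the first and third clusters of this example remain.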

ℹ️ For more information on how to use MCL, please consult the documentation.
