Error #8

Closed
mvtejesvi opened this issue Sep 27, 2021 · 4 comments

Comments

@mvtejesvi

snakemake: error: unrecognized arguments: --threads=8 --mem=60 --large_mem=250 --large_threads=8 --assembly_threads=8 --assembly_memory=250 --tmpdir=/local_scratch/student198 --database_dir=/scratch/project_2004930/databases --data_type=metagenome --interleaved_fastqs=False --deduplicate=True --duplicates_only_optical=False --duplicates_allow_substitutions=2 --preprocess_adapters=/scratch/project_2004930/databases/adapters.fa --preprocess_minimum_base_quality=10 --preprocess_minimum_passing_read_length=51 --preprocess_minimum_base_frequency=0.05 --preprocess_adapter_min_k=8 --preprocess_allowable_kmer_mismatches=1 --preprocess_reference_kmer_match_length=27 --error_correction_overlapping_pairs=True --contaminant_max_indel=20 --contaminant_min_ratio=0.65 --contaminant_kmer_length=13 --contaminant_minimum_hits=1 --contaminant_ambiguous=best --error_correction_before_assembly=True --merge_pairs_before_assembly=True --merging_k=62 --merging_extend2=40 --merging_flags=ecct iterations=5 --assembler=spades --megahit_min_count=2 --megahit_k_min=21 --megahit_k_max=121 --megahit_k_step=20 --megahit_merge_level=20,0.98 --megahit_prune_level=2 --megahit_low_local_ratio=0.2 --megahit_preset=default --spades_skip_BayesHammer=True --spades_use_scaffolds=True --spades_k=auto --spades_preset=meta --spades_extra= --longread_type=none --filter_contigs=True --prefilter_minimum_contig_length=300 --contig_trim_bp=0 --minimum_average_coverage=1 --minimum_percent_covered_bases=20 --minimum_mapped_reads=0 --minimum_contig_length=500 --contig_min_id=0.9 --contig_map_paired_only=True --contig_max_distance_between_pairs=1000 --maximum_counted_map_sites=10 --final_binner=DASTool --binner=metabat --binner=maxbin --metabat={'sensitivity': 'sensitive', 'min_contig_length': 1500} --maxbin={'max_iteration': 50, 'prob_threshold': 0.9, 'min_contig_length': 1000} --DASTool={'search_engine': 'diamond', 'score_threshold': 0.5} --genome_dereplication={'ANI': 0.95, 'overlap': 0.6, 'opt_parameters': '', 
'filter': {'noFilter': False, 'length': 5000, 'completeness': 50, 'contamination': 10}, 'score': {'completeness': 1, 'contamination': 5, 'N50': 0.5, 'length': 0}} --rename_mags_contigs=True --annotations=gtdb_tree --annotations=gtdb_taxonomy --annotations=genes --genecatalog={'source': 'contigs', 'clustermethod': 'linclust', 'minlength_nt': 100, 'minid': 0.95, 'coverage': 0.9, 'extra': '', 'SubsetSize': 500000} --eggNOG_use_virtual_disk=False --virtual_disk=/dev/shm assembly
[2021-09-27 14:20 CRITICAL] Command 'snakemake --snakefile /scratch/project_2004930/miniconda3/envs/atlasenv/lib/python3.8/site-packages/atlas/Snakefile --directory /users/student198/First_Run --jobs 40 --rerun-incomplete --configfile '/users/student198/First_Run/config.yaml' --nolock --profile cluster --use-conda --conda-prefix /scratch/project_2004930/databases/conda_envs --scheduler greedy assembly ' returned non-zero exit status 2.
(atlasenv) [student198@puhti-login1 First_Run]$

@SilasK
Member

SilasK commented Sep 27, 2021

Can you send me the contents of ~/.config/snakemake/cluster/config.yaml and ~/.config/snakemake/cluster/cluster_config.yaml?

@mvtejesvi
Author

###################################################################

[ASCII-art banner: ATLAS]

###################################################################

For more details about the config values see:

https://metagenome-atlas.rtfd.io

########################

Execution parameters

########################

threads and memory (GB) for most jobs especially from BBtools, which are memory demanding

threads: 8
mem: 60

threads and memory for jobs needing a high amount of memory, e.g. GTDB-tk, checkm or assembly

large_mem: 250
large_threads: 8
assembly_threads: 8
assembly_memory: 250

#Runtime only for cluster execution
runtime: #in h
  default: 5
  assembly: 48
  long: 24

Local directory for temp files, useful for cluster execution without shared file system

tmpdir: /local_scratch/student198

directory where databases are downloaded with 'atlas download'

database_dir: /scratch/project_2004930/databases

########################

Quality control

########################
data_type: metagenome # metagenome or metatranscriptome
interleaved_fastqs: false

remove (PCR)-duplicated reads using clumpify

deduplicate: true
duplicates_only_optical: false
duplicates_allow_substitutions: 2

used to trim adapters from reads and read ends

preprocess_adapters: /scratch/project_2004930/databases/adapters.fa
preprocess_minimum_base_quality: 10
preprocess_minimum_passing_read_length: 51

0.05 requires at least 5 percent of each nucleotide per sequence

preprocess_minimum_base_frequency: 0.05
preprocess_adapter_min_k: 8
preprocess_allowable_kmer_mismatches: 1
preprocess_reference_kmer_match_length: 27

error correction where PE reads overlap

error_correction_overlapping_pairs: true
#contamination references can be added such that -- key: /path/to/fasta
contaminant_references:
  PhiX: /scratch/project_2004930/databases/phiX174_virus.fa
  human: /scratch/project_2004930/databases/human_genome.fasta
contaminant_max_indel: 20
contaminant_min_ratio: 0.65
contaminant_kmer_length: 13
contaminant_minimum_hits: 1
contaminant_ambiguous: best

########################

Pre-assembly-processing

########################

error_correction_before_assembly: true

join R1 and R2 at overlap; unjoined reads are still utilized

merge_pairs_before_assembly: true
merging_k: 62

extend reads while merging to this many nucleotides

merging_extend2: 40

Iterations are performed until extend2 x iterations

merging_flags: ecct iterations=5

########################

Assembly

########################

megahit OR spades

assembler: spades

Megahit

#-----------

2 is for metagenomes, 3 for genomes with 30x coverage

megahit_min_count: 2
megahit_k_min: 21
megahit_k_max: 121
megahit_k_step: 20
megahit_merge_level: 20,0.98
megahit_prune_level: 2
megahit_low_local_ratio: 0.2

['default','meta-large','meta-sensitive']

megahit_preset: default

Spades

#------------
spades_skip_BayesHammer: true
spades_use_scaffolds: true # if false use contigs
#Comma-separated list of k-mer sizes to be used (all values must be odd, less than 128 and listed in ascending order).
spades_k: auto
spades_preset: meta # meta, normal, rna; single-end libraries don't work for metaspades
spades_extra: ''
longread_type: none # [none,"pacbio", "nanopore", "sanger", "trusted-contigs", "untrusted-contigs"]

Preprocessed long reads can be defined in the sample table with 'longreads' , for more info see the spades manual

Filtering

#------------

filter out assembled noise

this is more important for assemblies from megahit

filter_contigs: true
prefilter_minimum_contig_length: 300

trim contig tips

contig_trim_bp: 0

require contigs to have read support

minimum_average_coverage: 1
minimum_percent_covered_bases: 20
minimum_mapped_reads: 0

after filtering

minimum_contig_length: 500

########################

Quantification

########################

Mapping reads to contigs

#--------------------------
contig_min_id: 0.9
contig_map_paired_only: true
contig_max_distance_between_pairs: 1000
maximum_counted_map_sites: 10

########################

Binning

########################

final_binner: DASTool # [DASTool or one of the binner, e.g. maxbin]

binner: # If DASTool is used as final_binner, use predictions of these binners
  - metabat
  - maxbin

metabat:
  sensitivity: sensitive
  min_contig_length: 1500 # metabat needs >1500

maxbin:
  max_iteration: 50
  prob_threshold: 0.9
  min_contig_length: 1000

DASTool:
  search_engine: diamond
  score_threshold: 0.5 #Score threshold until selection algorithm will keep selecting bins [0..1].

genome_dereplication:
  ANI: 0.95
  overlap: 0.6
  opt_parameters: ''
  filter:
    noFilter: false
    length: 5000
    completeness: 50
    contamination: 10
  score:
    completeness: 1
    contamination: 5
    N50: 0.5
    length: 0

rename_mags_contigs: true #Rename contigs of representative MAGs

########################

Annotations

#######################

annotations:
  - gtdb_tree
  - gtdb_taxonomy
  - genes
  # - checkm_taxonomy
  # - checkm_tree

########################

Gene catalog

#######################
genecatalog:
  source: contigs # [contigs, genomes] Predict genes from all contigs or only from the representative genomes
  clustermethod: linclust # [cd-hit-est or mmseqs or linclust] see mmseqs for more details
  minlength_nt: 100
  minid: 0.95 # min id for gene clustering for the main gene catalog used for annotation
  coverage: 0.9
  extra: ''
  SubsetSize: 500000

eggNOG_use_virtual_disk: false # copying the eggNOG DB to a virtual disk can speed up the annotation
virtual_disk: /dev/shm # But you need 37G extra RAM
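
The keys in this file line up with the flags Snakemake rejected in the first comment (threads, mem, large_mem and so on). A Snakemake profile's config.yaml supplies default values for Snakemake's own command-line arguments, so one possible reading, not confirmed anywhere in this thread, is that these workflow settings ended up in the profile file. The sketch below only illustrates that kind of key-to-flag expansion. It is not Snakemake's code, and `profile_to_cli_args` and `workflow_settings` are hypothetical names chosen for the example.

```python
from typing import Any, Dict, List


def profile_to_cli_args(profile: Dict[str, Any]) -> List[str]:
    """Expand profile-style key/value pairs into long-form command-line flags.

    Illustration only: an assumed, simplified expansion, NOT Snakemake's implementation.
    """
    args: List[str] = []
    for key, value in profile.items():
        if value is None:
            continue  # in this sketch, keys without a value produce no flag
        if isinstance(value, list):
            # one flag per list item, like --binner=metabat --binner=maxbin in the error
            args.extend(f"--{key}={item}" for item in value)
        else:
            args.append(f"--{key}={value}")
    return args


# A few keys copied from the config pasted above
workflow_settings = {
    "threads": 8,
    "mem": 60,
    "assembler": "spades",
    "binner": ["metabat", "maxbin"],
}

print(" ".join(profile_to_cli_args(workflow_settings)))
# --threads=8 --mem=60 --assembler=spades --binner=metabat --binner=maxbin
```

None of these names is an option Snakemake itself defines in this form, which would be consistent with the "unrecognized arguments" message.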

@mvtejesvi
Author

This is a yaml file, defining options for specific rules or by default.

The '#' defines a comment.

the two spaces at the beginning of rows below rulenames are important.

## For more information see https://snakemake.readthedocs.io/en/stable/executing/cluster-cloud.html#cluster-execution

Overwrite/Define arguments for all rules

default:
  account: project_2004930

rename_contigs:
  threads: 2

queue: normal

You can overwrite values for specific rules

rulename:
  queue: long
  account: ""
  time_min: # min
  threads:
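
The file describes two layers: arguments defined for all rules, and values that overwrite them for a specific rule. A minimal sketch of that lookup follows, assuming a plain "defaults first, then rule-specific values" merge. The helper `options_for_rule` is hypothetical, and the scripts of the real cluster profile may resolve values differently. The default key is written as it appears in this post; Snakemake cluster configs conventionally name it `__default__`, which may have been lost when the post was rendered.

```python
from typing import Any, Dict


def options_for_rule(
    cluster_config: Dict[str, Dict[str, Any]],
    rule: str,
    default_key: str = "default",  # as shown in the post; assumption, see note above
) -> Dict[str, Any]:
    """Return the default options overlaid with any options set for `rule`."""
    options = dict(cluster_config.get(default_key) or {})
    options.update(cluster_config.get(rule) or {})
    return options


# Values taken from the file pasted above
cluster_config = {
    "default": {"account": "project_2004930"},
    "rename_contigs": {"threads": 2},
}

print(options_for_rule(cluster_config, "rename_contigs"))
# {'account': 'project_2004930', 'threads': 2}
print(options_for_rule(cluster_config, "assembly"))
# {'account': 'project_2004930'}
```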

@preckrasna

What is this error about?

snakemake.exceptions.WorkflowError: Config file is not valid JSON or YAML. In case of YAML, make sure to not mix whitespace and tab indentation.
[2021-09-27 15:12 CRITICAL] Command 'snakemake --snakefile /scratch/project_2004930/miniconda3/envs/atlasenv/lib/python3.8/site-packages/atlas/Snakefile --directory /users/student215/First_Run --jobs 40 --rerun-incomplete --configfile '/users/student215/First_Run/config.yaml' --nolock --profile cluster --use-conda --conda-prefix /scratch/project_2004930/databases/conda_envs --scheduler greedy assembly ' returned non-zero exit status 1.
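
The message itself points at mixed tab and space indentation as one possible cause. A small sketch for locating tab-indented lines in a config file is shown here. It uses only the standard library, `find_tab_indented_lines` is a hypothetical helper, and it covers just this one cause: other YAML syntax errors would need a real YAML parser to find.

```python
import re
from typing import List


def find_tab_indented_lines(text: str) -> List[int]:
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    bad_lines: List[int] = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = re.match(r"[ \t]*", line).group()
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Example input: the third line is indented with a tab instead of spaces
sample = "runtime:\n  default: 5\n\tassembly: 48\n  long: 24\n"
print(find_tab_indented_lines(sample))  # [3]
```

To check a real file, read its contents (for example with `open(path).read()`, where `path` points at the config.yaml named in the error) and pass the text to the function.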
