7. Output files

Table of contents

Directory layout
LAGOON-MCL output

Directory layout

This page describes the files generated by LAGOON-MCL. The results are organized in the following directory layout:

lagoon-mcl/
└── results/                                                            #
    ├── lagoon-mcl_output/                                              #
    │   │
    │   ├── abundance_matrix/                                           #
    │   │   └── network_I[inflation]_[annotation]_abundance_matrix.json #
    │   │
    │   ├── diamond/                                                    #
    │   │   ├── diamond_alignments.filter.tsv                           #
    │   │   ├── diamond_alignments.tsv                                  #
    │   │   ├── mmseqs2_alpahfold_clusters_alignments.m8                #
    │   │   ├── mmseqs2_alpahfold_clusters_alignments.selection.tsv     #
    │   │   └── mmseqs2_pfam_database_alignments.m8                     #
    │   │
    │   ├── network_I[inflation]/                                       #
    │   │   ├── clusters/                                               #
    │   │   │   ├── network_I[inflation]_clusters_annotations.tsv       #
    │   │   │   └── network_I[inflation]_clusters_metrics.tsv           #
    │   │   │
    │   │   ├── edges/                                                  #
    │   │   │   └── network_I[inflation]_edges.tsv                      #
    │   │   │
    │   │   └── sequences/                                              #
    │   │       ├── network_I[inflation]_sequences_annotations.tsv      #
    │   │       └── network_I[inflation]_sequences_metrics.tsv          #
    │   │
    │   └── reports/                                                    #
    │       ├── network_I[inflation]_figures/                           #
    │       │   ├── clusters_caracteristics_[label].png                 #
    │       │   ├── clusters_metrics.png                                #
    │       │   ├── homogeneity_score_[label].png                       #
    │       │   ├── sequence_label_num_[label]_id.png                   #
    │       │   └── sequence_length_centrality.png                      #
    │       └── network_I[inflation]_report.html                        #
    │
    └── nextflow_reports/                                               #

LAGOON-MCL output

Abundance matrix

`network_I[inflation]_[annotation]_abundance_matrix.json`

One abundance file is generated for each annotation type and each inflation parameter. Each top-level key is a cluster identifier, and its value maps every annotation found in that cluster to the number of times it occurs.

{
    "0": {"A0A183PR16": 1, "A0A6G0J3D9": 1, "A0A388LF28": 2, "A0A812QWC6": 1, "A0A7J6Q934": 1, "A0A813FM05": 2, "A0A812WK55": 1, "U6N5Z9": 1, "A0A7J6PV29": 1, "A0A086KJH4": 1, "H2YF95": 1, "A0A813BA63": 1, "A0A813JSA0": 2, "A0A812TQ29": 2, "A0A0M0K2E2": 1, "U6GFX1": 1, "A0A0G4ECY9": 1, "A0A812K4G7": 1, "A0A812PKL1": 3, "A0A812VMD3": 1, "A0A812NTR3": 1, "A0A1Q9DHQ4": 1, "A0A183R3Q5": 1, "A0A7J6T9M1": 1, "A0A813I2X5": 1, "A0A553QIE6": 1, "A0CL99": 1, "A0A7J5ZD72": 1, "A0A6I9NLT8": 1, "A0A7J6T346": 1, "A0A3Q2Z2J4": 1, "A0A7J6RR91": 1, "A0A7J7XUX5": 1, "A0A7J6TQJ7": 1, "A0A7J6T231": 1, "A0CHQ5": 1, "A0A2T6J476": 1, "A0A813C3W0": 1, "A0A7R9I977": 1, "A0A7J5ZFK4": 1, "A0A086M5R1": 1, "A0A2D4BNR7": 1, "A0A7S4QQJ2": 1, "A0A812JA43": 1, "A0A6P6FF32": 1, "A0A813HBU8": 1, "A0A7S1PA39": 1, "A0A2G8YAX1": 1},
    "1": {"A0A0D2GK41": 1, "A0A1I8M4E7": 1, "A0A7S1A3C8": 1, "R9P6Z5": 1, "A0A846EAF9": 1, "A0A6G1CQL2": 1, "A0A1Q9DW77": 1, "A0A388KAS7": 2, "A0A7S3TGY1": 1, "G0R2F6": 1, "A0A250XJJ8": 1, "A0A507D151": 1, "A0A0L0FTB9": 1, "G0QSF0": 1, "A0A150G7W3": 1, "A0A7S1N0A9": 1, "A0A2H2I0E3": 1, "A0A0G4EL96": 1, "J9JBL8": 1, "A0A6P8V8G4": 1, "A0A813C6N7": 2, "G0R0T4": 1, "A0A2C6KY70": 1, "A0A150GXZ4": 1, "A0A4W3JHJ4": 1, "A0A2U9BJ95": 1, "A0A812SDT4": 1, "A0A182YG02": 1, "A0A812WWN6": 1, "I3S6R2": 1, "A0A0C4ER12": 1, "A0A1R2AW38": 1, "A0A812I5G3": 1, "C5KS90": 1, "A0A3P1BA92": 1, "A0A384K564": 1, "A0A3B3RIU9": 1, "A0A0P1A9W9": 1, "G0QN77": 1, "B2Q6C2": 1, "A0A0K0DJP8": 1, "A0A834RCZ5": 1, "A0A7S4SK49": 1, "G3B601": 1, "X6NRD4": 1, "A0A6P8NQ06": 1, "A0A812RYH2": 1, "Q238S0": 1, "A0A7S1RQD1": 1},
    "2": {"A0A068SF61": 7, "A0A833RLM5": 9, "F4RVK9": 1, "A0A098VSY4": 11, "A0A0L9UDJ6": 2, "A0A1Q3B7R3": 1, "A0A6A1VJU8": 1, "A0A0N5BXW6": 1, "J9LA01": 3, "B3SDD1": 1, "A0A183TTJ3": 4, "A0A6L2LKW4": 1, "A0A287E8N8": 1, "M8AH53": 1, "A0A1S8VI42": 2, "A0A4U5N899": 1, "A0A068XZD3": 1, "X1X2U2": 1},
    "3": {"A0A1Z5KAI0": 1, "A0A672PKK1": 1, "A0A4U5VA95": 1, "A0A2G2XRV4": 1, "A0A6P7ULQ5": 1, "H3AFC1": 2, "A0A814MWY7": 1, "A0A7M3QUJ9": 1, "C5KXZ8": 1, "A0A1Y2DIJ2": 1, "A0A7R8ZNR6": 1, "A0A3C1RZF5": 1, "A0A4V6XW70": 1, "A0A669CUH2": 1, "A0A016T7J7": 1, "A0A1Y1M3Z3": 1, "A0A812LTJ5": 1, "A0A7J6NWD6": 1, "A0A812YF94": 1, "A0A177U5G8": 3, "A0A7M3QCM4": 1, "A0A833P9X4": 1, "A0A812U720": 1, "B7GCE1": 2, "A0A812Z0K7": 1, "A0A7S3PJB6": 2, "A0A7M3Q470": 1, "A0A811WF77": 1},
    "4": {"A0A075AVM1": 1, "A0A6L2J4U7": 1, "A0A6L2NVQ2": 1, "A0A438IVD6": 1, "A0A151TJN2": 1, "A0A1Q3DX31": 1, "A0A484MYA7": 1, "A0A5B0P3I1": 1, "A0A7S2PGJ2": 1, "A0A6L2LGG4": 1, "A0A0K0ETN2": 1, "A0A498NPX3": 1, "A0A178U585": 3, "A0A6L2JSC2": 1, "A0A484LZ95": 2, "Q7XP10": 1, "R7QGR4": 1, "A0A4C1ZQC8": 2, "A0A4S4L2M5": 1, "A0A6H5HI70": 5, "A0A0J7KB27": 1, "A0A177U0N4": 1, "A0A7D9LFH4": 1},
    "5": {"A0A1Y3N171": 1, "A0A5J4WSY7": 2, "A0A2P5WJH5": 1, "A0A6J5VDK5": 15, "A0A336N1I0": 2, "A0A4Y7K5M2": 9, "A0A0V0W1L7": 2, "A0A328DAL0": 2}
}

The JSON file can be converted into a TSV or CSV table with the Python script `tool-kit/scripts/convert_annotation_file.py`.

./convert_annotation_file.py -a [JSON file] -d [Delimiter, default is \t] -o [Output file]

The script uses only Python 3 and does not require any additional modules.
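
For a quick look at the data without the helper script, the same kind of flattening can be sketched with the standard library. This is only an illustration and not the code of `convert_annotation_file.py`; the function name `abundance_rows` and the three-column layout (cluster, annotation, count) are assumptions made for the example.

```python
import csv
import io
import json


def abundance_rows(matrix):
    """Yield (cluster_id, annotation, count) from a parsed abundance matrix."""
    for cluster_id, counts in matrix.items():
        for annotation, count in counts.items():
            yield cluster_id, annotation, count


# Small excerpt in the same shape as the JSON file shown above.
example = json.loads(
    '{"0": {"A0A183PR16": 1, "A0A388LF28": 2}, "5": {"A0A6J5VDK5": 15}}'
)

# Write a tab-separated table to an in-memory buffer (hypothetical layout).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["cluster_id", "annotation", "count"])
writer.writerows(abundance_rows(example))
print(buffer.getvalue(), end="")
```

With a real file, `json.load(open(path))` would replace the inline excerpt.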

Alignments

This directory contains the output files generated by Diamond and MMseqs2.

`diamond_alignments.tsv`

qseqid	qlen	qstart	qend	sseqid	slen	sstart	send	length	pident	ppos	score	evalue	bitscore
MALV-I-01_sp_EP00398|sequence00001	475	1	475	MALV-I-01_sp_EP00398|sequence00001	475	1	475	475	100	100	2448	0.0	947
MALV-I-01_sp_EP00398|sequence00001	475	197	475	MALV-II-16_sp_EP00396|sequence00239	290	16	268	279	56.6	71.3	812	5.11e-107	317
MALV-I-01_sp_EP00398|sequence00001	475	128	207	MALV-II-16_sp_EP00396|sequence00238	83	5	82	82	46.3	67.1	195	2.66e-19	79.7
MALV-I-01_sp_EP00398|sequence00002	28	1	28	MALV-I-01_sp_EP00398|sequence00002	28	1	28	28	100	100	134	1.52e-14	56.2

The Diamond BLASTp output file contains the following fields:

qseqid: Query sequence identifier
qlen: Length of the query sequence
qstart: Start position of the query sequence in the alignment
qend: End position of the query sequence in the alignment
sseqid: Subject sequence identifier
slen: Length of the subject sequence
sstart: Start position of the subject sequence in the alignment
send: End position of the subject sequence in the alignment
length: Length of the alignment
pident: Percentage of identical matches between query and subject sequences
ppos: Percentage of positive matches between query and subject sequences
score: Raw alignment score
evalue: Expectation value (E-value) of the alignment
bitscore: Normalized alignment score, representing the statistical significance
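
The field list above maps directly onto a typed reader. The sketch below assumes the file is tab-separated with the header line shown in the sample; the helper name `read_alignments` and the shortened identifiers `seqA`/`seqB` are hypothetical.

```python
import csv
import io

INT_FIELDS = ("qlen", "qstart", "qend", "slen", "sstart", "send", "length", "score")
FLOAT_FIELDS = ("pident", "ppos", "evalue", "bitscore")


def read_alignments(handle):
    """Parse a tab-separated Diamond table with a header line into typed dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in INT_FIELDS:
            row[name] = int(row[name])
        for name in FLOAT_FIELDS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


# Values taken from the sample above; sequence identifiers shortened.
SAMPLE = (
    "qseqid\tqlen\tqstart\tqend\tsseqid\tslen\tsstart\tsend\t"
    "length\tpident\tppos\tscore\tevalue\tbitscore\n"
    "seqA\t475\t1\t475\tseqA\t475\t1\t475\t475\t100\t100\t2448\t0.0\t947\n"
    "seqA\t475\t197\t475\tseqB\t290\t16\t268\t279\t56.6\t71.3\t812\t5.11e-107\t317\n"
)

alignments = read_alignments(io.StringIO(SAMPLE))
for aln in alignments:
    if aln["qseqid"] != aln["sseqid"]:  # skip self-hits
        # Fraction of the query covered by the aligned region.
        query_coverage = (aln["qend"] - aln["qstart"] + 1) / aln["qlen"]
        print(aln["qseqid"], aln["sseqid"], aln["evalue"], round(query_coverage, 3))
```

For a file on disk, passing `open(path, newline="")` instead of the `StringIO` object should work the same way.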

`diamond_alignments.filter.tsv`

qseqid	qlen	qstart	qend	sseqid	slen	sstart	send	length	pident	ppos	score	evalue	bitscore
MALV-I-01_sp_EP00398|sequence00004	305	157	266	MALV-I_sp_EP00400|sequence00851	557	360	471	114	25.4	50.0	102	1.10e-05	43.9
MALV-I-01_sp_EP00398|sequence00006	554	65	550	MALV-II-16_sp_EP00396|sequence00175	713	94	593	513	34.7	49.9	607	7.26e-71	238
MALV-I-01_sp_EP00398|sequence00007	863	339	433	MALV-I-01_sp_EP00398|sequence00434	919	397	472	95	32.6	48.4	100	1.06e-04	43.1
MALV-I-01_sp_EP00398|sequence00007	863	692	766	MALV-I-01_sp_EP00398|sequence00085	10025	661	713	75	33.3	48.0	93	8.31e-04	40.4

The filtered Diamond BLASTp output file keeps a single alignment for each pair of sequences and includes the following fields:

qseqid: Query sequence identifier
qlen: Length of the query sequence
qstart: Start position of the query sequence in the alignment
qend: End position of the query sequence in the alignment
sseqid: Subject sequence identifier
slen: Length of the subject sequence
sstart: Start position of the subject sequence in the alignment
send: End position of the subject sequence in the alignment
length: Length of the alignment
pident: Percentage of identical matches between query and subject sequences
ppos: Percentage of positive matches between query and subject sequences
score: Raw alignment score
evalue: Expectation value (E-value) of the alignment
bitscore: Normalized alignment score, representing the statistical significance
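
Reducing a table to one alignment per pair can be sketched as below. The selection criterion used by LAGOON-MCL is not stated on this page, so the sketch makes two assumptions: the pair is treated as unordered, and the alignment with the highest bitscore is kept. The function name `best_alignment_per_pair` is hypothetical.

```python
def best_alignment_per_pair(alignments):
    """Keep one alignment per unordered (query, subject) pair.

    Assumption: the highest bitscore wins. The criterion actually used
    by the pipeline may differ.
    """
    best = {}
    for aln in alignments:
        pair = tuple(sorted((aln["qseqid"], aln["sseqid"])))
        if pair not in best or aln["bitscore"] > best[pair]["bitscore"]:
            best[pair] = aln
    return list(best.values())


# Toy input: three alignments for the pair (seqA, seqB), one for (seqA, seqC).
alignments = [
    {"qseqid": "seqA", "sseqid": "seqB", "bitscore": 317.0},
    {"qseqid": "seqA", "sseqid": "seqB", "bitscore": 79.7},
    {"qseqid": "seqB", "sseqid": "seqA", "bitscore": 120.0},
    {"qseqid": "seqA", "sseqid": "seqC", "bitscore": 43.9},
]
kept = best_alignment_per_pair(alignments)
for aln in kept:
    print(aln["qseqid"], aln["sseqid"], aln["bitscore"])
```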

`mmseqs2_alpahfold_clusters_alignments.m8`

query	target	fident	alnlen	mismatch	qstart	qend	qlen	tstart	tend	tlen	evalue	bits
MALV-I-01_sp_EP00398|sequence00001	AFDB:AF-A0A812PGI6-F1	0.600	471	177	4	474	475	308	750	772	3.040E-171	549
MALV-I-01_sp_EP00398|sequence00001	AFDB:AF-A0A7S2CY24-F1	0.594	436	165	40	475	475	1	408	431	3.754E-155	503
MALV-I-01_sp_EP00398|sequence00001	AFDB:AF-A0A7J6MHE2-F1	0.524	472	214	4	475	475	1	451	470	4.516E-142	465
MALV-I-01_sp_EP00398|sequence00001	AFDB:AF-A0A0G4GLT6-F1	0.519	472	211	4	475	475	1	439	598	1.945E-140	460

The output file of the MMseqs2 search against the AlphaFold clusters sequence database contains the following fields:

query: Query sequence identifier
target: Target sequence identifier (subject in the alignment)
fident: Fraction of identical residues between the query and target sequences in the alignment
alnlen: Length of the alignment (number of aligned residues)
mismatch: Number of mismatched residues between the query and target sequences in the alignment
gapopen: Number of gap openings in the alignment
qstart: Start position of the query sequence in the alignment
qend: End position of the query sequence in the alignment
qlen: Length of the query sequence
tstart: Start position of the target sequence in the alignment
tend: End position of the target sequence in the alignment
tlen: Length of the target sequence
evalue: Expectation value (E-value), representing the statistical significance of the alignment
bits: Normalized alignment score in bits (often used to assess the quality of the alignment)

`mmseqs2_alpahfold_clusters_alignments.selection.tsv`

query	target	fident	alnlen	mismatch	qstart	qend	qlen	tstart	tend	tlen	evalue	bits	coverageIndex	disparityIndex
MALV-I-01_sp_EP00398|sequence00001	AFDB:AF-A0A813LCP4-F1	0.646	297	96	179	475	475	2	272	288	7.850E-113	380	0.7831176900584795	0.31570906432748536
MALV-I-01_sp_EP00398|sequence00033	AFDB:AF-B6K6N4-F1	0.442	64	34	6	67	88	346	409	470	7.892E-07	54	0.420357833655706	0.5683752417794972
MALV-I-01_sp_EP00398|sequence00065	AFDB:AF-A0A812PAI8-F1	0.282	369	176	3	248	374	417	785	796	1.642E-36	152	0.5606609249455835	0.19418617149920725
MALV-I-01_sp_EP00398|sequence00098	AFDB:AF-A0A813J6D5-F1	0.487	590	294	4	593	730	127	700	867	1.018E-159	528	0.735136117299458	0.1461661215654675

The output file of the MMseqs2 search against the AlphaFold clusters sequence database contains the following fields:

query: Query sequence identifier
target: Target sequence identifier (subject in the alignment)
fident: Fraction of identical residues between the query and target sequences in the alignment
alnlen: Length of the alignment (number of aligned residues)
mismatch: Number of mismatched residues between the query and target sequences in the alignment
gapopen: Number of gap openings in the alignment
qstart: Start position of the query sequence in the alignment
qend: End position of the query sequence in the alignment
qlen: Length of the query sequence
tstart: Start position of the target sequence in the alignment
tend: End position of the target sequence in the alignment
tlen: Length of the target sequence
evalue: Expectation value (E-value), representing the statistical significance of the alignment
bits: Normalized alignment score in bits (used to assess the quality of the alignment)
coverageIndex: A measure of the overall coverage between the query and target sequences. Calculated as the average of query and subject sequence coverage.
disparityIndex: Measures the balance of coverage between the query and target sequences. The closer to 0, the more balanced the coverage; the closer to 1, the more unbalanced.
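
The two indices can be recomputed from the alignment coordinates. The coverage index follows the definition above (average of query and target coverage). The disparity formula is not spelled out on this page; the sketch assumes it is the absolute difference between the two coverages, which matches the last two values of the first sample row. The function name `coverage_indices` is hypothetical.

```python
def coverage_indices(qstart, qend, qlen, tstart, tend, tlen):
    """Return (coverageIndex, disparityIndex) for one alignment."""
    query_cov = (qend - qstart + 1) / qlen    # fraction of the query aligned
    target_cov = (tend - tstart + 1) / tlen   # fraction of the target aligned
    coverage_index = (query_cov + target_cov) / 2
    # Assumption: disparity is the absolute difference of the two coverages.
    disparity_index = abs(query_cov - target_cov)
    return coverage_index, disparity_index


# Coordinates of the first sample row (sequence00001 vs AF-A0A813LCP4-F1).
cov, disp = coverage_indices(179, 475, 475, 2, 272, 288)
print(round(cov, 4), round(disp, 4))  # 0.7831 0.3157
```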

`mmseqs2_pfam_database_alignments.m8`

query	target	fident	alnlen	mismatch	qstart	qend	qlen	tstart	tend	tlen	evalue	bits
MALV-I-01_sp_EP00398|sequence00001	PF00587.30	0.294	261	161	196	456	475	140	368	675	4.137E-27	116
MALV-I-01_sp_EP00398|sequence00033	PF00226.36	0.374	55	33	11	63	88	32	86	147	1.457E-04	38
MALV-I-01_sp_EP00398|sequence00098	PF18198.7	0.430	170	93	282	445	730	6	175	248	2.382E-31	132
MALV-I-01_sp_EP00398|sequence00098	PF03028.21	0.442	124	60	152	275	730	55	163	217	7.833E-22	101

The output file of the MMseqs2 search against the Pfam profile database contains the following fields:

query: Query sequence identifier
target: Target sequence identifier (subject in the alignment)
fident: Fraction of identical residues between the query and target sequences in the alignment
alnlen: Length of the alignment (number of aligned residues)
mismatch: Number of mismatched residues between the query and target sequences in the alignment
gapopen: Number of gap openings in the alignment
qstart: Start position of the query sequence in the alignment
qend: End position of the query sequence in the alignment
qlen: Length of the query sequence
tstart: Start position of the target sequence in the alignment
tend: End position of the target sequence in the alignment
tlen: Length of the target sequence
evalue: Expectation value (E-value), representing the statistical significance of the alignment
bits: Normalized alignment score in bits (used to assess the quality of the alignment)

Cluster files

This directory contains files specific to the clusters present in the various networks.

`network_I[inflation]_clusters_annotations.tsv`

cluster_id	alphafold_clusters	alphafold_sequences	gene3d	funfam	tmhmm	alphafold_pfam	pfamDB
245	A0A154P1T2;A0A7S3WA45	A0A7X3ZJX2;A0A369S7U3	G3DSA:1.25.40.20	NA	NA	PF00023;PF12796;PF13637;PF13857	PF12796;PF13637;PF13857
246	A0A7J5C144;A0A7M7K2D1	A0A2H6KD85;A0A1Y5ICJ0	NA	NA	NA	NA	NA
247	A0A0G4HX23;A0A839LJV4	A0A813JRF4;A0A0G4E9P6	G3DSA:2.60.120.10;G3DSA:1.10.1300.10	NA	NA	PF00520;PF02678;PF00233;PF05726	PF02678;PF00233;PF10175;PF05726
248	NA	NA	NA	NA	NA	NA	NA

This TSV file contains the annotations associated with each cluster. The columns are as follows:

cluster_id: Unique identifier for the cluster.
alphafold_clusters: AlphaFold cluster identifiers associated with the sequences in the cluster.
alphafold_sequences: Sequence identifiers from the AlphaFold database linked to the cluster.
gene3d: Gene3D annotations linked to the sequences in the cluster.
funfam: FunFam annotations linked to the sequences in the cluster.
tmhmm: Transmembrane helix annotations linked to the sequences in the cluster.
alphafold_pfam: Pfam annotations derived from AlphaFold sequences for the cluster.
pfamDB: Pfam annotations from the Pfam database linked to the sequences in the cluster.
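
In the sample rows, a cell holds several annotations separated by `;`, and `NA` marks a missing annotation. Assuming that convention holds for the whole file, a cell can be split as sketched below; the helper name `split_annotations` is hypothetical.

```python
def split_annotations(cell):
    """Turn 'PF00023;PF12796' into a list; 'NA' means no annotation."""
    return [] if cell == "NA" else cell.split(";")


# Cluster 245 from the sample above, reduced to three columns.
row = {
    "cluster_id": "245",
    "funfam": "NA",
    "alphafold_pfam": "PF00023;PF12796;PF13637;PF13857",
    "pfamDB": "PF12796;PF13637;PF13857",
}

# Pfam families reported by both annotation sources for this cluster.
shared = sorted(
    set(split_annotations(row["alphafold_pfam"]))
    & set(split_annotations(row["pfamDB"]))
)
print(shared)
```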

`network_I[inflation]_clusters_metrics.tsv`

cluster_id	cluster_size	diameter	alphafold_clusters_homogeneity_score	alphafold_clusters_sequence	alphafold_clusters_numbre_labels	alphafold_sequences_homogeneity_score	alphafold_sequences_sequence	alphafold_sequences_numbre_labels	gene3d_homogeneity_score	gene3d_sequence	gene3d_numbre_labels	funfam_homogeneity_score	funfam_sequence	funfam_numbre_labels	tmhmm_homogeneity_score	tmhmm_sequence	tmhmm_numbre_labels	alphafold_pfam_homogeneity_score	alphafold_pfam_sequence	alphafold_pfam_numbre_labels	pfamDB_homogeneity_score	pfamDB_sequence	pfamDB_numbre_labels
0	54	3	0.40740740740740744	54	32	0.2407407407407407	54	41	0.6666666666666667	52	18	0.40740740740740744	32	32	1	2	1	0.7222222222222222	54	15	0.6851851851851851	50	17
1	53	4	0.13207547169811318	50	46	0.13207547169811318	50	46	0.7358490566037736	50	14	0.8679245283018868	6	7	1	3	1	0.7169811320754718	46	15	0.7169811320754718	46	15
2	49	4	0.6938775510204082	49	15	0.6734693877551021	49	16	NA	0	0	NA	0	0	1	3	1	0.9591836734693877	13	2	NA	0	0
3	45	5	0.4666666666666667	33	24	0.4222222222222223	33	26	0.8666666666666667	16	6	1	2	1	1	3	1	0.7333333333333334	24	12	0.9333333333333333	2	3

TSV file containing the metrics of the clusters present in a network. One file per inflation parameter.

cluster_id: Unique identifier for the cluster.
cluster_size: Size of the cluster (number of sequences it contains).
diameter: Diameter of the cluster, i.e. the longest of the shortest paths between any two sequences in the cluster.
alphafold_clusters_homogeneity_score: Homogeneity score for AlphaFold clusters, calculated from the number of unique AlphaFold cluster annotations in the cluster.
alphafold_clusters_sequence: Number of sequences in the cluster linked to AlphaFold clusters.
alphafold_clusters_number_labels: Number of unique AlphaFold cluster labels found in a cluster.
alphafold_sequences_homogeneity_score: Homogeneity score for AlphaFold sequences, calculated from the number of unique AlphaFold sequence annotations in the cluster.
alphafold_sequences_sequence: Number of sequences in the cluster linked to AlphaFold sequences.
alphafold_sequences_number_labels: Number of unique AlphaFold sequence labels found in a cluster.
gene3d_homogeneity_score: Homogeneity score for Gene3D annotations, calculated from the number of unique Gene3D annotations in the cluster.
gene3d_sequence: Number of sequences in the cluster linked to Gene3D annotations.
gene3d_number_labels: Number of unique Gene3D labels found in a cluster.
funfam_homogeneity_score: Homogeneity score for FunFam annotations, calculated from the number of unique FunFam annotations in the cluster.
funfam_sequence: Number of sequences in the cluster linked to FunFam annotations.
funfam_number_labels: Number of unique FunFam labels found in a cluster.
tmhmm_homogeneity_score: Homogeneity score for TMHMM annotations, calculated from the number of unique TMHMM annotations in the cluster.
tmhmm_sequence: Number of sequences in the cluster linked to TMHMM annotations.
tmhmm_number_labels: Number of unique TMHMM labels found in a cluster.
alphafold_pfam_homogeneity_score: Homogeneity score for AlphaFold Pfam annotations, calculated from the number of unique AlphaFold Pfam annotations in the cluster.
alphafold_pfam_sequence: Number of sequences in the cluster linked to AlphaFold Pfam annotations.
alphafold_pfam_number_labels: Number of unique AlphaFold Pfam labels found in a cluster.
pfamDB_homogeneity_score: Homogeneity score for PfamDB annotations, calculated from the number of unique PfamDB annotations in the cluster.
pfamDB_sequence: Number of sequences in the cluster linked to PfamDB annotations.
pfamDB_number_labels: Number of unique PfamDB labels found in a cluster.
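
The exact formula of the homogeneity score is not given on this page. The intact sample rows above are consistent with the rule sketched below (1 for a single label, otherwise 1 minus the ratio of unique labels to cluster size, and `NA` when there is no label). This is an inference from the sample, not a documented definition, and the function name `homogeneity_score` is hypothetical.

```python
def homogeneity_score(cluster_size, n_labels):
    """Homogeneity score as inferred from the sample rows (assumption)."""
    if n_labels == 0:
        return None  # written as NA in the file
    if n_labels == 1:
        return 1.0   # a single label means a fully homogeneous cluster
    return 1 - n_labels / cluster_size


# Cluster 0 in the sample: 54 sequences, 32 AlphaFold cluster labels.
print(homogeneity_score(54, 32))
# Cluster 1 in the sample: 53 sequences, 7 FunFam labels.
print(homogeneity_score(53, 7))
```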

Sequence files

This directory contains files specific to the sequences present in the various networks.

`network_I[inflation]_sequences_annotations.tsv`

sequence_id	gene3d	alphafold_pfam	funfam	alphafold_clusters	pfamDB	tmhmm	alphafold_sequences
MALV-II-16_sp_EP00396|sequence00182	NA	NA	NA	NA	NA	NA	NA
MALV-II-16_sp_EP00396|sequence00266	G3DSA:3.40.50.300;G3DSA:3.40.50.1240	PF00300;PF01591	NA	A0A6A6FSM6	PF00300;PF01591	NA	A0A7S3C007
MALV-I_sp_EP00400|sequence01089	G3DSA:3.40.50.300;G3DSA:3.40.50.1240	PF00300;PF01591	G3DSA:3.40.50.300:FF:000644	A0A1Y3ENH8	PF00300;PF01591	NA	A0A7S3JYS1
MALV-II-16_sp_EP00396|sequence00273	NA	PF04515	NA	A0A7S2LD46	PF04515	TMhelix	A0A813I9P1

A TSV file containing all annotations associated with sequences in a network, with one file generated for each inflation parameter.

sequence_id: Unique identifier for each sequence in the network.
gene3d: Gene3D annotations linked to the sequence.
alphafold_pfam: AlphaFold-Pfam annotations linked to the sequence.
funfam: FunFam annotations linked to the sequence.
alphafold_clusters: AlphaFold clusters annotations linked to the sequence.
pfamDB: Pfam DB annotations linked to the sequence.
tmhmm: TMHMM annotations linked to the sequence.
alphafold_sequences: AlphaFold sequence annotations linked to the sequence.

`network_I[inflation]_sequences_metrics.tsv`

sequence_id	sequence_length	eigenvector_centrality	num_gene3d_id	num_alphafold_pfam_id	num_funfam_id	num_alphafold_clusters_id	num_pfamDB_id	num_alphafold_sequences_id
MALV-I-01_sp_EP00398|sequence00096	99	0.4071553450680936	1	5	0	1	1	1
MALV-I-01_sp_EP00398|sequence00097	906	0.5067939313056563	4	5	4	1	3	1
MALV-II-16_sp_EP00396|sequence00405	1326	0.6539145129685043	5	4	5	1	5	1
MALV-I_sp_EP00400|sequence00152	668	0.5272095731360656	4	4	3	1	3	1

TSV file containing the metrics of each sequence in a network (length, centrality, annotation counts). One file per inflation parameter.

sequence_id: Unique identifier for each sequence in the network.
cluster_id: Identifier of the cluster to which the sequence belongs.
sequence_length: Length (in amino acids) of the sequence.
eigenvector_centrality: Eigenvector centrality value for the sequence, indicating its importance within the network.
num_gene3d_id: Number of unique Gene3D annotations linked to the sequence.
num_alphafold_pfam_id: Number of unique AlphaFold-Pfam annotations linked to the sequence.
num_funfam_id: Number of unique FunFam annotations linked to the sequence.
num_alphafold_clusters_id: Number of unique AlphaFold clusters linked to the sequence.
num_pfamDB_id: Number of unique Pfam DB annotations linked to the sequence.
num_tmhmm_id: Number of unique TMHMM annotations linked to the sequence.
num_alphafold_sequences_id: Number of unique AlphaFold sequence identifiers linked to the sequence.

Edge files

`network_I[inflation]_edges.tsv`

qseqid	qlen	qstart	qend	sseqid	slen	sstart	send	length	pident	ppos	score	evalue	bitscore	cluster_id
MALV-I-01_sp_EP00398|sequence00004	305	157	266	MALV-I_sp_EP00400|sequence00851	557	360	471	114	25.4	50.0	102	1.10e-05	15.96	234
MALV-I-01_sp_EP00398|sequence00006	554	65	550	MALV-II-16_sp_EP00396|sequence00175	713	94	593	513	34.7	49.9	607	7.26e-71	17.6	235
MALV-I-01_sp_EP00398|sequence00008	81	34	73	MALV-I-01_sp_EP00398|sequence00853	602	552	591	40	35.0	60.0	75	8.91e-04	42.74	17
MALV-I-01_sp_EP00398|sequence00011	1061	238	773	MALV-I-01_sp_EP00398|sequence00593	540	1	536	536	100	100	2590	0.0	48.74	45

TSV file with alignments used to reconstruct clusters in each network. One file per inflation parameter.

qseqid: Query sequence identifier (ID of the query sequence in the alignment).
qlen: Length of the query sequence (in amino acids).
qstart: Start position of the alignment on the query sequence.
qend: End position of the alignment on the query sequence.
sseqid: Subject sequence identifier (ID of the subject sequence in the alignment).
slen: Length of the subject sequence (in amino acids).
sstart: Start position of the alignment on the subject sequence.
send: End position of the alignment on the subject sequence.
length: Length of the alignment.
pident: Percentage of identical matches between the query and subject sequences.
ppos: Percentage of positive matches (including identical and similar residues).
score: Alignment score calculated by the alignment algorithm.
evalue: E-value (expectation value) of the alignment, representing the number of hits expected by chance.
bitscore: Bit score of the alignment, a measure of the alignment’s quality.
cluster_id: Identifier of the cluster to which the aligned sequences belong.
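
Because every edge carries its `cluster_id`, the clusters can be rebuilt as graphs from this file alone. The sketch below groups edges by cluster and computes a diameter with a breadth-first search. It assumes undirected, unweighted edges and a connected cluster; whether LAGOON-MCL computes the diameter in the metrics file this way is not stated here. All function names are hypothetical.

```python
from collections import defaultdict, deque


def cluster_graphs(edges):
    """Group undirected edges by cluster: {cluster_id: {node: set(neighbours)}}."""
    graphs = defaultdict(lambda: defaultdict(set))
    for qseqid, sseqid, cluster_id in edges:
        graphs[cluster_id][qseqid].add(sseqid)
        graphs[cluster_id][sseqid].add(qseqid)
    return graphs


def eccentricity(graph, start):
    """Longest shortest-path distance (in edges) from start, found by BFS."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


def diameter(graph):
    """Longest of the shortest paths between any two nodes of the graph."""
    return max(eccentricity(graph, node) for node in graph)


# Toy edges as (qseqid, sseqid, cluster_id): a chain a-b-c-d and a pair x-y.
edges = [("a", "b", 7), ("b", "c", 7), ("c", "d", 7), ("x", "y", 8)]
graphs = cluster_graphs(edges)
for cluster_id, graph in graphs.items():
    print(cluster_id, len(graph), diameter(graph))
```

For a disconnected cluster this returns the largest diameter among its connected components, since unreachable nodes are simply never visited.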
