7. Output files

Jérémy Rousseau edited this page Jan 16, 2026 · 4 revisions

Directory layout

This page describes the files generated by LAGOON-MCL. Results are written to the following directory layout:

lagoon-mcl/
└── results/                                                            #
    ├── lagoon-mcl_output/                                              #
    │   │
    │   ├── abundance_matrix/                                           #
    │   │   └── network_I[inflation]_[annotation]_abundance_matrix.json #
    │   │
    │   ├── diamond/                                                    #
    │   │   ├── diamond_alignments.filter.tsv                           #
    │   │   ├── diamond_alignments.tsv                                  #
    │   │   ├── mmseqs2_alpahfold_clusters_alignments.m8                #
    │   │   ├── mmseqs2_alpahfold_clusters_alignments.selection.tsv     #
    │   │   └── mmseqs2_pfam_database_alignments.m8                     #
    │   │
    │   ├── network_I[inflation]/                                       #
    │   │   ├── clusters/                                               #
    │   │   │   ├── network_I[inflation]_clusters_annotations.tsv       #
    │   │   │   └── network_I[inflation]_clusters_metrics.tsv           #
    │   │   │
    │   │   ├── edges/                                                  #
    │   │   │   └── network_I[inflation]_edges.tsv                      #
    │   │   │
    │   │   └── sequences/                                              #
    │   │       ├── network_I[inflation]_sequences_annotations.tsv      #
    │   │       └── network_I[inflation]_sequences_metrics.tsv          #
    │   │
    │   └── reports/                                                    #
    │       ├── network_I[inflation]_figures/                           #
    │       │   ├── clusters_caracteristics_[label].png                 #
    │       │   ├── clusters_metrics.png                                #
    │       │   ├── homogeneity_score_[label].png                       #
    │       │   ├── sequence_label_num_[label]_id.png                   #
    │       │   └── sequence_length_centrality.png                      #
    │       └── network_I[inflation]_report.html                        #
    │
    └── nextflow_reports/                                               #

LAGOON-MCL output

Abundance matrix

network_I[inflation]_[annotation]_abundance_matrix.json

An abundance file is generated for each annotation type and each inflation parameter. This file contains the abundance of annotations within each cluster.

{
    "0": {"A0A183PR16": 1, "A0A6G0J3D9": 1, "A0A388LF28": 2, "A0A812QWC6": 1, "A0A7J6Q934": 1, "A0A813FM05": 2, "A0A812WK55": 1, "U6N5Z9": 1, "A0A7J6PV29": 1, "A0A086KJH4": 1, "H2YF95": 1, "A0A813BA63": 1, "A0A813JSA0": 2, "A0A812TQ29": 2, "A0A0M0K2E2": 1, "U6GFX1": 1, "A0A0G4ECY9": 1, "A0A812K4G7": 1, "A0A812PKL1": 3, "A0A812VMD3": 1, "A0A812NTR3": 1, "A0A1Q9DHQ4": 1, "A0A183R3Q5": 1, "A0A7J6T9M1": 1, "A0A813I2X5": 1, "A0A553QIE6": 1, "A0CL99": 1, "A0A7J5ZD72": 1, "A0A6I9NLT8": 1, "A0A7J6T346": 1, "A0A3Q2Z2J4": 1, "A0A7J6RR91": 1, "A0A7J7XUX5": 1, "A0A7J6TQJ7": 1, "A0A7J6T231": 1, "A0CHQ5": 1, "A0A2T6J476": 1, "A0A813C3W0": 1, "A0A7R9I977": 1, "A0A7J5ZFK4": 1, "A0A086M5R1": 1, "A0A2D4BNR7": 1, "A0A7S4QQJ2": 1, "A0A812JA43": 1, "A0A6P6FF32": 1, "A0A813HBU8": 1, "A0A7S1PA39": 1, "A0A2G8YAX1": 1},
    "1": {"A0A0D2GK41": 1, "A0A1I8M4E7": 1, "A0A7S1A3C8": 1, "R9P6Z5": 1, "A0A846EAF9": 1, "A0A6G1CQL2": 1, "A0A1Q9DW77": 1, "A0A388KAS7": 2, "A0A7S3TGY1": 1, "G0R2F6": 1, "A0A250XJJ8": 1, "A0A507D151": 1, "A0A0L0FTB9": 1, "G0QSF0": 1, "A0A150G7W3": 1, "A0A7S1N0A9": 1, "A0A2H2I0E3": 1, "A0A0G4EL96": 1, "J9JBL8": 1, "A0A6P8V8G4": 1, "A0A813C6N7": 2, "G0R0T4": 1, "A0A2C6KY70": 1, "A0A150GXZ4": 1, "A0A4W3JHJ4": 1, "A0A2U9BJ95": 1, "A0A812SDT4": 1, "A0A182YG02": 1, "A0A812WWN6": 1, "I3S6R2": 1, "A0A0C4ER12": 1, "A0A1R2AW38": 1, "A0A812I5G3": 1, "C5KS90": 1, "A0A3P1BA92": 1, "A0A384K564": 1, "A0A3B3RIU9": 1, "A0A0P1A9W9": 1, "G0QN77": 1, "B2Q6C2": 1, "A0A0K0DJP8": 1, "A0A834RCZ5": 1, "A0A7S4SK49": 1, "G3B601": 1, "X6NRD4": 1, "A0A6P8NQ06": 1, "A0A812RYH2": 1, "Q238S0": 1, "A0A7S1RQD1": 1},
    "2": {"A0A068SF61": 7, "A0A833RLM5": 9, "F4RVK9": 1, "A0A098VSY4": 11, "A0A0L9UDJ6": 2, "A0A1Q3B7R3": 1, "A0A6A1VJU8": 1, "A0A0N5BXW6": 1, "J9LA01": 3, "B3SDD1": 1, "A0A183TTJ3": 4, "A0A6L2LKW4": 1, "A0A287E8N8": 1, "M8AH53": 1, "A0A1S8VI42": 2, "A0A4U5N899": 1, "A0A068XZD3": 1, "X1X2U2": 1},
    "3": {"A0A1Z5KAI0": 1, "A0A672PKK1": 1, "A0A4U5VA95": 1, "A0A2G2XRV4": 1, "A0A6P7ULQ5": 1, "H3AFC1": 2, "A0A814MWY7": 1, "A0A7M3QUJ9": 1, "C5KXZ8": 1, "A0A1Y2DIJ2": 1, "A0A7R8ZNR6": 1, "A0A3C1RZF5": 1, "A0A4V6XW70": 1, "A0A669CUH2": 1, "A0A016T7J7": 1, "A0A1Y1M3Z3": 1, "A0A812LTJ5": 1, "A0A7J6NWD6": 1, "A0A812YF94": 1, "A0A177U5G8": 3, "A0A7M3QCM4": 1, "A0A833P9X4": 1, "A0A812U720": 1, "B7GCE1": 2, "A0A812Z0K7": 1, "A0A7S3PJB6": 2, "A0A7M3Q470": 1, "A0A811WF77": 1},
    "4": {"A0A075AVM1": 1, "A0A6L2J4U7": 1, "A0A6L2NVQ2": 1, "A0A438IVD6": 1, "A0A151TJN2": 1, "A0A1Q3DX31": 1, "A0A484MYA7": 1, "A0A5B0P3I1": 1, "A0A7S2PGJ2": 1, "A0A6L2LGG4": 1, "A0A0K0ETN2": 1, "A0A498NPX3": 1, "A0A178U585": 3, "A0A6L2JSC2": 1, "A0A484LZ95": 2, "Q7XP10": 1, "R7QGR4": 1, "A0A4C1ZQC8": 2, "A0A4S4L2M5": 1, "A0A6H5HI70": 5, "A0A0J7KB27": 1, "A0A177U0N4": 1, "A0A7D9LFH4": 1},
    "5": {"A0A1Y3N171": 1, "A0A5J4WSY7": 2, "A0A2P5WJH5": 1, "A0A6J5VDK5": 15, "A0A336N1I0": 2, "A0A4Y7K5M2": 9, "A0A0V0W1L7": 2, "A0A328DAL0": 2}
}

It is possible to convert the JSON file into a TSV or CSV table using the Python script tool-kit/scripts/convert_annotation_file.py.

./convert_annotation_file.py -a [JSON file] -d [Delimiter, default is \t] -o [Output file]

The script uses only the Python 3 standard library and does not require installing any additional modules.
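For readers who prefer to do the conversion inline, the sketch below flattens the JSON structure shown above into a long-format table using only the standard library. It is not the `convert_annotation_file.py` script itself: the function names `abundance_to_rows` and `write_table` and the three-column layout (`cluster_id`, `annotation`, `abundance`) are assumptions made for illustration, and the layout produced by the actual script may differ.

```python
import csv
import io
import json


def abundance_to_rows(abundance):
    """Flatten {cluster_id: {annotation: count}} into (cluster, annotation, count) rows."""
    rows = []
    for cluster_id, counts in abundance.items():
        for annotation, count in counts.items():
            rows.append((cluster_id, annotation, count))
    return rows


def write_table(rows, handle, delimiter="\t"):
    """Write the rows, preceded by a header line, to any text file handle."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["cluster_id", "annotation", "abundance"])  # assumed column names
    writer.writerows(rows)


# Small excerpt of the example file above, embedded so the sketch is self-contained.
example = json.loads('{"5": {"A0A1Y3N171": 1, "A0A6J5VDK5": 15}}')
buffer = io.StringIO()
write_table(abundance_to_rows(example), buffer)
print(buffer.getvalue())
```

To work on a real file, `json.load(open(path))` would replace the embedded excerpt and an opened output file would replace `buffer`.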

Alignments

This directory contains the output files generated by Diamond and MMseqs2.

diamond_alignments.tsv

qseqid qlen qstart qend sseqid slen sstart send length pident ppos score evalue bitscore
MALV-I-01_sp_EP00398|sequence00001 475 1 475 MALV-I-01_sp_EP00398|sequence00001 475 1 475 475 100 100 2448 0.0 947
MALV-I-01_sp_EP00398|sequence00001 475 197 475 MALV-II-16_sp_EP00396|sequence00239 290 16 268 279 56.6 71.3 812 5.11e-107 317
MALV-I-01_sp_EP00398|sequence00001 475 128 207 MALV-II-16_sp_EP00396|sequence00238 83 5 82 82 46.3 67.1 195 2.66e-19 79.7
MALV-I-01_sp_EP00398|sequence00002 28 1 28 MALV-I-01_sp_EP00398|sequence00002 28 1 28 28 100 100 134 1.52e-14 56.2

The Diamond BLASTp output file contains the following fields:

  • qseqid: Query sequence identifier
  • qlen: Length of the query sequence
  • qstart: Start position of the query sequence in the alignment
  • qend: End position of the query sequence in the alignment
  • sseqid: Subject sequence identifier
  • slen: Length of the subject sequence
  • sstart: Start position of the subject sequence in the alignment
  • send: End position of the subject sequence in the alignment
  • length: Length of the alignment
  • pident: Percentage of identical matches between query and subject sequences
  • ppos: Percentage of positive matches between query and subject sequences
  • score: Raw alignment score
  • evalue: Expectation value (E-value) of the alignment
  • bitscore: Bit score, i.e. the raw alignment score normalized for the scoring system so that it can be compared across searches

diamond_alignments.filter.tsv

qseqid qlen qstart qend sseqid slen sstart send length pident ppos score evalue bitscore
MALV-I-01_sp_EP00398|sequence00004 305 157 266 MALV-I_sp_EP00400|sequence00851 557 360 471 114 25.4 50.0 102 1.10e-05 43.9
MALV-I-01_sp_EP00398|sequence00006 554 65 550 MALV-II-16_sp_EP00396|sequence00175 713 94 593 513 34.7 49.9 607 7.26e-71 238
MALV-I-01_sp_EP00398|sequence00007 863 339 433 MALV-I-01_sp_EP00398|sequence00434 919 397 472 95 32.6 48.4 100 1.06e-04 43.1
MALV-I-01_sp_EP00398|sequence00007 863 692 766 MALV-I-01_sp_EP00398|sequence00085 10025 661 713 75 33.3 48.0 93 8.31e-04 40.4

The filtered Diamond BLASTp output file retains a single alignment for each pair of sequences and contains the following fields:

  • qseqid: Query sequence identifier
  • qlen: Length of the query sequence
  • qstart: Start position of the query sequence in the alignment
  • qend: End position of the query sequence in the alignment
  • sseqid: Subject sequence identifier
  • slen: Length of the subject sequence
  • sstart: Start position of the subject sequence in the alignment
  • send: End position of the subject sequence in the alignment
  • length: Length of the alignment
  • pident: Percentage of identical matches between query and subject sequences
  • ppos: Percentage of positive matches between query and subject sequences
  • score: Raw alignment score
  • evalue: Expectation value (E-value) of the alignment
  • bitscore: Bit score, i.e. the raw alignment score normalized for the scoring system so that it can be compared across searches
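The idea of keeping one alignment per pair of sequences can be sketched as follows. The selection rule used here (keep the alignment with the highest bit score for each ordered query/subject pair) is an assumption for illustration only; this page does not state which criterion LAGOON-MCL applies. The identifiers `seqA`, `seqB` and `seqC`, the row values, and the function name `best_alignment_per_pair` are likewise hypothetical.

```python
import csv
import io

HEADER = ("qseqid qlen qstart qend sseqid slen sstart send "
          "length pident ppos score evalue bitscore").split()

# Hypothetical rows (identifiers and values are illustrative, not real output).
ROWS = [
    "seqA 475 197 475 seqB 290 16 268 279 56.6 71.3 812 5.11e-107 317",
    "seqA 475 128 207 seqB 290 5 82 82 46.3 67.1 195 2.66e-19 79.7",
    "seqA 475 10 90 seqC 120 3 85 83 40.0 60.0 150 1.00e-10 62.0",
]
# Build a tab-separated table in memory, as in a .tsv file with a header line.
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + [r.split() for r in ROWS])


def best_alignment_per_pair(handle):
    """Keep, for each (qseqid, sseqid) pair, the row with the highest bit score."""
    best = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        pair = (row["qseqid"], row["sseqid"])
        # Assumed criterion: a higher bitscore replaces the stored alignment.
        if pair not in best or float(row["bitscore"]) > float(best[pair]["bitscore"]):
            best[pair] = row
    return list(best.values())


kept = best_alignment_per_pair(io.StringIO(SAMPLE))
for row in kept:
    print(row["qseqid"], row["sseqid"], row["bitscore"])
```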

mmseqs2_alpahfold_clusters_alignments.m8

query target fident alnlen mismatch gapopen qstart qend qlen tstart tend tlen evalue bits
MALV-I-01_sp_EP00398|sequence00001 AFDB:AF-A0A812PGI6-F1 0.600 471 177 0 4 474 475 308 750 772 3.040E-171 549
MALV-I-01_sp_EP00398|sequence00001 AFDB:AF-A0A7S2CY24-F1 0.594 436 165 0 40 475 475 1 408 431 3.754E-155 503
MALV-I-01_sp_EP00398|sequence00001 AFDB:AF-A0A7J6MHE2-F1 0.524 472 214 0 4 475 475 1 451 470 4.516E-142 465
MALV-I-01_sp_EP00398|sequence00001 AFDB:AF-A0A0G4GLT6-F1 0.519 472 211 0 4 475 475 1 439 598 1.945E-140 460

The output file of the MMseqs2 search against the AlphaFold clusters sequence database contains the following fields:

  • query: Query sequence identifier
  • target: Target sequence identifier (subject in the alignment)
  • fident: Fraction of identical residues between the query and target sequences in the alignment
  • alnlen: Length of the alignment (number of aligned residues)
  • mismatch: Number of mismatched residues between the query and target sequences in the alignment
  • gapopen: Number of gap openings in the alignment
  • qstart: Start position of the query sequence in the alignment
  • qend: End position of the query sequence in the alignment
  • qlen: Length of the query sequence
  • tstart: Start position of the target sequence in the alignment
  • tend: End position of the target sequence in the alignment
  • tlen: Length of the target sequence
  • evalue: Expectation value (E-value), representing the statistical significance of the alignment
  • bits: Normalized alignment score in bits (often used to assess the quality of the alignment)

mmseqs2_alpahfold_clusters_alignments.selection.tsv

query target fident alnlen mismatch gapopen qstart qend qlen tstart tend tlen evalue bits coverageIndex disparityIndex
MALV-I-01_sp_EP00398|sequence00001 AFDB:AF-A0A813LCP4-F1 0.646 297 96 0 179 475 475 2 272 288 7.850E-113 380 0.7831176900584795 0.31570906432748536
MALV-I-01_sp_EP00398|sequence00033 AFDB:AF-B6K6N4-F1 0.442 64 34 0 6 67 88 346 409 470 7.892E-07 54 0.420357833655706 0.5683752417794972
MALV-I-01_sp_EP00398|sequence00065 AFDB:AF-A0A812PAI8-F1 0.282 369 176 0 3 248 374 417 785 796 1.642E-36 152 0.5606609249455835 0.19418617149920725
MALV-I-01_sp_EP00398|sequence00098 AFDB:AF-A0A813J6D5-F1 0.487 590 294 0 4 593 730 127 700 867 1.018E-159 528 0.735136117299458 0.1461661215654675

The selection file derived from the MMseqs2 search against the AlphaFold clusters sequence database contains the same fields as the .m8 file, plus coverageIndex and disparityIndex:

  • query: Query sequence identifier
  • target: Target sequence identifier (subject in the alignment)
  • fident: Fraction of identical residues between the query and target sequences in the alignment
  • alnlen: Length of the alignment (number of aligned residues)
  • mismatch: Number of mismatched residues between the query and target sequences in the alignment
  • gapopen: Number of gap openings in the alignment
  • qstart: Start position of the query sequence in the alignment
  • qend: End position of the query sequence in the alignment
  • qlen: Length of the query sequence
  • tstart: Start position of the target sequence in the alignment
  • tend: End position of the target sequence in the alignment
  • tlen: Length of the target sequence
  • evalue: Expectation value (E-value), representing the statistical significance of the alignment
  • bits: Normalized alignment score in bits (used to assess the quality of the alignment)
  • coverageIndex: A measure of the overall coverage between the query and target sequences. Calculated as the average of query and subject sequence coverage.
  • disparityIndex: Measures the balance of coverage between the query and target sequences. The closer to 0, the more balanced the coverage; the closer to 1, the more unbalanced.
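The two indices can be recomputed from the alignment coordinates. The sketch below takes the coverage of a sequence as the aligned span divided by the sequence length and, as stated above, averages the query and target coverages to obtain `coverageIndex`. Computing `disparityIndex` as the absolute difference between the two coverages is an assumption: it reproduces the values in the example rows above, but the formula is not given on this page. The function names are hypothetical.

```python
def coverage(start, end, length):
    """Fraction of a sequence covered by the alignment (1-based, inclusive coordinates)."""
    return (end - start + 1) / length


def coverage_and_disparity(qstart, qend, qlen, tstart, tend, tlen):
    """Return (coverageIndex, disparityIndex) for one alignment."""
    qcov = coverage(qstart, qend, qlen)
    tcov = coverage(tstart, tend, tlen)
    coverage_index = (qcov + tcov) / 2
    # Assumption: disparity is the absolute difference between the two coverages.
    disparity_index = abs(qcov - tcov)
    return coverage_index, disparity_index


# First example row: qstart=179, qend=475, qlen=475, tstart=2, tend=272, tlen=288
cov, disp = coverage_and_disparity(179, 475, 475, 2, 272, 288)
print(round(cov, 4), round(disp, 4))  # 0.7831 0.3157
```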

mmseqs2_pfam_database_alignments.m8

query target fident alnlen mismatch gapopen qstart qend qlen tstart tend tlen evalue bits
MALV-I-01_sp_EP00398|sequence00001 PF00587.30 0.294 261 161 0 196 456 475 140 368 675 4.137E-27 116
MALV-I-01_sp_EP00398|sequence00033 PF00226.36 0.374 55 33 0 11 63 88 32 86 147 1.457E-04 38
MALV-I-01_sp_EP00398|sequence00098 PF18198.7 0.430 170 93 0 282 445 730 6 175 248 2.382E-31 132
MALV-I-01_sp_EP00398|sequence00098 PF03028.21 0.442 124 60 0 152 275 730 55 163 217 7.833E-22 101

The output file of the MMseqs2 search against the Pfam profile database contains the following fields:

  • query: Query sequence identifier
  • target: Target sequence identifier (subject in the alignment)
  • fident: Fraction of identical residues between the query and target sequences in the alignment
  • alnlen: Length of the alignment (number of aligned residues)
  • mismatch: Number of mismatched residues between the query and target sequences in the alignment
  • gapopen: Number of gap openings in the alignment
  • qstart: Start position of the query sequence in the alignment
  • qend: End position of the query sequence in the alignment
  • qlen: Length of the query sequence
  • tstart: Start position of the target sequence in the alignment
  • tend: End position of the target sequence in the alignment
  • tlen: Length of the target sequence
  • evalue: Expectation value (E-value), representing the statistical significance of the alignment
  • bits: Normalized alignment score in bits (used to assess the quality of the alignment)

Clusters files

This directory contains files specific to the clusters present in the various networks.

network_I[inflation]_clusters_annotations.tsv

cluster_id alphafold_clusters alphafold_sequences gene3d funfam tmhmm alphafold_pfam pfamDB
245 A0A154P1T2;A0A7S3WA45 A0A7X3ZJX2;A0A369S7U3 G3DSA:1.25.40.20 NA NA PF00023;PF12796;PF13637;PF13857 PF12796;PF13637;PF13857
246 A0A7J5C144;A0A7M7K2D1 A0A2H6KD85;A0A1Y5ICJ0 NA NA NA NA NA
247 A0A0G4HX23;A0A839LJV4 A0A813JRF4;A0A0G4E9P6 G3DSA:2.60.120.10;G3DSA:1.10.1300.10 NA NA PF00520;PF02678;PF00233;PF05726 PF02678;PF00233;PF10175;PF05726
248 NA NA NA NA NA NA NA 

This TSV file contains the annotations associated with each cluster. The columns are as follows:

  • cluster_id: Unique identifier for the cluster.
  • alphafold_clusters: AlphaFold cluster identifiers associated with the sequences in the cluster.
  • alphafold_sequences: Sequence identifiers from the AlphaFold database linked to the cluster.
  • gene3d: Gene3D annotations linked to the sequences in the cluster.
  • funfam: FunFam annotations linked to the sequences in the cluster.
  • tmhmm: Transmembrane helix annotations linked to the sequences in the cluster.
  • alphafold_pfam: Pfam annotations derived from AlphaFold sequences for the cluster.
  • pfamDB: Pfam annotations from the Pfam database linked to the sequences in the cluster.
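As a reading aid, here is a minimal sketch of how such a file could be loaded into Python. It assumes, based on the example rows above, that columns are tab-separated, that several annotations in one cell are joined with `;`, and that `NA` marks a missing annotation. The function name `parse_annotations` is hypothetical.

```python
import csv
import io

HEADER = ["cluster_id", "alphafold_clusters", "alphafold_sequences", "gene3d",
          "funfam", "tmhmm", "alphafold_pfam", "pfamDB"]
# Two of the example rows shown above, rebuilt as tab-separated lines.
ROWS = [
    ["245", "A0A154P1T2;A0A7S3WA45", "A0A7X3ZJX2;A0A369S7U3", "G3DSA:1.25.40.20",
     "NA", "NA", "PF00023;PF12796;PF13637;PF13857", "PF12796;PF13637;PF13857"],
    ["248", "NA", "NA", "NA", "NA", "NA", "NA", "NA"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS)


def parse_annotations(handle):
    """Map each cluster_id to {column: [labels]}, turning NA into an empty list."""
    clusters = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        cluster_id = row.pop("cluster_id")
        clusters[cluster_id] = {
            column: [] if value == "NA" else value.split(";")
            for column, value in row.items()
        }
    return clusters


clusters = parse_annotations(io.StringIO(SAMPLE))
print(clusters["245"]["pfamDB"])  # ['PF12796', 'PF13637', 'PF13857']
```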

network_I[inflation]_clusters_metrics.tsv

cluster_id cluster_size diameter alphafold_clusters_homogeneity_score alphafold_clusters_sequence alphafold_clusters_numbre_labels alphafold_sequences_homogeneity_score alphafold_sequences_sequence alphafold_sequences_numbre_labels gene3d_homogeneity_score gene3d_sequence gene3d_numbre_labels funfam_homogeneity_score funfam_sequence funfam_numbre_labels tmhmm_homogeneity_score tmhmm_sequence tmhmm_numbre_labels alphafold_pfam_homogeneity_score alphafold_pfam_sequence alphafold_pfam_numbre_labels pfamDB_homogeneity_score pfamDB_sequence pfamDB_numbre_labels
0 54 3 0.40740740740740744 54 32 0.2407407407407407 54 41 0.6666666666666667 52 18 0.40740740740740744 32 32 1 2 1 0.7222222222222222 54 15 0.6851851851851851 50 17
1 53 4 0.13207547169811318 50 46 0.13207547169811318 50 46 0.7358490566037736 50 14 0.8679245283018868 6 7 1 3 1 0.7169811320754718 46 15 0.7169811320754718 46 15
2 49 4 0.6938775510204082 49 15 0.6734693877551021 49 16 NA 0 0 NA 0 0 1 3 1 0.9591836734693877 13 2 NA 0 0
3 45 5 0.4666666666666667 33 24 0.4222222222222223 33 26 0.8666666666666667 16 6 1 2 1 1 3 1 0.7333333333333334 24 12 0.9333333333333333 2

TSV file containing the metrics of the clusters present in a network. One file per inflation parameter.

  • cluster_id: Unique identifier for the cluster.
  • cluster_size: Size of the cluster (number of sequences it contains).
  • diameter: Diameter of the cluster, i.e. the length of the longest shortest path between any two sequences in the cluster.
  • alphafold_clusters_homogeneity_score: Homogeneity score for AlphaFold clusters, calculated from the number of unique AlphaFold cluster annotations in the cluster.
  • alphafold_clusters_sequence: Number of sequences in the cluster linked to AlphaFold clusters.
  • alphafold_clusters_number_labels: Number of unique AlphaFold cluster labels found in a cluster.
  • alphafold_sequences_homogeneity_score: Homogeneity score for AlphaFold sequences, calculated from the number of unique AlphaFold sequence annotations in the cluster.
  • alphafold_sequences_sequence: Number of sequences in the cluster linked to AlphaFold sequences.
  • alphafold_sequences_number_labels: Number of unique AlphaFold sequence labels found in a cluster.
  • gene3d_homogeneity_score: Homogeneity score for Gene3D annotations, calculated from the number of unique Gene3D annotations in the cluster.
  • gene3d_sequence: Number of sequences in the cluster linked to Gene3D annotations.
  • gene3d_number_labels: Number of unique Gene3D labels found in a cluster.
  • funfam_homogeneity_score: Homogeneity score for FunFam annotations, calculated from the number of unique FunFam annotations in the cluster.
  • funfam_sequence: Number of sequences in the cluster linked to FunFam annotations.
  • funfam_number_labels: Number of unique FunFam labels found in a cluster.
  • tmhmm_homogeneity_score: Homogeneity score for TMHMM annotations, calculated from the number of unique TMHMM annotations in the cluster.
  • tmhmm_sequence: Number of sequences in the cluster linked to TMHMM annotations.
  • tmhmm_number_labels: Number of unique TMHMM labels found in a cluster.
  • alphafold_pfam_homogeneity_score: Homogeneity score for AlphaFold Pfam annotations, calculated from the number of unique AlphaFold Pfam annotations in the cluster.
  • alphafold_pfam_sequence: Number of sequences in the cluster linked to AlphaFold Pfam annotations.
  • alphafold_pfam_number_labels: Number of unique AlphaFold Pfam labels found in a cluster.
  • pfamDB_homogeneity_score: Homogeneity score for PfamDB annotations, calculated from the number of unique PfamDB annotations in the cluster.
  • pfamDB_sequence: Number of sequences in the cluster linked to PfamDB annotations.
  • pfamDB_number_labels: Number of unique PfamDB labels found in a cluster.
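The homogeneity scores in the example rows above are consistent with a simple rule: 1 minus the number of unique labels divided by the cluster size, with a score of 1 when a single label is present and NA when there is none. The sketch below encodes that rule. It is inferred from the example values only and is not confirmed by this page, so it should be read as an assumption rather than the pipeline's definition. The function name `homogeneity_score` is hypothetical.

```python
def homogeneity_score(cluster_size, number_labels):
    """Homogeneity as inferred from the example rows (assumed formula)."""
    if number_labels == 0:
        return None  # shown as NA in the file
    if number_labels == 1:
        return 1.0  # a single label is reported as fully homogeneous
    return 1 - number_labels / cluster_size


# Cluster 0 in the example: 54 sequences, 32 unique AlphaFold cluster labels.
print(round(homogeneity_score(54, 32), 4))  # 0.4074
```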

Sequences files

This directory contains files specific to the sequences present in the various networks.

network_I[inflation]_sequences_annotations.tsv

sequence_id gene3d alphafold_pfam funfam alphafold_clusters pfamDB tmhmm alphafold_sequences
MALV-II-16_sp_EP00396|sequence00182 NA NA NA NA NA NA NA
MALV-II-16_sp_EP00396|sequence00266 G3DSA:3.40.50.300;G3DSA:3.40.50.1240 PF00300;PF01591 NA A0A6A6FSM6 PF00300;PF01591 NA A0A7S3C007
MALV-I_sp_EP00400|sequence01089 G3DSA:3.40.50.300;G3DSA:3.40.50.1240 PF00300;PF01591 G3DSA:3.40.50.300:FF:000644 A0A1Y3ENH8 PF00300;PF01591 NA A0A7S3JYS1
MALV-II-16_sp_EP00396|sequence00273 NA PF04515 NA A0A7S2LD46 PF04515 TMhelix A0A813I9P1

A TSV file containing all annotations associated with sequences in a network, with one file generated for each inflation parameter.

  • sequence_id: Unique identifier for each sequence in the network.
  • gene3d: Gene3D annotations linked to the sequence.
  • alphafold_pfam: AlphaFold-Pfam annotations linked to the sequence.
  • funfam: FunFam annotations linked to the sequence.
  • alphafold_clusters: AlphaFold clusters annotations linked to the sequence.
  • pfamDB: Pfam DB annotations linked to the sequence.
  • tmhmm: TMHMM annotations linked to the sequence.
  • alphafold_sequences: AlphaFold sequence annotations linked to the sequence.

network_I[inflation]_sequences_metrics.tsv

sequence_id cluster_id sequence_length eigenvector_centrality num_gene3d_id num_alphafold_pfam_id num_funfam_id num_alphafold_clusters_id num_pfamDB_id num_tmhmm_id num_alphafold_sequences_id
MALV-I-01_sp_EP00398|sequence00096 0 99 0.4071553450680936 1 5 0 1 1 0 1
MALV-I-01_sp_EP00398|sequence00097 0 906 0.5067939313056563 4 5 4 1 3 0 1
MALV-II-16_sp_EP00396|sequence00405 0 1326 0.6539145129685043 5 4 5 1 5 0 1
MALV-I_sp_EP00400|sequence00152 0 668 0.5272095731360656 4 4 3 1 3 0 1

A TSV file containing the metrics (length, centrality, annotation counts) of the sequences in a network, with one file generated for each inflation parameter.

  • sequence_id: Unique identifier for each sequence in the network.
  • cluster_id: Identifier of the cluster to which the sequence belongs.
  • sequence_length: Length (in amino acids) of the sequence.
  • eigenvector_centrality: Eigenvector centrality value for the sequence, indicating its importance within the network.
  • num_gene3d_id: Number of unique Gene3D annotations linked to the sequence.
  • num_alphafold_pfam_id: Number of unique AlphaFold-Pfam annotations linked to the sequence.
  • num_funfam_id: Number of unique FunFam annotations linked to the sequence.
  • num_alphafold_clusters_id: Number of unique AlphaFold clusters linked to the sequence.
  • num_pfamDB_id: Number of unique Pfam DB annotations linked to the sequence.
  • num_tmhmm_id: Number of unique TMHMM annotations linked to the sequence.
  • num_alphafold_sequences_id: Number of unique AlphaFold sequence identifiers linked to the sequence.
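One way to use these metrics is to pick the most central sequence of each cluster, for instance as a representative. The sketch below does this with the standard library on the first four columns of the example rows above. The function name `most_central_per_cluster` is hypothetical, and treating the highest eigenvector centrality as "representative" is an illustrative choice, not something LAGOON-MCL prescribes.

```python
import csv
import io

# First four columns of the example rows above, tab-separated.
SAMPLE = (
    "sequence_id\tcluster_id\tsequence_length\teigenvector_centrality\n"
    "MALV-I-01_sp_EP00398|sequence00096\t0\t99\t0.4071553450680936\n"
    "MALV-I-01_sp_EP00398|sequence00097\t0\t906\t0.5067939313056563\n"
    "MALV-II-16_sp_EP00396|sequence00405\t0\t1326\t0.6539145129685043\n"
    "MALV-I_sp_EP00400|sequence00152\t0\t668\t0.5272095731360656\n"
)


def most_central_per_cluster(handle):
    """Return {cluster_id: (sequence_id, centrality)} for the top sequence of each cluster."""
    best = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        centrality = float(row["eigenvector_centrality"])
        current = best.get(row["cluster_id"])
        if current is None or centrality > current[1]:
            best[row["cluster_id"]] = (row["sequence_id"], centrality)
    return best


top = most_central_per_cluster(io.StringIO(SAMPLE))
print(top["0"][0])  # MALV-II-16_sp_EP00396|sequence00405
```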

Edges files

network_I[inflation]_edges.tsv

qseqid qlen qstart qend sseqid slen sstart send length pident ppos score evalue bitscore cluster_id
MALV-I-01_sp_EP00398|sequence00004 305 157 266 MALV-I_sp_EP00400|sequence00851 557 360 471 114 25.4 50.0 102 1.10e-05 15.96 234
MALV-I-01_sp_EP00398|sequence00006 554 65 550 MALV-II-16_sp_EP00396|sequence00175 713 94 593 513 34.7 49.9 607 7.26e-71 17.6 235
MALV-I-01_sp_EP00398|sequence00008 81 34 73 MALV-I-01_sp_EP00398|sequence00853 602 552 591 40 35.0 60.0 75 8.91e-04 42.74 17
MALV-I-01_sp_EP00398|sequence00011 1061 238 773 MALV-I-01_sp_EP00398|sequence00593 540 1 536 536 100 100 2590 0.0 48.74 45

TSV file with alignments used to reconstruct clusters in each network. One file per inflation parameter.

  • qseqid: Query sequence identifier (ID of the query sequence in the alignment).
  • qlen: Length of the query sequence (in amino acids).
  • qstart: Start position of the alignment on the query sequence.
  • qend: End position of the alignment on the query sequence.
  • sseqid: Subject sequence identifier (ID of the subject sequence in the alignment).
  • slen: Length of the subject sequence (in amino acids).
  • sstart: Start position of the alignment on the subject sequence.
  • send: End position of the alignment on the subject sequence.
  • length: Length of the alignment.
  • pident: Percentage of identical matches between the query and subject sequences.
  • ppos: Percentage of positive matches (including identical and similar residues).
  • score: Alignment score calculated by the alignment algorithm.
  • evalue: E-value (expectation value) of the alignment, representing the number of hits expected by chance.
  • bitscore: Bit score of the alignment, a measure of the alignment’s quality.
  • cluster_id: Identifier of the cluster to which the aligned sequences belong.
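Because each row links two sequences within a cluster, the edges file is enough to rebuild each cluster as a graph. The sketch below groups edges by `cluster_id` and computes a cluster's diameter with breadth-first search, treating the graph as undirected and unweighted. Whether LAGOON-MCL computes the `diameter` column this way is an assumption; the node names `a` to `d`, `x`, `y` and the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_clusters(edges):
    """Group (qseqid, sseqid, cluster_id) triples into one adjacency dict per cluster."""
    graphs = defaultdict(lambda: defaultdict(set))
    for query, subject, cluster_id in edges:
        graphs[cluster_id][query].add(subject)
        graphs[cluster_id][subject].add(query)  # undirected edge
    return graphs


def eccentricity(adjacency, source):
    """Longest shortest-path distance from source, found by breadth-first search."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


def diameter(adjacency):
    """Longest shortest path between any two nodes of a connected cluster."""
    return max(eccentricity(adjacency, node) for node in list(adjacency))


# Hypothetical edges: a chain a-b-c-d in cluster "17" and a single edge in cluster "45".
edges = [("a", "b", "17"), ("b", "c", "17"), ("c", "d", "17"), ("x", "y", "45")]
graphs = build_clusters(edges)
print(diameter(graphs["17"]), diameter(graphs["45"]))  # 3 1
```

With a real file, the triples would come from the `qseqid`, `sseqid` and `cluster_id` columns of each row.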
