7. Output files
- Directory structure
- LAGOON-MCL output
Descriptions of the various files generated by LAGOON-MCL. Directory layout with results.
lagoon-mcl/
└── results/
    ├── lagoon-mcl_output/
    │   │
    │   ├── abundance_matrix/
    │   │   └── network_I[inflation]_[annotation]_abundance_matrix.json
    │   │
    │   ├── diamond/
    │   │   ├── diamond_alignment.filter.tsv
    │   │   ├── diamond_alignments.tsv
    │   │   ├── mmseqs2_alpahfold_clusters_alignments.m8
    │   │   ├── mmseqs2_alpahfold_clusters_alignments.selection.tsv
    │   │   └── mmseqs2_pfam_database_alignments.m8
    │   │
    │   ├── network_I[inflation]
    │   │   ├── clusters/
    │   │   │   ├── network_I[inflation]_clusters_annotations.tsv
    │   │   │   └── network_I[inflation]_clusters_metrics.tsv
    │   │   │
    │   │   ├── edges/
    │   │   │   └── network_I[inflation]_edges.tsv
    │   │   │
    │   │   └── sequences/
    │   │       ├── network_I[inflation]_sequences_annotations.tsv
    │   │       └── network_I[inflation]_sequences_metrics.tsv
    │   │
    │   └── reports/
    │       ├── network_I[inflation]_figures/
    │       │   ├── clusters_caracteristics_[label].png
    │       │   ├── clusters_metrics.png
    │       │   ├── homogeneity_score_[label].png
    │       │   ├── sequence_label_num_[label]_id.png
    │       │   └── sequence_length_centrality.png
    │       └── network_I[inflation]_report.html
    │
    └── nextflow_reports/

An abundance file is generated for each annotation type and each inflation parameter. It records the abundance of each annotation within each cluster.
{
"0": {"A0A183PR16": 1, "A0A6G0J3D9": 1, "A0A388LF28": 2, "A0A812QWC6": 1, "A0A7J6Q934": 1, "A0A813FM05": 2, "A0A812WK55": 1, "U6N5Z9": 1, "A0A7J6PV29": 1, "A0A086KJH4": 1, "H2YF95": 1, "A0A813BA63": 1, "A0A813JSA0": 2, "A0A812TQ29": 2, "A0A0M0K2E2": 1, "U6GFX1": 1, "A0A0G4ECY9": 1, "A0A812K4G7": 1, "A0A812PKL1": 3, "A0A812VMD3": 1, "A0A812NTR3": 1, "A0A1Q9DHQ4": 1, "A0A183R3Q5": 1, "A0A7J6T9M1": 1, "A0A813I2X5": 1, "A0A553QIE6": 1, "A0CL99": 1, "A0A7J5ZD72": 1, "A0A6I9NLT8": 1, "A0A7J6T346": 1, "A0A3Q2Z2J4": 1, "A0A7J6RR91": 1, "A0A7J7XUX5": 1, "A0A7J6TQJ7": 1, "A0A7J6T231": 1, "A0CHQ5": 1, "A0A2T6J476": 1, "A0A813C3W0": 1, "A0A7R9I977": 1, "A0A7J5ZFK4": 1, "A0A086M5R1": 1, "A0A2D4BNR7": 1, "A0A7S4QQJ2": 1, "A0A812JA43": 1, "A0A6P6FF32": 1, "A0A813HBU8": 1, "A0A7S1PA39": 1, "A0A2G8YAX1": 1},
"1": {"A0A0D2GK41": 1, "A0A1I8M4E7": 1, "A0A7S1A3C8": 1, "R9P6Z5": 1, "A0A846EAF9": 1, "A0A6G1CQL2": 1, "A0A1Q9DW77": 1, "A0A388KAS7": 2, "A0A7S3TGY1": 1, "G0R2F6": 1, "A0A250XJJ8": 1, "A0A507D151": 1, "A0A0L0FTB9": 1, "G0QSF0": 1, "A0A150G7W3": 1, "A0A7S1N0A9": 1, "A0A2H2I0E3": 1, "A0A0G4EL96": 1, "J9JBL8": 1, "A0A6P8V8G4": 1, "A0A813C6N7": 2, "G0R0T4": 1, "A0A2C6KY70": 1, "A0A150GXZ4": 1, "A0A4W3JHJ4": 1, "A0A2U9BJ95": 1, "A0A812SDT4": 1, "A0A182YG02": 1, "A0A812WWN6": 1, "I3S6R2": 1, "A0A0C4ER12": 1, "A0A1R2AW38": 1, "A0A812I5G3": 1, "C5KS90": 1, "A0A3P1BA92": 1, "A0A384K564": 1, "A0A3B3RIU9": 1, "A0A0P1A9W9": 1, "G0QN77": 1, "B2Q6C2": 1, "A0A0K0DJP8": 1, "A0A834RCZ5": 1, "A0A7S4SK49": 1, "G3B601": 1, "X6NRD4": 1, "A0A6P8NQ06": 1, "A0A812RYH2": 1, "Q238S0": 1, "A0A7S1RQD1": 1},
"2": {"A0A068SF61": 7, "A0A833RLM5": 9, "F4RVK9": 1, "A0A098VSY4": 11, "A0A0L9UDJ6": 2, "A0A1Q3B7R3": 1, "A0A6A1VJU8": 1, "A0A0N5BXW6": 1, "J9LA01": 3, "B3SDD1": 1, "A0A183TTJ3": 4, "A0A6L2LKW4": 1, "A0A287E8N8": 1, "M8AH53": 1, "A0A1S8VI42": 2, "A0A4U5N899": 1, "A0A068XZD3": 1, "X1X2U2": 1},
"3": {"A0A1Z5KAI0": 1, "A0A672PKK1": 1, "A0A4U5VA95": 1, "A0A2G2XRV4": 1, "A0A6P7ULQ5": 1, "H3AFC1": 2, "A0A814MWY7": 1, "A0A7M3QUJ9": 1, "C5KXZ8": 1, "A0A1Y2DIJ2": 1, "A0A7R8ZNR6": 1, "A0A3C1RZF5": 1, "A0A4V6XW70": 1, "A0A669CUH2": 1, "A0A016T7J7": 1, "A0A1Y1M3Z3": 1, "A0A812LTJ5": 1, "A0A7J6NWD6": 1, "A0A812YF94": 1, "A0A177U5G8": 3, "A0A7M3QCM4": 1, "A0A833P9X4": 1, "A0A812U720": 1, "B7GCE1": 2, "A0A812Z0K7": 1, "A0A7S3PJB6": 2, "A0A7M3Q470": 1, "A0A811WF77": 1},
"4": {"A0A075AVM1": 1, "A0A6L2J4U7": 1, "A0A6L2NVQ2": 1, "A0A438IVD6": 1, "A0A151TJN2": 1, "A0A1Q3DX31": 1, "A0A484MYA7": 1, "A0A5B0P3I1": 1, "A0A7S2PGJ2": 1, "A0A6L2LGG4": 1, "A0A0K0ETN2": 1, "A0A498NPX3": 1, "A0A178U585": 3, "A0A6L2JSC2": 1, "A0A484LZ95": 2, "Q7XP10": 1, "R7QGR4": 1, "A0A4C1ZQC8": 2, "A0A4S4L2M5": 1, "A0A6H5HI70": 5, "A0A0J7KB27": 1, "A0A177U0N4": 1, "A0A7D9LFH4": 1},
"5": {"A0A1Y3N171": 1, "A0A5J4WSY7": 2, "A0A2P5WJH5": 1, "A0A6J5VDK5": 15, "A0A336N1I0": 2, "A0A4Y7K5M2": 9, "A0A0V0W1L7": 2, "A0A328DAL0": 2}
}
The JSON file can be converted into a TSV or CSV table with the Python script tool-kit/scripts/convert_annotation_file.py.
./convert_annotation_file.py -a [JSON file] -d [Delimiter, default is \t] -o [Output file]
The script uses only Python 3 and does not require any additional modules.
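For readers who prefer to do the conversion inline, the following is a minimal sketch of how the nested JSON can be flattened into a long table. It is not the code of convert_annotation_file.py; the function names `abundance_to_rows` and `write_table` and the output column names are assumptions made for illustration.

```python
import csv
import io
import json


def abundance_to_rows(abundance):
    """Flatten {cluster_id: {annotation: count}} into (cluster, annotation, count) rows."""
    rows = []
    for cluster_id, counts in abundance.items():
        for annotation, count in counts.items():
            rows.append((cluster_id, annotation, count))
    return rows


def write_table(rows, handle, delimiter="\t"):
    """Write the rows with a header line; the column names are illustrative."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["cluster_id", "annotation", "abundance"])
    writer.writerows(rows)


# Two entries of cluster "5" from the example file above.
example = json.loads('{"5": {"A0A1Y3N171": 1, "A0A5J4WSY7": 2}}')
buffer = io.StringIO()
write_table(abundance_to_rows(example), buffer)
print(buffer.getvalue())
```

In practice, `example` would come from `json.load()` on an abundance matrix file, and `buffer` would be an open output file.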
This directory contains the output files generated by Diamond and MMseqs2.
| qseqid | qlen | qstart | qend | sseqid | slen | sstart | send | length | pident | ppos | score | evalue | bitscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MALV-I-01_sp_EP00398\|sequence00001 | 475 | 1 | 475 | MALV-I-01_sp_EP00398\|sequence00001 | 475 | 1 | 475 | 475 | 100 | 100 | 2448 | 0.0 | 947 |
| MALV-I-01_sp_EP00398\|sequence00001 | 475 | 197 | 475 | MALV-II-16_sp_EP00396\|sequence00239 | 290 | 16 | 268 | 279 | 56.6 | 71.3 | 812 | 5.11e-107 | 317 |
| MALV-I-01_sp_EP00398\|sequence00001 | 475 | 128 | 207 | MALV-II-16_sp_EP00396\|sequence00238 | 83 | 5 | 82 | 82 | 46.3 | 67.1 | 195 | 2.66e-19 | 79.7 |
| MALV-I-01_sp_EP00398\|sequence00002 | 28 | 1 | 28 | MALV-I-01_sp_EP00398\|sequence00002 | 28 | 1 | 28 | 28 | 100 | 100 | 134 | 1.52e-14 | 56.2 |
The Diamond BLASTp output file contains the following fields:
- qseqid: Query sequence identifier
- qlen: Length of the query sequence
- qstart: Start position of the query sequence in the alignment
- qend: End position of the query sequence in the alignment
- sseqid: Subject sequence identifier
- slen: Length of the subject sequence
- sstart: Start position of the subject sequence in the alignment
- send: End position of the subject sequence in the alignment
- length: Length of the alignment
- pident: Percentage of identical matches between query and subject sequences
- ppos: Percentage of positive matches between query and subject sequences
- score: Raw alignment score
- evalue: Expectation value (E-value) of the alignment
- bitscore: Normalized alignment score, representing the statistical significance
| qseqid | qlen | qstart | qend | sseqid | slen | sstart | send | length | pident | ppos | score | evalue | bitscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MALV-I-01_sp_EP00398\|sequence00004 | 305 | 157 | 266 | MALV-I_sp_EP00400\|sequence00851 | 557 | 360 | 471 | 114 | 25.4 | 50.0 | 102 | 1.10e-05 | 43.9 |
| MALV-I-01_sp_EP00398\|sequence00006 | 554 | 65 | 550 | MALV-II-16_sp_EP00396\|sequence00175 | 713 | 94 | 593 | 513 | 34.7 | 49.9 | 607 | 7.26e-71 | 238 |
| MALV-I-01_sp_EP00398\|sequence00007 | 863 | 339 | 433 | MALV-I-01_sp_EP00398\|sequence00434 | 919 | 397 | 472 | 95 | 32.6 | 48.4 | 100 | 1.06e-04 | 43.1 |
| MALV-I-01_sp_EP00398\|sequence00007 | 863 | 692 | 766 | MALV-I-01_sp_EP00398\|sequence00085 | 10025 | 661 | 713 | 75 | 33.3 | 48.0 | 93 | 8.31e-04 | 40.4 |
The filtered Diamond BLASTp output file contains a single alignment for each pair of sequences and includes the following fields:
- qseqid: Query sequence identifier
- qlen: Length of the query sequence
- qstart: Start position of the query sequence in the alignment
- qend: End position of the query sequence in the alignment
- sseqid: Subject sequence identifier
- slen: Length of the subject sequence
- sstart: Start position of the subject sequence in the alignment
- send: End position of the subject sequence in the alignment
- length: Length of the alignment
- pident: Percentage of identical matches between query and subject sequences
- ppos: Percentage of positive matches between query and subject sequences
- score: Raw alignment score
- evalue: Expectation value (E-value) of the alignment
- bitscore: Normalized alignment score, representing the statistical significance
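To illustrate what "a single alignment for each pair of sequences" means, here is a minimal sketch that reduces a Diamond table to one row per (query, subject) pair. The selection rule used here (keep the highest bitscore) is an assumption, as are the function name `best_alignment_per_pair` and the absence of a header line; the criterion actually applied by LAGOON-MCL may differ.

```python
import csv
import io

# Column order taken from the field list above; the file is assumed to have no header line.
DIAMOND_FIELDS = [
    "qseqid", "qlen", "qstart", "qend", "sseqid", "slen", "sstart", "send",
    "length", "pident", "ppos", "score", "evalue", "bitscore",
]


def best_alignment_per_pair(handle):
    """Keep the highest-bitscore alignment for each ordered (query, subject) pair."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=DIAMOND_FIELDS, delimiter="\t")
    for row in reader:
        pair = (row["qseqid"], row["sseqid"])
        if pair not in best or float(row["bitscore"]) > float(best[pair]["bitscore"]):
            best[pair] = row
    return best


QUERY = "MALV-I-01_sp_EP00398|sequence00001"
SUBJECT = "MALV-II-16_sp_EP00396|sequence00239"
rows = [
    # Alignment taken from the unfiltered example table.
    [QUERY, "475", "197", "475", SUBJECT, "290", "16", "268",
     "279", "56.6", "71.3", "812", "5.11e-107", "317"],
    # Invented weaker alignment of the same pair, for illustration only.
    [QUERY, "475", "10", "60", SUBJECT, "290", "100", "150",
     "51", "30.0", "45.0", "90", "1.0e-03", "39.3"],
]
handle = io.StringIO("\n".join("\t".join(r) for r in rows))
best = best_alignment_per_pair(handle)
print(len(best), best[(QUERY, SUBJECT)]["bitscore"])
```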
| query | target | fident | alnlen | mismatch | gapopen | qstart | qend | qlen | tstart | tend | tlen | evalue | bits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MALV-I-01_sp_EP00398\|sequence00001 | AFDB:AF-A0A812PGI6-F1 | 0.600 | 471 | 177 | 0 | 4 | 474 | 475 | 308 | 750 | 772 | 3.040E-171 | 549 |
| MALV-I-01_sp_EP00398\|sequence00001 | AFDB:AF-A0A7S2CY24-F1 | 0.594 | 436 | 165 | 0 | 40 | 475 | 475 | 1 | 408 | 431 | 3.754E-155 | 503 |
| MALV-I-01_sp_EP00398\|sequence00001 | AFDB:AF-A0A7J6MHE2-F1 | 0.524 | 472 | 214 | 0 | 4 | 475 | 475 | 1 | 451 | 470 | 4.516E-142 | 465 |
| MALV-I-01_sp_EP00398\|sequence00001 | AFDB:AF-A0A0G4GLT6-F1 | 0.519 | 472 | 211 | 0 | 4 | 475 | 475 | 1 | 439 | 598 | 1.945E-140 | 460 |
The output file of the MMseqs2 search against the AlphaFold clusters sequence database contains the following fields:
- query: Query sequence identifier
- target: Target sequence identifier (subject in the alignment)
- fident: Fraction of identical residues between the query and target sequences in the alignment
- alnlen: Length of the alignment (number of aligned residues)
- mismatch: Number of mismatched residues between the query and target sequences in the alignment
- gapopen: Number of gap openings in the alignment
- qstart: Start position of the query sequence in the alignment
- qend: End position of the query sequence in the alignment
- qlen: Length of the query sequence
- tstart: Start position of the target sequence in the alignment
- tend: End position of the target sequence in the alignment
- tlen: Length of the target sequence
- evalue: Expectation value (E-value), representing the statistical significance of the alignment
- bits: Normalized alignment score in bits (often used to assess the quality of the alignment)
| query | target | fident | alnlen | mismatch | gapopen | qstart | qend | qlen | tstart | tend | tlen | evalue | bits | coverageIndex | disparityIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MALV-I-01_sp_EP00398\|sequence00001 | AFDB:AF-A0A813LCP4-F1 | 0.646 | 297 | 96 | 0 | 179 | 475 | 475 | 2 | 272 | 288 | 7.850E-113 | 380 | 0.7831176900584795 | 0.31570906432748536 |
| MALV-I-01_sp_EP00398\|sequence00033 | AFDB:AF-B6K6N4-F1 | 0.442 | 64 | 34 | 0 | 6 | 67 | 88 | 346 | 409 | 470 | 7.892E-07 | 54 | 0.420357833655706 | 0.5683752417794972 |
| MALV-I-01_sp_EP00398\|sequence00065 | AFDB:AF-A0A812PAI8-F1 | 0.282 | 369 | 176 | 0 | 3 | 248 | 374 | 417 | 785 | 796 | 1.642E-36 | 152 | 0.5606609249455835 | 0.19418617149920725 |
| MALV-I-01_sp_EP00398\|sequence00098 | AFDB:AF-A0A813J6D5-F1 | 0.487 | 590 | 294 | 0 | 4 | 593 | 730 | 127 | 700 | 867 | 1.018E-159 | 528 | 0.735136117299458 | 0.1461661215654675 |
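The selection file derived from the MMseqs2 search against the AlphaFold clusters sequence database contains the following fields, two of which (coverageIndex and disparityIndex) are absent from the raw alignment file: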
- query: Query sequence identifier
- target: Target sequence identifier (subject in the alignment)
- fident: Fraction of identical residues between the query and target sequences in the alignment
- alnlen: Length of the alignment (number of aligned residues)
- mismatch: Number of mismatched residues between the query and target sequences in the alignment
- gapopen: Number of gap openings in the alignment
- qstart: Start position of the query sequence in the alignment
- qend: End position of the query sequence in the alignment
- qlen: Length of the query sequence
- tstart: Start position of the target sequence in the alignment
- tend: End position of the target sequence in the alignment
- tlen: Length of the target sequence
- evalue: Expectation value (E-value), representing the statistical significance of the alignment
- bits: Normalized alignment score in bits (used to assess the quality of the alignment)
- coverageIndex: A measure of the overall coverage between the query and target sequences. Calculated as the average of the query and target sequence coverages.
- disparityIndex: Measures the balance of coverage between the query and target sequences. The closer to 0, the more balanced the coverage; the closer to 1, the more unbalanced.
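The two indices can be reproduced from the alignment coordinates. The sketch below assumes that coverage is the aligned span divided by the sequence length, and that disparityIndex is the absolute difference between the two coverages; the second formula is inferred from the example rows above rather than stated by the documentation, and the function names are illustrative.

```python
def coverage(start, end, length):
    """Fraction of a sequence covered by the alignment (1-based, inclusive coordinates)."""
    return (end - start + 1) / length


def coverage_indices(qstart, qend, qlen, tstart, tend, tlen):
    """Return (coverageIndex, disparityIndex) for one alignment."""
    q_cov = coverage(qstart, qend, qlen)
    t_cov = coverage(tstart, tend, tlen)
    coverage_index = (q_cov + t_cov) / 2   # average of the two coverages
    disparity_index = abs(q_cov - t_cov)   # assumed: absolute difference
    return coverage_index, disparity_index


# First example row: sequence00001 against AFDB:AF-A0A813LCP4-F1.
cov, disp = coverage_indices(179, 475, 475, 2, 272, 288)
print(cov, disp)
```

For this row the query coverage is 297/475 and the target coverage is 271/288, which gives values matching the coverageIndex and disparityIndex columns of the table to within floating-point rounding.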
| query | target | fident | alnlen | mismatch | gapopen | qstart | qend | qlen | tstart | tend | tlen | evalue | bits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MALV-I-01_sp_EP00398\|sequence00001 | PF00587.30 | 0.294 | 261 | 161 | 0 | 196 | 456 | 475 | 140 | 368 | 675 | 4.137E-27 | 116 |
| MALV-I-01_sp_EP00398\|sequence00033 | PF00226.36 | 0.374 | 55 | 33 | 0 | 11 | 63 | 88 | 32 | 86 | 147 | 1.457E-04 | 38 |
| MALV-I-01_sp_EP00398\|sequence00098 | PF18198.7 | 0.430 | 170 | 93 | 0 | 282 | 445 | 730 | 6 | 175 | 248 | 2.382E-31 | 132 |
| MALV-I-01_sp_EP00398\|sequence00098 | PF03028.21 | 0.442 | 124 | 60 | 0 | 152 | 275 | 730 | 55 | 163 | 217 | 7.833E-22 | 101 |
The output file of the MMseqs2 search against the Pfam profile database contains the following fields:
- query: Query sequence identifier
- target: Target sequence identifier (subject in the alignment)
- fident: Fraction of identical residues between the query and target sequences in the alignment
- alnlen: Length of the alignment (number of aligned residues)
- mismatch: Number of mismatched residues between the query and target sequences in the alignment
- gapopen: Number of gap openings in the alignment
- qstart: Start position of the query sequence in the alignment
- qend: End position of the query sequence in the alignment
- qlen: Length of the query sequence
- tstart: Start position of the target sequence in the alignment
- tend: End position of the target sequence in the alignment
- tlen: Length of the target sequence
- evalue: Expectation value (E-value), representing the statistical significance of the alignment
- bits: Normalized alignment score in bits (used to assess the quality of the alignment)
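As an example of how this file might be consumed, the sketch below collects the Pfam accessions hitting each query under an E-value cutoff. The cutoff value, the stripping of the version suffix (PF18198.7 becomes PF18198), the absence of a header line, and the function name `pfam_hits_by_query` are all assumptions made for illustration, not documented pipeline behavior.

```python
import csv
import io
from collections import defaultdict

# Column order taken from the field list above; the file is assumed to have no header line.
M8_FIELDS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen", "qstart",
    "qend", "qlen", "tstart", "tend", "tlen", "evalue", "bits",
]


def pfam_hits_by_query(handle, max_evalue=1e-5):
    """Map each query to the set of Pfam accessions with evalue <= max_evalue."""
    hits = defaultdict(set)
    for row in csv.DictReader(handle, fieldnames=M8_FIELDS, delimiter="\t"):
        if float(row["evalue"]) <= max_evalue:
            accession = row["target"].split(".")[0]  # drop the version suffix
            hits[row["query"]].add(accession)
    return dict(hits)


# Three rows from the example table above.
rows = [
    ["MALV-I-01_sp_EP00398|sequence00033", "PF00226.36", "0.374", "55", "33", "0",
     "11", "63", "88", "32", "86", "147", "1.457E-04", "38"],
    ["MALV-I-01_sp_EP00398|sequence00098", "PF18198.7", "0.430", "170", "93", "0",
     "282", "445", "730", "6", "175", "248", "2.382E-31", "132"],
    ["MALV-I-01_sp_EP00398|sequence00098", "PF03028.21", "0.442", "124", "60", "0",
     "152", "275", "730", "55", "163", "217", "7.833E-22", "101"],
]
handle = io.StringIO("\n".join("\t".join(r) for r in rows))
pfam_hits = pfam_hits_by_query(handle)
print(pfam_hits)
```

With the default cutoff used here, the hit on sequence00033 (E-value 1.457E-04) is discarded and only sequence00098 remains.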
This directory contains files specific to the clusters present in the various networks.
| cluster_id | alphafold_clusters | alphafold_sequences | gene3d | funfam | tmhmm | alphafold_pfam | pfamDB |
|---|---|---|---|---|---|---|---|
| 245 | A0A154P1T2;A0A7S3WA45 | A0A7X3ZJX2;A0A369S7U3 | G3DSA:1.25.40.20 | NA | NA | PF00023;PF12796;PF13637;PF13857 | PF12796;PF13637;PF13857 |
| 246 | A0A7J5C144;A0A7M7K2D1 | A0A2H6KD85;A0A1Y5ICJ0 | NA | NA | NA | NA | NA |
| 247 | A0A0G4HX23;A0A839LJV4 | A0A813JRF4;A0A0G4E9P6 | G3DSA:2.60.120.10;G3DSA:1.10.1300.10 | NA | NA | PF00520;PF02678;PF00233;PF05726 | PF02678;PF00233;PF10175;PF05726 |
| 248 | NA | NA | NA | NA | NA | NA | NA |
This TSV file contains the annotations associated with each cluster. The columns are as follows:
- cluster_id: Unique identifier for the cluster.
- alphafold_clusters: AlphaFold cluster identifiers associated with the sequences in the cluster.
- alphafold_sequences: Sequence identifiers from the AlphaFold database linked to the cluster.
- gene3d: Gene3D annotations linked to the sequences in the cluster.
- funfam: FunFam annotations linked to the sequences in the cluster.
- tmhmm: Transmembrane helix annotations linked to the sequences in the cluster.
- alphafold_pfam: Pfam annotations derived from AlphaFold sequences for the cluster.
- pfamDB: Pfam annotations from the Pfam database linked to the sequences in the cluster.
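In the example table, a cell holds either NA or several identifiers separated by semicolons. A minimal sketch for turning such a cell into a Python list follows; the helper name `parse_annotations` is hypothetical, and treating an empty cell like NA is an assumption.

```python
def parse_annotations(cell):
    """Split a semicolon-separated annotation cell; NA or an empty cell gives []."""
    cell = cell.strip()
    if cell in ("", "NA"):
        return []
    return cell.split(";")


# Values of cluster 245 from the example table above.
alphafold_pfam = parse_annotations("PF00023;PF12796;PF13637;PF13857")
pfam_db = parse_annotations("PF12796;PF13637;PF13857")
funfam = parse_annotations("NA")

# Pfam families reported by both annotation sources for this cluster.
shared = sorted(set(alphafold_pfam) & set(pfam_db))
print(shared, funfam)
```

The same helper applies to the per-sequence annotation file, which uses the same cell layout.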
| cluster_id | cluster_size | diameter | alphafold_clusters_homogeneity_score | alphafold_clusters_sequence | alphafold_clusters_numbre_labels | alphafold_sequences_homogeneity_score | alphafold_sequences_sequence | alphafold_sequences_numbre_labels | gene3d_homogeneity_score | gene3d_sequence | gene3d_numbre_labels | funfam_homogeneity_score | funfam_sequence | funfam_numbre_labels | tmhmm_homogeneity_score | tmhmm_sequence | tmhmm_numbre_labels | alphafold_pfam_homogeneity_score | alphafold_pfam_sequence | alphafold_pfam_numbre_labels | pfamDB_homogeneity_score | pfamDB_sequence | pfamDB_numbre_labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 54 | 3 | 0.40740740740740744 | 54 | 32 | 0.2407407407407407 | 54 | 41 | 0.6666666666666667 | 52 | 18 | 0.40740740740740744 | 32 | 32 | 1 | 2 | 1 | 0.7222222222222222 | 54 | 15 | 0.6851851851851851 | 50 | 17 |
| 1 | 53 | 4 | 0.13207547169811318 | 50 | 46 | 0.13207547169811318 | 50 | 46 | 0.7358490566037736 | 50 | 14 | 0.8679245283018868 | 6 | 7 | 1 | 3 | 1 | 0.7169811320754718 | 46 | 15 | 0.7169811320754718 | 46 | 15 |
| 2 | 49 | 4 | 0.6938775510204082 | 49 | 15 | 0.6734693877551021 | 49 | 16 | NA | 0 | 0 | NA | 0 | 0 | 1 | 3 | 1 | 0.9591836734693877 | 13 | 2 | NA | 0 | 0 |
| 3 | 45 | 5 | 0.4666666666666667 | 33 | 24 | 0.4222222222222223 | 33 | 26 | 0.8666666666666667 | 16 | 6 | 1 | 2 | 1 | 1 | 3 | 1 | 0.7333333333333334 | 24 | 12 | 0.9333333333333333 | 2 | 3 |
TSV file containing the metrics of the clusters present in a network. One file per inflation parameter.
- cluster_id: Unique identifier for the cluster.
- cluster_size: Size of the cluster (number of sequences it contains).
- diameter: Diameter of the cluster, that is, the longest of the shortest paths between any two sequences in the cluster.
- alphafold_clusters_homogeneity_score: Homogeneity score for AlphaFold clusters, calculated from the number of unique AlphaFold cluster annotations in the cluster.
- alphafold_clusters_sequence: Number of sequences in the cluster linked to AlphaFold clusters.
- alphafold_clusters_number_labels: Number of unique AlphaFold cluster labels found in a cluster.
- alphafold_sequences_homogeneity_score: Homogeneity score for AlphaFold sequences, calculated from the number of unique AlphaFold sequence annotations in the cluster.
- alphafold_sequences_sequence: Number of sequences in the cluster linked to AlphaFold sequences.
- alphafold_sequences_number_labels: Number of unique AlphaFold sequence labels found in a cluster.
- gene3d_homogeneity_score: Homogeneity score for Gene3D annotations, calculated from the number of unique Gene3D annotations in the cluster.
- gene3d_sequence: Number of sequences in the cluster linked to Gene3D annotations.
- gene3d_number_labels: Number of unique Gene3D labels found in a cluster.
- funfam_homogeneity_score: Homogeneity score for FunFam annotations, calculated from the number of unique FunFam annotations in the cluster.
- funfam_sequence: Number of sequences in the cluster linked to FunFam annotations.
- funfam_number_labels: Number of unique FunFam labels found in a cluster.
- tmhmm_homogeneity_score: Homogeneity score for TMHMM annotations, calculated from the number of unique TMHMM annotations in the cluster.
- tmhmm_sequence: Number of sequences in the cluster linked to TMHMM annotations.
- tmhmm_number_labels: Number of unique TMHMM labels found in a cluster.
- alphafold_pfam_homogeneity_score: Homogeneity score for AlphaFold Pfam annotations, calculated from the number of unique AlphaFold Pfam annotations in the cluster.
- alphafold_pfam_sequence: Number of sequences in the cluster linked to AlphaFold Pfam annotations.
- alphafold_pfam_number_labels: Number of unique AlphaFold Pfam labels found in a cluster.
- pfamDB_homogeneity_score: Homogeneity score for PfamDB annotations, calculated from the number of unique PfamDB annotations in the cluster.
- pfamDB_sequence: Number of sequences in the cluster linked to PfamDB annotations.
- pfamDB_number_labels: Number of unique PfamDB labels found in a cluster.
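The documentation does not give the formula for the homogeneity score. The sketch below is a reconstruction that reproduces the example rows above: 1 minus the number of unique labels divided by the cluster size, with a score of 1 when there is a single label and no score (NA) when there is none. It should be read as an assumption inferred from those rows, not as the pipeline's implementation; the function name is illustrative.

```python
def homogeneity_score(n_labels, cluster_size):
    """Homogeneity score inferred from the example rows (assumed formula)."""
    if n_labels == 0:
        return None  # reported as NA in the table
    if n_labels == 1:
        return 1.0   # a single label means a fully homogeneous cluster
    return 1 - n_labels / cluster_size


# Cluster 0 in the example table: 54 sequences.
print(homogeneity_score(32, 54))  # alphafold_clusters, 32 labels
print(homogeneity_score(18, 54))  # gene3d, 18 labels
print(homogeneity_score(1, 54))   # tmhmm, 1 label
```

Under this reading, a score close to 1 means that few distinct labels are shared by many sequences, and a score close to 0 means that almost every sequence carries its own label.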
This directory contains files specific to the sequences present in the various networks.
| sequence_id | gene3d | alphafold_pfam | funfam | alphafold_clusters | pfamDB | tmhmm | alphafold_sequences |
|---|---|---|---|---|---|---|---|
| MALV-II-16_sp_EP00396\|sequence00182 | NA | NA | NA | NA | NA | NA | NA |
| MALV-II-16_sp_EP00396\|sequence00266 | G3DSA:3.40.50.300;G3DSA:3.40.50.1240 | PF00300;PF01591 | NA | A0A6A6FSM6 | PF00300;PF01591 | NA | A0A7S3C007 |
| MALV-I_sp_EP00400\|sequence01089 | G3DSA:3.40.50.300;G3DSA:3.40.50.1240 | PF00300;PF01591 | G3DSA:3.40.50.300:FF:000644 | A0A1Y3ENH8 | PF00300;PF01591 | NA | A0A7S3JYS1 |
| MALV-II-16_sp_EP00396\|sequence00273 | NA | PF04515 | NA | A0A7S2LD46 | PF04515 | TMhelix | A0A813I9P1 |
A TSV file containing all annotations associated with sequences in a network, with one file generated for each inflation parameter.
- sequence_id: Unique identifier for each sequence in the network.
- gene3d: Gene3D annotations linked to the sequence.
- alphafold_pfam: AlphaFold-Pfam annotations linked to the sequence.
- funfam: FunFam annotations linked to the sequence.
- alphafold_clusters: AlphaFold clusters annotations linked to the sequence.
- pfamDB: Pfam DB annotations linked to the sequence.
- tmhmm: TMHMM annotations linked to the sequence.
- alphafold_sequences: AlphaFold sequence annotations linked to the sequence.
| sequence_id | cluster_id | sequence_length | eigenvector_centrality | num_gene3d_id | num_alphafold_pfam_id | num_funfam_id | num_alphafold_clusters_id | num_pfamDB_id | num_tmhmm_id | num_alphafold_sequences_id |
|---|---|---|---|---|---|---|---|---|---|---|
| MALV-I-01_sp_EP00398\|sequence00096 | 0 | 99 | 0.4071553450680936 | 1 | 5 | 0 | 1 | 1 | 0 | 1 |
| MALV-I-01_sp_EP00398\|sequence00097 | 0 | 906 | 0.5067939313056563 | 4 | 5 | 4 | 1 | 3 | 0 | 1 |
| MALV-II-16_sp_EP00396\|sequence00405 | 0 | 1326 | 0.6539145129685043 | 5 | 4 | 5 | 1 | 5 | 0 | 1 |
| MALV-I_sp_EP00400\|sequence00152 | 0 | 668 | 0.5272095731360656 | 4 | 4 | 3 | 1 | 3 | 0 | 1 |
TSV file containing the metrics (length, centrality, annotation counts) of the sequences present in a network. One file per inflation parameter.
- sequence_id: Unique identifier for each sequence in the network.
- cluster_id: Identifier of the cluster to which the sequence belongs.
- sequence_length: Length (in amino acids) of the sequence.
- eigenvector_centrality: Eigenvector centrality value for the sequence, indicating its importance within the network.
- num_gene3d_id: Number of unique Gene3D annotations linked to the sequence.
- num_alphafold_pfam_id: Number of unique AlphaFold-Pfam annotations linked to the sequence.
- num_funfam_id: Number of unique FunFam annotations linked to the sequence.
- num_alphafold_clusters_id: Number of unique AlphaFold clusters linked to the sequence.
- num_pfamDB_id: Number of unique Pfam DB annotations linked to the sequence.
- num_tmhmm_id: Number of unique TMHMM annotations linked to the sequence.
- num_alphafold_sequences_id: Number of unique AlphaFold sequence identifiers linked to the sequence.
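For readers unfamiliar with eigenvector centrality, the sketch below computes it by power iteration on a small undirected, unweighted graph. It only illustrates the concept: the library, edge weighting, and normalization used by LAGOON-MCL are not stated here, so scaling the scores so that the largest equals 1 is an assumption, as is the function name.

```python
def eigenvector_centrality(edges, iterations=200):
    """Power iteration on an undirected, unweighted graph given as (node, node) pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # x <- x + A x: adding the identity keeps the same leading eigenvector
        # and avoids oscillation on bipartite graphs.
        updated = {
            node: scores[node] + sum(scores[m] for m in neighbors[node])
            for node in neighbors
        }
        top = max(updated.values())
        scores = {node: value / top for node, value in updated.items()}  # max = 1
    return scores


# Path graph a - b - c: the middle node is the most central.
centrality = eigenvector_centrality([("a", "b"), ("b", "c")])
print(centrality)
```

A sequence connected to many well-connected sequences receives a high score, which is why the value is used as an indicator of importance within the network.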
| qseqid | qlen | qstart | qend | sseqid | slen | sstart | send | length | pident | ppos | score | evalue | bitscore | cluster_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MALV-I-01_sp_EP00398\|sequence00004 | 305 | 157 | 266 | MALV-I_sp_EP00400\|sequence00851 | 557 | 360 | 471 | 114 | 25.4 | 50.0 | 102 | 1.10e-05 | 15.96 | 234 |
| MALV-I-01_sp_EP00398\|sequence00006 | 554 | 65 | 550 | MALV-II-16_sp_EP00396\|sequence00175 | 713 | 94 | 593 | 513 | 34.7 | 49.9 | 607 | 7.26e-71 | 17.6 | 235 |
| MALV-I-01_sp_EP00398\|sequence00008 | 81 | 34 | 73 | MALV-I-01_sp_EP00398\|sequence00853 | 602 | 552 | 591 | 40 | 35.0 | 60.0 | 75 | 8.91e-04 | 42.74 | 17 |
| MALV-I-01_sp_EP00398\|sequence00011 | 1061 | 238 | 773 | MALV-I-01_sp_EP00398\|sequence00593 | 540 | 1 | 536 | 536 | 100 | 100 | 2590 | 0.0 | 48.74 | 45 |
TSV file with alignments used to reconstruct clusters in each network. One file per inflation parameter.
- qseqid: Query sequence identifier (ID of the query sequence in the alignment).
- qlen: Length of the query sequence (in amino acids).
- qstart: Start position of the alignment on the query sequence.
- qend: End position of the alignment on the query sequence.
- sseqid: Subject sequence identifier (ID of the subject sequence in the alignment).
- slen: Length of the subject sequence (in amino acids).
- sstart: Start position of the alignment on the subject sequence.
- send: End position of the alignment on the subject sequence.
- length: Length of the alignment.
- pident: Percentage of identical matches between the query and subject sequences.
- ppos: Percentage of positive matches (including identical and similar residues).
- score: Alignment score calculated by the alignment algorithm.
- evalue: E-value (expectation value) of the alignment, representing the number of hits expected by chance.
- bitscore: Bit score of the alignment, a measure of the alignment’s quality.
- cluster_id: Identifier of the cluster to which the aligned sequences belong.
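Because every edge carries the identifier of its cluster, cluster membership can be rebuilt from this file alone. The following is a minimal sketch that assumes each row has already been parsed into a dictionary keyed by the column names above; the function name `cluster_members` is hypothetical.

```python
from collections import defaultdict


def cluster_members(edge_rows):
    """Group the sequences of each cluster from rows holding qseqid, sseqid, cluster_id."""
    members = defaultdict(set)
    for row in edge_rows:
        members[row["cluster_id"]].update((row["qseqid"], row["sseqid"]))
    return dict(members)


# Two edges from the example table above (only the columns needed here).
edges = [
    {"qseqid": "MALV-I-01_sp_EP00398|sequence00004",
     "sseqid": "MALV-I_sp_EP00400|sequence00851", "cluster_id": "234"},
    {"qseqid": "MALV-I-01_sp_EP00398|sequence00006",
     "sseqid": "MALV-II-16_sp_EP00396|sequence00175", "cluster_id": "235"},
]
members = cluster_members(edges)
print(sorted(members["234"]))
```

Sequences that have no edge in the file would not appear in the result, so the cluster sizes obtained this way are a lower bound unless every sequence has at least one alignment.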