# Preliminary Quantitative Assessment of Viral Diversity Using Ecological Indices and Taxonomic Comparison of Viromes in Oligotrophic and Enriched Aquatic Environments of a Desert Oasis



---

## 1. Introduction
Viruses are fundamental components of microbial ecosystems, playing key roles in host-virus interactions, horizontal gene transfer, and biogeochemical cycles. Studying viral diversity across contrasting environments provides crucial insights into microbial dynamics and the impacts of environmental disturbances.

This project focuses on the Cuatro Ciénegas Basin, a unique oligotrophic desert oasis in northern Mexico, renowned for its endemic microbial biodiversity. Within the framework of a 32-day mesocosm experiment, two water samples were collected: JC1A (control pond, non-fertilized) and JP4D (fertilized pond, nutrient-enriched).

The primary research question addressed is: How does nutrient enrichment affect the taxonomic composition and diversity of viral communities in these aquatic ecosystems?

To answer this, we first compare the taxonomic composition of the two viromes using Kraken2 classification to identify qualitative differences. We then quantify diversity with established ecological indices (Shannon and Simpson) and measure compositional difference between the two communities with Bray-Curtis dissimilarity.

This integrated approach aims to deepen our understanding of nutrient enrichment effects on aquatic virome structure and diversity, thereby contributing valuable insights into microbial responses to environmental perturbations in fragile ecosystems. The findings are expected to inform broader ecological and epidemiological perspectives relevant to the management of vulnerable microbial habitats.

---

## 2. Methodology
This study employs an integrated viral metagenomics approach to assess the impact of nutrient enrichment on the diversity and taxonomic composition of aquatic viromes in the oligotrophic Cuatro Ciénegas Basin.

1. Data Processing and Quality Control
Raw sequence data generated by high-throughput sequencing are inspected with FastQC to detect sequencing artifacts, technical biases, and low-quality reads. FastQC reports these problems but does not itself remove reads, so this step serves to confirm that the dataset is fit for analysis.

2. Taxonomic Classification with Optimized Kraken2
Taxonomic classification is performed with Kraken2 against a reduced, curated viral database chosen to keep memory consumption within the available computational resources (8 GB RAM). This step assigns each classifiable read to a viral taxon and yields a taxonomic profile for each sample; reads without a match in the database remain unclassified.

3. Comparative Analysis of Viral Composition
Taxonomic profiles are compared between the control (JC1A) and nutrient-enriched (JP4D) conditions. Qualitative differences in viral community structure are explored through descriptive comparison of relative abundances and graphical visualizations, highlighting how the viromes differ between conditions.

4. Quantification of Ecological Diversity
Alpha diversity of viral communities is assessed using robust ecological indices, including the Shannon index (measuring species richness and evenness) and the Simpson index (quantifying dominance and the likelihood that two randomly selected individuals belong to the same species). Additionally, Bray-Curtis dissimilarity is calculated to quantify compositional differences between the two environments, enabling nuanced interpretation of virome heterogeneity.

5. Technical Considerations and Reproducibility
Methodological choices account for technical constraints related to memory capacity, ensuring a reproducible, rigorous, and scientifically valid analysis. Results are interpreted cautiously, acknowledging potential limitations due to the reduced database size and the oligotrophic nature of the studied environment.
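
Steps 2 and 3 above can be illustrated with a minimal sketch that turns a Kraken2 report into a relative-abundance profile. It assumes the standard six-column, tab-separated Kraken2 report layout (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name) and keeps only species-level rows. The function names, the example rows, the virus names, and the placeholder taxids are hypothetical and are not taken from the JC1A or JP4D data.

```python
from typing import Dict


def parse_kraken2_report(text: str, rank: str = "S") -> Dict[str, int]:
    """Return {taxon name: clade read count} for rows with the given rank code."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        clade_reads = int(fields[1])   # reads assigned to this clade
        rank_code = fields[3].strip()  # e.g. "D", "G", "S"
        name = fields[5].strip()       # names are indented with spaces
        if rank_code == rank and clade_reads > 0:
            counts[name] = clade_reads
    return counts


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts to proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Made-up report rows for illustration only (not real results).
EXAMPLE_REPORT = "\n".join([
    " 40.00\t40\t40\tU\t0\tunclassified",
    " 60.00\t60\t0\tR\t1\troot",
    " 60.00\t60\t0\tD\t10239\t  Viruses",
    " 45.00\t45\t45\tS\t1111\t    Hypothetical virus A",  # placeholder taxid
    " 15.00\t15\t15\tS\t2222\t    Hypothetical virus B",  # placeholder taxid
])

counts = parse_kraken2_report(EXAMPLE_REPORT)
profile = relative_abundance(counts)
print(profile)
```

Proportions here are computed among classified species-level reads only, which is one possible normalization; including unclassified reads in the denominator would give different values.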

---

## 3. Results and Interpretation
#### **3.1 Taxonomic Composition of Viral Communities**
The comparative analysis of Kraken2 classifications revealed marked differences in viral taxon composition between the control sample (JC1A) and the nutrient-enriched sample (JP4D).
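In JP4D, Sinsheimervirus phiX174 overwhelmingly dominated the community (exceeding 50% relative abundance), while in JC1A, BeAn 58058 virus was the most abundant taxon (nearly 40%). Because phiX174 is widely used as a spike-in control in Illumina sequencing, its dominance in JP4D may partly reflect control reads rather than environmental virus and should be verified before drawing ecological conclusions.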
Other taxa, such as Influenza B virus (B/Lee/1940), Enterobacteria phage P7, and Erwinia phage vB_EamM_Asesino, were also detected but in different proportions depending on the treatment.
These differences are consistent with nutrient enrichment reshaping the viral assemblage and favoring a few dominant taxa in the enriched environment, although with a single sample per condition this interpretation remains tentative.

#### **3.2 Ecological Diversity Indices**
Alpha diversity metrics provided further insights into community complexity.
The Shannon index was higher in JP4D (3.581) than in JC1A (2.400), indicating higher combined richness and evenness in the nutrient-enriched condition.
Conversely, the Simpson index was slightly higher in JC1A (0.803 vs. 0.733). Read as the diversity form (1 − D), in which higher values mean less dominance, this reflects the strong dominance of Sinsheimervirus phiX174 in JP4D despite its higher Shannon diversity. The two indices can diverge because Shannon is sensitive to the many low-abundance taxa, whereas Simpson is driven mainly by the most abundant ones.
These results suggest that nutrient enrichment may expand viral diversity while simultaneously enabling certain taxa to dominate.
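
To make the behavior of the two indices concrete, the following minimal sketch computes both from read counts. It assumes the natural logarithm for Shannon and the 1 − D form for Simpson; the source does not state which logarithm base or Simpson variant was used, so both are assumptions. The two example communities are invented and are not the JC1A or JP4D profiles.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int], base: float = math.e) -> float:
    """Shannon index H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def gini_simpson_index(counts: Sequence[int]) -> float:
    """Simpson diversity 1 - D, where D = sum(p_i^2) measures dominance."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical community X: one dominant taxon plus a long tail of rare taxa.
community_x = [55] + [1] * 45
# Hypothetical community Y: five taxa with moderately even abundances.
community_y = [40, 20, 15, 15, 10]

for label, counts in (("X", community_x), ("Y", community_y)):
    print(label,
          round(shannon_index(counts), 3),
          round(gini_simpson_index(counts), 3))
```

In this toy case the long tail of rare taxa gives X the higher Shannon value, while its single dominant taxon gives it the lower Simpson value, the same qualitative pattern described above.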

#### **3.3 Bray-Curtis Dissimilarity and Heatmap Visualization**
The Bray-Curtis dissimilarity between JC1A and JP4D was 0.8231, indicating substantial divergence in viral community composition.
The heatmap of relative abundances confirmed this pattern, visually highlighting pronounced differences in the prevalence of several viral taxa between samples.
This quantitative assessment is consistent with the observation that nutrient inputs substantially reshape the taxonomic structure of aquatic viromes, although no test of statistical significance was performed.
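
As a minimal sketch of how a Bray-Curtis value of this kind is obtained, the function below compares two abundance profiles stored as dictionaries. It assumes the usual definition, the sum of absolute differences divided by the sum of all abundances; the function name and the two toy profiles are hypothetical and do not reproduce the 0.8231 reported above.

```python
from typing import Dict


def bray_curtis(profile_a: Dict[str, float], profile_b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared taxa."""
    taxa = set(profile_a) | set(profile_b)
    # A taxon missing from one profile is treated as abundance 0 there.
    numerator = sum(
        abs(profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) for t in taxa
    )
    denominator = sum(profile_a.values()) + sum(profile_b.values())
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


# Made-up relative-abundance profiles for illustration only.
sample_1 = {"virus A": 0.7, "virus B": 0.3}
sample_2 = {"virus B": 0.2, "virus C": 0.8}

distance = bray_curtis(sample_1, sample_2)
print(round(distance, 4))
```

When both inputs are relative abundances that each sum to 1, the denominator is 2 and the measure reduces to half the total absolute difference in proportions.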

---

## 4. Discussion – Limitations and Perspectives
This preliminary assessment sheds light on how nutrient enrichment can reshape viral community composition and diversity in oligotrophic aquatic environments. The higher Shannon diversity in the fertilized sample, together with a high Bray-Curtis dissimilarity between control and fertilized conditions, suggests that nutrient inputs are associated with both greater diversity and a marked shift in community composition. The detection of different dominant viral taxa in each condition underscores the dynamic nature of viral assemblages in response to environmental disturbances.

Nevertheless, several limitations must be acknowledged. First, taxonomic classification relied on a reduced viral database optimized to fit memory constraints (8 GB RAM), potentially leading to incomplete detection of rare or poorly characterized viruses. Second, the absence of biological replicates prevents robust statistical validation of the observed patterns. Third, sequencing depth and technical biases in library preparation could have influenced abundance estimates. Finally, no formal hypothesis testing (e.g., PERMANOVA or differential abundance analysis) was performed to assess statistical significance.

Despite these constraints, the workflow demonstrates a rigorous, reproducible approach to taxonomic profiling and diversity quantification in viral metagenomics. Future work could integrate larger reference databases, replicate sampling designs, and statistical hypothesis testing to confirm and refine these findings. Such extensions would strengthen our understanding of viral ecology in nutrient-enriched environments and enhance the methodological rigor expected in advanced bioinformatics training.

---

## 5. General Conclusion
This preliminary study provided an initial insight into the impact of nutrient enrichment on the diversity and taxonomic composition of viral communities in the unique oligotrophic ecosystem of Cuatro Ciénegas.

The combination of taxonomic classification tools (Kraken2), ecological diversity indices (Shannon, Simpson), and dissimilarity measures (Bray-Curtis) revealed that nutrient inputs are associated with:

- an increase in Shannon diversity,

- a marked reorganization of the relative abundances of several viral taxa,

- substantial divergence of communities compared to the control condition.

These findings, although exploratory, illustrate the potential of a metagenomic approach to describe viral dynamics in response to environmental disturbances. They form a solid methodological foundation for further investigations, including more advanced statistical analyses and functional exploration of viral repertoires.

This approach reflects a rigorous scientific perspective and demonstrates a commitment to contributing to a better understanding of microbial interactions in vulnerable ecosystems, in alignment with the standards of an advanced training program in bioinformatics of infectious diseases.

---

## References
[1] Lee, Z. M. P., Steger, C. E., Corman, J. R., Neveu, M., Poret-Peterson, A. T., Souza, V., & Shade, A. (2017). Nutrient stoichiometry shapes microbial diversity in an oligotrophic desert oasis. Frontiers in Microbiology, 8, 1425. https://doi.org/10.3389/fmicb.2017.01425

[2] Peimbert, M., Alcaraz, L. D., Bonilla-Rosso, G., Olmedo-Álvarez, G., García-Oliva, F., Segovia, L., ... & Eguiarte, L. E. (2012). Comparative metagenomics of two microbial mats at Cuatro Ciénegas Basin II: community structure and composition in oligotrophic environments. Astrobiology, 12(7), 659–673. https://doi.org/10.1089/ast.2011.0690

[3] Desnues, C., Rodriguez-Brito, B., Rayhawk, S., Kelley, S., Tran, T., Haynes, M., ... & Rohwer, F. (2008). Biodiversity and biogeography of phages in modern stromatolites and thrombolites. Nature, 452(7185), 340–343. https://doi.org/10.1038/nature06735

[4] Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20, 257. https://doi.org/10.1186/s13059-019-1891-0

[5] Ondov, B. D., Bergman, N. H., & Phillippy, A. M. (2011). Interactive metagenomic visualization in a web browser. BMC Bioinformatics, 12, 385. https://doi.org/10.1186/1471-2105-12-385

[6] Zenodo. (2023). Cuatro Ciénegas metagenomic sequencing data [Data set]. https://zenodo.org/record/7871630

[7] Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

[8] Johns Hopkins University, Center for Computational Biology. (2019). Kraken2: Metagenomic classification. https://ccb.jhu.edu/software/kraken2/

[9] Ondov, B. (2016). Krona Tools for Metagenomic Visualization. https://github.com/marbl/Krona/wiki

<hr style="margin-top: 50px;">

<footer style="text-align: center; font-size: small; color: gray;">
  © 2025 Mbock Mbock Georges Christian 
  <br> This notebook was independently prepared as part of my application for the MSc in Bioinformatics of Infectious Diseases and Pathogen Genomics at <br> Stellenbosch University (African STARS Program).
    <br>All analyses are preliminary and intended to demonstrate methodological skills and scientific reasoning.
</footer>

---