
Simulating and estimating the effect of gene transfer on bacterial pangenomes

Note

  • Master's thesis in Bioinformatics at the University of Tübingen
  • Thesis period: 01.12.2023 - 01.06.2024

Horizontal gene transfer (HGT) plays a significant role in shaping the genetic landscape of bacterial populations. In contrast to the more common vertical gene transfer, in which genes are passed from parent to offspring, HGT allows genes to be exchanged laterally between cells. To study the impact of HGT on bacterial gene frequency spectra (GFS), we have extended the existing mutation models of the open-source software msprime [1, 2] with a gene gain and loss model based on the Infinitely Many Genes model [3]. The ancestry and mutation simulations are then extended to support HGT events. Additionally, the otherwise random ancestry simulation can be fixed to specified trees, which is essential for parameter estimation and for fitting the simulation to real data. We then develop a simulation-based testing framework to determine whether a gene frequency spectrum is the result of neutral evolution. Finally, this framework is validated, and real-world parameters are estimated from pangenome data.
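For intuition about the gene gain and loss model described above, here is a minimal pure-Python sketch of the Infinitely Many Genes (IMG) process running along a fixed tree, followed by tallying the resulting gene frequency spectrum. It is not the repository's msprime-based implementation and ignores HGT entirely. The tree encoding, the function names `simulate_img_gfs` and `poisson`, and the assumed parametrisation (gain rate `theta/2` per lineage, loss rate `rho/2` per gene, root genome drawn from the equilibrium Poisson(`theta/rho`)) are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Set, Tuple

# Hypothetical tree encoding: parent -> list of (child, branch_length).
# Nodes without an entry are leaves (sampled genomes).
Tree = Dict[str, List[Tuple[str, float]]]


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) sample with Knuth's method (fine for small lam)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_img_gfs(tree: Tree, root: str, theta: float, rho: float,
                     seed: int = 0) -> List[int]:
    """Gene gain/loss along a fixed tree; gfs[k-1] = number of genes in exactly k leaves."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    rng = random.Random(seed)
    next_id = 0

    def fresh_genes(count: int) -> Set[int]:
        # Infinitely many genes: every gain creates a gene never seen before.
        nonlocal next_id
        genes = set(range(next_id, next_id + count))
        next_id += count
        return genes

    # Assumed root state: IMG equilibrium with Poisson(theta / rho) genes.
    stack = [(root, fresh_genes(poisson(rng, theta / rho)))]
    leaves: List[Set[int]] = []
    while stack:
        node, genes = stack.pop()
        children = tree.get(node, [])
        if not children:
            leaves.append(genes)
            continue
        for child, length in children:
            # Each inherited gene is lost at rate rho/2 along the branch.
            keep = math.exp(-rho / 2 * length)
            child_genes = {g for g in genes if rng.random() < keep}
            # Gains at rate theta/2; a gene gained at time u must survive length - u.
            for _ in range(poisson(rng, theta / 2 * length)):
                u = rng.uniform(0.0, length)
                if rng.random() < math.exp(-rho / 2 * (length - u)):
                    child_genes |= fresh_genes(1)
            stack.append((child, child_genes))

    # Gene frequency spectrum: count genes present in exactly k of n leaves.
    counts: Dict[int, int] = {}
    for genome in leaves:
        for g in genome:
            counts[g] = counts.get(g, 0) + 1
    gfs = [0] * len(leaves)
    for c in counts.values():
        gfs[c - 1] += 1
    return gfs


if __name__ == "__main__":
    example = {"root": [("x", 0.6), ("y", 0.6)],
               "x": [("s1", 0.4), ("s2", 0.4)],
               "y": [("s3", 0.4), ("s4", 0.4)]}
    print(simulate_img_gfs(example, "root", theta=10.0, rho=1.0, seed=1))
```

Giving every gained gene a unique integer id is what encodes the "infinitely many genes" assumption: two gain events never produce the same gene, so each gene's presence pattern is determined by a single gain event and any subsequent losses.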

Tip

A ready-to-use Jupyter notebook with examples is provided in example_usage.ipynb.

Overview

The repository is structured as follows:

| Filename | Description |
| --- | --- |
| conda_env.yml | Conda environment with all required software packages. |
| gene_model.py | Main code for the gene gain/loss simulation. |
| gfs.py | Utility functions for analysing and modifying gene frequency spectra (GFS). |
| hgt_mutations.py | Extension of the msprime mutation simulation to support HGT. |
| hgt_sim_args.py | Default simulation parameters. |
| hgt_simulation.py | Extension of the msprime ancestry simulation to support HGT. |
| neutrality_test.py | Neutrality test based on a $\chi^2$-like approach and a direct approach. |
| optimisation.py | Algorithm to fit the simulation to real-world GFS. |
| example_usage.ipynb | Jupyter notebook with examples. |
| pangenome-gene-transfer-simulation.pdf | The thesis. |

| Dirname | Description |
| --- | --- |
| data | Simulated data and measurements. |
| gfs_analysis | Impact of HGT and GC on the GFS of fixed trees. |
| minimal_site_count | Impact of double gene gain events on the GFS. |
| panX | Files generated by panX. |
| tex | LaTeX source files. |
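The exact statistic used in neutrality_test.py is not described here, so the following is only a hypothetical sketch of one common way to build a χ²-like, simulation-based neutrality test on a GFS: estimate the expected spectrum as the mean of spectra simulated under the neutral model, score the observation and every simulation with a χ²-like distance, and report the share of simulations at least as extreme as an empirical p-value. The names `chi2_like` and `empirical_p_value` are illustrative and not taken from the repository.

```python
from statistics import mean
from typing import List, Sequence


def chi2_like(gfs: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over classes with expected > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(gfs, expected) if e > 0)


def empirical_p_value(observed: Sequence[int],
                      simulated: List[Sequence[int]]) -> float:
    """Monte Carlo p-value: share of neutral simulations at least as far from the mean GFS."""
    if not simulated:
        raise ValueError("need at least one simulated GFS")
    n = len(observed)
    if any(len(s) != n for s in simulated):
        raise ValueError("all spectra must have the same length")
    # Expected GFS under neutrality, estimated as the mean of the simulations.
    expected = [mean(s[k] for s in simulated) for k in range(n)]
    observed_stat = chi2_like(observed, expected)
    simulated_stats = [chi2_like(s, expected) for s in simulated]
    # The +1 terms count the observation itself, so p is never exactly 0.
    extreme = sum(1 for stat in simulated_stats if stat >= observed_stat)
    return (extreme + 1) / (len(simulated) + 1)


if __name__ == "__main__":
    neutral = [[10, 2, 1], [12, 3, 1], [8, 1, 2], [10, 2, 2]]
    print(empirical_p_value([30, 0, 9], neutral))  # prints 0.2
```

Ranking the observed statistic among simulated ones avoids assuming the classical χ² distribution, which need not hold here because the frequency classes of a GFS are correlated through the shared genealogy. In practice, many more than four simulations would be needed for a meaningful p-value.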

License and Notes

Unless otherwise labelled, this software is published under the GNU General Public License v3.0.

| Permissions | Conditions | Limitations |
| --- | --- | --- |
| ✓ Commercial use | Disclose source | ✕ Liability |
| ✓ Distribution | License and copyright notice | ✕ Warranty |
| ✓ Modification | Same license | |
| ✓ Patent use | State changes | |
| ✓ Private use | | |

See LICENSE.md for the full license text.

Logo

The logo is partially based on the output of tskit_arg_visualizer.

Footnotes

  1. https://tskit.dev/software/msprime.html

  2. Franz Baumdicker et al. "Efficient ancestry and mutation simulation with msprime 1.0". In: Genetics 220.3 (Dec. 2021). Ed. by S. Browning. ISSN: 1943-2631. doi: 10.1093/genetics/iyab229. url: http://dx.doi.org/10.1093/genetics/iyab229

  3. Franz Baumdicker, Wolfgang R. Hess and Peter Pfaffelhuber. "The Infinitely Many Genes Model for the Distributed Genome of Bacteria". In: Genome Biology and Evolution 4.4 (2012), pp. 443–456. doi: 10.1093/gbe/evs016. url: http://dx.doi.org/10.1093/gbe/evs016