University of Cambridge PhD thesis

License: MIT (LICENSE, license.md)
rargelaguet/thesis

PhD thesis: statistical methods for the integrative analysis of single-cell multi-omics data

  • Author: Ricard Argelaguet
  • Supervisors: John Marioni and Oliver Stegle

Download

Introduction

Single-cell profiling techniques have provided an unprecedented opportunity to study cellular heterogeneity at the molecular level. This represents a remarkable advance over traditional bulk sequencing methods, particularly for studying lineage diversification and cell fate commitment in heterogeneous biological processes. While the large majority of single-cell studies focus on quantifying RNA expression, transcriptomic readouts capture only a single dimension of cellular heterogeneity. Recent technological advances have enabled multiple biological layers to be profiled in parallel within the same cell, providing a powerful approach for investigating multiple dimensions of cellular heterogeneity. However, the increasing availability of multi-modal data sets must be accompanied by suitable integrative strategies to fully exploit the data generated. In this thesis, I worked in collaboration with several research groups to develop new experimental and computational strategies for the integrative study of multi-omics data at single-cell resolution.

Contributions

The first contribution is the development of scNMT-seq, a protocol for the simultaneous profiling of RNA expression, DNA methylation and chromatin accessibility in single cells. We demonstrate how this assay provides a powerful approach for investigating regulatory relationships between the epigenome and the transcriptome within individual cells. Paper and GitHub repository (outdated)

The second contribution is Multi-Omics Factor Analysis (MOFA), a statistical framework for the unsupervised integration of multi-omics data sets. MOFA is a Bayesian latent variable model that can be viewed as a statistically rigorous generalization of Principal Component Analysis (PCA) to multi-omics data. The method provides a principled, unsupervised approach to recover the underlying sources of sample heterogeneity while disentangling which axes of heterogeneity are shared across modalities and which are specific to individual data modalities. Paper and webpage
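To make the shared-versus-specific idea concrete, here is a minimal NumPy sketch assuming a Gaussian linear factor model Y_m = Z W_m^T + E_m for each view m. It is not the MOFA implementation and infers nothing: the data are simulated with known factors Z and weights W_m, and the hypothetical helpers `simulate_views` and `variance_explained` compute, for each view and factor, the fraction of variance (R²) that a single factor accounts for. A factor with non-zero R² in both views is shared; a factor with R² of zero in a view is absent from that view.

```python
import numpy as np


def simulate_views(n_samples=200, dims=(50, 40), noise_sd=0.5, seed=0):
    """Simulate two views from a shared linear factor model Y_m = Z W_m^T + E_m.

    Hypothetical design: factor 0 is active in both views (shared),
    factor 1 is active only in view 0 (view-specific).
    """
    rng = np.random.default_rng(seed)
    n_factors = 2
    Z = rng.standard_normal((n_samples, n_factors))
    Z -= Z.mean(axis=0)  # centre the factors so that Z W_m^T is centred too
    # activity[m, k] = 1 if factor k has non-zero weights in view m
    activity = np.array([[1.0, 1.0],
                         [1.0, 0.0]])
    weights, views = [], []
    for m, d in enumerate(dims):
        # Zero out the weight columns of factors that are inactive in this view
        W = rng.standard_normal((d, n_factors)) * activity[m]
        Y = Z @ W.T + noise_sd * rng.standard_normal((n_samples, d))
        views.append(Y - Y.mean(axis=0))  # centre each feature
        weights.append(W)
    return Z, weights, views


def variance_explained(Z, weights, views):
    """R^2 per (view, factor): 1 - ||Y_m - z_k w_mk^T||^2 / ||Y_m||^2."""
    r2 = np.zeros((len(views), Z.shape[1]))
    for m, (W, Y) in enumerate(zip(weights, views)):
        ss_total = np.sum(Y ** 2)
        for k in range(Z.shape[1]):
            # Residual after removing the contribution of factor k alone
            residual = Y - np.outer(Z[:, k], W[:, k])
            r2[m, k] = 1.0 - np.sum(residual ** 2) / ss_total
    return r2


if __name__ == "__main__":
    Z, W, Y = simulate_views()
    # Rows are views, columns are factors
    print(np.round(variance_explained(Z, W, Y), 2))
```

In this simulated design, factor 0 explains variance in both views while factor 1 explains none in view 1. In MOFA itself, Z and W_m are unknown and are estimated by variational Bayesian inference with sparsity-inducing priors, which this sketch does not attempt.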

The third contribution is a comprehensive molecular roadmap of mouse gastrulation at single-cell resolution. We employed scNMT-seq to simultaneously profile RNA expression, DNA methylation and chromatin accessibility in hundreds of cells spanning multiple time points, from the exit from pluripotency to primary germ layer specification. Using MOFA and other tools, we performed an integrative analysis of the multi-modal measurements, revealing novel insights into the role of the epigenome in regulating this key developmental process. Paper and GitHub repository

The fourth contribution is MOFA+, an extended formulation of the MOFA model tailored to the analysis of large-scale single-cell data with complex experimental designs. We extended the model with a flexible regularisation that enables the joint analysis of multiple omics as well as multiple sample groups (batches and/or experimental conditions). In addition, we implemented a GPU-accelerated stochastic variational inference framework, enabling scalable analysis of data sets with potentially millions of samples. Paper and webpage
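The core idea of stochastic variational inference (SVI) can be illustrated on a much simpler model. The sketch below is a generic SVI loop, not the actual MOFA+ updates. It assumes a toy conjugate model (the unknown mean of a Gaussian with known variance). At each step it draws a random minibatch and forms a noisy estimate of the global variational parameters by rescaling the minibatch statistics to the full data size. It then blends that estimate into the current one with a decaying Robbins-Monro step size ρ_t = (t + τ)^(-κ). The function names and defaults (`batch_size`, `tau`, `kappa`) are hypothetical choices for illustration.

```python
import numpy as np


def svi_gaussian_mean(x, prior_mean=0.0, prior_var=10.0, noise_var=1.0,
                      batch_size=100, n_iter=2000, tau=1.0, kappa=0.7, seed=0):
    """SVI for the mean of a Gaussian with known variance (toy model, not MOFA+).

    Model: mu ~ N(prior_mean, prior_var), x_i | mu ~ N(mu, noise_var).
    q(mu) is Gaussian, tracked via natural parameters (m / v, -1 / (2 v)).
    Returns the mean and variance of q(mu).
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    prior = np.array([prior_mean / prior_var, -0.5 / prior_var])
    lam = prior.copy()  # initialise q(mu) at the prior
    scale = n / batch_size
    for t in range(1, n_iter + 1):
        batch = rng.choice(x, size=batch_size, replace=False)
        # Noisy estimate of the optimal natural parameters: treat the
        # minibatch as if it were the whole data set (rescale by n / batch_size)
        lam_hat = prior + scale * np.array([batch.sum() / noise_var,
                                            -0.5 * batch_size / noise_var])
        rho = (t + tau) ** (-kappa)  # decaying Robbins-Monro step size
        lam = (1.0 - rho) * lam + rho * lam_hat
    var = -0.5 / lam[1]
    return lam[0] * var, var


def exact_posterior(x, prior_mean=0.0, prior_var=10.0, noise_var=1.0):
    """Closed-form posterior of mu, used to check the SVI result."""
    precision = 1.0 / prior_var + len(x) / noise_var
    mean = (prior_mean / prior_var + np.sum(x) / noise_var) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    x = np.random.default_rng(1).normal(2.0, 1.0, size=10_000)
    print("SVI:  ", svi_gaussian_mean(x))
    print("exact:", exact_posterior(x))
```

Because each iteration touches only `batch_size` observations, the cost per step does not depend on the total number of samples. This independence is what makes SVI attractive for very large data sets. The toy result can be checked against the closed-form posterior returned by `exact_posterior`.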

License

http://creativecommons.org/licenses/by/4.0/

Contact

ricard[dot]argelaguet[at]gmail.com

Acknowledgements

This thesis was written using the template created by Krishna Kumar.
