Predicting susceptibility to tuberculosis based on gene expression profiling in dendritic cells
John D. Blischak, Ludovic Tailleux, Marsha Myrthil, Cécile Charlois, Emmanuel Bergot, Aurélien Dinh, Gloria Morizot, Olivia Chény, Cassandre Von Platen, Jean-Louis Herrmann, Roland Brosch, Luis B. Barreiro, Yoav Gilad
Tuberculosis (TB) is a deadly infectious disease, which kills millions of people every year. The causative pathogen, Mycobacterium tuberculosis (MTB), is estimated to have infected up to a third of the world’s population; however, only approximately 10% of infected healthy individuals progress to active TB. Despite evidence for heritability, it is not currently possible to predict who may develop TB. To explore approaches to classify susceptibility to TB, we infected with MTB dendritic cells (DCs) from putatively resistant individuals diagnosed with latent TB, and from susceptible individuals that had recovered from active TB. We measured gene expression levels in infected and non-infected cells and found hundreds of differentially expressed genes between susceptible and resistant individuals in the non-infected cells. We further found that genetic polymorphisms near the differentially expressed genes between susceptible and resistant individuals are more likely to be associated with TB susceptibility in published GWAS data. Lastly, we trained a classifier based on the gene expression levels in the non-infected cells, and demonstrated decent performance on our data and an independent data set. Overall, our promising results from this small study suggest that training a classifier on a larger cohort may enable us to accurately predict TB susceptibility.
- Final publication (PubMed)
- Preprint on bioRxiv
- GSE94116 - The raw FASTQ files as well as some supplementary data files are available via the Gene Expression Omnibus. To download all the raw FASTQ files, run the script code/download-fastq-sra.sh.
For accessing summary data, please see data/.
For reading and/or re-building the manuscript, please see paper/.
For the pipeline to convert raw sequencing data (fastq) to gene counts, please see Snakefile.
For the analysis that produced Fig. 1, please see code/main-limma.R.
For the analysis that produced Fig. 3, please see code/main-classifier.R.
To recreate a similar* computational environment, follow these instructions to use the conda package manager. Note that this will only work on a Linux-64 based architecture.
- Download and install Miniconda (instructions)
- Download environment file. If you cloned the Git repo, you already have the environment file.
conda install anaconda-client; anaconda download jdblischak/tb-suscept
- Create conda environment:
conda env create -n tb-suscept --file environment.yaml
- Activate conda environment:
- To activate:
source activate tb-suscept
- To deactivate:
source deactivate tb-suscept
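Because the environment only works on Linux-64, it can save time to check the platform before running the steps above. The following is a minimal sketch, not part of this repository: the helper name is_linux64 is hypothetical, and it assumes that "Linux-64" corresponds to a Linux kernel on an x86_64 machine as reported by uname.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository).
# Succeeds only for the platform the instructions above support.
# Arguments: kernel name and machine type, as printed by
# `uname -s` and `uname -m`.
is_linux64() {
  [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]
}

# Check the current machine and report what to do next.
# Nothing is installed or created by this script.
if is_linux64 "$(uname -s)" "$(uname -m)"; then
  echo "Platform looks like linux-64. Next step:"
  echo "  conda env create -n tb-suscept --file environment.yaml"
else
  echo "Platform is not linux-64; environment.yaml is unlikely to resolve here." >&2
fi
```

The script only prints a message, so it is safe to run before Miniconda is installed.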
* I started this project before I started heavily relying on
conda to manage my computational environment to make it more
reproducible. The version of R on the cluster I used was R
3.2.5. Unfortunately the available R packages from the official
conda repositories, as well as community-provided packages from
bioconda and conda-forge, are not available for R 3.2.5. Thus
environment.yaml creates an environment with R 3.3.2.
Please feel free to open an Issue or send me an email directly if you need any assistance using the code or data in this repository.