
Analysis of genomic changes induced in Acanthamoeba castellanii C3 following infection by Legionella pneumophila


pythseq/Acastellanii_legionella_infection

 
 


Genomic changes in A. castellanii during infection of amoeba by L. pneumophila

Background

This repository contains the analysis of Acanthamoeba castellanii infection by Legionella pneumophila. We investigate how the host genome is remodelled during infection by an intracellular bacterium, using Hi-C to measure changes in 3D chromatin organisation and RNA-seq to measure changes in gene expression. The dataset comprises two biological replicates of uninfected A. castellanii (strain C3) and two infected replicates at 5 h post-infection.

A frozen copy of this repository and its output data are available for download at the corresponding Zenodo record.

Dependencies

The pipeline is written using snakemake and has the following dependencies:

  • python >= 3.7
  • conda >= 4.8
  • snakemake >= 5.5

Each rule is encapsulated in a conda environment where its dependencies are managed automatically. Fastq files containing the Hi-C and RNA-seq reads are also downloaded automatically from SRA. Input files (genomes, annotations, ...) are automatically downloaded from the corresponding Zenodo record.

Installation

You need a working conda installation on your machine, with snakemake (>=5.5) installed via pip or conda.
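Before the first run, it can help to confirm that the minimum versions listed under Dependencies are met. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `meets_minimum`, `tool_version`, `check_dependencies`) are hypothetical, and it assumes that `conda --version` and `snakemake --version` print a dotted version number.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions copied from the Dependencies list.
MINIMUM = {"python": (3, 7), "conda": (4, 8), "snakemake": (5, 5)}


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, required):
    """Compare version tuples, e.g. (5, 10, 0) against (5, 5)."""
    return tuple(found) >= tuple(required)


def tool_version(executable):
    """Run `<executable> --version`; return None if it is missing or unparsable."""
    if shutil.which(executable) is None:
        return None
    result = subprocess.run([executable, "--version"], capture_output=True, text=True)
    try:
        return parse_version(result.stdout or result.stderr)
    except ValueError:
        return None


def check_dependencies():
    """Map each dependency name to True if it is present and recent enough."""
    found = {"python": tuple(sys.version_info[:3])}
    for tool in ("conda", "snakemake"):
        found[tool] = tool_version(tool)
    return {
        name: version is not None and meets_minimum(version, MINIMUM[name])
        for name, version in found.items()
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Versions are compared as integer tuples rather than strings so that, for example, snakemake 5.10 is correctly treated as newer than 5.5.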

Usage

You can then run the pipeline with:

snakemake -j6 --use-conda

The pipeline should then fetch the required packages and data as it runs.

Configuration

Some metadata files are provided with the pipeline to help understand the design and modify parameters. The following files may be of interest:

  • samples.tsv: samples used in the analyses and associated information
  • units.tsv: sequencing libraries used in the pipeline, file paths for the reads and metadata
  • config.yaml: paths to key files and general parameters that control the results of the pipeline
  • cluster_slurm.json: cluster resource requirements, for running the pipeline on an HPC cluster with the SLURM scheduler. In that case, the following command should be used to run the pipeline instead:
    • snakemake --rerun-incomplete --use-conda --cluster-config cluster_slurm.json --cluster "sbatch -n {cluster.ntasks} -c {cluster.ncpus} --mem {cluster.mem} --qos {cluster.queue}" --jobs 30
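The `{cluster.…}` placeholders in the SLURM command are filled in per job from cluster_slurm.json. The sketch below illustrates that substitution, assuming the file follows Snakemake's usual cluster-config layout: a `__default__` entry plus optional per-rule overrides. The JSON values and the rule name `hic_align` are hypothetical and are not taken from the repository, and the helper functions are illustrative rather than Snakemake's own code.

```python
import json
from types import SimpleNamespace

# Hypothetical stand-in for cluster_slurm.json; the real values live in the repository.
EXAMPLE_CONFIG = json.loads("""
{
  "__default__": {"ntasks": 1, "ncpus": 1, "mem": "4G", "queue": "normal"},
  "hic_align":   {"ncpus": 8, "mem": "32G"}
}
""")

# Submission template copied from the --cluster argument of the SLURM command.
TEMPLATE = (
    "sbatch -n {cluster.ntasks} -c {cluster.ncpus} "
    "--mem {cluster.mem} --qos {cluster.queue}"
)


def resources_for(rule, config):
    """Start from `__default__` and overlay any rule-specific settings."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule, {}))
    return merged


def submit_command(rule, config, template=TEMPLATE):
    """Fill the `{cluster.<key>}` placeholders for one rule."""
    cluster = SimpleNamespace(**resources_for(rule, config))
    return template.format(cluster=cluster)


if __name__ == "__main__":
    print(submit_command("hic_align", EXAMPLE_CONFIG))
    print(submit_command("some_other_rule", EXAMPLE_CONFIG))
```

With this layout, a rule that has no entry of its own falls back to the `__default__` resources, so only the demanding rules need explicit settings.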

Pipeline

The pipeline is subdivided into submodules covering the processing and downstream analysis of Hi-C and RNA-seq data. Starting from fastq files, it generates Hi-C matrices and differential expression results. It also computes statistics and runs pattern detection on the Hi-C contact maps, producing the figures and tables used by the tailored analyses in the jupyter notebooks.

Here is a visual summary of pipeline steps and their dependencies:

For a more detailed visual summary showing input/output files, see the filegraph

Analyses

Analyses are described in jupyter notebooks located in the docs/notebooks folder. These notebooks are numbered to reflect the logical order in which analyses should be done. They should be executed in that order as some will generate files for the next notebook.

  • Notebook: Statistical exploration of chromatin loop changes
  • Notebook: Visual exploration of global contact changes during infection
  • Notebook: Analysis of interchromosomal contact changes
  • Notebook: Detection and overview of chromatin insulation domains
  • Notebook: Analysis of the relationship between expression and contact changes during infection
  • Notebook: Analysis of gene coexpression versus contact changes using lifted-over expression data from Li et al. 2020
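Because some notebooks generate files for the next one, they can be executed in a batch in numeric order. The sketch below is one possible way to do this, not a script shipped with the repository. It assumes each notebook file name in docs/notebooks starts with its number (the example names in the comments are hypothetical) and that `jupyter nbconvert` is installed.

```python
import re
import subprocess
from pathlib import Path


def numeric_prefix(path):
    """Leading integer of a file name, e.g. '03_example.ipynb' -> 3."""
    match = re.match(r"(\d+)", Path(path).name)
    if match is None:
        raise ValueError(f"{path} has no numeric prefix")
    return int(match.group(1))


def ordered_notebooks(paths):
    """Sort notebooks by numeric prefix, so that 10_* runs after 2_*."""
    return sorted(paths, key=numeric_prefix)


def run_all(folder="docs/notebooks"):
    """Execute each notebook in place, stopping at the first failure."""
    for notebook in ordered_notebooks(Path(folder).glob("*.ipynb")):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook",
             "--execute", "--inplace", str(notebook)],
            check=True,  # raise if a notebook fails, so later ones do not run
        )
```

Calling `run_all()` from the repository root would then run the notebooks in order. Sorting on the parsed integer rather than on the file name avoids the plain string ordering in which "10" sorts before "2".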
