RNA-VC

Overview

RNA-VC is a pipeline that can analyse any RNA sequencing (RNA-seq) raw data available in the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA), yielding both variant calling data and gene/transcript expression data. It can also be run without the variant calling, if you are only interested in expression data. It is written with a combination of bash and python, all wrapped up in the snakemake workflow manager system.

RNA-VC differentiates itself from the other available RNA-seq variant calling pipelines in that you do not need to have the raw data downloaded before you start; RNA-VC takes care of that for you. All you need is a metadata file with the specified samples you want to analyse.

RNA-VC is something I created for my own use in order to automate analyses of publicly available RNA-seq. I share it here on GitHub on the off-chance that some other researcher would have a use for it, but I don't guarantee that it'll work for you. That being said, if you want to use RNA-VC and are having trouble getting to it run properly I'd be happy to help!

Usage

There are a number of bioinformatic software packages that need to be installed in order to run the pipeline:

Python 3 and Snakemake for running the pipeline
SRAtools for downloading raw FASTQ files
Salmon for estimating expression levels
STAR for aligning the reads
SAMtools for working with the resulting alignment files
PICARD for marking duplicate reads
GATK for performing variant calling
snpEFF for annotating the resulting variants

You must have all of these installed if you are to run the pipeline (if you are using a computer cluster they might already be installed, if you're lucky). After you have made sure they are all installed, you can clone this repository to get all the relevant files:

git clone https://github.com/fasterius/RNA-VC

You also need to provide RNA-VC with the metadata describing the GEO samples you want to analyse. This means that you need to provide, at the very least, SRR IDs, their corresponding GSE IDs, and their read layouts (listing them either as "SINGLE" or "PAIRED", respectively). You may optionally add a column for sample groups, which will be merged at the variant calling stage. This is useful for when you have sample replicates that you want to treat as a group, which is common for variant calling procedures. Such a metadata file might look something like this:

GSE	SRR	Layout	Group
GSE81194	SRR3479755	PAIRED	Sample_1
GSE81194	SRR3479758	PAIRED	Sample_2
GSE81194	SRR3479759	PAIRED	Sample_2

The config.yaml file provided can then be edited according to the structure of your metadata file, in addition to the locations of the references, indexes software paths required. I have provided an example of what this config file may look like, but you need to edit the paths to correspond to your directory structure before you can use it. You finally run the pipeline using snakemake:

snakemake --config LAYOUT=PAIRED

If you have mixed read layouts you have to run the pipeline twice: once for each read layout. Alternatively, if you are running the pipeline on a cluster (RNA-VC currently only supports SLURM) you can use the submit_snakemake.sh wrapper, which will automatically loop through both read layouts:

bash submit_snakemake.sh

If you do run it on a cluster, I have also provided an example of a cluster.yaml configuration file, which you will also need to edit. I have provided the example for the SLURM system, as that is what I am using; if you want to use some other cluster configuration, please see the Snakemake website for more information on how to do it.

License

The pipeline is available with a MIT licence. It is free software: you may redistribute it and/or modify it under the terms of the MIT license. For more information, please see the LICENCE file.

Name		Name	Last commit message	Last commit date
Latest commit History 96 Commits
scripts		scripts
.gitignore		.gitignore
LICENCE		LICENCE
README.md		README.md
Snakefile		Snakefile
example_cluster.yaml		example_cluster.yaml
example_config.yaml		example_config.yaml
submit_snakemake.sh		submit_snakemake.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RNA-VC

Overview

Usage

License

About

Releases

Packages

Languages

License

fasterius/RNA-VC

Folders and files

Latest commit

History

Repository files navigation

RNA-VC

Overview

Usage

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages