eCLIP Data Analyzer

This repository contains a series of Python-based modules that automate the analysis of sequencing data generated from eCLIP (enhanced UV crosslinking and immunoprecipitation) experiments. I highly recommend reading this step-by-step manual carefully before you start analyzing your data. If you need a protocol for eCLIP library preparation, please refer to the Yeo Lab eCLIP library preparation protocol given in their original paper.

Requirements

The eCLIP Data Analyzer requires the following tools to be installed (see Step 2 of the User Guide).

FastQC : Quality assessment
Cutadapt : Adaptor trimming
UMI-tools : Extraction of UMIs and deduplication
TagDust2 : Removal of rRNA reads
STAR : Mapping reads to a given genome
QualiMap : Mapping quality assessment
SAMtools : Sorting and indexing of mapped reads
deepTools : Generate bigwig files for IGV visualization
Subread : Count features
MultiQC : Summarize logs
BedTools : Extraction of cross-linked nucleotides
PureCLIP : eCLIP-peak calling

We thank the developers of these valuable tools!
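
Before running the pipeline, it can help to confirm that each tool listed above is reachable from your shell. The following is a minimal sketch and not part of the repository: the executable names in `REQUIRED_TOOLS` are assumptions based on how these tools are commonly installed (for example, `bamCoverage` for deepTools and `featureCounts` for Subread), and `find_missing_tools` is a hypothetical helper. Adjust the names to match your environment.

```python
import shutil

# Assumed executable names (not taken from the repository); edit as needed.
REQUIRED_TOOLS = {
    "FastQC": "fastqc",
    "Cutadapt": "cutadapt",
    "UMI-tools": "umi_tools",
    "TagDust2": "tagdust",
    "STAR": "STAR",
    "QualiMap": "qualimap",
    "SAMtools": "samtools",
    "deepTools": "bamCoverage",
    "Subread": "featureCounts",
    "MultiQC": "multiqc",
    "BedTools": "bedtools",
    "PureCLIP": "pureclip",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` argument is injectable only so that the check can be exercised without the tools installed; in normal use the default `shutil.which` is sufficient.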

Analysis process

All analyzed data will be saved to subdirectories inside the home directory where you have deposited the scripts directory. The pipeline first uses FastQC to assess the quality of the raw input files. Conventional 3' eCLIP adaptors, selected based on the adaptor identity embedded in the file name, are then removed from the input sequence files using Cutadapt. Next, the pipeline uses UMI-tools to extract the 10-nucleotide UMIs from the 5' end of R2 reads. These extracted UMIs are used later for deduplication. Subsequently, TagDust2 is used to remove rRNA contaminants from individual datasets. A data-mining module in the pipeline summarizes the TagDust2 rRNA removal logs and writes the mined data to a file named TagDust_summary.csv, which is saved in a directory named summary_files.

Next, the user-defined genome and annotation files are downloaded and a reference genome index is generated using STAR. The rRNA-depleted sequences are then mapped to the given genome using the STAR aligner, and only uniquely mapped reads are kept. The pipeline uses QualiMap to assess the quality of the alignment and SAMtools to coordinate-sort and index the alignment output files. The sorted BAM files, along with their indices, are then passed to UMI-tools to remove potential PCR duplicates.

The deduplicated BAM files are then used to generate (1) bigWig files for IGV visualization using deepTools, (2) count tables for user-defined features using featureCounts from the Subread package, (3) data related to crosslinking events and (4) peak calling output files using PureCLIP. Furthermore, the pipeline integrates MultiQC to generate interactive summary reports. As mentioned above, the output files of all these steps are saved to subdirectories in your home directory.
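
To illustrate the UMI extraction step described above, here is a minimal pure-Python sketch of what happens to a single R2 read. It is not the pipeline's code (the pipeline calls UMI-tools for this): `extract_umi` is a hypothetical function, and the example read is made up. It moves the first 10 nucleotides of the read into the read identifier, separated by an underscore, so that the UMI stays attached to the read through mapping and can be used for deduplication later.

```python
def extract_umi(header, sequence, quality, umi_length=10):
    """Move the first `umi_length` bases of a read into its header.

    Returns a tuple (new_header, trimmed_sequence, trimmed_quality).
    """
    if len(sequence) < umi_length:
        raise ValueError("read is shorter than the UMI length")
    umi = sequence[:umi_length]
    # Append the UMI to the read identifier (the first whitespace-delimited
    # token) and keep any trailing description unchanged.
    read_id, _, description = header.partition(" ")
    new_header = f"{read_id}_{umi}"
    if description:
        new_header += " " + description
    # Trim the UMI bases and their quality scores from the read.
    return new_header, sequence[umi_length:], quality[umi_length:]


# Example with a made-up read: 10 nt UMI followed by 8 nt of insert.
header, seq, qual = extract_umi(
    "@read1 2:N:0", "ACGTACGTACTTGGCCAA", "IIIIIIIIIIFFFFFFFF"
)
print(header)  # @read1_ACGTACGTAC 2:N:0
print(seq)     # TTGGCCAA
```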

Cross-link sites can be recovered using the R2 reads of eCLIP data. Therefore, the pipeline allows analysis in two different modes: (1) single-end mode, where only the R2 read of each pair is used for analysis (we frequently use this mode for our analyses), and (2) paired-end mode, where both R1 and R2 reads are used at earlier steps of the analysis, but only mapped R2 reads are extracted following deduplication.
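
As a rough illustration of how a cross-link site might be derived from a mapped R2 read, the sketch below computes a single-nucleotide position relative to the 5' end of the alignment. This is an assumption-laden example rather than the pipeline's method: `crosslink_position` and its `offset` parameter are hypothetical, and the default of one nucleotide upstream of the read's 5' end is a common convention in CLIP analyses that this README does not confirm. Set `offset=0` to use the 5' end of the read itself.

```python
def crosslink_position(read_start, read_end, strand, offset=1):
    """Return the 0-based position of the putative cross-linked nucleotide.

    read_start and read_end are 0-based, half-open coordinates (BED style)
    of a mapped R2 read; strand is the strand of that alignment.
    offset is the assumed distance upstream of the read's 5' end.
    """
    if strand == "+":
        # The 5' end is the leftmost base; upstream means smaller coordinates.
        return read_start - offset
    if strand == "-":
        # The 5' end is the rightmost base; upstream means larger coordinates.
        return read_end - 1 + offset
    raise ValueError("strand must be '+' or '-'")


# A read covering [100, 150) on each strand:
print(crosslink_position(100, 150, "+"))  # 99
print(crosslink_position(100, 150, "-"))  # 150
```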

Now that you know the general outline of the analysis process, go through the step-by-step guide given here to analyze your eCLIP data.

If you use eCLIP Data Analyzer, please cite: I-KCKT allows dissection-free RNA profiling of adult Drosophila intestinal progenitor cells

jkkbuddika/eCLIP-Data-Analyzer
