ryanccj/MutaCLASH

MutaCLASH

Description

MutaCLASH detects the coordinates of Crosslink Induced Mutation Sites (CIMS) in NGS data. It provides an analysis pipeline that identifies mutation sites and binding sites in hybrid reads derived from CLASH or iCLIP experiments.

Features

  • Uses ChiRA to identify suitable hybrid reads.
  • Detects mutation information using Bowtie2 and BWA.
  • Utilizes algorithms such as pirScan, miRanda, and RNAup to identify binding sites.
  • Generates visualizations of the distribution of mutations.

Usage

To run the MutaCLASH pipeline, execute the following command:

sh run.sh <input file> <regulator file> <transcript file> <tool> <algorithm> <abundance analysis type>
  • input file: NGS data in FASTQ format.
  • regulator file: regulator file in FASTA format.
  • transcript file: transcript file in FASTA format.
  • tool: Tool used to detect hybrid reads; the supported value is chira.
  • algorithm: Algorithm used to predict binding sites, which can be pirScan, miRanda, or RNAup.
  • abundance analysis type: Method used to analyze abundance, which can be abu, region, site, or up. These refer, respectively, to "mRNA abundance" (see pipeline/add_abundance/abu_data/ for details) and to 22G-RNA abundance (WAGO-1 IP) in the "CLASH identified region", the "pirScan binding site", and the "RNAup binding site". If this parameter is omitted, abundance analysis is not executed.

After executing the command, the pipeline will run and complete all the necessary steps. Please refer to the examples we provided.
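The argument order above can be checked before launching a long run. The following is a minimal sketch, not part of MutaCLASH itself: the helper names `build_command` and `run_pipeline` and the file names are hypothetical, and the accepted values are simply copied from the parameter list as spelled there (whether run.sh is case-sensitive is not stated).

```python
import subprocess

# Accepted values, copied from the parameter list in this README.
TOOLS = {"chira"}
ALGORITHMS = {"pirScan", "miRanda", "RNAup"}
ABUNDANCE_TYPES = {"abu", "region", "site", "up"}


def build_command(input_fastq, regulator_fasta, transcript_fasta,
                  tool="chira", algorithm="pirScan", abundance=None):
    """Return the argv list for run.sh after validating the choices."""
    if tool not in TOOLS:
        raise ValueError("unknown tool: %r" % (tool,))
    if algorithm not in ALGORITHMS:
        raise ValueError("unknown algorithm: %r" % (algorithm,))
    if abundance is not None and abundance not in ABUNDANCE_TYPES:
        raise ValueError("unknown abundance analysis type: %r" % (abundance,))
    cmd = ["sh", "run.sh", input_fastq, regulator_fasta, transcript_fasta,
           tool, algorithm]
    # The last argument is optional; leaving it out skips abundance analysis.
    if abundance is not None:
        cmd.append(abundance)
    return cmd


def run_pipeline(cmd):
    """Launch the pipeline from the repository root (hypothetical wrapper)."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_command("reads.fastq", "regulator.fasta", "transcript.fasta",
                        algorithm="RNAup", abundance="abu")
    print(" ".join(cmd))
```

Printing the command first makes it easy to confirm the positional order before calling `run_pipeline(cmd)`.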

Preprocessing Configuration

By default, the Trim Galore tool built into CLASH Analyst performs quality trimming (Q=30) and adapter trimming (auto-detect), with a default read length range of 17-70 nt. To change the preprocessing settings, edit preprocess.conf.
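To make the effect of the default length range concrete, here is a small sketch of a length filter over FASTQ records. It is an illustration only and not how the pipeline is implemented: the pipeline delegates trimming to Trim Galore, the function names are hypothetical, and treating 17 and 70 as inclusive bounds is an assumption.

```python
def parse_fastq(text):
    """Yield (name, sequence, quality) tuples from FASTQ text.

    Assumes the simple four-lines-per-record layout with no wrapped lines.
    """
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def length_filter(records, min_len=17, max_len=70):
    """Keep records whose sequence length lies in [min_len, max_len]."""
    for name, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield name, seq, qual


if __name__ == "__main__":
    # Two made-up reads: one too short, one inside the default range.
    example = (
        "@too_short\n" + "A" * 10 + "\n+\n" + "I" * 10 + "\n"
        "@kept\n" + "C" * 25 + "\n+\n" + "I" * 25 + "\n"
    )
    for name, seq, _ in length_filter(parse_fastq(example)):
        print(name, len(seq))
```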

Output

The output files are stored in the data/output/ directory. The directory contains the following files:

  • CSV file: Contains results with all information fields.
  • Figures: The final generated figures are stored in the figure/ subdirectory.
  • Logs: Record the commands that were run and summarize the quantity, proportion, and distribution of the various mutations. They are stored in the log/ subdirectory.
  • Intermediate Files: The intermediate files generated during the analysis are stored in various formats (.csv, etc.) and can be found in their respective tool directories.

Figures

The output figures generated by the MutaCLASH pipeline include:

  • Score Distribution: Presents the quantity and trend of different scores.

  • Mutation Distribution: Provides information on the distribution of mutations, including deletions and substitutions.

  • Pairing Ratio: Calculates and analyzes the pairing ratios at both global and individual coordinates.
    In the statistical tests, ** and * mark significant differences at U-test P < 0.05 and P < 0.10, respectively.

  • Abundance Analysis: Performs abundance analysis, comparing wild-type samples and fold-change measurements.

  • Cumulative Distribution Function (CDF): Calculates and visualizes the cumulative distribution function.
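Two of the figure types above rest on simple calculations that can be sketched independently of the pipeline: the star annotation from a U-test and the empirical CDF. The code below is a rough illustration, not the pipeline's own plotting code. The function names are hypothetical, and the use of a two-sided Mann-Whitney U test is an assumption, since the README only says "U-test".

```python
import numpy as np
from scipy.stats import mannwhitneyu


def significance_stars(p_value):
    """Map a P value to the markers used in the figures: ** < 0.05, * < 0.10."""
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def u_test(group_a, group_b):
    """Two-sided Mann-Whitney U test (assumed variant); returns (P, stars)."""
    result = mannwhitneyu(group_a, group_b, alternative="two-sided")
    return result.pvalue, significance_stars(result.pvalue)


def ecdf(values):
    """Empirical CDF: sorted values x and the fraction of values <= x."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    paired = [0.61, 0.72, 0.68, 0.75, 0.70, 0.66, 0.74, 0.69]
    unpaired = [0.41, 0.52, 0.47, 0.55, 0.49, 0.44, 0.51, 0.46]
    p, stars = u_test(paired, unpaired)
    print("P = %.4g %s" % (p, stars))
    x, y = ecdf(paired)
    print(list(zip(x, y)))
```

The `x` and `y` arrays returned by `ecdf` can be passed directly to a step plot, for example with matplotlib's `plt.step(x, y, where="post")`.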

Requirements

Running MutaCLASH requires Linux or macOS. Other Unix environments will probably work but have not been tested. Windows users can use Windows Subsystem for Linux.

To install the necessary tools and packages, execute the following commands:

$ apt-get install -y samtools bowtie2
$ pip install -r requirements.txt
  • SAMtools >= 0.1.19
  • Bowtie2 >= 2.4.0
  • Python >= 3.5
  • bcbio-gff >= 0.6.9
  • biopython >= 1.76
  • cutadapt >= 2.10
  • matplotlib >= 2.2.2
  • numpy >= 1.12.1
  • pandas >= 0.23.0
  • pysam >= 0.20.0
  • scipy >= 1.1.0
  • seaborn >= 0.9
  • statannot = 0.2.3
  • tqdm >= 4.64.0
  • xlrd >= 1.2.0
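After installation, the Python package versions listed above can be verified with a short script. This is a sketch under stated assumptions: it relies on `importlib.metadata`, which needs Python 3.8 or later (newer than the Python >= 3.5 minimum listed), the helper names are hypothetical, and it checks only lower bounds. SAMtools and Bowtie2 are command-line tools and are not covered.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list in this README.
MINIMUMS = {
    "bcbio-gff": "0.6.9",
    "biopython": "1.76",
    "cutadapt": "2.10",
    "matplotlib": "2.2.2",
    "numpy": "1.12.1",
    "pandas": "0.23.0",
    "pysam": "0.20.0",
    "scipy": "1.1.0",
    "seaborn": "0.9",
    "statannot": "0.2.3",  # listed as an exact version; only the lower bound is checked here
    "tqdm": "4.64.0",
    "xlrd": "1.2.0",
}


def parse_version(text):
    """Turn '1.12.1' into (1, 12, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(minimums=MINIMUMS, lookup=version):
    """Return {package: problem description} for every unmet requirement."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = "found %s, need >= %s" % (installed, minimum)
    return problems


if __name__ == "__main__":
    for package, problem in sorted(check().items()):
        print("%s: %s" % (package, problem))
```

An empty result means every listed Python package is present at or above its minimum version.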

Docker

If you have any concerns about environment setup, feel free to use the Docker version directly.

$ docker pull ryanccj/mutaclash
$ docker run -it ryanccj/mutaclash

Alternatively, you can build the image from this project.

$ docker build -t <image_name> .
$ docker run -it <image_name>

License

Please refer to our MIT license.
