This repository has been archived by the owner on May 7, 2020. It is now read-only.

j1angvei/CSATK2

CSATK (ChIP-Seq Analysis Toolkit)

Warning

This project is deprecated!!!

How to compile

  1. Download and unzip the source code.
  2. Run ./gradlew jar (Unix-based OS) or gradlew.bat jar (Windows, not tested).
  3. The jar file is located at ./build/libs/CSATK2-yyMMdd.jar.
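
Because the yyMMdd stamp in the jar name changes with every build, a small helper can locate the newest jar after step 2. This is only a sketch for a POSIX shell: the function name latest_csatk_jar is hypothetical and not part of CSATK, and it assumes the file name pattern given in step 3.

```shell
# Hypothetical helper: print the newest CSATK2 jar in a build directory
# (default ./build/libs). Assumes the file name pattern CSATK2-yyMMdd.jar.
# yyMMdd stamps sort chronologically as plain text, and the shell expands
# globs in sorted order, so the last match is the newest build.
latest_csatk_jar() {
  dir="${1:-./build/libs}"
  latest=""
  for jar in "$dir"/CSATK2-*.jar; do
    [ -e "$jar" ] && latest="$jar"
  done
  [ -n "$latest" ] && echo "$latest"
}

# Typical use from the unzipped source directory:
#   ./gradlew jar && java -jar "$(latest_csatk_jar)" -h
```

The function returns a non-zero status when no jar is found, so it can be chained with && as in the commented usage line.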

Introduction

CSATK is bioinformatics software written in Java. Its main goal is to make ChIP-Seq analysis easier and more efficient.

By integrating commonly used bioinformatics software and websites such as FastQC, BWA, SAMTools, Qualimap, MACS2, Homer and Panther, CSATK can run a complete ChIP-Seq analysis pipeline, a single function, or multiple functions in order.

How do you make it work? All you have to do is create an input.json config file that describes your raw FASTQ data and the relevant genome information.

CSATK has the following advantages:

  1. High efficiency. Once the analysis starts, the ChIP-Seq pipeline moves on to each next step automatically, without further commands.
  2. Low error rate. Since all commands are generated by CSATK, mistakes from hand-typed commands are avoided.
  3. Multiple functions. CSATK can do alignment, quality control, peak calling and so on, thanks to its integration with the bioinformatics software mentioned above.
  4. Easy to use. CSATK has command line support, plus GUI support for some functions.

Usage

Tasks (run the analysis pipeline, reset and back up data, print help, species code and peak type information):

CMD: java -jar CSATK.jar

-h,	print help information and usage
-p,	ChIP-Seq analysis pipeline
-f,	run function(s) in order
-s,	run a single function with arguments
-i,	(re)install all software
-r,	reset project to original state
-b,	back up all files of the last analysis
-t,	print broad, narrow, mix peak type information
-c,	print species code information

Functions (run specific function or multiple functions in order):

CMD: java -jar CSATK.jar -f <function1,function2,...>

gi,	build genome index using bwa
qr,	do quality control of raw reads using FastQC
ar,	parse key information from raw reads' qc result
tm,	filter adapter and bad quality reads using Trimmomatic
qc,	do quality control of filtered reads using FastQC
ac,	parse key information from filtered reads' qc result
al,	align reads using bwa (BWA-MEM for > 70 bp, BWA-ALN for < 70 bp)
cs,	convert sam file to bam file using SAMTools
sb,	sort converted bam files using SAMTools
qb,	do quality control of sorted bam file using QualiMap
rb,	remove PCR-amplified reads from bam file using SAMTools
ub,	filter for reads whose mapping quality is > 30 using SAMTools
pc,	do peak calling using MACS2
pa,	do peak annotation using Homer
gl,	get annotated genes from the annotation bed file
gp,	do GO & Pathway analysis using PANTHERDB.org
mt,	find motifs using Homer
fs,	count reads in all bam files using SAMTools flagstat
st,	compute statistics from the output
ht,	plot the statistics in HTML format
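
To illustrate the -f syntax, the sketch below joins function codes with commas and rejects any code that is not in the list above. The helper name csatk_f_cmd is hypothetical; it only prints the command line and does not run CSATK. Whether a given function needs the output of earlier ones is not checked here.

```shell
# Function codes documented in the list above.
CSATK_FUNCTIONS="gi qr ar tm qc ac al cs sb qb rb ub pc pa gl gp mt fs st ht"

# Hypothetical helper: validate the codes, then print the -f command line.
csatk_f_cmd() {
  for code in "$@"; do
    case " $CSATK_FUNCTIONS " in
      *" $code "*) ;;  # known code, keep going
      *) echo "unknown function code: $code" >&2; return 1 ;;
    esac
  done
  # "$*" joins the arguments with the first character of IFS (a comma here).
  echo "java -jar CSATK.jar -f $(IFS=,; echo "$*")"
}

csatk_f_cmd qr ar tm qc ac   # prints: java -jar CSATK.jar -f qr,ar,tm,qc,ac
```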

Sole function (run a function with just the single CSATK.jar file; no other folder structure is needed):

CMD: java -jar CSATK.jar -s [arg1] [arg2] ...

1) GO & Pathway analysis:
	gp,	do GO & Pathway analysis using PANTHERDB.org [species code] [gene list] [output]

Tutorial

How to download and use CSATK

  1. Download the CSATK compressed file from the CSATK2 release page;
  2. Unpack the .tgz file, then run java -jar CSATK.jar -r to restore the CSATK structure;
  3. Run java -jar CSATK.jar -i to install all relevant software (FastQC, BWA, SAMTools, etc.);
  4. Place your raw data under the input folder, and the genome reference file and annotation under the genome folder;
  5. Create your input.json file and put it under the config folder (there is an input.json template in the same folder);
  6. Start the ChIP-Seq pipeline by running java -jar CSATK.jar -p.
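
The command-line steps above can be sketched as a small script. It is a dry run by default, so it only prints the commands. The helper names run, csatk_prepare and csatk_pipeline are hypothetical, and the script assumes it is started in the directory that holds CSATK.jar, with the config folder next to it.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 in the environment to really run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Steps 2 and 3: restore the folder structure, then install the tools.
csatk_prepare() {
  run java -jar CSATK.jar -r || return 1
  run java -jar CSATK.jar -i || return 1
}

# Step 6, guarded by a check that step 5 was done.
# Assumption: config/ is relative to the current directory.
csatk_pipeline() {
  if [ ! -f config/input.json ]; then
    echo "config/input.json not found; create it first" >&2
    return 1
  fi
  run java -jar CSATK.jar -p
}

csatk_prepare
```

Steps 4 and 5 (copying data and writing input.json) stay manual, which is why the pipeline start is a separate function rather than part of csatk_prepare.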

Tips:

When you execute CSATK2.jar with java -jar CSATK2.jar and an error message such as Error occurred during initialization of VM shows up, it is because the JVM cannot get enough RAM.

To resolve this error, add -Xmx256M when executing the jar file, for example java -Xmx256M -jar CSATK2.jar. Beware that this limits CSATK to 256 MB of RAM during analysis; you can increase the value to 1024M or higher. The option is documented as follows:

-Xmxn
Specifies the maximum size, in bytes, of the memory allocation pool. This value must be a multiple of 1024 greater than 2 MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. The default value is chosen at runtime based on system configuration.

For server deployments, -Xms and -Xmx are often set to the same value.

Examples:

-Xmx83886080
-Xmx81920k
-Xmx80m 
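
The three examples above all request the same heap size. A quick way to check the arithmetic is the sketch below, which converts a size with an optional k/K or m/M suffix to bytes; the helper name xmx_to_bytes is hypothetical and handles only the suffixes mentioned in the quoted text.

```shell
# Hypothetical helper: convert an -Xmx size (plain bytes, or with a
# k/K or m/M suffix) to a number of bytes.
xmx_to_bytes() {
  case "$1" in
    *[kK]) echo $(( ${1%[kK]} * 1024 )) ;;          # kilobytes
    *[mM]) echo $(( ${1%[mM]} * 1024 * 1024 )) ;;   # megabytes
    *)     echo "$1" ;;                             # already bytes
  esac
}

xmx_to_bytes 83886080   # prints 83886080
xmx_to_bytes 81920k     # prints 83886080
xmx_to_bytes 80m        # prints 83886080
```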

How to create your own input.json file

If you are familiar with the JSON format, you can create the file easily with Vim (Linux) or Notepad (Windows). All you need to do is modify the input.json template in the config folder.

If you have no idea what the JSON format is, or you just don't want to write the boilerplate, CSATK can help you create the input.json file: just double-click the JAR file or run java -jar CSATK.jar.

In any environment with graphical display support (such as Windows, macOS or Ubuntu Desktop) and at least JRE 8 installed (http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html; JDK 8 works as well, since the JDK contains the JRE), you can start the CSATK GUI to help you create input.json.

Here is a simple illustration of using CSATK GUI to create input.json:

  1. Launch the CSATK GUI by double-clicking CSATK.jar or running java -jar CSATK.jar from a console (note that only devices with graphical display support can open the CSATK GUI).
  2. Add all genome items in the table (you may need to remove the sample genome item).
  3. If you have many genome items that are very similar, you can copy one and then edit the copy.
  4. Add all experiment items in the table (remember to remove the sample item).
  5. Edit the experiment items and check whether anything is wrong.
  6. Click the Generate button; an input.json file will be created in the same location as CSATK.jar.

Tips

  1. If you run into a problem like Error occurred during initialization of VM, you can solve it by adding the parameter -Xmx256M, for example java -Xmx256M -jar CSATK.jar.

  2. Since GO & Pathway analysis needs a network connection, make sure the compute nodes can reach the network.
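
For the second tip, one way to test connectivity from a compute node is a quick HTTP HEAD request with curl. This is a sketch only: the helper name check_network is hypothetical, and the default URL is an assumption based on the PANTHERDB.org site named in the function list; the exact endpoint CSATK contacts is not stated here.

```shell
# Hypothetical helper: succeed only if an HTTP HEAD request to the URL works.
# The default URL is an assumption, not taken from the CSATK source code.
check_network() {
  url="${1:-http://pantherdb.org}"
  if curl -sfI --max-time 10 "$url" >/dev/null 2>&1; then
    echo "OK: $url is reachable"
  else
    echo "FAIL: cannot reach $url" >&2
    return 1
  fi
}

# Run this on the node that will execute the gp function, for example:
#   check_network && java -jar CSATK.jar -f gp
```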

License