A ChIP-seq pipeline that processes data from raw fastq reads through to peak analysis.

First, before running this pipeline on a server, make sure the required modules are installed. If not, run the following commands to deploy them.

### create the software directory and a sub-directory for the downloaded installers
mkdir -p software/install_packages

### install python anaconda 2.2.0
cd software/
wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda-2.2.0-Linux-x86_64.sh
bash Anaconda-2.2.0-Linux-x86_64.sh  # prefix=/path/for/anaconda
mv Anaconda-2.2.0-Linux-x86_64.sh install_packages

### install R 3.2.0
wget http://cran.r-project.org/src/base/R-3/R-3.2.0.tar.gz
tar -zxvf R-3.2.0.tar.gz
cd R-3.2.0
./configure --prefix ~/software/R-3.2.0
make
make install
cd ..
mv R-3.2.0.tar.gz install_packages

### install samtools 0.1.18
### use an old version because the latest one can cause trouble with
### other software such as tophat.
wget http://sourceforge.net/projects/samtools/files/samtools/0.1.18/samtools-0.1.18.tar.bz2
tar -jxvf samtools-0.1.18.tar.bz2
cd samtools-0.1.18
make
cd ..
mv samtools-0.1.18.tar.bz2 install_packages

### install bwa 0.7.5a
wget http://sourceforge.net/projects/bio-bwa/files/bwa-0.7.5a.tar.bz2
tar -jxvf bwa-0.7.5a.tar.bz2
cd bwa-0.7.5a
make
cd ..
mv bwa-0.7.5a.tar.bz2 install_packages

### install bedtools 2.24.0
git clone https://github.com/arq5x/bedtools2/
cd bedtools2
make
cd ..

### install MACS2
wget https://pypi.python.org/packages/source/M/MACS2/MACS2-2.1.0.20150731.tar.gz
tar -zxvf MACS2-2.1.0.20150731.tar.gz
cd MACS2-2.1.0.20150731/
# if necessary, first run: sed -i 's/Ofast/O3/g' setup_w_cython.py
python setup_w_cython.py install
cd ..

### install picard-tools
wget http://jaist.dl.sourceforge.net/project/picard/picard-tools/1.119/picard-tools-1.119.zip
unzip picard-tools-1.119.zip

### install tabix and bgzip
wget http://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.6.tar.bz2
tar -jxvf tabix-0.2.6.tar.bz2
cd tabix-0.2.6
make
cd ..
mv tabix-0.2.6.tar.bz2 install_packages

### install igvtools
wget https://data.broadinstitute.org/igv/projects/downloads/igvtools_2.3.57.zip
unzip igvtools_2.3.57.zip

### install idrcode and spp_package
wget http://www.broadinstitute.org/~anshul/softwareRepo/idrCode.tar.gz
wget http://www.broadinstitute.org/~anshul/softwareRepo/spp_package.tar.gz
tar -zxvf idrCode.tar.gz
tar -zxvf spp_package.tar.gz
cd spp_package
tar -zxvf spp_1.10.1.tar.gz
cd ..

### install UCSC utilities
### download the required binaries from http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/


### install required python modules:
pip install ngslib
pip install Mysql
pip install svgwrite
pip install seaborn
pip install pysam==0.8.3
pip install pybedtools==0.6.9
pip install scikit-learn  # imported as sklearn
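
To confirm that the modules above can actually be imported, a small standalone check along the following lines may help. It is a sketch and not part of the pipeline: `missing_modules` is a hypothetical helper, and the import names in the list are assumed from the pip package names (the import name for the `Mysql` package is not stated here, so it is left out).

```python
import importlib


def missing_modules(names):
    """Return the names in `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names assumed from the pip packages listed above.
    wanted = ["ngslib", "svgwrite", "seaborn", "pysam", "pybedtools", "sklearn"]
    for name in missing_modules(wanted):
        print("missing: " + name)
```

Run it with the same python interpreter that will run the pipeline, since each interpreter has its own site-packages.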

After that, download this pipeline:

cd $PYTHONPATH  # directory where python packages are installed; path/to/anaconda/lib/python2.7/site-packages/ by default
git clone https://github.com/huboqiang/ChIP

Second, open the settings file under ./settings and change the following values to your own paths:

self.Database            = "DIR/TO/DATABASE"                               #line 61
self.sftw_py             = "DIR/TO/SOFTWARE_EXE_FILE"  #line 77
self.sftw_pl             = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_java           = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_R              = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_MarkDup        = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_bwa            = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_samtools       = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_macs2          = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_macs14         = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_bedtools       = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_bgzip          = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_tabix          = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_igvtools       = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_batchIDR       = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_spp            = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_get_psudoCount = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_sort_bdg       = "DIR/TO/SOFTWARE_EXE_FILE"
self.sftw_ucsc_dir       = "DIR/TO/SOFTWARE_EXE_FILE"
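
Before the first run it can be worth checking that every configured path exists. The sketch below is a standalone Python 3 helper, not part of the pipeline; `find_missing_paths` and the example values are hypothetical, and you would pass in the paths you actually wrote into the settings file.

```python
import os


def find_missing_paths(settings):
    """Return the entries of `settings` whose path does not exist."""
    return {
        name: path
        for name, path in settings.items()
        if not os.path.exists(os.path.expanduser(path))
    }


if __name__ == "__main__":
    # Hypothetical values: replace them with your own settings.
    configured = {
        "sftw_bwa": "~/software/bwa-0.7.5a/bwa",
        "sftw_samtools": "~/software/samtools-0.1.18/samtools",
    }
    for name, path in sorted(find_missing_paths(configured).items()):
        print("not found: {} -> {}".format(name, path))
```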

Go to the analysis directory and copy the run script there.

cd PATH/FOR/ANALYSIS   # go to the analysis directory
cp $PYTHONPATH/ChIP/run_chipseq.py ./

Next, make the input files. You can download these files from UCSC or a similar source, use your own scripts to merge the ERCC information, and generate files in the following format (TAB-separated):

vim sample_input.xls
==> sample_input.xls <==
sample               stage   type        tissue      brief_name               merge_name             end_type   control
H3K27ac_mE105_brain  E105    H3K27ac     brain       H3K27ac_mE105_brain      H3K27ac_mE105_brain     SE        Input_mE105_brain
Input_mE105_brain1   E105    H3K27ac     brain       Input_mE105_brain1       Input_mE105_brain       SE        Input_mE105_brain
...

Note that only NAME_FOR_RAW_FQ is required, and this NAME must be the same as 00.0.raw_fq/NAME. NAME_FOR_PROCESSING will be the name used for the results of the rest of the analysis. NAME_FOR_READING will be the name used for the files in statinfo. stage and sample_group can be written as anything; they are here only to make the downstream analysis easier.
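
A quick way to catch a malformed sample sheet (for example, spaces instead of TABs) is to parse it before running anything. The following is a minimal standalone Python 3 sketch, not the pipeline's own parser; `read_sample_sheet` is a hypothetical helper, and the column list is simply copied from the header of the example above.

```python
import csv
import io

# Column names taken from the header of the example sample_input.xls.
EXPECTED_COLUMNS = [
    "sample", "stage", "type", "tissue",
    "brief_name", "merge_name", "end_type", "control",
]


def read_sample_sheet(handle):
    """Parse a TAB-separated sample sheet and return its rows as dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    return list(reader)


if __name__ == "__main__":
    example = io.StringIO(
        "sample\tstage\ttype\ttissue\tbrief_name\tmerge_name\tend_type\tcontrol\n"
        "H3K27ac_mE105_brain\tE105\tH3K27ac\tbrain\tH3K27ac_mE105_brain\t"
        "H3K27ac_mE105_brain\tSE\tInput_mE105_brain\n"
    )
    for row in read_sample_sheet(example):
        print(row["sample"], row["control"])
```

For a real file, pass an open handle instead: `read_sample_sheet(open("sample_input.xls", newline=""))`.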

Before running this pipeline, put the fastq reads in the ./00.0.raw_data directory.

mkdir 00.0.raw_data
for i in `tail -n +2 sample_input.xls | awk '{print $1}'`
do
    mkdir 00.0.raw_data/$i && ln -s PATH/TO/RAW_DATA/$i/*gz 00.0.raw_data/$i
done
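
The same linking step can be sketched in Python 3 for cases where a re-runnable version is preferred. This is an illustration, not part of the pipeline: `link_raw_data` is a hypothetical helper that mirrors the loop above but skips links that already exist instead of stopping at an existing directory.

```python
from pathlib import Path


def link_raw_data(sample_names, raw_root, dest_root="00.0.raw_data"):
    """Create dest_root/<sample>/ and symlink each *gz file from raw_root/<sample>/."""
    linked = []
    for name in sample_names:
        dest_dir = Path(dest_root) / name
        dest_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted((Path(raw_root) / name).glob("*gz")):
            link = dest_dir / src.name
            # is_symlink() also catches dangling links, which exists() reports as absent.
            if not (link.exists() or link.is_symlink()):
                link.symlink_to(src.resolve())
            linked.append(link)
    return linked
```

It would be called with the first column of sample_input.xls, for example `link_raw_data(names, "PATH/TO/RAW_DATA")`.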

After that, run the pipeline:

python run_chipseq.py --ref YOUR_REF --TSS_genebody_up 5000 --TSS_genebody_down 5000 --TSS_promoter_up 5000  --TSS_promoter_down 5000 --Body_extbin_len 50 --Body_bincnt 100 --TSS_bin_len 1 --top_peak_idr 100000 sample_input.xls
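
If the same parameters are reused across analyses, it may be convenient to keep them in one place and build the command from them. The sketch below is a standalone Python 3 helper under that assumption; `build_command` and `OPTIONS` are hypothetical names, and the flags and values are copied unchanged from the command above.

```python
import shlex

# Option values copied from the example command; adjust them for your own analysis.
OPTIONS = {
    "--ref": "YOUR_REF",
    "--TSS_genebody_up": 5000,
    "--TSS_genebody_down": 5000,
    "--TSS_promoter_up": 5000,
    "--TSS_promoter_down": 5000,
    "--Body_extbin_len": 50,
    "--Body_bincnt": 100,
    "--TSS_bin_len": 1,
    "--top_peak_idr": 100000,
}


def build_command(sample_sheet, options, script="run_chipseq.py"):
    """Return the argument list for one pipeline run."""
    argv = ["python", script]
    for flag, value in options.items():
        argv.extend([flag, str(value)])
    argv.append(sample_sheet)
    return argv


if __name__ == "__main__":
    argv = build_command("sample_input.xls", OPTIONS)
    # Print the command for inspection; subprocess.run(argv) would execute it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```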

Wait for the results. Note that if you have to run it on a cluster, you should not run this script directly. For example, if an SGE system is used, then:

Comment out this command

        my_job.running_multi(cpu=8, is_debug = self.is_debug)

and use this command instead in the modules in ./frame/*py

       my_job.running_SGE(vf="400m", maxjob=100, is_debug = self.is_debug)

Methods for submitting jobs on other systems are still under development.
