
DELTA

a Distal Enhancer Locating Tool based on AdaBoost and shape features of chromatin modifications

Introduction

Accurate identification of DNA regulatory elements has become an urgent need in the post-genomic era. Recent genome-wide chromatin state mapping efforts revealed that DNA elements are associated with characteristic chromatin modification signatures, and several approaches have been developed to predict transcriptional enhancers from these signatures. However, their practical application is limited by incomplete extraction of chromatin features and by model inconsistency when predicting enhancers across different cell types. To address these issues, we define a set of non-redundant shape features of histone modifications, which show high consistency across cell types and can greatly reduce the feature dimension. Integrating these shape features with the machine-learning algorithm AdaBoost, we developed an enhancer prediction method, DELTA (Distal Enhancer Locating Tool based on AdaBoost). We show that DELTA significantly outperforms current enhancer prediction methods in prediction accuracy on different datasets and can predict enhancers in one cell type using models trained in other cell types without loss of accuracy. Overall, our study presents a novel framework for accurately identifying enhancers from epigenetic data across multiple cell types.

Install

Please check the file 'INSTALL' in the distribution.

Usage

Usage: delta.py [-c chip_files] [-P promoter_loci] [-E enhancer_loci] [options]

Example: delta.py -c H3K4me1.bed,H3K4me3.bed,H3K27ac.bed -E p300.bed -P tss.bed -g hg19

--version
									Show program's version number and exit
-h, --help
									Show this help message and exit
-c CHIP_BEDS, --chip_bed=CHIP_BEDS
									ChIP-seq bed file of histone modifications
-E ENHANCER, --enhancer=ENHANCER
									BED file containing the enhancer loci
-P PROMOTER, --promoter=PROMOTER
									BED file containing the promoter loci
-R, --read
									Read existing training and prediction data instead of
									generating them from ChIP-seq (default: False)
-g GENOME, --genome=GENOME
									Genome assembly, should be one of the following: dm3,
									mm9, hg17, hg18, hg19
-b BIN_SIZE, --bin_size=BIN_SIZE
									Length of dividing bins (default: 100)
-w WIN_SIZE, --window_size=WIN_SIZE
									Length of sliding window, should be an integer multiple
									of the bin size (default: 2000)
--iteration_number=ITER_NUM
									Number of iterations for AdaBoost (default: 100)
--pvalue_threshold=P_THRES
									P-value threshold for enhancer prediction (default: 
									0.5)
-o OUTPUT, --output=OUTPUT
									Output file name (default output file is 
									"predicted_enhancer.bed")
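
The relationship between -b and -w can be illustrated with a minimal Python sketch. It is not taken from delta.py: the function names `bins_per_window` and `window_profile` are hypothetical, and counting read start positions per bin is only an assumed way of summarising the signal inside one window.

```python
def bins_per_window(win_size=2000, bin_size=100):
    """Return how many bins fit in one window; reject non-multiples."""
    if bin_size <= 0 or win_size <= 0:
        raise ValueError("sizes must be positive")
    if win_size % bin_size != 0:
        raise ValueError("window size must be an integer multiple of bin size")
    return win_size // bin_size


def window_profile(read_starts, win_start, win_size=2000, bin_size=100):
    """Count reads per bin inside [win_start, win_start + win_size).

    Hypothetical helper: read_starts is a list of read start coordinates
    on one chromosome.
    """
    n_bins = bins_per_window(win_size, bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        offset = pos - win_start
        if 0 <= offset < win_size:
            counts[offset // bin_size] += 1
    return counts
```

With the default values, one window is split into 2000 / 100 = 20 bins, so each histone mark would contribute a 20-value profile per window under this assumption.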

Parameters

-c / --chip_bed

ChIP-seq files contain chromatin modification mapping data. Provide the ChIP-seq files as a comma-separated list, e.g. H3K4me1.bed,H3K4me3.bed,H3K27ac.bed.

The BED format is defined in "http://genome.ucsc.edu/FAQ/FAQformat#format1".
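
As a rough illustration of the input described above, the following Python sketch splits a comma-separated -c value and reads the three required BED fields (chrom, start, end). The helper names `split_chip_beds` and `parse_bed_line` are hypothetical and are not part of delta.py; how DELTA itself parses its input may differ.

```python
def split_chip_beds(arg):
    """Split the comma-separated value of -c into individual file names."""
    names = [name.strip() for name in arg.split(",")]
    return [name for name in names if name]


def parse_bed_line(line):
    """Return (chrom, start, end) from one BED line.

    Returns None for blank lines and for comment, track and browser lines.
    """
    line = line.strip()
    if not line or line.startswith(("#", "track", "browser")):
        return None
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("BED lines need at least chrom, start and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError("invalid interval: %s" % line)
    return chrom, start, end
```

Checking the files this way before a long run may catch a mistyped file list or a malformed line early.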

-R / --read

The "-R" option reads existing training and prediction data instead of generating them from the ChIP-seq files, which can be a time-consuming process. WARNING: Use with care! The wrong training and prediction data could be loaded.

--pvalue_threshold

P-value threshold for enhancer prediction. Tune this parameter to adjust the number of predictions.

Output files

1. predicted_enhancer.bed is a BED-format file containing the predicted enhancers. If the step size is smaller than the window size, the predicted enhancers may be redundant. In that case, use the uniq command to remove repeated predictions; because uniq only collapses adjacent duplicate lines, sort the file first.

2. adaboost.R is an R script generated by delta.py to run the AdaBoost algorithm.

3. tmp_dir is a directory containing temporary files created by delta.py. Do not remove it until the entire training and prediction run has finished.
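
For the redundant predictions that can appear in predicted_enhancer.bed, de-duplication can also be sketched in Python. This is only an illustration and not part of DELTA: the function name `unique_bed_lines` is hypothetical, and it assumes that redundant predictions are exact duplicate lines. Overlapping but non-identical intervals are left untouched.

```python
def unique_bed_lines(lines):
    """Drop exact duplicate lines while keeping first-seen order."""
    seen = set()
    kept = []
    for line in lines:
        record = line.rstrip("\n")
        if not record or record in seen:
            continue  # skip blank lines and records already emitted
        seen.add(record)
        kept.append(record)
    return kept


if __name__ == "__main__":
    # File name taken from the default output described above.
    with open("predicted_enhancer.bed") as handle:
        for record in unique_bed_lines(handle):
            print(record)
```

Unlike uniq, this does not require duplicates to be adjacent, so the file does not need to be sorted first.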

License

The source code of DELTA is freely available for academic use. For a commercial license, please contact Dr. Chenggang Zhang (zhangcg@bmi.ac.cn).