The predictions for differents tissues and cancer with DeepG4 is available here
The code to generate the precision/recall curve is available here.
It’s based on Snakemake to manage the workflow and Docker to isolate the application and run it with the appropriate tool versions.
git clone https://github.com/morphos30/DeepG4ToolsComparison.git
cd DeepG4ToolsComparison
docker build . -t morphos30/g4docker -f Dockerfile/Dockerfile
docker run -it -v $(pwd):/DeepG4ToolsComparison morphos30/g4docker /bin/bash
Where $(pwd)
is the working directory of DeepG4ToolsComparison
on
your computer.
cd /DeepG4ToolsComparison
snakemake --use-conda -j 30
You have to set the option --use-conda
in order to install and run
each tool in its proper environment.
- DNA sequences into bed format, split into positive set and negative set, written into the bed directory.
Note : if you want add a new dataset, edit the Snakefile
file and
add the bed files in the dictionnary EXPERIMENTS
, without the .bed
extension. Example :
TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM_0.8_42_Ctrl_gkmSVM.bed
TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM_0.8_42.bed
EXPERIMENTS = {
"TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM_0.8_42_Ctrl_gkmSVM":{"CTRL":"TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM_0.8_42_Ctrl_gkmSVM","EXP":"TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM_0.8_42"}
}
Where CTRL
is the negative set and EXP
is the positive set.
- DNA Accessibility (ATAC-seq/DNAse-seq/MNase-seq) in bigwig format or
directly the averaged value for each sequence in a
one-column tsv
file.
ATACFILE = {
"TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM":["ATAC_entinostat_mean.bw"]
}
or one-column tsv
file in
fasta/{Experiment_name}/{Experiment_name}_atac_merged.tsv
. Example :
fasta/TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM/TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM_atac_merged.tsv
head TestSet_Peaks_BG4_G4seq_HaCaT_GSE76688_hg19_201b_Ctrl_gkmSVM_atac_merged.tsv
0.01628741641898675
0.028752257447422012
0.028878783223623482
0.055516399884055316
0.02825982069785745
0.03582923041809851
0.023904436394151577
0.07724288611280039
0.01740800116454673
0.05779605688479145
Outputs | Tools | Methods |
---|---|---|
ATACDeepG4_ATACnormBG | ATACDeepG4 | DeepG4 using accessibily (DeepG4 in paper) |
ATACDeepG4_classictuningOH5 | ATACDeepG4 | DeepG4 without accessibility (DeepG4* in paper) |
penguinn_retrained | penguinn | penguinn using custom model trained on BG4G4seq dataset |
penguinn | penguinn | penguinn using default model |
G4detector_retrained | G4detector | G4detector using custom model trained on BG4G4seq dataset |
G4detector | G4detector | G4detector using default model |
quadron_retrained | quadron | quadron using custom model trained on BG4G4seq dataset |
quadron_score | quadron | quadron using default model |