# model-template.yaml
type: keras
args:
    arch:
        url: {{ args_arch_url }}
        md5: {{ args_arch_md5 }}
    weights:
        url: {{ args_weights_url }}
        md5: {{ args_weights_md5 }}
    backend: tensorflow
    image_dim_ordering: tf
info:
    authors:
        - name: Haoyang Zeng
          github: haoyangz
    contributors:
        - name: Roman Kreuzhuber
          github: krrome
    name: CpGenie
    version: 0.1
    trained_on: "Samples from chromosomes 1-9 and 14-22 were used for training, chromosomes 12-13 were used for hyper-parameter tuning and model selection, and the rest of the data were held out for testing. In the variant-prediction analysis, only meQTL and allele-specific methylation data from the held-out chromosomes 10 and 11 were used. The 50 RRBS (reduced representation bisulfite sequencing) datasets of immortal cell lines, including GM12878, and the WGBS dataset of GM12878 were downloaded from the ENCODE website (https://www.encodeproject.org/)."
    doc: >
        Abstract:
        DNA methylation plays a crucial role in the establishment of tissue-specific
        gene expression and the regulation of key biological processes. However, our
        present inability to predict the effect of genome sequence variation on DNA
        methylation precludes a comprehensive assessment of the consequences of
        non-coding variation. We introduce CpGenie, a sequence-based framework that
        learns a regulatory code of DNA methylation using a deep convolutional neural
        network and uses this network to predict the impact of sequence variation on
        proximal CpG site DNA methylation. CpGenie produces allele-specific DNA methylation
        prediction with single-nucleotide sensitivity that enables accurate prediction
        of methylation quantitative trait loci (meQTL). We demonstrate that CpGenie
        prioritizes validated GWAS SNPs, and contributes to the prediction of
        functional non-coding variants, including expression quantitative trait
        loci (eQTL) and disease-associated mutations. CpGenie is publicly available to
        assist in identifying and interpreting regulatory non-coding variants.
    cite_as: https://doi.org/10.1093/nar/gkx177
    license: Apache License v2
    training_procedure: RMSprop
    tags:
        - DNA methylation
default_dataloader:
    defined_as: kipoiseq.dataloaders.SeqIntervalDl
    default_args:
        auto_resize_len: 1001
        alphabet_axis: 0
        dummy_axis: 1
dependencies:
    conda:
        - python=3.6.12
        - h5py=2.10.0
        - tensorflow=1.10.0
        - keras=1.2.2
        - pysam=0.15.3
        - pip=20.2.4
schema:
    inputs:
        name: seq
        special_type: DNASeq
        shape: (4, 1, 1001)
        doc: DNA sequence
        associated_metadata: ranges
    targets:
        name: methylation_prob
        shape: (2, )
        doc: Methylated and Unmethylated probabilities
        column_labels:
            - methylation_prob
            - unmethylation_prob
postprocessing:
    variant_effects:
        seq_input:
            - seq
        use_rc: true
        scoring_functions:
            - type: diff
              default: true
            - type: logit
              default: true
{% if model == 'GM19239_ENCSR000DGH' %}
test:
    expect:
        url: https://s3.eu-central-1.amazonaws.com/kipoi-models/predictions/14f9bf4b49e21c7b31e8f6d6b9fc69ed88e25f43/CpGenie/{{ model }}/predictions.h5
        md5: 3727f6763f5c8cc2f8a40889c84cf123
    precision_decimal: 6
{% endif %}