
Accurate prediction of single-cell DNA methylation states using deep learning #39

Closed
cgreene opened this issue Aug 5, 2016 · 14 comments



cgreene commented Aug 5, 2016

Published: https://doi.org/10.1186/s13059-017-1189-z
Preprint: https://dx.doi.org/10.1101/055715


gwaybio commented Aug 24, 2016

Very well-written article predicting binary methylation status (0: hypomethylated, 1: hypermethylated) in single-cell bisulfite sequencing experiments (scBS-seq). A secondary goal is to visualize the DNA motifs contributing to methylation status and to cellular methylation heterogeneity.

Biology

The authors use scBS-seq data from 32 mouse embryonic stem cells to build their deep network. The features of the network are described in detail and consist of DNA sequence elements and nearby methylation states of the target cell and other assayed cells. Since scBS-seq experiments cover only 20-40% of CpG sites because of low DNA yields, models that can impute methylation states in missing regions are extremely important. The authors also show variable predictive performance of their model depending on the sequence context of the target CpG (e.g. TSS, exon, promoter, CpG island, etc.).

Computational Aspects

There are three deep networks in the model, all of which are convolutional neural networks (CNNs) with one hidden convolutional layer using max pooling and ReLU activations. Some aspects of the architecture were difficult to decipher (e.g. stride of the convolution, feature map size).

  1. DNA module
    • Uses sequence elements +/- 250 bp from the given CpG
      • The authors did test shorter sequence lengths and report decreased performance
      • It is unclear if larger, or more biologically informed, windows would improve performance
    • Convolution in 1 dimension - akin to scanning with position-specific scoring matrices (PSSMs)
  2. CpG module
    • Binary methylation states of +/- 25 neighboring CpGs in the target cell and in other assayed cells
    • Convolution in 2 dimensions taking into account other cells that may have the target CpG measured
  3. Fusion module
    • Receives the CNN output from both the DNA and CpG modules
    • Fully connected with one output node
      • Sigmoid activation on the output layer to predict a binary ŷ ∈ {0, 1}
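A minimal numpy sketch of what the DNA module's 1-D convolution is doing (filter length, pool width, and random weights are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hot encode a toy DNA window (+/- 250 bp around the target CpG -> 501 bp).
bases = "ACGT"
seq = rng.choice(list(bases), size=501)
one_hot = np.zeros((501, 4))
one_hot[np.arange(501), [bases.index(b) for b in seq]] = 1.0

# One convolutional filter acts like a PSSM: at each offset it scores the
# local subsequence with a position-weighted sum over the four bases.
filt = rng.normal(scale=0.1, size=(10, 4))
scores = np.array([(one_hot[i:i + 10] * filt).sum() for i in range(501 - 10 + 1)])

# ReLU followed by max pooling (width 4) summarizes where the motif matches.
activation = np.maximum(scores, 0.0)
pooled = activation.reshape(-1, 4).max(axis=1)
```

The real module uses many filters and learns them by backpropagation; the point here is only the PSSM analogy for a single filter.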

Model trained with dropout, Glorot-initialized weights, and Adam adaptive learning with early stopping. What is especially nice is the availability of all code used to implement the model.
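Two of these training tricks can be sketched in a few lines of numpy (layer sizes and the dropout rate are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def glorot_uniform(fan_in, fan_out):
    # Glorot/Xavier initialization: scale the uniform range by fan-in and
    # fan-out to keep activation variance roughly constant across layers.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def dropout(x, rate, training=True):
    # Inverted dropout: zero a random subset of units during training and
    # rescale the survivors, so no change is needed at test time.
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

W = glorot_uniform(128, 64)        # hypothetical hidden-layer weight matrix
h = dropout(np.ones(128), rate=0.5)
```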

Why we should include it in our review

  1. Deep learning for epigenetics - I buy this one more than Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks #68
  2. Deep learning in single cells
    1. This data is huge and will only continue to grow - one area where deep learning could have a more profound impact
  3. Produces nice interpretations/visualizations (PSSM motifs) of what the DNA module is actually learning in the convolution (with added interpretations of the heterogeneity of single cells)
    1. One example of overcoming the black box (although the black box remains for the CpG and Fusion modules)

I am tagging the first author of the article @cangermueller to make sure I didn't miss anything and/or to add on to this summary.


agitter commented Aug 24, 2016

@gwaygenomics What did they do to create the 2D input for the CpG module if the single cells are initially unordered? Did they create a cell-cell similarity matrix? This relates to the discussion in #79.


gwaybio commented Aug 24, 2016

@agitter Yeah, I stared at this bit for a while - still not sure if I'm understanding correctly. From the supplement:

The methylation state and distance of observed neighbouring CpG sites are inputs to a 2d-convolutional layer. Importantly, this layer convolves each cell separately with the same convolutional filters to unlink the number of model parameters from the number of cells, which can be large.

It looks like the convolutions are only at the single-cell level, but weights are shared across cells. This makes more sense, since any structure across cells would be artificially imposed.


agitter commented Aug 25, 2016

@gwaygenomics You're right, and that makes a lot more sense. They say:

A 2d-convolution layer convolves the CpG neighbourhood of cells t independently at every position i by using filters w_f of dimension 1 x L x D and length L

There is still something interesting that they are doing with the distances between neighboring CpG sites that I need to look at further.

@cangermueller

Hi guys,

sorry for the late reply, I was traveling. I am happy to hear that you want to review DeepCpG.

I did not use a 2d convolutional kernel of size C x L to learn dependencies between C cells and L CpG sites, since here the information flow between cells would depend on the ordering of rows (=cells) in the input tensor. Instead, I used a 2d convolution with kernel size 1 x L to only learn dependencies between CpG sites. Dependencies between cells are learnt afterwards by fusion modules, i.e. hidden layers that are connected to all output neurons of the CpG module and the DNA module. This is the same as scanning the CpG neighbourhood of each cell with 1d convolutions, sharing their weights, and connecting the resulting hidden layers; however, this would be slower. Does this make sense?
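This design can be sketched in numpy: convolving each cell's CpG neighbourhood with the same 1 x L filter keeps the parameter count independent of the number of cells, and permuting the cells merely permutes the output rows, so no cell ordering is imposed (toy sizes and a single filter/channel are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

n_cells, n_sites, L = 5, 50, 3  # toy sizes; the real data has far more cells/sites

# Per-cell input: binary methylation states of neighbouring CpG sites
# (the paper also feeds in distances; one channel here for simplicity).
x = rng.integers(0, 2, size=(n_cells, n_sites)).astype(float)
filt = rng.normal(size=L)  # one shared filter, i.e. a 1 x L kernel

def conv1d(row, w):
    # Valid 1-D convolution of one cell's neighbourhood with the shared filter.
    return np.array([row[i:i + len(w)] @ w for i in range(len(row) - len(w) + 1)])

# A 2-D convolution with a 1 x L kernel equals running this 1-D convolution
# on every cell with shared weights: parameters do not grow with cell count.
out = np.stack([conv1d(x[c], filt) for c in range(n_cells)])

# Permuting the cells permutes the output rows identically, so the module
# cannot learn spurious structure from the arbitrary row order.
perm = rng.permutation(n_cells)
out_perm = np.stack([conv1d(x[c], filt) for c in perm])
assert np.allclose(out[perm], out_perm)
```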

Concerning point '1. DNA module': Prediction performance only increased slightly by using a window wider than +/- 250bp. As a trade-off between compute costs and performance, I therefore decided to use +/- 250bp.

Concerning point 3. ‘Why we should include it’: I tried to make the model interpretable by

  • Visualizing DNA motifs (weights of convolutional filters)
  • Correlating activations of convolutional filters with predicted CpG methylation states
  • Using learnt DNA motifs to predict cell-to-cell variability
  • Quantifying the influence of base-pair mutations and neighboring CpG sites by gradient back-propagation
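The last point can be illustrated with a toy stand-in for the real network, a logistic model on a one-hot sequence, where the input gradient has a closed form (all names and sizes here are hypothetical, not DeepCpG's):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy differentiable predictor: logistic regression on a one-hot DNA window.
seq_len = 20
w = rng.normal(size=(seq_len, 4))
x = np.zeros((seq_len, 4))
x[np.arange(seq_len), rng.integers(0, 4, size=seq_len)] = 1.0

def predict(x):
    return 1.0 / (1.0 + np.exp(-(x * w).sum()))

# Gradient of the sigmoid output w.r.t. the one-hot input: p * (1 - p) * w.
# Back-propagating this scores how strongly each possible base at each
# position would push the predicted methylation state up or down.
p = predict(x)
grad = p * (1.0 - p) * w

# First-order estimate of mutating position i to base b, vs. recomputing.
i, b = 5, 2
x_mut = x.copy()
x_mut[i] = 0.0
x_mut[i, b] = 1.0
estimate = grad[i, b] - grad[i, x[i].argmax()]
exact = predict(x_mut) - predict(x)
```

For a deep network the gradient is obtained by back-propagation instead of a closed form, but the interpretation of the scores is the same.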

Let me know if I can help you with anything else.

Best,
Christof


cgreene commented Aug 29, 2016

@cangermueller Thank you for providing context for your paper! Regarding point 1, what kind of computational costs would have been required to go to a larger window (say 1 kb)? Are there any practical concerns (e.g. the examples become somewhat more unique with a larger window, and thus more training data are required)?

I could easily see some discussion of the computational costs associated with scaling these methods included in the review. If you want to pitch in on the full review (via #2 and #88), we'd love to get your perspective.

@cangermueller

Twice the window size means twice as much GPU memory and compute time. The main concern is the memory bottleneck of GPUs, e.g. the cluster I used only had GPUs with 4 GB.
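A back-of-envelope calculation along these lines (filter count, batch size, and padding scheme are illustrative assumptions, not DeepCpG's actual configuration):

```python
# Activation memory of the first conv layer grows linearly with the input
# window, so doubling the window roughly doubles GPU memory for activations.
def conv_activation_bytes(window_bp, n_filters=128, batch=512, bytes_per_float=4):
    # With 'same' padding there is one output per input position per filter.
    return window_bp * n_filters * batch * bytes_per_float

mem_small = conv_activation_bytes(501)   # +/- 250 bp window, ~125 MB
mem_large = conv_activation_bytes(1001)  # +/- 500 bp window
ratio = mem_large / mem_small            # ~2x
```

Weights, gradients, and optimizer state add further overhead on top of this, which is why a 4 GB card fills up quickly at larger windows.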

I'll have a look at the entire review.


agitter commented Aug 29, 2016

In the quest for common themes across papers, note that the authors of #24 also wrote that memory was a limiting factor.

@cangermueller if you do decide you want to contribute more, I'd be interested in your thoughts on what topics weren't covered in your recent review #47. We all thought that was an excellent overview and aim to provide a different perspective here, as described in #2 and #88.


agitter commented Feb 17, 2017

As noted in #244, this preprint was updated this month. I haven't checked the differences, but there was mention of updated code at https://github.com/cangermueller/deepcpg/

We may consider highlighting the software as one example of a project that provides good documentation, IPython notebook examples, pre-trained models ("model zoo"), etc.

@cangermueller

The main differences are:

  • Different model architecture
    • DNA module has two conv layers instead of one
    • DNA module operates on a 1001 bp window instead of a 501 bp window
    • CpG module is a bidirectional GRU instead of a CNN
  • Extended evaluation
    • Five instead of two cell types, including human and mouse cells
    • Comparison of scBS-seq vs. scRRBS-seq
    • Evaluation of predicted mutation effects on known mQTLs
  • Results
    • New model architecture is more accurate
    • Performance gain is highest for scRRBS-seq profiled cells
    • Predicted mutation effects are higher for known mQTLs
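To make the CpG-module change concrete, here is a minimal numpy sketch of a bidirectional GRU over a CpG neighbourhood (the gate equations are the standard GRU; sizes, inputs, and initialization are toy assumptions, not the published model):

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_h, T = 2, 8, 25  # per-CpG input (state, distance), hidden size, neighbourhood length

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUCell:
    # Standard GRU gates; the same weights are applied at every position.
    def __init__(self):
        self.Wz, self.Wr, self.Wh = (
            rng.normal(scale=0.1, size=(d_in + d_h, d_h)) for _ in range(3)
        )

    def step(self, h, x):
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.Wz)                              # update gate
        r = sigmoid(xh @ self.Wr)                              # reset gate
        h_cand = np.tanh(np.concatenate([x, r * h]) @ self.Wh)  # candidate state
        return (1 - z) * h + z * h_cand

def run(cell, xs):
    h = np.zeros(d_h)
    for x in xs:
        h = cell.step(h, x)
    return h

# Bidirectional: one GRU reads the neighbourhood left-to-right, a second
# reads it right-to-left; their final states are concatenated downstream.
xs = rng.normal(size=(T, d_in))
fwd, bwd = GRUCell(), GRUCell()
h_bi = np.concatenate([run(fwd, xs), run(bwd, xs[::-1])])
```

Unlike the fixed-width convolution it replaced, the recurrence can in principle integrate over a variable-length neighbourhood.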

As you noted, I have also refactored the code base of DeepCpG and provided pre-trained models and notebooks. However, it is not yet perfect; I am still extending the documentation and notebooks.

Let me know if anything is still unclear!

@cangermueller

What is not mentioned in the manuscript: batch normalization yielded worse results, so it is not used. I also evaluated a couple of different architectures for the DNA module, including convolutional-recurrent models, ResNets, and dilated convolutions. However, a quite simple CNN with two conv layers and one FC layer with 128 units performed best.


agitter commented Feb 18, 2017

@cangermueller thanks for updating us here. It sounds like some major improvements.

I really like the runnable examples and effort to make the software reusable.


agitter commented Apr 11, 2017

I edited the original post with the DOI of the published version.

@cangermueller

Thanks!

dhimmel added a commit to dhimmel/deep-review that referenced this issue Nov 3, 2017
@cgreene cgreene closed this as completed Mar 12, 2018