# Accurate prediction of single-cell DNA methylation states using deep learning #39
Very well written article predicting binary methylation status (0: hypomethylated, 1: hypermethylated) in single-cell bisulfite sequencing experiments (scBS-seq). A secondary goal is to visualize the DNA motifs contributing to methylation status and to cellular methylation heterogeneity.

**Biology**

The authors use scBS-seq data from 32 mouse embryonic stem cells to build their deep network. The features of the network are described in detail and consist of DNA sequence elements and nearby methylation states of the target cells and other experimental cells. Since scBS-seq experiments cover only 20-40% of CpG sites because of low DNA yields, models that can impute methylation states in missing regions are extremely important. The authors also show variable predictive performance of their model depending on the sequence context of the target CpG (e.g. TSS, exon, promoter, CpG island, etc.).

**Computational Aspects**

There are three deep networks in the model, all of which are convolutional neural networks (CNNs) with convolutional layers (max pooling, ReLU activations) followed by one fully connected hidden layer. Some aspects of the architecture were difficult to decipher (e.g. stride of the convolutions, feature map sizes).
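For concreteness, a minimal Keras-style sketch of what one such CNN module could look like, folding in the training choices noted just below; the filter count, kernel size, and dropout rate are assumptions, not the paper's actual hyperparameters:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical CNN module over a one-hot encoded +/- 250bp DNA window
# (layer sizes are assumed, not taken from the paper).
model = keras.Sequential([
    keras.Input(shape=(501, 4)),                 # 2*250+1 positions, 4 bases
    layers.Conv1D(128, 11, activation="relu",    # ReLU activations
                  kernel_initializer="glorot_uniform"),  # Glorot init
    layers.MaxPooling1D(4),                      # max pooling
    layers.Dropout(0.25),                        # dropout regularization
    layers.Flatten(),
    layers.Dense(128, activation="relu"),        # fully connected hidden layer
    layers.Dense(1, activation="sigmoid"),       # binary methylation state
])
model.compile(optimizer=keras.optimizers.Adam(),  # Adam adaptive learning
              loss="binary_crossentropy")

# Early stopping against a validation set:
stopper = keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=[stopper])
```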
The model is trained with dropout, Glorot-initialized weights, and Adam adaptive learning with early stopping. What is especially nice is the availability of all the code used to implement the model.

**Why we should include it in our review**
I am tagging the first author of the article @cangermueller to make sure I didn't miss anything and/or to add on to this summary.
@gwaygenomics What did they do to create the 2D input for the CpG module if the single cells are initially unordered? Did they create a cell-cell similarity matrix? This relates to the discussion in #79.
@agitter Yeah, I stared at this bit for a while and am still not sure I'm understanding correctly. From the supplement, it looks like the convolutions are only at the single-cell level, but weights are shared across cells. This makes more sense, since any structure across cells would be artificially imposed.
@gwaygenomics You're right, and that makes a lot more sense. There is still something interesting that they are doing with the distances between neighboring CpG sites that I need to look at further.
Hi guys, sorry for the late reply, I was traveling. I am happy to hear that you want to review DeepCpG.

I did not use a 2d convolutional kernel of size C x L to learn dependencies between C cells and L CpG sites, since there the information flow between cells would depend on the ordering of the rows (= cells) in the input tensor. Instead, I used a 2d convolution with kernel size 1 x L to learn dependencies only between CpG sites. Dependencies between cells are learnt afterwards by fusion modules, i.e. hidden layers that are connected to all output neurons of the CpG module and the DNA module. This is equivalent to scanning the CpG neighbourhood of each cell with 1d convolutions, sharing their weights, and connecting the resulting hidden layers; however, that would be slower. Does this make sense?

Concerning point 1, 'DNA module': prediction performance increased only slightly when using a window wider than +/- 250bp. As a trade-off between compute costs and performance, I therefore decided to use +/- 250bp.

Concerning point 3, 'Why we should include it': I tried to make the model interpretable by

Let me know if I can help you with anything else. Best,
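To illustrate the shared-weight idea, here is a minimal sketch (assumed shapes and layer sizes, not the actual DeepCpG code) of a 1 x L convolution over each cell's CpG neighbourhood followed by a fusion layer:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_cells, n_neighbors = 32, 25   # assumed: cells x neighbouring CpG sites
n_channels = 2                  # assumed: methylation state + distance per site

# CpG module: a (1, L) kernel scans each cell's neighbourhood independently,
# so weights are shared across cells and no cell ordering is imposed.
cpg_in = keras.Input(shape=(n_cells, n_neighbors, n_channels))
h = layers.Conv2D(32, kernel_size=(1, 5), activation="relu")(cpg_in)
h = layers.MaxPooling2D(pool_size=(1, 2))(h)
cpg_out = layers.Dense(256, activation="relu")(layers.Flatten()(h))

# Stand-in for the DNA module's output vector.
dna_out = keras.Input(shape=(128,))

# Fusion module: a hidden layer connected to all output neurons of both
# modules, which is where dependencies between cells are learnt.
fused = layers.Dense(256, activation="relu")(
    layers.Concatenate()([cpg_out, dna_out]))
pred = layers.Dense(1, activation="sigmoid")(fused)

model = keras.Model(inputs=[cpg_in, dna_out], outputs=pred)
```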
@cangermueller Thank you for providing context for your paper! Regarding point 1, what kind of computational costs would have been required to go to a larger window (say 1k bp)? Are there any practical concerns (e.g. the examples become somewhat more unique with a larger window, and thus more training data is required)? I could easily see the computational costs associated with scaling these methods being discussed in the review. If you want to pitch in on the full review (via #2 and #88), we'd love to get your perspective.
Twice the window size means twice as much GPU memory and compute time. The main concern is the memory bottleneck of GPUs; e.g., the cluster I used only had GPUs with 4 GB. I'll have a look at the entire review.
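A back-of-envelope sketch of why this scales linearly (all numbers here are assumed, purely for illustration): the activations of a 1D convolutional layer grow linearly with the input window, so doubling the window roughly doubles that layer's memory footprint.

```python
# Rough activation-memory estimate for one conv layer (assumed sizes).
def conv_activation_mb(window_bp, n_filters=128, batch_size=128,
                       bytes_per_float=4):
    seq_len = 2 * window_bp + 1  # one-hot positions in a +/- window_bp window
    return seq_len * n_filters * batch_size * bytes_per_float / 1e6

for w in (250, 500, 1000):
    print(f"+/- {w}bp window: ~{conv_activation_mb(w):.0f} MB of activations")
# Stacked layers, gradients, and optimizer state multiply this further,
# which is how a 4 GB GPU quickly becomes the bottleneck.
```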
In the quest for common themes across papers, note that the authors of #24 also wrote that memory was a limiting factor. @cangermueller if you do decide you want to contribute more, I'd be interested in your thoughts on what topics weren't covered in your recent review #47. We all thought that was an excellent overview and aim to provide a different perspective here, as described in #2 and #88.
As noted in #244, this preprint was updated this month. I haven't checked the differences, but there was mention of updated code at https://github.com/cangermueller/deepcpg/. We may consider highlighting the software as one example of a project that provides good documentation, IPython notebook examples, pre-trained models ("model zoo"), etc.
The main differences are:
As you noted, I have also refactored the code-base of DeepCpG and provided pre-trained models and notebooks. However, it is not yet perfect; I am still extending the documentation and notebooks. Let me know if anything is still unclear!
What is not mentioned in the manuscript: batch normalization yielded worse results, so it is not used. I also evaluated a couple of different architectures for the DNA module, including convolutional-recurrent models, ResNets, and dilated convolutions. However, a quite simple CNN with two conv layers and one FC layer with 128 units performed best.
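A sketch of that final choice and the batch-norm variant that was compared (filter counts and kernel sizes are assumptions, not the published hyperparameters):

```python
from tensorflow import keras
from tensorflow.keras import layers

def dna_module(batch_norm=False):
    """Hypothetical DNA module: two conv layers plus one 128-unit FC layer,
    with an optional batch-norm variant (reported to perform worse)."""
    model = keras.Sequential([keras.Input(shape=(501, 4))])
    for n_filters in (128, 256):                     # two conv layers
        model.add(layers.Conv1D(n_filters, 11, activation="relu"))
        if batch_norm:
            model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling1D(4))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))  # one FC layer, 128 units
    model.add(layers.Dense(1, activation="sigmoid"))
    return model
```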
@cangermueller thanks for updating us here. Those sound like major improvements. I really like the runnable examples and the effort to make the software reusable.
I edited the original post with the DOI of the published version.
Thanks!
Published: https://doi.org/10.1186/s13059-017-1189-z
Preprint: https://dx.doi.org/10.1101/055715