
Convolutional Networks on Graphs for Learning Molecular Fingerprints #52

Closed
agitter opened this issue Aug 5, 2016 · 3 comments

@agitter (Collaborator) commented Aug 5, 2016

http://arxiv.org/abs/1509.09292

At a glance: Related to virtual screening #45. Molecular fingerprints are one standard way to featurize chemical compounds for virtual screening. This paper adapts the standard fingerprinting algorithm by implementing it as a neural network and describes the advantages of doing so. Notably, it outputs real-valued fingerprints instead of binary fingerprints.
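The standard (circular, ECFP-style) fingerprinting algorithm the summary refers to hashes local atom neighborhoods into a fixed-length binary bit vector; the paper's contribution is to replace the hash with a differentiable neural operation, which is why its output is real-valued rather than binary. A minimal sketch of the conventional hashed/binary side (the function name and the toy `atom_envs` input are illustrative, not from the paper):

```python
def hashed_fingerprint(atom_envs, n_bits=64):
    """Fold hashed substructure identifiers into a fixed-length binary
    fingerprint, in the spirit of circular (ECFP-style) fingerprints.
    Each local atom environment turns on one bit; collisions overlap."""
    fp = [0] * n_bits
    for env in atom_envs:
        fp[hash(env) % n_bits] = 1  # the hash decides the bit position
    return fp

# Toy "atom environments" standing in for canonicalized neighborhoods.
envs = ["C", "C-N", "c1ccccc1", "C=O"]
fp = hashed_fingerprint(envs)
print(len(fp), sum(fp))  # 64 bits; at most 4 of them set
```

The hard hash assignment is exactly the non-differentiable step the neural version smooths out, which also makes the entries real-valued.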

@agitter (Collaborator, Author) commented Jan 2, 2017

This was presented at NIPS 2015 so I'll use that for the official reference instead of the arXiv version above. The reviews could provide useful context.

@swamidass (Contributor)

So re: #52, which is written by my friend Aparu. They have an interesting approach/architecture that should certainly be studied. Our group was excited about how they were able to connect predictions back to structure, too.

Respectfully, there are several problems with this study that severely limit any conclusions that might be drawn. It is true that this paper was published in 2015, but by 2017 or 2018 a paper like this should be unpublishable if reviewers are doing their jobs.

First, it appears that they are not doing the benchmarks correctly. At a minimum they should have done Maximum Similarity using ECFP-like fingerprints and Tanimoto similarity. Instead, they used an approach that I have already noted is substandard and that no one uses (a neural network or logistic regressor on the fingerprint vector). Ideally they should also consider SVMs (with a Tanimoto kernel) and IRVs. Likewise, they should have used standard datasets and reported results from other groups in a comparable way.
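The Maximum Similarity baseline asked for above can be sketched in a few lines: score each query compound by its highest Tanimoto similarity to any known active. This is a minimal illustration using arbitrary toy bit sets, not real ECFP fingerprints:

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints,
    represented as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def max_similarity_score(query: set, actives: list) -> float:
    """Score a query compound by its maximum Tanimoto similarity to any
    known active -- the nearest-neighbor ("Maximum Similarity") baseline."""
    return max(tanimoto(query, fp) for fp in actives)

# Toy fingerprints (sets of on-bit indices standing in for ECFP bits).
actives = [{1, 4, 7, 9}, {2, 4, 8}]
query = {1, 4, 9}
print(max_similarity_score(query, actives))  # 0.75
```

A Tanimoto-kernel SVM uses the same `tanimoto` function as the kernel entry between each pair of training fingerprints (e.g. via a precomputed kernel matrix).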

Second, they only validate on 3 datasets, and only one of them is bioactivity. Look at the IRV paper I published the same year: we validate on hundreds (I think over 1,000) of datasets, and we see robust improvement of our approach over other state-of-the-art methods. Three datasets is just too small a sample to draw any conclusions about accuracy, other than to point out that their improvement over a poorly executed baseline is quite small.

Third, the most exciting part of the paper was the ability to trace models back to specific substructures. However, this turns out to be an illusion. It was done in a fairly ad hoc way, and it is not clear whether the process can be automated or whether the specific weight cutoffs can be generalized or justified.

This is all important. I think it is possible that DL will produce real gains (over what has already been realized by IRVs, metabolism networks, etc.), but it is far from clear whether this is the case. Right now we have a proliferation of new methods without clear evidence that they are actually improving accuracy. In fact, the current data seem to suggest the opposite.

@swamidass (Contributor)

Looking at the reviewers' comments, they lacked expertise in chemical informatics.

dhimmel added a commit to dhimmel/deep-review that referenced this issue Nov 3, 2017
@cgreene cgreene closed this as completed Mar 12, 2018