The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology #213
Interesting study that uses binarized chemical compound vectors of length 166 (that look like this: http://www.nature.com/nprot/journal/v9/n9/fig_tab/nprot.2014.151_F2.html) combined with dosage concentration data to generate new compounds that may help prioritize candidate small molecules for treating cancer patients.

Biological Aspects
- Chemical compounds with dosage information as input
- Also included is each chemical's corresponding growth inhibition in a breast cancer cell line (MCF-7)

Computational Aspects
- An adversarial autoencoder (https://arxiv.org/abs/1511.05644) that encodes the input binarized chemical compound vectors into a length-5 latent layer
- A 2-layer encoder learns how the molecular fingerprint impacts growth inhibition
- The latent layer can thereby represent how well the corresponding fingerprint inhibits MCF-7 growth
- A 2-layer decoder handles reconstruction
- The adversarial training comes in as the authors sample from a learned prior distribution
- The sampled length-5 vector from the prior is then run through a discriminator that tries to distinguish real latent vectors from fake ones
- Growth inhibition is sampled from a normal distribution with mean = 5 and variance = 1, independently from the prior
- Once the model is trained, the sampled latent vector is decoded to output an artificial molecular fingerprint with a corresponding drug concentration
- This artificial fingerprint is compared against a reference of 72 million compounds from PubChem (https://pubchem.ncbi.nlm.nih.gov/)
- The authors then selected the top 10 compounds most similar to their predicted compounds, provided the decoded log concentration was less than -5.0 molar

Why we should include it in our review
I am not entirely sure whether we should consider this paper for our review. This is not my field of expertise, but I am interested in adversarial methods, so I gave this paper a thorough read. However, the methods, results, and evaluation remain a bit unclear to me. Another really nice thing about this paper is the availability of source code (https://github.com/spoilt333/onco-aae). Perhaps @spoilt333 can help to clarify some of my confusion. I outlined my understanding above, but a couple of points remain:

1. Why was the growth inhibition (GI) sampled independently? It seems to me that this is a critical component of the model: if the GI is high, then the drug is considered effective. Isn't this artificial sampling decoupled from the learning process?
2. Why did the authors choose to sample 640 vectors, and how exactly did they determine similar compounds from PubChem?
3. What is the discriminator? Is it using some sort of density metric or KL divergence compared to the latent distribution?
4. There is no discussion of how the model trains and whether it is actually learning something meaningful. The authors do nicely discuss several specific examples of "nearest" compounds, so it seems to be working, but it would be great to see some sort of model evaluation. For example, what is the reconstruction cost of the autoencoder portion of the model, and what was the stopping criterion? How does it change across epochs? What are the hyperparameters of the model, and how were they chosen?

Overall, I thought the paper elegantly laid out the problem of the very high drug development failure rate and the evolution of computational methods for compound prioritization. The authors also apply a promising approach that appears to work at first glance. It would be great to see this approach work really well, since it looks very promising for drug development and drug repurposing. However, given my concerns, perhaps it is not suitable for this review. Maybe we could talk about the *idea* of the approach in the discussion - I am not sure.
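To make the generation step in my summary concrete, here is a rough sketch of the sample-decode-filter loop as I understand it. The `decode` function is a stand-in with made-up arithmetic (the trained network lives in the authors' repository); only the sample count, latent size, GI distribution, and concentration filter come from the paper.

```python
import math
import random

random.seed(0)

def decode(z, gi):
    # Stand-in for the trained decoder: returns toy bit probabilities
    # (real fingerprints have 166 bits, not 5) plus a made-up predicted
    # log concentration. The arithmetic here is purely illustrative.
    probs = [1.0 / (1.0 + math.exp(-(zi + 0.1 * gi))) for zi in z]
    log_conc = sum(z) / len(z) - 5.5
    return probs, log_conc

candidates = []
for _ in range(640):                               # paper samples 640 vectors
    z = [random.gauss(0, 1) for _ in range(5)]     # length-5 latent prior
    gi = random.gauss(5, 1)                        # GI ~ N(mean=5, var=1)
    probs, log_conc = decode(z, gi)
    if log_conc < -5.0:                            # concentration filter
        candidates.append(probs)
```

Surviving candidates would then be matched against the PubChem reference fingerprints by similarity.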
Hello there.
I'll try to answer your points:
1. Actually, the GI neuron was trained jointly with the rest of the latent neurons as a predictor of the drug's "efficiency". But after training, it was used as a tuner for generating new drugs. The latent layer is a kind of noise and GI is a condition for the decoder net; both are used to produce the output.
2. There was no particular reason to pick exactly 640 samples, but we had to choose some number :) Since the output layer has a sigmoid activation, we treat it as the probability that the corresponding bit is present in the compound code. So "similarity" was just the likelihood of a compound being sampled from the generated vector.
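If I read this right, the decoder's sigmoid outputs are treated as independent Bernoulli bit probabilities, and "similarity" to a real compound is the likelihood of that compound's bits under those probabilities. A small sketch of that scoring with a hypothetical 4-bit example (real fingerprints are 166 bits):

```python
import math

def log_likelihood(bits, probs):
    # Log-likelihood of a binary fingerprint under independent Bernoulli
    # probabilities (the decoder's sigmoid outputs, one per bit).
    return sum(math.log(p) if b else math.log(1.0 - p)
               for b, p in zip(bits, probs))

probs = [0.9, 0.8, 0.1, 0.95]        # hypothetical generated bit probabilities
similar    = [1, 1, 0, 1]            # compound agreeing with the probabilities
dissimilar = [0, 0, 1, 0]            # compound disagreeing on every bit

score_similar = log_likelihood(similar, probs)
score_dissimilar = log_likelihood(dissimilar, probs)
```

The compound whose bits agree with the generated probabilities gets the higher (less negative) log-likelihood, so ranking reference compounds by this score yields the "most similar" hits.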
3. The discriminator is a standard GAN component. It is in fact a binary classifier that tries to determine whether a sample came from the "true" distribution or was generated by the network. In our case, the true distribution was Gaussian, and the fake samples came from the encoder.
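For readers unfamiliar with the GAN setup: the discriminator is just a binary classifier trained to tell prior samples from encoder outputs. A minimal one-dimensional logistic-regression sketch (illustrative data only; the "fake" distribution here is an arbitrary shifted Gaussian standing in for encoder outputs, not the real model):

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# "Real" samples come from the Gaussian prior the AAE matches; the "fake"
# samples are a shifted Gaussian standing in for untrained encoder outputs.
data = ([(random.gauss(0, 1), 1) for _ in range(200)] +
        [(random.gauss(3, 1), 0) for _ in range(200)])

# One-parameter logistic regression trained with per-sample gradient steps.
w, b, lr = 0.0, 0.0, 0.05
for _ in range(50):
    for x, y in data:
        p = sigmoid(w * x + b)
        w -= lr * (p - y) * x     # gradient of binary cross-entropy w.r.t. w
        b -= lr * (p - y)         # gradient w.r.t. b

accuracy = sum(((sigmoid(w * x + b) > 0.5) == (y == 1))
               for x, y in data) / len(data)
```

In the adversarial game, the encoder is then trained to make this classifier fail, which pushes its latent codes toward the Gaussian prior.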
4. This is really a big point, and we are going to make it clearer in the next paper. There are a few ideas.
The most important hyperparameter is the latent layer size, in my opinion. We ran experiments with different sizes and had a problem with large ones: we could not make the generator converge. With a few neurons, however, it converges well. We have no answer yet as to why it behaves this way, but the code has evolved a lot since the paper was published and we are going to try again.
We also ran a few experiments with different autoencoder depths, but ultimately chose the same number as in the original AAE paper: 2 fully connected layers each for the encoder and the decoder.
For exact error numbers we should ask my co-author Kuzma Khrabrov. He is in copy along with Alex Zhavoronkov.
Hi @spoilt333 - this is great! Thanks for your prompt response - I think this clears up a lot. I'll respond to your points below:
1. Ah, I see - this makes sense now, and I think it is a nice innovation! I can see then that the rejection criterion was whether the concentration of the corresponding reconstructed molecular fingerprint was reasonable.
2. Great, OK, I see now. I must have missed that the output layer was sigmoid.
3. Yep! I was wondering what the *architecture* of the discriminator was. It sounds like it could be a logistic regression classifier? Or did you sample several times from the generator and reject samples that fell beyond the distribution of the real latent space?
4. I have found this to be the case as well. Looking forward to the next paper.

Thanks again for responding so quickly - I will update my summary posted above accordingly.
I think it may not be clear from the code because of some optimization tricks.
You're right: the discriminator is a logistic regression classifier with a reformulated cost.
About the output layer: it has no activation in the code, but inside tf.nn.sigmoid_cross_entropy_with_logits a sigmoid is applied when evaluating the cost. And, of course, after generating new vectors we applied the sigmoid as well.
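For reference, tf.nn.sigmoid_cross_entropy_with_logits folds the sigmoid into the loss for numerical stability, so leaving the output layer activation-free in the code is equivalent to applying a sigmoid and then binary cross-entropy. A quick pure-Python check of that equivalence:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sce_with_logits(logit, label):
    # Numerically stable form TensorFlow documents for
    # sigmoid_cross_entropy_with_logits:
    #   max(x, 0) - x * z + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def bce(prob, label):
    # Plain binary cross-entropy on an already-squashed probability.
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

logit, label = 1.3, 1.0
loss_from_logits = sce_with_logits(logit, label)
loss_from_prob = bce(sigmoid(logit), label)
```

Both paths give the same loss, but the logits form avoids computing log(sigmoid(x)) directly, which underflows for large negative logits.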
http://doi.org/10.18632/oncotarget.14073