CGSchNet support #21
Is there really any difference from SchNet? The cfconv layer seems to be nearly identical. The only difference they mention is that they change the activation function from softplus to tanh, and they don't give any explanation for why they made that change. (It also seems like a doubtful choice, since bounded activation functions often don't work as well as unbounded ones for hidden layers.)
I'm not quite sure! BTW, we've recently noticed that ANI uses CELU, which is not C2-continuous. This causes significant problems with some optimizers, and in principle it shouldn't be used with MD. They're now retraining with softplus. But we should double-check that the activation functions we implement are all C2-continuous.
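For anyone who wants to verify this numerically, here's a minimal sketch (plain PyTorch, not tied to this repo) that probes the second derivative of CELU vs. Softplus on either side of x = 0:

```python
import torch

def second_derivative(fn, x):
    # Elementwise second derivative via double backward.
    x = x.clone().double().requires_grad_(True)
    (g,) = torch.autograd.grad(fn(x).sum(), x, create_graph=True)
    (h,) = torch.autograd.grad(g.sum(), x)
    return h

xs = torch.tensor([-1e-4, 1e-4])
print(second_derivative(torch.nn.CELU(), xs))      # ~[1, 0]: curvature jumps at 0, so not C2
print(second_derivative(torch.nn.Softplus(), xs))  # ~[0.25, 0.25]: smooth through 0
```

The jump in CELU's curvature at the origin is exactly the kind of discontinuity that can trip up second-order optimizers and hurt energy conservation in MD.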
Agreed!
@peastman: @brookehus gave a good summary. For our systems and CG mapping choices, we found that using Tanh() as an activation function in place of (shifted) Softplus() resulted in more stable simulations using trained models. Of course, there is no problem for users to try any activation function that they wish.
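For concreteness, here is a minimal sketch of what a configurable activation could look like in a SchNet-style filter-generating network (the module and argument names are hypothetical, not the API from #18):

```python
import math
import torch.nn as nn
import torch.nn.functional as F

class ShiftedSoftplus(nn.Module):
    # Standard SchNet activation: softplus(x) - log(2), which is zero at the origin.
    def forward(self, x):
        return F.softplus(x) - math.log(2.0)

def filter_network(n_rbf, n_filters, activation="ssp"):
    # CGSchNet-style variant: pass activation="tanh" to swap in nn.Tanh().
    act = nn.Tanh() if activation == "tanh" else ShiftedSoftplus()
    return nn.Sequential(
        nn.Linear(n_rbf, n_filters), act,
        nn.Linear(n_filters, n_filters), act,
    )
```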
Ok, I can add the option to use tanh instead. Can you look at the API in #18 and see if it otherwise looks good for your purposes? Since it's based on handwritten CUDA kernels, it obviously won't be as flexible as pure PyTorch code.
@nec4 do you mind taking a look when you have time? Let me know if you need anything.
Of course - I will take a look. I'll let you guys know if there are any outstanding issues with the API for our purposes. I will probably get back to you guys sometime early next week.
Hello! From what I see, the only differences between what's in #18 and our code are that we currently don't use neighbor cutoffs in our models (although we allow for the option of a simple neighbor list) and that we have not implemented cosine cutoffs - though in principle these features may be useful to us in the future. Additionally, we have found (following the original SchNet paper) that normalizing the output of the CfConv by the number of beads/atoms results in improved performance when using the network generatively (e.g., calculating forces for simulations). If there is more to discuss, please let me know!
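For reference, the cosine cutoff being discussed is the smooth switching function from the original SchNet paper; a minimal standalone sketch, independent of either codebase:

```python
import math
import torch

def cosine_cutoff(r, r_cut):
    # 0.5 * (cos(pi * r / r_cut) + 1) inside the cutoff, exactly zero beyond it.
    f = 0.5 * (torch.cos(math.pi * r / r_cut) + 1.0)
    return f * (r < r_cut).to(r.dtype)
```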
Is that because you specifically don't want the cosine cutoff function, or just because you haven't gotten to implementing it yet? In other words, is it a problem that the current implementation includes it?
Is that just a scaling of the output? Or does it require changes to the cfconv kernel itself?
@peastman: For the cutoff, we just never needed it/used it - I don't think that the current implementation including it is necessarily a problem (though I have not tried it in any of my models, so I cannot say for certain). For the scaling, it is just a simple scalar normalization; i.e., we just divide the final output of the CfConv by the number of beads in the system - so I don't think it requires changes to the kernel itself (you could just put another block after the CfConv layer that performs a simple scaling).
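A minimal sketch of that scaling as a separate block placed after the cfconv layer (the wrapper name and the assumed (batch, beads, features) layout are illustrative, not part of #18):

```python
import torch.nn as nn

class BeadNormalizedCfConv(nn.Module):
    # Wraps an existing cfconv module and divides its output by the number of beads.
    def __init__(self, cfconv):
        super().__init__()
        self.cfconv = cfconv

    def forward(self, features, *args, **kwargs):
        out = self.cfconv(features, *args, **kwargs)
        n_beads = features.shape[-2]  # assumes a (..., beads, features) layout
        return out / n_beads
```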
Ok, thanks. So it sounds like the only feature I need to add is the option to use tanh. |
@peastman: I think so too! Let us know if there is any more information we can provide.
@peastman: Would love to see if we could support the CGSchNet model described in this excellent paper from @brookehus, since this could allow us to support much larger coarse-grained models as part of our ML integration.