
CGSchNet support #21

Open
jchodera opened this issue Nov 19, 2020 · 13 comments
Labels
question Further information is requested

Comments

@jchodera
Member

@peastman: Would love to see if we could support the CGSchNet model described in this excellent paper from @brookehus, since this could allow us to support much larger coarse-grained models as part of our ML integration.

@peastman
Member

Is there really any difference from SchNet? The cfconv layer seems to be nearly identical. The only difference they mention is that they change the activation function from softplus to tanh, and they don't give any explanation for why they made that change. (It also seems like a doubtful choice, since bounded activation functions often don't work as well as unbounded ones for hidden layers.)
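(For readers following along: the cfconv structure is indeed the same in both papers; only the nonlinearity differs. Below is a minimal, illustrative PyTorch sketch of a SchNet-style continuous-filter convolution with the activation passed in as a constructor argument. The class and argument names are hypothetical; this is not the NNPOps or cgnet code.)

```python
import torch
import torch.nn as nn

class ShiftedSoftplus(nn.Module):
    """softplus(x) - ln(2), the activation used in the original SchNet."""
    def forward(self, x):
        return nn.functional.softplus(x) - 0.6931471805599453

class CFConv(nn.Module):
    """Continuous-filter convolution: neighbor features are weighted by
    filters generated from interatomic distances expanded in radial basis
    functions, then summed over neighbors."""
    def __init__(self, n_features, n_rbf, activation=ShiftedSoftplus()):
        super().__init__()
        self.filter_net = nn.Sequential(
            nn.Linear(n_rbf, n_features), activation,
            nn.Linear(n_features, n_features), activation,
        )

    def forward(self, x, rbf, neighbors):
        # x: (n_atoms, n_features), rbf: (n_atoms, n_neighbors, n_rbf)
        # neighbors: (n_atoms, n_neighbors) indices of neighboring atoms
        w = self.filter_net(rbf)        # distance-dependent filters
        y = x[neighbors] * w            # gather neighbor features, apply filters
        return y.sum(dim=1)             # aggregate over neighbors

# Swapping in tanh, as CGSchNet does, is then just a constructor argument:
conv = CFConv(n_features=128, n_rbf=50, activation=nn.Tanh())
```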

@jchodera
Member Author

I'm not quite sure!

BTW, we've recently noticed that ANI uses CELU, which is not C2-continuous. This causes significant problems with some optimizers and in principle shouldn't be used with MD. They're retraining with softplus now. But we should double-check that the activation functions we implement are all C2-continuous.
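(As a quick illustration of the C2 point, not code from either project: CELU's second derivative jumps at x = 0, while softplus is smooth everywhere. A small autograd check:)

```python
import torch

def second_derivative(f, x):
    """Second derivative of an elementwise function at points x, via autograd."""
    x = x.clone().requires_grad_(True)
    (g,) = torch.autograd.grad(f(x).sum(), x, create_graph=True)
    (h,) = torch.autograd.grad(g.sum(), x)
    return h

x = torch.tensor([-1e-4, 1e-4], dtype=torch.float64)       # just below/above zero
print(second_derivative(torch.nn.functional.celu, x))      # ~[1.0, 0.0]  -> jump at x = 0
print(second_derivative(torch.nn.functional.softplus, x))  # ~[0.25, 0.25] -> continuous
```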

@peastman
Member

Agreed!

@brookehus

  • there's no difference from SchNet in terms of the structure! the cgnet code just allows the activation function to be changed easily, in contrast w/ (e.g.) schnetpack (and also has other modularities like writing your own normalization scheme; in general i support software that easily allows swapping out these kinds of things, but since peter's influenced a lot of my coding i doubt this is news)
  • @peastman re: doubtful choices, i believe we did obtain lower losses with shifted softplus, which is what canonical schnet uses, and what one would expect, as you point out. however, the simulations resulting from models trained with SSP were problematic (very rugged, iirc), especially as system size increases. we looked into it, and this seems to have to do with whether the activation fxn saturates or not. perhaps stronger regularization or model averaging could offset this instead of or in addition to the activation fxn switch. @nec4, feel free to add your knowledge here

@nec4

nec4 commented Nov 20, 2020

@peastman : @brookehus gave a good summary. For our systems and CG mapping choices, we found that using Tanh() as an activation function in place of (shifted) Softplus() resulted in more stable simulations using trained models. Of course, there is no problem for users to try any activation function that they wish.
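(To make the saturation point concrete, a small illustrative snippet rather than anything from cgnet: tanh is bounded and flattens out for large pre-activations, while shifted softplus keeps growing roughly linearly.)

```python
import torch

def shifted_softplus(x):
    # softplus(x) - ln(2), the activation in the original SchNet
    return torch.nn.functional.softplus(x) - torch.log(torch.tensor(2.0))

x = torch.tensor([1.0, 5.0, 20.0])
print(torch.tanh(x))        # ~[0.76, 1.00, 1.00]   -> saturates
print(shifted_softplus(x))  # ~[0.62, 4.31, 19.31]  -> grows roughly linearly
```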

@peastman
Member

Ok, I can add the option to use tanh instead. Can you look at the API in #18 and see if otherwise it looks good for your purposes? Since it's based on handwritten CUDA kernels, it obviously won't be as flexible as pure PyTorch code.

@brookehus

@nec4 do you mind taking a look when you have time? let me know if you need anything

@nec4

nec4 commented Nov 27, 2020

Of course - I will take a look. I'll let you guys know if there are any outstanding issues for our purposes with regard to the API. I will probably get back to you sometime early next week.

@nec4

nec4 commented Nov 30, 2020

Hello! - from what I see, the only differences between what's in #18 and our code are that we currently don't use neighbor cutoffs in our models (although we allow for the option of a simple neighbor list) and that we have not implemented cosine cutoffs - though in principle these features may be useful to us in the future. Additionally, we have found (following the original SchNet paper) that normalizing the output of the CfConv by the number of beads/atoms results in improved performance when using the network generatively (e.g., calculating forces for simulations). If there is more to discuss, please let me know!

@peastman
Member

peastman commented Dec 3, 2020

we have not implemented cosine cutoffs - though in principle these features may be useful to us in the future.

Is that because you specifically don't want the cosine cutoff function, or just because you haven't gotten to implementing it yet? In other words, is it a problem that the current implementation includes it?
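(For context, the cosine cutoff in question is the standard SchNet-style switching function that smoothly takes filter contributions to zero at the cutoff radius; a minimal sketch, assuming that definition, with an illustrative function name:)

```python
import torch

def cosine_cutoff(r, r_cut):
    """Standard SchNet-style cosine cutoff: smoothly decays filter
    contributions to zero as the distance r approaches r_cut."""
    f = 0.5 * (torch.cos(torch.pi * r / r_cut) + 1.0)
    return f * (r < r_cut).to(r.dtype)   # zero beyond the cutoff
```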

normalizing the output of the CfConv by the number of beads/atoms results in improved performance

Is that just a scaling of the output? Or does it require changes to the cfconv kernel itself?

@nec4

nec4 commented Dec 3, 2020

@peastman: For the cutoff, we just never needed or used it - I don't think that the current implementation including it is necessarily a problem (though I have not tried it in any of my models, so I cannot say for certain). For the scaling, it is just simple scalar normalization; i.e., we divide the final output of the CfConv by the number of beads in the system - so I don't think it requires changes to the kernel itself (you could just put another block after the CfConv layer that performs a simple scaling).
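(A hedged sketch of that idea, with hypothetical module names: the scaling can live in a small wrapper applied after the convolution, so the kernel itself is untouched.)

```python
import torch.nn as nn

class NormalizeByBeads(nn.Module):
    """Divide the cfconv output by the number of beads/atoms in the system,
    applied as a separate block after the convolution."""
    def __init__(self, conv):
        super().__init__()
        self.conv = conv

    def forward(self, x, *args, **kwargs):
        n_beads = x.shape[0]                     # x: (n_beads, n_features)
        return self.conv(x, *args, **kwargs) / n_beads
```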

@peastman
Member

peastman commented Dec 3, 2020

Ok, thanks. So it sounds like the only feature I need to add is the option to use tanh.

@nec4

nec4 commented Dec 4, 2020

@peastman: I think so too! Let us know if there is any more information we can provide.

@raimis raimis added the question Further information is requested label May 24, 2022