GAP with SOAP for a molecule #213

vvassilevg · 2020-06-26T12:00:39Z

I would like to fit the PES of molecules with GAP, using SOAP descriptors. As a test, I am using a glycine molecule (500 training points).
So far, I get high Mean Absolute Errors on the training set (the errors are of course even higher for the test set) for both energies and forces (above 0.07 eV and 0.7 eV/A, respectively).

I have tested different parameters for the gap_fit command. One example is:

gap_fit at_file=train.xyz \
gap={soap cutoff=5.0 \
          covariance_type=dot_product \
          zeta=2 \
          delta=0.016 \
          atom_sigma=0.3 \
          l_max=14 \
          n_max=14 \
          n_sparse=2000 \
          sparse_method=cur_points} \
force_parameter_name=forces \
e0_method=average \
default_sigma={0.001 0.2 0.0 0.0} \
do_copy_at_file=F sparse_separate_file=F \
gp_file=gap_soap.xml

I have tried different values for l_max, n_max and cutoff.

An example of a molecule in my train.xyz files is:

10
Lattice="200.0 0.0 0.0 0.0 200.0 0.0 0.0 0.0 200.0" Properties=species:S:1:pos:R:3:forces:R:3 energy=-7735.046780 pbc="T T T"
N -5.400440 5.468773 2.837348 0.197830 0.001017 0.374624
H -6.365580 4.176578 3.913868 0.136175 0.068701 -0.293331
H -3.910505 6.045725 3.925944 -0.120680 0.003343 -0.049358
C -4.498210 4.417443 0.471706 -0.433094 1.062071 -0.682114
C -2.707543 2.205386 0.405318 0.102343 -0.550705 -0.104142
O -1.920056 1.231402 -1.521152 -0.009198 0.153232 0.169461
O -2.067417 1.333725 2.758112 -0.292697 0.513378 -0.084358
H -0.893916 -0.033471 2.454792 0.283477 -0.392345 -0.057309
H -6.148205 4.006446 -0.739463 0.074871 -0.442791 0.154181
H -3.631109 5.954848 -0.713001 0.060973 -0.415900 0.572343

Energy is in eV, forces in eV/A, and in this case, positions are in a.u., but I have also trained using A.

Is there anything missing (or wrong) when setting the gap_fit command and the soap descriptor?

I will be very grateful for any help you can provide.

Best regards,

gabor1 · 2020-06-26T13:04:06Z

The positions MUST be in Angstroms, otherwise the force (in eV/A) is not the precise derivative of the energy. Make sure the forces are really forces and not gradients (as given by some quantum chemistry packages).

On your soap command:
- (n_max,l_max) = (14,14) is complete overkill, and also slows everything down. Set it to (12,6) to get high accuracy results, (8,4) for something quicker.
- You need to add the "add_species=T" command, otherwise all atomic species are ignored.
- This is further down, once you get the accuracy that you want, but if you want a potential that can be used for high temperature MD, you will need to do something to enforce atomic repulsion at close approach. Either an explicitly fitted 2b potential (can be added to the gap string, look at published papers), or some other baseline. In this case I recommend not using the "e0_method=average" command, but actually explicitly computing the isolated atom energies and adding them to the training set (they will be picked up and used for e0 for each species).

Let me know how you get on!
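The two unit checks above (positions in Angstrom, forces rather than gradients) can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical and not part of QUIP or GAP, the constants are the CODATA 2018 values, and the gradient conversion assumes the quantum chemistry package reports gradients in Hartree/Bohr.

```python
# Unit-conversion sketch. Helper names are hypothetical (not QUIP/GAP API).
BOHR_TO_ANGSTROM = 0.529177210903   # CODATA 2018
HARTREE_TO_EV = 27.211386245988     # CODATA 2018


def positions_to_angstrom(positions_bohr):
    """Convert a list of [x, y, z] positions from Bohr (a.u.) to Angstrom."""
    return [[c * BOHR_TO_ANGSTROM for c in xyz] for xyz in positions_bohr]


def gradient_to_force(gradient_ha_per_bohr):
    """Convert energy gradients (assumed Hartree/Bohr) to forces in eV/A.

    The force is minus the gradient, hence the sign flip.
    """
    factor = -HARTREE_TO_EV / BOHR_TO_ANGSTROM
    return [[c * factor for c in g] for g in gradient_ha_per_bohr]


def distance(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# First two atoms (N and H) of the example frame, positions given in a.u.
n_pos, h_pos = positions_to_angstrom([
    [-5.400440, 5.468773, 2.837348],
    [-6.365580, 4.176578, 3.913868],
])
nh_bond = distance(n_pos, h_pos)
print(f"N-H distance after conversion: {nh_bond:.3f} A")
```

After conversion the N-H distance comes out at roughly 1.03 A, a plausible bond length, which is a quick sanity check that the original coordinates really were in a.u. and that the converted file is in Angstrom.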

-- Gábor Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

vvassilevg · 2020-06-29T11:07:24Z

Dear Prof. Csányi,

Thank you for your message.

I have already set my gap_fit input to your suggested values of (n_max, l_max) and added "add_species=T", and I have also converted all positions to A. However, the results I am obtaining are practically the same (only the forces improve a little).

I think that my forces are not gradients. So, do you think that there is something else missing in my input?

Best regards.

gabor1 · 2020-06-29T11:43:28Z

Show me your new command line, and the scatter plots of target vs predicted energies and of target vs predicted force components, the latter coloured according to the element.
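Alongside the scatter plot, the same per-element comparison can be summarised numerically. The sketch below groups force components by element and fits a least-squares line predicted = slope * target + offset for each group; a slope far from 1 corresponds to points that do not lie around the x=y line. The function name and the toy numbers are made up for illustration and do not come from the actual training set.

```python
from collections import defaultdict


def parity_by_element(elements, target, predicted):
    """Least-squares slope and offset of predicted vs target force components,
    computed separately for each element.

    elements:  list of element symbols, one per atom
    target:    list of [fx, fy, fz] reference forces, one per atom
    predicted: list of [fx, fy, fz] model forces, one per atom
    """
    groups = defaultdict(list)
    for el, t_vec, p_vec in zip(elements, target, predicted):
        for t, p in zip(t_vec, p_vec):
            groups[el].append((t, p))

    stats = {}
    for el, pairs in groups.items():
        n = len(pairs)
        mean_t = sum(t for t, _ in pairs) / n
        mean_p = sum(p for _, p in pairs) / n
        var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
        cov = sum((t - mean_t) * (p - mean_p) for t, p in pairs)
        slope = cov / var_t if var_t > 0 else float("nan")
        stats[el] = {"n": n, "slope": slope, "offset": mean_p - slope * mean_t}
    return stats


# Toy data (made up): H forces are underestimated by half, C forces are exact.
elements = ["H", "C"]
target = [[0.1, -0.2, 0.3], [1.0, -0.5, 0.2]]
predicted = [[0.5 * c for c in target[0]], list(target[1])]

stats = parity_by_element(elements, target, predicted)
for el, s in stats.items():
    print(el, f"slope={s['slope']:.2f}", f"offset={s['offset']:.3f}")
```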

vvassilevg · 2020-06-29T16:08:46Z

The command is:

gap_fit at_file=train.xyz \
gap={soap cutoff=5.0 \
          covariance_type=dot_product \
          zeta=2 \
          delta=0.016 \
          atom_sigma=0.3 \
          add_species=T \
          n_max=8 \
          l_max=4 \
          n_sparse=4000 \
          sparse_method=cur_points} \
force_parameter_name=forces \
e0_method=average \
default_sigma={0.001 0.2 0.0 0.0} \
do_copy_at_file=F sparse_separate_file=F \
gp_file=gap_soap.xml

The plots are at the end of the message.

Do you think that (n_max, l_max) = (12,6) could make a huge improvement, or is it possible that I would need to add 2b or 3b potentials?

gabor1 · 2020-06-29T16:55:40Z

Why are you using such a small delta? It is supposed to be the typical energy per atom (really: the target function value for the descriptor, but here that is energy per atom), which I would expect to be about 0.1 eV, and you have about 1/10th of that. I wouldn't expect n_max,l_max to be your limiting factor.
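One rough way to get an order of magnitude for this heuristic is to look at the spread of the per-atom energy over the training set. The sketch below is an assumption about how to estimate it, not something gap_fit does itself: it subtracts the mean total energy and takes the standard deviation of the remainder divided by the number of atoms. The energies are made-up values for a 10-atom molecule.

```python
from statistics import mean, pstdev


def per_atom_energy_spread(energies_ev, n_atoms):
    """Standard deviation of (E - mean(E)) / n_atoms over a set of frames.

    Assumes every frame has the same number of atoms (a single molecule).
    """
    e_mean = mean(energies_ev)
    per_atom = [(e - e_mean) / n_atoms for e in energies_ev]
    return pstdev(per_atom)


# Made-up total energies (eV) for four frames of a 10-atom molecule.
toy_energies = [-7735.0, -7734.0, -7736.0, -7735.0]
spread = per_atom_energy_spread(toy_energies, n_atoms=10)
print(f"per-atom energy spread: {spread:.3f} eV")
```

If the number printed for the real training set is much larger than the delta in the command line, that is a hint that delta is set too small.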

mcaroba · 2020-06-29T17:27:48Z

I actually think the delta should not matter since there's only one GAP, except for the relative ratio of regularization to delta, which might be smudging the predictions (otherwise the alphas just get scaled). I would guess based on experience that the ratio of atom_sigma to the cutoff might be too small for only 8 radial basis functions to resolve accurately. I would increase n_max or, even better, increase atom_sigma. And increase the delta as Gábor suggested.

gabor1 · 2020-06-29T17:36:22Z

I think the delta matters because there are both energies and forces... it's best to try to stick to the heuristic. I agree that there is a rescaling of both delta and sigma that probably leaves things invariant...

n=8,l=4 is for a crude accuracy, but not this crude...

gabor1 · 2020-06-29T17:36:47Z

And the fact that the forces are not around the x=y line is the really troublesome thing

mcaroba · 2020-06-29T17:43:13Z

Yes, you are right about the deltas. I still think the atom_sigma is too small for that cutoff. Compare 0.5/3.7 for the a-C GAP to 0.3/5 for this one. Roughly twice as small and with the same number of radial functions. I think that could make for a noisy kernel.
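The comparison above is simple arithmetic on the two atom_sigma/cutoff pairs quoted in the comment; written out (the function name is just for illustration):

```python
def sigma_to_cutoff(atom_sigma, cutoff):
    """Ratio of the SOAP Gaussian width to the radial cutoff (both in A)."""
    return atom_sigma / cutoff


a_c_ratio = sigma_to_cutoff(0.5, 3.7)      # values quoted for the a-C GAP
glycine_ratio = sigma_to_cutoff(0.3, 5.0)  # values used in this thread

print(f"a-C GAP:  {a_c_ratio:.3f}")
print(f"glycine:  {glycine_ratio:.3f}")
print(f"factor:   {a_c_ratio / glycine_ratio:.2f}")
```

The a-C ratio is about 0.135 against 0.06 here, so the glycine setup has a relative width a little more than two times smaller, with the same number of radial functions.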

gabor1 · 2020-06-29T17:46:40Z

In general you are right Miguel, but the problems here seem to me bigger.

vvassilevg · 2020-06-30T14:28:29Z

Thank you both for your comments.

I have tested different values of delta and the best result I have got so far is with 0.25:
Energy MAE -- 0.005587 eV
Force MAE -- 0.132692 eV/A

Then I kept delta=0.25 and increased atom_sigma to 0.5; the errors are practically the same, but slightly worse:
Energy MAE -- 0.006723 eV
Force MAE -- 0.143231 eV/A

Do you think I can improve the accuracy of forces with even higher values of delta and/or using (n_max,l_max) = (12,6)?
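For reference, a minimal sketch of how MAE figures like the ones above can be computed from reference and predicted values. The thread does not say whether the energy error is taken per molecule or per atom, nor how the force components are averaged; the version below assumes total energies per frame and an average over all Cartesian force components. The numbers are toy values, not the actual results.

```python
def mean_absolute_error(target, predicted):
    """Mean of |target - predicted| over two equally long flat sequences."""
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    return sum(abs(t - p) for t, p in zip(target, predicted)) / len(target)


def flatten(vectors):
    """Turn a list of [fx, fy, fz] vectors into one flat list of components."""
    return [c for v in vectors for c in v]


# Toy values (made up) for two frames and one atom's force vector.
target_energies = [-10.0, -10.5]
predicted_energies = [-10.1, -10.4]
target_forces = [[0.2, 0.0, -0.1]]
predicted_forces = [[0.1, 0.0, 0.1]]

energy_mae = mean_absolute_error(target_energies, predicted_energies)
force_mae = mean_absolute_error(flatten(target_forces), flatten(predicted_forces))
print(f"Energy MAE -- {energy_mae:.6f} eV")
print(f"Force MAE -- {force_mae:.6f} eV/A")
```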

Best regards,

gabor1 · 2020-06-30T14:31:00Z

Your force errors look much, much better. You should think carefully about whether these are good enough for your purposes. You are likely hitting the locality limit. There are no easy solutions for making the descriptors longer range (you can use multiple soaps, but in any case you might need a lot more data etc.). Maybe it is time to compute some observable that is more directly related to what you want and see if it is good enough?

mcaroba · 2020-06-30T14:45:52Z

Indeed, it looks a lot better. It seems that I was mistaken about both deltas and the atom_sigma...

-- *Dr. Miguel Caro* *Academy of Finland Postdoctoral Researcher* Department of Electrical Engineering and Automation and Department of Applied Physics Aalto University <http://www.aalto.fi>, Finland *Email*: mcaroba@gmail.com *Work*: miguel.caro@aalto.fi *Website*: miguelcaro.org

vvassilevg · 2020-07-01T11:42:53Z

I guess now that I know how to tune the parameters I can test the models to see how they perform and also work with other systems.

Thank you for your help with this issue.

Best regards,

vvassilevg closed this as completed Jul 1, 2020
