Prediction of H2 using the pre-trained DimeNet++ model #13

qikuizhu · 2021-02-04T12:58:23Z

I trained the DimeNet++ model on the QM9 dataset and then used the trained model to predict the atomization energy of the hydrogen molecule, H2, the simplest molecule, as a benchmark. However, the result for H2 was poor. I may have made a mistake somewhere, so please try it yourself and report your result.

Because the interatomic distance (i.e. bond length) of H2 is 0.74 Å, the 3D structure of the hydrogen molecule can be written as

atom x y z
H 0.0 0.0 0.0
H 0.74 0.0 0.0

and the atomization energy is 4.54 eV (see https://wiki.fysik.dtu.dk/gpaw/dev/tutorials/H2/atomization.html). The prediction of the trained DimeNet++ model, however, was about 9.79 eV, so its error is 9.79 - 4.54 = 5.25 eV ≈ 121 kcal/mol. Such a poor result for the simplest molecule suggests that something is wrong, because on the molecules in the QM9 dataset the DimeNet++ model predicts the atomization energy with an MAE below 0.01 eV = 0.23 kcal/mol.
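For anyone who wants to double-check the unit conversion, here is a minimal sketch. It assumes the usual factor of about 23.0605 kcal/mol per eV; the function and variable names are made up for illustration and are not part of the DimeNet++ code.

```python
# Assumed conversion factor: 1 eV per molecule is about 23.0605 kcal/mol.
EV_TO_KCAL_PER_MOL = 23.0605


def ev_to_kcal_per_mol(energy_ev: float) -> float:
    """Convert an energy from eV (per molecule) to kcal/mol."""
    return energy_ev * EV_TO_KCAL_PER_MOL


predicted_ev = 9.79  # DimeNet++ prediction for H2 reported in this post
reference_ev = 4.54  # reference atomization energy of H2 from the link above
error_ev = predicted_ev - reference_ev

print(f"H2 error: {error_ev:.2f} eV = {ev_to_kcal_per_mol(error_ev):.0f} kcal/mol")
print(f"QM9 MAE:  0.01 eV = {ev_to_kcal_per_mol(0.01):.2f} kcal/mol")
```

With this factor the H2 error comes out at roughly 121 kcal/mol, several hundred times the MAE reported on QM9.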

Probably the main reason is that the QM9 dataset does not include diatomic molecules such as H2, N2, and O2. Even if a machine learning model achieves a low MAE on QM9, can we say that it has captured the molecular energy if its error for the simplest molecule, H2, is over 100 kcal/mol?

gasteigerjo · 2021-02-08T15:56:55Z

This is indeed a very interesting point. In my quick experiment, the pretrained model in this repository yields a smaller but still very high error of 2.47 eV.

From a physical perspective, H2 is certainly the simplest molecule there is, but from the perspective of the QM9 data it is an extreme outlier. There are no H-H bonds in the QM9 dataset, and the model will predict something similar to the data and bonds it has seen.

I would say this beautifully shows the weaknesses of a purely data-driven approach: by mostly decoupling the model from the physical ground truth, you can get extreme outliers in cases you would not expect, and your extrapolation abilities are severely limited. This is certainly something that can be improved, but I think we should always be aware of the chemical space our data covers.

Remember that the model only knows the atom types and the geometry. It knows only those things about the wave function that it has learned from this data. Atom types and direct interactions that it has never seen before will be nigh impossible for it to predict.
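To make the point about missing H-H bonds concrete, here is a rough sketch of how one could check which bonded element pairs of a query molecule never occur in a training set. Everything in it is an illustrative assumption: the single 1.2 Å distance cutoff is a crude stand-in for proper element-dependent bond detection, the one-molecule "training set" is a toy placeholder for QM9, and the helper name is hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical cutoff in Å. It is only adequate for bonds involving hydrogen;
# a real check would use element-dependent covalent radii.
BOND_CUTOFF = 1.2


def bonded_element_pairs(atoms, cutoff=BOND_CUTOFF):
    """Return the set of element pairs closer than `cutoff` in one molecule."""
    pairs = set()
    for (el_a, pos_a), (el_b, pos_b) in combinations(atoms, 2):
        if dist(pos_a, pos_b) < cutoff:
            pairs.add(tuple(sorted((el_a, el_b))))
    return pairs


# Toy stand-in for the training data: a single approximate water geometry (Å).
water = [
    ("O", (0.0, 0.0, 0.1173)),
    ("H", (0.0, 0.7572, -0.4692)),
    ("H", (0.0, -0.7572, -0.4692)),
]
training_set = [water]

# H2 geometry from the original post.
h2 = [("H", (0.0, 0.0, 0.0)), ("H", (0.74, 0.0, 0.0))]

seen = set().union(*(bonded_element_pairs(mol) for mol in training_set))
unseen = bonded_element_pairs(h2) - seen
print("Bonded pairs in H2 that the training set never shows:", unseen)
```

In this toy setup the H-H pair is flagged as unseen, because the two hydrogens in water are about 1.5 Å apart and fall outside the cutoff. Running the same kind of check over the real training data before trusting a prediction is one simple way to stay aware of the chemical space the data covers.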

gasteigerjo closed this as completed Feb 8, 2021
