
Pretraining with full_det=True #27

Closed
n-gao opened this issue May 17, 2021 · 5 comments

Comments

@n-gao
Contributor

n-gao commented May 17, 2021

Hi,
I noticed that the default parameter for full_det is True. So, I would expect that during pretraining one also fits the dense Slater determinant obtained by Hartree-Fock. However, it looks like the code only retrieves two blocks from the Slater determinant and fits them to the diagonal blocks of FermiNet, while fitting the rest of FermiNet's orbitals to 0.
Is there a good reason to train like this?
Wouldn't a better approach be to fit FermiNet's orbitals to the product of the two (spin-up and spin-down) matrices obtained by Hartree-Fock?

@jsspencer
Collaborator

The reason is deciding where to spend time -- in developing code, in testing it, and in computation.

Pretraining is just to create some initial state that is vaguely close to the ground state (i.e. within tens of Hartrees). There's a deliberate choice for pretraining to be both simple and quite crude (so the optimisation doesn't need to break symmetry, for example). This is also why we use a small basis by default. Better pretraining may or may not improve convergence -- I think there are (quickly) diminishing returns.

Note that this is not the only simplification we make during pre-training. There's also some discussion in (e.g.) #14 and maybe also in our papers.

@n-gao
Contributor Author

n-gao commented May 18, 2021

Thanks for the response! I see. I was just wondering because it seemed more involved to pad everything with zeros than to train with the full matrix.

@dpfau
Collaborator

dpfau commented May 18, 2021 via email

@n-gao
Contributor Author

n-gao commented May 18, 2021

There seems to be a small misunderstanding. Maybe this figure helps clear things up.
In the top-left corner is the RHF Slater determinant. In pretrain.py:91 we then only select the two left blocks. Then, in pretrain.py:155 we put these two blocks on the diagonal and fit them to the FermiNet determinant.
My question is then: why not just compute the MSE directly between the RHF Slater matrix and the FermiNet Slater matrix?
[Figure: the RHF Slater matrix with the two selected blocks, and the block-diagonal pretraining target they are placed into.]
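In code terms, here is a toy sketch of what I mean (illustrative shapes and random values only -- this paraphrases the construction, it is not the actual pretrain.py code):

```python
import numpy as np
from scipy.linalg import block_diag

n_up, n_down = 3, 2
n = n_up + n_down

# Hypothetical HF orbital values at all n electron positions:
# rows index electrons (up first, then down), columns index orbitals.
up_orbs = np.random.randn(n, n_up)      # spin-up orbitals at all electrons
down_orbs = np.random.randn(n, n_down)  # spin-down orbitals at all electrons

# "Select the two left blocks": up orbitals at the up-electron positions
# and down orbitals at the down-electron positions.
block_up = up_orbs[:n_up, :]      # (n_up, n_up)
block_down = down_orbs[n_up:, :]  # (n_down, n_down)

# "Put these two blocks on the diagonal": the (n, n) target the dense
# full_det orbitals are fitted to, with zeros in the off-diagonal blocks.
target_block_diag = block_diag(block_up, block_down)

# The alternative I'm asking about: fit directly to the dense matrix of
# all occupied orbitals evaluated at all electron positions.
target_dense = np.concatenate([up_orbs, down_orbs], axis=1)  # (n, n)
```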

@jsspencer
Collaborator

Note that we train against the UHF state by default rather than the RHF state. I am not convinced by your spin labels on the right-hand side -- line 90 selects the values of the alpha spin-orbitals evaluated at the positions of the alpha electrons, and similarly for the beta spin-orbitals and electrons.

The full_det option is experimental. Again, I don't think pretraining is particularly important and there was a choice to pretrain analogous elements for both full_det=True and full_det=False (i.e. a block diagonal, spin-factored wavefunction). Pretraining against a dense matrix instead also works (though a quick test on neon showed that it resulted in a network with a much higher initial energy).

More importantly, training a neural network as a function predictor by matching the outputs behaves quite poorly (e.g. arXiv:1706.04859) -- the pretraining algorithm is pretty crude and is really designed to give only a starting point which doesn't encounter numerical problems. FermiNet orbitals can be quite different from Hartree-Fock orbitals, so accurately representing Hartree-Fock from pretraining isn't a priority.
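For concreteness, "matching the outputs" means something like the following least-squares objective (a rough sketch with illustrative names, not the pretrain.py implementation):

```python
import jax
import jax.numpy as jnp

def pretrain_loss(params, batch_pos, network_orbitals, hf_target):
    """MSE between the network's (n, n) orbital matrix and the HF target,
    averaged over a batch of electron configurations."""
    def per_sample(pos):
        diff = network_orbitals(params, pos) - hf_target(pos)
        return jnp.mean(diff ** 2)
    return jnp.mean(jax.vmap(per_sample)(batch_pos))

# In practice the configurations come from MCMC and the gradient of this
# loss drives an optimiser, e.g.
#   loss, grads = jax.value_and_grad(pretrain_loss)(params, batch, net, hf)
```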
