On the validity score (improving understanding and/or modifications) #44

Open
sgbaird opened this issue Aug 5, 2022 · 4 comments

Comments

@sgbaird
Member

sgbaird commented Aug 5, 2022

https://twitter.com/keeeto2000/status/1555143104650428419 by @keeeto (@keeeto2000 on Twitter)

Very timely idea! I am not sure I totally follow the validity score - it sounds a bit like an FID score. If this runs on top of xtal2png, would it be possible to just implement an FID score? Obvs it might be a bit slower because Inception...

The section "As a distance between probability distributions (the FID score)" has a "see also" to "Wasserstein metric § Normal distributions".

And in the Fréchet inception distance wiki article:

In other words, it is the 2-Wasserstein distance on $\mathbb{R}^{n}$.
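For reference, a sketch of the closed form behind that statement. For two Gaussians $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ on $\mathbb{R}^{n}$ (the symbols are my notation, not taken from this thread), the squared 2-Wasserstein (Fréchet) distance is:

$$
d^2 = \lVert \mu_1 - \mu_2 \rVert_2^2 + \operatorname{tr}\left( \Sigma_1 + \Sigma_2 - 2 \left( \Sigma_1 \Sigma_2 \right)^{1/2} \right)
$$

FID applies this to the mean and covariance of Inception features, so it rests on treating those features as Gaussian.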

Wondering whether this needs to be renamed, explained differently, or if it needs to be changed to a different calculation. The intention behind the validity score is to ensure that the generated structures are "reasonable" and "valid" (i.e. realistic), and a set of structures with a similar space group number distribution to known structures from train+test seemed like a good way to tell, especially with the difficulty some models have had with generating structures other than P1 symmetry. Using e_above_hull would be another option, but this requires a high-fidelity property predictor and could lead to bias depending on the model used.
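To make the comparison concrete, here is a minimal sketch of a distance between space group number distributions. It assumes the 1-Wasserstein distance is taken over space group numbers 1 to 230 treated as points on a line, which is itself a modeling assumption. The function names (`spg_histogram`, `wasserstein_1d`, `spg_distance`) are hypothetical and this is not necessarily how the score is computed in the repository.

```python
from collections import Counter
from itertools import accumulate

N_SPACE_GROUPS = 230  # space group numbers run from 1 to 230


def spg_histogram(spg_numbers):
    """Return normalized frequencies for space groups 1..230."""
    if not spg_numbers:
        raise ValueError("need at least one space group number")
    counts = Counter(spg_numbers)
    total = len(spg_numbers)
    return [counts.get(k, 0) / total for k in range(1, N_SPACE_GROUPS + 1)]


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two histograms on the same unit-spaced grid.

    In one dimension this equals the summed absolute difference of the CDFs.
    """
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def spg_distance(generated, reference):
    """Distance between generated and reference space group distributions."""
    return wasserstein_1d(spg_histogram(generated), spg_histogram(reference))


# A generator that only ever produces P1 (space group 1) is far from a
# reference set that contains higher-symmetry structures.
only_p1 = [1, 1, 1, 1]
reference = [1, 2, 225, 225]
print(spg_distance(only_p1, reference))
```

A smaller distance means the generated set covers symmetries in proportions closer to the reference set. How to map that distance onto a bounded score is left open here.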

@sgbaird
Member Author

sgbaird commented Aug 5, 2022

Related:

(1) Dimitrakopoulos, P.; Sfikas, G.; Nikou, C. Wind: Wasserstein Inception Distance For Evaluating Generative Adversarial Network Performance. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2020; pp 3182–3186. https://doi.org/10.1109/ICASSP40776.2020.9053325.

I.e., relaxing the Gaussian assumption (Fréchet --> Wasserstein)

(from abstract)

... We extend FID by relaxing the Gaussian hypothesis of the related inception features and extend it for non-Gaussian, multimodal distributions. ...

@keeeto

keeeto commented Aug 5, 2022

I think that the score is a good one. I didn't think of the point regarding the difficulty in generating anything but P1 - so the bit of text that you added there certainly helps to motivate the score. I might call it something like crystallographic validity or similar, as it is based on the symmetry rather than the chemistry. You could have a load of crazy chemical bonds, for example, and yet have a reasonable space group distribution.

If you wanted to check for chemical bonding validity, an interesting approach might be to calculate the elemental embeddings using our recent SkipAtom https://github.com/lantunes/skipatom https://www.nature.com/articles/s41524-022-00729-3 . Essentially, SkipAtom generates an embedding based on observed chemical environments; it is based on word embeddings from NLP. If your SkipAtom embedding is similar to the training set, then you should have similar chemistry - this would be a kind of "crystal chemistry validity".
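One way this idea could be sketched, under loud assumptions: the element vectors below are toy 2-D placeholders (not real SkipAtom values, and this does not use the SkipAtom package API), compositions are mean-pooled by atomic fraction, and the two sets are compared by cosine similarity of their average vectors. All function names are hypothetical.

```python
import math

# Toy 2-D vectors standing in for learned element embeddings.
# These are NOT real SkipAtom values; they only illustrate the mechanics.
TOY_EMBEDDINGS = {
    "Na": [1.0, 0.0],
    "Cl": [0.0, 1.0],
    "K": [0.9, 0.1],
}


def pool_composition(composition, embeddings):
    """Mean-pool element vectors, weighted by atomic fraction."""
    total = sum(composition.values())
    dim = len(next(iter(embeddings.values())))
    pooled = [0.0] * dim
    for element, amount in composition.items():
        for i, value in enumerate(embeddings[element]):
            pooled[i] += value * amount / total
    return pooled


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def chemistry_similarity(generated, training, embeddings):
    """Cosine similarity between the average pooled vectors of two sets."""
    gen = mean_vector([pool_composition(c, embeddings) for c in generated])
    ref = mean_vector([pool_composition(c, embeddings) for c in training])
    return cosine_similarity(gen, ref)


generated = [{"K": 1, "Cl": 1}]
training = [{"Na": 1, "Cl": 1}]
print(chemistry_similarity(generated, training, TOY_EMBEDDINGS))
```

Comparing only the set averages discards a lot of information; comparing full distributions of pooled vectors would be a natural refinement.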

@sgbaird
Member Author

sgbaird commented Aug 6, 2022

@keeeto I like the idea of considering both structural and chemical validity, which is in line with some other recent changes #39 (comment). I've been hoping to use SkipAtom at some point, so I'm glad you're bringing it up in this context. I've also been interested in using it as the elemental featurizer for CrabNet in a Matbench submission for matbench_expt_gap. I already incorporated a SkipAtom CSV file into CrabNet lantunes/skipatom#6, so just a matter of fleshing out a short script and preparing a benchmark submission.

@keeeto

keeeto commented Aug 8, 2022

Oh - that sounds cool. I will be really interested to hear how SkipAtom + CrabNet works :). BTW we have been using a slight mod of CrabNet in an upcoming piece of work - nice to see some good cross-pollination; Open Source win!!!


2 participants