This repository has been archived by the owner on Apr 27, 2023. It is now read-only.

Add dropout on predict #8

Merged: 2 commits into materialsvirtuallab:master on Jun 20, 2019

Conversation

@WardLT (Contributor) commented Jun 19, 2019

Adds an option to keep dropout active during predictions.

Useful as a method for assessing model uncertainty (https://arxiv.org/abs/1506.02142)

@chc273 (Contributor) commented Jun 20, 2019

Nice addition! @WardLT

@chc273 merged commit 7e1ea00 into materialsvirtuallab:master on Jun 20, 2019
@sgbaird commented Jan 31, 2022

@WardLT how does one use this to get UQ for model predictions?

@WardLT deleted the add_dropout branch on January 31, 2022, 23:16
@WardLT (Contributor, Author) commented Jan 31, 2022

The idea is to keep performing dropout during inference by setting dropout_on_predict=True, which means a different set of weights is used each time and the predictions will differ from call to call. The paper linked above suggests that this distribution of predictions can be used to measure model confidence (e.g., the standard deviation can serve as a proxy for the model's confidence, and percentiles of the prediction distribution can be used to set confidence intervals).
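
For the statistics side, here's a minimal sketch using plain NumPy; the `predictions` array is a stand-in for the outputs of running one entry through the model many times with dropout left on (the simulated values below are just placeholders):

```python
import numpy as np

# Stand-in for repeated predictions of a single entry with dropout left on;
# in practice these values come from calling the model many times on the same input.
rng = np.random.default_rng(0)
predictions = rng.normal(loc=-3.2, scale=0.05, size=100)

point_estimate = predictions.mean()    # single prediction (average)
uncertainty = predictions.std()        # proxy for model confidence
lower, upper = np.percentile(predictions, [2.5, 97.5])  # ~95% confidence interval

print(f"prediction: {point_estimate:.3f} +/- {uncertainty:.3f}")
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")
```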

Just realized now I forgot to add information about this to the documentation. Sorry about that.

@sgbaird commented Feb 1, 2022

Gotcha, so train multiple times and take the standard deviation of the multiple predictions, for example, if I'm understanding correctly. Thanks!

@WardLT (Contributor, Author) commented Feb 1, 2022 via email

@sgbaird commented Feb 1, 2022

@WardLT, ah, OK. Maybe what I'm still missing is how to go from the changes in this PR to standard deviations on the model predictions.

@WardLT (Contributor, Author) commented Feb 1, 2022

Sure, I'll step through the procedure in more detail to see what I'm mistakenly skipping over (pardon me if I'm covering things you already know); a minimal code sketch follows the list.

  1. Create a MEGNet model with dropout layers and dropout_on_predict=True, then train it. Nothing special here beyond setting the option. Keras trains the model using dropout, which means a different subset of weights is zeroed out on each batch.
  2. Perform multiple predictions on a single entry using the trained model. Here's the special part. Normally, Keras turns off the dropout layers during prediction so that all of the weights in the network are used. In our case, because we forced dropout_on_predict=True, Keras continues to perform dropout and zeroes a different subset of weights each time we make a prediction.
    1. Important note: dropout_on_predict=True means Keras uses the same dropout mask for every entry within a batch. So putting identical entries in the same batch will produce identical predictions (because Keras zeroes the same weights for the whole batch).
  3. Compute the mean and standard deviation of the predictions. Each inference on a single entry yields a subtly different prediction, and there is evidence that these variations are a good measure of the model's uncertainty. A common approach is to report the mean of the predictions as the single prediction and their standard deviation as a single confidence metric.
    1. If you're a friend of Bayesian methods, the variation of the predictions is thought to represent a posterior distribution of model outputs conditioned on the training data. In non-Bayesian speak, this means you can compute many other statistics from the distribution (e.g., the fraction of predictions that place a compound on the convex hull is the probability the material will be on the hull).
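
Here's a minimal sketch of those steps, assuming the dropout and dropout_on_predict options added in this PR and the usual MEGNetModel.train / predict_structure interface; structures, targets, and new_structure are hypothetical stand-ins, and any other constructor arguments (graph converter, featurizer settings) depend on your MEGNet version:

```python
import numpy as np
from megnet.models import MEGNetModel

# 1. Build a model with dropout layers that stay active at predict time.
#    (Remaining architecture/featurizer arguments depend on your MEGNet version.)
model = MEGNetModel(
    nfeat_edge=100,
    nfeat_global=2,
    dropout=0.5,              # include dropout layers
    dropout_on_predict=True,  # keep dropout on during inference (this PR)
)

# Train as usual; dropout behaves normally during training.
# `structures` (list of pymatgen structures) and `targets` are hypothetical data.
model.train(structures, targets, epochs=100)

# 2. Predict the same entry many times; with dropout left on, each call
#    zeroes out a different random subset of weights.
samples = np.array([model.predict_structure(new_structure).ravel()[0]
                    for _ in range(50)])

# 3. Summarize the distribution of predictions.
print("mean prediction:", samples.mean())
print("uncertainty (std):", samples.std())
```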

Does this clear things up?

@sgbaird commented Feb 1, 2022

Perfect! I think that cleared things up. Fit once, predict multiple times, and treat each prediction as a sample from the posterior distribution. Thank you!

@sgbaird commented Feb 2, 2022

@WardLT I've been looking into this a bit more and wonder if you could shed some additional light. This approach seems to work well when the distribution of the new test data is similar to the training data, but it gives overconfident uncertainties when the distributions are distinct (the latter being the usual case in materials discovery campaigns). In other words, there seems to be some agreement in the literature (see below) that bootstrapped ensembles and Monte Carlo dropout methods are often overconfident for out-of-domain predictions. What are your thoughts on the strengths and weaknesses of the dropout approach in a materials science context? Does the dropout uncertainty that you described fall into this category, or is there something distinct?

A key failing of ensemble metrics is that with sufficient model damping (e.g., by L2 regularization), variance over models can approach zero for compounds very distant from training data, leading to over-confidence in model predictions.
Another approach to obtain model-derived variances in dropout-regularized neural networks is Monte Carlo dropout (mc-dropout) (Fig. 1). In mc-dropout, a single trained model is run repeatedly with varied dropout masks, randomly eliminating nodes from the model (ESI Text S1). The variance over these predictions provides an effective credible interval with the modest cost of running the model multiple times rather than the added cost of model re-training. In transition metal complex discovery, we found that dropout-generated credible intervals provided a good estimate of errors on a set aside test partition but were over-confident when applied to more diverse transition metal complexes. Consistent with the ensemble and mc-dropout estimates, uncertainty in ANNs can be interpreted by taking a Bayesian view of weight uncertainty where a prior is assumed over the distribution of weights of the ANN and then updated upon observing data, giving a distribution over possible models. However, if the distribution of the new test data is distinct from training data, as is expected in chemical discovery, this viewpoint on model uncertainty may be incomplete.

Janet, J. P.; Duan, C.; Yang, T.; Nandy, A.; Kulik, H. J. A Quantitative Uncertainty Metric Controls Error in Neural Network-Driven Chemical Discovery. Chem. Sci. 2019, 10 (34), 7913–7922. https://doi.org/10.1039/C9SC02298H.

and

For example, when the OCHEM data set served as the test set, the RMSE remained nearly constant despite successive removal of predictions ranked lowest by STD but decreased with removal of predictions ranked lowest by SDC (Figure 1B). That is, STD failed to identify predictions with large errors, whereas SDC successfully identified and removed these predictions. This contrast was more pronounced in the RMSEs of the Bradley test set: whereas successive removal of predictions ranked lowest by SDC considerably reduced the RMSEs of the remaining predictions, removal of predictions ranked lowest by STD gradually increased the RMSE of the remaining predictions. In other words, only SDC was successful in removing predictions with large errors.

Liu, R.; Wallqvist, A. Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds. J. Chem. Inf. Model. 2019, 59 (1), 181–189. https://doi.org/10.1021/acs.jcim.8b00597.

Note: sum of distance-weighted contributions == SDC, standard deviation == STD

Btw, I think this PR is a good contribution. Just getting curious in the context of https://github.com/ml-evs/modnet-matbench/issues/18 and a few other projects and want to get out of my echo chamber.

Also relevant to MEGNet uncertainty quantification is the unlockNN repository mentioned in #338. Thanks also for the patience with me opening discussion back up on a PR from several years ago.
