
WIP: Implement bootstrap #107

Merged: 85 commits, Apr 4, 2023
Conversation

yonatank93
Contributor

@yonatank93 yonatank93 commented Mar 14, 2023

We want to implement a bootstrap method for UQ. In general, we do this by sampling from the list of compute arguments with replacement. Then, we train the model using this sample of compute arguments. The optimal parameters give one point in the ensemble.
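The resample-retrain loop described above can be sketched as follows. This is a minimal illustration, not KLIFF's actual API; `train_fn` is a hypothetical stand-in for the loss minimization that returns the optimal parameter vector.

```python
import numpy as np

def bootstrap_ensemble(compute_args, train_fn, nsamples, seed=None):
    """Bootstrap UQ sketch: resample compute arguments with replacement,
    retrain on each sample, and collect the optimal parameters.

    `train_fn(sample)` is a hypothetical stand-in for KLIFF's loss
    minimization; it returns the optimal parameters for that sample.
    """
    rng = np.random.default_rng(seed)
    n = len(compute_args)
    ensemble = []
    for _ in range(nsamples):
        # Sample indices with replacement, same size as the original list
        idx = rng.integers(0, n, size=n)
        sample = [compute_args[i] for i in idx]
        ensemble.append(train_fn(sample))
    return np.asarray(ensemble)
```

Each element of the returned array is one point in the parameter ensemble.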

TODO:

  • Documentation
  • Example(s)
  • Tests
  • Discuss changes done in some internal functions

yonatank93 and others added 13 commits January 9, 2023 09:13
This modification is to help with generating the bootstrap compute
arguments. The default, however, is to return a flat list, which
is the same behavior as before this modification.
This implementation is to allow the use of
`scipy.optimize.least_squares` function when using `_WrapperCalculator`.
These updates were made to help run bootstrap sampling for the neural
network model.
Instead of sampling from each calculator independently, we combine
all the compute arguments and sample from the combined list. Then, we
split them back into their respective calculators.
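The combine-sample-split scheme above can be sketched like this. The helper name and data layout are illustrative, not KLIFF's actual implementation.

```python
import numpy as np

def sample_across_calculators(cas_per_calc, rng):
    """Combine compute arguments from all calculators, sample with
    replacement from the combined list, then split the sample back
    by calculator (illustrative helper, not KLIFF's API)."""
    # Tag each compute argument with the index of its calculator
    combined = [
        (icalc, ca)
        for icalc, cas in enumerate(cas_per_calc)
        for ca in cas
    ]
    # Sample with replacement from the combined list
    idx = rng.integers(0, len(combined), size=len(combined))
    # Route each sampled compute argument back to its calculator
    split = [[] for _ in cas_per_calc]
    for i in idx:
        icalc, ca = combined[i]
        split[icalc].append(ca)
    return split
```

Note that, after splitting, the number of compute arguments per calculator generally differs from the original lists.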
We want to cache the initial guess set prior to training because, in
the bootstrap, we want to start each training run NOT from the last
optimal parameters. If we used the last optimal values, the result
might be biased by having seen the entire dataset. We want to treat
each step in the bootstrap sampling as though the bootstrap sample of
compute arguments were the only data we have.
The default is to use the cached initial parameter guess.
For the NN model, we use the `reset_parameters` method implemented in
each layer module.
For the empirical model, I removed the option to pass a custom
function for the initial guess. This is done to make the class behave
more similarly to the bootstrap NN class. However, this can be
reintroduced in the future, if needed.
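The cached-initial-guess idea can be sketched as below. `ToyModel`, `bootstrap_fits`, and `minimize_fn` are illustrative names under assumed semantics, not KLIFF's classes.

```python
class ToyModel:
    """Stand-in for a model holding a flat list of parameters."""
    def __init__(self, params):
        self.params = list(params)

def bootstrap_fits(model, samples, minimize_fn, use_cached_guess=True):
    """Fit one bootstrap sample at a time; by default restore the cached
    initial guess before each fit, so no fit starts from parameters
    that were already trained on the full dataset (illustrative)."""
    # Cache the user-set initial guess before any training
    cached = list(model.params)
    ensemble = []
    for sample in samples:
        if use_cached_guess:
            # Restore the cached guess: each bootstrap fit should see
            # only its own sample, not information from earlier fits
            model.params = list(cached)
        ensemble.append(minimize_fn(model, sample))
    return ensemble
```

With `use_cached_guess=True` every fit is independent of the fits before it.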
When the default bootstrap compute-arguments (cas) generator is used
with `_WrapperCalculator`, the number of configurations for each
calculator might differ from the original list. As such, the old list
of residual functions might not be appropriate for some
configurations. For example, if we used the original list of residual
functions, we might apply `forces_residual` to a bootstrap
configuration that only computes energy.
@yonatank93
Contributor Author

Note that this is still a very rough draft, though functional.
The intended use of the class, at least for empirical models, is illustrated in examples/examples_bootstrap_SW_Si.py.
I want to have a similar workflow/syntax for NN models.

@mjwen
Collaborator

mjwen commented Mar 14, 2023

thanks @yonatank93! Let me know when ready and I'll take a look.

@yonatank93
Contributor Author

I will @mjwen. Sorry that it is still very premature. By the way, the example for running bootstrap that I currently have reflects the workflow I intend users to use. That is, I want users to get an ensemble of parameters first before propagating the error to some other predictions, or in other words, the set of parameters is fixed. This is to answer your previous email.

@mjwen
Collaborator

mjwen commented Mar 14, 2023

@yonatank93 totally ok! I just wanted to make sure I get notified when it is ready.

Yonatan Kurniawan added 9 commits March 15, 2023 22:08
…le calculators

Previously, if the compute arguments (cas) in separate calculators had
the same identifier, there was a problem. Fix this by appending
information about which calculator each ca comes from.
Note that we haven't tested the case when we have models for multiple elements.
I think this version works with CalculatorTorchSeparateSpecies.
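The identifier-disambiguation fix mentioned above can be sketched as follows; the tagging format and helper name are assumptions for illustration only.

```python
def tag_identifiers(cas_per_calc):
    """Disambiguate compute arguments that share an identifier across
    calculators by prefixing the calculator index (illustrative;
    KLIFF's actual tagging format may differ)."""
    tagged = []
    for icalc, identifiers in enumerate(cas_per_calc):
        for ident in identifiers:
            # Prefix with the calculator index so identical identifiers
            # from different calculators no longer collide
            tagged.append(f"calc{icalc}:{ident}")
    return tagged
```

After tagging, identifiers that collided across calculators become unique.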
* Create a parent class for the 2 bootstrap classes, because they share
  many of the same methods.
* Create a wrapper class to automatically pick between the 2 bootstrap
  classes depending on the type of loss function.
* Update the test
* Add typing
@yonatank93
Contributor Author

@mjwen The bootstrap implementation is ready for you to look at. I haven't added anything in kliff/doc; I will work on that once we agree on this implementation.

The callback function can also break the loop inside the run method.
It can also be used to monitor the convergence of the optimization
in each iteration, etc.
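The early-termination behavior described above can be sketched with a run loop whose callback signals a break; the function and argument names here are illustrative, not KLIFF's exact signature.

```python
def run(nsamples, do_one_iteration, callback=None):
    """Run loop sketch: the optional callback sees each iteration's
    result (e.g., to monitor convergence) and can stop the loop early
    by returning True (illustrative names)."""
    history = []
    for step in range(nsamples):
        result = do_one_iteration(step)
        history.append(result)
        if callback is not None and callback(step, result):
            break  # callback requested early termination
    return history
```

A callback that only monitors progress simply returns a falsy value.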
@mjwen mjwen self-requested a review March 21, 2023 14:57
Review comments were left on:
kliff/calculators/calculator.py
kliff/calculators/calculator_torch.py
kliff/uq/bootstrap.py
examples/example_uq_bootstrap.py
Yonatan Kurniawan and others added 25 commits April 3, 2023 19:06
* Typing
* Fix documentation
I added an argument to specify how many bootstrap compute arguments to
generate for each sample. Generally, each sample should have the same
number of compute arguments as the original list. However, this option
enables interesting studies, e.g., using fewer compute arguments in
each sample.
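The sample-size option described above can be sketched like this; the function and keyword names are assumptions for illustration, not KLIFF's exact API.

```python
import numpy as np

def generate_bootstrap_cas(cas, nsamples, bootstrap_cas_size=None, seed=None):
    """Generate bootstrap samples of compute arguments. The per-sample
    size defaults to len(cas) but can be overridden, e.g., to study
    smaller samples (illustrative names)."""
    rng = np.random.default_rng(seed)
    size = len(cas) if bootstrap_cas_size is None else bootstrap_cas_size
    return [
        [cas[i] for i in rng.integers(0, len(cas), size=size)]
        for _ in range(nsamples)
    ]
```

Leaving `bootstrap_cas_size` unset gives the standard bootstrap; a smaller value gives an undersampled variant.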
* Previously, there was a problem when getting the parameters if we
  used a GPU.
* I changed it so that if we set `flat=False` when getting parameters,
  it now returns a list of `torch.nn.Parameter` instead of `torch.Tensor`.
Previously, I changed to use `torch.nn.Parameter` and updated the
parameters using something like
```
for param in model.parameters():
    param = <torch.nn.Parameter>
```
However, this doesn't update the parameters in the model, because it
only rebinds the loop variable. So, I reverted to updating the
parameters in place via
```
for param in model.parameters():
    param.data = <torch.Tensor>
```
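The distinction above is plain Python binding semantics, sketched here without torch. `rebind` and `mutate` are hypothetical functions; dicts stand in for parameter objects, and mutating the object (like assigning to `param.data`) is what actually updates the model.

```python
def rebind(values, new):
    # Rebinding the loop variable points `v` at a new object;
    # the elements of `values` are untouched.
    for v in values:
        v = new
    return values

def mutate(values, new):
    # Mutating the object that `v` refers to does update it,
    # which is the role `param.data = tensor` plays for torch modules.
    for v in values:
        v["data"] = new
    return values
```

Only the second loop changes what the caller observes.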
The tests include:
* Test for the function that retrieves the number of parameters.
* Test that the parameter values are updated accordingly.
* Test that changing parameters leads to changes in the predictions.
There was an issue where the old way of computing sigma raised a
`VisibleDeprecationWarning` on some Python versions and an error on
Python > 3.9. I think this was because we passed a ragged list of
floats and arrays to `np.linalg.norm` to compute the norm of the value
corresponding to each data point.
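A fix along the lines described above can be sketched as follows: take the norm entry by entry instead of building one ragged array (which recent NumPy rejects). The helper name is illustrative.

```python
import numpy as np

def per_point_norms(residuals):
    """Compute a norm for each data point from residuals of varying
    length. Converting a ragged list of floats and arrays into one
    NumPy array triggers a VisibleDeprecationWarning (and an error on
    newer versions), so each entry is normed separately (illustrative)."""
    return np.array([np.linalg.norm(np.asarray(r)) for r in residuals])
```

Each residual may have a different length (e.g., energy only vs. energy plus forces) without producing a ragged array.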
@yonatank93
Contributor Author

@mjwen I applied what we just discussed. Now I see that adding the seed argument is more elegant, as you suggested. I can also see how it automatically leads to reproducible UQ results, even if the user doesn't pay attention to it. I use the seed to create a local random number generator, which I think satisfies some of my reasoning above too.
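The seed-to-local-generator pattern can be sketched as below; the class name is illustrative, not KLIFF's exact API. The point is that a local `numpy.random.Generator` makes results reproducible without touching NumPy's global RNG state.

```python
import numpy as np

class BootstrapSampler:
    """Sketch: a seed argument creates a local Generator, so two
    samplers with the same seed produce identical samples and nothing
    depends on (or disturbs) the global RNG (illustrative)."""
    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)

    def sample_indices(self, n):
        # Indices into the compute-argument list, with replacement
        return self.rng.integers(0, n, size=n)
```

Passing the same seed reproduces the entire bootstrap run.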

I also ran pre-commit and built the documentation, but you might want to double-check it. It is ready for you to review again.

@mjwen
Collaborator

mjwen commented Apr 4, 2023

We have CI checking the formatting, using the same pre-commit configs. So as long as it passes the checks, it is good.

Everything is good now. I've merged it. Thanks!

@mjwen mjwen merged commit 02d5a8c into openkim:master Apr 4, 2023
@yonatank93
Contributor Author

Thanks for your help.
