Experimental emulator #42

jangerit · 2020-06-23T18:26:21Z

This PR adds an experimental emulator (data-driven virtual experiment) to Summit.

The emulator is a Bayesian neural network (BNN) that predicts experimental outcomes (e.g. yield) from the inputs (i.e. the conditions) of an experiment. It is included in summit/benchmarks.
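A BNN keeps a distribution over its weights instead of a single weight vector. A prediction is therefore made by sampling weights several times and averaging the outputs, and the spread of the samples gives an uncertainty estimate for the predicted yield. The following is a minimal NumPy sketch of that idea, assuming a hypothetical mean-field Gaussian posterior over the weights of a one-hidden-layer network. It is not the blitz/PyTorch model in this PR; all names, sizes and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean-field Gaussian posterior: every weight has its own
# mean (mu) and standard deviation (sigma). Values are illustrative only.
n_in, n_hidden = 3, 8
mu_w1 = rng.normal(size=(n_in, n_hidden))
sigma_w1 = np.full((n_in, n_hidden), 0.1)
mu_w2 = rng.normal(size=(n_hidden, 1))
sigma_w2 = np.full((n_hidden, 1), 0.1)


def sample_forward(x, rng):
    """One stochastic forward pass with weights drawn from the posterior."""
    w1 = rng.normal(mu_w1, sigma_w1)
    w2 = rng.normal(mu_w2, sigma_w2)
    hidden = np.maximum(x @ w1, 0.0)  # ReLU hidden layer
    return hidden @ w2


def predict(x, n_samples=100, rng=rng):
    """Monte Carlo estimate of the predictive mean and standard deviation."""
    draws = np.stack([sample_forward(x, rng) for _ in range(n_samples)])
    return draws.mean(axis=0), draws.std(axis=0)


# One set of (already scaled) reaction conditions.
x = np.array([[0.5, 0.2, 0.9]])
mean, std = predict(x)
```

The predictive standard deviation is what distinguishes a BNN emulator from a plain regressor: a virtual experiment can report how confident it is about each predicted outcome.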

The following workflow describes how to build such an emulator:

In the folder summit/benchmarks/experiment_emulator:
1. Upload a dataset of real experimental data to the ./data folder.
2. Create a loading procedure in experimental_datasets.py that returns the inputs and outcomes of the experiments in the uploaded dataset (cf. load_reizman_suzuki()).
3. Set up the BNN parameters (e.g. dataset, input dimension, prediction objective, hyperparameters, saving location) and the data transformation (e.g. transforming discrete variables into one-hot vectors) in bnn.py.
4. Run python bnn.py.
  -> The final BNN model parameters are saved in ./trained_models, and the emulator training is finished.
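Step 3 mentions transforming discrete variables into one-hot vectors before they are fed to the BNN. The following is a minimal sketch of that transformation in NumPy, assuming the catalyst is the only categorical input and the continuous inputs are appended unchanged. The helper name one_hot and the level list are hypothetical; the values are the toy numbers from the training example later in this PR, not the bnn.py code.

```python
import numpy as np


def one_hot(values, levels):
    """Encode categorical values as one-hot rows, one column per level."""
    index = {level: i for i, level in enumerate(levels)}
    encoded = np.zeros((len(values), len(levels)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded


# Illustrative catalyst levels and labels (toy values from this PR's example).
levels = ["P1-L2", "P1-L3", "P1-L7"]
catalyst = ["P1-L2", "P1-L7", "P1-L3", "P1-L3"]
X_cat = one_hot(catalyst, levels)

# Continuous inputs: t_res, temperature, catalyst_loading.
X_cont = np.array([[60, 110, 0.508], [120, 30, 0.6], [110, 70, 1.4], [250, 80, 1.3]])

# Final network input: one-hot columns followed by the continuous columns.
X = np.hstack([X_cat, X_cont])
```

Fixing the order of levels matters: the emulator must use the same level-to-column mapping at inference time as it did during training.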
In the folder summit/benchmarks:
Create a benchmark with the inputs and outputs of the dataset the BNN emulator was trained on. In this benchmark, set up a BNN model with the same architecture as the trained BNN and load the parameters of the trained model (cf. reizman_suzuki_emulator.py).
-> The emulator is now ready for use and acts like a virtual experiment.
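The last step rebuilds a network with the same architecture and loads the trained parameters into it. In PyTorch this is typically done with torch.save(model.state_dict(), path) and, on a freshly constructed model, model.load_state_dict(torch.load(path)). The following is a dependency-light sketch of the same pattern using NumPy .npz files as a stand-in. The TinyNet class and the file name are hypothetical and only illustrate why the architecture must match.

```python
import os
import tempfile

import numpy as np


class TinyNet:
    """Hypothetical stand-in for the emulator network: fixed architecture, named parameters."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.params = {
            "w1": rng.normal(size=(n_in, n_hidden)),
            "w2": rng.normal(size=(n_hidden, 1)),
        }

    def forward(self, x):
        return np.maximum(x @ self.params["w1"], 0.0) @ self.params["w2"]

    def save(self, path):
        # Analogue of torch.save(model.state_dict(), path).
        np.savez(path, **self.params)

    def load(self, path):
        # Analogue of model.load_state_dict(torch.load(path)).
        with np.load(path) as data:
            for name in data.files:
                value = data[name]
                # Shapes only match if the benchmark rebuilt the same architecture.
                if value.shape != self.params[name].shape:
                    raise ValueError(f"shape mismatch for parameter {name!r}")
                self.params[name] = value


# "Training" side: save the parameters (file name is illustrative).
trained = TinyNet(4, 8, seed=1)
path = os.path.join(tempfile.mkdtemp(), "emulator_params.npz")
trained.save(path)

# Benchmark side: same architecture, different initialisation, then load.
emulator = TinyNet(4, 8, seed=2)
emulator.load(path)
x = np.ones((1, 4))
```

After loading, the rebuilt model reproduces the trained model's predictions exactly, which is what lets the benchmark act as a virtual experiment.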

Further regression techniques, like ANNs, can be implemented analogously according to the workflow described above.

The emulators included in this PR are based on data for a Suzuki-Miyaura cross-coupling reaction (#25) taken from the SI of Reizman et al. (2016).

marcosfelt

This is great @jangerit. I like the overall idea and implementation, especially since it uses some standard packages (pytorch, blitz) for implementing the models.

I'm wondering if there is a way to make the workflow smoother for people who want to use this. Could you write a function to which people pass a domain and a dataset and get back a benchmark? I think that's something we could sell to people: a workflow that takes your experimental data and creates a benchmark for future testing in one line of code.

Also, for including datasets in the package, I think there are some best practices (i.e. the package_data parameter in setup.py). Since the datasets are small, I think we can include them in the wheel distributed on PyPI.

marcosfelt · 2020-06-26T13:08:46Z

summit/benchmarks/experiment_emulator/experimental_datasets.py

+    ----------
+    return_X_y : bool, default=False
+        If True, returns ``(data, target)`` instead of a `data` dict object.
+        See below for more information about the `data` and `target` object.


The case parameter should be documented here.
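A documented `case` parameter could follow the same numpydoc style as the existing `return_X_y` entry. The following is a self-contained sketch of such a loader, assuming `case` selects one subset of the data (for example one of several reaction cases). The function name, the in-memory table and its case labels are hypothetical: the numbers are the toy values from the training example in this PR, and the case assignment is made up purely for illustration.

```python
import numpy as np

# Toy rows reusing numbers from this PR's training example;
# the case labels are invented for illustration only.
_ROWS = [
    # (case, t_res, temperature, catalyst_loading, yield)
    (1, 60, 110, 0.508, 20),
    (1, 120, 30, 0.6, 40),
    (2, 110, 70, 1.4, 60),
    (2, 250, 80, 1.3, 34),
]


def load_toy_dataset(return_X_y=False, case=1):
    """Load inputs and outcomes for one case of a toy dataset.

    Parameters
    ----------
    return_X_y : bool, default=False
        If True, returns ``(data, target)`` instead of a dict object.
    case : int, default=1
        Which case of the dataset to load. Only experiments whose case
        label equals ``case`` are returned.

    Returns
    -------
    dict or tuple
        ``{"data": data, "target": target}``, or ``(data, target)`` if
        ``return_X_y`` is True. ``data`` has one row per experiment
        (t_res, temperature, catalyst_loading); ``target`` is the yield.
    """
    table = np.array([row for row in _ROWS if row[0] == case], dtype=float)
    if table.size == 0:
        raise ValueError(f"unknown case: {case}")
    data, target = table[:, 1:4], table[:, 4]
    if return_X_y:
        return data, target
    return {"data": data, "target": target}
```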


jangerit · 2020-07-13T16:42:08Z

This update of the PR adds an easier workflow for creating Experimental Emulators:

the emulator is initialized with respect to a summit.domain;
the emulator can then be trained, validated, and used for inference on summit.datasets (it is also possible to provide a CSV file with training data in the format of a summit.dataset, cf. summit/benchmarks/experiment_emulator/data/baumgartner_aniline_cn_crosscoupling.csv);
for the regression, a BNN is employed (the emulator class can easily be extended with other regressors).
Example:

from summit.benchmarks import ExperimentalEmulator, ReizmanSuzukiEmulator
from summit.utils.dataset import DataSet

# Build an emulator on the domain of the predefined Reizman Suzuki benchmark
test_domain = ReizmanSuzukiEmulator().domain
e = ExperimentalEmulator(domain=test_domain, model_name="Pytest")

# Training data in summit DataSet format: columns are (name, "DATA") tuples
columns = [v.name for v in e.domain.variables]
train_values = {
    ("catalyst", "DATA"): ["P1-L2", "P1-L7", "P1-L3", "P1-L3"],
    ("t_res", "DATA"): [60, 120, 110, 250],
    ("temperature", "DATA"): [110, 30, 70, 80],
    ("catalyst_loading", "DATA"): [0.508, 0.6, 1.4, 1.3],
    ("yield", "DATA"): [20, 40, 60, 34],
    ("ton", "DATA"): [33, 34, 21, 22],
}
train_dataset = DataSet(train_values, columns=columns)
e.train(train_dataset, verbose=False, cv_fold=2, test_size=0.25)

# Query point: 60 % of the range for continuous variables, last level otherwise
values = [
    float(v.bounds[0] + 0.6 * (v.bounds[1] - v.bounds[0]))
    if v.variable_type == "continuous"
    else v.levels[-1]
    for v in e.domain.variables
]
# Pass one row as a list of rows: np.array(values) would coerce the mixed
# float/string entries to strings, whereas this keeps numeric columns numeric.
conditions = DataSet([values], columns=columns)
results = e.run_experiments(conditions)
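The call e.train(train_dataset, verbose=False, cv_fold=2, test_size=0.25) combines a held-out test set with cross-validation. The following is a generic sketch of such a split, assuming test_size is the fraction of experiments held out for testing and cv_fold is the number of cross-validation folds over the remaining experiments. The helper holdout_and_folds is hypothetical and is not Summit's implementation of e.train.

```python
import numpy as np


def holdout_and_folds(n_samples, test_size=0.25, cv_fold=2, seed=0):
    """Split sample indices into a held-out test set and cv_fold (train, val) splits."""
    if cv_fold < 2:
        raise ValueError("cv_fold must be at least 2")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    # Hold out at least one experiment for the final test.
    n_test = max(1, int(round(test_size * n_samples)))
    test_idx, rest = order[:n_test], order[n_test:]
    # Partition the remaining experiments into cv_fold roughly equal folds.
    folds = np.array_split(rest, cv_fold)
    splits = []
    for k in range(cv_fold):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(cv_fold) if j != k])
        splits.append((train_idx, val_idx))
    return test_idx, splits


# Four experiments, as in the training example above.
test_idx, splits = holdout_and_folds(4, test_size=0.25, cv_fold=2)
```

With only four experiments, each validation fold contains one or two points, so the cross-validation scores are very noisy; the example above is meant as a smoke test rather than a realistic training set.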

jangerit · 2020-07-13T16:56:47Z

Note for future updates:

I had trouble adding torch with poetry: poetry add torch=1.4.0 did not work.

The command that finally worked was poetry add https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl (cf. stackoverflow/60079421).

marcosfelt · 2020-07-13T18:26:06Z


I think we should note this in an issue to work on later, especially since this is a Linux-specific fix.

marcosfelt · 2020-07-13T18:26:37Z


Amazing! I think we need some more documentation but let's handle that in another PR. For now, I'm really happy with this.

jangerit added 7 commits June 23, 2020 00:29

6aa6460 first commit emulator
0201e6e add layer freezing
2824e76 add test for emulator
0939e20 update parity plots
fa4c9b2 adjust imports
902c998 fix doctest
0c40aff update docs

jangerit requested a review from marcosfelt June 23, 2020 18:32

5539183 remove tester.py

marcosfelt reviewed Jun 26, 2020


jangerit added 20 commits July 1, 2020 23:41

8db0465 make easy workflow for emulator 1
005f72a make emulator workflow easier 2
57bbb03 fix loading in infer_model
7b4f723 include predefined Emulators
241ea0b delete old files
f38ff72 update doctest
9fac4c8 update doctest and fix bugs
8cb842f resolve conflicts
0ef8bea merge master
81845a0 run reizman, add plot function
4da3b01 update BNN strcuture, train reizman mdoels
4218c47 delete test files
aa3d236 fix path bug
4c9b670 update dir in trained models
8d45a3e fix doctest
c05e3c3 fix doctest 2
138bff4 add Emulator to_dict, from_dict
487e279 add cross-validation, descriptor var, data preprocessing
d09ec3a update descriptor inference
2a48f4b merge master, resolve conflicts

jangerit added 9 commits July 13, 2020 03:32

9d818ab rebuild dep, add categorical var
966d8fa add trained models
1d688c2 update tests
da6a88c resolve conflicts
9dddba8 resolve conflicts 2
1e93742 Revert "resolve conflicts 2" (This reverts commit 9dddba8.)
943a78d Revert "resolve conflicts" (This reverts commit da6a88c.)
534d157 merge master
0081e6b rebuild dependencies

jangerit assigned marcosfelt and jangerit Jul 13, 2020

jangerit linked an issue Jul 13, 2020 that may be closed by this pull request: Cross-coupling benchmark #25 (closed)

marcosfelt approved these changes Jul 13, 2020

marcosfelt merged commit 6dcd243 into master Jul 13, 2020

jangerit mentioned this pull request Jul 13, 2020: Poetry add torch #56 (closed)

marcosfelt deleted the exp_emul branch August 3, 2020 17:43
