Experimental emulator #42

Merged
merged 37 commits into from Jul 13, 2020


@jangerit jangerit commented Jun 23, 2020

This PR adds an experimental emulator (data-driven virtual experiment) to Summit.

The emulator is a Bayesian neural network (BNN) that predicts experimental outcomes (e.g. yield) given the inputs (i.e., the conditions) of an experiment. It is included in the benchmarks.

The following workflow describes the way to build such an emulator:

  • In the folder summit/benchmarks/experiment_emulator:
    1. add the dataset with real experimental data to the ./data folder
    2. create a loading procedure in experimental_datasets.py that returns the inputs and outcomes of the experiments in that dataset (cf. load_reizman_suzuki())
    3. set up the BNN parameters (e.g. dataset, input dimension, prediction objective, hyperparameters, saving location) and the data transformation (e.g. transforming discrete variables to one-hot vectors) in bnn.py
    4. run python bnn.py
      -> The final BNN model parameters are saved in ./trained_models, and the emulator training is finished.
  • In the folder summit/benchmarks:
    Create a benchmark with the inputs and outputs of the dataset the BNN emulator was trained on. In it, set up a BNN model that matches the trained BNN and load the parameters of the trained model (cf. reizman_suzuki_emulator.py).
    -> The emulator is now ready for use and acts like a virtual experiment.
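
The data transformation mentioned in step 3 (discrete variables to one-hot vectors) can be sketched roughly as follows. This is only an illustration in plain Python, not the code in bnn.py; the function name one_hot_encode and the list of catalyst levels are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def one_hot_encode(values: Sequence[str], levels: Sequence[str]) -> List[List[float]]:
    """Map each categorical value to a one-hot vector over `levels`."""
    index: Dict[str, int] = {level: i for i, level in enumerate(levels)}
    encoded = []
    for value in values:
        if value not in index:
            raise ValueError(f"Unknown level: {value!r}")
        row = [0.0] * len(levels)
        row[index[value]] = 1.0  # exactly one entry per row is set
        encoded.append(row)
    return encoded


# Hypothetical catalyst levels; the real levels come from the dataset.
catalyst_levels = ["P1-L1", "P1-L2", "P1-L3"]
encoded = one_hot_encode(["P1-L2", "P1-L3"], catalyst_levels)
```

Each discrete input then contributes one column per level to the network input, which is why the input dimension in step 3 depends on the number of levels.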

Further regression techniques, like ANNs, can be implemented analogously according to the workflow described above.

The emulators included in this PR are based on the data for a Suzuki-Miyaura cross-coupling reaction (#25), obtained from the SI of the paper published by Reizman et al. (2016).

@jangerit jangerit requested a review from marcosfelt June 23, 2020 18:32

@marcosfelt marcosfelt left a comment


This is great @jangerit. I like the overall idea and implementation, especially since it uses some standard packages (pytorch, blitz) for implementing the models.

I'm wondering if there is any way to get a smoother workflow for people who want to use this. Is there any way you could write a function that people would pass a domain and dataset and get back a benchmark? I think that's something we could sell to people; we have a workflow that allows you to take your experimental data and in one line of code create a benchmark for future testing.
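
To make the suggestion concrete, here is a minimal sketch of what such a one-call interface could look like. Everything in it is hypothetical: make_benchmark, the dict-based domain, and the made-up data are stand-ins, and a 1-nearest-neighbour lookup replaces the BNN so the example stays self-contained. It is not Summit's API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Bounds = Dict[str, Tuple[float, float]]  # hypothetical stand-in for a domain
Row = Dict[str, float]                   # one experiment: inputs plus objective


def make_benchmark(
    domain: Bounds, dataset: Sequence[Row], objective: str
) -> Callable[[Row], float]:
    """Return a callable that predicts `objective` for new conditions.

    A 1-nearest-neighbour lookup stands in for a trained regressor.
    """
    if not dataset:
        raise ValueError("dataset must contain at least one experiment")

    def scaled(row: Row) -> List[float]:
        # Scale every input to [0, 1] so no variable dominates the distance.
        return [(row[name] - lo) / (hi - lo) for name, (lo, hi) in domain.items()]

    points = [(scaled(row), row[objective]) for row in dataset]

    def run_experiment(conditions: Row) -> float:
        query = scaled(conditions)
        # Pick the stored experiment closest to the query conditions.
        _, prediction = min(
            points,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
        )
        return prediction

    return run_experiment


# Made-up bounds and data, for illustration only.
domain = {"temperature": (30.0, 110.0), "t_res": (60.0, 600.0)}
data = [
    {"temperature": 30.0, "t_res": 60.0, "yield": 10.0},
    {"temperature": 110.0, "t_res": 600.0, "yield": 80.0},
]
benchmark = make_benchmark(domain, data, objective="yield")
predicted = benchmark({"temperature": 100.0, "t_res": 500.0})
```

The point of the sketch is the shape of the call: a domain and a dataset go in, and something that behaves like an experiment comes out.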

Also, in terms of including datasets in the package, I think there are some best practices for this (i.e., the package_data parameter in setup.py). Since the datasets are small, I think we can include them in the wheel distributed on PyPI.
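
As a rough illustration of the package_data approach, a setup.py could look something like the sketch below. The package list and the data path are assumptions based on the folder layout described in this PR, not the project's actual packaging configuration.

```python
# setup.py (sketch). The package name, package list, and data path are
# assumptions, not the project's real configuration.
SETUP_KWARGS = dict(
    name="summit",
    packages=["summit"],  # placeholder; a real file would list all subpackages
    package_data={
        # Paths are relative to the package directory named by the key.
        "summit": ["benchmarks/experiment_emulator/data/*.csv"],
    },
)


def main() -> None:
    # Imported here so the keyword arguments can be inspected without
    # triggering a build.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

In a real setup.py, main() would be called at module level so that the build tools pick up the configuration.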

----------
return_X_y : bool, default=False
If True, returns ``(data, target)`` instead of a `data` dict object.
See below for more information about the `data` and `target` object.

The case parameter should be documented here.
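
For illustration, a docstring that covers both parameters might look like the sketch below. The loader is a hypothetical toy, not the function under review, and the meaning given to case here (selecting a subset of the experiments) is an assumption.

```python
# Made-up records standing in for a dataset file: (case, inputs, target).
_RECORDS = [
    (1, [60.0, 110.0], 20.0),
    (1, [120.0, 30.0], 40.0),
    (2, [110.0, 70.0], 60.0),
]


def load_example_dataset(return_X_y: bool = False, case: int = 1):
    """Load a toy dataset (hypothetical loader).

    Parameters
    ----------
    return_X_y : bool, default=False
        If True, returns ``(data, target)`` instead of a `data` dict object.
    case : int, default=1
        Which subset of the experiments to load (assumed meaning).

    Returns
    -------
    dict or tuple
        ``{"data": ..., "target": ...}``, or ``(data, target)`` if
        `return_X_y` is True.
    """
    rows = [(x, y) for c, x, y in _RECORDS if c == case]
    if not rows:
        raise ValueError(f"No experiments found for case {case}")
    data = [x for x, _ in rows]
    target = [y for _, y in rows]
    if return_X_y:
        return data, target
    return {"data": data, "target": target}
```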

@jangerit

jangerit commented Jul 13, 2020

The update to this PR includes an easier workflow for creating experimental emulators:

  • the emulator is initialized with a summit.domain;
  • the emulator can then be trained and validated on summit.datasets and used for inference on them (it is also possible to provide a CSV file with training data in the format of a summit.dataset, cf. summit/benchmarks/experiment/emulator/data/baumgartner_aniline_cn_crosscoupling.csv);
  • a BNN is employed for the regression (the emulator class can easily be extended with other regressors).
    Example:
from summit.benchmarks import ExperimentalEmulator, ReizmanSuzukiEmulator
from summit.utils.dataset import DataSet
import numpy as np

# Reuse the domain of the Reizman Suzuki benchmark for a new emulator.
test_domain = ReizmanSuzukiEmulator().domain
e = ExperimentalEmulator(domain=test_domain, model_name="Pytest")
columns = [v.name for v in e.domain.variables]

# Small training set: four experiments with inputs and objectives.
train_values = {
    ("catalyst", "DATA"): ["P1-L2", "P1-L7", "P1-L3", "P1-L3"],
    ("t_res", "DATA"): [60, 120, 110, 250],
    ("temperature", "DATA"): [110, 30, 70, 80],
    ("catalyst_loading", "DATA"): [0.508, 0.6, 1.4, 1.3],
    ("yield", "DATA"): [20, 40, 60, 34],
    ("ton", "DATA"): [33, 34, 21, 22],
}
train_dataset = DataSet(train_values, columns=columns)
e.train(train_dataset, verbose=False, cv_fold=2, test_size=0.25)

# One set of conditions: 60% of the way between the bounds for continuous
# variables, and the last level for all other variables.
values = [
    float(v.bounds[0] + 0.6 * (v.bounds[1] - v.bounds[0]))
    if v.variable_type == "continuous"
    else v.levels[-1]
    for v in e.domain.variables
]
values = np.atleast_2d(np.array(values))
conditions = DataSet(values, columns=columns)

# Query the trained emulator like a virtual experiment.
results = e.run_experiments(conditions)

@jangerit

jangerit commented Jul 13, 2020

Note for future updates:

I had trouble adding torch with Poetry: poetry add torch=1.4.0 did not work.

The command that finally worked: poetry add https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl, cf. stackoverflow/60079421.

@jangerit jangerit linked an issue Jul 13, 2020 that may be closed by this pull request
@marcosfelt

I think we should note this in an issue to work on later, especially since this is a Linux-specific fix.

@marcosfelt

Amazing! I think we need some more documentation but let's handle that in another PR. For now, I'm really happy with this.

@marcosfelt marcosfelt merged commit 6dcd243 into master Jul 13, 2020
@jangerit jangerit mentioned this pull request Jul 13, 2020
@marcosfelt marcosfelt deleted the exp_emul branch August 3, 2020 17:43
Development

Successfully merging this pull request may close these issues.

Cross-coupling benchmark
2 participants