Experimental emulator #42
This is great @jangerit. I like the overall idea and implementation, especially since it uses some standard packages (pytorch, blitz) for implementing the models.
I'm wondering if there is a way to get a smoother workflow for people who want to use this. Could you write a function that takes a domain and a dataset and returns a benchmark? I think that's something we could sell to people: a workflow that lets you take your experimental data and, in one line of code, create a benchmark for future testing.
Also, in terms of including datasets in the package, I think there are some best practices for this (i.e., listing them under the package_data parameter in setup.py). Since the datasets are small, I think we can include them in the wheel distributed on PyPI.
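As a rough illustration of the package_data suggestion, the sketch below shows the keyword arguments a setup.py could pass to setuptools. The subpackage layout and the `data/*.csv` glob are assumptions for illustration, not the actual layout of this repository.

```python
# Sketch of setup.py keyword arguments that bundle small data files in the wheel.
# ASSUMPTION: the datasets live as CSV files under summit/benchmarks/data/.
SETUP_KWARGS = dict(
    name="summit",
    packages=["summit", "summit.benchmarks"],
    # Map each package to glob patterns (relative to that package's directory).
    package_data={"summit.benchmarks": ["data/*.csv"]},
)

# In a real setup.py these would be passed on as:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Files matched by these patterns are installed next to the Python modules, so the loader functions can locate them relative to their own package.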
----------
return_X_y : bool, default=False
    If True, returns ``(data, target)`` instead of a `data` dict object.
    See below for more information about the `data` and `target` object.
The case parameter should be documented here.
This update to the PR includes an easier workflow for creating experimental emulators:
from summit.benchmarks import ExperimentalEmulator, ReizmanSuzukiEmulator
from summit.utils.dataset import DataSet
import numpy as np

# Reuse the domain of the existing Reizman Suzuki emulator
test_domain = ReizmanSuzukiEmulator().domain
e = ExperimentalEmulator(domain=test_domain, model_name="Pytest")
columns = [v.name for v in e.domain.variables]

# Small training dataset: four experiments with inputs and outcomes
train_values = {
    ("catalyst", "DATA"): ["P1-L2", "P1-L7", "P1-L3", "P1-L3"],
    ("t_res", "DATA"): [60, 120, 110, 250],
    ("temperature", "DATA"): [110, 30, 70, 80],
    ("catalyst_loading", "DATA"): [0.508, 0.6, 1.4, 1.3],
    ("yield", "DATA"): [20, 40, 60, 34],
    ("ton", "DATA"): [33, 34, 21, 22],
}
train_dataset = DataSet(train_values, columns=columns)
e.train(train_dataset, verbose=False, cv_fold=2, test_size=0.25)

# Build one set of test conditions: 60% of the way between the bounds for
# continuous variables, the last level for categorical variables
values = [
    float(v.bounds[0] + 0.6 * (v.bounds[1] - v.bounds[0]))
    if v.variable_type == "continuous"
    else v.levels[-1]
    for v in e.domain.variables
]
values = np.atleast_2d(np.array(values))
conditions = DataSet(values, columns=columns)

# Query the trained emulator as if it were an experiment
results = e.run_experiments(conditions)
Note for future updates: I had trouble adding torch with poetry. The command line that finally worked:
I think we should note this in an issue to work on later, especially since this is a Linux-specific fix.
Amazing! I think we need some more documentation but let's handle that in another PR. For now, I'm really happy with this.
This PR adds an experimental emulator (data-driven virtual experiment) to Summit.
The emulator is a Bayesian neural network (BNN) that predicts experimental outcomes (e.g., yield) given the inputs (i.e., the conditions) of an experiment. It is included in benchmarks.
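To make the idea of a probabilistic emulator concrete, here is a minimal standard-library sketch of Monte Carlo prediction with Gaussian-distributed weights, which is the general principle behind BNN predictions. It is not the model in this PR: the single linear layer, the function name `predict_with_uncertainty`, and all numbers are made up for illustration.

```python
import random
import statistics


def predict_with_uncertainty(x, weight_mean, weight_std, bias_mean, bias_std,
                             n_samples=1000, seed=0):
    """Monte Carlo prediction for a single linear layer with Gaussian weights."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Draw one set of parameters from the (assumed, already fitted) distribution.
        w = [rng.gauss(m, s) for m, s in zip(weight_mean, weight_std)]
        b = rng.gauss(bias_mean, bias_std)
        # Forward pass with the sampled parameters.
        samples.append(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # The spread of the sampled outputs expresses the predictive uncertainty.
    return statistics.mean(samples), statistics.stdev(samples)


# HYPOTHETICAL scaled conditions and made-up weight distributions.
mean_yield, std_yield = predict_with_uncertainty(
    x=[0.6, 0.4],
    weight_mean=[30.0, 10.0], weight_std=[2.0, 1.0],
    bias_mean=20.0, bias_std=1.0,
)
```

Each call samples the parameters repeatedly, so the emulator returns both an expected outcome and a measure of how uncertain that outcome is.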
The following workflow describes the way to build such an emulator:
experimental_datasets.py: add a function that returns the inputs and outcomes of the experiments in the uploaded dataset (cf. load_reizman_suzuki()).
bnn.py: run the training script with
python bnn.py
-> The final BNN model parameters are saved in ./trained_models and the emulator training is finished.
Create a benchmark with the inputs and outputs of the dataset the BNN emulator was trained on. To do so, set up a BNN model similar to the trained BNN and load the parameters of the trained model (cf. reizman_suzuki_emulator.py).
-> The emulator is now ready for use and acts like a virtual experiment.
Further regression techniques, like ANNs, can be implemented analogously according to the workflow described above.
The emulators included in this PR are based on the data for a Suzuki-Miyaura cross-coupling reaction (#25) obtained from the SI of the paper published by Reizman et al. (2016).