How to properly save and load an experiment #76

arvieFrydenlund · 2019-05-07T19:27:44Z

I have a modified version of this tutorial, https://botorch.org/tutorials/custom_botorch_model_in_ax, in which I save the experiment after each call to get_botorch:

        for i in range(len(exp.trials.values()), num_bo_trails+2):
            print('Running optimization batch {}/{}'.format(i+1, num_bo_trails))
            model = get_botorch(experiment=exp, data=exp.eval(), search_space=exp.search_space,
                                model_constructor=_get_and_fit_gp)

            save(exp, args.bo_save_path)
            batch = exp.new_trial(generator_run=model.gen(1))

If that loop gets interrupted, I want to be able to reload the experiment and restart the loop from where it left off. However, I get this issue:

File "Torch1venv/venv/lib/python3.6/site-packages/ax/core/observation.py", line 189, in observations_from_data
obs_parameters = experiment.arms_by_name[features["arm_name"]].parameters.copy()
KeyError: '0'

This happens on the first get_botorch call after I load the experiment again.

Also, I noticed that the trial status always seems to be 'status=TrialStatus.RUNNING' and never completed. Do I need to manually set trials to completed?

Thanks.

lena-kashtelyan · 2019-05-07T20:59:56Z

Hello, @arvieFrydenlund! Re: your second question, thank you for pointing that out, we will update SimpleExperiment to change trial status when they have been completed. As to the reloading bug, I'm taking a look now.

lena-kashtelyan · 2019-05-07T21:22:33Z

Hey, @arvieFrydenlund, we tried to repro the bug you are getting and couldn't get the same issue to come up. Would you mind sharing your full notebook?

arvieFrydenlund · 2019-05-11T16:54:35Z

This should work as a minimum example.
You can run this the whole way through, then run it again and it will work fine as all experiments were done in the first run.

However, if you then delete the save file, run it again but kill the process partway through (or add, say, if i == 5: exit() in the last loop), and then run it once more (which should load the trials that had been completed before the kill), I get that error.

import argparse
import os

from ax import ParameterType
from ax import RangeParameter
from ax import SearchSpace
from ax import SimpleExperiment
from ax import save
from ax import load
from ax.modelbridge import get_sobol
from ax.modelbridge.factory import get_botorch

from botorch.models import SingleTaskGP

def run(parameterization, *_unused):
    return {'ce': (0.0, 0.0)}

def _get_and_fit_gp(Xs, Ys, **kwargs):
    return SingleTaskGP(Xs[0], Ys[0].view(-1))

def _main():
    bo_save_path = 'delete_this.json'
    # experiment
    parameters = [RangeParameter(name='y0', parameter_type=ParameterType.FLOAT, lower=0.01, upper=0.25)]
    search_space = SearchSpace(parameters)
    # load or set up experiment with initial Sobol runs
    if os.path.exists(bo_save_path):
        exp = load(bo_save_path)
        print(exp.arms_by_name)
        print(exp.__dict__)
    else:
        exp = SimpleExperiment(name='exp',
                               search_space=search_space,
                               evaluation_function=run,
                               objective_name='ce',
                               minimize=True)

        number_of_initial_independent_runs = 5
        sobol = get_sobol(exp.search_space, seed=42)  # remember to seed this thing like a farmer
        exp.new_batch_trial(generator_run=sobol.gen(number_of_initial_independent_runs))  # makes 5 random values of y0

    save(exp, bo_save_path)

    num_bo_trails = 20
    for i, v in enumerate(exp.trials.values()):
        print(i, v)
    print('There have been {} trials '.format(len(exp.trials.values())))
    if len(exp.trials.values()) != num_bo_trails + 1:
        for i in range(len(exp.trials.values()), num_bo_trails+2):
            print('Running optimization batch {}/{}'.format(i+1, num_bo_trails))
            model = get_botorch(experiment=exp, data=exp.eval(), search_space=exp.search_space,
                                model_constructor=_get_and_fit_gp)

            save(exp, bo_save_path)
            batch = exp.new_trial(generator_run=model.gen(1))

    print("Done!")


if __name__ == '__main__':
    _main()

Summary: This is a fix for #76. There were two separate issues, but both had to do with JSON encoding not working properly.
Reviewed By: kkashin
Differential Revision: D15314286
fbshipit-source-id: 92bafd5d462562d1fa671992cba72133155dd0a2

arvieFrydenlund · 2019-05-13T17:51:28Z

Hey, I did a new pull and the minimum example still breaks, though in a different way now (the status is now correct, though). However, I'm not sure if it's just me doing this wrong or if it's an issue on your end. Is SimpleExperiment not the way to go for this?

If I run it the first time with if i == 5: exit(), then rerun it without that, I now get:

0 BatchTrial(experiment_name='exp', index=0, status=TrialStatus.COMPLETED)
1 Trial(experiment_name='exp', index=1, status=TrialStatus.COMPLETED)
2 Trial(experiment_name='exp', index=2, status=TrialStatus.COMPLETED)
3 Trial(experiment_name='exp', index=3, status=TrialStatus.COMPLETED)
4 Trial(experiment_name='exp', index=4, status=TrialStatus.COMPLETED)
There have been 5 trials
Running optimization batch 6/20
[INFO 05-13 13:45:21] StandardizeY: Outcome ce is constant, within tolerance.
Running optimization batch 7/20
Traceback (most recent call last):
File "min_example.py", line 68, in <module>
_main()
File "min_example.py", line 55, in _main
model = get_botorch(experiment=exp, data=exp.eval(), search_space=exp.search_space,
File "/home/arvie/PycharmProjects/Torch1venv/venv/lib/python3.6/site-packages/ax/core/simple_experiment.py", line 144, in eval
for trial in self.trials.values()
File "/home/arvie/PycharmProjects/Torch1venv/venv/lib/python3.6/site-packages/ax/core/simple_experiment.py", line 145, in
if trial.status != TrialStatus.FAILED
File "/home/arvie/PycharmProjects/Torch1venv/venv/lib/python3.6/site-packages/ax/core/simple_experiment.py", line 108, in eval_trial
f"Cannot evaluate trial {trial.index} as no attached data was "
ValueError: Cannot evaluate trial 5 as no attached data was found and no evaluation function is set on this SimpleExperiment. SimpleExperiment is geared to synchronous and sequential cases where each trial is evaluated before more trials are created. For all other cases, use Experiment.

ldworkin · 2019-05-13T20:34:38Z

Hey @arvieFrydenlund -- sorry, I forgot to mention this! It's a simple fix on your end. After you load the experiment from the json file, you'll just need to re-set the evaluation function, e.g.

exp = load(bo_save_path)
exp.evaluation_function = run

We don't store evaluation functions, since function serialization is a difficult problem. We should make this more clear though :)
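For anyone landing here later, the load-then-reattach step above can be wrapped in a small helper so the evaluation function is always set, whether the experiment was loaded or freshly created. This is only a sketch, not part of Ax: `load_or_create`, `load_fn`, and `create_fn` are hypothetical names, and the helper assumes only that the returned object accepts an `evaluation_function` attribute, as in the two-line fix above.

```python
import os


def load_or_create(path, load_fn, create_fn, evaluation_function):
    """Return a saved experiment if `path` exists, otherwise a new one.

    In both cases the evaluation function is (re)attached, because it is
    not stored in the saved file and must be set again after loading.
    """
    if os.path.exists(path):
        # Resume: everything except the evaluation function comes from disk.
        exp = load_fn(path)
    else:
        # First run: build the experiment from scratch.
        exp = create_fn()
    exp.evaluation_function = evaluation_function
    return exp
```

With the minimal example from this thread, the call would look something like `exp = load_or_create(bo_save_path, load, make_experiment, run)`, where `make_experiment` is a hypothetical zero-argument function that builds the `SimpleExperiment` and attaches the initial Sobol batch.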

ldworkin · 2019-05-19T12:34:25Z

Closing, since this should be fixed in our current release.

kkashin added the "bug" (Something isn't working) label May 8, 2019

kkashin assigned ldworkin May 8, 2019

kkashin added the "fixready" (Fix has landed on master.) label May 13, 2019

ldworkin closed this as completed May 19, 2019
