How to properly save and load an experiment #76

Closed
arvieFrydenlund opened this issue May 7, 2019 · 6 comments
Labels: bug (Something isn't working), fixready (Fix has landed on master.)

Comments

@arvieFrydenlund

I have a modified version of this https://botorch.org/tutorials/custom_botorch_model_in_ax

where I have saved the experiment after each call to get_botorch.

        for i in range(len(exp.trials.values()), num_bo_trails+2):
            print('Running optimization batch {}/{}'.format(i+1, num_bo_trails))
            model = get_botorch(experiment=exp, data=exp.eval(), search_space=exp.search_space,
                                model_constructor=_get_and_fit_gp)

            save(exp, args.bo_save_path)
            batch = exp.new_trial(generator_run=model.gen(1))

If that loop gets interrupted, I want to be able to reload the experiment and restart the loop from where it left off. However, I get this issue:

File "Torch1venv/venv/lib/python3.6/site-packages/ax/core/observation.py", line 189, in observations_from_data
obs_parameters = experiment.arms_by_name[features["arm_name"]].parameters.copy()
KeyError: '0'

This happens on the first get_botorch call after I load the experiment again.

Also, I noticed that the trial status always seems to be 'status=TrialStatus.RUNNING' and never completed. Do I need to manually set trials to completed?

Thanks.

@lena-kashtelyan
Contributor

Hello, @arvieFrydenlund! Re: your second question, thank you for pointing that out, we will update SimpleExperiment to change trial status when they have been completed. As to the reloading bug, I'm taking a look now.

@lena-kashtelyan
Contributor

Hey, @arvieFrydenlund, we tried to repro the bug you are getting and couldn't get the same issue to come up. Would you mind sharing your full notebook?

@kkashin kkashin added the bug Something isn't working label May 8, 2019
@arvieFrydenlund
Author

arvieFrydenlund commented May 11, 2019

This should work as a minimal example.
You can run this the whole way through, then run it again, and it will work fine because all experiments were done in the first run.

However, if you then delete the save file, run it again, and kill the process partway through (or add, say, if i == 5: exit() in the last loop), then run it once more (which should load the trials that had completed before the kill), I get that error.

import argparse
import os

from ax import ParameterType
from ax import RangeParameter
from ax import SearchSpace
from ax import SimpleExperiment
from ax import save
from ax import load
from ax.modelbridge import get_sobol
from ax.modelbridge.factory import get_botorch

from botorch.models import SingleTaskGP

def run(parameterization, *_unused):
    return {'ce': (0.0, 0.0)}

def _get_and_fit_gp(Xs, Ys, **kwargs):
    return SingleTaskGP(Xs[0], Ys[0].view(-1))

def _main():
    bo_save_path = 'delete_this.json'
    # experiment
    parameters = [RangeParameter(name='y0', parameter_type=ParameterType.FLOAT, lower=0.01, upper=0.25)]
    search_space = SearchSpace(parameters)
    # load or set up experiment with initial Sobol runs
    if os.path.exists(bo_save_path):
        exp = load(bo_save_path)
        print(exp.arms_by_name)
        print(exp.__dict__)
    else:
        exp = SimpleExperiment(name='exp',
                               search_space=search_space,
                               evaluation_function=run,
                               objective_name='ce',
                               minimize=True)

        number_of_initial_independent_runs = 5
        sobol = get_sobol(exp.search_space, seed=42)  # remember to seed this thing like a farmer
        exp.new_batch_trial(generator_run=sobol.gen(number_of_initial_independent_runs))  # makes 5 random values of y0

    save(exp, bo_save_path)

    num_bo_trails = 20
    for i, v in enumerate(exp.trials.values()):
        print(i, v)
    print('There have been {} trials '.format(len(exp.trials.values())))
    if len(exp.trials.values()) != num_bo_trails + 1:
        for i in range(len(exp.trials.values()), num_bo_trails+2):
            print('Running optimization batch {}/{}'.format(i+1, num_bo_trails))
            model = get_botorch(experiment=exp, data=exp.eval(), search_space=exp.search_space,
                                model_constructor=_get_and_fit_gp)

            save(exp, bo_save_path)
            batch = exp.new_trial(generator_run=model.gen(1))

    print("Done!")


if __name__ == '__main__':
    _main()
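
The save-then-resume pattern this script is aiming for can be sketched without Ax at all. The following is only a minimal illustration, assuming the state fits in plain JSON; `save_state`, `load_state`, `run_loop`, and the `stop_after` argument are hypothetical names made up for this sketch and are not part of Ax. It checkpoints only after a step has finished, and it writes through a temporary file so that killing the process mid-write should not leave a half-written checkpoint behind.

```python
import json
import os
import tempfile


def save_state(state, path):
    """Write state to path via a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"results": []}


def run_loop(path, total_steps, step_fn, stop_after=None):
    """Run step_fn for every step not yet recorded in the checkpoint."""
    state = load_state(path)
    for i in range(len(state["results"]), total_steps):
        if stop_after is not None and i == stop_after:
            return state  # hypothetical hook to simulate an interruption
        state["results"].append(step_fn(i))
        save_state(state, path)  # checkpoint only after the step finished
    return state
```

Calling `run_loop(path, 10, step_fn, stop_after=5)` and then `run_loop(path, 10, step_fn)` should pick up at step 5 rather than starting over, which is the behaviour the minimal example above is trying to get from `save` and `load`.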

facebook-github-bot pushed a commit that referenced this issue May 13, 2019
Summary: This is a fix for #76 -- basically there were two separate issues, but both had to do with JSON encoding not working properly.

Reviewed By: kkashin

Differential Revision: D15314286

fbshipit-source-id: 92bafd5d462562d1fa671992cba72133155dd0a2
@kkashin kkashin added the fixready Fix has landed on master. label May 13, 2019
@arvieFrydenlund
Author

Hey, I did a new pull and the minimal example still breaks, though in a different way now (the status is now correct, though). However, I'm not sure if it's just me doing this wrong or if it's an issue on your end. Is SimpleExperiment not the way to go for this?

If I run it the first time with if i == 5: exit(), then rerun it without that, I now get:

0 BatchTrial(experiment_name='exp', index=0, status=TrialStatus.COMPLETED)
1 Trial(experiment_name='exp', index=1, status=TrialStatus.COMPLETED)
2 Trial(experiment_name='exp', index=2, status=TrialStatus.COMPLETED)
3 Trial(experiment_name='exp', index=3, status=TrialStatus.COMPLETED)
4 Trial(experiment_name='exp', index=4, status=TrialStatus.COMPLETED)
There have been 5 trials
Running optimization batch 6/20
[INFO 05-13 13:45:21] StandardizeY: Outcome ce is constant, within tolerance.
Running optimization batch 7/20
Traceback (most recent call last):
File "min_example.py", line 68, in <module>
_main()
File "min_example.py", line 55, in _main
model = get_botorch(experiment=exp, data=exp.eval(), search_space=exp.search_space,
File "/home/arvie/PycharmProjects/Torch1venv/venv/lib/python3.6/site-packages/ax/core/simple_experiment.py", line 144, in eval
for trial in self.trials.values()
File "/home/arvie/PycharmProjects/Torch1venv/venv/lib/python3.6/site-packages/ax/core/simple_experiment.py", line 145, in
if trial.status != TrialStatus.FAILED
File "/home/arvie/PycharmProjects/Torch1venv/venv/lib/python3.6/site-packages/ax/core/simple_experiment.py", line 108, in eval_trial
f"Cannot evaluate trial {trial.index} as no attached data was "
ValueError: Cannot evaluate trial 5 as no attached data was found and no evaluation function is set on this SimpleExperiment. SimpleExperiment is geared to synchronous and sequential cases where each trial is evaluated before more trials are created. For all other cases, use Experiment.

@ldworkin
Contributor

ldworkin commented May 13, 2019

Hey @arvieFrydenlund -- sorry, I forgot to mention this! It's a simple fix on your end. After you load the experiment from the json file, you'll just need to re-set the evaluation function, e.g.

exp = load(bo_save_path)
exp.evaluation_function = run

We don't store evaluation functions, since function serialization is a difficult problem. We should make this more clear though :)
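
One way to keep the re-attachment from being forgotten is to put it next to the load call in a single helper. This is only a sketch: `load_or_create` and its `load_fn` and `create_fn` parameters are hypothetical names, not Ax APIs. In the script above, `load_fn` would presumably be `load` from `ax`, and `create_fn` a function that builds the SimpleExperiment and its initial Sobol batch.

```python
import os


def load_or_create(path, load_fn, create_fn, evaluation_function):
    """Return a saved experiment if one exists, otherwise build a new one.

    load_fn and create_fn are hypothetical stand-ins supplied by the
    caller; nothing here depends on Ax itself.
    """
    if os.path.exists(path):
        exp = load_fn(path)
        # The callable is not part of the saved JSON, so attach it again.
        exp.evaluation_function = evaluation_function
        return exp
    # No checkpoint yet: the caller's constructor sets the function itself.
    return create_fn()
```

With that in place, the load-or-set-up branch of the script could become something like `exp = load_or_create(bo_save_path, load, make_experiment, run)`, where `make_experiment` is an assumed wrapper around the existing `else` branch.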

@ldworkin
Contributor

Closing, since this should be fixed in our current release.
