Using Ax as a supplier of candidates for black box evaluation #120

Closed
avimit opened this issue Jul 7, 2019 · 28 comments
Labels: fixready (Fix has landed on master.) · question (Further information is requested)

Comments

@avimit

avimit commented Jul 7, 2019

Hi,

I have been trying, in recent days, to use Ax for my task.

The use case: supplying X new candidates for evaluation, given known + pending evaluations. Our "evaluation" is the training & testing of an ML model on a cloud server. I just want to feed the results to the BO model and get new points to evaluate, i.e. to have Ax power our HPO. No success yet.

In BoTorch, I achieved this goal with these five lines at the core:

import botorch
import gpytorch

# Fit a single-output GP to the observed (X, Y) data
model = botorch.models.SingleTaskGP(X, Y)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
botorch.fit.fit_gpytorch_model(mll)

# Optimize qNEI to propose a batch of `batch_size` new candidates
acquisition_function = botorch.acquisition.qNoisyExpectedImprovement(model, X_baseline)
X_candidates_tensor = botorch.optim.joint_optimize(acquisition_function, bounds=bounds,
                                                   q=batch_size, num_restarts=1, raw_samples=len(X))

I've been trying to use BotorchModel via the developer API. Questions:

  • Do I have to state an evaluation function when defining an "experiment"? In our use case the function is a "black box": we have a platform that launches training jobs as resources are freed and collects evaluations when ready, and I want Ax to hand me X new candidates for evaluation, as in the BoTorch example above.
  • I couldn't find how to load the known + pending evaluations into the model.
  • Are the objective_weights that BotorchModel's gen() function requires weights for low/high-fidelity evaluations?

Have I been looking in the wrong place? Should I have been using the Service API (losing some flexibility)?
Could you please direct me to relevant examples in both APIs?

(One of my main reasons for shifting to Ax is that I want, in the future, to optimize over a mixed domain: some parameters continuous and some discrete; but this is a different question...)

Thanks a lot,
Avi

@lena-kashtelyan
Contributor

Hello, @avimit! May I ask, just as a clarification, why the Service API does not work for you? That is the API we generally intended for a use case where trials are evaluated externally and then the data is logged back to Ax, so overall it seems that the Service API should be the right fit. If the issue is the need to pass custom models, that is possible, by passing a generation_strategy argument as mentioned here in the tutorial, and I am happy to provide a more complete example. Let me know!

lena-kashtelyan self-assigned this Jul 8, 2019
lena-kashtelyan added the "question" (Further information is requested) label Jul 8, 2019
@avimit
Author

avimit commented Jul 9, 2019

I haven't tried the Service API before this week; I had assumed (wrongly?) that the developer API is the one for me.

No, the issue was never custom models. The issue was loading known+pending evaluations to a model, and getting next candidates for evaluation. And doing so externally, without stating the evaluation function.

I will have a look at this example. Thanks!

@eytan
Contributor

eytan commented Jul 9, 2019

@avimit, the Service API definitely supports asynchronous evaluation with proper handling of pending points. Please let us know if this works for you, and if you have any suggestions for how we could make this functionality clearer in the docs (I can see how calling the Developer API the "Developer API" is a little confusing, since all developers might think it's the API for them ;).

Re: your query about objective_weights, this has nothing to do with fidelities. It instead specifies how you should weight multiple outcomes, if using a multi-output GP to model your results. FWIW, we are actively working on having more first-class support for multi-fidelity BayesOpt in Ax/BoTorch, and it should be available in the coming months.
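For illustration only (the outcome count and ordering below are made up, not from this thread), objective_weights is a tensor with one entry per modeled outcome; a minimal hedged sketch:

import torch

# Two modeled outcomes: maximize the first, ignore the second.
# A negative weight would indicate that an outcome should be minimized.
objective_weights = torch.tensor([1.0, 0.0])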

@avimit
Author

avimit commented Jul 9, 2019

@lena-kashtelyan, @eytan, thank you again for the generous responsiveness 🙏🏼

One obstacle still stops me from completing a full run with the Service API: I haven't found how to load known (i.e. with results) and pending evaluations into the Ax client.

Another thing: I was happy to read in the documentation that when a generation strategy is not set, "one is intelligently chosen based on properties of search space"; this could be very handy. Two questions:

  1. Does this include a mixed search space containing both discrete and continuous parameters?
  2. If one does want to control the model & acquisition function used, are they taken from BoTorch? Can you direct me to an example of such user control?

@lena-kashtelyan
Contributor

lena-kashtelyan commented Jul 10, 2019

@avimit, hello again!

Re: loading known and pending evaluations –– by evaluations here, do you mean trial evaluations? A pending trial would be one whose parameterization has been suggested by the AxClient but for which evaluation data has not yet been logged back; a known trial would be one that has been "completed", i.e. data for it has been logged? If so, you can view those as trials on the experiment, as ax.experiment.trials; if you are looking to access some more specific data about them, let me know, and I can show you how to do so.

  1. It does, but at the moment the choice of generation strategy is rather basic: it uses a quasi-random strategy for mostly discrete spaces and GP+EI for semi-discrete and continuous search spaces. We plan on expanding the strategies included in this default choice in the future.

  2. Yes, you can! You would need to construct a GenerationStrategy and pass it into the instantiation of the client: AxClient(generation_strategy=GenerationStrategy(...)). A good example can be found in section 2 of this tutorial: https://github.com/facebook/Ax/blob/master/tutorials/benchmarking_suite_example.ipynb. Let me know if you need help with it! The Models enum functionality is new and exists on master, but not in the stable version yet, so you might want to specify models in GenerationStep-s as factory functions instead; see the description in the tutorial.

@avimit
Author

avimit commented Jul 10, 2019

@lena-kashtelyan, thanks!

No, I mean pre-existing results, known beforehand, from before using Ax; initial points supplied from outside.

...and on top of that, as you write, a "pending trial", whose parameterization has been suggested by the AxClient but for which evaluation data has not yet been logged back.

And also, how do I report results of AxClient suggested evaluations back to Ax?

@avimit
Author

avimit commented Jul 11, 2019

@lena-kashtelyan, hi,

I did manage to load results into the client; I'm not sure if it's the correct way. I will describe it, and then ask a few questions:

Description:

  • For loading pre-existing results, I used attach_trial for the parameters, and then complete_trial for the known result.
  • For getting the next candidates I used get_next_trial, and then, once the evaluation is ready, complete_trial again (a sketch of this flow follows below).
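A minimal sketch of that flow, with made-up parameter names and values (the calls mirror the ones used later in this thread):

from ax.service.ax_client import AxClient

ax = AxClient()
ax.create_experiment(
    parameters=[
        {"name": "x1", "type": "range", "bounds": [-5.0, 10.0]},
        {"name": "x2", "type": "range", "bounds": [1.0, 15.0]},
    ],
)

# Pre-existing result: attach the known parameters, then log the observed value.
parameters, trial_index = ax.attach_trial({"x1": 2.0, "x2": 3.0})
ax.complete_trial(trial_index, raw_data=0.42)

# New candidate: get a suggestion, evaluate it externally, then report back.
parameters, trial_index = ax.get_next_trial()
# ... launch the external training job with `parameters`, wait for its result ...
ax.complete_trial(trial_index, raw_data=0.57)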

Questions:

  1. I understand that all these functions support only 1-arm trials, correct?
  2. When using complete_trial, I only supply a trial index and the result (a float). I noticed that this value is accepted as the "mean", while "sem" is set to 0.0. Is this OK?
  3. I also noticed that with no prior results, get_next_trial will not supply more than two trials, because "not enough data has been observed to fit next model", which makes sense. When I load 3 prior results, as described before, I can get as many trials as I want, but they look bad; here's an example:
[[ 4.79597718 12.81666481]
 [-1.43194072 14.2205354 ]
 [ 7.9622972   1.        ]
 [ 5.81201589  1.        ]
 [ 5.76916304  1.        ]
 [ 5.77305317  1.        ]
 [ 5.7656176   1.        ]
 [ 5.77405888  1.        ]
 [ 5.7857986   1.        ]
 [ 5.78060032  1.        ]
 [ 5.79093689  1.        ]
 [ 5.8179453   1.        ]
 [ 5.77304465  1.        ]
 [ 5.78786731  1.        ]
 [ 5.82333215  1.        ]
 [ 5.74413777  1.        ]
 [ 5.78686904  1.        ]
 [ 5.76843816  1.        ]
 [ 5.7590922   1.        ]
 [ 5.78587056  1.        ]]
     It looks like the acquisition is not set right...

@lena-kashtelyan
Contributor

@avimit, hello again,

Re: the description of your actions –– it sounds exactly right; that is how the Service API is meant to be used.

  1. Yes, for now that is the case. Is it limiting for your use case?

  2. That is the expected behavior, yes; we assume that the results are noiseless in that case. We are currently working on improving that logic, which will improve model fits, because noiselessness will no longer be assumed. If your actual evaluation function is noisy, then assuming that it's not will not give the best model fits, so you would be better off estimating your SEM, as discussed in this issue (see the sketch after this list).

  3. Something strange is going on there. You should be able to obtain at least 5 trials without adding more data. Would it be possible for you to share a reproducible example for your case? I would only need the search space description (how many parameters and what their domains are), as well as what points you were attaching through attach_trial. If I examine what went wrong there, it will likely also explain why the suggested trials are bad.
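Regarding item 2, a hedged sketch of reporting a noisy result (the numbers are made up, trial_index is an index returned earlier by get_next_trial or attach_trial, and "objective" is Ax's default metric name when no objective_name is given):

# Report a mean of 0.71 with an estimated SEM of 0.03, instead of a bare float.
ax.complete_trial(trial_index, raw_data={"objective": (0.71, 0.03)})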

@avimit
Author

avimit commented Jul 11, 2019

@lena-kashtelyan, good to hear from you again,

  1. Not really, I can always loop over the desired number of trials.
  2. Thank you, I will look into that. Our evaluation, usually the training & testing of CNNs, is by definition noisy. Isn't leaving the SEM at 0.0 an error? When I tried BoTorch, I started with the qNoisyExpectedImprovement acquisition function.
  3. You are right about that! I found an embarrassing bug on my side... Now it is indeed 5. Still, the phenomenon from before persists:
    - With no prior results, get_next_trial will not supply more than 5 trials;
    - After I supply 3 prior results, get_next_trial will go on, but the trials beyond 5 look bad:
[[ 8.02219987  9.00363529]
 [ 4.73311067  7.36813933]
 [-1.74387068 13.48540092]
 [-4.00791582  5.75379705]
 [ 2.58076489 11.87318027]
 [ 5.74114028  1.        ]
 [ 5.81732725  1.        ]
 [ 5.77250866  1.        ]
 [ 5.77864214  1.        ]
 [ 5.79553306  1.        ]]

       These are the parameters with which I create the experiment:

[{'name': 'input', 'type': 'range', 'bounds': [-5, 10], 'value_type': 'float'}, 
 {'name': 'input2', 'type': 'range', 'bounds': [1, 15], 'value_type': 'float'}]

@avimit
Author

avimit commented Jul 12, 2019 via email

@avimit
Author

avimit commented Jul 12, 2019

I tried to generate 10 trials, as you can see above, after supplying 3 init points.

@lena-kashtelyan
Contributor

lena-kashtelyan commented Jul 12, 2019

@avimit, hello again ––

My apologies, I deleted my comment after realizing I did not need the points, since I was able to reproduce the issue on my own. Thank you very much for your patience and cooperation.

  1. Great to hear!

  2. Regarding the SEM question, in your case it would be best to pass some form of SEM estimate (@bletham, would you mind chiming in on a good way to do so, for now?). Is that something that will be possible in your case? We are also working on moving away from the noiselessness assumption, but that will take some time on our part. I will also consult with my other teammates on what a good interim solution for your case would be.

  3. There was a bug in the Service API, which was causing weird arms to be generated. The fix is ready and will be on master shortly, and I will update you when it is –– thank you very much for pointing it out!

@avimit
Author

avimit commented Jul 12, 2019

@lena-kashtelyan, thank you!
This is to be expected with such a new package. I will be waiting for your update (no rush). I appreciate Ax, and like what you are planning to add to it 👍

@lena-kashtelyan
Contributor

@avimit, thank you for your feedback and your patience! I will keep you posted as the changes are merged onto master.

@lena-kashtelyan
Contributor

@avimit, the fix for the service API bug should now be on master, and the trials it's generating for you should look more reasonable. Also, regarding the fact that it will not generate more trials after the first 5, if you need more trials in parallel in the beginning, check out this section of the Service API tutorial. At the end, there is an explanation of the flag you can use.
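For reference, a hedged sketch of what that looks like (enforce_sequential_optimization is my best guess at the flag the tutorial describes; setting it to False lets the client suggest trials beyond the initialization batch before data comes back):

from ax.service.ax_client import AxClient

# Allow suggesting more trials in parallel early on, before any results are logged.
ax = AxClient(enforce_sequential_optimization=False)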

As for the SEM being set to 0, I will update you when that behavior is fixed! Thank you again for pointing it out!

lena-kashtelyan added the "fixready" (Fix has landed on master.) label Jul 17, 2019
@avimit
Author

avimit commented Jul 17, 2019

@lena-kashtelyan, I installed version 0.1.3 (ax-platform-0.1.3) and am still getting these strange trials after number 5.

...Ah, I see: the fix is on master, but not yet in a release...

I have now installed from master, and the trials are indeed more diverse:

[[-3.95379175  4.3494333 ]
 [ 2.23415419 13.51886213]
 [ 3.07661593  6.61754888]
 [ 5.03874123 13.24272346]
 [ 0.27027205  6.11668086]
 [10.          1.        ]
 [ 5.3336149   1.        ]
 [10.         15.        ]
 [ 5.75261546  3.77404557]
 [ 1.08852536  1.        ]]

Still, they are different in nature compared to the first 5...
Is this how it should be?

Thanks!
Avi

@lena-kashtelyan
Contributor

lena-kashtelyan commented Jul 19, 2019

@avimit, I reproduced your case locally (just to verify that the behavior would be the same). I started by attaching the three initial trials you provided:

from ax.service.ax_client import AxClient

ax = AxClient()
ax.create_experiment(
    parameters=[
        {'name': 'input', 'type': 'range', 'bounds': [-5, 10], 'value_type': 'float'},
        {'name': 'input2', 'type': 'range', 'bounds': [1, 15], 'value_type': 'float'},
    ],
)
# Attach each pre-existing point, then log its observed result.
ax.attach_trial({'input': 10., 'input2': 4.78911683})
ax.complete_trial(0, -5.13350902)
ax.attach_trial({'input': -5., 'input2': 15.})
ax.complete_trial(1, -17.5083)
ax.attach_trial({'input': 9.03619777, 'input2': 3.18465702})
ax.complete_trial(2, -2.14991914)

Then, I generated 15 more trials without completing any more than the initial three:

for _ in range(15):
    ax.get_next_trial()

And these are the parameterizations that were generated:

  1. 10.0, 4.78 # initial parameterizations

  2. -5, 15.0

  3. 9.03, 3.18

  4. -0.89, 8.04 # Sobol-generated parameterizations

  5. -4.90, 2.87

  6. 3.98, 13.38

  7. 5.11, 1.

  8. -1.92, 11.53

  9. 10.0, 1.0 # GP+EI-generated parameterizations

  10. 5.19, 4.81

  11. 9.99, 14.99

  12. 0.44, 1.0

  13. 2.11, 4.68

  14. 4.60, 8.94

  15. 7.74, 1.0

  16. 10.0, 11.81

  17. -2.13, 3.20

  18. 5.57, 15.0

The GP-generated trials are quite similar to yours, for me. And they are indeed different in nature from the quasi-randomly generated ones; this is because they are generated through GP + EI. Those points are chosen so as to target areas with a mixture of high uncertainty and good objective function value, whereas the quasi-random points are purely exploratory. Let me know if this helps!

@avimit
Author

avimit commented Jul 19, 2019

@lena-kashtelyan, thank you! So you are saying that this is normal behaviour. I need to read your documentation more closely: I wasn't aware of the automatic switch between generation methods.
Can you direct me to a documentation section which discusses this?

I read here that "generation_strategy – Optional generation strategy. If not set, one is intelligently chosen based on properties of search space." Can I read somewhere about the rules of that intelligent choice? Does defining a GenerationStrategy manually mean setting a list of GenerationSteps, each with a model & number of arms?

Update: I have now run with many more prior results (23 instead of 3), and the generated trials (20 of them) do look more 'random':

[[-4.8454738  12.40396094]
 [ 1.50334656  3.58794704]
 [ 3.60955745 11.0689379 ]
 [ 5.65392017  2.62550272]
 [-0.67834571  8.59934282]
 [-3.42358052 12.55679034]
 [-3.42573924 11.21177024]
 [ 9.76430453  2.62833629]
 [-3.54783257 13.47168885]
 [ 8.82704277  1.        ]
 [-4.1205391  11.67763715]
 [-2.67112402 10.86252798]
 [-2.91145959 12.19877435]
 [ 0.87686565  6.37085831]
 [ 3.46594351  1.        ]
 [-2.48823271  9.03475795]
 [-4.36813646 10.50712756]
 [-3.91070174 12.71645143]
 [ 9.08590565  2.04251522]
 [ 1.84706187  5.81079829]]

I also rolled back to version 0.1.3 and got strange repetitive trials again, so the bug-fix in master really fixed a bug 👍🏻

BTW, I noticed that when (accidentally) providing too many duplicate initial points, Ax crashes with a RuntimeError.

@lena-kashtelyan
Contributor

@avimit, it is normal behavior indeed! I was just making notes on how we need to expand the documentation on GenerationStrategy and its choice by the Service API. Stay tuned, we will add those docs shortly! In your case, the generation strategy you are getting is 5 Sobol trials, then Bayesian Optimization (GP+EI) trials powered by BoTorch. The code for that choice resides in ax/service/utils/dispatch.py, in case that helps.

To set the generation strategy manually, you will indeed need to make one, with the GenerationStep-s. Let me know if you have questions about them; those need more documentation too, so for now I can be the human docusaurus : )
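A hedged sketch of such a manual strategy, mirroring the default described above (5 Sobol trials, then GP+EI); the Models enum is noted earlier in this thread as master-only at the time, and the step keyword was num_arms around this version (later renamed num_trials), so treat the exact spelling as an assumption:

from ax.modelbridge.generation_strategy import GenerationStrategy, GenerationStep
from ax.modelbridge.registry import Models
from ax.service.ax_client import AxClient

# Quasi-random initialization, then Bayesian optimization for the rest of the run.
generation_strategy = GenerationStrategy(
    steps=[
        GenerationStep(model=Models.SOBOL, num_arms=5),
        GenerationStep(model=Models.GPEI, num_arms=-1),  # -1: no limit on trials
    ]
)
ax = AxClient(generation_strategy=generation_strategy)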

Just out of curiosity –– are you looking to specify a generation strategy of your own for research / experimental purposes?

Regarding the runtime error, is it coming from gpytorch? May I see the stacktrace if you come across that error again?

@avimit
Author

avimit commented Jul 25, 2019

@lena-kashtelyan Yes, I may wish to specify a generation strategy of my own, but at a later stage, not right away.

I will try to reproduce the runtime error and update. Thanks

Update: I reproduced it, and sorry, it is an error when running BoTorch, not Ax... I am testing several packages in a sequence (GPyOpt, BoTorch, Ax), and didn't notice that it failed before the Ax section. Indeed, as you had guessed, the error is coming from gpytorch.

If you are still interested, although it's not an Ax error, this is the tail of the error message:

  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/botorch/fit.py", line 35, in fit_gpytorch_model
    mll, _ = optimizer(mll, track_iterations=False, **kwargs)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/botorch/optim/fit.py", line 188, in fit_gpytorch_scipy
    callback=cb,
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/scipy/optimize/_minimize.py", line 603, in minimize
    callback=callback, **options)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 293, in function_wrapper
    return function(*(wrapper_args + args))
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 63, in __call__
    fg = self.fun(x, *args)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/botorch/optim/fit.py", line 223, in _scipy_objective_and_grad
    loss = -mll(*args).sum()
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/module.py", line 22, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py", line 28, in forward
    res = output.log_prob(target)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/distributions/multivariate_normal.py", line 129, in log_prob
    inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py", line 992, in inv_quad_logdet
    cholesky = CholLazyTensor(self.cholesky())
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py", line 718, in cholesky
    res = self._cholesky()
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/utils/memoize.py", line 34, in g
    add_to_cache(self, cache_name, method(self, *args, **kwargs))
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py", line 403, in _cholesky
    cholesky = psd_safe_cholesky(evaluated_mat.double()).to(self.dtype)
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/utils/cholesky.py", line 47, in psd_safe_cholesky
    raise e
  File "/Users/temp/.virtualsenv/tf36/lib/python3.6/site-packages/gpytorch/utils/cholesky.py", line 21, in psd_safe_cholesky
    L = torch.cholesky(A, upper=upper, out=out)
RuntimeError: cholesky_cpu: U(2,2) is zero, singular U.

@lena-kashtelyan
Contributor

lena-kashtelyan commented Jul 29, 2019

@avimit, sounds good! Please feel free to open up an issue regarding adding a custom generation strategy if you need help with it.

I will pass the error on to the BoTorch folks –– I think it's something they've dealt with before. Is it blocking for you at all? Thank you for reporting it!

@avimit
Author

avimit commented Jul 29, 2019

@lena-kashtelyan Not blocking, but it does demand consideration: I would have to de-duplicate init points (sometimes we collect results from several previous runs, and it can be that some of them used the same "grid" initialization, and so share X points).
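A minimal sketch of such de-duplication before attaching (prior_points and the surrounding ax client are assumed variables; it simply keeps the first occurrence of each parameterization):

# prior_points: list of (parameters_dict, result) pairs collected from earlier runs.
seen = set()
deduped = []
for parameters, result in prior_points:
    key = tuple(sorted(parameters.items()))
    if key not in seen:
        seen.add(key)
        deduped.append((parameters, result))

for parameters, result in deduped:
    _, trial_index = ax.attach_trial(parameters)
    ax.complete_trial(trial_index, result)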

@lena-kashtelyan
Contributor

@avimit, I just heard back from @Balandat regarding this issue, and it seems like it would be helpful to have a slightly more elaborate repro: next time you see the errors, could you record what the trials and the rest of the data were?

If that helps, you should be able to get the data via ax_client.experiment.fetch_data() and the trials via {trial_name: trial.arm.parameters for (trial_name, trial) in ax_client.experiment.trials.items()}.

@avimit
Author

avimit commented Aug 23, 2019

@lena-kashtelyan, I noticed that you don't yet have a tag with the above bug fix (the latest is still 0.1.3), so pip install ax-platform will install a version with the bug, no?

@lena-kashtelyan
Contributor

@avimit, we're getting a release ready today / tomorrow!

@avimit
Author

avimit commented Aug 29, 2019

@lena-kashtelyan, hi, I read that the v0.1.4 release is broken and not to be used.
Will v0.1.5 be coming soon?

@lena-kashtelyan
Contributor

lena-kashtelyan commented Aug 29, 2019

@avimit, it's out today! With it comes the fix for this bug. Closing the issue for now; feel free to reopen if something still seems unsolved, and thank you so much again for your patience and feedback.

Edit: both bugs discussed in this issue, actually –– the assumption that SEM is 0, and the generation of similar trials when the existing ones have not yet been updated with data.

@avimit
Author

avimit commented Sep 3, 2019

> @avimit, hello again,
>
> Re: the description of your actions –– it sounds exactly right; that is how the Service API is meant to be used.
>
>   1. Yes, for now that is the case. Is it limiting for your use case?
>   2. That is the expected behavior, yes; we assume that the results are noiseless in that case. We are currently working on improving that logic, which will improve model fits, because noiselessness will no longer be assumed. If your actual evaluation function is noisy, then assuming that it's not will not give the best model fits, so you would be better off estimating your SEM, as discussed in this issue.
>   3. Something strange is going on there. You should be able to obtain at least 5 trials without adding more data. Would it be possible for you to share a reproducible example for your case? I would only need the search space description (how many parameters and what their domains are), as well as what points you were attaching through attach_trial. If I examine what went wrong there, it will likely also explain why the suggested trials are bad.

Hi,
Do I understand correctly that point 2 was also fixed in v0.1.5, and noiselessness is no longer assumed?

Thanks again
