namespace bayesopt
{
/*! \page usemanual Using the library
\tableofcontents
The library is intended to be both fast and clear for development and
research. At the same time, it allows a great level of customization and
guarantees a high level of accuracy and numerical robustness.
\section running Running your own problems
The best way to design your own problem is by following one of the
examples. Basically, there are 3 steps that should be followed:
- Define the function to optimize.
- Modify the parameters of the optimization process. In general, many
problems can be solved with the default set of parameters, but some
of them will require some tuning.
- The set of parameters and the default set can be found in
parameters.h.
- In general most users will need to modify only the parameters
described in \ref basicparams.
- Advanced users should read \ref params for a full description of the parameters.
- Set and run the corresponding optimizer (continuous, discrete,
categorical, etc.). In this step, the corresponding restriction should
be defined.
- Continuous optimization requires box constraints (upper and lower bounds).
- Discrete optimization requires the set of discrete values.
- Categorical optimization requires the number of categories per dimension.
<HR>
\section basicparams Basic parameter setup
Many users will only need to change the following parameters. Advanced
users should read \ref params for a full description of the
parameters.
- \b n_iterations: Number of iterations of BayesOpt. Each iteration
corresponds with a target function evaluation. Currently, this is the
only stopping criterion. In general, more evaluations result in higher
precision. [Default 190]
- \b noise: Observation noise/signal ratio. [Default 1e-6]
- For stochastic functions (if several evaluations of the same point
produce different results), it should match as closely as possible the
ratio of the noise variance to the signal variance. Too much noise
results in slow convergence, while too little noise might result in
not converging at all.
- For simulations and deterministic functions, it should be close to
0. However, to avoid numerical instability due to model inaccuracies,
make it always greater than 0. For example, between 1e-10 and 1e-14.
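As a rough illustration of what this ratio means, the noise/signal ratio of a stochastic function can be estimated from repeated evaluations of the target at a few points. The helper below is not part of the library; it is a minimal sketch of the computation described above.

```python
import statistics

def estimate_noise_ratio(samples_per_point):
    """Estimate the noise/signal variance ratio from repeated
    evaluations of the target function at several points.

    samples_per_point: a list of lists; each inner list holds
    repeated evaluations of the same input point.
    """
    # Noise variance: average variance of the repeated evaluations.
    noise_var = statistics.mean(statistics.variance(s)
                                for s in samples_per_point)
    # Signal variance: variance of the per-point means.
    signal_var = statistics.variance(statistics.mean(s)
                                     for s in samples_per_point)
    return noise_var / signal_var
```

A nearly deterministic function yields a very small ratio, suggesting a noise value close to 0; a very noisy one yields a ratio closer to 1.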
If execution time is not an issue, accuracy might be improved by
modifying the following parameters.
- \b l_type: Learning method for the kernel hyperparameters. Setting
this parameter to L_MCMC uses a more robust learning method which
might result in better accuracy, but the overall execution time will
increase. [Default L_EMPIRICAL]
- \b n_iter_relearn: Number of iterations between re-learning the kernel
parameters. That is, kernel learning occurs once every \em
n_iter_relearn iterations. Ideally, the best precision is obtained
when the kernel parameters are learned every iteration
(n_iter_relearn=1). However, this \em learning part is
computationally expensive and implies a higher cost per
iteration. If n_iter_relearn=0, there is no
relearning. [Default 50]
- \b n_inner_iterations: (only for continuous optimization) Maximum
number of iterations (per dimension!) to optimize the acquisition
function (criteria). That is, each iteration corresponds with a
criterion evaluation. If the original problem is high dimensional or
the result is needed with high precision, we might need to increase
this value. [Default 500]
<HR>
\section usage API description
Here we show a brief summary of the different ways to use the
library. Basically, there are two usage patterns, depending on your
coding style:
- \b Callback: The user sends a function pointer or handle to the
optimizer, following a given prototype. This method is available for C/C++,
Python, Matlab and Octave.
- \b Inheritance: This is a more object-oriented method and allows more
flexibility. The user creates a module with their function, process,
etc. This module inherits from one of the BayesOpt models, depending on
whether the optimization is discrete or continuous, and overrides the \em
evaluateSample method. This method is available only for C++ and
Python.
\subsection cusage C usage
This interface is the most standard approach. Because C code is
broadly compatible with other languages, it can also be used from
languages such as Fortran, Ada, etc.
The function to optimize must agree with the template provided in
bayesopt.h
\code{.c}
double my_function (unsigned int n, const double *x, double *gradient, void *func_data);
\endcode
Note that the gradient has been included for future compatibility,
although in the current implementation, it is not used. You can just
ignore it or send a NULL pointer.
The parameters are defined in the bopt_params struct. The easiest way
to set the parameters is to use
\code{.c}
bopt_params initialize_parameters_to_default(void);
\endcode
and then modify the necessary fields. For the non-numeric parameters,
there is a set of functions that help to set the corresponding
parameters:
\code{.c}
void set_kernel(bopt_params* params, const char* name);
void set_mean(bopt_params* params, const char* name);
void set_criteria(bopt_params* params, const char* name);
void set_surrogate(bopt_params* params, const char* name);
void set_log_file(bopt_params* params, const char* name);
void set_load_file(bopt_params* params, const char* name);
void set_save_file(bopt_params* params, const char* name);
void set_learning(bopt_params* params, const char* name);
void set_score(bopt_params* params, const char* name);
\endcode
Each of these functions just needs a pointer to the parameters and a
string with the parameter value. For example:
\code{.c}
bopt_params params = initialize_parameters_to_default();
set_learning(&params, "L_MCMC");
\endcode
Once we have set the parameters and the function, we can call the
optimizer according to our problem.
- For the continuous case:
\code{.cpp}
int bayes_optimization(int nDim, // number of dimensions
eval_func f, // function to optimize
void* f_data, // extra data that is transferred directly to f
const double *lb, const double *ub, // bounds
double *x, // out: minimizer
double *minf, // out: minimum
bopt_params parameters);
\endcode
- For the discrete case:
\code{.cpp}
int bayes_optimization_disc(int nDim, // number of dimensions
eval_func f, // function to optimize
void* f_data, // extra data that is transferred directly to f
double *valid_x, size_t n_points, // set of discrete points
double *x, // out: minimizer
double *minf, // out: minimum
bopt_params parameters);
\endcode
- For the categorical case:
\code{.cpp}
int bayes_optimization_categorical(int nDim, // number of dimensions
eval_func f, // function to optimize
void* f_data, // extra data that is transferred directly to f
int *categories, // array of size nDim with the number of categories per dim
double *x, // out: minimizer
double *minf, // out: minimum
bopt_params parameters);
\endcode
This interface catches all the expected exceptions and returns error
codes for C compatibility.
\subsection cppusage C++ usage
Besides being able to use the library with the \ref cusage from C++,
we can also take advantage of the object oriented properties of the
language.
This is the most straightforward and complete way to use the
library. The class implementing the problem must inherit from one of
the models defined in bayesopt.hpp.
Then, we just need to override the virtual function \b
evaluateSample, which corresponds to the function to be
optimized.
Optionally, we can redefine \b checkReachability to declare nonlinear
constraints (if a point is invalid, checkReachability should return \em
false; if it is valid, \em true). Note that the latter feature is
experimental: there are no convergence guarantees if it is used.
For example, for a continuous problem, we would define our optimizer as:
\code{.cpp}
class MyOptimization: public ContinuousModel
{
 public:
  MyOptimization(Parameters params):
    ContinuousModel(input_dimension,params)
  {
    // My constructor
  }

  double evaluateSample( const boost::numeric::ublas::vector<double> &query )
  {
    // My function here
  }

  bool checkReachability( const boost::numeric::ublas::vector<double> &query )
  {
    // My restrictions here
  }
};
\endcode
where \em input_dimension is a size_t value with the number of input
dimensions of the problem. Note that for C++ we use the Parameters
class defined in parameters.hpp for convenience.
Then, we use it like:
\code{.cpp}
Parameters params;
params.l_type = L_MCMC;

MyOptimization optimizer(params);

//Define bounds and prepare result.
boost::numeric::ublas::vector<double> bestPoint(dim);
boost::numeric::ublas::vector<double> lowerBound(dim);
boost::numeric::ublas::vector<double> upperBound(dim);

//Set the bounds. This is optional. Default is [0,1]
//Only required because we are doing continuous optimization
optimizer.setBoundingBox(lowerBound,upperBound);

//Collect the result in bestPoint
optimizer.optimize(bestPoint);
\endcode
For the discrete and categorical cases, we just need to inherit from
the \ref DiscreteModel. Depending on the type of input, we can use the
corresponding constructor. In this case, the setBoundingBox
step should be skipped.
Optionally, we can also choose to run every iteration
independently. See bayesopt.hpp and bayesoptbase.hpp
\subsection pyusage Python usage
The file python/demo_quad.py provides a simple example of the different
ways to use the library from Python.
1. \b Parameters: The parameters are defined as a Python dictionary
with the same structure and names as the Parameters class in the C++
interface, with the exception of \em kernel.* and \em mean.* which are
replaced by \em kernel_ and \em mean_ respectively.
There is no need to fill in all the parameters. If a parameter is not
included in the dictionary, the default value is used
instead.
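For illustration only (this helper is not part of the library), the default-filling behaviour can be pictured as a simple dictionary merge; the default values shown are the subset quoted on this page:

```python
# Hypothetical subset of the defaults quoted on this page.
DEFAULTS = {"n_iterations": 190, "noise": 1e-6, "l_type": "L_EMPIRICAL"}

def with_defaults(user_params):
    """Fill any parameter missing from the user dictionary
    with its default value."""
    merged = dict(DEFAULTS)      # start from the defaults
    merged.update(user_params)   # user values take precedence
    return merged
```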
2. a) \b Callback: The callback interface is just a wrapper of the C
interface. In this case, the callback function should have the prototype
\code{.py}
def my_function (query):
\endcode
where \em query is a numpy array and the function returns a double
scalar.
The optimization process for a continuous model can be called as
\code{.py}
y_out, x_out, error = bayesopt.optimize(my_function,
n_dimensions,
lower_bound,
upper_bound,
parameters)
\endcode
where the result is a tuple with the minimum as a numpy array (x_out),
the value of the function at the minimum (y_out) and an error code.
Analogously, the function for a discrete model is:
\code{.py}
y_out, x_out, error = bayesopt.optimize_discrete(my_function,
x_set,
parameters)
\endcode
where x_set is an array of arrays with the valid inputs.
And for the categorical case:
\code{.py}
y_out, x_out, error = bayesopt.optimize_categorical(my_function,
categories,
parameters)
\endcode
where categories is an integer array with the number of categories per dimension.
2. b) \b Inheritance: The object oriented methodology is similar to the C++
interface.
\code{.py}
from bayesoptmodule import BayesOptContinuous

class MyOptimization(BayesOptContinuous):
    def __init__(self):
        BayesOptContinuous.__init__(self, n_dimensions)

    def evaluateSample(self, query):
        """ My function here """
\endcode
Then, the optimization process can be called as
\code{.py}
import numpy as np

my_opt = MyOptimization()

# Set non-default parameters
params = {}
params["l_type"] = "L_MCMC"
my_opt.params = params

# Set the bounds. This is optional. Default is [0,1]
# Only required because we are doing continuous optimization
my_opt.lower_bound = #numpy array
my_opt.upper_bound = #numpy array

# Collect the results
y_out, x_out, error = my_opt.optimize()
\endcode
where the result is a tuple with the minimum as a numpy array (x_out),
the value of the function at the minimum (y_out) and an error code.
For the discrete and categorical cases, we just need to inherit from
bayesoptmodule.BayesOptDiscrete or
bayesoptmodule.BayesOptCategorical. See bayesoptmodule.py. In this
case, the "set bounds" step should be skipped.
Note: For some "expected" error codes, a corresponding Python
exception is raised. However, the exception is raised once the error
code reaches the Python environment, so it does not keep track of any
exception thrown in the C++ part of the code.
\subsection matusage Matlab/Octave usage
The file matlab/runtest.m provides an example of different ways to use
BayesOpt from Matlab/Octave.
The parameters are defined as a Matlab struct with the same structure
and names as the bopt_params struct in the C/C++ interface, with the
exception of \em kernel.* and \em mean.* which are replaced by \em
kernel_ and \em mean_ respectively. Also, C arrays are replaced with
vectors, so there is no need to set the number of elements as a
separate entry.
There is no need to fill in all the parameters. If a parameter is not
included in the Matlab struct, the default value is
automatically used instead.
The Matlab/Octave interface is just a wrapper of the C interface. In
this case, the callback function should have the form
\code{.m}
function y = my_function (query)
\endcode
where \em query is a Matlab vector and the function returns a scalar
value.
The optimization process can be run for continuous variables (both in
Matlab and Octave) as
\code{.m}
[x_out, y_out] = bayesoptcont('my_function',
n_dimensions,
parameters,
lower_bound,
upper_bound);
\endcode
where the result is the minimum as a vector (x_out) and the value of
the function at the minimum (y_out).
Analogously, the optimization process for discrete variables:
\code{.m}
[x_out, y_out] = bayesoptdisc('my_function',
xset,
parameters);
\endcode
and for categorical variables:
\code{.m}
[x_out, y_out] = bayesoptcat('my_function',
categories,
parameters);
\endcode
In Matlab, but not in Octave, the optimization can also be called with
function handles. For example:
\code{.m}
[x_out, y_out] = bayesoptcont(@my_function,
n_dimensions,
parameters,
lower_bound,
upper_bound)
\endcode
<HR>
\section params Understanding the parameters
BayesOpt relies on a complex and highly configurable mathematical
model. In theory, it should work reasonably well for many problems in
its default configuration. However, Bayesian optimization shines when
we can include as much knowledge as possible about the target function
or about the problem. Or, if that knowledge is not available, when we
keep the model as general as possible (to avoid bias). For this part,
some knowledge about Gaussian processes or nonparametric models in
general might be useful.
It is recommended to read the page about \ref bopttheory in advance.
The parameters are bundled in a structure (C/C++/Matlab/Octave) or
dictionary (Python), depending on the API that we use. This is a brief
explanation of every parameter.
\subsection budgetpar Budget parameters
This set of parameters deals with the number of evaluations or
iterations for each step.
- \b n_iterations: Number of iterations of BayesOpt. Each iteration
corresponds with a target function evaluation. In general, more
evaluations result in higher precision. [Default 190]
- \b n_iter_relearn: Number of iterations between re-learning the kernel
parameters. That is, kernel learning occurs once every \em
n_iter_relearn iterations. Ideally, the best precision is obtained
when the kernel parameters are learned every iteration
(n_iter_relearn=1). However, this \em learning part is
computationally expensive and implies a higher cost per
iteration. If n_iter_relearn=0, there is no
relearning. [Default 50]
- \b n_inner_iterations: (only for continuous optimization) Maximum
number of iterations (per dimension!) to optimize the acquisition
function (criteria). That is, each iteration corresponds with a
criterion evaluation. If the original problem is high dimensional or
the result is needed with high precision, we might need to increase
this value. [Default 500]
\subsection initpar Initialization parameters
Sometimes, BayesOpt requires an initial set of samples to learn a
preliminary model of the target function. These parameters are
especially important if n_iter_relearn is 0 or too high.
- \b n_init_samples: Initial set of samples. Each sample requires a
target function evaluation. [Default 10]
- \b init_method: (for continuous optimization only, unsigned integer)
There are different strategies available for the initial design:
[Default 1, LHS].
1. Latin Hypercube Sampling (LHS)
2. Sobol sequences
3. Uniform Sampling
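To make the first strategy concrete, here is a minimal, self-contained sketch of a Latin Hypercube design on the unit hypercube (an illustration of the technique, not the library's implementation, which relies on Boost):

```python
import random

def latin_hypercube(n_samples, n_dims, seed=0):
    """One LHS design in [0,1)^n_dims: each dimension is divided
    into n_samples equal bins and every bin is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One point per bin, at a random offset inside the bin.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)  # decorrelate the dimensions
        columns.append(column)
    # Transpose: one tuple of coordinates per sample.
    return list(zip(*columns))
```

Compared with uniform sampling, this guarantees that every dimension is evenly covered even for a small number of samples.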
Random numbers are used frequently, from the initial design to MCMC,
Thompson sampling, etc. They are generated with the Boost random
number library.
- \b random_seed: If this value is positive (including 0), then it is
used as a fixed seed for the boost random number generator. If the
value is negative, a time based (variable) seed is used. For
debugging or benchmarking purposes, it might be useful to freeze the
random seed. [Default -1, variable seed].
\subsection logpar Logging parameters
- \b verbose_level: (integer)
- Negative -> Error -> stdout
- 0 -> Warning -> stdout
- 1 -> Info -> stdout
- 2 -> Debug -> stdout
- 3 -> Warning -> log file
- 4 -> Info -> log file
- 5 -> Debug -> log file
- >5 -> Error -> log file
- \b log_filename: Name/path of the log file (if applicable,
verbose_level>=3) [Default "bayesopt.log"]
\subsection critpar Exploration/exploitation parameters
This is the set of parameters that drives the sampling procedure
either to explore unexplored regions or to improve the best current result.
- \b crit_name: Name of the sample selection criterion or a
combination of them. It is used to select which points to evaluate
for each iteration of the optimization process. Could be a
combination of functions like
"cHedge(cEI,cLCB,cPOI,cThompsonSampling)". See section \ref critmod for
the different possibilities. [Default: "cEI"]
- \b crit_params, (n_crit_params): Array/vector with the set of
parameters for the selected criteria. If there is more than one
criterion, the parameters are split among them according to the
number of parameters required for each criterion. If the vector is
empty or n_crit_params is 0, the default parameters are
selected for each criterion. [Default: crit_params = []]. For C, the
array needs a size variable \b n_crit_params [Default:
n_crit_params = 0].
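The splitting rule can be sketched as follows (an illustrative helper, not library code): the flat vector is consumed in order, each criterion taking as many parameters as it declares.

```python
def split_criteria_params(crit_params, n_params_per_criterion):
    """Split a flat parameter vector among the criteria of a
    combined criterion, in declaration order."""
    split, start = [], 0
    for n in n_params_per_criterion:
        split.append(crit_params[start:start + n])
        start += n
    if start != len(crit_params):
        raise ValueError("parameter count does not match the criteria")
    return split
```

For example, combining one criterion that takes one parameter with another that takes two would split [1.0, 0.5, 2.0] into [1.0] and [0.5, 2.0].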
- \b epsilon: According to some authors \cite Bull2011, it is
recommended to include an epsilon-greedy strategy to achieve near-optimal
convergence rates. Epsilon is the probability of performing
a random (blind) evaluation of the target function. Higher values
imply more forced exploration, while lower values rely more on the
exploration/exploitation policy of the criterion. [Default 0.0
(epsilon-greedy disabled)]
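The epsilon-greedy step can be pictured as follows (a sketch of the general technique, not the library's internal code):

```python
import random

def epsilon_greedy_choice(epsilon, criterion_optimum, random_point, rng):
    """With probability epsilon pick a random (blind) point,
    otherwise the point suggested by the acquisition criterion."""
    if rng.random() < epsilon:
        return random_point
    return criterion_optimum
```

With epsilon=0.0 the criterion's suggestion is always used; with epsilon=1.0 the search degenerates to pure random sampling.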
- \b force_jump: Sometimes, especially when the number of initial
points is too small, the learned model might be wrong and the
optimization may get stuck. Forced jumps count the number of
iterations where the difference between consecutive observations is
smaller than the expected noise. In that case, we assume that any gain is
pure noise and that we could get more information somewhere else. This
parameter sets the number of iterations with no gain before doing a
random jump. If the parameter is 0, this feature is disabled. [Default
20]
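One way to read this rule (an illustrative interpretation, not the library's exact code) is as a counter of consecutive iterations whose observed change stays below the noise level:

```python
def should_force_jump(observations, expected_noise, force_jump):
    """Return True when the last force_jump consecutive differences
    between observations are all smaller than the expected noise
    (force_jump=0 disables the check)."""
    if force_jump == 0:
        return False
    stalled = 0
    for prev, curr in zip(observations, observations[1:]):
        if abs(curr - prev) < expected_noise:
            stalled += 1
            if stalled >= force_jump:
                return True
        else:
            stalled = 0  # a real gain resets the counter
    return False
```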
\subsection surrpar Surrogate model parameters
The main advantage of Bayesian optimization over other optimization
methods is the use of a surrogate model. These parameters allow us to
configure it. See Section \ref surrmod for a detailed description.
- \b surr_name: Name of the hierarchical surrogate function
(nonparametric process and the hyperpriors on sigma and w).
[Default "sGaussianProcess"]
- \b noise: Observation noise/signal ratio. [Default 1e-6]
- For stochastic functions (if several evaluations of the same point
produce different results), it should match as closely as possible the
ratio of the noise variance to the signal variance. Too much noise
results in slow convergence, while too little noise might result in
not converging at all.
- For simulations and deterministic functions, it should be close to
0. However, to avoid numerical instability due to model inaccuracies,
make it always greater than 0. For example, between 1e-10 and 1e-14.
- \b sigma_s: (only used for "sGaussianProcess" and
"sGaussianProcessNormal") Known signal variance [Default 1.0]
- \b alpha, \b beta: (only used for "sStudentTProcessNIG")
Inverse-Gamma prior hyperparameters (if applicable) [Default 1.0,
1.0]
\subsubsection meanpar Mean function parameters
This set of parameters represents the mean function (or trend) of the
surrogate model.
- \b mean.name: Name of the mean function. Could be a combination of
functions like "mSum(mConst, mLinear)". See Section \ref parmod for
the different possibilities. [Default: "mConst"]
- \b mean.coef_mean, \b mean.coef_std: Mean function coefficients as
vectors/array. [Default: "coef_mean=[1.0], coef_std=[1000.0]"]
- If the mean function is assumed to be known (like in
"sGaussianProcess"), then coef_mean represents the actual values
and coef_std is ignored.
- If the mean function has a normal prior on the coefficients (like
"sGaussianProcessNormal" or "sStudentTProcessNIG"), then both the
mean and std are used. Since mean.coef_std is a vector,
correlations between coefficients are not considered.
- For C, the size of both arrays is defined in \b mean.n_coef.
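For illustration (a hypothetical helper, not library code), drawing one set of coefficients from these independent normal priors looks like this:

```python
import random

def sample_mean_coefficients(coef_mean, coef_std, seed=0):
    """Draw mean-function coefficients from independent normal
    priors; coef_std is a vector, so no correlations are modeled."""
    rng = random.Random(seed)
    return [rng.gauss(m, s) for m, s in zip(coef_mean, coef_std)]
```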
\subsubsection kernelpar Kernel parameters
The kernel of the surrogate model represents the correlation between
points, which is related to the smoothness of the prediction.
- \b kernel.name: Name of the kernel function. Could be a combination
of functions like "kSum(kSEARD,kMaternARD3)". See Section \ref
kermod for the different possibilities. [Default: "kMaternARD5"]
- \b kernel.hp_mean, \b kernel.hp_std: Normal prior on the kernel
hyperparameters in the log space. That is, if the hyperparameters are
\f$\theta\f$, this prior is \f$p(\log(\theta))\f$. Any "illegal"
standard deviation (std<=0) results in a flat prior for the
corresponding component. [Default: hp_mean=[1.0], hp_std=[10.0]]
- If there is more than one kernel (a compound kernel), the
parameters are split among them according to the number of
parameters required for each kernel.
- ARD kernels require parameters for each dimension; if only one
dimension is provided (as in the default), it is copied
for every dimension.
- For C, the size of both arrays is defined in \b kernel.n_hp.
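The ARD replication rule can be sketched as follows (a hypothetical helper, for illustration only):

```python
def expand_ard_prior(hp_mean, hp_std, n_dims):
    """If a single hyperparameter prior is given for an ARD kernel,
    copy it for every input dimension; otherwise sizes must match."""
    if len(hp_mean) == 1 and len(hp_std) == 1:
        return hp_mean * n_dims, hp_std * n_dims
    if len(hp_mean) != n_dims or len(hp_std) != n_dims:
        raise ValueError("one hyperparameter prior per dimension expected")
    return hp_mean, hp_std
```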
\subsubsection hyperlearn Hyperparameter learning
Although BayesOpt tries to build a full analytic Bayesian model for
the surrogate function, the kernel hyperparameters cannot be estimated
in closed form. See Section \ref learnmod for a detailed description.
- \b l_type: Learning method for the kernel
hyperparameters. Currently, L_FIXED, L_EMPIRICAL and L_MCMC are
implemented [Default L_EMPIRICAL]
- \b sc_type: Score function for the learning method. [Default SC_MAP]
- \b l_all: If true, all the parameters are learned (mean, sigma,
etc.) using the method defined in l_type. If false, only the kernel
hyperparameters are directly learned. [Default false]
\subsection savflags Load/Save data parameters
We can choose to store or restore data in files, to continue an
optimization without starting over. The data is stored in plain text
files, so, by modifying the files, this method can also be used to
preload existing data points into the model.
- \b load_save_flag: 1-Load data, 2-Save data, 3-Load and append data. Any other value: no file loading or saving. [Default 0]
- \b load_filename, \b save_filename: Filename to load/save data [Default "bayesopt.dat"]
*/
}