Picking initial points with Latin hypercubes #433
betatim added the Moderate label Jul 12, 2017
With Latin hypercube designs we should be careful about how we construct the
grid (even a straight line of points is a valid Latin hypercube design).
The R package 'lhs' implements several methods for constructing Latin
hypercube designs that satisfy different criteria, such as the mean distance
of each point to all other points.
I agree with the point that since we have a fixed set of points, we should
try to optimize their position instead of using quasi random sequences.
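For concreteness, here is a minimal numpy sketch of the kind of criterion-driven construction the 'lhs' package offers: draw many random Latin hypercubes and keep the one with the best maximin score (the largest smallest pairwise distance). `latin_hypercube` and `maximin_lhs` are made-up names, and this brute-force search is much cruder than what 'lhs' actually implements; it is only meant to show the idea.

```python
import numpy as np

def latin_hypercube(n_points, n_dims, rng):
    """One random Latin hypercube sample in [0, 1)^n_dims.

    Each axis is split into n_points equal bins; every bin on every
    axis contains exactly one point (the defining LHS property)."""
    # One point per bin, jittered inside its bin, independently permuted per axis.
    u = (np.arange(n_points) + rng.random(size=(n_dims, n_points))) / n_points
    return np.column_stack([rng.permutation(row) for row in u])

def maximin_lhs(n_points, n_dims, n_candidates=100, seed=0):
    """Crude 'maximin' LHS: generate many random hypercubes and keep the
    one whose smallest pairwise distance is largest.  This penalises
    degenerate designs such as all points on a diagonal line."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        cand = latin_hypercube(n_points, n_dims, rng)
        d = np.linalg.norm(cand[:, None, :] - cand[None, :, :], axis=-1)
        score = d[np.triu_indices(n_points, k=1)].min()
        if score > best_score:
            best, best_score = cand, score
    return best

X = maximin_lhs(10, 2)
# Projection check: each axis hits each of the 10 bins exactly once.
bins = np.sort((X * 10).astype(int), axis=0)
```

Note that the maximin criterion only discourages the degenerate diagonal design; with a small candidate pool it is not guaranteed to find the globally best layout.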
I thought for a 1D problem it would be fine to use an evenly spaced line of numbers? Then in N dimensions the property you want is that none of the points overlap, so that when you project onto a particular axis you get evenly spaced points along that axis.
Do you have a benchmark (or reference) showing that this is better than
uniformly random points? I remember @MechCoder playing with a similar idea
(pseudo-random initialization points), which we eventually abandoned
because we couldn't show it was better.
I agree with @glouppe. We should replace things only if we can show that they improve on existing stuff.
I will start by comparing different initializations on the Branin function,
to see whether there is a difference already for low-dimensional functions.
Sounds good! I am quite curious what the results will be :)
Here is the first comparison on Branin-Hoo with the following parameters:
On this plot I show 100 repetitions of gp_minimize, each time with a different randomized initialization. Each point shows the minimal objective value attained on that run.
As expected, the uniform random initialization has the highest standard deviation. I was a little surprised that the Sobol sequence performs so well compared to maximin LHS. Here is the Jupyter notebook if you want to play around with it yourself; 100 repetitions of one init method take around 30 minutes.
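For reference, the Branin-Hoo objective used in these comparisons is the standard two-dimensional test function with three global minima of value roughly 0.397887 (at (-π, 12.275), (π, 2.275) and (9.42478, 2.475)). A plain numpy version, not the exact code from the notebook, looks like this:

```python
import numpy as np

def branin(x1, x2):
    """Branin-Hoo test function on x1 in [-5, 10], x2 in [0, 15].

    Three global minima, each with value ~0.397887."""
    a = 1.0
    b = 5.1 / (4 * np.pi ** 2)
    c = 5.0 / np.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8 * np.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s

val = branin(np.pi, 2.275)  # ~0.397887, one of the three global minima
```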
Nice plots! (I had to finish my coffee and read the seaborn docs to realise what a stripplot does :)) Should we be looking at the median and several other quantiles instead of the mean and std? From looking at the plots I was surprised by the number of outliers. Can we compare the minimum found after the init phase only, just as a way to speed things up? Maybe we need to run for more than 10 samples in that case. Not sure if we want to find out which initialization method gives us the best starting point, or if we care about the combined performance of init + optimizer. I will run your notebook with …
Yes, I noticed that too. I computed the median between simulation runs for comparison and it gave the same relative ordering. In general I would say the spread is a really important metric for the initialization. If we look at the median absolute deviation or other robust measures we could remove the effect of some of the more extreme outliers. I also like the idea of comparing only the initializers; I might start an experiment for that later.
This is with median and MAD:
Notebook: https://gist.github.com/betatim/c67a068f7d68d9dbd973810596fe575e
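For anyone replicating this, the robust summary can be computed directly with numpy; `median_and_mad` below is just an illustrative helper, not code from the notebook:

```python
import numpy as np

def median_and_mad(x):
    """Median and median absolute deviation (MAD): robust analogues of
    mean and standard deviation, much less sensitive to outlier runs."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return med, mad

# A single extreme run barely moves the robust summary:
med, mad = median_and_mad([0.40, 0.41, 0.39, 0.40, 5.0])
```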
Yeah, at the end you (almost) always want to run optimization on top, so it makes more sense to look at the final result of the optimization.
Did you guys have a look at dask-patternsearch?
@amueller dask-patternsearch seems similar in spirit to the Powell method implemented in scipy, and to derivative-free scipy.optimize.minimize in general. It would be interesting to benchmark against those some time. The assumption they seem to make is that the objective is not too expensive to evaluate (or that you have massive parallel computing power). Maybe we could take some of their ideas for initialization, if that improves the benchmarks.
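For anyone wanting a concrete baseline on the scipy side of that comparison, Powell's derivative-free method is available through `scipy.optimize.minimize`. This is only a generic usage sketch on the Rosenbrock test function, not one of the benchmarks discussed in this thread:

```python
import numpy as np
from scipy.optimize import minimize, rosen

# Powell's method: derivative-free local search, here on 2-D Rosenbrock.
res = minimize(rosen, x0=np.zeros(2), method="Powell")
# On success, res.x is close to the minimum at (1, 1) and res.fun close to 0.
```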
eriknw commented Jul 14, 2017
Hey! Author of dask-patternsearch here. I'm happy to have discussions (maybe in a different forum) and to collaborate as appropriate. Cheers!
Hi @eriknw :) Thanks for introducing your project, and sorry if I got something wrong. Looks interesting indeed! The discussion on benchmarks in eriknw/dask-patternsearch#9 is quite relevant for scikit-optimize too, and has some interesting ideas. You can take a look at the benchmarks that we have, but these might change somewhat in the future, as more complexity would make for more interesting benchmarks. Cheers! :)
@iaroslav-ai Could you please also show the …
For historians: the strategy proposed in #262 was to use points generated from a Sobol sequence to optimize the acquisition function, not as initial points as proposed in this issue. As the benchmarks in that PR indicate, it performs worse than using a combination of sampling and lbfgs (sampling provides the start points for the lbfgs optimisation).
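The "sampling + lbfgs" combination can be sketched generically: screen cheap candidate points, then polish the most promising ones with a local optimizer such as L-BFGS-B. The `acquisition` function below is a made-up one-dimensional stand-in, and the fixed grid replaces random sampling only to keep the sketch deterministic; this is not skopt's actual acquisition-optimization code:

```python
import numpy as np
from scipy.optimize import minimize

def acquisition(x):
    """Toy multi-modal stand-in for an acquisition surface (e.g. negative EI)."""
    x = np.atleast_1d(x)[0]
    return np.sin(3 * x) + 0.5 * x ** 2

# Step 1: cheap global screen over candidate points.
starts = np.linspace(-3, 3, 21)
vals = [acquisition(s) for s in starts]
best_starts = starts[np.argsort(vals)[:3]]

# Step 2: polish the most promising candidates with L-BFGS-B.
results = [minimize(acquisition, x0=[s], method="L-BFGS-B", bounds=[(-3, 3)])
           for s in best_starts]
best = min(results, key=lambda r: r.fun)
# best.x is near the global minimum around x = -0.47
```

The screen protects the local optimizer from being started in a bad basin; the local step recovers the precision that sampling alone lacks.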
@betatim In your two plots, why does the blue line change between the first and the second plot?
@MechCoder yup, will update the plots a bit later.
Hmm, I take back my comment about the VAE; I just checked, and it does not produce estimates larger than -1.0.
How are you using a VAE to model the score? My idea of a VAE is to sample from the data-generating distribution.
Yeah, I think you are right; I learn to sample from …
Hmm, the plots above seem to suggest that anything other than uniform initialization might be better in the end.
Two things: the "grid" line moves, which makes it a bit hard to compare the two plots, and in one case it is just random sampling (init phase) while in the other plot it is init + SMBO phase.
Can we move the discussion of benchmark (methods) and models etc. to a new issue? We are starting to mix several threads of conversation here, which makes it hard to follow. For me these are the questions for this thread:
After looking at several libraries in Python, the most promising and maintained one is SALib. It offers the Sobol sequence. Advantages are:
Is Sobol better than Latin hypercubes? This is the bit of code we would need, right? I'd be tempted to take just that code instead of adding a new dependency :-/ Semi-related question: what would we do about categorical dimensions?
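If we do vendor something, the scale is worth gauging. Sobol proper needs tables of direction numbers, but its simpler low-discrepancy cousin, the Halton sequence, fits in a few lines of numpy and illustrates the idea. To be clear, this is Halton, explicitly not the Sobol code from SALib:

```python
import numpy as np

def van_der_corput(n, base):
    """First n terms of the van der Corput sequence in the given base."""
    seq = np.zeros(n)
    for i in range(n):
        f, k = 1.0, i + 1
        while k > 0:
            f /= base
            seq[i] += f * (k % base)  # reversed digit expansion of i+1
            k //= base
    return seq

def halton(n, dims):
    """n points of the Halton sequence in [0, 1)^dims, one prime base per axis."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    return np.column_stack([van_der_corput(n, primes[d]) for d in range(dims)])

X = halton(8, 2)  # first axis: 0.5, 0.25, 0.75, 0.125, ...
```

Halton degrades in high dimensions (correlated axes for large prime bases), which is one reason Sobol is usually preferred for more than a handful of dimensions.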
I extended the advantages of LHS and Sobol:

Sobol sequence
Advantages:
Disadvantages:

Latin hypercube sampling
Advantages:
Disadvantages:

Of course LHS without any optimization is easy to implement, and we could simply offer both.
Yes, we could just take this and the …
Categorical dimensions should not be a problem, because we currently use one-hot encoding and the low-discrepancy sequence will evenly cover both sides of the 0/1 interval for each dummy variable. The start value of 0.5 is the only problematic value.
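One plausible reading of the one-hot argument, sketched with a hypothetical `decode_one_hot` helper (this is not skopt's actual transformer): each category contributes one coordinate in [0, 1), the largest coordinate wins, and an all-0.5 start point is awkward precisely because it is a k-way tie.

```python
import numpy as np

def decode_one_hot(u):
    """Map a point u in [0, 1)^k (one coordinate per dummy variable) to a
    category index by taking the largest coordinate.  Ties, such as the
    all-0.5 centre point, fall back to argmax's first-maximum rule."""
    return int(np.argmax(u))

cat = decode_one_hot([0.2, 0.9, 0.4])     # second dummy dominates -> category 1
centre = decode_one_hot([0.5, 0.5, 0.5])  # k-way tie: arbitrarily category 0
```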
+1 for that
glouppe closed this Jul 28, 2017
glouppe reopened this Jul 28, 2017
Sounds like we have a proposal to add Sobol sequences as a way to initialise the optimizers.
OK, let's find out how tedious this is to implement in a nice way in practice. For example, as a user I would just continue setting … Wondering if this is best implemented as a function in … Right now the sampling of random values is delegated to each …
betatim commented Jul 12, 2017
Inspired by the comments in #432: currently we pick the initial points at random, and we discussed using a Sobol sequence (or another quasi-random sequence) instead.
Finding a good implementation of a quasi-random sequence generator in many dimensions isn't so easy.
However, I just realised that we pick the number of initial points up front, which means we can design an "optimal" grid (Latin hypercube?) and evaluate the objective using that. Sobol or random only wins over this static allocation if you do not know how many points you will sample. This would be great, because I think Latin hypercubes should be much easier to code up than a Sobol sequence.
Would be good to hear some opinions on this.
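The trade-off in the paragraph above can be made concrete: a fixed evenly spaced design is optimal when the budget is known, but every point moves when the budget changes, while a quasi-random sequence is extendable because its first n points are a prefix of its first n+1. A small numpy sketch, with van der Corput in base 2 standing in for a full Sobol sequence:

```python
import numpy as np

def centered_grid(n):
    """n evenly spaced points in (0, 1): a static design for a known budget."""
    return (np.arange(n) + 0.5) / n

def van_der_corput(n, base=2):
    """First n terms of the van der Corput sequence (1-D quasi-random)."""
    seq = np.zeros(n)
    for i in range(n):
        f, k = 1.0, i + 1
        while k > 0:
            f /= base
            seq[i] += f * (k % base)
            k //= base
    return seq

# Growing the budget from 4 to 5 moves *every* grid point ...
g4, g5 = centered_grid(4), centered_grid(5)
# ... while the quasi-random sequence simply appends a new point.
q4, q5 = van_der_corput(4), van_der_corput(5)
```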