Conversation
I think you need to call
That's nice! Would be good to plot the min in red or something, what do you think?
With all these:

import numpy as np
from functools import partial
from skopt import forest_minimize
from skopt.benchmarks import hart6 as _hart6
from skopt import plots

def hart6(x, noise_level=0.):
    return _hart6(x) + noise_level * np.random.randn()

bounds = [(0., 1.), (0., 1.), (0., 1.), (0., 1.), (0., 1.), (0., 1.)]
func = partial(hart6, noise_level=2.0)
n_calls = 80*2

et_minimize = partial(forest_minimize, base_estimator="et")
et_res = et_minimize(func, bounds, n_calls=n_calls, random_state=1)
ax = plots.plot_scatter_matrix(et_res, bins=10)
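A rough follow-up to the "plot the min in red" suggestion above — a sketch only, assuming `plot_scatter_matrix` returns a 2-d array of matplotlib axes and that `et_res.x` holds the best point found (the indexing convention here is my assumption, not this PR's API):

# Overlay the best point in red on every off-diagonal subplot.
# Assumes ax[i, j] plots dimension j on the x-axis and dimension i on the y-axis.
best = et_res.x
for i in range(len(best)):
    for j in range(len(best)):
        if i != j:
            ax[i, j].scatter([best[j]], [best[i]], c='red', s=30, zorder=10)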
Interesting to see how much these distributions change just by rerunning things 😕 😟 edit: maybe not that worrying when you consider
One more plot for #67
* `ax`: [`Axes`]:
    The matplotlib axes.
"""
samples = np.asarray(result.x_iters)
Shall we document that this function cannot be used for Categorical dimensions?
Looks good
The lower triangle shows the same as before but with a different cmap. The upper triangle shows where the samples are and the objective function (plus several bug fixes and tweaks to handle axis ranges better). Conclusion from my side: do what Gilles said.
Split things into two functions: one to show the order of samples and histograms, and another to show the objective function value. Below is the output of each. Thoughts on the names? What is missing is how to calculate the partial dependence for the diagonal. Should we introduce a switch that allows using the model instead of the samples for
In our case, there is no distribution defined over those features (it is in fact a search space, not a distribution), so partial dependence plots are not really defined either. However, we could still artificially assume that these features are uniformly distributed within their bounds and plot the partial dependence under that assumption. This also extends to 2d plots, where two feature values are fixed instead of one.
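In formula form (a standard definition, not quoted from the thread): assuming the remaining coordinates are uniform on their bounds, the partial dependence of $f$ on dimension $i$ is

$$\bar{f}_i(x_i) = \frac{1}{N} \sum_{n=1}^{N} f\big(x_i,\, x^{(n)}_{\setminus i}\big), \qquad x^{(n)} \sim \mathcal{U}(\text{bounds}),$$

i.e. average $f$ over $N$ draws of every coordinate except the one being plotted; the 2d version fixes two coordinates instead of one.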
I worked on this a bit here. Seems like "marginal" is the wrong word. This computes 1d versions of a 2d function using the samples. (I now wonder if you have access to lhcb.slack.com and watched the discussion there.)
Hmm, I am not sure whether we should either reuse
I think using the model is a better idea. The whole idea of SMBO is that the model adds something to the data points in terms of getting a good estimate of the true shape of the function you are optimising. It also has the advantage that we can generate more samples from the model than we can from the true objective function. (Practically I wonder if we could see much difference, because the plots are fairly zoomed out.) I will implement (essentially):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats.mstats import mquantiles

# X: samples, shape (n, d); ys: objective values, shape (n,);
# low, high: bounds of the dimension being plotted
def partial_dependence(dimension, bins=10):
    # bin edges at the empirical quantiles of the samples along `dimension`
    edges = mquantiles(X[:, dimension],
                       prob=np.linspace(0, 1, bins, endpoint=False)[1:])
    edges = np.array([low] + list(edges) + [high])
    q_x = []
    q_y = []
    q_up = []
    q_down = []
    for i in range(edges.shape[0] - 1):
        idx = (edges[i] < X[:, dimension]) & (X[:, dimension] < edges[i+1])
        q_x.append(edges[i] + (edges[i+1] - edges[i]) / 2.)
        # average it! not median
        q_y.append(np.median(ys[idx]))
        q_down.append(np.percentile(ys[idx], 16))
        q_up.append(np.percentile(ys[idx], 84))

    plt.plot(X[:, dimension], ys, '.')
    plt.plot(q_x, q_y, 'r-', lw=2)
    plt.fill_between(q_x, q_down, q_up, color='r', alpha=0.3)

Adding the ability to do 2d and using samples from the model.
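A minimal way to exercise the sketch above (the toy data is my illustration, not from the thread; it relies on the module-level `X`, `ys`, `low`, `high` that the function reads):

import numpy as np

# toy data: 200 samples in 2-d, objective = quadratic bowl plus noise
rng = np.random.RandomState(0)
X = rng.uniform(-2., 2., size=(200, 2))
ys = (X ** 2).sum(axis=1) + 0.1 * rng.randn(200)
low, high = -2., 2.

partial_dependence(dimension=0, bins=10)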
This isn't quite clear to me. In the notebook I draw random samples, then average them per quantile in
Why not? This is what a partial dependence plot is.
Oh wow 😮 ! I thought I had implemented the partial dependence plot, but no. Now that I understand (I think):

import numpy as np
import matplotlib.pyplot as plt

low, high = -3.141, 3*3.141  # bounds (assumed; not defined in the original snippet)

def f(x):
    return np.sin(x[0])*x[0] + np.cos(x[1])*x[1] + x[2]

def partial(dimension):
    # dimension we want to see
    xx = np.linspace(low, high, 100)
    rvs = np.random.uniform(low, high, size=(100, 3))
    y = []
    for x in xx:
        rvs_ = rvs.copy()
        rvs_[:, dimension] = x
        y.append(np.mean([f(r) for r in rvs_]))
    return xx, y

plt.plot(*partial(0), 'b-')
plt.plot(*partial(1), 'r-')
plt.plot(*partial(2), 'g-')

📔 📚 every day is a school day with @glouppe! Merci.
Yeah, that is it :)
Now my mumbling above might make more sense to you: I was wondering whether we should marginalize over uniformly random points, as you do here, or rather use the points collected in
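For concreteness, the alternative being floated would be a small variant of the toy `partial` above that reuses the evaluated points instead of fresh uniform draws (this builds on `f`, `low`, `high` from that snippet; `result` is assumed to be a skopt `OptimizeResult`):

import numpy as np

def partial_from_evaluations(dimension, result, n_grid=100):
    # marginalize over the points the optimiser actually evaluated
    # (result.x_iters) rather than fresh uniform samples
    xx = np.linspace(low, high, n_grid)
    rvs = np.asarray(result.x_iters)
    y = []
    for x in xx:
        rvs_ = rvs.copy()
        rvs_[:, dimension] = x
        y.append(np.mean([f(r) for r in rvs_]))
    return xx, y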
This is what you get as partial dependence plot for an ET model applied to:

def f(x):
    return np.sin(x[0])*x[0] + np.cos(x[1])*x[1]

# one spurious/random/useless dimension
bounds = [(-3.141, 3*3.141), (-3.141, 3*3.141), (-10., -4.)]
n_calls = 160
n_random_starts = 15

It is nice to see that the model realises that the third dimension is a red herring. Note that the plot in column 1, row 2 only roughly looks like the real objective function. This confused me for a while, but it makes sense because you only evaluate the expensive function at points close to the minimum -> the model doesn't learn anything about what the function looks like far away. If you increase the number of random samples to 150, things start to look a lot more like the real objective function. A big question for me is how to write good/useful tests for this. Right now I am "debugging"/checking things as I go along by thinking hard about each plot and comparing things with little toy implementations outside of
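A self-contained sketch of that experiment as I read it (the plotting call is this PR's new function, so its exact name and import path are assumptions):

import numpy as np
from skopt import forest_minimize

def f(x):
    # objective that ignores its third coordinate entirely
    return np.sin(x[0])*x[0] + np.cos(x[1])*x[1]

bounds = [(-3.141, 3*3.141), (-3.141, 3*3.141), (-10., -4.)]
res = forest_minimize(f, bounds, base_estimator="et",
                      n_calls=160, n_random_starts=15, random_state=1)

from skopt.plots import plot_objective_function  # assumed import path
plot_objective_function(res)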
space = result.space
samples = np.asarray(result.x_iters)
order = range(samples.shape[0])
rvs = space.rvs(n_samples=10)
I think you can increase this to 100 or 500. `n_samples=10` might lead to too noisy empirical estimates.
Also, have you tried to compare visually replacing this with `rvs = results.x_iters`?
Right now I tuned the numbers so that they aren't super painful when you want to plot often. I had started with 100 points but this was super slow. +1 for 100 as default, and we should definitely make it a parameter that can be adjusted by the user.
Yeah, I guess this should be tuned in combination with the number of steps from lower to upper bounds (set to 40 at the moment).
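To make the cost trade-off concrete (my own back-of-envelope arithmetic, assuming one model prediction per grid-point/sample pair):

# cost per subplot with the defaults discussed above
n_points = 40    # grid steps from lower to upper bound
n_samples = 100  # samples marginalized over at each grid point

evals_1d = n_points * n_samples        # 4,000 predictions per diagonal plot
evals_2d = n_points**2 * n_samples     # 160,000 per off-diagonal plot
print(evals_1d, evals_2d)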
Another iteration of 🎨. Getting there. The following four pictures show the current output. The language in the docstrings could be improved. Not sure how to refer to partial dependence. Are you calculating the partial dependence of the model w.r.t. dim1 and dim2? It all reads a bit clunky 😢
Giving detailed explanations about those plots and how to read them would make for a great notebook :)
Could you add labels on the left and bottom, to make it easier to understand that a subplot concerns X_i vs X_j?
def partial_dependence(space, model, i, j=None, sample_points=None,
                       n_samples=100, n_points=40):
    """Calculate partial dependence of `model` for dimensions `i` and `j`
Maybe instead something like "Calculate the partial dependence for dimensions `i` and `j` with respect to the objective value, as approximated through `model`".
For the plots on the diagonal, I think it would be better if all were using the same y-scale for the number of samples / objective value. E.g. for the 1d partial dependence plot of X3, it is not obvious that there is almost no correlation, simply because the plot is zoomed in.
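One way this shared y-scale could be done, sketched under the assumption that the plotting function returns a 2-d numpy array of axes named `ax` (hypothetical; not this PR's API):

# collect the y-limits of all diagonal subplots, then apply their union to each
diag = [ax[i, i] for i in range(ax.shape[0])]
lo = min(a.get_ylim()[0] for a in diag)
hi = max(a.get_ylim()[1] for a in diag)
for a in diag:
    a.set_ylim(lo, hi)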
Added a few comments on the axis formatting and tried to structure things so that they will be slightly less incomprehensible in two months' time.
Looks great! Just one last nitpick about the x- and y-axis labels. Wouldn't it be easier if those were simply
return xi, yi, zi
def plot_objective_function(result, levels=10, n_points=40, n_samples=100):
Or simply, `plot_objective`?
Shall we also update one of the notebooks to include one of those plots?
(I just wanted to cancel my comment ... #fatfingers)
Will investigate including it in an existing notebook. #162 is about creating notebooks that explain how to read these plots in general. We should have those too.
That can also be done later. For me this PR is already good enough to be merged.
Then I vote merge now, would be great to get this done before I leave for holiday 🌴
Boum!
😃 Thanks for the comments and patience, this PR just kept getting bigger and bigger.
Create a scatter plot from an `OptimizeResult`. Do you know how to make the top triangle vanish, or something useful we could display there? This is the output from a run on `hart6`: