
[MRG] Scatter plot matrix #147

Merged: 10 commits merged into scikit-optimize:master on Jul 28, 2016
Conversation

@betatim (Member) commented Jul 22, 2016

Create a scatter plot from an OptimizeResult.

Do you know how to make the top triangle vanish, or is there something useful we could display there?

This is the output from a run on hart6:
[screenshot: scatter plot matrix for a run on hart6]

@glouppe (Member) commented Jul 22, 2016

I think you need to call ax.axis("off") for the corresponding axes.
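
A minimal sketch of that suggestion (generic matplotlib, not this PR's code): turn off every axes above the diagonal of an n x n subplot grid.

import matplotlib.pyplot as plt

n = 4  # number of dimensions in the plot matrix
fig, axes = plt.subplots(n, n)
for i in range(n):
    for j in range(i + 1, n):    # axes strictly above the diagonal
        axes[i, j].axis("off")   # hides frame, ticks and labels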

@glouppe (Member) commented Jul 22, 2016

That's nice! Would be good to plot the min in red or something, what do you think?

@betatim (Member, Author) commented Jul 22, 2016

With all these `i`s and `j`s, could you please check that I didn't screw up any indexing?

from functools import partial

import numpy as np

from skopt import forest_minimize
from skopt import plots
from skopt.benchmarks import hart6 as _hart6

def hart6(x, noise_level=0.):
    # noisy wrapper around the six-dimensional hart6 benchmark
    return _hart6(x) + noise_level * np.random.randn()

bounds = [(0., 1.), (0., 1.), (0., 1.), (0., 1.), (0., 1.), (0., 1.)]
func = partial(hart6, noise_level=2.0)
n_calls = 80 * 2

et_minimize = partial(forest_minimize, base_estimator="et")
et_res = et_minimize(func, bounds, n_calls=n_calls, random_state=1)
ax = plots.plot_scatter_matrix(et_res, bins=10)

[screenshot: scatter plot matrix for the run above]

@betatim (Member, Author) commented Jul 22, 2016

Interesting to see how much these distributions change just by rerunning things 😕 😟

edit: maybe not that worrying when you consider that hart6 has six minima
edit2: they are local minima but not global ones, so back to worrying?

@betatim changed the title from [WIP] Scatter plot matrix to [MRG] Scatter plot matrix on Jul 22, 2016
@betatim (Member, Author) commented Jul 22, 2016

Same problem, but with dummy_minimize:

[screenshot: scatter plot matrix for dummy_minimize]

@betatim (Member, Author) commented Jul 22, 2016

One more plot for #67

    * `ax`: [`Axes`]:
        The matplotlib axes.
    """
    samples = np.asarray(result.x_iters)
@glouppe (Member) commented on the diff:

Shall we document that this function cannot be used for Categorical dimensions?

@glouppe (Member) commented Jul 25, 2016

> With all these `i`s and `j`s, could you please check that I didn't screw up any indexing?

Looks good

@betatim changed the title from [MRG] Scatter plot matrix to [WIP] Scatter plot matrix on Jul 25, 2016
@betatim (Member, Author) commented Jul 25, 2016

[screenshot: updated plot matrix]

The lower triangle shows the same as before but with a different cmap. The upper triangle shows where the samples are and the objective function.

(plus several bug fixes and tweaks to handle axis ranges better)

Conclusion from my side: do what Gilles said

@betatim (Member, Author) commented Jul 25, 2016

Split things into two functions: one to show the order of samples and the histograms, and another to show the objective function value. Below is the output of each on branin with two extra (meaningless) dimensions.

plot_sampling_order:
[screenshot]

plot_objective_function:
[screenshot]

Thoughts on the names?

What is missing is how to calculate the partial dependence for the diagonal.

Should we introduce a switch that allows using the model instead of the samples for plot_objective_function or should it be a new function?

@glouppe (Member) commented Jul 25, 2016

Let F(X = (X_1, ..., X_p)) be the prediction of our model. For the 1d partial dependence plot of feature X_i, we want to plot f(x) = E[F(X) | X_i = x], where the expectation is taken over the joint distribution of all the other features j != i.

In our case, there is no distribution defined over those features (it is in fact a search space, not a distribution), so partial dependence plots are not really defined either. However, we could still artificially assume that these features are uniformly distributed within their bounds and plot f(x) = E[F(X) | X_i = x], as empirically approximated using the mean prediction of the last model over uniformly randomly drawn points (i.e. space.rvs()) where the value of feature X_i is changed to x.

This also extends to 2d plots where two feature values are fixed instead of one.
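
To make the 2d case concrete, here is a toy sketch (the function f, the bounds, and the helper name partial_2d are made up for illustration; this is not the PR's implementation): fix features i and j on a grid and average over random draws of the remaining features.

import numpy as np

def f(x):  # stand-in for the model's mean prediction
    return np.sin(x[0]) * x[0] + np.cos(x[1]) * x[1] + x[2]

low, high, n_dims = -3., 3., 3  # illustrative bounds
rvs = np.random.uniform(low, high, size=(100, n_dims))

def partial_2d(i, j, n_points=20):
    xi = np.linspace(low, high, n_points)
    xj = np.linspace(low, high, n_points)
    zz = np.empty((n_points, n_points))
    for a, xa in enumerate(xi):
        for b, xb in enumerate(xj):
            rvs_ = rvs.copy()
            rvs_[:, i] = xa   # fix both features of interest ...
            rvs_[:, j] = xb
            zz[a, b] = np.mean([f(r) for r in rvs_])  # ... average over the rest
    return xi, xj, zz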

@betatim (Member, Author) commented Jul 25, 2016

I worked on this a bit here. Seems like "marginal" is the wrong word. This computes 1d versions of a 2d function using the samples.

(I now wonder if you have access to lhcb.slack.com and watched the discussion there.)

@glouppe (Member) commented Jul 26, 2016

Hmm, I am not sure whether we should reuse x_iters as the data we marginalize over, or instead sample uniformly at random within the bounds. Then again, maybe it makes sense to reuse x_iters, since this is the data that was used for training the model.

@betatim (Member, Author) commented Jul 26, 2016

I think using the model is a better idea. The whole idea of SMBO is that the model adds something beyond the raw data points when it comes to estimating the true shape of the function you are optimising. It also has the advantage that we can generate more samples from the model than we can from the true objective function.

(Practically I wonder if we could see much difference because the plots are fairly zoomed out.)

I will implement (essentially):

# Sketch; assumes samples X (n_samples, n_dims), objective values ys (n_samples,)
# and bounds low/high for the dimension of interest.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats.mstats import mquantiles

def partial_dependence(dimension, bins=10):
    # bin edges at the empirical quantiles of the samples along `dimension`
    edges = mquantiles(X[:, dimension],
                       prob=np.linspace(0, 1, bins, endpoint=False)[1:])
    edges = np.array([low] + list(edges) + [high])

    q_x = []
    q_y = []
    q_up = []
    q_down = []
    for i in range(edges.shape[0] - 1):
        idx = (edges[i] < X[:, dimension]) & (X[:, dimension] < edges[i + 1])
        q_x.append(edges[i] + (edges[i + 1] - edges[i]) / 2.)
        # average it! no median
        q_y.append(np.median(ys[idx]))
        q_down.append(np.percentile(ys[idx], 16))
        q_up.append(np.percentile(ys[idx], 84))

    plt.plot(X[:, dimension], ys, '.')
    plt.plot(q_x, q_y, 'r-', lw=2)
    plt.fill_between(q_x, q_down, q_up, color='r', alpha=0.3)

I will also add the ability to do 2d, and use samples from the model.

> approximated using the mean prediction of the last model over uniformly randomly drawn points (i.e. space.rvs()) where the value of feature X_i is changed to x.

This isn't quite clear to me. In the notebook I draw random samples, then average them per quantile in X_i. Are you suggesting that we draw random samples and then set the value of X_i to a certain value for each sample? I don't think you can do that 😕

@glouppe (Member) commented Jul 26, 2016

> This isn't quite clear to me. In the notebook I draw random samples, then average them per quantile in X_i. Are you suggesting that we draw random samples and then set the value of X_i to a certain value for each sample? I don't think you can do that

Why not? This is what a partial dependence plot is.

@betatim (Member, Author) commented Jul 26, 2016

Oh wow 😮! I thought I had implemented the partial dependence plot, but no. Now I understand (I think):

import matplotlib.pyplot as plt
import numpy as np

low, high = -3., 3.  # illustrative bounds for the toy example

def f(x):
    return np.sin(x[0]) * x[0] + np.cos(x[1]) * x[1] + x[2]

def partial(dimension):
    # dimension we want to see
    xx = np.linspace(low, high, 100)

    rvs = np.random.uniform(low, high, size=(100, 3))

    y = []
    for x in xx:
        rvs_ = rvs.copy()
        rvs_[:, dimension] = x  # pin the feature of interest, average the rest
        y.append(np.mean([f(r) for r in rvs_]))

    return xx, y

plt.plot(*partial(0), 'b-')
plt.plot(*partial(1), 'r-')
plt.plot(*partial(2), 'g-')

gives:

[plot: 1d partial dependence for each of the three dimensions]

📔 📚 Every day is a school day with @glouppe! Thank you.

@glouppe (Member) commented Jul 26, 2016

Yeah, that is it :)

> rvs = np.random.uniform(low, high, size=(100, 3))

Now my mumbling above might make more sense to you: I was wondering whether we should marginalize over uniformly random points, as you do here, or rather use the points collected in res.x_iters (and evaluate them through res.models[-1]).
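
A hedged sketch of that alternative, assuming a finished skopt OptimizeResult res (partial_from_observed is a made-up name, and the space.transform() call is an assumption: the surrogate models are fit in the transformed search space):

import numpy as np

space = res.space
model = res.models[-1]             # last surrogate model
X = space.transform(res.x_iters)   # observed points instead of space.rvs()

def partial_from_observed(dimension, n_points=40):
    xx = np.linspace(X[:, dimension].min(), X[:, dimension].max(), n_points)
    y = []
    for x in xx:
        X_ = X.copy()
        X_[:, dimension] = x       # pin the dimension of interest
        y.append(np.mean(model.predict(X_)))
    return xx, y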

@betatim (Member, Author) commented Jul 27, 2016

[plot: partial dependence matrix for the ET model]

This is what you get as a partial dependence plot for an ET model applied to:

import numpy as np

def f(x):
    return np.sin(x[0]) * x[0] + np.cos(x[1]) * x[1]

# one spurious/random/useless dimension
bounds = [(-3.141, 3 * 3.141), (-3.141, 3 * 3.141), (-10., -4.)]

n_calls = 160
n_random_starts = 15

It is nice to see that the model realises that the third dimension is a red herring. Note that the plot in column 1, row 2 only roughly looks like the real objective function. This confused me for a while, but it makes sense: you only evaluate the expensive function at points close to the minimum, so the model doesn't learn anything about what the function looks like far away. If you increase the number of random samples to 150, things start to look a lot more like the real objective function:
[plot: partial dependence matrix with 150 random samples]

A big question for me is how to write good/useful tests for this. Right now I am "debugging"/checking things as I go along by thinking hard about each plot and comparing things with little toy implementations outside of skopt, for example this notebook, which is really only useful for me today. Ideas welcome.

space = result.space
samples = np.asarray(result.x_iters)
order = range(samples.shape[0])
rvs = space.rvs(n_samples=10)
@glouppe (Member) commented on the diff:

I think you can increase this to 100 or 500. n_samples=10 might lead to too noisy empirical estimates.

@glouppe (Member) commented Jul 27, 2016:

Also, have you tried comparing visually after replacing this with rvs = result.x_iters?

@betatim (Member, Author) replied:

Right now I tuned the numbers so that they aren't super painful when you want to plot often. I had started with 100 points, but that was super slow. +1 for 100 as the default, and we should definitely make it a parameter that can be adjusted by the user.

@glouppe (Member) replied:

Yeah, I guess this should be tuned in combination with the number of steps from lower to upper bounds (set to 40 at the moment).

@betatim (Member, Author) commented Jul 27, 2016

Another iteration of 🎨. Getting there. The following four pictures are plot_sampling_order and plot_objective_function for an ET model, with few and with many n_random_starts.

The language in the docstrings could be improved. I'm not sure how to refer to partial dependence. Are you calculating the partial dependence of the model w.r.t. dim1 and dim2? It all reads a bit clunky 😢

[four screenshots: plot_sampling_order and plot_objective_function, each with few and with many random starts]

@glouppe (Member) commented Jul 27, 2016

Giving detailed explanations about those plots and how to read them would make for a great notebook :)

@glouppe (Member) commented Jul 27, 2016

Could you add labels on the left and bottom, to make it easier to understand that a subplot concerns X_i vs X_j?
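
One way to do that (a generic matplotlib sketch, not this PR's code) is to label only the left column and the bottom row of the grid:

import matplotlib.pyplot as plt

n = 4
fig, axes = plt.subplots(n, n)
for i in range(n):
    axes[i, 0].set_ylabel("$X_%d$" % i)       # left column labels the row dimension
    axes[n - 1, i].set_xlabel("$X_%d$" % i)   # bottom row labels the column dimension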


def partial_dependence(space, model, i, j=None, sample_points=None,
                       n_samples=100, n_points=40):
    """Calculate partial dependence of `model` for dimensions `i` and `j`
@glouppe (Member) commented Jul 27, 2016:

Maybe instead something like "Calculate the partial dependence for dimensions i and j with respect to the objective value, as approximated through model".

@glouppe (Member) commented Jul 27, 2016

For the plots on the diagonal, I think it would be better if they all used the same y-scale for the number of samples / objective value. E.g. for the 1d partial dependence plot of X3, it is not obvious that there is almost no correlation, simply because the plot is zoomed in.
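
A minimal sketch of that suggestion (generic matplotlib, not the PR's code): compute a common y-range over the diagonal subplots and apply it to all of them.

import numpy as np
import matplotlib.pyplot as plt

n = 3
fig, axes = plt.subplots(n, n)
diag = [axes[i, i] for i in range(n)]
for ax in diag:
    ax.plot(np.random.rand(10))   # stand-in for the diagonal 1d plots

lo = min(ax.get_ylim()[0] for ax in diag)
hi = max(ax.get_ylim()[1] for ax in diag)
for ax in diag:
    ax.set_ylim(lo, hi)           # same zoom on every diagonal plot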

Commit: "Added a few comments on the axis formatting and tried to structure things so that they will be slightly less incomprehensible in two months' time."
@betatim (Member, Author) commented Jul 27, 2016

> Giving detailed explanations about those plots and how to read them would make for a great notebook :)

Easy, you read them just like you read tea leaves ;) 🍵

The plots below are branin without noise, plus two extra dimensions that are useless. ET model, with 15 random points.

[two screenshots: plot_sampling_order and plot_objective_function on branin]

@betatim changed the title from [WIP] Scatter plot matrix to [MRG] Scatter plot matrix on Jul 27, 2016
@glouppe (Member) commented Jul 28, 2016

Looks great! Just one last nitpick about the x- and y-axis labels. Wouldn't it be easier if those were simply X_i (on the y-axis) and X_j (on the other axis)? E.g. at the moment it is not immediately clear what X_3,2 means for the x-axis vs X_3,0 for the y-axis.

    return xi, yi, zi


def plot_objective_function(result, levels=10, n_points=40, n_samples=100):
@glouppe (Member) commented on the diff:

Or simply, plot_objective?

@glouppe (Member) commented Jul 28, 2016

Shall we also update one of the notebooks to include one of those plots?

@betatim closed this on Jul 28, 2016
@betatim reopened this on Jul 28, 2016
@betatim (Member, Author) commented Jul 28, 2016

(I just wanted to cancel my comment ... #fatfingers)

@betatim (Member, Author) commented Jul 28, 2016

Will investigate including it in an existing notebook. #162 is about creating notebooks that explain how to read these plots in general. We should have those too.

@glouppe (Member) commented Jul 28, 2016

That can also be done later.

For me this PR is already good enough to be merged.

@betatim (Member, Author) commented Jul 28, 2016

Then I vote we merge now; it would be great to get this done before I leave for holiday 🌴

@glouppe merged commit 7d77c05 into scikit-optimize:master on Jul 28, 2016
@glouppe (Member) commented Jul 28, 2016

Boom!

@betatim deleted the moar-plotting branch on July 29, 2016 at 06:53
@betatim (Member, Author) commented Jul 29, 2016

😃 Thanks for the comments and patience, this PR just kept getting bigger and bigger.
