
[MRG] Scatter plot matrix #147

Merged: 10 commits merged into scikit-optimize:master on Jul 28, 2016
Conversation

@betatim (Member) commented Jul 22, 2016

Create a scatter plot from an OptimizeResult.

Do you know how to make the top triangle vanish, or is there something useful we could display there?

This is the output from a run on hart6:
[screenshot: scatter plot matrix for a run on hart6]

@glouppe (Member) commented Jul 22, 2016

I think you need to call ax.axis("off") for the corresponding axes.
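
A minimal sketch of that suggestion (generic matplotlib, not this PR's code): turn off every axes above the diagonal of an n x n subplot grid.

import matplotlib.pyplot as plt

n = 4  # number of dimensions in the plot matrix
fig, axes = plt.subplots(n, n)
for i in range(n):
    for j in range(i + 1, n):    # axes strictly above the diagonal
        axes[i, j].axis("off")   # hides frame, ticks and labels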

@glouppe (Member) commented Jul 22, 2016

That's nice! Would be good to plot the min in red or something, what do you think?

@betatim (Member, Author) commented Jul 22, 2016

With all these `i`s and `j`s, could you please check that I didn't screw up any indexing?

from functools import partial

import numpy as np

from skopt import forest_minimize
from skopt import plots
from skopt.benchmarks import hart6 as _hart6

def hart6(x, noise_level=0.):
    # noisy wrapper around the six-dimensional hart6 benchmark
    return _hart6(x) + noise_level * np.random.randn()

bounds = [(0., 1.), (0., 1.), (0., 1.), (0., 1.), (0., 1.), (0., 1.)]
func = partial(hart6, noise_level=2.0)
n_calls = 80 * 2

et_minimize = partial(forest_minimize, base_estimator="et")
et_res = et_minimize(func, bounds, n_calls=n_calls, random_state=1)
ax = plots.plot_scatter_matrix(et_res, bins=10)

[screenshot: scatter plot matrix for the run above]

@betatim (Member, Author) commented Jul 22, 2016

Interesting to see how much these distributions change just by rerunning things 😕 😟

edit: maybe not that worrying when you consider that hart6 has six minima
edit2: they are local minima but not global ones, so back to worrying?

@betatim changed the title from [WIP] Scatter plot matrix to [MRG] Scatter plot matrix on Jul 22, 2016
@betatim (Member, Author) commented Jul 22, 2016

Same problem, but with dummy_minimize:

[screenshot: scatter plot matrix for dummy_minimize]

@betatim (Member, Author) commented Jul 22, 2016

One more plot for #67

    * `ax`: [`Axes`]:
        The matplotlib axes.
    """
    samples = np.asarray(result.x_iters)
@glouppe (Member) commented on the diff:

Shall we document that this function cannot be used for Categorical dimensions?

@glouppe (Member) commented Jul 25, 2016

> With all these `i`s and `j`s, could you please check that I didn't screw up any indexing?

Looks good

@betatim changed the title from [MRG] Scatter plot matrix to [WIP] Scatter plot matrix on Jul 25, 2016
@betatim (Member, Author) commented Jul 25, 2016

[screenshot: updated plot matrix]

The lower triangle shows the same as before but with a different cmap. The upper triangle shows where the samples are and the objective function.

(plus several bug fixes and tweaks to handle axis ranges better)

Conclusion from my side: do what Gilles said

@betatim (Member, Author) commented Jul 25, 2016

Split things into two functions: one to show the order of samples and the histograms, and another to show the objective function value. Below is the output of each on branin with two extra (meaningless) dimensions.

plot_sampling_order:
[screenshot]

plot_objective_function:
[screenshot]

Thoughts on the names?

What is missing is how to calculate the partial dependence for the diagonal.

Should we introduce a switch that allows using the model instead of the samples for plot_objective_function or should it be a new function?

@glouppe (Member) commented Jul 25, 2016

Let F(X = (X_1, ..., X_p)) be the prediction of our model. For the 1d partial dependence plot of feature X_i, we want to plot f(x) = E[F(X) | X_i = x], where the expectation is taken over the joint distribution of all the other features j != i.

In our case, there is no distribution defined over those features (it is in fact a search space, not a distribution), so partial dependence plots are not really defined either. However, we could still artificially assume that these features are uniformly distributed within their bounds and plot f(x) = E[F(X) | X_i = x], as empirically approximated using the mean prediction of the last model over uniformly randomly drawn points (i.e. space.rvs()) where the value of feature X_i is changed to x.

This also extends to 2d plots where two feature values are fixed instead of one.
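
To make the 2d case concrete, here is a toy sketch (the function f, the bounds, and the helper name partial_2d are made up for illustration; this is not the PR's implementation): fix features i and j on a grid and average over random draws of the remaining features.

import numpy as np

def f(x):  # stand-in for the model's mean prediction
    return np.sin(x[0]) * x[0] + np.cos(x[1]) * x[1] + x[2]

low, high, n_dims = -3., 3., 3  # illustrative bounds
rvs = np.random.uniform(low, high, size=(100, n_dims))

def partial_2d(i, j, n_points=20):
    xi = np.linspace(low, high, n_points)
    xj = np.linspace(low, high, n_points)
    zz = np.empty((n_points, n_points))
    for a, xa in enumerate(xi):
        for b, xb in enumerate(xj):
            rvs_ = rvs.copy()
            rvs_[:, i] = xa   # fix both features of interest ...
            rvs_[:, j] = xb
            zz[a, b] = np.mean([f(r) for r in rvs_])  # ... average over the rest
    return xi, xj, zz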

@betatim (Member, Author) commented Jul 25, 2016

I worked on this a bit here. Seems like "marginal" is the wrong word. This computes 1d versions of a 2d function using the samples.

(I now wonder if you have access to lhcb.slack.com and watched the discussion there.)

@glouppe (Member) commented Jul 26, 2016

Hmm, I am not sure whether we should reuse x_iters as the data we marginalize over, or instead sample uniformly at random within the bounds. Then again, maybe it makes sense to reuse x_iters, since this is the data that was used for training the model.

@betatim (Member, Author) commented Jul 26, 2016

I think using the model is a better idea. The whole idea of SMBO is that the model adds something beyond the raw data points when it comes to estimating the true shape of the function you are optimising. It also has the advantage that we can generate more samples from the model than we can from the true objective function.

(Practically I wonder if we could see much difference because the plots are fairly zoomed out.)

I will implement (essentially):

# Sketch; assumes samples X (n_samples, n_dims), objective values ys (n_samples,)
# and bounds low/high for the dimension of interest.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats.mstats import mquantiles

def partial_dependence(dimension, bins=10):
    # bin edges at the empirical quantiles of the samples along `dimension`
    edges = mquantiles(X[:, dimension],
                       prob=np.linspace(0, 1, bins, endpoint=False)[1:])
    edges = np.array([low] + list(edges) + [high])

    q_x = []
    q_y = []
    q_up = []
    q_down = []
    for i in range(edges.shape[0] - 1):
        idx = (edges[i] < X[:, dimension]) & (X[:, dimension] < edges[i + 1])
        q_x.append(edges[i] + (edges[i + 1] - edges[i]) / 2.)
        # average it! no median
        q_y.append(np.median(ys[idx]))
        q_down.append(np.percentile(ys[idx], 16))
        q_up.append(np.percentile(ys[idx], 84))

    plt.plot(X[:, dimension], ys, '.')
    plt.plot(q_x, q_y, 'r-', lw=2)
    plt.fill_between(q_x, q_down, q_up, color='r', alpha=0.3)

I will also add the ability to do 2d, and use samples from the model.

> approximated using the mean prediction of the last model over uniformly randomly drawn points (i.e. space.rvs()) where the value of feature X_i is changed to x.

This isn't quite clear to me. In the notebook I draw random samples, then average them per quantile in X_i. Are you suggesting that we draw random samples and then set the value of X_i to a certain value for each sample? I don't think you can do that 😕

@glouppe (Member) commented Jul 26, 2016

> This isn't quite clear to me. In the notebook I draw random samples, then average them per quantile in X_i. Are you suggesting that we draw random samples and then set the value of X_i to a certain value for each sample? I don't think you can do that

Why not? This is what a partial dependence plot is.

@betatim (Member, Author) commented Jul 26, 2016

Oh wow 😮! I thought I had implemented the partial dependence plot, but no. Now I understand (I think):

import matplotlib.pyplot as plt
import numpy as np

low, high = -3., 3.  # illustrative bounds for the toy example

def f(x):
    return np.sin(x[0]) * x[0] + np.cos(x[1]) * x[1] + x[2]

def partial(dimension):
    # dimension we want to see
    xx = np.linspace(low, high, 100)

    rvs = np.random.uniform(low, high, size=(100, 3))

    y = []
    for x in xx:
        rvs_ = rvs.copy()
        rvs_[:, dimension] = x  # pin the feature of interest, average the rest
        y.append(np.mean([f(r) for r in rvs_]))

    return xx, y

plt.plot(*partial(0), 'b-')
plt.plot(*partial(1), 'r-')
plt.plot(*partial(2), 'g-')

gives:

[plot: 1d partial dependence for each of the three dimensions]

📔 📚 Every day is a school day with @glouppe! Thank you.

@glouppe (Member) commented Jul 26, 2016

Yeah, that is it :)

> rvs = np.random.uniform(low, high, size=(100, 3))

Now my mumbling above might make more sense to you: I was wondering whether we should marginalize over uniformly random points, as you do here, or rather use the points collected in res.x_iters (and evaluate them through res.models[-1]).
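
A hedged sketch of that alternative, assuming a finished skopt OptimizeResult res (partial_from_observed is a made-up name, and the space.transform() call is an assumption: the surrogate models are fit in the transformed search space):

import numpy as np

space = res.space
model = res.models[-1]             # last surrogate model
X = space.transform(res.x_iters)   # observed points instead of space.rvs()

def partial_from_observed(dimension, n_points=40):
    xx = np.linspace(X[:, dimension].min(), X[:, dimension].max(), n_points)
    y = []
    for x in xx:
        X_ = X.copy()
        X_[:, dimension] = x       # pin the dimension of interest
        y.append(np.mean(model.predict(X_)))
    return xx, y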

@betatim (Member, Author) commented Jul 27, 2016

[plot: partial dependence matrix for the ET model]

This is what you get as a partial dependence plot for an ET model applied to:

import numpy as np

def f(x):
    return np.sin(x[0]) * x[0] + np.cos(x[1]) * x[1]

# one spurious/random/useless dimension
bounds = [(-3.141, 3 * 3.141), (-3.141, 3 * 3.141), (-10., -4.)]

n_calls = 160
n_random_starts = 15

It is nice to see that the model realises that the third dimension is a red herring. Note that the plot in column 1, row 2 only roughly looks like the real objective function. This confused me for a while, but it makes sense: you only evaluate the expensive function at points close to the minimum, so the model doesn't learn anything about what the function looks like far away. If you increase the number of random samples to 150, things start to look a lot more like the real objective function:
[plot: partial dependence matrix with 150 random samples]

A big question for me is how to write good/useful tests for this. Right now I am "debugging"/checking things as I go along by thinking hard about each plot and comparing things with little toy implementations outside of skopt, for example this notebook, which is really only useful for me today. Ideas welcome.

space = result.space
samples = np.asarray(result.x_iters)
order = range(samples.shape[0])
rvs = space.rvs(n_samples=10)
@glouppe (Member) commented on the diff:

I think you can increase this to 100 or 500. n_samples=10 might lead to too noisy empirical estimates.

@glouppe (Member) commented Jul 27, 2016:

Also, have you tried comparing visually after replacing this with rvs = result.x_iters?

@betatim (Member, Author) replied:

Right now I tuned the numbers so that they aren't super painful when you want to plot often. I had started with 100 points, but that was super slow. +1 for 100 as the default, and we should definitely make it a parameter that can be adjusted by the user.

@glouppe (Member) replied:

Yeah, I guess this should be tuned in combination with the number of steps from lower to upper bounds (set to 40 at the moment).

@betatim (Member, Author) commented Jul 27, 2016

Another iteration of 🎨. Getting there. The following four pictures are plot_sampling_order and plot_objective_function for an ET model, with few and with many n_random_starts.

The language in the docstrings could be improved. I'm not sure how to refer to partial dependence. Are you calculating the partial dependence of the model w.r.t. dim1 and dim2? It all reads a bit clunky 😢

[four screenshots: plot_sampling_order and plot_objective_function, each with few and with many random starts]

@glouppe (Member) commented Jul 27, 2016

Giving detailed explanations about those plots and how to read them would make for a great notebook :)

@glouppe (Member) commented Jul 27, 2016

Could you add labels on the left and bottom, to make it easier to understand that a subplot concerns X_i vs X_j?
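
One way to do that (a generic matplotlib sketch, not this PR's code) is to label only the left column and the bottom row of the grid:

import matplotlib.pyplot as plt

n = 4
fig, axes = plt.subplots(n, n)
for i in range(n):
    axes[i, 0].set_ylabel("$X_%d$" % i)       # left column labels the row dimension
    axes[n - 1, i].set_xlabel("$X_%d$" % i)   # bottom row labels the column dimension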


def partial_dependence(space, model, i, j=None, sample_points=None,
                       n_samples=100, n_points=40):
    """Calculate partial dependence of `model` for dimensions `i` and `j`
@glouppe (Member) commented Jul 27, 2016:

Maybe instead something like "Calculate the partial dependence for dimensions i and j with respect to the objective value, as approximated through model".

@glouppe (Member) commented Jul 27, 2016

For the plots on the diagonal, I think it would be better if they all used the same y-scale for the number of samples / objective value. E.g. for the 1d partial dependence plot of X3, it is not obvious that there is almost no correlation, simply because the plot is zoomed in.
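
A minimal sketch of that suggestion (generic matplotlib, not the PR's code): compute a common y-range over the diagonal subplots and apply it to all of them.

import numpy as np
import matplotlib.pyplot as plt

n = 3
fig, axes = plt.subplots(n, n)
diag = [axes[i, i] for i in range(n)]
for ax in diag:
    ax.plot(np.random.rand(10))   # stand-in for the diagonal 1d plots

lo = min(ax.get_ylim()[0] for ax in diag)
hi = max(ax.get_ylim()[1] for ax in diag)
for ax in diag:
    ax.set_ylim(lo, hi)           # same zoom on every diagonal plot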

Commit: "Added a few comments on the axis formatting and tried to structure things so that they will be slightly less incomprehensible in two months' time."
@betatim (Member, Author) commented Jul 27, 2016

> Giving detailed explanations about those plots and how to read them would make for a great notebook :)

Easy, you read them just like you read tea leaves ;) 🍵

The plots below are branin without noise, plus two extra dimensions that are useless. ET model, with 15 random points.

[two screenshots: plot_sampling_order and plot_objective_function on branin]

@betatim changed the title from [WIP] Scatter plot matrix to [MRG] Scatter plot matrix on Jul 27, 2016
@glouppe (Member) commented Jul 28, 2016

Looks great! Just one last nitpick about the x- and y-axis labels. Wouldn't it be easier if those were simply X_i (on the y-axis) and X_j (on the other axis)? E.g. at the moment it is not immediately clear what X_3,2 means for the x-axis vs X_3,0 for the y-axis.

    return xi, yi, zi


def plot_objective_function(result, levels=10, n_points=40, n_samples=100):
@glouppe (Member) commented on the diff:

Or simply, plot_objective?

@glouppe (Member) commented Jul 28, 2016

Shall we also update one of the notebooks to include one of those plots?

@betatim closed this on Jul 28, 2016
@betatim reopened this on Jul 28, 2016
@betatim (Member, Author) commented Jul 28, 2016

(I just wanted to cancel my comment ... #fatfingers)

@betatim (Member, Author) commented Jul 28, 2016

Will investigate including it in an existing notebook. #162 is about creating notebooks that explain how to read these plots in general. We should have those too.

@glouppe (Member) commented Jul 28, 2016

That can also be done later.

For me this PR is already good enough to be merged.

@betatim (Member, Author) commented Jul 28, 2016

Then I vote we merge now; it would be great to get this done before I leave for holiday 🌴

@glouppe merged commit 7d77c05 into scikit-optimize:master on Jul 28, 2016
@glouppe (Member) commented Jul 28, 2016

Boom!

@betatim deleted the moar-plotting branch on July 29, 2016 at 06:53
@betatim (Member, Author) commented Jul 29, 2016

😃 Thanks for the comments and patience, this PR just kept getting bigger and bigger.
