
[MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in GaussianProcessRegressor #7997

Closed
wants to merge 24 commits

Conversation

amanp10
Contributor

@amanp10 amanp10 commented Dec 7, 2016

Fixes #7987

It adds the Embarrassingly Parallel helper in GaussianProcessRegressor class.

I have used 'check_pickle=False' for the 'delayed' function as it was not working otherwise. However, I am not sure about the logic behind this.

@jnothman
Member

jnothman commented Dec 7, 2016 via email

theta_initial = \
    self.rng.uniform(bounds[:, 0], bounds[:, 1])
optima.append(
    self._constrained_optimization(obj_func, theta_initial,
                                   bounds))
Parallel(n_jobs=self.n_jobs)(delayed(optima_iterations,
                                     check_pickle=False)()
Member

I would be really surprised if using an inner function like this was working even with check_pickle=False for n_jobs != 1... You will need to move optima_iterations to the module scope (or class-level maybe).

Drawing random numbers in parallel processes seems a bit dangerous too. And I don't think you will be able to match the result of n_jobs=1 with n_jobs != 1 this way. An alternative would be to draw all the random numbers first and then pass them to the function inside Parallel.

Optional: refactoring optima_iterations into a pure function that does not take self as argument is probably a good idea to reduce surprising behaviours.
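The module-scope requirement comes from how the then-default multiprocessing backend serializes callables with plain pickle, which refuses nested functions. A minimal demonstration (note that modern joblib's loky backend uses cloudpickle, which can serialize closures):

```python
import pickle


def outer():
    def inner(x):  # defined inside another function: a "local object"
        return x + 1
    return inner


# Plain pickle, which the multiprocessing backend relied on, refuses
# nested functions; this is why the helper had to move to module scope.
try:
    pickle.dumps(outer())
    picklable = True
except Exception:
    picklable = False
print("picklable:", picklable)
```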

Member

@lesteve lesteve Dec 7, 2016

To illustrate what I mean:

from joblib import Parallel, delayed

import numpy as np


def func(rng):
    return rng.randn(2)

rng = np.random.RandomState(0)

n_jobs=2
result = Parallel(n_jobs=n_jobs)(delayed(func)(rng) for x in [1, 2, 3])
print(result)

with n_jobs=1:

[array([ 1.76405235,  0.40015721]), array([ 0.97873798,  2.2408932 ]), array([ 1.86755799, -0.97727788])]

with n_jobs=2

[array([ 1.76405235,  0.40015721]), array([ 1.76405235,  0.40015721]), array([ 1.76405235,  0.40015721])]

What happens in the n_jobs=2 case is that each subprocess gets a copy of rng so the random numbers are the same for each iteration.
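The suggested fix, sketched here with a hypothetical func rather than the PR's code: draw every random number in the parent process and hand the workers plain arrays, so the result no longer depends on n_jobs.

```python
from joblib import Parallel, delayed

import numpy as np


def func(theta):
    # The worker only receives a concrete array: no RNG state is
    # shared or copied across processes, so results are reproducible.
    return theta * 2


rng = np.random.RandomState(0)
# Draw everything in the parent process, up front.
thetas = [rng.randn(2) for _ in range(3)]

result = Parallel(n_jobs=2)(delayed(func)(t) for t in thetas)
print(result)
```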

@amanp10 amanp10 changed the title [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in GaussianProcessRegressor [WIP] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in GaussianProcessRegressor Dec 7, 2016
@amanp10
Contributor Author

amanp10 commented Dec 7, 2016

I have made the updates suggested by @jnothman. Please have a look.
Also, as suggested by @lesteve, I have moved optima_iterations to class scope and taken care of the random numbers issue.
After your approval I will change the title from [WIP] to [MRG].

@jnothman
Member

jnothman commented Dec 8, 2016

MRG means "ready for full review before merging" so you should change that now.

@amanp10 amanp10 changed the title [WIP] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in GaussianProcessRegressor [MRG] Issue #7987: Embarrassingly parallel "n_restarts_optimizer" in GaussianProcessRegressor Dec 8, 2016
self.n_jobs = n_jobs

def _optima_iterations(self, optima, bounds, theta_initial, obj_func):
    optima.append(
Member

How can that work? If you read my previous #7997 (comment): when n_jobs != 1, each worker gets a copy of optima, so appending should not change optima in the main process...

You should return self._constrained_optimization(...) and do the optima.append in fit.

Member

Just a snippet to illustrate

from sklearn.externals.joblib import Parallel, delayed


def append(my_list, x):
    my_list.append(x)


if __name__ == '__main__':
    my_list = []
    Parallel(n_jobs=1)(delayed(append)(my_list, x) for x in range(10))
    print('my_list:', my_list)

    my_list_2 = []
    Parallel(n_jobs=2)(delayed(append)(my_list_2, x) for x in range(10))
    print('my_list_2', my_list_2)

Output:

my_list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
my_list_2 []
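The recommended pattern, with a hypothetical square worker: return from the delayed function and collect Parallel's return value in the parent, instead of mutating shared state.

```python
from joblib import Parallel, delayed


def square(x):
    return x * x


# Collect Parallel's return value instead of mutating a shared list;
# this works identically for every backend and n_jobs value.
# (On Windows, wrap the call in an ``if __name__ == '__main__':`` guard.)
results = Parallel(n_jobs=2)(delayed(square)(x) for x in range(10))
print('results:', results)
```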

        bounds))
theta_initials.append(theta_initial)
Parallel(n_jobs=self.n_jobs)(delayed(self._optima_iterations,
                                     check_pickle=False)
Member

Is check_pickle=False needed?

Contributor Author

It gives "TypeError: can't pickle instancemethod objects" when check_pickle=True, so I set check_pickle=False instead. Please let me know if that is wrong.
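For context, Python 2 (current when this PR was written) could not pickle bound methods at all, which is what check_pickle=True tripped over; Python 3 can, as this small check shows (Estimator is a made-up stand-in class):

```python
import pickle


class Estimator:
    def _optimize(self):
        return 42


est = Estimator()
# Under Python 2 the next line raised
# "TypeError: can't pickle instancemethod objects"; Python 3 can
# pickle bound methods of module-level classes.
blob = pickle.dumps(est._optimize)
print(pickle.loads(blob)())
```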

# Test to check the functioning of n_jobs parameter.
for kernel in kernels:
    gpr1 = GaussianProcessRegressor(kernel=kernel, n_jobs=1).fit(X, y)
    gpr2 = GaussianProcessRegressor(kernel=kernel, n_jobs=-1).fit(X, y)
Member

Nitpick: using n_jobs=2 makes it easier to spot the difference compared to n_jobs=1.

@amanp10
Contributor Author

amanp10 commented Dec 8, 2016

I will look into these issues and get back to you asap.

@lesteve
Member

lesteve commented Dec 8, 2016

I will look into these issues and get back to you asap.

I forgot to say, this is slightly worrying that the test was passing ... it is probably too forgiving.

@amanp10
Contributor Author

amanp10 commented Dec 8, 2016

@lesteve I have made the changes you suggested, however some tests are failing (I have noted them in the test file). Please go through the changes and give your review. Also, I have made a few changes in 2 example files to test this new parameter. Thank you for devoting so much of your time.

@@ -107,6 +108,11 @@ def optimizer(obj_func, initial_theta, bounds):
        given, it fixes the seed. Defaults to the global numpy random
        number generator.

    n_jobs : int, default: 1
        n_jobs is the number of workers requested by the callers.
Member

Please copy the docstring from within scikit-learn, rather than from joblib. This text is a bit obscure.

    self._constrained_optimization(obj_func, theta_initial,
                                   bounds))
theta_initials.append(theta_initial)
Parallel(n_jobs=self.n_jobs)(delayed(self._optima_iterations,
Member

There are a few ways you could make this line wrapping more readable! Please find one...

@@ -209,12 +220,16 @@ def obj_func(theta, eval_gradient=True):
                 "Multiple optimizer restarts (n_restarts_optimizer>0) "
                 "requires that all bounds are finite.")
             bounds = self.kernel_.bounds
-            for iteration in range(self.n_restarts_optimizer):
+            theta_initials = []
+            for i in range(self.n_restarts_optimizer):
Member

Use the size parameter of rng.uniform to generate n_restarts_optimizer elements in one fast operation.
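A sketch of the vectorized draw, with a made-up bounds array (one (low, high) row per hyperparameter):

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical bounds: one (low, high) row per hyperparameter.
bounds = np.array([[1e-2, 1e2],
                   [1e-5, 1e-1]])
n_restarts_optimizer = 5

# One vectorized draw replaces the Python-level loop: broadcasting
# pairs each column of the result with the matching bounds row.
theta_initials = rng.uniform(bounds[:, 0], bounds[:, 1],
                             size=(n_restarts_optimizer, bounds.shape[0]))
print(theta_initials.shape)
```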

@amanp10
Contributor Author

amanp10 commented Dec 9, 2016

I have made the changes @jnothman suggested above.

@amanp10
Contributor Author

amanp10 commented Dec 10, 2016

Hello,
I am not able to understand why some of the tests are failing. Any kind of help is welcome.

def test_n_jobs_parallel():
    # Test to check the functioning of n_jobs parameter.
    for kernel in kernels:
        gpr1 = GaussianProcessRegressor(kernel=kernel, n_jobs=1,
                                        n_restarts_optimizer=5).fit(X, y)
        gpr2 = GaussianProcessRegressor(kernel=kernel, n_jobs=2,
                                        n_restarts_optimizer=5).fit(X, y)
        gpr3 = GaussianProcessRegressor(kernel=kernel, n_jobs=-1,
                                        n_restarts_optimizer=5).fit(X, y)
        y1, y1_cov = gpr1.predict(X, return_cov=True)
        y2, y2_cov = gpr2.predict(X, return_cov=True)
        y3, y3_cov = gpr3.predict(X, return_cov=True)
        # Successfully passing tests
        assert_almost_equal(y1, y2)
        assert_almost_equal(y1, y3)
        assert_almost_equal(y1_cov, y2_cov)
        assert_almost_equal(y1_cov, y3_cov)
        # Failing tests
        assert_almost_equal(gpr1.alpha_, gpr2.alpha_)
        assert_almost_equal(gpr1.alpha_, gpr3.alpha_)
        assert_almost_equal(gpr1.log_marginal_likelihood_value_,
                            gpr2.log_marginal_likelihood_value_)
        assert_almost_equal(gpr1.log_marginal_likelihood_value_,
                            gpr3.log_marginal_likelihood_value_)

I also tried decreasing the precision (required decimal places). The problem arises for a few kernels only.

@jnothman
Member

(Meaningful commit messages would help us know what to expect when we review changes.)

@@ -39,7 +39,8 @@
 kernel = 1.0 * RBF(length_scale=100.0, length_scale_bounds=(1e-2, 1e3)) \
     + WhiteKernel(noise_level=1, noise_level_bounds=(1e-10, 1e+1))
 gp = GaussianProcessRegressor(kernel=kernel,
-                              alpha=0.0).fit(X, y)
+                              alpha=0.0, n_jobs=2,
+                              n_restarts_optimizer=3).fit(X, y)
Member

Is there a particular reason why you modified the examples?

Note: n_jobs != 1 is not really recommended in examples because it will cause a problem on Windows if we do not use an if __name__ == '__main__' guard.

Contributor Author

No particular reason. I will remove it.

Member

General guideline: if there is not a strong reason behind a change, just keep it as it was before. The more focused the changes, the easier and more pleasant it is to review them.

Contributor Author

I will surely take care of it from now on.

@amanp10
Contributor Author

amanp10 commented Dec 14, 2016

As @jnothman suggested above, I used _constrained_optimization directly without any intermediate function, and added the @staticmethod decorator to _constrained_optimization.
However, with check_pickle=True I get the following error:
TypeError: can't pickle function objects
There is a workaround using external libraries, but I am not sure whether to use it here.

Also, with backend=default I get the following error:
pickle.PicklingError: Can't pickle <function _constrained_optimization at 0x7fdac5486230>: it's not found as sklearn.gaussian_process.gpr._constrained_optimization
Using an external function instead of _constrained_optimization might solve the issue.
What should I do next?

@amanp10
Contributor Author

amanp10 commented Dec 24, 2016

Should I add a benchmark test for different n_jobs parameter with backend='threading' for now?

@jnothman
Member

Yes, benchmark with backend='threading'

@jnothman
Member

jnothman commented Dec 25, 2016

But it's also extremely easy to pull _constrained_optimization out as a function of (optimizer, obj_func, initial_theta, bounds) such that it is importable/picklable. However, obj_func remains unpicklable, and optimizer may be, so I think we're better off requiring that only the threading backend is used, and that check_pickle=False.
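A sketch of the thread-based approach with a made-up Optimizer class: with backend='threading' everything stays in one process, so neither the bound method nor obj_func ever needs to be pickled. (The check_pickle argument has since been removed from joblib entirely.)

```python
from joblib import Parallel, delayed


class Optimizer:
    def _restart(self, x):
        # Stand-in for one optimizer restart.
        return x * x

    def run(self):
        # Threads share the parent's memory, so neither the bound
        # method nor its arguments are ever serialized.
        return Parallel(n_jobs=2, backend="threading")(
            delayed(self._restart)(x) for x in range(5))


print(Optimizer().run())
```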

@amanp10
Contributor Author

amanp10 commented Dec 28, 2016

I have added the benchmark test but it's failing. I have used Python's timeit, but it seems that with n_jobs=1 the fit method executes faster than with n_jobs=2 or n_jobs=-1.

@jnothman
Member

jnothman commented Dec 28, 2016 via email

@amanp10
Contributor Author

amanp10 commented Dec 28, 2016

The shapes were X->(6, 1) and y->(6,).
What do you suggest I do?

@jnothman
Member

jnothman commented Dec 28, 2016 via email

@amanp10
Contributor Author

amanp10 commented Jan 3, 2017

What do you suggest I do next? I used the Boston House dataset but the tests were still failing. Should I try changing the function scopes so as to use backend="multiprocessing"?

@jnothman
Member

jnothman commented Jan 4, 2017

Oh, I'd not realised you'd added tests for benchmarking. Usually we don't do this, but build a benchmarking script and upload it as a gist, or if we want to be able to repeat a benchmark over time, contribute it to the benchmarks/ directory in scikit-learn.

While this means we only test one user's platform at a time, it means we can do things like flexibly plot multiple benchmarks.

@jnothman
Member

jnothman commented Jan 4, 2017

I'd rather see plots showing the size of the effect rather than just a passed test (which may fail if the machine is having a bad day, which is no good for continuous integration)
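A benchmarking-script sketch along these lines, using made-up restart and bench helpers: time each n_jobs setting several times and keep the best, so a single noisy run does not skew the comparison. (How much threading helps depends on NumPy releasing the GIL during the heavy linear algebra.)

```python
from timeit import default_timer as timer

import numpy as np
from joblib import Parallel, delayed


def restart(seed):
    # Stand-in for one optimizer restart: a CPU-bound chunk of work.
    rng = np.random.RandomState(seed)
    m = rng.randn(150, 150)
    return np.linalg.eigvalsh(m @ m.T).max()


def bench(n_jobs, n_restarts=8, repeats=3):
    # Keep the best of several repeats so one noisy run ("the machine
    # having a bad day") does not dominate the measurement.
    times = []
    for _ in range(repeats):
        start = timer()
        Parallel(n_jobs=n_jobs, backend="threading")(
            delayed(restart)(s) for s in range(n_restarts))
        times.append(timer() - start)
    return min(times)


for n_jobs in (1, 2):
    print(n_jobs, bench(n_jobs))
```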

@amanp10
Contributor Author

amanp10 commented Jan 5, 2017

I have added the Benchmark test in the benchmarks/ folder. Please have a look.

@jnothman
Member

jnothman commented Jan 6, 2017

I can only see n_jobs != 1 being an improvement on the 5th and to a lesser extent 6th kernels. (Though your measurements are not very robust as you make only one timing of each case.)

The gains so far seem to be quite marginal, and unless we can find a reasonable kernel or dataset size in which the gains are greater, perhaps this work is not of sufficient utility to merge :\

@amanp10
Contributor Author

amanp10 commented Jan 11, 2017

@jnothman What do you suggest I do? Should I look for more kernels or dataset sizes, or should I stop working on this issue?

@jnothman
Member

jnothman commented Jan 11, 2017 via email

@amanp10
Contributor Author

amanp10 commented Jan 12, 2017

I tried using different dataset sizes but it was not of any use. I don't have much knowledge about kernels, so I will not be working on this issue unless advised otherwise.

@jnothman
Member

jnothman commented Jan 12, 2017 via email

@jnothman
Member

jnothman commented Jan 12, 2017 via email

@amanp10
Contributor Author

amanp10 commented Jan 12, 2017

I did learn a lot. Thanks to you @jnothman and @lesteve

@amanp10 amanp10 deleted the mybranch branch January 29, 2017 17:51