Make return_std 20x faster in Gaussian Processes (includes solution) #9234
(1) Please replace this line:

```python
yvar -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
```

with:

```python
sum1 = np.dot(K_trans, K_inv).T
yvar -= np.einsum("ki,ik->k", K_trans, sum1)
```
For an input data set of size 800x1, this cuts the runtime from 12.7 seconds to 0.2 seconds. I have validated that the results agree to within 1e-12 or smaller.
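The two formulations are mathematically identical: both compute the diagonal of `K_trans @ K_inv @ K_trans.T`, but the replacement does it with one BLAS matrix multiply plus a cheap row-wise reduction instead of a three-operand `einsum`. A minimal check with random stand-in matrices (the sizes here are arbitrary, not the 800x1 dataset from the report):

```python
import numpy as np

rng = np.random.default_rng(0)
n_star, n_train = 50, 80                   # arbitrary test-point / training sizes
K_trans = rng.standard_normal((n_star, n_train))
A = rng.standard_normal((n_train, n_train))
K_inv = A @ A.T                            # symmetric PSD stand-in for K^{-1}

# Original: three-operand einsum, O(n_star * n_train^2) done element-wise
slow = np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)

# Proposed: one matmul (BLAS) plus a row-wise sum
sum1 = np.dot(K_trans, K_inv).T
fast = np.einsum("ki,ik->k", K_trans, sum1)

assert np.allclose(slow, fast)
```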
(2) Please cache the result of the K_inv computation. It depends only on the result of training, and can be very costly for repeated calls to the class.
Complete solution, starting here https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L322
```python
if not hasattr(self, 'K_inv_'):
    L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
    self.K_inv_ = L_inv.dot(L_inv.T)

# Compute variance of predictive distribution
y_var = self.kernel_.diag(X)
sum1 = np.dot(K_trans, self.K_inv_).T
y_var1 = y_var - np.einsum("ki,ik->k", K_trans, sum1)
# y_var2 = y_var - np.einsum("ki,kj,ij->k", K_trans, K_trans, self.K_inv_)
# assert np.all(np.abs(y_var1 - y_var2) < 1e-12)
y_var = y_var1
```

(Note: `np.eye` needs the matrix dimension, `self.L_.shape[0]`, not the shape tuple.)
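The caching idea can be illustrated outside of scikit-learn with a toy class (the class and method names here are hypothetical, not the `GaussianProcessRegressor` API): since `L_` is the Cholesky factor of the training kernel `K = L L^T`, inverting `L^T` once gives `K^{-1} = (L^T)^{-1} L^{-1}`, which every later variance query can reuse.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

class TinyGP:
    """Toy posterior-variance computation illustrating the K_inv_ cache."""

    def fit(self, K):
        # K: training kernel matrix; L_ is its lower Cholesky factor
        self.L_ = cholesky(K, lower=True)
        return self

    def y_var(self, K_trans, prior_diag):
        # K^{-1} depends only on the training data, so compute it once
        # and reuse it on every subsequent call.
        if not hasattr(self, "K_inv_"):
            L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
            self.K_inv_ = L_inv.dot(L_inv.T)
        sum1 = np.dot(K_trans, self.K_inv_).T
        return prior_diag - np.einsum("ki,ik->k", K_trans, sum1)
```

The first `y_var` call pays the triangular solve; later calls skip straight to the matmul.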
This looks like it might consume a negligible amount of extra memory, but otherwise it should only be a benefit.
> On 28 Jun 2017 2:27 am, "andrewww" wrote: I'm sorry, I can't. I'm posting this from an environment where I can access the website, but none of the other GitHub tools.
> On Tue, Jun 27, 2017 at 4:09 PM, Minghui Liu wrote: I would like to help and make the changes if that's ok.
BTW, I did not observe a 20x speed-up in my tests. The speed stayed approximately the same on a 1000x5 dataset generated with
Note: the second call to predict is significantly faster because of the cached `K_inv_`.
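To compare the two variance formulations directly, outside of `GaussianProcessRegressor`, one can time them on random stand-in matrices (sizes arbitrary; absolute timings will vary with platform, NumPy version, and BLAS build, which may explain the discrepant observations above):

```python
import time
import numpy as np

rng = np.random.default_rng(1)
n = 400                                    # arbitrary problem size
K_trans = rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
K_inv = A @ A.T                            # symmetric PSD stand-in for K^{-1}

t0 = time.perf_counter()
slow = np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
t1 = time.perf_counter()
sum1 = np.dot(K_trans, K_inv).T
fast = np.einsum("ki,ik->k", K_trans, sum1)
t2 = time.perf_counter()

print(f"three-operand einsum: {t1 - t0:.4f}s")
print(f"dot + einsum:         {t2 - t1:.4f}s")
assert np.allclose(slow, fast)
```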
@andrewww I would be curious to know more about the kind of data where you observed the initially reported speedup.
Hmmm.... This was literally a make-or-break change to the code for me, i.e., the code was so slow I could not actually use it without this change (the change to the .einsum() call).
Only thing I can think of is: I'm on Windows 7 x64 / Anaconda 3.1.4. Doesn't numpy sometimes behave differently on different platforms? Maybe the Windows
Also, apparently it was already fixed in an earlier issue #8591