[MRG+1] TSNE: only compute error when needed #10593

Merged
merged 6 commits into scikit-learn:master from TomDLT:tsne_compute_error on Feb 7, 2018

Conversation

@TomDLT
Member

TomDLT commented Feb 6, 2018

This is a first speed improvement for TSNE with both solvers.
The idea is to compute the KL-divergence only every 50 iterations, when the stopping criterion is actually checked.

With method='barnes_hut', it leads to a 40% speed improvement on the gradient descent.

master_gradient_duration = array([   1.411,    8.815,   18.606,   97.405,  208.401])
branch_gradient_duration = array([   0.836,    4.988,   11.352,   61.483,  136.631])

[figure: gradient descent duration, master vs. branch, method='barnes_hut']

The benchmark is available here, where I used --pca-components 0 and a single CPU.
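
To illustrate the pattern (a minimal, hedged sketch rather than the PR's actual code; the names objective, n_iter_check and compute_error mirror the diff below):

```python
import numpy as np

def gradient_descent_sketch(objective, p, n_iter, n_iter_check=50, **kwargs):
    """Sketch: only evaluate the KL divergence on iterations where the
    convergence criterion is actually checked (every n_iter_check steps)."""
    error = np.inf
    for i in range(n_iter):
        check_convergence = (i + 1) % n_iter_check == 0
        # the objective skips the (expensive) error term otherwise
        kwargs['compute_error'] = check_convergence
        new_error, grad = objective(p, **kwargs)
        p = p - 0.01 * grad  # placeholder update; the real code uses momentum
        if check_convergence:
            error = new_error  # only meaningful when compute_error was True
    return p, error
```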

@jnothman

Member

jnothman commented Feb 6, 2018

Tests failing... I've not looked, but hope it's minor, because this sounds great!

@TomDLT

Member

TomDLT commented Feb 6, 2018

Tests fixed.


With method='exact', the speed improvement is also 30%-40%:

master_duration = array([    1.902,    21.813,    73.14 ,  1624.989])
branch_duration = array([    2.096,    15.586,    51.036,  1002.764])

[figure: duration, master vs. branch, method='exact']

@TomDLT TomDLT changed the title from TSNE: only compute error when needed to [MRG] TSNE: only compute error when needed Feb 6, 2018

@jnothman

Member

jnothman commented Feb 6, 2018

Nice!

@jnothman

Very nice!

@@ -108,7 +109,8 @@ cdef float compute_gradient_positive(float[:] val_P,
                                      float dof,
                                      double sum_Q,
                                      np.int64_t start,
-                                     int verbose) nogil:
+                                     int verbose,
+                                     int compute_error) nogil:

@jnothman

jnothman Feb 6, 2018

Member

should be a bint for clarity?

@@ -52,7 +52,8 @@ cdef float compute_gradient(float[:] val_P,
                            float theta,
                            float dof,
                            long start,
-                           long stop) nogil:
+                           long stop,
+                           int compute_error) nogil:

@jnothman

jnothman Feb 6, 2018

Member

should be a bint for clarity?

@jnothman

Sorry, a few points...

@@ -336,6 +348,11 @@ def _gradient_descent(objective, p0, it, n_iter,
     tic = time()
     for i in range(it, n_iter):
+        check_convergence = (i + 1) % n_iter_check == 0
+        # only compute the error when needed
+        if 'compute_error' in kwargs:

@jnothman

jnothman Feb 6, 2018

Member

why do we need this? What is the case where kwargs does not include compute_error? Do we need to be passing it in at all or just setting it here?

@TomDLT

TomDLT Feb 6, 2018

Member

It was to deal with this test.
I thought it was safer to put in this safeguard, but I am not sure anymore, since _gradient_descent is private.

@jnothman

jnothman Feb 6, 2018

Member

How about just putting compute_error=True in the default kwargs set above when kwargs is None? Then the logic in the loop is clearer and independent of the call params.
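
For illustration, a hedged reading of that suggestion (variable names follow the diff; the surrounding code is simplified):

```python
# Sketch: seed compute_error in the default kwargs instead of testing for
# its presence inside the loop, so the loop logic stays unconditional.
kwargs = None          # as received by _gradient_descent
if kwargs is None:
    kwargs = dict(compute_error=True)

n_iter, n_iter_check = 1000, 50
for i in range(n_iter):
    check_convergence = (i + 1) % n_iter_check == 0
    # only compute the error when needed
    kwargs['compute_error'] = check_convergence
```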

        check_convergence = (i + 1) % n_iter_check == 0
        # only compute the error when needed
        if 'compute_error' in kwargs:
            kwargs['compute_error'] = check_convergence

@jnothman

jnothman Feb 6, 2018

Member

I think you need or i == n_iter - 1 here so that the verbose-printed KL divergence and the value set in kl_divergence_ are accurate.
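
In other words (a hedged sketch), the condition would also force the error computation on the very last iteration, so the final value is fresh rather than left over from the previous check point:

```python
n_iter, n_iter_check = 1000, 50
for i in range(n_iter):
    check_convergence = (i + 1) % n_iter_check == 0
    # also compute the error on the last iteration, even if it does not
    # fall on a check point, so the reported KL divergence is accurate
    compute_error = check_convergence or i == n_iter - 1
```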

@@ -806,8 +823,10 @@ def _tsne(self, P, degrees_of_freedom, n_samples, X_embedded,
             opt_args['kwargs']['angle'] = self.angle
             # Repeat verbose argument for _kl_divergence_bh
             opt_args['kwargs']['verbose'] = self.verbose
+            opt_args['kwargs']['compute_error'] = True

@jnothman

jnothman Feb 6, 2018

Member

we can set this above where opt_args is defined, as it is invariant to method.

@@ -795,7 +811,8 @@ def _tsne(self, P, degrees_of_freedom, n_samples, X_embedded,
             "min_grad_norm": self.min_grad_norm,
             "learning_rate": self.learning_rate,
             "verbose": self.verbose,
-            "kwargs": dict(skip_num_points=skip_num_points),
+            "kwargs": dict(skip_num_points=skip_num_points,
+                           compute_error=True),

@jnothman

jnothman Feb 6, 2018

Member

I don't think this is needed anymore, as the default True will be upheld.

        if compute_error:
            kl_divergence = 2.0 * np.dot(
                P, np.log(np.maximum(P, MACHINE_EPSILON) / Q))
        else:
            kl_divergence = 0.

@jnothman

jnothman Feb 6, 2018

Member

maybe return NaN so that this value is not mishandled?
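
A hedged sketch of that alternative (eps stands in for MACHINE_EPSILON; the function name is made up for illustration):

```python
import numpy as np

def kl_divergence_sketch(P, Q, compute_error=True, eps=np.finfo(np.double).eps):
    # Return NaN, not 0., when the error is skipped, so a placeholder value
    # cannot be mistaken for a real (and suspiciously perfect) KL divergence.
    if compute_error:
        return 2.0 * np.dot(P, np.log(np.maximum(P, eps) / Q))
    return np.nan
```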

@jnothman jnothman changed the title from [MRG] TSNE: only compute error when needed to [MRG+1] TSNE: only compute error when needed Feb 7, 2018

@glemaitre glemaitre merged commit 6af51e6 into scikit-learn:master Feb 7, 2018

8 checks passed

ci/circleci: deploy - Your tests passed on CircleCI!
ci/circleci: python2 - Your tests passed on CircleCI!
ci/circleci: python3 - Your tests passed on CircleCI!
codecov/patch - 100% of diff hit (target 94.48%)
codecov/project - 94.48% (+<.01%) compared to 77418a0
continuous-integration/appveyor/pr - AppVeyor build succeeded
continuous-integration/travis-ci/pr - The Travis CI build passed
lgtm analysis: Python - No alert changes
@glemaitre

Contributor

glemaitre commented Feb 7, 2018

Merging. Thanks!!!

cperales added a commit to cperales/scikit-learn that referenced this pull request Feb 13, 2018

@TomDLT TomDLT deleted the TomDLT:tsne_compute_error branch Jun 15, 2018
