[MRG+1] TSNE: Avoid power computation if exponent is 1 #10610

Merged
merged 6 commits into scikit-learn:master from TomDLT:tsne_exponent on Feb 13, 2018

Conversation

@TomDLT
Member

TomDLT commented Feb 9, 2018

Follows #10593
This is a second speed improvement for TSNE with the "barnes_hut" solver.

The idea is to avoid a power computation.
This exponent is almost always equal to -1.
Indeed, the dimension of the embedded space is 2 in almost every application, and:

n_components = 2  # Dimension of the embedded space. Default = 2
degrees_of_freedom = max(self.n_components - 1.0, 1)
exponent = (degrees_of_freedom + 1.0) / -2.0

This PR changes exponent into -exponent, which is then equal to 1 in most cases, so the power computation can be skipped entirely.
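
To illustrate the fast path with a standalone NumPy sketch (not the Cython code touched by this PR): when the exponent is exactly 1, (1.0 + dij / dof) ** -exponent reduces to dof / (dof + dij), so the call to pow can be skipped.

import numpy as np

d = np.random.rand(1000).astype(np.float32)  # squared pairwise distances in the embedded space
dof = 1.0                                    # degrees_of_freedom when n_components == 2
exponent = 1.0                               # the new, sign-flipped exponent

q_pow = (1.0 + d / dof) ** -exponent         # generic path: one power per pair
q_fast = dof / (dof + d)                     # fast path when exponent == 1
assert np.allclose(q_pow, q_fast)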

It leads to a ~24% speed improvement in the gradient descent:

master_gradient_duration = array([  0.442,   3.06 ,   8.352,  40.11 ,  94.581])
branch_gradient_duration = array([  0.34 ,   2.271,   6.384,  32.006,  67.879])
relative_improvement = array([ 0.23076923,  0.25784314,  0.23563218,  0.20204438,  0.28231886])
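
Here relative_improvement is simply (master - branch) / master; a quick check in NumPy:

import numpy as np
master = np.array([0.442, 3.06, 8.352, 40.11, 94.581])
branch = np.array([0.34, 2.271, 6.384, 32.006, 67.879])
print((master - branch) / master)  # ~[0.231, 0.258, 0.236, 0.202, 0.282]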

[figure: gradient descent duration, master vs. branch]

or total TSNE duration including NearestNeighbors (NN):
[figure: total TSNE duration including NearestNeighbors, master vs. branch]

The benchmark is available here, where I used --pca-components 0 and a single CPU.

TomDLT added some commits Feb 9, 2018

@@ -202,12 +204,14 @@ cdef void compute_gradient_negative(float[:, :] pos_reference,
# for the digits dataset, walking the tree
# is about 10-15x more expensive than the
# following for loop
-    exponent = (dof + 1.0) / -2.0
+    exponent = (dof + 1.0) / 2.0

@jnothman

jnothman Feb 10, 2018

Member

It doesn't look like this has a cdef!! It should also be set at the beginning of the function.

Should probably use cython -a to check for other untyped variables...
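
For reference, the same annotation report that cython -a produces can also be generated from Python with Cython's build API (a minimal sketch; the .pyx path below is the file touched by this PR and assumes the command is run from the scikit-learn source root so its cimports resolve):

from Cython.Build import cythonize

# annotate=True writes an HTML report next to the .pyx, highlighting lines that
# still fall back to generic Python object operations (i.e. untyped variables).
cythonize("sklearn/manifold/_barnes_hut_tsne.pyx", annotate=True)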

@TomDLT TomDLT changed the title from TSNE: Avoid power computation if exponent is 1 to [MRG] TSNE: Avoid power computation if exponent is 1 Feb 10, 2018

@jnothman

Thanks. I'm not sure if things like m deserve typing, nor if summary would've been implicitly typed because it's assigned a pointer... But this is good. I assume those typings didn't make a big dent in the benchmark.

@jnothman jnothman changed the title from [MRG] TSNE: Avoid power computation if exponent is 1 to [MRG+1] TSNE: Avoid power computation if exponent is 1 Feb 10, 2018

@@ -140,7 +141,9 @@ cdef float compute_gradient_positive(float[:] val_P,
for ax in range(n_dimensions):
buff[ax] = pos_reference[i, ax] - pos_reference[j, ax]
dij += buff[ax] * buff[ax]
-    qij = ((1.0 + dij / dof) ** exponent)
+    qij = dof / (dof + dij)
+    if exponent != 1:

@glemaitre

glemaitre Feb 13, 2018

Contributor

Careful, comparison with a float, that could suck :)
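
The general concern, in a standalone Python sketch (not scikit-learn code): exact float equality only works when the value comes from arithmetic that is exact in binary floating point, which happens to be the case for (dof + 1.0) / 2.0 with dof = 1.

print(0.1 + 0.2 == 0.3)          # False: 0.1 + 0.2 is 0.30000000000000004
print((1.0 + 1.0) / 2.0 == 1.0)  # True: sums of small integers and division by 2 are exact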

@glemaitre

Contributor

glemaitre commented Feb 13, 2018

I was checking a bit more and there is no real reason for dof to be a float. dof is defined from n_components:

degrees_of_freedom = max(self.n_components - 1.0, 1)

So basically it will always be integer-valued, and then the comparison you made will be OK. It requires changing a bit more code, but this is internal code. My only concern is that when computing qij = dof / (dof + dij), it seems we need to cast dof to a float, which would introduce some overhead. The bench should show which is best. IMO, having an integer around will be cleaner :)
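
A quick standalone check (not part of the PR) that degrees_of_freedom is always integer-valued, since n_components is an int:

for n_components in (1, 2, 3, 4):
    dof = max(n_components - 1.0, 1)
    print(n_components, dof)  # 1 -> 1, 2 -> 1.0, 3 -> 2.0, 4 -> 3.0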

@TomDLT

Member

TomDLT commented Feb 13, 2018

Is it better, @glemaitre?

@glemaitre

Contributor

glemaitre commented Feb 13, 2018

@TomDLT You forgot to change the signature of def gradient(..., int dof, ...) I think.

@glemaitre

Contributor

glemaitre commented Feb 13, 2018

and def compute_gradient(..., int dof, ...)

@@ -50,7 +50,7 @@ cdef float compute_gradient(float[:] val_P,
float[:, :] tot_force,
quad_tree._QuadTree qt,
float theta,
-    float dof,
+    int dof,

@TomDLT

TomDLT Feb 13, 2018

Member

isn't it already here?

@@ -240,7 +242,7 @@ def gradient(float[:] val_P,
float theta,
int n_dimensions,
int verbose,
-    float dof = 1.0,
+    int dof=1,

@TomDLT

TomDLT Feb 13, 2018

Member

and here?

@sklearn-lgtm

sklearn-lgtm commented Feb 13, 2018

This pull request introduces 37 alerts and fixes 18 - view on lgtm.com

new alerts:

  • 37 for Explicit export is not defined

fixed alerts:

  • 6 for Mismatch between signature and use of an overridden method
  • 3 for Missing call to __init__ during object initialization
  • 3 for Potentially uninitialized local variable
  • 2 for Comparison using is when operands support __eq__
  • 2 for Wrong number of arguments for format
  • 1 for Conflicting attributes in base classes
  • 1 for Mismatch between signature and use of an overriding method

Comment posted by lgtm.com

@glemaitre

Contributor

glemaitre commented Feb 13, 2018

Oops, sorry. Too much gradient everywhere. I misread :)

@glemaitre

LGTM apart from one thing to revert. @TomDLT, before merging, can you run the bench to see if something changed?

@@ -161,7 +161,7 @@ def _kl_divergence(params, P, degrees_of_freedom, n_samples, n_components,
# Q is a heavy-tailed distribution: Student's t-distribution
dist = pdist(X_embedded, "sqeuclidean")
-    dist /= degrees_of_freedom
+    dist /= float(degrees_of_freedom)

@glemaitre

glemaitre Feb 13, 2018

Contributor

I would leave it as is, ensuring that we have from __future__ import division at the top.
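
A standalone reminder (not the scikit-learn sources) of what that import guards against: without it, Python 2 applies floor division when both operands are integers.

from __future__ import division  # no-op on Python 3

degrees_of_freedom = 2
print(3 / degrees_of_freedom)    # 1.5 with true division; 1 under Python 2's default floor division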

@glemaitre

Contributor

glemaitre commented Feb 13, 2018

by the way, I don't get why LGTM is complaining :)

@TomDLT

Member

TomDLT commented Feb 13, 2018

Rerunning the benchmark (on a different computer): the ~24% speed improvement is still there.

master_gradient_duration = array([   0.808,    5.053,   11.637,   61.819,  136.813])
branch_gradient_duration = array([   0.679,    3.538,    8.616,   46.186,  104.56 ])
relative_improvement = array([ 0.15965347,  0.29982189,  0.25960299,  0.25288342,  0.23574514])

[figure: gradient descent duration, master vs. branch (rerun)]

or total TSNE duration including NearestNeighbors (NN):
[figure: total TSNE duration including NearestNeighbors (rerun)]

@glemaitre

Contributor

glemaitre commented Feb 13, 2018

Great, merging when it's green.

@glemaitre glemaitre merged commit 43eb87e into scikit-learn:master Feb 13, 2018

5 of 6 checks passed

  • continuous-integration/appveyor/pr - Waiting for AppVeyor build to complete
  • ci/circleci: deploy - Your tests passed on CircleCI!
  • ci/circleci: python2 - Your tests passed on CircleCI!
  • ci/circleci: python3 - Your tests passed on CircleCI!
  • continuous-integration/travis-ci/pr - The Travis CI build passed
  • lgtm analysis: Python - 37 new alerts and 18 fixed alerts
@glemaitre

Contributor

glemaitre commented Feb 13, 2018

Thanks Tom

@TomDLT

Member

TomDLT commented Feb 13, 2018

Thanks for the reviews!

@jnothman

Member

jnothman commented Feb 13, 2018

@sjvs

sjvs commented Feb 13, 2018

Thanks for flagging this up @jnothman — I'll try to figure out what went wrong here on the lgtm.com side.

@TomDLT TomDLT deleted the TomDLT:tsne_exponent branch Jun 15, 2018
