
[MRG+1] NMF speed-up for beta_loss = 0 #9277

Merged: 6 commits into scikit-learn:master, Jul 5, 2017

Conversation
4 participants
hongkahjun (Contributor) commented Jul 4, 2017

Suggestion for speeding up the Itakura-Saito (IS) divergence update in the NMF multiplicative-update ('mu') solver:

WH_safe_X_data **= -1
WH_safe_X_data **= 2

is much faster than

WH_safe_X_data **= beta_loss - 2

Using line_profiler in IPython to time the lines (Time column, in line_profiler's default timer units of 1e-6 s):

4363077           WH_safe_X_data **= beta_loss - 2

vs

219524            WH_safe_X_data **= -1
33966             WH_safe_X_data **= 2

Test code below:

from sklearn.decomposition.nmf import non_negative_factorization
from sklearn.decomposition.nmf import _multiplicative_update_w
from sklearn.datasets import make_classification
from IPython import get_ipython
import numpy as np

ipython = get_ipython()
# line_profiler must be installed and loaded for the %lprun magic
ipython.magic("load_ext line_profiler")

np.random.seed(10)
all_samples, all_targets = make_classification(n_samples=1000, n_features=513, n_informative=511,
                                               n_redundant=2, n_repeated=0, n_classes=2,
                                               n_clusters_per_class=1, random_state=0)
# shift the data so it is strictly positive (NMF requires non-negative input)
all_samples += 5000

# profile the multiplicative-update step line by line
ipython.magic(
    "lprun -f _multiplicative_update_w non_negative_factorization("
    "all_samples, n_components=16, solver='mu', beta_loss='itakura-saito', max_iter=100)")
jnothman (Member) commented Jul 4, 2017

(This comment has been minimized.)

hongkahjun (Contributor) commented Jul 5, 2017

Hi,

Sorry if I was not clear, but

WH_safe_X_data **= -2

yields

4217895     WH_safe_X_data **= -2

Also, I am not sure why it is much faster, but it seems to have something to do with how NumPy calculates powers that are not positive integers.

jnothman (Member) commented Jul 5, 2017

(This comment has been minimized.)

hongkahjun (Contributor) commented Jul 5, 2017

WH_safe_X_data **= -2 yields 4,217,895
while
WH_safe_X_data **= -1 yields 219,524
WH_safe_X_data **= 2  yields 33,966
jnothman (Member) commented Jul 5, 2017

(This comment has been minimized.)

hongkahjun (Contributor) commented Jul 5, 2017

Hi,

I am using NumPy 1.11.3.

jnothman (Member) commented Jul 5, 2017

(This comment has been minimized.)

TomDLT (Member) commented Jul 5, 2017

NumPy uses different functions for power internally:

  • When the exponent is in {-1, 0, 0.5, 1, 2}, it uses respectively {reciprocal, ones_like, sqrt, ~identity, square}.
  • For any other exponent, it falls back to a much slower generic routine. This is why a **= 2; a **= -1 is much faster than a **= -2.

A benchmark on a **= b; a **= -1 versus a **= -b gives me (v1.11.3):
[figure_1: benchmark plot "Elementwise power in Numpy", time vs. power, comparing one operation (a **= -b) with two operations (a **= b; a **= -1)]

Benchmark script:

import numpy as np
from time import time
import matplotlib.pyplot as plt

n_points = int(1e6)
power_range = np.arange(0, 4.1, 0.1)
durations = np.zeros((2, power_range.size))

array = np.random.randn(n_points)
np.abs(array, array)  # in-place absolute value, so negative powers are defined

for i, power in enumerate(power_range):
    # one operation: a **= -b
    array_copy = array.copy()
    start = time()
    array_copy **= -power
    durations[0, i] = time() - start

    # two operations: a **= b, then a **= -1
    array_copy = array.copy()
    start = time()
    array_copy **= power
    array_copy **= -1
    durations[1, i] = time() - start

plt.figure(figsize=(10, 4))
ax = plt.gca()
ax.plot(power_range, durations[0], '-o', label='one operation')
ax.plot(power_range, durations[1], '-o', label='two operations')
ax.set(xlabel='power', ylabel='time', title='Elementwise power in Numpy')
ax.legend()
plt.show()
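
As a quick sanity check (plain NumPy, strictly positive float64 input; the array size and constants here are arbitrary), the two-step form computes the same values as the one-shot negative power:

import numpy as np

rng = np.random.RandomState(0)
a = rng.rand(10**6) + 0.5      # strictly positive, as in the NMF update

one_op = a.copy()
one_op **= -2                  # generic pow loop (slow path)

two_ops = a.copy()
two_ops **= -1                 # fast reciprocal loop
two_ops **= 2                  # fast square loop

assert np.allclose(one_op, two_ops)  # (x ** -1) ** 2 == x ** -2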
jnothman (Member) commented Jul 5, 2017

(This comment has been minimized.)

jnothman (Member) commented Jul 5, 2017

@TomDLT will you report this at NumPy?

hongkahjun (Contributor) commented Jul 5, 2017

All right, I added a comment stating that the code uses NumPy's reciprocal function for the exponent -1.

jnothman changed the title from [MRG] NMF speed-up for beta_loss = 0 to [MRG+1] NMF speed-up for beta_loss = 0 Jul 5, 2017

jnothman (Member) commented Jul 5, 2017

LGTM

ogrisel approved these changes Jul 5, 2017

+1 for merge once CI is green.

jnothman merged commit a8306d4 into scikit-learn:master Jul 5, 2017

2 of 3 checks passed:

  • continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
  • ci/circleci: Your tests passed on CircleCI!
  • continuous-integration/travis-ci/pr: The Travis CI build passed
jnothman (Member) commented Jul 5, 2017

Thanks @hongkahjun

massich added a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
