
[MRG+1] Batching in nmf/sparse_dot to prevent MemoryError #15257

Merged: 8 commits into scikit-learn:master on Oct 27, 2019
Conversation

@Maocx (Contributor) commented Oct 15, 2019

Reference Issues/PRs

Fixes #15242.

What does this implement/fix? Explain your changes.

Batching while calculating the sparse dot prevents the allocation of an array with shape (#non-zero elements, rank). I propose to batch such that the allocation is limited to an array with shape (#non-zero elements / rank, rank).
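To put illustrative numbers on this (assumed for the example, not taken from the issue): with 10^8 non-zero elements and rank 200, the unbatched np.multiply materializes an intermediate of 10^8 × 200 float64 values, about 160 GB, while a batch size of 10^8 / 200 keeps each intermediate at roughly 10^8 float64 values, about 0.8 GB.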

Any other comments?

Benchmark code to compare the timings of the two versions:

import time

import math
import numpy as np
import scipy.sparse as sp


def batched_simple(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        index = 0
        rank = W.shape[1]
        # Guard against batch_size == 0 (an infinite loop) when n_vals < rank.
        batch_size = max(1, math.floor(n_vals / rank))
        while index < n_vals:
            selector = min(index + batch_size, n_vals)
            dot_vals[index:selector] = np.multiply(
                W[ii[index:selector], :],
                H.T[jj[index:selector], :]).sum(axis=1)
            index = selector

        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def original_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        dot_vals = np.multiply(W[ii, :], H.T[jj, :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def generate_values():
    # From the unittests
    n_samples = 1000
    n_features = 50
    n_components = 30
    rng = np.random.mtrand.RandomState(42)
    X = rng.randn(n_samples, n_features)
    np.clip(X, 0, None, out=X)
    X_csr = sp.csr_matrix(X)

    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H, X_csr


def benchmark():
    W, H, X_csr = generate_values()
    rounds = 100

    def test_func(func, name, rounds, W, H, X_csr):
        # time.clock() was removed in Python 3.8; use perf_counter() instead.
        start_time = time.perf_counter()
        for i in range(rounds):
            WH = func(W, H, X_csr)
        end_time = time.perf_counter()
        print(name + ": " +
              str((end_time - start_time) / rounds * 1000) + "ms per loop.")

    test_func(original_sparse_dot, "original version", rounds, W, H, X_csr)
    test_func(batched_simple, "batched_simple", rounds, W, H, X_csr)


if __name__ == '__main__':
    benchmark()

Output:

original version: 11.216264ms per loop.
batched_simple: 4.231038999999999ms per loop.

I find this a strange result: I would have expected one big batch to be faster. Discussion welcome :-)
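(One plausible explanation, not verified in this thread: the unbatched version allocates a single large temporary for np.multiply, and working on batches small enough to stay in cache can more than compensate for the extra Python-level loop iterations.)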

@Maocx changed the title Batching in nmf/sparse_dot to prevent MemoryError [MRG] Batching in nmf/sparse_dot to prevent MemoryError Oct 16, 2019
@TomDLT (Member) left a comment

Thanks for the pull-request and the benchmark, I made a few comments.
Could you also add an entry in doc/whats_new/v0.22.rst?

Running the benchmark with %timeit:

%timeit original_sparse_dot(W, H, X_csr)
# 6.77 ms ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit batched_simple(W, H, X_csr)
# 2.26 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@TomDLT approved these changes Oct 18, 2019

LGTM, thanks!

@TomDLT changed the title [MRG] Batching in nmf/sparse_dot to prevent MemoryError [MRG+1] Batching in nmf/sparse_dot to prevent MemoryError Oct 18, 2019
@jnothman (Member) left a comment

Otherwise LGTM

batch_size = max(n_components, n_vals // n_components)
for start in range(0, n_vals, batch_size):
    batch = slice(start, start + batch_size)
    dot_vals[batch] = np.multiply(W[ii[batch], :],
                                  H.T[jj[batch], :]).sum(axis=1)
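The max(n_components, ...) lower bound appears to do two things: it caps the temporary np.multiply allocation at roughly n_vals elements, and it keeps batch_size from reaching zero when n_vals < n_components (a reading of the merged code, not stated explicitly in the thread).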
@jnothman (Member) commented Oct 20, 2019

Can this be done with np.dot or einsum?

@Maocx (Author) replied Oct 21, 2019

Not with np.dot as far as I'm aware; np.einsum should be possible, like this:
dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :], H.T[jj[batch], :])
but I'm receiving a ValueError. As I had to do quite some reading up on einsum, in my opinion it would also reduce the readability of the function :)

Code to reproduce error:


import time

import numpy as np
import scipy.sparse as sp


def batched_simple(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        n_components = W.shape[1]

        batch_size = max(n_components, n_vals // n_components)
        for start in range(0, n_vals, batch_size):
            batch = slice(start, start + batch_size)
            print(W[ii[batch], :].shape)
            print(H.T[jj[batch], :].shape)
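            # Note: np.einsum_path returns a (path, report) tuple rather
            # than the contraction result; assigning that tuple into
            # dot_vals is what appears to raise the ValueError.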
            dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :],
                                             H.T[jj[batch], :])
            print(dot_vals)

        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def original_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        dot_vals = np.multiply(W[ii, :], H.T[jj, :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def generate_values():
    # From the unittests
    n_samples = 1000
    n_features = 50
    n_components = 30
    rng = np.random.mtrand.RandomState(42)
    X = rng.randn(n_samples, n_features)
    np.clip(X, 0, None, out=X)
    X_csr = sp.csr_matrix(X)

    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H, X_csr


def benchmark():
    W, H, X_csr = generate_values()
    rounds = 100

    def test_func(func, name, rounds, W, H, X_csr):
        # time.clock() was removed in Python 3.8; use perf_counter() instead.
        start_time = time.perf_counter()
        for i in range(rounds):
            WH = func(W, H, X_csr)
        end_time = time.perf_counter()
        print(name + ": " +
              str((end_time - start_time) / rounds * 1000) + "ms per loop.")

    # test_func(original_sparse_dot, "original version", rounds, W, H, X_csr)
    test_func(batched_simple, "batched_simple", rounds, W, H, X_csr)


if __name__ == '__main__':
    benchmark()
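For reference, a sketch of what the working call would presumably look like (not from the thread): np.einsum performs the contraction itself, whereas np.einsum_path only returns a contraction plan, and a row-wise dot product needs the same subscripts on both operands:

# Untested sketch: np.einsum with matching subscripts computes
# out[i] = sum_j W[i, j] * Ht[i, j], i.e. a row-wise dot product.
dot_vals[batch] = np.einsum("ij,ij->i",
                            W[ii[batch], :],
                            H.T[jj[batch], :])

This is equivalent to np.multiply(...).sum(axis=1); the merged code quoted above kept the np.multiply form.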

@jnothman merged commit ff3af8d into scikit-learn:master Oct 27, 2019
19 checks passed
@jnothman (Member) commented Oct 27, 2019

Thanks @Maocx!
