
[MRG+1] Batching in nmf/sparse_dot to prevent MemoryError #15257

Merged
merged 8 commits into scikit-learn:master on Oct 27, 2019

Conversation

Maocx (Contributor) commented Oct 15, 2019

Reference Issues/PRs

Fixes #15242.

What does this implement/fix? Explain your changes.

Batching while calculating the sparse dot product avoids allocating an array of shape (#non-zero elements, rank). I propose to pick the batch size such that the allocation is limited to an array of shape (#non-zero elements / rank, rank).
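
Rough arithmetic with hypothetical sizes (not taken from #15242) illustrates the peak-allocation difference:

n_vals = 10 ** 8       # hypothetical number of non-zero elements in X
rank = 100             # hypothetical n_components
bytes_per_value = 8    # float64

full = n_vals * rank * bytes_per_value               # original temporary
batched = (n_vals // rank) * rank * bytes_per_value  # largest batch temporary
print(full / 1e9, "GB vs", batched / 1e9, "GB")      # 80.0 GB vs 0.8 GB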

Any other comments?

Benchmark code to compare the timing of the two versions:

import time

import math
import numpy as np
import scipy.sparse as sp


def batched_simple(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        index = 0
        rank = W.shape[1]
        batch_size = math.floor(n_vals / rank)
        while index < n_vals:
            selector = min(index + batch_size, n_vals)
            dot_vals[index:selector] = np.multiply(
                W[ii[index:selector], :],
                H.T[jj[index:selector], :]).sum(axis=1)
            index = selector

        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def original_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        dot_vals = np.multiply(W[ii, :], H.T[jj, :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def generate_values():
    # From the unittests
    n_samples = 1000
    n_features = 50
    n_components = 30
    rng = np.random.mtrand.RandomState(42)
    X = rng.randn(n_samples, n_features)
    np.clip(X, 0, None, out=X)
    X_csr = sp.csr_matrix(X)

    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H, X_csr


def benchmark():
    W, H, X_csr = generate_values()
    rounds = 100

    def test_func(func, name, rounds, W, H, X_csr):
        # time.clock() was removed in Python 3.8; use perf_counter instead
        start_time = time.perf_counter()
        for i in range(rounds):
            WH = func(W, H, X_csr)
        end_time = time.perf_counter()
        print(name + ": "
              + str((end_time - start_time) / rounds * 1000) + "ms per loop.")

    test_func(original_sparse_dot, "original version", rounds, W, H, X_csr)
    test_func(batched_simple, "batched_simple", rounds, W, H, X_csr)


if __name__ == '__main__':
    benchmark()

Output:

original version: 11.216264ms per loop.
batched_simple: 4.231038999999999ms per loop.

I find this a strange result: I would have expected one big batch to be faster. Discussion welcome :-)
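
One way to probe this (an illustrative sketch, not from the PR): sweep the batch size explicitly. sparse_dot_batched below is a hypothetical variant of batched_simple that takes the batch size as a parameter.

import timeit

import numpy as np
import scipy.sparse as sp


def sparse_dot_batched(W, H, X, batch_size):
    """Like batched_simple above, but with an explicit batch_size."""
    ii, jj = X.nonzero()
    n_vals = ii.shape[0]
    dot_vals = np.empty(n_vals)
    for start in range(0, n_vals, batch_size):
        batch = slice(start, start + batch_size)
        dot_vals[batch] = np.multiply(W[ii[batch], :],
                                      H.T[jj[batch], :]).sum(axis=1)
    return sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape).tocsr()


# Same shapes as generate_values() above
rng = np.random.RandomState(42)
X_csr = sp.csr_matrix(np.clip(rng.randn(1000, 50), 0, None))
W = np.abs(rng.randn(1000, 30))
H = np.abs(rng.randn(30, 50))

n_vals = X_csr.nnz
for batch_size in (n_vals, n_vals // 10, n_vals // 30, n_vals // 100):
    t = timeit.timeit(lambda: sparse_dot_batched(W, H, X_csr, batch_size),
                      number=20) / 20
    print("batch_size=%d: %.3f ms" % (batch_size, t * 1000))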

Mart added 4 commits Oct 15, 2019
…llocating an array with size (#non-zero elements, rank) to (#non-zero elements / rank, rank).
@Maocx Maocx changed the title Batching in nmf/sparse_dot to prevent MemoryError [MRG] Batching in nmf/sparse_dot to prevent MemoryError Oct 16, 2019
TomDLT (Member) left a comment

Thanks for the pull-request and the benchmark, I made a few comments.
Could you also add an entry in doc/whats_new/v0.22.rst?

Running the benchmark with %timeit:

%timeit original_sparse_dot(W, H, X_csr)
# 6.77 ms ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit batched_simple(W, H, X_csr)
# 2.26 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(Four review comments on sklearn/decomposition/nmf.py, since resolved.)
TomDLT approved these changes Oct 18, 2019

TomDLT (Member) left a comment

LGTM, thanks!

(One review comment on doc/whats_new/v0.22.rst, since resolved.)
@TomDLT TomDLT changed the title [MRG] Batching in nmf/sparse_dot to prevent MemoryError [MRG+1] Batching in nmf/sparse_dot to prevent MemoryError Oct 18, 2019
jnothman (Member) left a comment

Otherwise LGTM

(Another resolved review comment on doc/whats_new/v0.22.rst.)
batch_size = max(n_components, n_vals // n_components)
for start in range(0, n_vals, batch_size):
    batch = slice(start, start + batch_size)
    dot_vals[batch] = np.multiply(W[ii[batch], :],
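
(The quoted hunk is cut off mid-statement; assuming it mirrors batched_simple above, the full helper plausibly reads as follows. batched_sparse_dot is an illustrative name, not necessarily the one in the merged file.)

import numpy as np
import scipy.sparse as sp


def batched_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        n_components = W.shape[1]
        # max(...) keeps batch_size sensible when n_vals < n_components
        batch_size = max(n_components, n_vals // n_components)
        for start in range(0, n_vals, batch_size):
            batch = slice(start, start + batch_size)
            dot_vals[batch] = np.multiply(W[ii[batch], :],
                                          H.T[jj[batch], :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)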

jnothman (Member) commented Oct 20, 2019
Can this be done with np.dot or einsum?

Maocx (Author) replied Oct 21, 2019
Not with np.dot as far as I'm aware. np.einsum should make it possible, along these lines:

dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :], H.T[jj[batch], :])

but I'm receiving a ValueError. As I had to do quite some reading up on einsum, in my opinion it would also reduce the readability of the function :)

Code to reproduce the error:


import time

import numpy as np
import scipy.sparse as sp


def batched_simple(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        n_components = W.shape[1]

        batch_size = max(n_components, n_vals // n_components)
        for start in range(0, n_vals, batch_size):
            batch = slice(start, start + batch_size)
            print(W[ii[batch], :].shape)
            print(H.T[jj[batch], :].shape)
            # Source of the ValueError: np.einsum_path returns a
            # (path, report) tuple describing the contraction order,
            # not the product values.
            dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :],
                                             H.T[jj[batch], :])
            print(dot_vals)

        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def original_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        dot_vals = np.multiply(W[ii, :], H.T[jj, :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def generate_values():
    # From the unittests
    n_samples = 1000
    n_features = 50
    n_components = 30
    rng = np.random.mtrand.RandomState(42)
    X = rng.randn(n_samples, n_features)
    np.clip(X, 0, None, out=X)
    X_csr = sp.csr_matrix(X)

    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H, X_csr


def benchmark():
    W, H, X_csr = generate_values()
    rounds = 100

    def test_func(func, name, rounds, W, H, X_csr):
        # time.clock() was removed in Python 3.8; use perf_counter instead
        start_time = time.perf_counter()
        for i in range(rounds):
            WH = func(W, H, X_csr)
        end_time = time.perf_counter()
        print(name + ": "
              + str((end_time - start_time) / rounds * 1000) + "ms per loop.")

    # test_func(original_sparse_dot, "original version", rounds, W, H, X_csr)
    test_func(batched_simple, "batched_simple", rounds, W, H, X_csr)


if __name__ == '__main__':
    benchmark()
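
For reference: np.einsum_path only plans a contraction; it returns a (path, report) tuple rather than the product, so assigning its result into dot_vals[batch] raises the ValueError above. A minimal working sketch (note the subscripts must be "ij,ij->i" for row-wise dot products; this is illustrative, not part of the merged PR):

import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(5, 3)
B = rng.randn(5, 3)

# Row-wise dot products, equivalent to np.multiply(A, B).sum(axis=1)
rowwise = np.einsum("ij,ij->i", A, B)
assert np.allclose(rowwise, np.multiply(A, B).sum(axis=1))

# einsum_path returns (path, report), not values, hence the ValueError
path, report = np.einsum_path("ij,ij->i", A, B)
print(path)  # e.g. ['einsum_path', (0, 1)]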

@jnothman jnothman merged commit ff3af8d into scikit-learn:master Oct 27, 2019
19 checks passed
jnothman (Member) commented Oct 27, 2019

Thanks @Maocx!
