# [MRG+1] Batching in nmf/sparse_dot to prevent MemoryError #15257

Merged
merged 8 commits into from Oct 27, 2019

## Conversation

Contributor

### Maocx commented Oct 15, 2019 • edited

Fixes #15242 .

#### What does this implement/fix? Explain your changes.

Batching while computing the sparse dot product avoids allocating an array of shape (#non-zero elements, rank). I propose to batch such that the allocation is limited to an array of shape (#non-zero elements / rank, rank).
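For intuition on the memory saving, here is a rough back-of-the-envelope sketch; the sizes below are hypothetical, not measured from this PR:

```python
# Rough illustration of the peak temporary allocation, with hypothetical sizes
# (float64 = 8 bytes per element); back-of-the-envelope only.
nnz, rank = 10**8, 30            # e.g. a very large sparse X with n_components = 30

unbatched = nnz * rank * 8       # temporary of shape (nnz, rank): ~24 GB
batch_size = nnz // rank
batched = batch_size * rank * 8  # per-batch temporary of shape (batch_size, rank): ~0.8 GB

print("unbatched: %.1f GB, per batch: %.1f GB" % (unbatched / 1e9, batched / 1e9))
```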

Benchmark code to compare the difference in time:

```python
import time

import math
import numpy as np
import scipy.sparse as sp


def batched_simple(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        index = 0
        rank = W.shape[1]
        batch_size = math.floor(n_vals / rank)
        while index < n_vals:
            selector = index + batch_size if index + batch_size <= n_vals else n_vals
            dot_vals[index:selector] = np.multiply(
                W[ii[index:selector], :], H.T[jj[index:selector], :]).sum(axis=1)
            index = selector

        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def original_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        dot_vals = np.multiply(W[ii, :], H.T[jj, :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def generate_values():
    # From the unittests
    n_samples = 1000
    n_features = 50
    n_components = 30
    rng = np.random.mtrand.RandomState(42)
    X = rng.randn(n_samples, n_features)
    np.clip(X, 0, None, out=X)
    X_csr = sp.csr_matrix(X)

    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H, X_csr


def benchmark():
    W, H, X_csr = generate_values()
    rounds = 100

    def test_func(func, name, rounds, W, H, X_csr):
        start_time = time.clock()
        for i in range(rounds):
            WH = func(W, H, X_csr)
        end_time = time.clock()
        print(name + ": " + str((end_time - start_time) / rounds * 1000) + "ms per loop.")

    test_func(original_sparse_dot, "original version", rounds, W, H, X_csr)
    test_func(batched_simple, "batched_simple", rounds, W, H, X_csr)


if __name__ == '__main__':
    benchmark()
```

Output:

```
original version: 11.216264ms per loop.
batched_simple: 4.231038999999999ms per loop.
```

I find this a strange result: I would have expected one big batch to be faster. Discussion welcome :-)

added 4 commits Oct 15, 2019

- `95fa08e` Implement batching whilst calculating _special_sparse_dot to reduce allocating an array with size (#non-zero elements, rank) to (#non-zero elements / rank, rank).
- `32783a5` Comply with 79 character limit
- `64b12a6` Revert the selector assignment
- `89045f2` Avoid multiple statements on one line

changed the title Batching in nmf/sparse_dot to prevent MemoryError → [MRG] Batching in nmf/sparse_dot to prevent MemoryError Oct 16, 2019
reviewed
Member

### TomDLT left a comment

Thanks for the pull-request and the benchmark, I made a few comments. Could you also add an entry in `doc/whats_new/v0.22.rst`?

Running the benchmark with `%timeit`:

```python
%timeit original_sparse_dot(W, H, X_csr)
# 6.77 ms ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit batched_simple(W, H, X_csr)
# 2.26 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
sklearn/decomposition/nmf.py Outdated
sklearn/decomposition/nmf.py Outdated
sklearn/decomposition/nmf.py Outdated
sklearn/decomposition/nmf.py Outdated
added 3 commits Oct 18, 2019

- `00bde80` Add changes to nmf._special_sparse_dot() to whats new
- `fe353a2` Incorporate useful suggestions from @TomDLT
- `57a9ce8` Comply with the character limit, again :)
approved these changes
Member

### TomDLT left a comment

LGTM, thanks!
doc/whats_new/v0.22.rst Outdated

changed the title [MRG] Batching in nmf/sparse_dot to prevent MemoryError → [MRG+1] Batching in nmf/sparse_dot to prevent MemoryError Oct 18, 2019
approved these changes
Member

### jnothman left a comment

Otherwise LGTM.
doc/whats_new/v0.22.rst Outdated
```python
batch_size = max(n_components, n_vals // n_components)
for start in range(0, n_vals, batch_size):
    batch = slice(start, start + batch_size)
    dot_vals[batch] = np.multiply(W[ii[batch], :],
```

#### jnothman Oct 20, 2019

Member

Can this be done with np.dot or einsum?

#### Maocx Oct 21, 2019

Author Contributor

Not with np.dot as far as I'm aware; np.einsum should be possible like this:
`dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :], H.T[jj[batch], :])`
but I'm receiving a value error. Since I had to do quite some reading up on einsum, in my opinion this would also reduce the readability of the function :)
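For reference, a minimal self-contained sketch (not part of this PR) of the row-wise dot product under discussion, written with `np.einsum` and the subscripts `"ij,ij->i"`. Note that `np.einsum` evaluates the contraction, whereas `np.einsum_path` only returns a suggested contraction order:

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(5, 3)   # stands in for W[ii[batch], :]
B = rng.randn(5, 3)   # stands in for H.T[jj[batch], :]

# Row-wise dot product: both lines compute the same length-5 vector.
via_multiply = np.multiply(A, B).sum(axis=1)
via_einsum = np.einsum("ij,ij->i", A, B)

assert np.allclose(via_multiply, via_einsum)
```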

Code to reproduce error:

```python
import time

import numpy as np
import scipy.sparse as sp


def batched_simple(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        n_components = W.shape[1]

        batch_size = max(n_components, n_vals // n_components)
        for start in range(0, n_vals, batch_size):
            batch = slice(start, start + batch_size)
            print(W[ii[batch], :].shape)
            print(H.T[jj[batch], :].shape)
            dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :],
                                             H.T[jj[batch], :])
            print(dot_vals)

        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def original_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        dot_vals = np.multiply(W[ii, :], H.T[jj, :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def generate_values():
    # From the unittests
    n_samples = 1000
    n_features = 50
    n_components = 30
    rng = np.random.mtrand.RandomState(42)
    X = rng.randn(n_samples, n_features)
    np.clip(X, 0, None, out=X)
    X_csr = sp.csr_matrix(X)

    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H, X_csr


def benchmark():
    W, H, X_csr = generate_values()
    rounds = 100

    def test_func(func, name, rounds, W, H, X_csr):
        start_time = time.clock()
        for i in range(rounds):
            WH = func(W, H, X_csr)
        end_time = time.clock()
        print(name + ": " + str((end_time - start_time) / rounds * 1000) + "ms per loop.")

    # test_func(original_sparse_dot, "original version", rounds, W, H, X_csr)
    test_func(batched_simple, "batched_simple", rounds, W, H, X_csr)


if __name__ == '__main__':
    benchmark()
```

- `fd9645d` Update what's changed to avoid mentioning private function.

merged commit `ff3af8d` into scikit-learn:master Oct 27, 2019
19 checks passed
LGTM analysis: C/C++ No code changes detected
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No new or fixed alerts
ci/circleci: deploy Your tests passed on CircleCI!
ci/circleci: doc Your tests passed on CircleCI!
ci/circleci: doc artifact Link to 0/doc/_changed.html
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 97.2%)
codecov/project Absolute coverage decreased by -1.05% but relative coverage increased by +2.79% compared to d0d8f20
scikit-learn.scikit-learn Build #20191021.7 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
scikit-learn.scikit-learn (Linux pylatest_conda_mkl) Linux pylatest_conda_mkl succeeded
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Member

### jnothman commented Oct 27, 2019

 Thanks @Maocx!