[MRG+1] Batching in nmf/sparse_dot to prevent MemoryError #15257
Conversation
…reducing the allocation from an array of size (#non-zero elements, rank) to (#non-zero elements / rank, rank).
Thanks for the pull request and the benchmark; I made a few comments.
Could you also add an entry in doc/whats-new/0.22.rst?
Running the benchmark with `%timeit`:
```python
%timeit original_sparse_dot(W, H, X_csr)
# 6.77 ms ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit batched_simple(W, H, X_csr)
# 2.26 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
```python
batch_size = max(n_components, n_vals // n_components)
for start in range(0, n_vals, batch_size):
    batch = slice(start, start + batch_size)
    dot_vals[batch] = np.multiply(W[ii[batch], :],
                                  H.T[jj[batch], :]).sum(axis=1)
```
Can this be done with np.dot or einsum?
Not with `np.dot` as far as I'm aware. `np.einsum` should be possible, along these lines:

```python
dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :], H.T[jj[batch], :])
```

but I'm getting a ValueError. Since I had to do quite some reading up on einsum, in my opinion this would also reduce the readability of the function :)
Code to reproduce the error:

```python
import time

import numpy as np
import scipy.sparse as sp


def batched_simple(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        n_vals = ii.shape[0]
        dot_vals = np.empty(n_vals)
        n_components = W.shape[1]
        batch_size = max(n_components, n_vals // n_components)
        for start in range(0, n_vals, batch_size):
            batch = slice(start, start + batch_size)
            print(W[ii[batch], :].shape)
            print(H.T[jj[batch], :].shape)
            # This line raises the ValueError: einsum_path only plans the
            # contraction and returns (path, report), it does not evaluate it.
            dot_vals[batch] = np.einsum_path("ij,ik->i", W[ii[batch], :],
                                             H.T[jj[batch], :])
        print(dot_vals)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def original_sparse_dot(W, H, X):
    """Computes np.dot(W, H), only where X is non zero."""
    if sp.issparse(X):
        ii, jj = X.nonzero()
        dot_vals = np.multiply(W[ii, :], H.T[jj, :]).sum(axis=1)
        WH = sp.coo_matrix((dot_vals, (ii, jj)), shape=X.shape)
        return WH.tocsr()
    else:
        return np.dot(W, H)


def generate_values():
    # From the unit tests
    n_samples = 1000
    n_features = 50
    n_components = 30
    rng = np.random.mtrand.RandomState(42)
    X = rng.randn(n_samples, n_features)
    np.clip(X, 0, None, out=X)
    X_csr = sp.csr_matrix(X)
    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H, X_csr


def benchmark():
    W, H, X_csr = generate_values()
    rounds = 100

    def test_func(func, name, rounds, W, H, X_csr):
        # time.clock() was removed in Python 3.8; perf_counter is its
        # modern replacement for wall-clock benchmarking.
        start_time = time.perf_counter()
        for i in range(rounds):
            WH = func(W, H, X_csr)
        end_time = time.perf_counter()
        print(name + ": " + str((end_time - start_time) / rounds * 1000)
              + " ms per loop.")

    # test_func(original_sparse_dot, "original version", rounds, W, H, X_csr)
    test_func(batched_simple, "batched_simple", rounds, W, H, X_csr)


if __name__ == '__main__':
    benchmark()
```
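For what it's worth, a working einsum variant (an editorial sketch, not part of the PR): the ValueError most likely comes from `np.einsum_path`, which only plans the contraction and returns a `(path, report)` pair instead of evaluating it, and the row-wise dot product needs the subscripts `"ij,ij->i"` rather than `"ij,ik->i"` (the latter would sum over j and k independently):

```python
import numpy as np

# Sketch with illustrative shapes (not the author's code):
#  * np.einsum (not np.einsum_path) performs the contraction;
#  * "ij,ij->i" multiplies element-wise and sums over the shared
#    second axis, i.e. the row-wise dot product.
rng = np.random.RandomState(0)
W_batch = rng.randn(5, 3)    # stands in for W[ii[batch], :]
Ht_batch = rng.randn(5, 3)   # stands in for H.T[jj[batch], :]

via_einsum = np.einsum("ij,ij->i", W_batch, Ht_batch)
via_multiply = np.multiply(W_batch, Ht_batch).sum(axis=1)
print(np.allclose(via_einsum, via_multiply))  # True
```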
Thanks @Maocx!
Reference Issues/PRs
Fixes #15242.
What does this implement/fix? Explain your changes.
Batching while calculating the sparse dot product prevents allocating an array of shape (#non-zero elements, rank). I propose batching such that the allocation is limited to an array of shape (#non-zero elements / rank, rank).
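To make the savings concrete, a back-of-the-envelope sketch (the numbers below are illustrative, not from the PR):

```python
# Hypothetical sizes, chosen only to illustrate the allocation argument.
n_vals = 1_000_000   # number of non-zero elements
rank = 30            # n_components

# Unbatched: np.multiply(W[ii, :], H.T[jj, :]) materialises an
# (n_vals, rank) intermediate before the sum over axis 1.
unbatched_elems = n_vals * rank

# Batched: each batch has at most n_vals // rank rows, so the
# intermediate shrinks to roughly (n_vals // rank, rank).
batch_size = max(rank, n_vals // rank)
batched_elems = batch_size * rank

# The peak intermediate shrinks by roughly a factor of `rank`.
print(unbatched_elems // batched_elems)  # 30
```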
Any other comments?
Benchmark code to compare the difference in time:
Output:
I find this a strange result: I would have expected one big batch to perform better. Discussion welcome :-)
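One way to probe this (an editorial sketch, not part of the PR) would be to sweep a few batch sizes on the same data as the benchmark above; a plausible but unverified explanation is cache locality, since smaller intermediates fit in the CPU cache:

```python
import timeit

import numpy as np
import scipy.sparse as sp

# Same data shapes as the benchmark in this thread.
rng = np.random.RandomState(42)
W = np.abs(rng.randn(1000, 30))
H = np.abs(rng.randn(30, 50))
X_csr = sp.csr_matrix(np.clip(rng.randn(1000, 50), 0, None))
ii, jj = X_csr.nonzero()
n_vals = ii.shape[0]

def batched(batch_size):
    """Row-wise dot of W[ii] and H.T[jj], computed in batches."""
    dot_vals = np.empty(n_vals)
    for start in range(0, n_vals, batch_size):
        b = slice(start, start + batch_size)
        dot_vals[b] = np.multiply(W[ii[b], :], H.T[jj[b], :]).sum(axis=1)
    return dot_vals

# Sweep: one big batch, the PR's n_vals // rank choice, and a small batch.
for bs in (n_vals, n_vals // 30, 1024):
    t = timeit.timeit(lambda: batched(bs), number=20) / 20
    print(f"batch_size={bs}: {t * 1e3:.2f} ms")
```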