Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH safe_sparse_dot work on 3D+ arrays #14538

Merged
merged 8 commits into from Sep 10, 2019

Conversation

@jeremiedbb
Copy link
Member

jeremiedbb commented Aug 1, 2019

  • safe_sparse_dot docstring was not accurate regarding dense_output parameter. If only one of them is sparse, it would always return a dense array.
    I don't think it was an intended behavior anyway since I saw several

    C = safe_sparse_dot(A, B)
    C += <a scalar>

    which would break if the output was sparse. Moreover, it's very unlikely that dense @ sparse has many zeros. So I changed the docstring to match the behavior.

  • It did not handle the case when one of the arrays is 3D+ and the other is sparse.

  • When arrays are 1D or 2D, matmul is preferred over dot, and is a little bit faster. I replace dot by @ when applicable.

  • Function was untested.

@@ -117,29 +117,40 @@ def density(w, **kwargs):
def safe_sparse_dot(a, b, dense_output=False):
"""Dot product that handle the sparse matrix case correctly
Uses BLAS GEMM as replacement for numpy.dot where possible
to avoid unnecessary copies.

This comment has been minimized.

Copy link
@jeremiedbb

jeremiedbb Aug 1, 2019

Author Member

Removed that: it does not do that.

if dense_output and hasattr(ret, "toarray"):
ret = ret.toarray()
return ret
if a.ndim > 2 or b.ndim > 2:

This comment has been minimized.

Copy link
@rth

rth Aug 1, 2019

Member

Does this really work for ND arrays? I would restrict it to 3D, which is what we are using, and raise a NotImplementedError otherwise. We are not testing 4D+ case anyway.

This comment has been minimized.

Copy link
@jeremiedbb

jeremiedbb Aug 1, 2019

Author Member

Yes it does :)
I updated the tests to ensure that. I don't see any reason to raise an error since it's implemented and works.

This comment has been minimized.

Copy link
@amueller

amueller Aug 2, 2019

Member

where do we need 3d?

This comment has been minimized.

Copy link
@jeremiedbb

jeremiedbb Aug 2, 2019

Author Member

At least in coordinate descent (see diff) between 3D dense and sparse.

ret = ret.toarray()
return ret
if a.ndim > 2 or b.ndim > 2:
if sparse.issparse(a):

This comment has been minimized.

Copy link
@rth

rth Aug 1, 2019

Member

So implicitly here a is 2D, b 3D+ might be worth adding a comment

b_2d = b_.reshape((b.shape[-2], -1))
ret = a @ b_2d
ret = ret.reshape(a.shape[0], *b_.shape[1:])
elif sparse.issparse(b):

This comment has been minimized.

Copy link
@rth

rth Aug 1, 2019

Member

Same

Copy link
Member

jnothman left a comment

I wouldn't have been sure about supporting >=3d, but I'm happy with this. I'd like to change the title to not say FIX... it's a DOC fix because the documented behaviour is unreasonable ;)

Add what's new?

@jeremiedbb jeremiedbb changed the title [MRG] FIX safe_sparse_dot docstring & ENH work on 3D+ arrays [MRG] DOC fix dense_output param in safe_sparse_dot & ENH work on 3D+ arrays Aug 2, 2019
@jeremiedbb

This comment has been minimized.

Copy link
Member Author

jeremiedbb commented Sep 9, 2019

@rth I think I adressed your comments

@rth
rth approved these changes Sep 9, 2019
Copy link
Member

rth left a comment

Minor comment otherwise LTGM.

sparse matrix.
|Fix| The documented behavior of the ``dense_output`` parameter of
:func:`utils.safe_sparse_dot` was not the intended behavior and now matches
the actual behavior of the function.

This comment has been minimized.

Copy link
@rth

rth Sep 9, 2019

Member

So the second part is a documentation fix? Not sure it needs a what's new entry?

This comment has been minimized.

Copy link
@jeremiedbb

jeremiedbb Sep 10, 2019

Author Member

I was not sure about that. It's true that the behavior did not change, only it's documentation. I removed it.

@rth rth changed the title [MRG] DOC fix dense_output param in safe_sparse_dot & ENH work on 3D+ arrays ENH safe_sparse_dot work on 3D+ arrays Sep 10, 2019
@rth rth merged commit 61add4d into scikit-learn:master Sep 10, 2019
18 checks passed
18 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc artifact Link to 0/doc/_changed.html
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.85%)
Details
codecov/project Absolute coverage decreased by -1.48% but relative coverage increased by +4.14% compared to 5cf88db
Details
scikit-learn.scikit-learn Build #20190910.22 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_pip_openblas_pandas) Linux pylatest_pip_openblas_pandas succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.