Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: sparse in-place operations is not in-place (change data-type and object) #7826

Open
zerothi opened this issue Sep 4, 2017 · 13 comments
Labels
needs-decision Items that need further discussion before they are merged or closed scipy.sparse

Comments

@zerothi
Copy link
Contributor

zerothi commented Sep 4, 2017

When doing inplace additions of sparse matrices the resulting data-type is not retained.

I.e. the in-place sparse matrix gets up-casted.

Reproducing code example:

import numpy as np
import scipy.sparse as sp

cc = sp.csr_matrix((10, 10), dtype=np.float64)
cc[0, 0] = 1.

# WRONG
cf = sp.csr_matrix((10, 10), dtype=np.float32)
cf[0, 0] = 1.
print('float32 == ', cf.dtype)
cf += cc
print('float32 == ', cf.dtype)

# WORKS AS EXPECTED
cf = sp.csr_matrix((10, 10), dtype=np.float32)
cf[0, 0] = 1.
print('float32 == ', cf.dtype)
cf[:, :] += cc
print('float32 == ', cf.dtype)

Error message:

There is no error message, it simply upcasts the arrays, also when adding a complex array (say only for real part).

Scipy/Numpy/Python version information:

scipy: 0.19.1
numpy: 1.13.1
Python: 2.7.13 and 3.6.2

@pv
Copy link
Member

pv commented Sep 4, 2017 via email

@zerothi
Copy link
Contributor Author

zerothi commented Sep 4, 2017

So you suggest that it works as expected? I.e. upcasting is an accepted side-effect?

My expectations would be that of numpy (where the above thing works as expected, cf is always np.float32).

@pv
Copy link
Member

pv commented Sep 4, 2017 via email

@matthew-brett
Copy link
Contributor

csr_array etc sounds like an excellent way forwards ...

@zerothi
Copy link
Contributor Author

zerothi commented Sep 4, 2017

I agree. This sounds like a better game-plan.

In my own code I had to implement a replacement for csr_matrix, i.e. dim == 3 where the first 2 are the sparse matrix and the last any number to bypass some of the problems I had with the csr_matrix. However, it has other shortcomings... :(

@denis-bz
Copy link

It would be nice if += could do the equivalent of

def inc( X, Y, c=1. ):
""" X += c * Y, X Y sparse or dense
NB not tested on all combinations of: ndarray () (1,) (1,1), numpy matrix, sparse matrix
"""
if (not hasattr( X, "indices" ) # dense += sparse
and hasattr( Y, "indices" )):
# inc an ndarray view, because ndarry += sparse -> matrix --
X = getattr( X, "A", X ).squeeze()
X[Y.indices] += c * Y.data
else:
X += c * Y # sparse + different sparse: SparseEfficiencyWarning
return X

(A matrix of all "X op Y" combinations that work / don't work / upcast
would I think help users and implementers too.)

@zerothi
Copy link
Contributor Author

zerothi commented Nov 8, 2018

I have just refound this bug (time forgets).

I think the proper solution (which may break backward compatibility) would be to raise NotImplementedError when users try to use in-place operations.

I.e. a user may expect:

import numpy as np
from scipy.sparse import csr_matrix, diags
def func(A, b):
    A += b
A = csr_matrix((10, 10))
A[1, 2] = 1
b = diags(np.arange(10))
func(A, b)
print(A)

this will yield:

  (1, 2)	1.0

it should however yield:

 (1, 2)	1.0
  (1, 1)	1.0
  (1, 2)	1.0
  (2, 2)	2.0
  (3, 3)	3.0
  (4, 4)	4.0
  (5, 5)	5.0
  (6, 6)	6.0
  (7, 7)	7.0
  (8, 8)	8.0
  (9, 9)	9.0

The above func will work correctly for numpy arrays.

PS. for others, my current fix is this (which only works for CSC/CSR matrices):

def func(A, b):
    AA = A + b
    # Restore data in the A array
    A.indices = AA.indices
    A.indptr = AA.indptr
    A.data = AA.data
    del AA

@zerothi zerothi changed the title BUG: Inplace additions of sparse matrices changes data-type BUG: sparse in-place operations is not in-place (change data-type and object) Nov 8, 2018
@rgommers
Copy link
Member

I don't think we want to break backwards compat at this point. At most maybe a warning if a user uses an in-place operation and gets a different dtype back (if this can be checked without costing too much performance).

pydata/sparse#146 adds support for in-place operations to pydata/sparse. I don't see an explicit test for preserving dtype though, and the implementation is similarly a convenience rather than a real in-place operation. So it may suffer from the same issue. @hameerabbasi do you know?

@rgommers rgommers added the needs-decision Items that need further discussion before they are merged or closed label Nov 11, 2018
@hameerabbasi
Copy link
Contributor

hameerabbasi commented Nov 11, 2018

Hi! There’s no real way that I know of to do in-place addition on Sparse arrays. The reason is that the size of the arrays may change so there’s no real way to do it.

That said, PyData/Sparse does “faux in-place” operations like @pv described, mainly for compatibility with NumPy.

The way it’s laid out currently, we do not preserve the dtype, but preserving that is a few line change at the very most. Since PyData/Sparse prefers NumPy compatibility over all else, doing this is certainly desired for us.

@ev-br
Copy link
Member

ev-br commented Nov 11, 2018

It's certainly possible to do in-place binops on some formats of sparse arrays. E.g. DOK-type arrays, https://github.com/ev-br/sparr/blob/master/sparr/sp_map.h#L312

@zerothi
Copy link
Contributor Author

zerothi commented Nov 11, 2018

@hameerabbasi I don't see a problem with doing in-place additions since the problem is that the object retaining the data should be the same, see e.g. my func which is basically __iadd__. I see that you cannot ensure the sparsity pattern to be the same, but that is fine.

@hameerabbasi
Copy link
Contributor

Yes, the object remains the same, but all the arrays and memory inside it change. That's why I call them "faux in-place" operations.

As for maintaining the dtype, I've added an issue pydata/sparse#205 and a PR pydata/sparse#206 to fix it and bring behavior in line with NumPy.

@zerothi
Copy link
Contributor Author

zerothi commented Nov 11, 2018

Ok, I see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-decision Items that need further discussion before they are merged or closed scipy.sparse
Projects
None yet
Development

No branches or pull requests

8 participants