Problems converting dask chunked sparse arrays greater size than 256x256 #32

Closed
ljdursi opened this Issue Sep 1, 2017 · 3 comments

ljdursi commented Sep 1, 2017

Presumably related to dask/dask#2586: computing a sparse array from a chunked dask sparse array fails if the resulting sparse array is larger than 256x256. The coord type is uint16, but it looks like intermediate calculations may overflow:

import dask.array as da
import sparse


def print_sparse_nonidentity(m, name):
    valid = True
    for i, j, v in zip(m.coords[0], m.coords[1], m.data):
        if not i == j:
            print name, i, j, v
            valid = False
    if valid:
        print "OK"


def identity_da(size, chunksize):
    p = {(i, i): 1. for i in range(size)}

    # wrap the sparse identity in a dask array with the requested chunk size
    a = da.from_array(sparse.COO(p), chunks=(chunksize, chunksize), asarray=False)
    return a


print "Should be identity matrix"
size = 256
print "size = ", size
a = identity_da(256, size//2)
c = a.compute()
print c
print_sparse_nonidentity(c, "256 chunked")

size = 258
print "size = ", size, " unchunked"
a = identity_da(size, size)
c = a.compute()
print c
print_sparse_nonidentity(c, "258 unchunked")

print "size = ", size, " chunked"
a = identity_da(size, size//2)
c = a.compute()
print c
print_sparse_nonidentity(c, "258 chunked")

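Running this works for a chunked 256x256 matrix and for a 258x258 matrix held in a single chunk, but a 258x258 matrix broken into 4 chunks fails, with the last two entries landing at (256, 0) and (257, 1) instead of on the diagonal: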

$ python fail-conversion.py
Should be identity matrix
size =  256
<COO: shape=(256, 256), dtype=float64, nnz=256, sorted=False, duplicates=False>
OK
size =  258  unchunked
<COO: shape=(258, 258), dtype=float64, nnz=258, sorted=False, duplicates=False>
OK
size =  258  chunked
<COO: shape=(258, 258), dtype=float64, nnz=258, sorted=False, duplicates=False>
258 chunked 256 0 1.0
258 chunked 257 1 1.0
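
The misplaced entries above are what one would see if a column coordinate were reduced modulo 256 at some point. The following is a minimal NumPy sketch of that wraparound only. It assumes an 8-bit intermediate type, inferred purely from the 256 threshold; the issue does not confirm where in sparse or dask such a narrowing happens, and the variable names are illustrative.

```python
import numpy as np

# Column coordinates the last two diagonal entries of a 258x258 identity
# should have.
expected_cols = np.array([256, 257], dtype=np.int64)

# Assumption for illustration: somewhere the coordinates pass through an
# 8-bit unsigned type. Casting keeps only the low 8 bits, so values wrap
# modulo 256.
wrapped_cols = expected_cols.astype(np.uint8)

# A 16-bit unsigned type is wide enough for these values, so nothing wraps.
kept_cols = expected_cols.astype(np.uint16)

print(wrapped_cols.tolist())  # [0, 1]
print(kept_cols.tolist())     # [256, 257]
```

The wrapped values 0 and 1 match the column indices reported for rows 256 and 257 in the failing run, while a uint16 cast preserves them.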

Versions are as follows:

$ python
Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dask
>>> print dask.__version__
0.15.2
>>> import sparse
>>> print sparse.__version__
0.1.1

Collaborator

hameerabbasi commented Dec 24, 2017

Since dask/dask#2586 was closed, it should be okay to close this as well?


ljdursi commented Dec 27, 2017

As far as I know, it remains a problem - I referenced dask/dask#2586 only because that was a similar problem.


Collaborator

hameerabbasi commented Dec 31, 2017

Fixed with #51.
