Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse array being cast into dense array #16941

Open
kangzhiq opened this issue Jun 1, 2019 · 11 comments

Comments

Projects
None yet
3 participants
@kangzhiq
Copy link
Contributor

commented Jun 1, 2019

This is a overview about all cases related to this issue, which can be refered to while I am opening PRs.

Sparse array has a better performance when it comes to a array whose values are mostly zero. However, there are various cases where a sparse array, explicitly or implicitly, is cast to a dense array. This behavior will not only cause a unnecessary cost of memory but also slow down the code. Thus, the purpose of this issue and the PRs associated is to find all the cases and recontruct them so that, hopefully, an improvement of performance can be brought into Array module.

I will list out all cases that I found below. Please fell free to comment on it.

@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2019

  • derive_by_array
    A sparse array is not distinguished from NDimArray so it is cast to a dense array.
>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a[1, 1] = i
>>> d = derive_by_array(a, i)
Memory error
@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2019

  • tensorproduct
    The parameters of this method are cast to a list. So it would slow down the code or every trigger a MemoryError when there are sparse arrays.
>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> b = MutableSparseNDimArray.zeros(20000, 10000, 20000, 4000)
>>> tensorproduct(a, b)
MemoryError
  • Foreseeable difficulties

Need to to figure out how to perform a tensor product using the dictionary of index and value inside the sparse array

@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2019

  • tensorcontraction
    The operation is slow, need to be determined which step cause it.
>>> b = MutableSparseNDimArray.zeros(20000, 10000, 20000, 4000)
>>> tensorcontraction(b, (0, 2))
  • Foreseeable difficulties
    Contraction using the dictionary of index and value
@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2019

  • permutedims
    A list of None is initiated with the length of the array, which is unnecessary for a sparse array. It only needs to calculate the new indice after permutation.
>>> b = MutableSparseNDimArray.zeros(20000, 10000, 20000, 4000)
>>> permutedims(b, (0, 2))
MemoryError
  • Foreseeable difficulties
    A algorithm to calculate the index after the permutation
@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2019

  • applyfunc
    The opration is slow. A map is applied to every element of the array. which can be unnecessary for sparse array. In contrast, if the function change zero to other value, e.g. x=1, 3^x, cosx, then it would be inevitable to calculate the value for each term. In addition, it would be faisible to cast it into a dense array since most of values are not zeros any more.
>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a.applyfunc(lambda i: 2*i)
@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 1, 2019

  • transpose
    Basically it is the same case as permutedims because permitedims is being called within transpose

@sylee957 sylee957 added the tensor label Jun 2, 2019

@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 2, 2019

  • ==
    array is cast to list inside __eq__ if the shape is equal
>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> b = MutableSparseNDimArray.zeros(10000, 20000)
>>> a == b
MemoryError
  • Foreseeable difficulties
    Overridding __eq__ in sparse array can partially solve the problem. Partially because if it's a dense array who triggers __eq__ with a sparse array, we have to cast the sparse array to dense array.
@Upabjojr

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

This looks like a nice list to start working on.

If you use __eq__ between a sparse and dense array, I fear you have to go through all the elements. Anyway, if one of the two arrays is dense, that means that there is enough memory to represent that kind of size.

@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 4, 2019

@Upabjojr I have solved the issue about __eq__ in #16937. Could you please have a look at it?
I will continue to update this list. :-)

@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 8, 2019

  • __mul__ and __rmul__
    This operation is slow. The mutiplication is cast to a list to store every element of the resulting array, which is unnecessary for sparse array. Because only the no-zero values will be changed by a multiplication.
>>> a= MutableSparseNDimArray.zeros(10000, 20000)
>>> a*3

Here there isn't a MemoryError thank to the lazy-evaluation of iterator in NDimArray. But the operation is significantly slow.

@kangzhiq

This comment has been minimized.

Copy link
Contributor Author

commented Jun 11, 2019

  • __div__ and __neg__
    These operators are like __mul__ whose element is iterated one by one to calculate the value. Since 0/x =0 and -0 = 0, for a sparse array, only the non-zero value need to be calculated.

Just for comparison, __sub__ and __add__ will perform exactly the same for sparse array and dense array because the zero value is change to non-zero

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.