Add lots of indexing features. #57
Conversation
@nils-werner, your input here would be valuable.
cc @mrocklin Since you wrote the original __getitem__, it'd be helpful if you took a look at this.
Thank you for doing this. I look forward to reviewing. Just as a warning, I'll probably be busy for the next 24 hours or so.
Nice work. Some questions/comments below.
sparse/core.py (outdated)
        coords.append(self.coords[i, idx[0]])

    for i in range(1, np.ndim(data)):
        coords.append(idx[i])
I don't understand the purpose of this. Can you help explain?
Custom dtypes can have multiple dimensions inside a single scalar 'field'. For example, np.dtype([('grades', np.float64, (2, 2))]) will have a (2, 2) element "inside" each "scalar" element. self.data['grades'] gives a (nnz, 2, 2) array in this case, but the resulting COO array has to have shape self.shape + (2, 2). We first use np.where to get the nonzero indices. Then we append the matching parts of the original coordinates, which we get from idx[0]. After this, we take the parts that come from the dimensions of the field itself and append those to coords. Finally, we flatten the data, since we're done calculating all the coords of the resulting array.
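To make the shape bookkeeping concrete, here is a minimal self-contained sketch of the mechanics described above, using plain NumPy and toy stand-ins (stored_coords, stored_data are illustrative names, not the PR's actual code):

import numpy as np

# Toy 1-D "COO"-like data: 2 stored entries at positions 1 and 3, structured dtype.
dt = np.dtype([('grades', np.float64, (2, 2))])
shape = (5,)
stored_coords = np.array([[1, 3]])            # (ndim, nnz)
stored_data = np.zeros(2, dtype=dt)           # (nnz,) with the structured dtype
stored_data['grades'][0] = [[1., 0.], [0., 2.]]
stored_data['grades'][1] = [[3., 4.], [5., 6.]]

# Selecting the field gives a (nnz, 2, 2) array ...
data = stored_data['grades']
# ... but the result should have shape shape + (2, 2) == (5, 2, 2).
idx = np.where(data)                          # nonzero positions within (nnz, 2, 2)

coords = []
for i in range(stored_coords.shape[0]):       # dims of the original array, via idx[0]
    coords.append(stored_coords[i, idx[0]])
for i in range(1, np.ndim(data)):             # dims contributed by the field itself
    coords.append(idx[i])

result_shape = shape + dt['grades'].shape     # (5, 2, 2)
result_data = data[idx].flatten()             # keep only the nonzero entries

print(np.vstack(coords))
# [[1 1 3 3 3 3]
#  [0 1 0 0 1 1]
#  [0 1 0 1 0 1]]
print(result_data)                            # [1. 2. 3. 4. 5. 6.]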
Right, I see. I had expected the dtype to continue being of shape (2, 2), but I see that NumPy rolls this into the normal array's shape:
In [1]: import numpy as np
In [2]: np.ones((5,), dtype=np.dtype([('grades', np.float64, (2, 2))]))
Out[2]:
array([([[ 1., 1.], [ 1., 1.]],), ([[ 1., 1.], [ 1., 1.]],),
([[ 1., 1.], [ 1., 1.]],), ([[ 1., 1.], [ 1., 1.]],),
([[ 1., 1.], [ 1., 1.]],)],
dtype=[('grades', '<f8', (2, 2))])
In [3]: x = np.ones((5,), dtype=np.dtype([('grades', np.float64, (2, 2))]))
In [4]: x.shape
Out[4]: (5,)
In [5]: x['grades'].shape
Out[5]: (5, 2, 2)
In [6]: x['grades'].dtype
Out[6]: dtype('float64')
sparse/core.py (outdated)
    coords = []

    for i in range(self.ndim):
        coords.append(self.coords[i, idx[0]])
This is just self.coords[:, idx[0]]?
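(A quick check of that equivalence on a plain NumPy stand-in for self.coords; the names here are illustrative:)

import numpy as np

coords = np.array([[0, 1, 2, 3],    # stand-in for self.coords, shape (ndim, nnz)
                   [4, 5, 6, 7]])
sel = np.array([0, 2, 3])           # stand-in for idx[0]

looped = np.stack([coords[i, sel] for i in range(coords.shape[0])])
assert np.array_equal(looped, coords[:, sel])   # one fancy index does the same job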
sparse/core.py (outdated)
        return self
-   mask = np.ones(self.nnz, dtype=bool)
+   mask = np.ones(self.nnz, dtype=np.bool)
This change doesn't do anything
In [5]: np.bool is bool
Out[5]: True
In [6]: np.bool_
Out[6]: numpy.bool_
My intent was to make it more performant, but it looks like NumPy treats them the same...
Not just NumPy: these are exactly the same object; the is operator tests object identity in Python.
I meant np.bool_ and bool. If you create arrays with both, they end up with the same dtype.
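(For the record, a quick check that both spellings produce the same array dtype:)

import numpy as np

a = np.ones(3, dtype=bool)
b = np.ones(3, dtype=np.bool_)
assert a.dtype == b.dtype == np.dtype('bool')   # same dtype either way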
sparse/core.py (outdated)
    step = ind.step if ind.step is not None else 1
    if step > 0:
        start = ind.start if ind.start is not None else 0
        start = max(start, 0)
What if someone does something like x[-10:]?
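(To spell out the concern as I read it: clamping a negative start to 0 would not match NumPy's semantics. A toy comparison, not the PR's code:)

import numpy as np

x = np.arange(20)
print(x[-10:])            # [10 11 ... 19]: a negative start counts from the end
print(x[max(-10, 0):])    # [ 0  1 ... 19]: clamping the start to 0 returns everything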
It may be useful to separate out a normalize_slice function that reduces slice objects to a canonical form. It looks like there are some functions in dask/array/slicing.py that might be helpful here.
The normalize_* and check_index functions there might be helpful in particular.
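A minimal sketch of what such a canonicalizing helper could look like, built on Python's slice.indices(); this is illustrative only and not the dask functions mentioned above:

def normalize_slice(ind, length):
    """Return an equivalent slice with concrete, in-range start/stop/step.

    slice.indices() already resolves None, negative indices and negative
    steps against the given axis length.
    """
    start, stop, step = ind.indices(length)
    return slice(start, stop, step)

assert normalize_slice(slice(-10, None), 20) == slice(10, 20, 1)
assert normalize_slice(slice(None, None, -2), 20) == slice(19, -1, -2)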
Is there anything similar for bool indexing?
On a brief look I'm not seeing much.
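For completeness, a rough sketch of the kind of validation a boolean index would need (a hypothetical helper, not taken from dask or this PR):

import numpy as np

def normalize_bool_index(mask, length):
    """Validate a 1-D boolean mask against an axis and reduce it to integer positions."""
    mask = np.asarray(mask)
    if mask.dtype != np.bool_:
        raise IndexError("expected a boolean mask")
    if mask.shape != (length,):
        raise IndexError("boolean index shape {} does not match axis length {}"
                         .format(mask.shape, length))
    return np.where(mask)[0]          # positions where the mask is True

print(normalize_bool_index([True, False, True, False, True], 5))   # [0 2 4]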
sparse/core.py (outdated)
    for i in range(1, np.ndim(data)):
        coords.append(idx[i])

    return COO(coords, data.flatten(),
The data.flatten() call seems odd to me here. This is for struct dtypes?
You're right, it should be data[idx].flatten(). We have to keep only the nonzero entries.
Whoops. Looks like some old build results need to be cleaned.
It turns out boolean slices were already handled in
Merging at 20:00 German time if there are no other proposed changes.
Adds support for the following (a short usage sketch follows the list):
- x[1, 1] if x is 2-D is now a scalar.
- Ellipsis, e.g. x[1, 1, ...] if x is 2-D is now a ()-shaped COO object.
- Slices with steps other than None and 1 (even negative steps).
- Custom dtypes. So if x has a custom dtype, x['field'] is now supported.
- IndexErrors consistent with NumPy for string and out-of-range indices.
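A rough usage sketch of the behaviours listed above, assuming sparse.COO.from_numpy and the indexing semantics the PR describes (exact output depends on the library version):

import numpy as np
import sparse

x = sparse.COO.from_numpy(np.arange(9).reshape(3, 3))

print(x[1, 1])        # full integer indexing now yields a scalar
print(x[1, 1, ...])   # Ellipsis: a ()-shaped COO object
print(x[::2, 1:])     # slices with steps other than 1
print(x[::-1])        # negative steps

# With a custom (structured) dtype, x['field'] is supported as well (not shown here).

try:
    x[5]              # out-of-range integer index
except IndexError:
    print("IndexError, consistent with NumPy")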