ENH: Make it possible to call .view on object arrays #8514

eric-wieser · 2017-01-21T14:36:35Z

Right now you can do this with primitive types:

>>> np.zeros((2, 3), dtype=np.int).view([('a', np.int,3)])
array([[([0, 0, 0],)],
       [([0, 0, 0],)]], 
      dtype=[('a', '<i4', (3,))])

This adds

>>> np.zeros((2, 3), dtype=object).view([('a', object,3)])
array([[([0, 0, 0],)],
       [([0, 0, 0],)]], 
      dtype=[('a', 'O', (3,))])

Previously, this would raise an error.

This makes it possible to use np.unique on 2d object arrays
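
A minimal sketch of the np.unique use case, written with an integer array so that it runs without this change. It assumes one field per column (the names f0, f1, ... are illustrative) rather than the single subarray field shown above. The same kind of view on an object array is what this PR aims to allow.

```python
import numpy as np

rows = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [1, 2, 3]], dtype=np.int64)

# One field per column, so each row becomes a single structured element.
row_dtype = np.dtype([('f%d' % i, rows.dtype) for i in range(rows.shape[1])])

# The new itemsize equals the byte length of one row, so the last axis
# collapses from 3 to 1.
packed = rows.view(row_dtype)            # shape (3, 1)

# np.unique flattens its input and compares whole records.
unique_packed = np.unique(packed)

# Unpack the records back into a 2D array of the original element type.
unique_rows = unique_packed.view(rows.dtype).reshape(-1, rows.shape[1])
print(unique_rows)
```

The duplicate row is dropped because each record is compared as a whole.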

seberg · 2017-01-21T14:39:18Z

Not sure what exactly is in it, but you might want to check #5508 before continuing.

seberg · 2017-01-21T14:41:19Z

Ah, sorry, nvm. I remembered there was something about relaxing view, but that one is not about objects.

eric-wieser · 2017-01-21T14:42:34Z

This is simply aiming to change _view_is_safe, which is coming in a future commit

eric-wieser · 2017-01-21T15:03:24Z

numpy/core/_internal.py

@@ -355,15 +355,37 @@ def _view_is_safe(oldtype, newtype):
        If the new type is incompatible with the old type.

    """
+    from numpy.lib.type_check import find_dtype_offsets


This import feels kinda hacky, but it can't go at module level. Should I put this function somewhere else?

eric-wieser · 2017-01-21T15:04:02Z

numpy/core/_internal.py

+    oldoffsets = find_dtype_offsets((oldtype, oldrepeats), object_)
+
+    # everything matches - we're good
+    if len(newoffsets) == len(oldoffsets) and all(newoffsets == oldoffsets):


Is there a better way of doing this check? Converting to list?
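
One possible answer, offered as a suggestion rather than something taken from the PR: np.array_equal compares shapes before values and always returns a plain bool, so the separate length check is not needed. The offset arrays below are made-up values for illustration.

```python
import numpy as np

# Made-up offset arrays standing in for oldoffsets / newoffsets.
old_offsets = np.array([0, 8, 16])
same_offsets = np.array([0, 8, 16])
shorter_offsets = np.array([0, 8])

# array_equal returns False on a shape mismatch instead of attempting an
# elementwise comparison between arrays of different lengths.
matches = np.array_equal(old_offsets, same_offsets)
mismatch = np.array_equal(old_offsets, shorter_offsets)
print(matches, mismatch)
```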

ahaldane · 2017-01-22T19:03:44Z

You can also look at #5548, which was an attempt to do something similar. However it was largely reverted in #6562 because it turned out to be too big a performance hit to look up all the object positions, the way I implemented it. I welcome an attempt to fix this, btw.

In #5548, the methods _get_all_field_offsets is probably similar to your find_dtype_offsets, and I also implemented _view_is_safe quite similarly to how you do here.

There are also tests there I wrote which may be useful here.

eric-wieser · 2017-01-23T02:26:26Z

However it was largely reverted in #6562 because it turned out to be too big a performance hit to look up all the object positions

What even is a "performance hit" when the result previous to that patch was an exception? Is it simply the lack of the hasobject short circuit present here that reverted it?

ahaldane · 2017-01-23T07:36:11Z

The problem was due to "hidden" objects, as I discussed with an example in the first comment in #5548. The possibility of hidden objects means I needed to do a computationally expensive overlap check even if the HASOBJECTS flag is false. In #6208 I tried to limit the cases in which the overlap needs to be checked, but it wasn't enough, see the function _may_have_objects there.

As I vaguely recall, when the performance problem became a big enough issue I realized that the other changes I had made over the course of those PRs actually fixed most of the issues I had originally intended to fix, without needing object views. Therefore I was perfectly happy to simply disable object views and remove the safety checks.

I think it would be great to re-enable object views, though. I think it would fix a number of open issues. It's very likely there are fixes to the hidden object problems I encountered which I didn't explore properly, and some fresh eyes on the problem are very welcome.

Maybe a solution is to make the HASOBJECTS flag better reflect whether the array contains objects....? I'm not sure, because that would probably also mess with the reference counting.

eric-wieser · 2017-01-23T11:08:15Z

So right now, this is a somewhat more restrictive version of your PR, as it doesn't allow object members to be hidden in a view, whereas yours just stopped members being reinserted - but as a result, presumably comes at less of a performance cost.

Your issue was the extra time in executing _getfield_is_safe, I assume? Am I right in thinking I am OK to leave that untouched if aiming to simply fix views?

ahaldane · 2017-01-23T16:05:04Z

That's true, a more restrictive version could avoid the problems.

And yes, the performance cost was in _getfield_is_safe, but particularly in the byte-by-byte checks I did, which you might not need in the restricted version.

Also, I think a performance hit is acceptable as long as it only affects object arrays. My problem was that it affected non-object structured arrays too.

eric-wieser · 2017-01-23T16:19:53Z

@ahaldane I'll go ahead and copy the tests from that pull request then.

Would you prefer me to make find_dtype_offsets not part of the public API, as you did? Or change it back to your variant that returns unfiltered fields? There's definitely a performance boost for not wasting time finding the offsets of (np.float, (64, 64, 64)) when all you care about is object arrays

eric-wieser · 2017-01-23T17:07:00Z

@ahaldane: Ok, your tests are copied across

ahaldane · 2017-01-23T17:33:46Z

I don't have a strong opinion about where find_dtype_offsets should go, if you think it should be public that's fine with me.

I needed the non-object fields to do byte-overlap checks; if you don't need them it sounds good to optimize them out.
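
A rough sketch of the optimization being discussed, not the PR's find_dtype_offsets. The name object_offsets is hypothetical. It walks a dtype recursively and returns the byte offsets of object pointers only, using hasobject to skip whole sub-dtypes that contain no objects.

```python
import numpy as np

def object_offsets(dtype):
    """Hypothetical helper: byte offsets of every object pointer in dtype."""
    dtype = np.dtype(dtype)
    if not dtype.hasobject:
        # Short-circuit: never descend into large non-object subarrays.
        return []
    if dtype.subdtype is not None:
        # Subarray dtype: repeat the base layout once per element.
        base, shape = dtype.subdtype
        count = int(np.prod(shape))
        inner = object_offsets(base)
        return [i * base.itemsize + off for i in range(count) for off in inner]
    if dtype.fields is not None:
        # Record dtype: shift each field's offsets by the field position.
        offsets = []
        for name in dtype.names:
            field_dtype, field_offset = dtype.fields[name][:2]
            offsets.extend(field_offset + off
                           for off in object_offsets(field_dtype))
        return sorted(offsets)
    # A plain object dtype holds a single pointer at offset 0.
    return [0]

ptr = np.dtype(object).itemsize
mixed = np.dtype([('big', np.float64, (64, 64, 64)), ('obj', object)])
print(object_offsets('O,O,O'), object_offsets(mixed))
```

With the short circuit in place, the 64x64x64 float field costs one flag check instead of a walk over every element.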

eric-wieser · 2017-02-19T16:18:51Z

@ahaldane: So do you think this needs further changes?

ahaldane · 2017-02-19T17:29:44Z

Oh, I didn't realize it was finished. I'm taking a look now.

ahaldane · 2017-02-19T19:06:13Z

numpy/core/_internal.py

-    return
+    # no object members is fine
+    if not newtype.hasobject and not oldtype.hasobject:
+        return


This is the part I am still thinking about, because of the problem of "hidden objects".

It's true that currently, I don't think there is a way to create hidden objects because most operations involving objects are disallowed or make copies, but we want to relax those restrictions.

We will have a problem when trying to solve #5994, as I attempt in my PR #6053. That is, once we allow multi-field views like:

>>> a = np.zeros(5, dtype='O,O,O')
>>> b = a[['f0','f2']]
>>> b.dtype
dtype({'names':['f0','f2'], 'formats':['O','O'], 'offsets':[0,16], 'itemsize':24})

then b will have a hidden object at byte 8. That will be true of any implementation of #5994, besides #6053. (Currently the second line returns a copy instead of a view, thus avoiding hidden objects, but that is what we want to change).
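
To make the hidden object concrete, the sketch below builds the dtype from the example by hand (no multi-field view is taken) and lists the bytes that no field covers. uncovered_bytes is a hypothetical helper, and the pointer size is read from np.dtype(object).itemsize rather than assumed to be 8.

```python
import numpy as np

ptr = np.dtype(object).itemsize  # 8 on typical 64-bit builds

# Same layout as b.dtype in the example: two object fields with a gap
# where the middle field of the original 'O,O,O' dtype used to be.
partial = np.dtype({'names': ['f0', 'f2'],
                    'formats': ['O', 'O'],
                    'offsets': [0, 2 * ptr],
                    'itemsize': 3 * ptr})

def uncovered_bytes(dtype):
    """Hypothetical helper: byte positions in dtype that no field covers."""
    covered = np.zeros(dtype.itemsize, dtype=bool)
    for name in dtype.names:
        field_dtype, offset = dtype.fields[name][:2]
        covered[offset:offset + field_dtype.itemsize] = True
    return np.flatnonzero(~covered).tolist()

hidden = uncovered_bytes(partial)
print(hidden)
```

In the scenario described above, those uncovered bytes would still hold the pointer belonging to the middle field of a, which is why writing plain data over them is dangerous.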

I am trying to figure out the easiest way to get both this PR and multi-field views to interact safely. (this may involve setting limitations on the plans in #5994).

Note how this PR would allow this view, using b from my example above:

>>> b.view('O,p,O')
>>> b['f1'] = 0 # segfault?

Perhaps the solution is to disallow multi-field views if it would create hidden objects? That also seems a little wonky. I need to think a bit more about it.

eric-wieser · 2017-02-19

I don't think you meant to comment on this line, because it matches the old behaviour here.

I think I see the problem now. It doesn't exist yet, right, but would be a consequence of allowing partial views.

But as it stands, this PR doesn't allow the creation of those partial views in the first place. I believe this PR allows a strict subset of the things a complete partial-view implementation would allow.

Although I guess the result would be that all this gets thrown out when partial views do make it in.

Perhaps the best solution is just to store a reference to the original dtype for partial views, or simply a list of hidden object offsets?

ahaldane · 2017-02-20

What about this:

First, I think we only need some relatively minor updates (in a future PR, when multi-field views are enabled) so that the hasobject field is guaranteed to be True if the dtype has objects, including hidden ones. Thus, whenever taking a view of an array with objects, the view dtype will have hasobject=True even if the objects are hidden.

This is actually often the case as implemented in #6053 - with c = np.zeros(4, dtype="p,O,p")[['f0','f2']], I see that c.dtype.hasobject is True. However, I'm pretty sure I saw other cases where I can get hasobject to be false if there are hidden objects. Those would need to be fixed.

Then, this PR also needs a modification: The view should only be allowed if all the fields (not just object fields) are present at the right position in the view. This way b.view('O,p,O') is disallowed since the p is missing in a. (And if hasobject=False for both inputs, the view is allowed).

I think that change to this PR still allows most of the use-cases you've pointed out.
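
One way to read the rule proposed above, as a sketch only and not the PR's _view_is_safe. The names leaf_layout and view_allowed are hypothetical. A view is accepted when neither dtype has objects, or when both dtypes place the same primitive members at the same byte offsets.

```python
import numpy as np

def leaf_layout(dtype, start=0):
    """Hypothetical helper: (offset, type string) for every primitive member."""
    dtype = np.dtype(dtype)
    if dtype.subdtype is not None:
        base, shape = dtype.subdtype
        out = []
        for i in range(int(np.prod(shape))):
            out.extend(leaf_layout(base, start + i * base.itemsize))
        return out
    if dtype.fields is not None:
        out = []
        for name in dtype.names:
            field_dtype, offset = dtype.fields[name][:2]
            out.extend(leaf_layout(field_dtype, start + offset))
        return sorted(out)
    return [(start, dtype.str)]

def view_allowed(oldtype, newtype):
    """Sketch of the proposed rule; field names are ignored."""
    oldtype, newtype = np.dtype(oldtype), np.dtype(newtype)
    if not oldtype.hasobject and not newtype.hasobject:
        return True
    return (oldtype.itemsize == newtype.itemsize
            and leaf_layout(oldtype) == leaf_layout(newtype))

ptr = np.dtype(object).itemsize
# Hand-built stand-in for the dtype of b in the earlier example.
b_dtype = np.dtype({'names': ['f0', 'f2'], 'formats': ['O', 'O'],
                    'offsets': [0, 2 * ptr], 'itemsize': 3 * ptr})

print(view_allowed(b_dtype, 'O,p,O'))                 # p is missing in b
print(view_allowed('O,O,O', [('a', object, 3)]))      # same object layout
```

Under this reading, the 'O,p,O' view of b is rejected because the p member has no counterpart at that offset, while regrouping three adjacent object fields into one subarray field is still accepted.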

charris · 2018-09-25T15:42:33Z

@eric-wieser Needs rebase. What do you want to do with this?

eric-wieser · 2019-05-26T18:47:55Z

numpy/lib/type_check.py

+            return (base_offsets[:,None] + sub_offsets).ravel()
+
+    # record type - combine the fields
+    if dtype.fields:


Note for when I get back to this: the check needs to be "is not None", and there should be a test for np.dtype([])
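
A small illustration of why a truthiness test can miss a record dtype, assuming np.dtype([]) is accepted by the NumPy version in use. A record dtype with zero fields has an empty (falsy) fields mapping, while a non-record dtype has fields set to None.

```python
import numpy as np

empty = np.dtype([])          # record dtype with zero fields
plain = np.dtype(np.float64)  # not a record dtype

# The empty mapping is falsy but is not None, so "if dtype.fields:" would
# skip the record branch for `empty`.
empty_has_fields_attr = empty.fields is not None
empty_is_truthy = bool(empty.fields)
plain_has_fields_attr = plain.fields is not None
print(empty_has_fields_attr, empty_is_truthy, plain_has_fields_attr)
```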

eric-wieser force-pushed the view-compound-object branch from 35f82ef to f0ecabb Compare January 21, 2017 14:42

eric-wieser force-pushed the view-compound-object branch 2 times, most recently from 7bc42b9 to a558e1a Compare January 21, 2017 15:17

ENH: add np.find_dtype_offsets

0e6f9ee

Useful for finding the location of custom dtype objects, or np.object_ members to improve the behaviour of .view

eric-wieser changed the title ~~Make it possible to call .view on object arrays~~ ENH: Make it possible to call .view on object arrays Jan 21, 2017

eric-wieser force-pushed the view-compound-object branch from a558e1a to 93ba54b Compare January 21, 2017 15:28

ENH: Allow object arrays to be .view'd, providing object fields remain objects

abff3ab

This makes unique work on object arrays

eric-wieser force-pushed the view-compound-object branch from 93ba54b to abff3ab Compare January 21, 2017 16:24

MAINT: Remove unnecessary try/catch

99ce4b1

eric-wieser mentioned this pull request Jan 21, 2017

Add axis argument to numpy.unique #7742

Merged

charris added 01 - Enhancement component: numpy._core labels Jan 21, 2017

DOC: add find_type_offsets to list of dtype routines

31793a0

MAINT: Optimization for unwanted subdtypes

eb6d0e5

Such as find_dtype_offsets((float, (1024, 1024, 1024)), object)

eric-wieser force-pushed the view-compound-object branch from 8678575 to eb6d0e5 Compare January 23, 2017 16:28

REV, TST: Partially revert 086d42d

38ecaf3

Adds back tests for object array views. Marks the ones that deliberately fail with knownfailif, for now

eric-wieser mentioned this pull request Jul 4, 2017

recfunctions.append_fields fails on arrays containing objects (Trac #1751) #2346

Closed

eric-wieser mentioned this pull request Apr 5, 2019

BUG: Change dtype comparison in _view_is_safe to be independent of name fields. #13261

Closed

eric-wieser mentioned this pull request May 26, 2019

rfn.rename_fields fails if arrays contains objects #13213

Open

Base automatically changed from master to main March 4, 2021 02:03

anntzer mentioned this pull request Aug 17, 2021

PERF: Rely on C-level str conversions in loadtxt for up to 2x speedup #19687

Closed

charris added the 52 - Inactive Pending author response label Apr 6, 2022

charris closed this Apr 6, 2022
