Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix broken pickle in MaskedArray when dtype is object (Return list instead of string when dtype is object) #8122

Merged
merged 5 commits into from
Oct 6, 2016

Conversation

raghavrv
Copy link
Contributor

@raghavrv raghavrv commented Oct 5, 2016

Fixes #1940 and #8113

Is this fix correct? Should this be extended to mrecords too?

@pierregm @teoliphant

@amueller
Copy link

amueller commented Oct 5, 2016

I'm pretty sure the days when @teoliphant had time to review code are over ;)

@raghavrv
Copy link
Contributor Author

raghavrv commented Oct 5, 2016

I randomly pinged him as I saw him in the Authors list. :p

@raghavrv
Copy link
Contributor Author

raghavrv commented Oct 5, 2016

Any one of @juliantaylor @rgommers or @seberg can take a look at this maybe? :)

@shoyer
Copy link
Member

shoyer commented Oct 5, 2016

Is it possible to implement this with super(MaskedArray, self).__getstate__ instead? That's probably a more robust solution.

@raghavrv
Copy link
Contributor Author

raghavrv commented Oct 5, 2016

I thought about that but apparently the ndarray does not implement a __getstate__ method... The other option would be to use the __reduce__ and extract the state from that. But it feels hackier. Suggestions?

assert_equal(a_pickled._mask, a._mask)
assert_equal(a_pickled._data, a._data)
if dtype is not object:
assert_equal(a_pickled.fill_value, dtype(999))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason not to test this for object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch... I missed an else here :( Fixed it in the recent commit...

@jnothman
Copy link
Member

jnothman commented Oct 5, 2016

@raghavrv I wouldn't usually ping anyone. Most projects have someone - or a rabble - to do triage on new issues/PRs.

@jnothman
Copy link
Member

jnothman commented Oct 5, 2016

@shoyer wrote:

Is it possible to implement this with super(MaskedArray, self).__getstate__ instead? That's probably a more robust solution.

Would that maintain backwards compatibility like the proposed patch? At least this patch appears to be a good fix, even if a more robust solution is possible at a later point.

@shoyer
Copy link
Member

shoyer commented Oct 6, 2016

The first five elements of MaskedArray.__getstate__ are exactly the state produced by nparray.__reduce__:

In [49]: x = np.ma.MaskedArray([1, 2, 3, 4], [False, True, False, False])

In [50]: x.__getstate__()
Out[50]:
(1,
 (4,),
 dtype('int64'),
 False,
 b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00',
 b'\x00\x01\x00\x00',
 None)

In [51]: _, _, state = super(np.ma.MaskedArray, x).__reduce__()

In [52]: state
Out[52]:
(1,
 (4,),
 dtype('int64'),
 False,
 b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00')

So I think that's what we want to use. It is a little weird to pull it out of __reduce__, but it's actually less code for MaskedArray and makes sense considering the implementation of __setstate__ (which already calls the super method).

@seberg
Copy link
Member

seberg commented Oct 6, 2016

Just to note (not sure if it has much to do), but currently it seems masked arrays do not preserve subclasses during pickling. If you use the data's reduce, I am not sure if there might be an effect.

# self._data.tolist(),
getmaskarray(self).tobytes(cf),
data_state = super(MaskedArray, self).__reduce__()[2]
print(data_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry!!

getmaskarray(self).tobytes(cf),
data_state = super(MaskedArray, self).__reduce__()[2]
print(data_state)
state = (data_state[0], # version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just do data_state + (getmaskarray(self).tobytes(cf), self._fill_value)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@raghavrv
Copy link
Contributor Author

raghavrv commented Oct 6, 2016

@shoyer @jnothman Have addressed the comments. Let me know if anything else needs to be changed...

self._fill_value,
)
return state
data_state = super(MaskedArray, self).__reduce__()[2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer, does this really make more sense given that we're making assumptions about super.__reduce__'s return value, and discarding all elements but one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty comfortable with this, given that (1) the spec of __reduce__ is mostly prescribed by Python and this MaskedArray is defined in NumPy itself, which means that if any ever changes how ndarray.__reduce__ works our unit tests will catch the issue for MaskedArray.

@charris charris changed the title [MRG] FIX broken pickle in MaskedArray when dtype is object (Return list instead of string when dtype is object) BUG: Fix broken pickle in MaskedArray when dtype is object (Return list instead of string when dtype is object) Oct 6, 2016
@shoyer
Copy link
Member

shoyer commented Oct 6, 2016

This looks good to me. @charris any concerns? If not, let's merge (and mark this one for a potential backport, if we do another bug fix release).

@charris
Copy link
Member

charris commented Oct 6, 2016

@shoyer I'll leave it up to you. The change seems small and nothing breaks, so probably good. I don't plan on any more bug fix releases, I'd actually like to get the first rc of 1.12.0 out this month.

@charris
Copy link
Member

charris commented Oct 6, 2016

Note that super only works for new style classes, which may, or may not, be an issue for Python 2.7.

@raghavrv
Copy link
Contributor Author

raghavrv commented Oct 6, 2016

Do you want it to be ndarray.__reduce__(self) instead?

@charris
Copy link
Member

charris commented Oct 6, 2016

@raghavrv Not to worry, looks like MaskedArray is a new style class

In [4]: str(ma.MaskedArray)
Out[4]: "<class 'numpy.ma.core.MaskedArray'>"

Note that the string begins with <class.

@raghavrv
Copy link
Contributor Author

raghavrv commented Oct 6, 2016

Ah. Sweet then! Thanks for the information!

@shoyer shoyer merged commit 8888a76 into numpy:master Oct 6, 2016
@shoyer
Copy link
Member

shoyer commented Oct 6, 2016

OK, in it goes. Thanks @raghavrv!

@raghavrv raghavrv deleted the masked_array_pickle branch October 6, 2016 21:36
@jnothman
Copy link
Member

jnothman commented Oct 6, 2016

Good work, @raghavrv.

On 7 October 2016 at 08:31, Stephan Hoyer notifications@github.com wrote:

OK, in it goes. Thanks @raghavrv https://github.com/raghavrv!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#8122 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAEz6zGgpJ5uJo4uCS0e66DhVQBUPPgfks5qxWitgaJpZM4KPASm
.

@raghavrv
Copy link
Contributor Author

raghavrv commented Oct 6, 2016

Thanks for the reviews @jnothman @shoyer and @charris :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pickling fails on some masked arrays (Trac #1342)
6 participants