-
-
Notifications
You must be signed in to change notification settings - Fork 9
Add support for pickling ASCIIDtype and StringDType instances #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if you can't slash this all down and just rely on stable API for the class, otherwise should be fine (didn't scan for refcount leaks)
|
||
PyTuple_SET_ITEM(state, 0, PyLong_FromLong(PICKLE_VERSION)); | ||
|
||
PyTuple_SET_ITEM(ret, 2, state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think:
Py_BuildValue("O(l)", Py_TYPE(self) self->size,self->size)
is good enough? presumably we can consider ASCIIDType(length)
to be stable API? And you made the type pickable using the copyreg, so no need for the reconstruct function? Plus, if we have a reconstruction function, then we certainly can assume stable API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you didn't do the copyreg for the ASCIIDtype, I think. But maybe that is nicer? For the StringDType, this seems true even more so? We can assume StringDType()
will always give the same thing, so no need for __setstate__
logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not using copyreg
at all in either dtype. It is a good point though that we don't actually need to save the dtype itself at all though, thanks for the hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misread... somehow I thought one of the reconstructs was to make the class work. So I guess you need the reconstruct helper until we fix the class pickling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly right. I originally had it as you suggested, but that didn't work because ASCIIDType
is an instance of DTypeMeta
, so python checks the type, sees it's in the copyreg
, and dutifully calls the pickler numpy registers for the type.
} | ||
|
||
Py_RETURN_NONE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not convinced we need it, and I would try to avoid it :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying we don't need to save a pickle version? I thought that was generally useful for forward compatibility, just in case we ever want to change what gets pickled in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am saying there is no point in __setstate__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then how do I check the pickle version when I load the pickle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if you want a version, then it is not possible. But, you don't need a version if you assume your API is stable, and if you have a loader function, you can keep the old one around also?
Anyway, not a strong opinion, the ndarray.__setstate__
in NumPy is just so messy, that it probably makes me want to not use it if possible :).
Oops, thought the nightly wheels had gotten updated but it looks like I'll still need to wait a bit before I can merge this. |
pickling arrays is currently broken and needs support in numpy to work
Since StringDType doesn't take any parameters there's no need to save the dtype along with the scalar instance. We can always create a new StringDType instance on-the-fly.
79948e2
to
73c8f64
Compare
Personally would still avoid setstate (since I feel |
I agree they probably won’t change, I’d just prefer to have the flexibility just in case. |
This adds
__setstate__
and__reduce__
implementations forStringDType
andASCIIDType
.Currently picklingStringDType
arrays is broken. It will work within a single python session so long as the original array still exists, but won't work between python sessions. To get it to work we'll need a hook in the DType API to serialize the string data.It turns out if we add the
NPY_LIST_PICKLE
flag to new instances everything "just works" along with the changes toStringScalar
. I could have added e.g.__setstate__
and__reduce__
implementations toStringScalar
and keptdtype
as an attribute, but decided it's better just to simplifyStringScalar
's API and not store thedtype
at all. There's no need to do so sinceStringDType
isn't parametric.