Skip to content

Conversation

ngoldbaum
Copy link
Member

@ngoldbaum ngoldbaum commented Feb 1, 2023

This adds __setstate__ and __reduce__ implementations for StringDType and ASCIIDType.

Currently pickling StringDType arrays is broken. It will work within a single python session so long as the original array still exists, but won't work between python sessions. To get it to work we'll need a hook in the DType API to serialize the string data.

It turns out if we add the NPY_LIST_PICKLE flag to new instances everything "just works" along with the changes to StringScalar. I could have added e.g. __setstate__ and __reduce__ implementations to StringScalar and kept dtype as an attribute, but decided it's better just to simplify StringScalar's API and not store the dtype at all. There's no need to do so since StringDType isn't parametric.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if you can't slash this all down and just rely on stable API for the class, otherwise should be fine (didn't scan for refcount leaks)


PyTuple_SET_ITEM(state, 0, PyLong_FromLong(PICKLE_VERSION));

PyTuple_SET_ITEM(ret, 2, state);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think:

Py_BuildValue("O(l)", Py_TYPE(self) self->size,self->size)

is good enough? presumably we can consider ASCIIDType(length) to be stable API? And you made the type pickable using the copyreg, so no need for the reconstruct function? Plus, if we have a reconstruction function, then we certainly can assume stable API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you didn't do the copyreg for the ASCIIDtype, I think. But maybe that is nicer? For the StringDType, this seems true even more so? We can assume StringDType() will always give the same thing, so no need for __setstate__ logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not using copyreg at all in either dtype. It is a good point though that we don't actually need to save the dtype itself at all though, thanks for the hint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misread... somehow I thought one of the reconstructs was to make the class work. So I guess you need the reconstruct helper until we fix the class pickling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly right. I originally had it as you suggested, but that didn't work because ASCIIDType is an instance of DTypeMeta, so python checks the type, sees it's in the copyreg, and dutifully calls the pickler numpy registers for the type.

}

Py_RETURN_NONE;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced we need it, and I would try to avoid it :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying we don't need to save a pickle version? I thought that was generally useful for forward compatibility, just in case we ever want to change what gets pickled in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am saying there is no point in __setstate__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then how do I check the pickle version when I load the pickle?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if you want a version, then it is not possible. But, you don't need a version if you assume your API is stable, and if you have a loader function, you can keep the old one around also?
Anyway, not a strong opinion, the ndarray.__setstate__ in NumPy is just so messy, that it probably makes me want to not use it if possible :).

@ngoldbaum
Copy link
Member Author

Oops, thought the nightly wheels had gotten updated but it looks like I'll still need to wait a bit before I can merge this.

pickling arrays is currently broken and needs support in numpy
to work
Since StringDType doesn't take any parameters there's no
need to save the dtype along with the scalar instance. We
can always create a new StringDType instance on-the-fly.
@seberg
Copy link
Member

seberg commented Feb 6, 2023

Personally would still avoid setstate (since I feel StringDType() and ASCIDType(3) can both be considered stable API), but fair.
I just manually retrigger the wheel build, it runs only twice a week by default.

@ngoldbaum
Copy link
Member Author

I agree they probably won’t change, I’d just prefer to have the flexibility just in case.

@ngoldbaum ngoldbaum merged commit 254696b into numpy:main Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants