ENH: Reorganize string promotion, add object_fallback=False
#19101
…i private) This reorganizes promotion during array-coercion to allow flagging for an object fallback mode. Here only "exposed" through: `np.core._multiarray_umath._discover_array_parameters([1, "2"], object_fallback=True)`. Note that as of now, it is a bit stricter than previously. The object fallback actually hides fewer errors, not more! (e.g. say promotion fails for two different structured voids, or datetimes). But that would be trivial to allow for now... We could probably thread this flag through by (ab)using the array-flags that `PyArray_FromAny` allows passing.
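To make the idea of an object fallback concrete, here is a minimal pure-Python sketch. It is a toy model, not NumPy's implementation: the function name `discover_kind` and the kind labels `"number"`, `"str"` and `"object"` are invented for illustration. It assumes the legacy path promotes mixed numbers and strings to a string result, while the fallback path returns object instead.

```python
from numbers import Number


def discover_kind(items, object_fallback=False):
    """Toy model (hypothetical helper): classify a nested list as
    'number', 'str', or 'object'."""
    kinds = set()

    def visit(obj):
        # Recurse into lists; classify every leaf element.
        if isinstance(obj, list):
            for item in obj:
                visit(item)
        elif isinstance(obj, str):
            kinds.add("str")
        elif isinstance(obj, Number):
            kinds.add("number")
        else:
            kinds.add("object")

    visit(items)
    if "object" in kinds:
        return "object"
    if kinds == {"str", "number"}:
        # Legacy promotion turns the numbers into strings;
        # the fallback mode keeps the original Python objects instead.
        return "object" if object_fallback else "str"
    if not kinds:
        # Empty input: treated as numeric in this toy model.
        return "number"
    return kinds.pop()


print(discover_kind([1, "2"]))
print(discover_kind([1, "2"], object_fallback=True))
```

Only the mixed case is affected by the flag; homogeneous input gives the same answer in both modes.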
Looks good! A small comment is that it might make sense to use npy_bool object_fallback
everywhere instead of switching between int and bool - calls with npy_false
immediately make clear that it is a bool being passed around.
p.s. Not sure whether the coverage misses are false positives, but obviously would be good to ensure the added code is covered!
@@ -35,13 +72,19 @@
*
* TODO: Before exposure, we should review the return value (e.g. no error
* when no common DType is found).
* Further, the `object_fallback` is probably only useful for the
If you are fairly sure that in some places this will no longer be needed after the string deprecation, maybe be more explicit about it, otherwise in all likelihood this will stay around for much longer than needed!
* when no common DType is found. This is currently only needed for the
* string and number promotion deprecation!
*
* TODO: After this deprecation is over, the `object_fallback` may well be
Would be best if this could just change from "may well be useless!" to "should be removed"
On this one, Matti suggested in the meeting (if I got it right), that we could make the parameter-discovery function public but not the flag: I.e. the function would always use the object-fallback "future" behaviour. Or we allow this flag in
The alternative, or parallel(?) question is what to do about the deprecation: We could delay the deprecation: I am not too fond of it, since string promotion is bound to create annoying corner cases once we get string ufuncs, but 🤷. Unless we are planning to just skip the deprecation now or in the next release. Maybe as an example of where this oddity can strike:
(Which currently would give a FutureWarning. Oddly, it seems it would actually work in the "future" path)
We have a NEP about ragged arrays, and in one of the issues or PRs around this something like I prefer that
Would
Thanks Matti. I am happy with that proposal. The question is really if that will help pandas enough. And I am still wondering if Pandas can just accept passing this warning on to the end-user in a few places.
Luckily, array coercion ignores value based promotion:
The idea of a separate routine that inspects input and returns a But I'm a bit confused about the relation to both this PR and NEP 34, i.e., does anything influence string promotion, and will
Maybe a bit about the current code layout:
Currently, even if you do
The first discovery function already has a bunch of flags and we could expose those/add more. Adding one to check strictly for lists should be pretty straightforward.
We have an internal flag that says "a tuple is considered an element". We can't really infer the correct structured dtype, so if you want a structured dtype, you need to pass
Currently you must write But
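As a rough illustration of what a "tuple is considered an element" flag means for shape discovery, the following pure-Python sketch walks a nested sequence and reports its shape. It is an assumption-laden toy: `discover_shape` and `tuple_is_element` are hypothetical names, not NumPy's internal API, and ragged input simply raises `ValueError` here.

```python
def discover_shape(obj, tuple_is_element=False):
    """Toy model (hypothetical helper): shape of a nested list/tuple."""

    def is_sequence(x):
        if isinstance(x, list):
            return True
        # With the flag set, a tuple is a leaf element, not a dimension.
        return isinstance(x, tuple) and not tuple_is_element

    if not is_sequence(obj):
        return ()
    if len(obj) == 0:
        return (0,)
    # All children must agree on their shape, otherwise the input is ragged.
    inner = {discover_shape(item, tuple_is_element) for item in obj}
    if len(inner) != 1:
        raise ValueError("ragged nested sequence")
    return (len(obj),) + inner.pop()


print(discover_shape([(1, 2), (3, 4)]))
print(discover_shape([(1, 2), (3, 4)], tuple_is_element=True))
```

With the flag set, each tuple stays a single element, which is what a structured dtype needs; the dtype itself still has to be supplied by the caller, since it cannot be inferred from the tuples alone.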
Thanks! Now understanding the path currently taken more, I like @mattip's suggestion even more of exposing the dtype/shape inference function. And that also means we might as well postpone discussion of any
(This is off topic, about how tuple inference would be expected to work now)
You can already do as much of this as I am comfortable with, probably. But it's not quite public API yet. The current design idea is that you create a new (possibly abstract!)
The user must then pass
That off-topic piece would be very useful!
@seberg Is this still a work in progress?
No, unfortunately not. It relies on the string promotion behaviour, so will collide with that delay/reversal (unless we make that a backport-only). I could expose the function first and not worry about string promotion for now (since it's a new function and string promotion warning/error should follow soon anyway).
@charris ah, sorry, removed the backport candidate tag, if we undo string promotion, there is no real point.
I will build 1.21.0rc2 next Sunday. If we are going to undo anything it would be good to get it in.
Going to close this for now. I am thinking of picking this up as two separate PRs (mainly, I would like to do something about our broken promotion...):
In either case, there may be some good stuff here, but I doubt we will put this in in a form very similar to how it is.
(semi private)
This reorganizes promotion during array-coercion to allow flagging for an object fallback mode. Here only "exposed" through: `np.core._multiarray_umath._discover_array_parameters([1, "2"], object_fallback=True)`
Note that as of now, it is a bit stricter than previously. The object fallback actually hides fewer errors, not more! (e.g. say promotion fails for two different structured voids, or datetimes). But that would be trivial to allow for now...
We could probably thread this flag through by (ab)using the array-flags that `PyArray_FromAny` allows passing.
@jorisvandenbossche I am thinking about something like this to fix the issue. But it's fairly annoying, at least unless we aim to make this a full new keyword argument to `np.array` as well (or similar)...
@numpy/numpy We had discussed a few times to add something like `np.array(..., dtype="allow_object")`. This is practically that, and I can make it work (the code organization isn't pretty, but most of it is due to the FutureWarning).