Initializing arrays with subarray dtype #17511
Never mind. This behaviour is more ingrained than I thought, so it will be hard to change, I think. Let me just close this.
Well, one issue is that In that case, I think we would have to tag on a |
Actually, let me re-open, as I got only more and more confused: Why is it OK that
I guess if the semantics are really that the shape of a single subarray gets appended to the shape of the data, then maybe it is best to just try to be more consistent in that?
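To make the "appended shape" semantics concrete, here is a minimal sketch using `np.zeros` rather than `np.array`, since allocation without input data is the uncontroversial case. The names `sub` and `arr` are illustrative only and do not come from the thread.

```python
import numpy as np

# A subarray dtype: each element is itself a length-2 float32 array.
sub = np.dtype(("f4", (2,)))

# The dtype records its base type and its per-element shape.
base, sub_shape = sub.subdtype

# Allocating 3 such elements appends the subarray shape to the array
# shape, and the resulting array carries the base dtype, not `sub`.
arr = np.zeros(3, dtype=sub)
print(arr.shape, arr.dtype)
```

The point of the sketch is that the subarray dtype itself does not survive on the result; only its shape does, as extra trailing dimensions.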
Yes, the semantics of appending the dimension are well defined. The problem is the semantics of assigning the data to the array. If we assign after "appending" the dtype shape into the array shape, you basically have incorrect broadcasting. This currently works fine if you include tuples in a nested list (the new code has more difficulties with it). But when there is an array-like (or nested array-likes), the problem is that:
becomes:
Which means that For arbitrary input, the shape can be discovered with The new code currently has slight issues with that. The old code basically just worked in that case, as long as the tuples had the right "depth" (number of dimensions), since Now for the desired behaviour: Arguably, the correct behaviour is this:
Which I can do for the tuple case, but it behaves differently from the (arguably broken) array case. I am happy to try that, we just have to make a choice to:
In either case, we internally need the (Hopefully this made sense.)
From the purely selfish astropy perspective, it would be nice if More generally, it would seem better to move further towards the correct behaviour you describe. I think warning on any array-like input might make sense - it would seem better to restrict to those rather than include a warning for cases like the example in #17173, which seems unambiguous. But I obviously haven't really thought this through... Separately, it would be nice if there were an option to get the dtype one actually asked for, even if it were a subarray (though I guess one can always fake it with things like
OK, still good to know that this affects projects like astropy, have to put this on my pre 1.20 todo list... Giving a warning for array-likes may be slightly annoying, but we could gamble that it is fine to give it pretty indiscriminately and a bit imprecisely (i.e. no matter if the result is currently an error and will start working, or will change behaviour). The main issue is that I do not see how to opt in to the new behaviour. I agree the list of tuples case is clear, and (hopefully) well defined. The way it works in 1.19 is not the way I describe above, but that probably ends up doing the same thing (or so I hope). I don't really feel like going into the depth of actually allowing such dtypes to be attached. You can get far with a structure with a single element, but specifically for coercion from lists of tuples that has different behaviour (it expects one extra nesting level in the tuple).
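As a rough illustration of the "structure with a single element" idea, the sketch below wraps the subarray in a one-field structured dtype. The field name `"f0"` and the variable names are assumptions made for the example, and it deliberately sticks to `np.zeros` and `np.empty`, because coercion from lists of tuples is exactly where the behaviour differs.

```python
import numpy as np

# One-field structured dtype whose single field is a (2,) float32 subarray.
# The field name "f0" is an arbitrary choice for this sketch.
wrapped = np.dtype([("f0", "f4", (2,))])

# The array keeps the structured dtype; the subarray shape is NOT
# appended to the array shape.
arr = np.zeros(3, dtype=wrapped)

# Indexing the field exposes the subarray dimension again.
inner = arr["f0"]

# The empty case also keeps the requested dtype.
empty = np.empty(0, dtype=wrapped)
print(arr.shape, inner.shape, empty.shape)
```

With this wrapping the requested dtype is preserved on the array, at the cost of one extra field access to reach the numeric data.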
A warning just for sub-array dtype and just array-likes or non-simple tuples would at least be less intrusive. Agreed that it is hard to see an opt-in path forward apart from a special dtype. And perhaps it also matters less than I thought: I realize that I was wrong about it being so hard to get the right
This currently appends the subarray dtype dimensions first and then tries to assign to the result array which uses incorrect broadcasting (broadcasting against the subarray dimensions instead of repeating each element according to the subarray dimensions). This also fixes the python scalar pathway `np.array(2, dtype="(2)f4,")` which previously only filled the first value. I consider that a clear bug fix. Closes numpygh-17511
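The difference between the two assignment strategies can be sketched with plain float arrays, without relying on how any particular NumPy version coerces subarray dtypes. This is only an emulation under that assumption: `sub_shape`, `broadcast_against` and `repeated` are hypothetical names, and the real code path lives inside NumPy's array coercion.

```python
import numpy as np

data = np.arange(2, dtype="f4")        # input data, shape (2,)
sub_shape = (2,)                       # shape of one subarray element
out_shape = data.shape + sub_shape     # result shape (2, 2)

# Strategy 1: assign the data directly to the enlarged result, so the
# data is broadcast against the trailing subarray dimension.
broadcast_against = np.empty(out_shape, dtype="f4")
broadcast_against[...] = data          # rows: [0, 1], [0, 1]

# Strategy 2: give each input element its own trailing axis, so every
# element is repeated across its subarray.
repeated = np.empty(out_shape, dtype="f4")
repeated[...] = data[:, np.newaxis]    # rows: [0, 0], [1, 1]

print(broadcast_against)
print(repeated)
```

Strategy 1 only runs here because the data length happens to equal the subarray length; with three input elements the same assignment raises a `ValueError`, while strategy 2 works for any length.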
#17419, which deprecated the use of structured arrays composed of just a subarray, is causing a few problems in astropy (see astropy/astropy#10815). While these can be fixed with the suggestion made by @seberg, one of the fixes is not very nice: In our case, it turns out all problems are with `np.array([], dtype=subarray)`, which means we have to special-case an empty list.
But I wonder if it doesn't make sense to try to move towards the case where always returns an array with the requested dtype, and for `data=[]` it would be equivalent to `np.empty(0, dtype)`. Certainly, it is strange that the first of the following works but the second does not:
Of course, I realize it is not easy to move from arguably mistaken behaviour to a more logical one, but looking at our problem cases at least, I realize that in those we basically assumed the more logical one (i.e., they were bugs that somehow failed to get exposed)...
Concretely, I wonder if one could warn and move to more logical behaviour in one go (or give the option to use the more logical behaviour or so).