Create a cast method for Arrays? #291

scott-griffiths · 2023-09-02T09:28:03Z

Changing the dtype of an Array just changes the interpretation of the underlying data. This is fine, and is an O(1) operation, which fits with changing a property, but some users might want or expect it to recast the data to the new type.
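To make the distinction concrete, here is a minimal sketch that uses only the standard library rather than bitstring itself. The variable names are illustrative, and big-endian byte order is an assumption made for the example.

```python
import struct

# Eight bytes holding the u8 values 1..8.
raw = bytes([1, 2, 3, 4, 5, 6, 7, 8])

# Reinterpreting: the same 64 bits are read as one big-endian float64.
# No data is touched, only the way it is read.
(reinterpreted,) = struct.unpack('>d', raw)

# Casting: every u8 value is converted to the float with the same numeric
# value, giving eight floats (and eight times as much data).
cast = [float(v) for v in raw]

print(reinterpreted)  # a single, very small float
print(cast)           # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

The reinterpretation costs nothing but yields one unrelated number, while the cast preserves the values but has to visit every element.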

To cast to a new dtype you need to do this:

from bitstring import Array

a = Array('u8', [1, 2, 3, 4, 5, 6, 7, 8])

b = Array('float64', a.tolist())

which is OK, and explicit, but adding a new method could make it clearer and give more options:

b = a.cast('float64')

I don't think it's good to do it in place - there's no performance gain. We can now also deal with things like overflows better:

c = b.cast('u16', clip=True)

so the user can choose whether to get a ValueError or to clip values or whatever (divide by zero would be another one).

scott-griffiths · 2023-09-02T20:25:40Z

Probably should be called astype to copy numpy.

The numpy method has a casting parameter which can be one of:

‘no’ means the data types should not be cast at all. [Not sure what the point of this option is!]
‘equiv’ means only byte-order changes are allowed. [Reasonable I guess]
‘safe’ means only casts which can preserve values are allowed. [Only widening casts or unsigned to signed?]
‘same_kind’ means only safe casts or casts within a kind, like float64 to float32, are allowed.
‘unsafe’ means any data conversions may be done.

From experimentation, if it doesn't have room to store the full value it simply truncates the binary representation, so for example an int of 2000 becomes a uint8 of 208, which is not exactly obvious or helpful (but admittedly will be fast!)

If you ask for safe casting it just exits with a TypeError.
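The 2000 to 208 result above is what you get from keeping only the low eight bits. A pure-Python sketch of that wraparound (the helper name `wrap_to_uint` is made up for illustration and is not a numpy or bitstring function):

```python
def wrap_to_uint(value, bits):
    """Keep only the low `bits` bits, as an unchecked narrowing cast does."""
    return value & ((1 << bits) - 1)


print(wrap_to_uint(2000, 8))  # 208, because 2000 == 7 * 256 + 208
print(wrap_to_uint(-1, 8))    # 255
```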

Maybe our options should be:

clip - values that are too large get clipped to the nearest representable value.
safe - If values can't be preserved a ValueError is raised (but it still tries).

The others are more checks on the dtypes, rather than the data, which the user can easily do themselves. If there are two options that boils down to a flag:

clip: If True out of range values are clipped to the nearest representable value, otherwise a ValueError will be raised. Defaults to False.

Which is back to where we started.
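A rough sketch of how such a flag might behave, written as a standalone function on plain lists. The name `cast_to_uint` and its signature are hypothetical, and this is not how bitstring implements anything.

```python
def cast_to_uint(values, bits, clip=False):
    """Hypothetical helper: convert numbers to unsigned ints of a given width.

    Out-of-range values raise ValueError unless clip is True, in which case
    they are replaced by the nearest representable value.
    """
    lo, hi = 0, (1 << bits) - 1
    out = []
    for v in values:
        v = int(v)  # assumption: fractional parts are truncated towards zero
        if v < lo or v > hi:
            if not clip:
                raise ValueError(
                    f"{v} is out of range for a {bits}-bit unsigned integer")
            v = lo if v < lo else hi
        out.append(v)
    return out


print(cast_to_uint([1.0, 70000.0, -5.0], 16, clip=True))  # [1, 65535, 0]
```

With the default `clip=False` the same call raises a ValueError on the first out-of-range value.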

scott-griffiths · 2023-09-02T20:43:25Z

It might be cool to allow the clip to happen as a function call. This would allow it to be used more widely, for example when performing other ops on Arrays. Right now it's hard to add a flag to a y = x*5 command, and y = Array.multiply(x, 5, clip=True) is pretty ugly. Not sure how it actually works in practice though.

a = b*1000   # Throws a ValueError
a = clip(b*1000)    # Magically doesn't and clips instead. Somehow.

Perhaps better would be (b*1000).clip(), but it's not obvious how it can be implemented.

If we could, the astype would be just c = b.astype('u8').clip()

scott-griffiths · 2023-09-02T20:46:52Z

with Array.Clipping:
    a = b*1000

is perhaps more obvious and easier to actually code.

scott-griffiths · 2023-09-07T20:23:06Z

astype method added in 4.1.2. No alternative casting methods yet, so leaving this open.

scott-griffiths self-assigned this Sep 5, 2023

scott-griffiths added the enhancement ✨ label Sep 5, 2023

scott-griffiths added the Priority-Low label Apr 27, 2024
