Create a cast method for Arrays? #291

Open
scott-griffiths opened this issue Sep 2, 2023 · 4 comments

Comments

@scott-griffiths
Owner

Changing the dtype of an Array just changes the interpretation of the underlying data. This is fine, and is an O(1) operation which fits with changing a property, but some users might want or expect it to recast the data to the new type.
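To make the distinction concrete, here is a minimal sketch using only the standard library `struct` module rather than bitstring itself. It assumes big-endian packing purely for illustration. Reinterpreting keeps the same 8 bytes and reads them differently, while casting converts each value and produces new data.

```python
import struct

values = [1, 2, 3, 4, 5, 6, 7, 8]
raw = struct.pack('8B', *values)  # eight unsigned 8-bit values, 8 bytes in total

# Reinterpretation: the same 8 bytes read as a single big-endian float64.
(reinterpreted,) = struct.unpack('>d', raw)

# Casting: every value is converted, giving eight float64s (64 bytes).
cast_raw = struct.pack('>8d', *(float(v) for v in values))
cast_values = list(struct.unpack('>8d', cast_raw))

print(len(raw), len(cast_raw))
print(reinterpreted)
print(cast_values)
```

The reinterpreted result is a single tiny float that has nothing to do with the original integers, which is why a separate casting operation is worth having.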

To cast to a new dtype you need to do this:

a = Array('u8', [1, 2, 3, 4, 5, 6, 7, 8])

b = Array('float64', a.tolist())

which is OK, and explicit, but adding a new method could make it clearer and give more options:

b = a.cast('float64')

I don't think it's good to do it in place - there's no performance gain. We can now also deal with things like overflows better:

c = b.cast('u16', clip=True)

so the user can choose whether to get a ValueError or to clip values or whatever (divide by zero would be another one).

@scott-griffiths
Owner Author

Probably should be called astype to copy numpy.

The numpy method has a casting parameter which can be one of:

‘no’ means the data types should not be cast at all. [Not sure what the point of this option is!]
‘equiv’ means only byte-order changes are allowed. [Reasonable I guess]
‘safe’ means only casts which can preserve values are allowed. [Only widening casts or unsigned to signed?]
‘same_kind’ means only safe casts or casts within a kind, like float64 to float32, are allowed.
‘unsafe’ means any data conversions may be done.

From experimentation, if it doesn't have room to store the full value it simply truncates the binary representation, so for example an int of 2000 becomes a uint8 of 208, which is not exactly obvious or helpful (but admittedly will be fast!)

If you ask for safe casting it just exits with a TypeError.
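The 2000 to 208 result is what you get by keeping only the low 8 bits of the value. A minimal pure-Python sketch of that wraparound follows. The helper name `wrap_to_uint` is made up for illustration and is not a numpy or bitstring function.

```python
def wrap_to_uint(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits of an integer (hypothetical helper)."""
    mask = (1 << bits) - 1
    return value & mask


# 2000 is 0b11111010000; the low 8 bits are 0b11010000.
print(wrap_to_uint(2000, 8))
```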

Maybe our options should be:

clip - values that are too large get clipped to the nearest representable value.
safe - If values can't be preserved a ValueError is raised (but it still tries).

The others are more checks on the dtypes, rather than the data, which the user can easily do themselves. If there are two options that boils down to a flag:

clip: If True out of range values are clipped to the nearest representable value, otherwise a ValueError will be raised. Defaults to False.

Which is back to where we started.
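A rough sketch of how the flag could behave, written as a standalone function over plain Python lists. The name `cast_to_uint` and its signature are assumptions for illustration only and do not describe the bitstring API.

```python
def cast_to_uint(values, bits, clip=False):
    """Convert numbers to unsigned integers of the given width.

    Hypothetical helper: out-of-range values raise ValueError unless
    clip is True, in which case they become the nearest representable value.
    """
    lo, hi = 0, (1 << bits) - 1
    result = []
    for v in values:
        if lo <= v <= hi:
            result.append(int(v))  # in range: drop any fractional part
        elif clip:
            result.append(lo if v < lo else hi)
        else:
            raise ValueError(f"{v!r} does not fit in an unsigned {bits}-bit integer")
    return result


print(cast_to_uint([1.0, 300.0, -5.0], 8, clip=True))
```

With `clip` left at its default of False, the same call raises a ValueError on the 300.0.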

@scott-griffiths
Owner Author

It might be cool to allow the clip to happen as a function call. This would allow it to be used more widely, for example when performing other ops on Arrays. Right now it's hard to add a flag to a y = x*5 command, and y = Array.multiply(x, 5, clip=True) is pretty ugly. Not sure how it actually works in practice though.

a = b*1000   # Throws a ValueError
a = clip(b*1000)    # Magically doesn't and clips instead. Somehow.

Perhaps better would be (b*1000).clip(), but it's not obvious how it can be implemented.

If we could, the astype would be just c = b.astype('u8').clip()

@scott-griffiths
Owner Author

with Array.Clipping:
    a = b*1000

is perhaps more obvious and easier to actually code.
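One way the context manager idea might be coded is sketched below. Everything here is hypothetical: `U8List` is a toy stand-in for an Array of unsigned 8-bit values, and `clipping()` plays the role of `Array.Clipping`. It uses a module-level switch that the arithmetic consults when storing results.

```python
from contextlib import contextmanager

_clipping = False  # module-level switch read whenever a value is stored


@contextmanager
def clipping():
    """Enable clipping inside the with block, then restore the old setting."""
    global _clipping
    previous = _clipping
    _clipping = True
    try:
        yield
    finally:
        _clipping = previous


class U8List:
    """Toy container of unsigned 8-bit values (not the bitstring Array)."""
    MAX = 255

    def __init__(self, values):
        self.values = [self._store(v) for v in values]

    @staticmethod
    def _store(v):
        if 0 <= v <= U8List.MAX:
            return v
        if _clipping:
            return min(max(v, 0), U8List.MAX)  # nearest representable value
        raise ValueError(f"{v} is out of range for an unsigned 8-bit value")

    def __mul__(self, factor):
        return U8List(v * factor for v in self.values)


b = U8List([0, 1, 2])
with clipping():
    a = b * 1000  # clips instead of raising
print(a.values)
```

Outside the with block, `b * 1000` raises a ValueError as before. A global switch like this is not thread-safe, so a real implementation would probably want something like `contextvars` instead.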

@scott-griffiths
Owner Author

astype method added in 4.1.2. No alternative casting methods yet, so leaving this open.
