Support for units #2468

Open
HealthyPear opened this issue May 23, 2023 · 3 comments
Labels
feature New feature or request

Comments

@HealthyPear

Description of new feature

(Apologies if this is already supported in some way; I didn't find it in the docs or in the links suggested when opening the issue.)

I am working with a data format in which each event is a row in a table and it's a ragged array.

For context, this is how each event is written to file:

  • some columns relate to the general properties of the event
  • two of these columns, say "A" and "B", are integers
  • a second set of columns pertaining to properties related to A
  • a third set of columns pertaining to properties related to B

As you can guess:

  • each column in the A set holds, for every event, an array whose length is that event's value of A, and the same goes for B
  • each event is independent, so a column such as "A_bla" automatically becomes a ragged array
  • most columns describe physical quantities which have units stored in the header of the file

I am currently opening this file as a dictionary of numpy arrays and building 3 astropy QTables: "event_table", "A_table", "B_table". Of course, from there one can play around with masking, filtering, etc., so my (immediate) use case is more or less covered.

That said, I cannot stop thinking that one of these files is basically an "awkward table"!

At the same time:

Soooo....what about adding units, maybe compatible with astropy quantities or with a similar object? 😄

@HealthyPear HealthyPear added the feature New feature or request label May 23, 2023
@jpivarski
Member

This sounds like a good use-case for #1391. That would propagate units, though it would not convert them.

Or maybe this could be done with custom behaviors? https://awkward-array.org/doc/main/reference/ak.behavior.html That would even make it possible to convert appropriately. (The NumpyArray nodes would have two parameters, __array__: "units" and __units__: "light year", and there would be a custom ak.Array subclass with some mathematical operations overloaded, such as np.add, and maybe only addition.)

I can try to write a prototype. This may be complex enough that we should build it in rather than making users implement it, so the new feature would be a new set of behaviors in src/awkward/behaviors. These would be the only __array__ behaviors that are not string-like, and @agoose77 and I were trying to think whether there would ever be anything like that.

@agoose77
Collaborator

@gpiert thanks for opening #2788 regarding this!

It's planned that we add support for units through pint. We're working our way through features to get there :)

@jpivarski
Member

Cross posted from #2788 (comment):

That was probably in private conversations, then: we're thinking of using a Pint UnitRegistry as a source of truth about units and their relationships, but some of the handling would have to be manual. (For example, we have to implement reducers ourselves. If an array has units, ak.sum would preserve those units but ak.prod shouldn't even be possible. ak.any and ak.all would drop the units when converting numbers into booleans...)
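
The reducer rules in that parenthesis can be written down as a small pure-Python sketch. Everything here is hypothetical (the `Tagged` class and the function names are not Awkward or Pint API); it only encodes the three rules stated above: sum keeps the units, prod is rejected, and any/all return plain booleans.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tagged:
    """Hypothetical stand-in for an array with units attached."""
    values: List[float]
    units: str


def sum_with_units(arr: Tagged) -> Tuple[float, str]:
    # Adding quantities of the same units gives a result in those units.
    return (sum(arr.values), arr.units)


def prod_with_units(arr: Tagged) -> float:
    # The resulting units would depend on the number of elements,
    # so this sketch refuses the operation outright.
    raise TypeError(f"cannot take a product of values with units {arr.units!r}")


def any_without_units(arr: Tagged) -> bool:
    # Converting numbers to booleans discards the units.
    return any(v != 0 for v in arr.values)


def all_without_units(arr: Tagged) -> bool:
    return all(v != 0 for v in arr.values)


lengths = Tagged([1.0, 2.0, 0.0], "m")
print(sum_with_units(lengths))
print(any_without_units(lengths), all_without_units(lengths))
```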

Thus, we're recognizing Pint as the standard way to express units, to the exclusion of any other libraries that might do the same thing, and we'll try to reuse code in Pint as much as possible (e.g. in unit conflicts, which of the two should be converted to the other, and what do we multiply by to get that conversion?), but there will be limits and some things will need to be computed by hand in Awkward.

@jpivarski jpivarski added this to Unprioritized in Finalization Jan 19, 2024
@jpivarski jpivarski moved this from Unprioritized to P2 in Finalization Jan 20, 2024
@jpivarski jpivarski moved this from P2 to Set aside in Finalization May 7, 2024
Projects
Finalization
Set aside (don't do)