Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify & unify ak.type() #2096

Closed
agoose77 opened this issue Jan 9, 2023 · 4 comments · Fixed by #2116
Closed

Simplify & unify ak.type() #2096

agoose77 opened this issue Jan 9, 2023 · 4 comments · Fixed by #2116
Assignees
Labels
bug The problem described is something that must be fixed cleanup Not broken, but could be more streamlined

Comments

@agoose77
Copy link
Collaborator

agoose77 commented Jan 9, 2023

As discussed in #2070, ak.type would benefit from accepting a highlevel and behavior argument to control whether a composable ak.types.Type is returned in favour of the outer ak.types.ArrayType.

I also think we should simplify ak.type to deal exclusively with array-like objects. #1840 added support for some more scalars, but now that I think about it, we don't have any way of expressing these at the Awkward level; you can't have a bare scalar in an Awkward type, only a length-1 array. I think ak.type should just deal with arrays / array-like objects, and conventional isinstance() etc can be used for other types.

We can make changes to ak.type non-breaking using ak._util.deprecate and requiring users to explicitly pass highlevel=True/False. Or, we can just declare it in the wiki, and write it in the changelog so that users don't need to write ak.type(..., highlevel=True).

@agoose77 agoose77 added bug The problem described is something that must be fixed cleanup Not broken, but could be more streamlined labels Jan 9, 2023
@agoose77 agoose77 assigned jpivarski and agoose77 and unassigned jpivarski Jan 9, 2023
@agoose77
Copy link
Collaborator Author

agoose77 commented Jan 9, 2023

@jpivarski I started working on this, but realised we should discuss.

@jpivarski
Copy link
Member

Okay.

@jpivarski
Copy link
Member

As a result of this conversation, we settled on:

  • need to add a ScalarType, which is in the same category as ArrayType in that it's a "wine cork" to close off the nested Type structure, but it doesn't have a length
  • in ak.type, highlevel=True implies that the output is either wrapped with ArrayType (because it's a collection, something with a length) or it is wrapped with ScalarType (because it's a single record or a number)
  • the type property of ak.Array wraps with ArrayType, the type property of ak.Record wraps with ScalarType, and the low-level types (type property of Form) don't have wrappers
  • if an array has type string/bytes, which is ListType/RegularType with __array__: "string"/"bytestring" containing a NumpyType with __array__: "char"/"byte", then array[0] is not a scalar: it's an array of char/byte, which is to say NumpyType with __array__: "char"/"byte". The reason we've been carrying around two parameters to represent strings (one on the ListType/RegularType and the other on the NumpyType) was to be able to keep the stringiness of it when it gets unpacked.

In Awkward, strings are not fundamentally scalars. They have some scalar-like behaviors that we add in by hand.

@agoose77
Copy link
Collaborator Author

To add ­— what we're demonstrating with ak.type("string") is that the "string"-ness is a property of the list, not an independent atom. A bare string is just a collection of char. In Python, that corresponds to str more naturally than ["s", "t", "r", "i", "n", "g"].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed cleanup Not broken, but could be more streamlined
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants