Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add new __list__ parameter for list-based array types #2549

Merged
merged 19 commits into from
Jun 29, 2023

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Jun 29, 2023

This PR adds a new __list__ nominal type parameter to associate with behaviors. It should only be added to lists. In addition to setting an array-class for ListOffsetArray (et al.) via layout.parameter("__list__"), we can also lookup the list in a purelist sense, allowing list-of-type to have an array class too.

Setting __list__ and __record__ an error for non-intended types, but that should be another PR (there are fixes that need to be made to both validity_error and broadcasting).

@codecov
Copy link

codecov bot commented Jun 29, 2023

Codecov Report

Merging #2549 (dcd86b3) into main (1f6041e) will increase coverage by 0.38%.
The diff coverage is 89.06%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_parameters.py 82.95% <57.14%> (+0.19%) ⬆️
src/awkward/_connect/numpy.py 91.86% <80.00%> (+0.06%) ⬆️
src/awkward/record.py 83.53% <80.00%> (-0.42%) ⬇️
src/awkward/forms/emptyform.py 84.26% <83.33%> (+0.17%) ⬆️
src/awkward/forms/numpyform.py 95.83% <83.33%> (-0.81%) ⬇️
src/awkward/forms/bitmaskedform.py 85.41% <85.71%> (+0.31%) ⬆️
src/awkward/forms/bytemaskedform.py 84.44% <85.71%> (+0.35%) ⬆️
src/awkward/forms/unionform.py 87.50% <85.71%> (+0.26%) ⬆️
src/awkward/forms/unmaskedform.py 84.61% <85.71%> (+0.40%) ⬆️
src/awkward/contents/content.py 76.16% <87.50%> (+0.13%) ⬆️
... and 9 more

... and 12 files with indirect coverage changes

@agoose77 agoose77 temporarily deployed to docs-preview June 29, 2023 11:08 — with GitHub Actions Inactive
@agoose77 agoose77 marked this pull request as ready for review June 29, 2023 11:51
@agoose77 agoose77 temporarily deployed to docs-preview June 29, 2023 12:02 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview June 29, 2023 12:21 — with GitHub Actions Inactive
@agoose77 agoose77 requested a review from jpivarski June 29, 2023 13:49
@agoose77 agoose77 temporarily deployed to docs-preview June 29, 2023 13:57 — with GitHub Actions Inactive
@agoose77
Copy link
Collaborator Author

The addition of the new __list__ nominal type parameter provides the ability to set __typestr__ for a list type. However, unlike records which can sensibly be considered atoms, lists are usually part of a deeper structure that we care about seeing the type of. Having
N * my-list instead of N * my-list * int64 is worse. I wonder if we should think about this before merging this PR.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I like the new documentation examples, too.

Since this PR is more complicated and you might think of something you need to add at the last minute, I'll let you merge it. I think it's done.

docs/reference/ak.behavior.md Outdated Show resolved Hide resolved
@@ -4,6 +4,8 @@

from awkward._typing import JSONMapping, JSONSerializable

TYPE_PARAMETERS = ("__array__", "__list__", "__record__", "__categorical__")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

@agoose77 agoose77 temporarily deployed to docs-preview June 29, 2023 22:12 — with GitHub Actions Inactive
@jpivarski
Copy link
Member

The addition of the new __list__ nominal type parameter provides the ability to set __typestr__ for a list type. However, unlike records which can sensibly be considered atoms, lists are usually part of a deeper structure that we care about seeing the type of. Having N * my-list instead of N * my-list * int64 is worse. I wonder if we should think about this before merging this PR.

Sorry that I missed this question!

A __typestr__ for __list__ would be useful because a common reason to override a list would be to define an opaque binary blob. The developer of that library would want the type of the hidden data to show up as "my-list", without any hint about the contents. So, for instance,

1000 * var * PNG-image

for an array of lists of overloaded bytestrings that are interpreted as PNG images.

The __typestr__ was introduced to make it possible to label strings as string and adhere to Datashape, as part of the goal of making string entirely user-definable. I think that was already gone at the time when Awkward 1 was converted to Awkward 2.

Since then, the __typestr__ was useful for overloaded records, in which the library developer wanted to hide the details of the record structure. (I think this was in some version of Coffea.)

The bytestring use-case (I can't think of any others, involving __list__) would also want to hide the internal structure of the list, for the same reason. In fact, that was also true when __typestr__ was set to "string".

So yes, I think it would be a good addition, but it can be applied in the same way that it is for __record__, which is simpler than trying to let the inner contents through (or parameterize/templatize __typestr__ in some way).

@agoose77 agoose77 merged commit e360fc9 into main Jun 29, 2023
@agoose77 agoose77 deleted the agoose77/feat-list-nominal-type branch June 29, 2023 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants