fix: prefer known to unknown lengths in broadcasting #2561

Merged
agoose77 merged 8 commits into main from agoose77/refactor-broadcasting-lengths on Jul 3, 2023

agoose77 (Collaborator) commented on Jul 3, 2023

This PR is a first pass over the broadcasting logic to change direction from "unknown lengths are infectious" to "known lengths are infectious".

As we've discussed here previously, typetracer should only fail for errors that can be detected ahead of time. Validation that depends on unknown data is deferred to runtime, in a second pass using a backend with known data. As such, we can replace assertions of the form `assert unknown_value == x` with assignments `unknown_value = x`, rather than propagating unknown values everywhere.

That is, operations like

```python
require_equal_lengths(contents)
next_length = unknown_length  # old: the result length is always treated as unknown
```

become

```python
require_equal_lengths(contents)
next_length = contents[0].length  # new: assume the check passed and reuse contents[0]'s length
```
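A minimal, self-contained sketch of this behavior, assuming a stand-in `unknown_length` sentinel (not awkward's real object) and a plain list of lengths instead of awkward's `Content` objects. It returns whichever length is known rather than always `contents[0].length`, a small generalization of the snippet above. Only a clash between two *known* lengths fails during the typetracer pass; the concrete-data pass, where every length is known, catches the rest.

```python
from typing import Sequence, Union


class _Unknown:
    """Hypothetical stand-in for a length the typetracer cannot know."""

    def __repr__(self) -> str:
        return "unknown_length"


unknown_length = _Unknown()
Length = Union[int, _Unknown]


def require_equal_lengths(lengths: Sequence[Length]) -> Length:
    """Hypothetical helper: check that all lengths agree and return the shared length.

    Unknown lengths are assumed to match (``assert unknown == x`` becomes
    ``unknown = x``); the concrete-data pass re-runs this check later.
    """
    known = {length for length in lengths if length is not unknown_length}
    if len(known) > 1:
        raise ValueError(f"lengths differ: {sorted(known)}")
    if known:
        return known.pop()  # a known length "infects" the result
    return unknown_length  # nothing is known, so the result stays unknown


# Typetracer pass: one known length is enough to make the result known.
print(require_equal_lengths([unknown_length, 3, unknown_length]))  # 3
print(require_equal_lengths([unknown_length, unknown_length]))  # unknown_length

# Concrete pass: every length is known, so a real mismatch is caught here.
try:
    require_equal_lengths([3, 4])
except ValueError as err:
    print(err)  # lengths differ: [3, 4]
```

The trade-off is that a mismatch involving an unknown length is reported only at compute time, in exchange for keeping downstream lengths concrete during the typetracer pass.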

I think we may already have assumed this; I'm only now getting around to changing the code, after re-orienting my thinking a while back.

Relatedly, slicing typetracer arrays should assume that the slice succeeds and use the concrete length where one is known, for the same reason.
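One possible reading of this point, sketched with a stand-in `unknown_length` sentinel: when the array's length is concrete, the length of a slice can be computed exactly instead of being reported as unknown, and integer-array indexing can assume every index is in bounds, leaving that check to the concrete-data pass. The function `sliced_length` is hypothetical and is not awkward's typetracer API.

```python
class _Unknown:
    """Hypothetical stand-in for a length the typetracer cannot know."""

    def __repr__(self) -> str:
        return "unknown_length"


unknown_length = _Unknown()


def sliced_length(length, where):
    """Hypothetical: length of `array[where]` for an array of (possibly unknown) `length`.

    `where` is either a Python `slice` or a sequence of integer indices.
    """
    if isinstance(where, slice):
        if length is unknown_length:
            return unknown_length  # the slice bounds depend on the unknown length
        # Concrete length: let Python normalize start/stop/step for us.
        return len(range(*where.indices(length)))
    # Integer-array indexing: assume every index is in bounds (checked on
    # concrete data later), so the result has one entry per index.
    return len(where)


print(sliced_length(10, slice(2, 8)))  # 6
print(sliced_length(10, slice(None, None, -3)))  # 4
print(sliced_length(unknown_length, [0, 5, 7]))  # 3
```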

Resolved (now outdated) review thread on src/awkward/_broadcasting.py
agoose77 requested a review from jpivarski on July 3, 2023, 15:17
agoose77 temporarily deployed to docs-preview with GitHub Actions on July 3, 2023, 15:22 (deployment now inactive)
jpivarski (Member) left a comment:

Yes, "known lengths are infectious" is the right direction: if we try to broadcast length N with an unknown length, the broadcasted result has length N, because the unknown length might be equal to 1. If it isn't, we'll find out when a Dask worker actually performs the operation with no unknown lengths.
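The rule in this comment can be sketched for lengths, again with a stand-in `unknown_length` sentinel and hypothetical function names. The sketch assumes NumPy-style rules, where a length of 1 stretches to match its partner. Under those rules a known N other than 1 decides the result, while a known 1 paired with an unknown length does not, so the sketch keeps that one case unknown.

```python
from functools import reduce


class _Unknown:
    """Hypothetical stand-in for a length the typetracer cannot know."""

    def __repr__(self) -> str:
        return "unknown_length"


unknown_length = _Unknown()


def broadcast_pair(a, b):
    """Broadcast two possibly-unknown lengths under NumPy-style rules."""
    if a is unknown_length and b is unknown_length:
        return unknown_length
    if a is unknown_length or b is unknown_length:
        known = b if a is unknown_length else a
        # A known N != 1 wins: the unknown must turn out to be 1 or N,
        # otherwise the concrete-data pass raises.
        # A known 1 decides nothing, since it stretches to match its partner.
        return unknown_length if known == 1 else known
    if a == b or b == 1:
        return a
    if a == 1:
        return b
    raise ValueError(f"cannot broadcast lengths {a} and {b}")


def broadcast_lengths(*lengths):
    """Fold `broadcast_pair` over any number of lengths."""
    return reduce(broadcast_pair, lengths)


print(broadcast_lengths(5, unknown_length))  # 5
print(broadcast_lengths(1, unknown_length))  # unknown_length
print(broadcast_lengths(unknown_length, 1, 4))  # 4
```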

agoose77 merged commit eaf60bf into main on Jul 3, 2023
36 checks passed
agoose77 deleted the agoose77/refactor-broadcasting-lengths branch on July 3, 2023, 17:42
2 participants