Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add user-defined types #177

Merged
merged 19 commits into from
Apr 10, 2022
Merged

Add user-defined types #177

merged 19 commits into from
Apr 10, 2022

Conversation

eriknw
Copy link
Member

@eriknw eriknw commented Mar 15, 2022

UDTs can be any fixed-length numpy type that doesn't have Python objects.
Basically, UDFs that operate on the UDTs need to be able to be compiled with @numba.njit.

Still lots to do, but many things are already working!

UDTs can be *any* fixed-length numpy type that doesn't have Python objects.
Basically, UDFs that operate on the UDTs need to be able to be compiled with `@numba.njit`.
Also:
- make dtypes hashable
- change operator attributes to use dtypes, not dtype names
- update `==` between UDTs (ignore bytes where appropriate)
Some usage of `lookup_dtype` is not unnecessary now that ops use dtypes instead of dtype names.
Also, using `get_typed_op` is preferred over `unify` where possible.
@eriknw eriknw changed the title WIP: begin adding user-defined types Add user-defined types Apr 8, 2022
@eriknw
Copy link
Member Author

eriknw commented Apr 9, 2022

Whew!

Okay, I think this is pretty much done. Tests and coverage are as solid as I can reasonably make them.
Nevertheless, a PR this large and invasive is likely to have a bug or oversight or three.
I also don't know how it'll impact dask-grblas (CC @ParticularMiner).

Final thoughts and future possibilities:

  • We could allow default operators to work on array dtypes of normal dtypes (such as np.dtype("(3,)uint8"))
    • For example, plus(A | A) could add subarrays together
    • This could also leverage normal NumPy broadcasting rules for the subarrays (i.e., numba's default)
  • (in)equality checking of UDTs could probably be faster
    • array dtypes could probably be specialized
    • should be benchmarked to determine what is best
  • NumPy string dtypes (such as np.dtype("S5")) are not tested (but I played around with them in my preliminary experiments)
  • Should we require dtypes to be registered? Can things "just work" if given a numpy dtype?
    • In this PR, I kept to the standard register_new and register_anonymout pattern we use elsewhere.
  • Ideally, we should document UDTs better in docstrings and online documentation. (We'll get there someday, but I'm wiped!)

This PR opens up a whole universe of possibilities. Record dtypes and array dtypes support different use cases, and both can be super-important.

Anyway, I think this PR should be merged ASAP, but feedback is welcome for any who are courageous enough to review it (CC @jim22k). Although I think things are pretty good, we can always smooth any rough edges we find once we actually implement workloads that needs UDTs.

@eriknw eriknw merged commit 8e9acb9 into python-graphblas:main Apr 10, 2022
@eriknw eriknw mentioned this pull request Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant