Add user-defined types #177

eriknw · 2022-03-15T22:39:33Z

UDTs can be *any* fixed-length numpy type that doesn't have Python objects.
Basically, UDFs that operate on the UDTs need to be able to be compiled with `@numba.njit`.
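
As a rough illustration of that constraint, the NumPy-only sketch below checks whether a dtype is fixed-length and free of Python objects. The helper name `looks_like_valid_udt` is hypothetical and not part of this PR; the library's actual validation may well differ.

```python
import numpy as np


def looks_like_valid_udt(dtype):
    """Hypothetical helper: True if `dtype` is fixed-length and holds no Python objects."""
    dtype = np.dtype(dtype)
    # `hasobject` is True if the dtype, or any of its fields, stores Python objects.
    # Flexible dtypes such as np.dtype("S") have itemsize 0 until a length is given.
    return not dtype.hasobject and dtype.itemsize > 0


record = np.dtype([("x", np.float64), ("y", np.int32)])  # record dtype
triple = np.dtype("(3,)uint8")  # array dtype
with_object = np.dtype([("x", np.float64), ("payload", object)])  # holds Python objects
```

Under these assumptions, `record` and `triple` pass the check, while `with_object` and the unsized `np.dtype("S")` do not.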

Still lots to do, but many things are already working!

Also:
- make dtypes hashable
- change operator attributes to use dtypes, not dtype names
- update `==` between UDTs (ignore bytes where appropriate)
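
One reading of "ignore bytes where appropriate" is that padding bytes inside a record dtype should not affect equality. The sketch below shows that effect with plain NumPy only; the dtype layout and variable names are made up for illustration, and this is not the comparison code from the PR.

```python
import numpy as np

# Record dtype with explicit offsets: bytes 1-3 are padding that belongs to no field.
padded = np.dtype(
    {
        "names": ["flag", "value"],
        "formats": [np.uint8, np.uint32],
        "offsets": [0, 4],
        "itemsize": 8,
    }
)

a = np.zeros(1, dtype=padded)
b = np.zeros(1, dtype=padded)
a["flag"] = b["flag"] = 1
a["value"] = b["value"] = 7

# Overwrite one padding byte of `b` through a raw byte view.
b.view(np.uint8)[1] = 0xFF

# Comparing raw memory sees the scribbled padding byte...
raw_equal = np.array_equal(a.view(np.uint8), b.view(np.uint8))
# ...while NumPy's structured comparison only looks at the named fields.
fields_equal = bool((a == b).all())
```

Here the byte-wise comparison reports a difference even though every field matches, which is why a field-aware `==` is the safer default for record UDTs.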

Some usage of `lookup_dtype` is no longer necessary now that ops use dtypes instead of dtype names. Also, using `get_typed_op` is preferred over `unify` where possible.

eriknw · 2022-04-09T18:32:02Z

Whew!

Okay, I think this is pretty much done. Tests and coverage are as solid as I can reasonably make them.
Nevertheless, a PR this large and invasive is likely to have a bug or oversight or three.
I also don't know how it'll impact dask-grblas (CC @ParticularMiner).

Final thoughts and future possibilities:

- We could allow default operators to work on array dtypes of normal dtypes (such as `np.dtype("(3,)uint8")`).
  - For example, `plus(A | A)` could add subarrays together.
  - This could also leverage normal NumPy broadcasting rules for the subarrays (i.e., numba's default).
- (In)equality checking of UDTs could probably be faster.
  - Array dtypes could probably be specialized.
  - This should be benchmarked to determine what is best.
- NumPy string dtypes (such as `np.dtype("S5")`) are not tested (but I played around with them in my preliminary experiments).
- Should we require dtypes to be registered? Can things "just work" if given a numpy dtype?
  - In this PR, I kept to the standard `register_new` and `register_anonymous` pattern we use elsewhere.
- Ideally, we should document UDTs better in docstrings and online documentation. (We'll get there someday, but I'm wiped!)
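
The array-dtype idea from the list above can be pictured with plain NumPy. This is only a sketch of what element-wise addition of subarrays would compute; it does not call the library, and default operators do not do this in the PR as described.

```python
import numpy as np

triple = np.dtype("(3,)uint8")
base, shape = triple.subdtype  # (dtype('uint8'), (3,))

# Two values of this array dtype, written as plain arrays of the base type.
x = np.array([1, 2, 3], dtype=base)
y = np.array([10, 20, 30], dtype=base)

elementwise = x + y  # what `plus` on two subarrays could compute
broadcast = x + np.uint8(1)  # NumPy broadcasting of a scalar across the subarray
```

Both results keep the `uint8` base type and the `(3,)` shape, so they would still be valid values of the same array dtype.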

This PR opens up a whole universe of possibilities. Record dtypes and array dtypes support different use cases, and both can be super-important.

Anyway, I think this PR should be merged ASAP, but feedback is welcome from anyone courageous enough to review it (CC @jim22k). Although I think things are pretty good, we can always smooth any rough edges we find once we actually implement workloads that need UDTs.

eriknw added 16 commits March 15, 2022 17:37

WIP: begin adding user-defined types

8cfd869

UDTs can be *any* fixed-length numpy type that doesn't have Python objects. Basically, UDFs that operate on the UDTs need to be able to be compiled with `@numba.njit`.

UDFs on UDTs! Also, get return type of UDFs from numba.

c3268a5

Add recipes for first, any, eq, etc. on UDTs

b03daea

Merge branch 'main' into udts

1956bc8

Merge branch 'main' into udts

0279bfe

Test UDTs

1e3317f

Also:
- make dtypes hashable
- change operator attributes to use dtypes, not dtype names
- update `==` between UDTs (ignore bytes where appropriate)

Allow positional operators to work on UDTs; also, coverage

9e40381

Aggregators on UDT

77433c4

Merge branch 'main' into udts

758473f

Clean up usage of lookup_dtype and unify

0b898ab

Some usage of `lookup_dtype` is no longer necessary now that ops use dtypes instead of dtype names. Also, using `get_typed_op` is preferred over `unify` where possible.

Merge branch 'main' into udts

6414b69

A little bit of progress

6e92f4f

Fix failing test

37caac4

Merge branch 'main' into udts

12b9daa

Get UDFs to work on array UDTs

6eeabfa

Coverage

c105b92

eriknw changed the title ~~WIP: begin adding user-defined types~~ Add user-defined types Apr 8, 2022

eriknw added 3 commits April 8, 2022 22:06

More coverage; better handling of array subdtypes

1ce8421

More coverage. Anything else needed?

c9e8772

A few finishing touches

21f483b

eriknw merged commit 8e9acb9 into python-graphblas:main Apr 10, 2022

eriknw mentioned this pull request Apr 10, 2022

ENH: User-defined types #153

Closed
