Best practices for vectorising GraphBLAS #298

rciric · 2022-10-21T01:16:54Z

rciric
Oct 21, 2022

Hello all, and thanks for the incredible work.

The problem

I have several use cases for GraphBLAS that involve either multi-channel graphs or batched graphs, where the sparsity pattern is the same across a vector of observations. For instance, this occurs when we have multiple data channels associated with each edge in a fixed mesh or grid, or when we have a batch of graphs whose connectivity structure is identical but whose weights are different.

I’m not sure if there’s an “out-of-the-box” or recommended way of using python-graphblas for this class of problem, but the most direct approach (expectedly) fails with an error like 'plus_times does not work with FP32[100]' (for batch dim 100 and float32 datatype). To address these use cases, I’m thus using the UDT utilities of python-graphblas to create vector datatypes and define operators over those types. The SuiteSparse user guide seems to advise against this (the bit about generic kernels in Performance, section 14.2), but I’ve found we still get substantially improved performance relative to the most naive loop this way, provided that we can amortise the compilation cost (i.e., provided that the “batch” or “channel” dimension remains the same across subsequent calls).

The current solution (not great)

Directly registering new operators with the is_udt=True flag fails for my case, for several reasons related to C pointer datatypes. (I can provide more details if this problem is of interest to others. There are some basic notes in the linked gist below, but it's not a pretty piece of code.)

I’ve thus created some very hackish patches of _get_udt_wrapper and the _compile_udt methods of operator classes that allow the vectorisation to work in principle (at least for the small subset of cases I’ve tested). But these implementations are very brittle — they’re defined in terms of python-graphblas internals rather than the public API, and I presume the internals could be subject to change at any time. (In fact, their location has changed since I installed python-graphblas.)

Is there a better way of doing this that I’m missing, which doesn’t rely on patching the internals? Is this use case something that python-graphblas supports in principle? Any suggestions would be welcome. Thanks!

Asides:

I also found that SelectOps aren’t compatible with my UDTs because SelectOp doesn’t implement a _compile_udt method. Not sure if this is intentional.
I was unable to get a custom IndexUnaryOp working without overriding find_opclass, since that function isn’t subclass-friendly. There were a other few places (e.g., _build methods) where I needed to define patched classes because of strict type checks that didn’t accommodate subclasses.

eriknw · 2022-10-21T04:21:05Z

eriknw
Oct 21, 2022
Maintainer

Hi @rciric! Thanks for the questions.

Wow, I am beyond impressed. This is a challenging part of the codebase. What you want to do makes sense to me, and I suspect you're right that it would give good performance for UDTs of this size. Using UDTs like this would be the "GraphBLAS way" to do things IMHO.

I added UDTs to python-graphblas earlier this year in #177, and, as you have seen, the job isn't quite done. Some things don't work yet. At the time, I couldn't figure out how to have generic UDFs return a UDT, which, uh, might be important.

Thanks so much for the gist ❤️ . Looks great and surprisingly readable at a glance (beauty is in the eye of the beholder ;) ). I will take a closer look at your code ASAP, and then we can decide how we can get the functionality into python-graphblas. I really, really want vectorized or "batched" UDFs over array UDTs to work in python-graphblas. Are you willing to work together to get this in?

In my final comment #177 (comment), I call out the possibility of default operators to "just work" on array UDTs, such as plus(A | B) or plus_times(A @ B) or A + 1 where the dtypes are e.g. FP32[100]. If we can get vectorized UDFs to work, then I think it would be straightforward to make operators super-nice to use on UDTs.

Feel free to share any more details in this thread that you think may be useful. I'll read it and try to use it.

Do you know of any shortcomings to your current solution? Are you blocked by anything, and how urgent are things to get unblocked or improved? We try to be responsive to user needs.

But these implementations are very brittle — they’re defined in terms of python-graphblas internals rather than the public API, and I presume the internals could be subject to change at any time. (In fact, their location has changed since I installed python-graphblas.)

Sorry about that! We try to give a deprecation cycle of at least six months for breaking changes if possible. Moving things to graphblas.core should actually help us improve stability in the future, and we won't move things again without very good reason. graphblas.core is now considered part of the public API for advanced users such as yourself.

Takeaways and actions

Yes, this is something python-graphblas supports in principle
I will review your code
Let's plan next steps
- Join us on discord (#graphblas in "community packages"): https://discord.com/invite/vur45CbwMz
  - This is brand new and part of the broader "scientific-python" community
- or join us on slack: https://join.slack.com/t/thegraphblas/shared_invite/zt-1iatqh6eq-39HFogw1U8kKRKo2YQOFzQ
  - Not very active except for pinging people who are late for meetings :-P
- and/or join our community call: COMMUNITY: Regular Python-graphblas community conference calls (Open to all) #247
  - if this is too early, @jim22k and I were just discussing moving some of the calls to the afternoon
  - I'm also curious to hear about your use cases
- and/or keep replying to this thread :)
Please raise issues for your "asides" (and any other issues you find); I think these were oversights

Thanks again @rciric, and welcome to our community 🎉

0 replies

eriknw · 2022-10-22T03:45:31Z

eriknw
Oct 22, 2022
Maintainer

To clarify, this is an example workflow that I want to "just work" in python-graphblas:

>>> import graphblas as gb
>>> A = gb.Matrix("int[3]", 2, 2)
>>> A << (1, 2, 3)
>>> A
"M_0"      nvals  nrows  ncols     dtype       format
gb.Matrix      4      2      2  INT64[3]  fullr (iso)
-----------------------------------------------------
           0          1
0  [1, 2, 3]  [1, 2, 3]
1  [1, 2, 3]  [1, 2, 3]

>>> B = gb.Matrix("int[3]", 2, 2)
>>> B << (2, 3, 4)
>>> B
"M_1"      nvals  nrows  ncols     dtype       format
gb.Matrix      4      2      2  INT64[3]  fullr (iso)
-----------------------------------------------------
           0          1
0  [2, 3, 4]  [2, 3, 4]
1  [2, 3, 4]  [2, 3, 4]

>>> (A @ B).new()

"M_2"      nvals  nrows  ncols     dtype       format
gb.Matrix      4      2      2  INT64[3]  fullr (iso)
-----------------------------------------------------
             0            1
0  [4, 12, 24]  [4, 12, 24]
1  [4, 12, 24]  [4, 12, 24]

The final expression will be the most difficult to do, but this is my target end-state.

The earlier expressions need these PRs-- #299 and #300 --which make using UDTs extra-nice imho.

0 replies

rciric · 2022-10-22T05:40:40Z

rciric
Oct 22, 2022
Author

Hello, and thanks for being so responsive!

It’s great to hear you’ve planned on supporting this functionality — I would definitely be happy to work with you all to get a solution (particularly for the last case above) into the code base. I can also open issues for some of the specific troublespots I mentioned. There are a couple of other projects I’m working on simultaneously, so I will try to work with you all toward a preliminary PR in the next couple weeks, but no promises yet ☺️. And thanks for the invites — I’ll sign up for the Discord / Slack and see if I can perhaps make the community meeting as well!

IMO the biggest deficiencies in my implementation, relative to the above and your #177 comment, are (1) that I’ve only hard-coded solutions for one elementary operator of each type, and (2) that the implementation defines and registers all new ops as part of the creation of the vector/array UDT, which is done explicitly. I’ll say a little bit about each of these issues below, and my working model of how they might be fixed.

On first pass, I suspect that the right thing to do here might be to retrieve a numpy primitive function corresponding to each operator from numpy namespaces like gb.binary.numpy._np (using _graphblas_to_numpy dictionaries, when those are implemented), and then pass the matched numpy function to a vectorisation map like the ones that I wrote in the gist, thereby returning a vector UDF. This can be done iteratively over all operators when a new vector/array UDT is instantiated, or better…
If I'm understanding correctly, in line with your previous comment when introducing UDTs and your new post here, a better solution might define each vector UDF automatically at call time, as a typed implementation of an existing operator (rather than as a new operator). So in summary: when plus is called on a previously unregistered vector datatype, rather than throw an error, the numpy primitive corresponding to plus is retrieved from the gb.binary.numpy._np submodule. Then, the vectorisation map for binary operators is called to transform the primitive and compile it, returning the vectorised plus operator for that array type. This vectorised operator is then registered as a typed implementation of plus, so that it can be dispatched automatically the next time the same vector/array datatype is encountered.

Or, perhaps there is a better way to do this (e.g., that wouldn't require a separate JIT for each new array shape)?

Preliminarily, I suspect another source of difficulty integrating my changes would likely stem from reconciling some of the patches with existing code. Off the top of my head, there are the patched op classes, Batched~, which I pretty much introduced to accommodate explicit return type specification for UDFs over vector UDTs. I made this decision because it would get me to a working solution fast, but there might be a better way that extends your Numba implementation of return type inference. That said, another option might be allowing an explicit return type for custom ops as a temporary workaround.

There’s definitely no urgency/blocking here — there are plenty of other features I’m also experimenting with for my use case, etc. (To summarise briefly without going off-topic, the application is something DL-adjacent but not quite DL — I can share more details at the meeting / on the chat, and maybe we can see whether / how it might feasibly fit into the GraphBLAS ecosystem.)

Thanks again, and happy to contribute however I can!

1 reply

eriknw Oct 24, 2022
Maintainer

That's exactly right. We'll use _graphblas_to_numpy to map builtin operators to UDFs, and we should JIT compile the UDFs on-demand at call time. We should be pragmatic and try to get something working, then we can iterate and try to make it better (and I suspect we'll need to compile for each new UDT including shape). I think allowing explicit input and output dtypes is something we should add to our UDF operators anyway. If we need to require this for certain kinds of UDFs on UDTs, then so be it.

Thanks for your thoughtfulness and willingness to contribute. We're in no particular rush on this feature other than I'm eager for it :). I'm also curious to hear how your application might fit into the GraphBLAS ecosystem. If you want or need anything from us, please ask. Cheers!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Best practices for vectorising GraphBLAS #298

{{title}}

Replies: 3 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Best practices for vectorising GraphBLAS #298

rciric Oct 21, 2022

The problem

The current solution (not great)

Asides:

Replies: 3 comments · 1 reply

eriknw Oct 21, 2022 Maintainer

Takeaways and actions

eriknw Oct 22, 2022 Maintainer

rciric Oct 22, 2022 Author

eriknw Oct 24, 2022 Maintainer

rciric
Oct 21, 2022

Replies: 3 comments 1 reply

eriknw
Oct 21, 2022
Maintainer

eriknw
Oct 22, 2022
Maintainer

rciric
Oct 22, 2022
Author

eriknw Oct 24, 2022
Maintainer