Design an approach for vertex and edge masking #2218
Comments
Summarizing the design details we've discussed so far so we can close the issue:
This design will introduce an optional (preferably immutable) vertex and edge mask object (currently calling
Tagging @ChuckHastings, @BradReesWork, @seunghwak, and @rlratzel just to make sure this correctly summarizes what we've discussed so far. I've purposefully withheld the justification of the approach just to keep the summary of the planned approach short and sweet.
In response to some initial questions from @seunghwak:
I plan to address that piece after being able to demonstrate the graph view propagating through an algorithm end-to-end and it looks like
I think that will require us to define when a user would update the mask values, so I can get a better understanding of what this flow might look like. I've been under the impression that the mask would be immutable once set, and I tend to prefer immutable objects. Maybe this is just a lack of context on my part, though. I've been thinking that once a set of vertices and/or edges is selected for the mask, it wouldn't change throughout the execution of an algorithm / the lifetime of a graph view instance. It certainly makes life easier if this is the case, but I'd like to learn more.
Here are some real use cases.
So for triangle counting, self-loops should be excluded, and vertices that are not part of the 2-core can't be a vertex in any triangle. We don't have masking support, so we're currently creating a new graph object, and this wastes both computing and memory. With masking support, we can mask out self-loops (with edge masking) and vertices that are not part of the 2-core (with vertex masking). In this case, we may do something like the following.
and
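A minimal host-side sketch of the triangle-counting use case described above. Everything here is an assumption for illustration: `Edge`, `MaskedView`, `make_view`, `mask_self_loops`, `mask_outside_2core`, and `count_triangles` are hypothetical names, not cuGraph API, and plain `std::vector<bool>` masks stand in for whatever storage the real design would use. The point is only that the original edge list is never copied; the self-loops and the non-2-core vertices are hidden by masks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical host-side stand-ins; none of these names are cuGraph API.
struct Edge {
  int src;
  int dst;
};

struct MaskedView {
  const std::vector<Edge>* edges;  // non-owning: the edge list is never copied
  std::vector<bool> edge_mask;     // true = edge is visible
  std::vector<bool> vertex_mask;   // true = vertex is visible

  // An edge is visible only if it and both of its endpoints are unmasked.
  bool visible(std::size_t e) const {
    const Edge& ed = (*edges)[e];
    return edge_mask[e] && vertex_mask[ed.src] && vertex_mask[ed.dst];
  }
};

MaskedView make_view(const std::vector<Edge>& edges, std::size_t num_vertices) {
  return MaskedView{&edges, std::vector<bool>(edges.size(), true),
                    std::vector<bool>(num_vertices, true)};
}

// Edge masking: hide every self-loop.
void mask_self_loops(MaskedView& view) {
  for (std::size_t e = 0; e < view.edges->size(); ++e) {
    const Edge& ed = (*view.edges)[e];
    if (ed.src == ed.dst) { view.edge_mask[e] = false; }
  }
}

// Vertex masking: repeatedly hide vertices with fewer than two visible
// incident edges; the vertices left visible form the 2-core.
void mask_outside_2core(MaskedView& view) {
  bool changed = true;
  while (changed) {
    changed = false;
    std::vector<int> degree(view.vertex_mask.size(), 0);
    for (std::size_t e = 0; e < view.edges->size(); ++e) {
      if (!view.visible(e)) { continue; }
      ++degree[(*view.edges)[e].src];
      ++degree[(*view.edges)[e].dst];
    }
    for (std::size_t v = 0; v < degree.size(); ++v) {
      if (view.vertex_mask[v] && degree[v] < 2) {
        view.vertex_mask[v] = false;
        changed = true;
      }
    }
  }
}

// Count triangles a < b < c over the visible undirected edges only.
// Assumes self-loops have already been masked out.
std::size_t count_triangles(const MaskedView& view) {
  std::set<std::pair<int, int>> adj;
  for (std::size_t e = 0; e < view.edges->size(); ++e) {
    if (!view.visible(e)) { continue; }
    const Edge& ed = (*view.edges)[e];
    adj.insert(std::make_pair(std::min(ed.src, ed.dst), std::max(ed.src, ed.dst)));
  }
  std::size_t count = 0;
  for (const auto& ab : adj) {
    for (const auto& bc : adj) {
      if (ab.second == bc.first &&
          adj.count(std::make_pair(ab.first, bc.second)) > 0) {
        ++count;
      }
    }
  }
  return count;
}
```

For an undirected edge list holding one triangle, a pendant edge, and a self-loop, calling `mask_self_loops` and then `mask_outside_2core` before `count_triangles` leaves the edge list at its original size while only the triangle's edges stay visible.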
So, if we want to support mask updates through a graph_view member function, I am more inclined to have the mask data as part of the graph_view_t object's state. This better hides details related to masking support (e.g. using bit masks versus using one byte per vertex or edge; and we should use bit masks).
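To illustrate the storage detail being hidden, here is a small sketch of a packed bit mask behind a `test`/`set` interface. `BitMask` and its members are hypothetical names, and the 32-bit word size is an assumption; callers never see whether a bit or a byte is used per element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed mask: one bit per vertex or edge instead of one byte.
class BitMask {
 public:
  explicit BitMask(std::size_t size, bool initial = true)
    : size_(size), words_((size + 31) / 32, initial ? ~std::uint32_t{0} : 0u) {}

  // Returns true when element i is visible (its bit is set).
  bool test(std::size_t i) const { return ((words_[i / 32] >> (i % 32)) & 1u) != 0; }

  // Sets or clears the bit for element i without touching its neighbors.
  void set(std::size_t i, bool value) {
    const std::uint32_t bit = std::uint32_t{1} << (i % 32);
    if (value) {
      words_[i / 32] |= bit;
    } else {
      words_[i / 32] &= ~bit;
    }
  }

  std::size_t size() const { return size_; }

  // Bytes actually allocated for the mask words.
  std::size_t storage_bytes() const { return words_.size() * sizeof(std::uint32_t); }

 private:
  std::size_t size_;
  std::vector<std::uint32_t> words_;
};
```

A mask over 100 elements needs four 32-bit words (16 bytes) here, compared with 100 bytes at one byte per element.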
Thanks @seunghwak, this gives me more context. I've been under the impression that this masking is primarily going to be applied once when the graph is created (e.g. in the Python layer, to mask vertices/edges of a particular type) and propagated down through the primitives layer without modification. I figured the more deterministic masking within the algorithms layer (e.g. ignoring self-loops) was going to be done implicitly within the algorithms themselves. For example, we actually extract (copy) the lower and upper triangular matrices in LAGraph/GraphBLAS and, depending on the algorithm, use the original undirected graph or one of the triangles as the mask for the matrix multiply.
If we want to support algorithms updating the masks then I can see value in having the
Though I admit I'm still a little unclear on the implementation details here. For example, if the mask isn't copied when copying a graph view, at some point we would probably need to remove part of a mask but not the whole thing, right? I need to think about this a little more. In the meantime, do you have other examples of wanting to use a mask in the algorithms layer? I think I'm still going back and forth as to whether the mask on the view should be mutable.
If we want to mask out certain vertex types, this can work with the above setting as well.
Or we're getting a list of vertex IDs to be masked out (or a cuDF Series object of boolean flags),
Masking out specific edges might be tricky under the current implementation, but once we get better edge ID support, we might be able to pass edge IDs for masking. Yeah... and I agree that this masking will be applied after a graph is created.
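The three inputs mentioned in this comment (a vertex type to exclude, a list of vertex IDs, or a column of boolean flags) can all be reduced to the same vertex mask. The sketch below shows that reduction on the host; `VertexMask`, `mask_from_ids`, `mask_from_types`, and `mask_from_flags` are hypothetical helpers, and a `std::vector<bool>` stands in for a cuDF boolean column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical convention: true = vertex is visible, false = masked out.
using VertexMask = std::vector<bool>;

// Build a mask from an explicit list of vertex IDs to hide.
VertexMask mask_from_ids(std::size_t num_vertices,
                         const std::vector<std::size_t>& masked_out_ids) {
  VertexMask mask(num_vertices, true);
  for (std::size_t id : masked_out_ids) {
    if (id < num_vertices) { mask[id] = false; }  // ignore out-of-range IDs
  }
  return mask;
}

// Build a mask that hides every vertex of one type.
VertexMask mask_from_types(const std::vector<int>& vertex_types, int masked_out_type) {
  VertexMask mask(vertex_types.size(), true);
  for (std::size_t v = 0; v < vertex_types.size(); ++v) {
    if (vertex_types[v] == masked_out_type) { mask[v] = false; }
  }
  return mask;
}

// Build a mask from per-vertex flags where true means "mask this vertex out".
VertexMask mask_from_flags(const std::vector<bool>& masked_out_flags) {
  VertexMask mask(masked_out_flags);
  mask.flip();  // invert: a flagged vertex becomes invisible
  return mask;
}
```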
My current thought is more like the following.
So the mask is just a
Yeah... and one thing bothering me is the concept of "a view object holding a memory block". This is a bit counter-intuitive... But having the graph_t object hold the memory block for the mask bars us from using multiple masked-out graphs with just a single original graph object. We may consider storing masks in a separate object, but that object is inherently tied to a specific graph_view_t object, so this also doesn't sound ideal...
I do agree, I feel it's not so bad if the mask itself is its own object, but still, it owns that object and that object owns a block of memory. What I'm still struggling with is this:
After triangle counting is done, for correctness, we would expect the mask in the resulting graph view to contain only the original edge mask (created by the user), right? This is why I'm struggling with the mutability / immutability of the underlying mask. If the mask were immutable, each time we added to it, we could get a new view with the resulting mask in a separate block of memory. Of course, that could get expensive though.
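The immutable option discussed here can be sketched as a view that shares a read-only mask and returns a new view, backed by a fresh block of memory, whenever more edges are masked out. `MaskedView`, `make_view`, and `with_edges_masked_out` are hypothetical names; the use of `std::shared_ptr` is an assumption made to keep view copies cheap, not something proposed in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical view that shares an immutable edge mask (true = visible).
struct MaskedView {
  std::shared_ptr<const std::vector<bool>> edge_mask;
};

// Start with every edge visible.
MaskedView make_view(std::size_t num_edges) {
  return MaskedView{std::make_shared<std::vector<bool>>(num_edges, true)};
}

// "Adding" to an immutable mask: copy the whole mask once, clear the
// requested bits in the copy, and return a new view over the copy.
// The input view and its mask are left untouched.
MaskedView with_edges_masked_out(const MaskedView& view,
                                 const std::vector<std::size_t>& edge_ids) {
  std::vector<bool> next(*view.edge_mask);  // this full copy is the expensive part
  for (std::size_t id : edge_ids) {
    if (id < next.size()) { next[id] = false; }
  }
  return MaskedView{std::make_shared<std::vector<bool>>(std::move(next))};
}
```

With this shape, an algorithm can derive its own view internally and the caller's view still carries only the user-created mask afterwards, at the cost of one mask-sized allocation per derived view.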
So, if a const view object is passed to triangle counting, we need to create a new view object inside triangle counting, so changes made inside triangle counting will be invisible to the caller. If a non-const view object is passed to triangle counting, that implies the view object will possibly be modified, so the changes should be visible to the caller.
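The const versus non-const convention described here might look like the following on the host. `GraphView`, `mask_self_loops`, and `count_non_loop_edges` are hypothetical names, and the view owns its mask in this sketch purely to make the copy semantics visible.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical view: non-owning edge list plus a mask held by the view.
struct GraphView {
  const std::vector<std::pair<int, int>>* edges;  // non-owning
  std::vector<bool> edge_mask;                    // true = edge is visible
};

// Non-const reference: the caller's view is modified and sees the change.
void mask_self_loops(GraphView& view) {
  for (std::size_t e = 0; e < view.edges->size(); ++e) {
    if ((*view.edges)[e].first == (*view.edges)[e].second) {
      view.edge_mask[e] = false;
    }
  }
}

// Const reference: work on a private copy so the caller's mask is untouched.
std::size_t count_non_loop_edges(const GraphView& view) {
  GraphView local = view;  // copies the mask, not the edge list
  mask_self_loops(local);
  std::size_t count = 0;
  for (std::size_t e = 0; e < local.edge_mask.size(); ++e) {
    if (local.edge_mask[e]) { ++count; }
  }
  return count;
}
```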
Yeah... so I think one key issue is whether we should consider a mask as a temporary object or something that can last. If it is something temporary, having a separate mask object outside the graph_view_t class may make more sense. But this will make the mask object inconsistent once we start to add graph update functions (like adding/deleting vertices & edges).
And one may assume copying a view object is cheap... but if this involves deep copying a mask block... I guess this can be surprising (which means BAD).
Yeah... and I think it is better to consider a mask object as something temporary (so we may not expect this to survive graph updates). We have edge_partition_src|dst_property_t and this will also become invalid after graph updates. We may treat masks in a similar way. Then, my thought is to have a separate memory-holding mask object and add something like
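One way to read the "separate memory-holding mask object" idea is sketched below: the mask owns its storage, and the view only keeps a non-owning pointer to it, so copying a view stays cheap. The member functions proposed in this comment are not shown above, so `attach_mask`, `clear_mask`, and every other name here are placeholders for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mask object that owns its memory (true = edge is visible).
class EdgeMask {
 public:
  explicit EdgeMask(std::size_t num_edges) : bits_(num_edges, true) {}
  void mask_out(std::size_t e) { bits_[e] = false; }
  bool visible(std::size_t e) const { return bits_[e]; }

 private:
  std::vector<bool> bits_;
};

// Hypothetical view: holds no mask memory, only a pointer to a mask that
// must outlive the view (and would be invalidated by graph updates).
class GraphView {
 public:
  explicit GraphView(std::size_t num_edges) : num_edges_(num_edges) {}

  void attach_mask(const EdgeMask& mask) { mask_ = &mask; }  // placeholder name
  void clear_mask() { mask_ = nullptr; }                     // placeholder name
  bool has_mask() const { return mask_ != nullptr; }

  // Without a mask every edge counts; with one, only the visible edges do.
  std::size_t num_visible_edges() const {
    if (mask_ == nullptr) { return num_edges_; }
    std::size_t count = 0;
    for (std::size_t e = 0; e < num_edges_; ++e) {
      if (mask_->visible(e)) { ++count; }
    }
    return count;
  }

 private:
  std::size_t num_edges_;
  const EdgeMask* mask_ = nullptr;  // non-owning
};
```

Copying a `GraphView` here copies one pointer and one integer, so two views can share a mask, and clearing the mask on one view does not affect the other.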
Closing this. The work will continue in a prototyping activity during 22.08.
Starts work for EPIC #2104
We need a design for how we are going to handle vertex and edge masking. The design should sketch out: