Design an approach for vertex and edge masking #2218

ChuckHastings · 2022-04-11T16:31:05Z

Starts work for EPIC #2104

We need a design for how we are going to handle vertex and edge masking. The design should sketch out:

Our implementation strategy
Define the API for masking
Define how the primitives will be adapted to support this feature
Define a roadmap for implementation (we

github-actions · 2022-05-11T17:15:43Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

cjnolet · 2022-05-24T17:40:24Z

Summarizing the design details we've discussed so far so we can close the issue:

This design will introduce an optional (preferably immutable) vertex and edge mask object (currently calling graph_mask_t but we should discuss this name further) that can be added to the graph view and propagated through algorithms, restricting the set of vertices or edges in subsequent output graphs and computations without forcing a copy or the inducing of a subgraph. Keeping this object on the graph_view_t object causes it to be propagated all the way to the underlying graph primitives.

Tagging @ChuckHastings @BradReesWork @seunghwak and @rlratzel just to make sure this correctly summarizes what we've discussed so far. I've purposefully withheld the justification of the approach just to keep the summary of the planned approach short and sweet.

cjnolet · 2022-05-24T18:07:17Z

In response to some initial questions from @seunghwak :

Who is responsible for owning the memory of the graph_mask_t?

I plan to address that piece after being able to demonstrate the graph view propagating through an algorithm end-to-end and it looks like uniform neighborhood sampling is the top priority on that list right now. My thinking is that the graph mask object itself would be an immutable view object which would accept pointers and thus whomever constructs the graph view at the top-most layer (most often this would be Python doing a graph query or loading from disk? maybe that's a bad assumption?) would be responsible for the memory in the mask. That said, I see benefit in optimizations such as using a bit-level mask (rather than byte-level as it is today.) and somehow encapsulating the construciton of the mask behind helpers or factory functions.

How users will update the mask values?

I think maybe that will require us to define when a user would update the mask values so I can get a better understanding of what this flow might look like. I've been under the impression the mask would be immutable once set and tend to prefer immutable objects. Maybe this is just a lack of context on my part, though. I've been thinking that once a set of vertices and/or edges are selected for the mask, they wouldn't change throughout the execution of an algorithm / graph view instance. It certainly makes life easier anyway if this is the case, but I'd like to learn more.

seunghwak · 2022-05-24T21:19:31Z

How users will update the mask values?

Here are some real use cases.

cugraph/cpp/src/community/triangle_count_impl.cuh

Line 181 in 4c0531d

// 2. Exclude self-loops (FIXME: better mask-out once we add masking support).

cugraph/cpp/src/community/triangle_count_impl.cuh

Line 215 in 4c0531d

    
           // 3. Find 2-core and exclude edges that do not belong to 2-core (FIXME: better mask-out once we

So for triangle counting, self-loops should be excluded and vertices that are not part of 2-core can't be a vertex in triangles.

We don't have a masking support and we're currently creating a new graph object and this wastes both computing and memory. With masking support, we can mask out self-loops (with edge masking) and vertices that are not part of 2-core (with vertex masking).

In this case, we may do something like the following.

graph_view.mask_edges([]__device__(vertex_t src, vertex_t dst, auto src_val, auto dst_val) {
  return src == dst ? true : false; // mask-out self-loops
});

and

graph_view.mask_vertices([core_numbers = core_numbers.data(), local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first()]__device__(vertex_t v, auto) {
  return core_numbers[v - local_vertex_partition_range_first] < 2 ? true : false; // mask-out vertices that are not part of 2-core.
});

seunghwak · 2022-05-24T21:23:33Z

Who is responsible for owning the memory of the graph_mask_t?

So, if we want to support mask updates through a graph_view member function, I am more inclined to having a mask data as part of graph_view_t object's state. This better hides details related to masking support (e.g. using bit masks versus using one byte per vertex or edge; and we should use bit masks).

cjnolet · 2022-05-24T22:20:58Z

Thanks @seunghwak, this gives me more context.

I've been under the impression this masking task is primarily going to be applied once when the graph is created (eg in Python layer to mask vertices/edges of a particular type) and propagated down through the primitives layer without modification.

I figured the more deterministic masking within the algorithms layer (eg ignoring self loops) were going to be done implicitly within the algorithms themselves. For example, we actually extract (copy) the lower and upper triangular matrices in LAGraph/GraphBLAS and, depending on the algorithm, use the original undirected graph or one of the triangles, as the mask for the matrix multiply.

If we want to support algorithms updating the masks then I can see value in having the graph_mask_t own the underlying memory. Also, as I mentioned earlier, I am hoping to encapsulate as much of the logic as possible for doing the mask lookups so this means it might also make sense to encapsulate updates (assuming the mask is mutable, which I'm still not sure of).

Though I admit I'm still a little unclear on the implementation details here- for example if the mask isn't copied when copying a graph view, at some point we would probably need to remove part of a mask but not the whole thing, right? I need to think about this a little more. In the meantime do you have other examples of wanting to use a mask in the algorithms layer? I think I'm still going back and forth as to whether the mask on the view should be mutable.

seunghwak · 2022-05-24T22:32:04Z

If we want to mask out certain vertex types, this can work with the above setting as well.

graph_view.mask_vertices([vertex_types = vertex_types.data(), local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first(), types_to_be_masked]__device__(vertex_t v, auto) {
  return vertex_types[v - local_vertex_partition_range_first] == types_to_be_masked ? true : false;
});

Or we're getting a list of vertex IDs to be masked out (or a cuDF.series object of boolean flags),

graph_view.mask_vertices([sorted_list_of_vertices_to_be_masked_out, local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first(), types_to_be_masked]__device__(vertex_t v, auto) {
  return thrust::binary_search(thrust::seq, ...)
});

graph_view.mask_vertices([boolean_flags, local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first(), types_to_be_masked]__device__(vertex_t v, auto) {
  return boolean_flags[v - local_vertex_partition_range_first];
});

Masking out specific edges might be tricky under the current implementation, but once we get better edge ID support, we might be able to pass edge IDs for masking.

Yeah... and I agree that this masking will be applied after a graph is created.

seunghwak · 2022-05-24T22:38:31Z

Though I admit I'm still a little unclear on the implementation details here- for example if the mask isn't copied when copying a graph view, at some point we would probably need to remove part of a mask but not the whole thing, right? I need to think about this a little more. In the meantime do you have other examples of wanting to use a mask in the algorithms layer? I think I'm still going back and forth as to whether the mask on the view should be mutable.

My current thought is more like the following.

class graph_view_t {
  std::optional<mask_t> mask{std::nullopt};  // or we may include this in the edge_partition_device_view_t.
};

So the mask is just a nullopt by default. One may call enable_vertex|edge_mask to create a mask (or call disable_vertex|edge_mask to relase memory for mask, by default no vertices/edges are masked out).

Yeah... and one thing bothering me is a concept of "a view object holding a memory block". This is a bit counter-intuitive... But the graph_t object holding a memory block for mask bars using multiple masked-out graphs with a just a single original graph object.

We may consider storing masks in a separate object but the object is inherently tied with a specific graph_view_t object, so this also doesn't sound ideal...

cjnolet · 2022-05-24T22:49:01Z

Yeah... and one thing bothering me is a concept of "a view object holding a memory block". This is a bit counter-intuitive...

I do agree, I feel it's not so bad if the mask itself is its own object, but still, it owns that object and that object owns a block of memory.

What I'm still struggling with is this:

User creates a graph_view with an edge mask
Graph view is propagated to triangle counting algorithm
Triangle counting adds edge mask to ignore self loops

After triangle counting is done, for correctness, we would expect the mask in the resulting graph view will only have the original edge mask (created by the user), right? This is why I'm struggling with the mutability / immutability of the underlying mask. If the mask was immutable, each time we added to it, we could get a new view with the resulting mask in a separate block of memory. Of course, that could get expensive though.

seunghwak · 2022-05-24T22:52:51Z

So, if a const view object is passed to triangle counting, we need to create a new view object inside the triangle counting, so changes inside triangle counting will be invisible to the caller.

If a non-const view object is passed to triangle counting, that implies that the view object will possibly modified, so the changes should be visible to the caller.

seunghwak · 2022-05-24T22:54:30Z

Yeah... so I think one key issue is whether we should consider a mask as a temporary object or something that can last.

If it is something temporary, having a separate mask object outside the graph_view_t class may make more sense.

But this will make the mask object inconsistent once we started to add graph update functions (like adding/deleting vertices & edges).

seunghwak · 2022-05-24T22:55:29Z

And one may consider copying a view object is cheap... but if this involves deep copying a mask block... I guess this can be surprising (which means BAD).

seunghwak · 2022-05-24T23:21:29Z

Yeah... and I think it is better to consider a mask object as something temporary (so we may not expect this to survive graph updates). We have edge_partition_src|dst_property_t and this will also become invalid after graph updates. We may treat masks in a similar way.

Then, my thought is to have a separate memory holding mask object and add something like attach_mask() and clear_mask() to graph_view_t. graph_view_t will just hold a sort-of view object for a mask memory block.

ChuckHastings · 2022-06-01T16:18:05Z

Closing this. The work will continue in a prototyping activity during 22.08.

ChuckHastings added the ? - Needs Triage Need team to review and classify label Apr 11, 2022

ChuckHastings assigned cjnolet Apr 11, 2022

github-actions bot added this to Needs prioritizing in Other Issues Apr 11, 2022

ChuckHastings added 2 - In Progress and removed ? - Needs Triage Need team to review and classify labels Apr 11, 2022

ChuckHastings removed this from Needs prioritizing in Other Issues Apr 11, 2022

ChuckHastings added this to the 22.06 milestone Apr 11, 2022

ChuckHastings added this to Algorithms and Primitives in 22.06 Release Apr 11, 2022

github-actions bot added the inactive-30d label May 11, 2022

github-actions bot removed the inactive-30d label May 24, 2022

ChuckHastings closed this as completed Jun 1, 2022

22.06 Release automation moved this from Algorithms and Primitives to Done Jun 1, 2022

ChuckHastings mentioned this issue Jun 1, 2022

[FEA] Prototype vertex and edge masking approach #2322

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design an approach for vertex and edge masking #2218

Design an approach for vertex and edge masking #2218

ChuckHastings commented Apr 11, 2022

github-actions bot commented May 11, 2022

cjnolet commented May 24, 2022 •

edited

cjnolet commented May 24, 2022 •

edited

seunghwak commented May 24, 2022 •

edited

seunghwak commented May 24, 2022

cjnolet commented May 24, 2022 •

edited

seunghwak commented May 24, 2022

seunghwak commented May 24, 2022 •

edited

cjnolet commented May 24, 2022 •

edited

seunghwak commented May 24, 2022

seunghwak commented May 24, 2022 •

edited

seunghwak commented May 24, 2022

seunghwak commented May 24, 2022 •

edited

ChuckHastings commented Jun 1, 2022

Design an approach for vertex and edge masking #2218

Design an approach for vertex and edge masking #2218

Comments

ChuckHastings commented Apr 11, 2022

github-actions bot commented May 11, 2022

cjnolet commented May 24, 2022 • edited

cjnolet commented May 24, 2022 • edited

seunghwak commented May 24, 2022 • edited

seunghwak commented May 24, 2022

cjnolet commented May 24, 2022 • edited

seunghwak commented May 24, 2022

seunghwak commented May 24, 2022 • edited

cjnolet commented May 24, 2022 • edited

seunghwak commented May 24, 2022

seunghwak commented May 24, 2022 • edited

seunghwak commented May 24, 2022

seunghwak commented May 24, 2022 • edited

ChuckHastings commented Jun 1, 2022

cjnolet commented May 24, 2022 •

edited

cjnolet commented May 24, 2022 •

edited

seunghwak commented May 24, 2022 •

edited

cjnolet commented May 24, 2022 •

edited

seunghwak commented May 24, 2022 •

edited

cjnolet commented May 24, 2022 •

edited

seunghwak commented May 24, 2022 •

edited

seunghwak commented May 24, 2022 •

edited