Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design an approach for vertex and edge masking #2218

Closed
ChuckHastings opened this issue Apr 11, 2022 · 14 comments
Closed

Design an approach for vertex and edge masking #2218

ChuckHastings opened this issue Apr 11, 2022 · 14 comments
Assignees
Milestone

Comments

@ChuckHastings
Copy link
Collaborator

Starts work for EPIC #2104

We need a design for how we are going to handle vertex and edge masking. The design should sketch out:

  • Our implementation strategy
  • Define the API for masking
  • Define how the primitives will be adapted to support this feature
  • Define a roadmap for implementation (we
@ChuckHastings ChuckHastings added the ? - Needs Triage Need team to review and classify label Apr 11, 2022
@github-actions github-actions bot added this to Needs prioritizing in Other Issues Apr 11, 2022
@ChuckHastings ChuckHastings added 2 - In Progress and removed ? - Needs Triage Need team to review and classify labels Apr 11, 2022
@ChuckHastings ChuckHastings removed this from Needs prioritizing in Other Issues Apr 11, 2022
@ChuckHastings ChuckHastings added this to the 22.06 milestone Apr 11, 2022
@ChuckHastings ChuckHastings added this to Algorithms and Primitives in 22.06 Release Apr 11, 2022
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@cjnolet
Copy link
Member

cjnolet commented May 24, 2022

Summarizing the design details we've discussed so far so we can close the issue:

This design will introduce an optional (preferably immutable) vertex and edge mask object (currently calling graph_mask_t but we should discuss this name further) that can be added to the graph view and propagated through algorithms, restricting the set of vertices or edges in subsequent output graphs and computations without forcing a copy or the inducing of a subgraph. Keeping this object on the graph_view_t object causes it to be propagated all the way to the underlying graph primitives.

Tagging @ChuckHastings @BradReesWork @seunghwak and @rlratzel just to make sure this correctly summarizes what we've discussed so far. I've purposefully withheld the justification of the approach just to keep the summary of the planned approach short and sweet.

@cjnolet
Copy link
Member

cjnolet commented May 24, 2022

In response to some initial questions from @seunghwak :

  1. Who is responsible for owning the memory of the graph_mask_t?

I plan to address that piece after being able to demonstrate the graph view propagating through an algorithm end-to-end and it looks like uniform neighborhood sampling is the top priority on that list right now. My thinking is that the graph mask object itself would be an immutable view object which would accept pointers and thus whomever constructs the graph view at the top-most layer (most often this would be Python doing a graph query or loading from disk? maybe that's a bad assumption?) would be responsible for the memory in the mask. That said, I see benefit in optimizations such as using a bit-level mask (rather than byte-level as it is today.) and somehow encapsulating the construciton of the mask behind helpers or factory functions.

  1. How users will update the mask values?

I think maybe that will require us to define when a user would update the mask values so I can get a better understanding of what this flow might look like. I've been under the impression the mask would be immutable once set and tend to prefer immutable objects. Maybe this is just a lack of context on my part, though. I've been thinking that once a set of vertices and/or edges are selected for the mask, they wouldn't change throughout the execution of an algorithm / graph view instance. It certainly makes life easier anyway if this is the case, but I'd like to learn more.

@seunghwak
Copy link
Contributor

seunghwak commented May 24, 2022

How users will update the mask values?

Here are some real use cases.

// 2. Exclude self-loops (FIXME: better mask-out once we add masking support).

// 3. Find 2-core and exclude edges that do not belong to 2-core (FIXME: better mask-out once we

So for triangle counting, self-loops should be excluded and vertices that are not part of 2-core can't be a vertex in triangles.

We don't have a masking support and we're currently creating a new graph object and this wastes both computing and memory. With masking support, we can mask out self-loops (with edge masking) and vertices that are not part of 2-core (with vertex masking).

In this case, we may do something like the following.

graph_view.mask_edges([]__device__(vertex_t src, vertex_t dst, auto src_val, auto dst_val) {
  return src == dst ? true : false; // mask-out self-loops
});

and

graph_view.mask_vertices([core_numbers = core_numbers.data(), local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first()]__device__(vertex_t v, auto) {
  return core_numbers[v - local_vertex_partition_range_first] < 2 ? true : false; // mask-out vertices that are not part of 2-core.
});

@seunghwak
Copy link
Contributor

Who is responsible for owning the memory of the graph_mask_t?

So, if we want to support mask updates through a graph_view member function, I am more inclined to having a mask data as part of graph_view_t object's state. This better hides details related to masking support (e.g. using bit masks versus using one byte per vertex or edge; and we should use bit masks).

@cjnolet
Copy link
Member

cjnolet commented May 24, 2022

Thanks @seunghwak, this gives me more context.

I've been under the impression this masking task is primarily going to be applied once when the graph is created (eg in Python layer to mask vertices/edges of a particular type) and propagated down through the primitives layer without modification.

I figured the more deterministic masking within the algorithms layer (eg ignoring self loops) were going to be done implicitly within the algorithms themselves. For example, we actually extract (copy) the lower and upper triangular matrices in LAGraph/GraphBLAS and, depending on the algorithm, use the original undirected graph or one of the triangles, as the mask for the matrix multiply.

If we want to support algorithms updating the masks then I can see value in having the graph_mask_t own the underlying memory. Also, as I mentioned earlier, I am hoping to encapsulate as much of the logic as possible for doing the mask lookups so this means it might also make sense to encapsulate updates (assuming the mask is mutable, which I'm still not sure of).

Though I admit I'm still a little unclear on the implementation details here- for example if the mask isn't copied when copying a graph view, at some point we would probably need to remove part of a mask but not the whole thing, right? I need to think about this a little more. In the meantime do you have other examples of wanting to use a mask in the algorithms layer? I think I'm still going back and forth as to whether the mask on the view should be mutable.

@seunghwak
Copy link
Contributor

If we want to mask out certain vertex types, this can work with the above setting as well.

graph_view.mask_vertices([vertex_types = vertex_types.data(), local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first(), types_to_be_masked]__device__(vertex_t v, auto) {
  return vertex_types[v - local_vertex_partition_range_first] == types_to_be_masked ? true : false;
});

Or we're getting a list of vertex IDs to be masked out (or a cuDF.series object of boolean flags),

graph_view.mask_vertices([sorted_list_of_vertices_to_be_masked_out, local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first(), types_to_be_masked]__device__(vertex_t v, auto) {
  return thrust::binary_search(thrust::seq, ...)
});
graph_view.mask_vertices([boolean_flags, local_vertex_partition_range_first = graph_view.local_vertex_partition_range_first(), types_to_be_masked]__device__(vertex_t v, auto) {
  return boolean_flags[v - local_vertex_partition_range_first];
});

Masking out specific edges might be tricky under the current implementation, but once we get better edge ID support, we might be able to pass edge IDs for masking.

Yeah... and I agree that this masking will be applied after a graph is created.

@seunghwak
Copy link
Contributor

seunghwak commented May 24, 2022

Though I admit I'm still a little unclear on the implementation details here- for example if the mask isn't copied when copying a graph view, at some point we would probably need to remove part of a mask but not the whole thing, right? I need to think about this a little more. In the meantime do you have other examples of wanting to use a mask in the algorithms layer? I think I'm still going back and forth as to whether the mask on the view should be mutable.

My current thought is more like the following.

class graph_view_t {
  std::optional<mask_t> mask{std::nullopt};  // or we may include this in the edge_partition_device_view_t.
};

So the mask is just a nullopt by default. One may call enable_vertex|edge_mask to create a mask (or call disable_vertex|edge_mask to relase memory for mask, by default no vertices/edges are masked out).

Yeah... and one thing bothering me is a concept of "a view object holding a memory block". This is a bit counter-intuitive... But the graph_t object holding a memory block for mask bars using multiple masked-out graphs with a just a single original graph object.

We may consider storing masks in a separate object but the object is inherently tied with a specific graph_view_t object, so this also doesn't sound ideal...

@cjnolet
Copy link
Member

cjnolet commented May 24, 2022

Yeah... and one thing bothering me is a concept of "a view object holding a memory block". This is a bit counter-intuitive...

I do agree, I feel it's not so bad if the mask itself is its own object, but still, it owns that object and that object owns a block of memory.

What I'm still struggling with is this:

  1. User creates a graph_view with an edge mask
  2. Graph view is propagated to triangle counting algorithm
  3. Triangle counting adds edge mask to ignore self loops

After triangle counting is done, for correctness, we would expect the mask in the resulting graph view will only have the original edge mask (created by the user), right? This is why I'm struggling with the mutability / immutability of the underlying mask. If the mask was immutable, each time we added to it, we could get a new view with the resulting mask in a separate block of memory. Of course, that could get expensive though.

@seunghwak
Copy link
Contributor

So, if a const view object is passed to triangle counting, we need to create a new view object inside the triangle counting, so changes inside triangle counting will be invisible to the caller.

If a non-const view object is passed to triangle counting, that implies that the view object will possibly modified, so the changes should be visible to the caller.

@seunghwak
Copy link
Contributor

seunghwak commented May 24, 2022

Yeah... so I think one key issue is whether we should consider a mask as a temporary object or something that can last.

If it is something temporary, having a separate mask object outside the graph_view_t class may make more sense.

But this will make the mask object inconsistent once we started to add graph update functions (like adding/deleting vertices & edges).

@seunghwak
Copy link
Contributor

And one may consider copying a view object is cheap... but if this involves deep copying a mask block... I guess this can be surprising (which means BAD).

@seunghwak
Copy link
Contributor

seunghwak commented May 24, 2022

Yeah... and I think it is better to consider a mask object as something temporary (so we may not expect this to survive graph updates). We have edge_partition_src|dst_property_t and this will also become invalid after graph updates. We may treat masks in a similar way.

Then, my thought is to have a separate memory holding mask object and add something like attach_mask() and clear_mask() to graph_view_t. graph_view_t will just hold a sort-of view object for a mask memory block.

@ChuckHastings
Copy link
Collaborator Author

Closing this. The work will continue in a prototyping activity during 22.08.

22.06 Release automation moved this from Algorithms and Primitives to Done Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
22.06 Release
  
Done
Development

No branches or pull requests

3 participants