Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Ego graphs #475

Closed
cj2001 opened this issue Aug 29, 2019 · 8 comments · Fixed by #1365
Closed

[FEA] Ego graphs #475

cj2001 opened this issue Aug 29, 2019 · 8 comments · Fixed by #1365
Assignees
Labels
feature request New feature or request
Milestone

Comments

@cj2001
Copy link

cj2001 commented Aug 29, 2019

Is your feature request related to a problem? Please describe.
It would be very helpful if cugraph could have the ability to calculate the ego graph to a provided radius, as is done in networkx via https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.ego.ego_graph.html?highlight=ego_graph#networkx.generators.ego.ego_graph.

Describe the solution you'd like
Given a graph, the user would provide a series of required and optional values, just like networkx.generators.ego.ego_graph. From there, the method returns a subgraph of neighbors, centered at the given node, within a given radius.

Describe alternatives you've considered
This can be done in networkx but is rather slow, particularly if you want to do it for multiple target nodes. This would lend itself well to parallelization.

Task List
Unknown.

Additional context
None

@cj2001 cj2001 added ? - Needs Triage Need team to review and classify feature request New feature or request labels Aug 29, 2019
@pgera
Copy link
Contributor

pgera commented Sep 3, 2019

If you only need the vertices in the resultant subgraph, you can currently use SSSP's output and filter any distance ranges in cudf. That does only out-neighbours though. If you need in-neighbours as well, you would need the extra step of running SSSP on the transpose graph. If the graph is unweighted, you can replace SSSP with BFS.

@BradReesWork
Copy link
Member

Great request. We will get this added to our schedule.
A work-around would be to run BFS from the egonode, filter down to just the list of 1-hop neighbors. Then use that list, plus the edge node, to perform a Subgraph Extraction.

Since I have a personal bias towards ego-based community detection, getting ego-graphs is something that is on the roadmap.

@cj2001
Copy link
Author

cj2001 commented Sep 3, 2019

Great to hear that others might be interested in this feature! I have something that works just via numpy arrays out to a radius of 2. But sometimes you just need to be able to go beyond that, a la 6 Degrees of Kevin Bacon and want to be able to select in-neighbors, out-, or both.

@inc0
Copy link

inc0 commented Sep 3, 2019

A work-around would be to run BFS from the egonode, filter down to just the list of 1-hop neighbors. Then use that list, plus the edge node, to perform a Subgraph Extraction.

This would work, but (unless I'm mistaken) would perform a lot of unnecessary computation if it searches deeper than our ego goal.

Could we add something like max_depth argument to BFS? That + returning all searched steps would, in effect, give us ego graph.

@BradReesWork BradReesWork added this to P2 in v0.11 Release Sep 3, 2019
@BradReesWork BradReesWork removed the ? - Needs Triage Need team to review and classify label Sep 3, 2019
@BradReesWork BradReesWork removed this from P2 in v0.11 Release Sep 23, 2019
@BradReesWork
Copy link
Member

Added to release 0.11 for design dicussion

@BradReesWork BradReesWork added this to Needs Prioritization in v0.11 Release Sep 26, 2019
@BradReesWork BradReesWork removed this from Needs Prioritization in v0.11 Release Oct 30, 2019
@BradReesWork BradReesWork self-assigned this Nov 25, 2019
@BradReesWork BradReesWork moved this from P2 to Needs Prioritization in v0.12 Release Dec 2, 2019
@BradReesWork BradReesWork removed this from Needs Prioritization in v0.12 Release Jan 13, 2020
@BradReesWork BradReesWork added this to P0 in v0.13 Release via automation Jan 13, 2020
@BradReesWork BradReesWork moved this from P0 to Needs Prioritization in v0.13 Release Jan 15, 2020
@BradReesWork BradReesWork removed this from Needs Prioritization in v0.13 Release Jan 22, 2020
@geoHeil
Copy link

geoHeil commented Mar 13, 2020

Has there been any progress? I would also love to see this feature!

In case it is helpful for someone else:

karate = pd.read_csv('karate.csv', header=None, sep=' ')
karate.columns = ['src', 'dst', 'weights']
karate = cudf.from_pandas(karate)

G_karate = cugraph.DiGraph() 
G_karate.from_cudf_edgelist(karate, source='src', destination='dst', edge_attr='weights', renumber=False)

# let's construct the EGO network of a user.
ego_id = 1
ego_network_level = 2
cudf_res = cugraph.bfs(G_karate, ego_id)
ego_candidates = cudf_res[cudf_res.distance <= ego_network_level].vertex
subG = cugraph.subgraph(G_karate, ego_candidates)

print("Subgraph")
print("\tNumber of Vertices: " + str(subG.number_of_vertices()))
print("\tNumber of Edges:    " + str(subG.number_of_edges()))


edges_g_sub = subG.view_edge_list()
edges_g_sub.head()

@geoHeil
Copy link

geoHeil commented Mar 15, 2020

But: in the example above I get the expected output i.e. that the ego_id node is part of the output. However, for a directed graph (DiGraph):

ego_network_level = 2
cudf_res = cugraph.bfs(G, ego_id)
ego_candidates = cudf_res[cudf_res.distance <= ego_network_level].vertex

## it is still here
print(ego_candidates[ego_candidates.index == ego_id])

subG = cugraph.subgraph(G, ego_candidates)

print("Subgraph")
print("\tNumber of Vertices: " + str(subG.number_of_vertices()))
print("\tNumber of Edges:    " + str(subG.number_of_edges()))

edges_g_sub = subG.view_edge_list()

# but thy is it empty now?
edges_g_sub[(edges_g_sub.src == ego_id) | (edges_g_sub.dst == ego_id)]

an empty set is returned when querying for the ego_id vertex. What is wrong here?

cugraph.subgraph(G, cudf.Series([ego_id])).view_edge_list()

also returns an empty set even though:
image
there obviously are some edges for this ego node.

@BradReesWork BradReesWork added this to P0 in v0.15 Release Apr 2, 2020
@BradReesWork BradReesWork added this to the 0.15 milestone Apr 2, 2020
@geoHeil
Copy link

geoHeil commented Apr 3, 2020

here is an example code.
I had to manually filter the edges - the subgraph operation somehow is NOT including the original ego node, even though it would be included in the candidate list.

karate = pd.read_csv('karate.csv', header=None, sep=' ')
karate.columns = ['src', 'dst', 'weights']
# display(karate.head())
karate = cudf.from_pandas(karate)

G_karate = cugraph.DiGraph() 
G_karate.from_cudf_edgelist(karate, source='src', destination='dst', edge_attr='weights', renumber=False)


# let's construct the EGO network of a user.
ego_id = 1

def make_ego_csv(level):
    cudf_res = cugraph.bfs(G_karate, ego_id)
    ego_candidates = cudf_res[cudf_res.distance <= level].vertex
    print(ego_candidates.shape)
    sub_edges_g = karate[(karate.src.isin(ego_candidates) | karate.dst.isin(ego_candidates))]
    sub_edges_g.columns = ['Source', 'Target', 'Weight']
    print(sub_edges_g.shape)
    if level > 1:
        sub_edges_g.drop(['Weight']).to_csv(f"ego_{level}_graph.csv", index=False)
    else:
        sub_edges_g.to_csv(f"ego_{level}_graph.csv", index=False)
        
    print('**********')

sub_edges_g = make_ego_csv(0)
make_ego_csv(1)
make_ego_csv(2)

@BradReesWork BradReesWork moved this from P0 to P2 in v0.15 Release Apr 28, 2020
@BradReesWork BradReesWork modified the milestones: 0.15, 0.16 Jul 24, 2020
@BradReesWork BradReesWork removed this from P2 in v0.15 Release Jul 24, 2020
@BradReesWork BradReesWork removed this from the 0.16 milestone Sep 22, 2020
@BradReesWork BradReesWork added this to the 0.18 milestone Sep 22, 2020
@BradReesWork BradReesWork removed this from P2 in v0.16 Release Sep 22, 2020
@BradReesWork BradReesWork added this to Needs prioritizing in Feature Planning via automation Sep 22, 2020
@BradReesWork BradReesWork moved this from Needs prioritizing to Future release in Feature Planning Sep 22, 2020
@afender afender mentioned this issue Jan 29, 2021
@rapids-bot rapids-bot bot closed this as completed in #1365 Feb 4, 2021
Feature Planning automation moved this from Future release to Closed Feb 4, 2021
v0.18 Release automation moved this from P0 to Done Feb 4, 2021
rapids-bot bot pushed a commit that referenced this issue Feb 4, 2021
### Description
Let the egonet graph of a node x be the subgraph that includes the neighborhood of x and all edges between them. Here is a basic description (1-hop, single seed) :
- Add center node x to the graph.
- Go through all the neighbors y of this center node x, add edge (x, y) to the graph. 
- For each neighbor y of center node x, go through all the neighbors z of center node x, if there is an edge between y and z in the original graph, add edge (y, z) to our new graph. 

### Proposed solution
Rather than doing custom one/two hops features, we propose a generic k-hops solution leveraging BFS with cutoff to identify neighbors within a given radius. 

In addition to the single source version (matching what's available in Nx), we propose to handle multiple sources (seeds) at once which allows better performances.

This PR also enables a path in the experimental stack for returning multiple graphs (edge list format) from CUDA prims to python without using the legacy classes.

As future work, we want to enable concurrency for the cutoff BFS for each seed. This is dependent of #957 

Close #475

Authors:
  - Alex Fender (@afender)
  - @Iroy30

Approvers:
  - @Iroy30
  - Brad Rees (@BradReesWork)

URL: #1365
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

6 participants