Vectorize node_type
#1452
Code Climate has analyzed commit 17c6372 and detected 2 issues on this pull request. Here's the issue category breakdown:
View more on Code Climate.
For the HinSAGE benchmark, could you include the rest of the support code so that someone else can run it? (Also, if you use `%timeit -n 3 -r 5` or similar instead of `%time _ =`, we'll get a better idea of the variance too.)
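Outside IPython, roughly the same measurement as `%timeit -n 3 -r 5` can be taken with the standard-library `timeit.repeat`: 3 calls per run, 5 runs, then the mean and spread of the per-call time across runs. This is a sketch, not the benchmark from this PR; the `benchmark` helper is hypothetical and the workload is a placeholder.

```python
import statistics
import timeit


def benchmark(stmt, number=3, repeat=5):
    """Hypothetical helper, roughly equivalent to IPython's `%timeit -n 3 -r 5 stmt`.

    Runs `stmt` `number` times per run, for `repeat` runs, and returns the
    mean and standard deviation (in seconds) of the per-call time across runs.
    """
    totals = timeit.repeat(stmt, number=number, repeat=repeat)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


if __name__ == "__main__":
    # Placeholder workload; swap in the call being benchmarked.
    mean, std = benchmark(lambda: sum(range(100_000)))
    print(f"{mean * 1e3:.2f} ms ± {std * 1e3:.2f} ms per loop (5 runs, 3 loops each)")
```

Keeping any setup (building the graph, fetching node lists) outside the timed callable stops it from inflating the numbers.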
stellargraph/core/graph.py
@@ -751,6 +751,18 @@ def node_type(self, node, use_ilocs=False):
assert len(type_sequence) == 1
return type_sequence[0]

def vectorized_node_type(self, node_ilocs):
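A minimal sketch of what a bulk node-type lookup can look like, assuming each node has an integer iloc and a per-node integer type code held in a NumPy array. `TypedNodes` and its attributes are hypothetical stand-ins, not StellarGraph's internal storage; the point is that one fancy-indexing call replaces a Python-level loop. The usage at the end shows the kind of per-step type filtering a metapath walk needs.

```python
import numpy as np


class TypedNodes:
    """Hypothetical minimal node store (not StellarGraph's API).

    Assumes nodes are identified by integer ilocs 0..N-1 and that each node's
    type is stored as an integer code indexing into `type_names`.
    """

    def __init__(self, type_names, node_type_codes):
        self.type_names = np.asarray(type_names)            # type code -> type name
        self.node_type_codes = np.asarray(node_type_codes)  # node iloc -> type code

    def node_type(self, node_iloc):
        # Scalar lookup: one Python-level call per node.
        return self.type_names[self.node_type_codes[node_iloc]]

    def node_type_vectorized(self, node_ilocs):
        # Bulk lookup: one fancy-indexing pass over the whole batch of ilocs.
        return self.type_names[self.node_type_codes[np.asarray(node_ilocs)]]


# Usage: keep only the neighbours whose type matches the next metapath step.
nodes = TypedNodes(["movie", "user"], [1, 1, 0, 0, 1])
neighbours = np.array([0, 2, 3, 4])
neighbour_types = nodes.node_type_vectorized(neighbours)
movie_neighbours = neighbours[neighbour_types == "movie"]
print(movie_neighbours)  # [2 3]
```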
What about extending `node_type` to work on an array?
I felt like this was cleaner than doubling the code paths in `node_type` again, especially for something that'll mostly be a fast internal method. What do you think?
I think it's a relatively controlled doubling, because it can likely be done by conditionalising the `nodes = [node]` and `assert ... [0]`? In addition, the function is tiny.
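One way the suggested conditionalisation could look, sketched on a hypothetical `ToyNodeStore` class rather than StellarGraph's actual `node_type`: a scalar is wrapped as `[node]` and unwrapped with `[0]`, while an array goes straight through the same lookup. The storage layout (a per-node integer type code) is an assumption for illustration.

```python
import numpy as np


class ToyNodeStore:
    """Hypothetical node store used only to illustrate the idea (not StellarGraph's API)."""

    def __init__(self, type_names, node_type_codes):
        self.type_names = np.asarray(type_names)            # type code -> type name
        self.node_type_codes = np.asarray(node_type_codes)  # node iloc -> type code

    def node_type(self, node):
        """Return one type name for a scalar iloc, or an array of names for an array of ilocs."""
        is_scalar = np.ndim(node) == 0
        # Conditionalised `nodes = [node]`: only wrap genuine scalars.
        nodes = [node] if is_scalar else node
        type_sequence = self.type_names[self.node_type_codes[np.asarray(nodes)]]
        if is_scalar:
            # Conditionalised `assert ... [0]`: unwrap only in the scalar case.
            assert len(type_sequence) == 1
            return type_sequence[0]
        return type_sequence


store = ToyNodeStore(["movie", "user"], [1, 0, 1])
print(store.node_type(1))          # movie
print(store.node_type([0, 1, 2]))  # ['user' 'movie' 'user']
```

The trade-off is the one discussed here: a single public entry point, at the cost of a return type that depends on the input's shape.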
My concern is that it's a clunky name (e.g. it's long, we don't use "vectorised" anywhere else in our API, and it doesn't sort next to the other `node_type` methods; the last point can be addressed by `node_type_vectorized`), and that I could easily see a user calling this directly, since bulk queries are always great.
These APIs are nice and composable, and it's great how easy some of this stuff is.
This PR adds a `vectorized_node_type` function to `StellarGraph` and uses it in metapath + `HinSAGENodeGenerator`.

PR vs develop for metapath:
Running:
For the MovieLens graphs, this yields a 400x speedup for `HinSAGENodeGenerator.flow`.
Edit: I used `timeit` and pulled the `G.nodes()` call out of the timing.
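To get a feel for where a speedup like this comes from without the MovieLens data, the sketch below times a per-node Python loop against a single NumPy fancy-indexing lookup on synthetic data, keeping setup (the analogue of the `G.nodes()` call) outside the timed statements. The arrays, sizes and function names are assumptions for illustration; the printed ratio depends on the machine and is not expected to reproduce the 400x figure.

```python
import timeit

import numpy as np

# Synthetic setup, built once outside the timed code (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
num_nodes = 100_000
type_names = np.array(["movie", "user"])
type_codes = rng.integers(0, len(type_names), size=num_nodes)  # iloc -> type code
batch = rng.integers(0, num_nodes, size=10_000)  # ilocs whose types we need


def per_node_lookup(ilocs):
    # One Python-level lookup per node, like calling a scalar node_type in a loop.
    return [type_names[type_codes[i]] for i in ilocs]


def vectorized_lookup(ilocs):
    # A single NumPy fancy-indexing pass for the whole batch.
    return type_names[type_codes[ilocs]]


# Both approaches must agree before comparing speed.
assert list(vectorized_lookup(batch)) == per_node_lookup(batch)

# Best-of-5 runs, 3 calls per run, reported per call.
loop_t = min(timeit.repeat(lambda: per_node_lookup(batch), number=3, repeat=5)) / 3
vec_t = min(timeit.repeat(lambda: vectorized_lookup(batch), number=3, repeat=5)) / 3
print(f"loop: {loop_t * 1e3:.2f} ms, vectorized: {vec_t * 1e3:.3f} ms, ratio: {loop_t / vec_t:.0f}x")
```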