Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize node_type #1452

Merged
merged 3 commits into from May 7, 2020
Merged

Vectorize node_type #1452

merged 3 commits into from May 7, 2020

Conversation

kieranricardo
Copy link
Contributor

@kieranricardo kieranricardo commented May 4, 2020

This PR adds a vectorized_node_type function to StellarGraph and uses this in metapath + HinSAGENodeGenerator

PR vs develop for metapath:

------------------------------------------------------------------------------------------------------------ benchmark: 2 tests ------------------------------------------------------------------------------------------------------------
Name (time in us)                                                  Min                   Max                  Mean              StdDev                Median                 IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_uniformrandommetapathwalk (0002_PR)            933.8500 (1.0)      4,189.5890 (1.0)      1,307.5912 (1.0)      233.8784 (1.0)      1,259.2510 (1.0)      166.0735 (1.0)         78;52  764.7650 (1.0)         631           1
test_benchmark_uniformrandommetapathwalk (0001_develop)     1,628.2610 (1.74)     5,468.9340 (1.31)     2,167.2808 (1.66)     380.1008 (1.63)     2,081.3770 (1.65)     300.8260 (1.81)        56;16  461.4077 (0.60)        343           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running:

from stellargraph.datasets import MovieLens
from stellargraph.mapper import HinSAGENodeGenerator

G, _ = MovieLens().load()
generator = HinSAGENodeGenerator(G, 50, [8, 4], head_node_type="user")
nodes = G.nodes()[1682:]
%timeit -n 3 -r 5 generator.flow(nodes)

For the movielens graphs yields a 400x speedup for HinSAGENodeGenerator.flow:

  • develop: 108 ms ± 1.86 ms
  • this PR: 251 µs ± 26.6 µs

edit: I used timeit and pulled the G.nodes() call out of the timing

@codeclimate
Copy link

codeclimate bot commented May 4, 2020

Code Climate has analyzed commit 17c6372 and detected 2 issues on this pull request.

Here's the issue category breakdown:

Category Count
Security 2

View more on Code Climate.

@kieranricardo kieranricardo changed the title Vecotrize node_type Vectorize node_type May 4, 2020
@kieranricardo kieranricardo marked this pull request as ready for review May 4, 2020 00:45
@kieranricardo kieranricardo requested review from huonw and kjun9 and removed request for huonw May 4, 2020 00:46
Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the HinSAGE benchmark, could you include the rest of the support code, so that someone else can run it? (Also, if you use %timeit -n 3 -r 5 or similar, instead of %time _ = we'll get a better idea of the variance too.)

@@ -751,6 +751,18 @@ def node_type(self, node, use_ilocs=False):
assert len(type_sequence) == 1
return type_sequence[0]

def vectorized_node_type(self, node_ilocs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about extending node_type to work on an array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt like this was cleaner compared to doubling the code paths in node_type again, especially for something that'll be mostly a fast internal method. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a relatively controlled doubling, because it can likely be done by conditionalising the nodes = [node] and assert ... [0]? In addition, the function is tiny.

My concern is that it's a clunky name (e.g. it's long, we don't use "vectorised" anywhere else in our API, and doesn't sort next to the other node_type methods; the last one can be addressed by node_type_vectorized) and that I could easily see a user using this directly, since bulk queries are always great.

@kieranricardo
Copy link
Contributor Author

using timeit -n 3 -r 5 yields:

  • develop: 108 ms ± 1.86 ms
  • this PR: 251 µs ± 26.6 µs

@kieranricardo kieranricardo requested a review from huonw May 4, 2020 05:17
Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These APIs are nice and composeable, and it's great how easy some of this stuff is.

@kieranricardo kieranricardo merged commit edcb8d9 into develop May 7, 2020
@kieranricardo kieranricardo deleted the feature/1451-vec-node-types branch May 7, 2020 05:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants