Add CTDNE (time-based random walk) demo #1056
Conversation
Check out this pull request on ReviewNB. You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.

Code Climate has analyzed commit 22da378 and detected 0 issues on this pull request. View more on Code Climate.
It'd be good to land this with it being checked by CI, rather than having to add another FIXME. I'd be happy to see the dataset added here or in a separate PR (that lands first).
Nice.
Given the results of static and dynamic are so close, do you have a sense of the variance of metrics like the ROC AUC? As in, does it seem like temporal walks are truly outperforming static ones?
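One cheap way to get a feel for that variance (not code from the notebook; a sketch with hypothetical names) is to bootstrap the classifier's scores and look at the spread of the resulting AUC values:

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """Probability that a positive outranks a negative (ties count half)."""
    pos = np.asarray(scores_pos)[:, None]
    neg = np.asarray(scores_neg)[None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()

def bootstrap_auc_std(scores_pos, scores_neg, n_boot=200, seed=0):
    """Estimate the standard deviation of the AUC by resampling the
    positive and negative score sets with replacement."""
    rng = np.random.default_rng(seed)
    aucs = [
        roc_auc(rng.choice(scores_pos, len(scores_pos)),
                rng.choice(scores_neg, len(scores_neg)))
        for _ in range(n_boot)
    ]
    return float(np.std(aucs))
```

If the temporal-vs-static AUC gap is smaller than a couple of these standard deviations, the two methods are plausibly indistinguishable on this dataset.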
" return pos, neg\n", | ||
"\n", | ||
"\n", | ||
"def sample_negative_examples(g, positive_examples):\n", |
I'm sure you realise this, but it's lucky this is a graph with only 151 nodes, since this function is O(n^2) 😄
I guess an alternative would be to just do the random sampling in a while loop and each time check that the pair doesn't exist in the graph - do you think that would be better?

Looks like our `EdgeSplitter._sample_negative_examples_global` has a slightly different approach too, but I thought it was simple/clear enough to just do it in the notebook instead.
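The while-loop alternative might look roughly like this (a minimal sketch, assuming the graph's edges are available as a set of node-pair tuples; all names here are hypothetical, not the notebook's):

```python
import random

def sample_negatives_rejection(nodes, existing_edges, n, seed=0):
    """Rejection-sample n node pairs that are not edges of the graph.

    `existing_edges` is a set of (src, dst) tuples; pairs are treated
    as undirected, so both orientations are checked.
    """
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n:
        u, v = rng.sample(nodes, 2)
        if (u, v) in existing_edges or (v, u) in existing_edges:
            continue  # resample instead of enumerating all O(n^2) pairs
        negatives.add((u, v))
    return list(negatives)
```

For a sparse graph this terminates quickly in expectation, since most random pairs are non-edges; it only degrades when the graph is nearly complete.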
My vote would be to use our code (i.e. `EdgeSplitter`) if we can; I think for this code it could be done by creating a graph from `edges_other` and then using `EdgeSplitter`. However, maybe that's awkward and not worth it...

Oh, and maybe it's not correct to use `EdgeSplitter`: it seems like this code considers everything that's outside `pos` a negative example, i.e. `neg` might include some edges that are in `pos_test`. Is this on purpose? Does this match what the paper does, or is this a point where it's too vague to tell? If it doesn't match (whether we know it or not), I wonder if this might explain the dramatic improvement we're seeing over the paper's numbers.
In section 3 of the paper:

> The remaining 25% are considered as positive links and we sample an equal number of negative edges randomly

is the only reference to how this is done. I would expect the scores to improve further if negative edges were forced not to overlap with any of the positives at all, so in this case I'm not sure it explains the improvement in our code.
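One quick way to settle how much this matters in practice would be to count the collisions directly (a hypothetical helper, not from the notebook; it assumes `neg` and `pos_test` are lists of node-pair tuples):

```python
def negative_overlap(neg, pos_test):
    """Count sampled negatives that collide with held-out positive edges,
    treating edges as undirected."""
    held_out = {frozenset(e) for e in pos_test}
    return sum(frozenset(e) in held_out for e in neg)
```

If the count is zero or tiny relative to `len(neg)`, the overlap question can't explain a large score difference.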
"source": [ | ||
"temporal_node_embeddings = temporal_model.wv.vectors\n", | ||
"static_node_embeddings = static_model.wv.vectors\n", | ||
"plot_tsne(\"TSNE visualisation of temporal node embeddings\", temporal_node_embeddings)\n", |
Are there no labels we can use to colour these plots and make them more interpretable (rather than just "oh, yeah, they seem grouped")?
Yeah, I guess I could omit this part; I found it slightly useful to see whether the embeddings "looked informative" on the surface while I was working on it.
I think it's reasonable to include since, as you say, it gives a hint about whether the embeddings are picking up signal in some form. I'm just wondering if there's a way we can augment them further... it seems not easily, from what's in the dataset, though.
Theoretically, the temporal embeddings might cluster together nodes that have edges with similar time values and separate those with very different time values. I can try to assign colours to nodes based on something like the average time across all of a node's incident edges, but there's no guarantee that it will actually look meaningful.
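Computing that per-node average is a small amount of pandas (a sketch assuming an edge DataFrame with `source`, `target` and `time` columns, as used elsewhere in this PR; the function name is made up):

```python
import pandas as pd

def mean_incident_edge_time(edges):
    """Average the `time` value over all edges incident to each node.

    Returns a Series indexed by node, which could be fed straight into
    the `c=` argument of a scatter plot via a colormap.
    """
    incident = pd.concat(
        [
            edges[["source", "time"]].rename(columns={"source": "node"}),
            edges[["target", "time"]].rename(columns={"target": "node"}),
        ]
    )
    return incident.groupby("node")["time"].mean()
```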
Ah, yeah. Interesting idea, although maybe too much code to be worth it, unless you think it is going to be enough of an improvement?
Yeah, I'm not particularly convinced. It kinda seems like similar-age things (assuming that's what the colors mean) are being clustered better, but it's hard to say for sure.
"* Node2Vec: 0.759 \n", | ||
"* CTDNE: 0.777\n", | ||
"\n", | ||
"Below are a set of helper functions we can use to train and evaluate a link prediction classifier. The rest of the notebook will use the binary operator defined in the cell below. Available operators in the demo are:\n", |
I'm not personally a huge fan of adding unused code to notebooks: it's easy for the code to just be a distraction from the teaching, and it's not tested and so may break. We could delete it, but that's a bit less nice than exploring multiple styles, especially if they give very different results.
Do you think we could at least run the temporal embeddings ROC AUC evaluation with all of them, but maybe still use only L2 for static and the t-SNE plot?
Some of the operators ended up causing convergence warnings while training the classifier (although the scores still looked fine), so that's partly why I didn't go through all of them - should I just remove them?
Can you reduce it to only two that converge quickly? If we do that, run them both unconditionally for temporal random walks, and mention the other possibilities in the text (like "any function that combines the features can work, such as the absolute (L1) distance between elements, or the average, or the product (Hadamard)"), that might be a good balance between being generic (demonstrating how to use any function) and not being an overload.
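The operators mentioned above are all one-liners on the two endpoint embedding vectors; a sketch (names are illustrative, not the demo's identifiers):

```python
import numpy as np

# `u` and `v` are the embedding vectors of an edge's two endpoints.
def operator_l1(u, v):
    return np.abs(u - v)   # element-wise absolute difference

def operator_l2(u, v):
    return (u - v) ** 2    # element-wise squared difference

def operator_hadamard(u, v):
    return u * v           # element-wise product

def operator_avg(u, v):
    return (u + v) / 2.0   # element-wise average
```

Each produces a feature vector of the same dimension as the embeddings, so swapping operators doesn't change anything downstream in the classifier.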
Looks like it would be `l1` and `l2` - I'll switch to just using both of those unconditionally.
The unconditional switch ended up looking a lot more verbose, and I wasn't such a fan of introducing even more helper functions in the notebook... so I ended up just going with `l2`. The scores end up looking pretty much identical for `l1` and `l2` anyway.
Interestingly, the AUCs for the temporal and static methods are almost the same, over and above both being better than the paper's results.
Yeah, I tried to figure out for a while whether the way I've set it up is cheating in some way, but decided to move forward eventually... so let me know if you spot something!
" return pos, neg\n", | ||
"\n", | ||
"\n", | ||
"def sample_negative_examples(g, positive_examples):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My vote would be to use our code (i.e. EdgeSplitter
) if we can, I think for this code it could be done by creating a graph from edges_other
and then using EdgeSplitter
. However, maybe that's awkward and not worth it...
Oh, and maybe it's not correct to use EdgeSplitter
: it seems like this code considers everything that's outside pos
a negative example, i.e. neg
might include some edges that are in pos_test
. Is this on purpose? Does this match what the paper does, or is this a point where it's too vague to tell? If it doesn't match (whether we know it or not), I wonder if this might explain the dramatic improvement we're seeing over the paper's numbers.
"\n", | ||
" plt.figure(figsize=(7, 7))\n", | ||
" plt.title(title)\n", | ||
" if y is not None:\n", |
The colour choice here is a bit suboptimal, because red/green colour-blindness is the most common kind. Based on https://www.color-blindness.com/coblis-color-blindness-simulator/ it seems that this particular choice of red and green is worst for protanopia and deuteranopia, which apparently affect 1% of males (and up to 8% of males of northwestern European descent: https://en.wikipedia.org/wiki/Color_blindness#Classification ), e.g. the two colours are only just distinguishable.
One way to solve this would be to choose something other than `r` and `g`, but I personally think a better way would be to simplify the code, e.g. replace these `if` and `else` statements with:

```python
alpha = 0.7 if y is None else 0.5
plt.scatter(x_t[:, 0], x_t[:, 1], c=y, cmap="jet", alpha=alpha)
```
`jet` is a poor general-purpose colormap, but it's the most common one we use in other demos (i.e. if we're going to change it, we can change it everywhere), and it's ~ok for these binary/categorical plots (it fails for achromatopsia, though). https://matplotlib.org/tutorials/colors/colormaps.html has more options, but using many of the best ones (e.g. viridis) directly ends up with (translucent) yellow-on-white as one of the colours, which is bad for everyone.
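Another low-code option (a sketch, not what the demo uses) is an explicit two-colour `ListedColormap` with a blue/orange pairing, which stays distinguishable under the common red-green colour-vision deficiencies:

```python
from matplotlib.colors import ListedColormap

# Hypothetical alternative to "jet": an explicit two-colour categorical map.
# Blue/orange remains distinguishable for protanopia and deuteranopia.
binary_cmap = ListedColormap(["#1f77b4", "#ff7f0e"])
```

This would then be passed as `cmap=binary_cmap` in the `plt.scatter` call above, with no other code changes.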
Ah, thanks for providing those links! `jet` looks very monochrome-y to me... maybe I could use the colours being used in the `plot_history` util?

```python
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
color_train = colors[0]
color_validation = colors[1]
```
That's an option, but I don't know of a good way to specify that conveniently without a chunk of manual code. I personally think the priority here should be obvious code, meaning we shouldn't have too much user-facing tweaking code, rather than perfect "report-ready" output.

If the manual code isn't too bad, or there's another way to get better colours (like using a non-`jet` colormap), I'm all for it, though.
I've just gone with `jet` for now, but it also means a legend is more necessary, since the colours don't match the general expectation (?) of green being true and red being false, so it does end up with slightly more code.
Looks good!
Just a few things that stood out while reading the notebook:
Maybe this portion,

> It's good to verify that the data structures we've created so far from the raw data look reasonable. StellarGraph.info is a useful method for inspecting the graph we've created

(along with the accompanying code), can come after creating the StellarGraph object above, followed by the positive and negative links methods and results. And we can also check that the number of link examples corresponds to the train/test subset values we defined earlier.
> For temporal walks, this is the maximum length of a walk, since walks may end up being shorter when there are not enough valid temporal edges to traverse.

"valid temporal edges" sounds vague to me; an alternative way of saying it would be "not enough time-respecting edges to traverse".
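For readers of this thread, "time-respecting" means each step must use an edge whose timestamp is no earlier than the previously traversed edge's, which is why walks can end before reaching the maximum length. A toy sketch of the idea (hypothetical adjacency format; not the StellarGraph implementation):

```python
import random

def temporal_walk(adjacency, start, max_length, seed=0):
    """Random walk that only follows time-respecting edges.

    `adjacency` maps node -> list of (neighbour, time). Each step is
    restricted to edges with time >= the previously used edge's time,
    so the walk may stop before reaching `max_length`.
    """
    rng = random.Random(seed)
    walk, current_time = [start], float("-inf")
    while len(walk) < max_length:
        choices = [(n, t) for n, t in adjacency.get(walk[-1], []) if t >= current_time]
        if not choices:
            break  # no time-respecting edge left: the walk ends early
        nxt, current_time = rng.choice(choices)
        walk.append(nxt)
    return walk
```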
> indicate that they are uninformative."

The opening quotation mark is missing here, and the closing one should come after the full stop.
For the positive and negative link classifier, it would have been visually appealing if the positive link embeddings were more compact, while the negatives could be all over the place, since they are randomly generated.
The demo has come together quite nicely!
Now that https://github.com//pull/1033 is merged, this can potentially be slightly simplified, although maybe we want to be sure we've got all the nodes using the main graph? (Also, the `source_column` and `target_column` parameters are being set to their defaults and so don't seem necessary?)

```python
graph = StellarGraph(edges=edges_graph, edge_weight_column="time")
```
Hmm, it seems like the reduced graph actually has 2 fewer nodes than the full graph:

```
Number of nodes in full graph: 151
Number of nodes in training graph: 149
```

This is handled already for temporal walks by the function that grabs the embedding with `try: ... except KeyError`, but I'd have to add the same logic for static walks too. Alternatively:

- keep passing in `nodes=pd.DataFrame(index=full_graph.nodes())` when constructing the graph
- update the dataset loader to make sure every node still exists in the reduced graph
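The first option can also be done straight from the raw edge list, so it doesn't depend on having already built the full graph (a pandas sketch using the `source`/`target` column names from this thread; the helper name is made up):

```python
import pandas as pd

def all_nodes(full_edges):
    """Build a feature-less node DataFrame covering every node that appears
    anywhere in the *full* edge list, so nodes that only occur in the
    held-out 25% of edges still exist in the reduced graph."""
    node_ids = pd.unique(full_edges[["source", "target"]].values.ravel())
    return pd.DataFrame(index=node_ids)
```

The result would then be passed as the `nodes=` argument when constructing the StellarGraph from the reduced edge set.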
@kjun9 what's the reduced graph? I might have missed something.
The reduced graph is from taking the first 75% of edges from the full dataset as the "graph", and the remaining 25% edges for training and test examples
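That split described above can be sketched in a couple of lines (hypothetical helper; assumes an edge DataFrame with a `time` column):

```python
import pandas as pd

def temporal_split(edges, train_fraction=0.75):
    """Sort edges by time (ascending) and take the first `train_fraction`
    as the graph's edges; the rest become positive link-prediction examples."""
    edges = edges.sort_values("time").reset_index(drop=True)
    n_train = int(len(edges) * train_fraction)
    return edges.iloc[:n_train], edges.iloc[n_train:]
```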
Is this how they are doing it in the paper?
But yeah this would make sense that some nodes might be left out with the edge splitting.
Yep, from the paper:

> To generate a set of labeled examples for link prediction, we first sort the edges in each graph by time (ascending) and use the first 75% for representation learning. The remaining 25% are considered as positive links and we sample an equal number of negative edges randomly
Another detail that the paper brushes under the rug...

Your alternative sounds more proper to me, with a note in the demo that the paper's experimental section does not explain how they handle the missing nodes after the split, and that this is how you are taking care of it.
> keep passing in `nodes=pd.DataFrame(index=full_graph.nodes())` when constructing the graph

Given that reducing the code would likely break it, going with this option and keeping it as is seems fine to me.
Thanks everyone for the reviews! Will merge once CI passes, and I'll work on removing the experimental tag separately.
See #827
This is running on CI with reduced numbers for:
@habiba-h I kept your approach of comparing biased random walks to temporal walks - there's barely any difference in the performance, though 🤷‍♂️
Also added the link prediction utilities directly in the demo - it ended up looking fine I think, so #1020 is closed