
Compute filtered ranks for evaluating ComplEx #1080

Merged
merged 18 commits into from Mar 19, 2020

Conversation

huonw
Member

@huonw huonw commented Mar 13, 2020

This adds the final piece of the evaluation required to make ComplEx non-@experimental. It extends the ranking procedure implemented in #901 to also compute the "filtered" ranks, which gives the rest of the metrics in Table 2 of the ComplEx paper (http://jmlr.org/proceedings/papers/v48/trouillon16.pdf):

[image: Table 2 of the ComplEx paper]

As a reminder from #901, the knowledge graph link prediction metrics for a test edge E = (s, r, o) connecting nodes s and o are calculated by ranking the prediction for that edge against all modified-source E' = (n, r, o) and modified-object E'' = (s, r, n) edges (for all nodes n in the graph). The "raw" ranks are just the rank of E against the E' and against the E''.

The "filtered" ranks exclude the modified edges E' and E'' that are known, i.e. are in the train, validation or test sets. For instance, if E = (A, x, B) has score 1, but the modified edges (A, x, C) and (A, x, D) have scores 1.3 and 1.5 respectively, E has raw modified-object rank 3. If (A, x, D) is in the train set (or validation or test) but (A, x, C) is not, it is excluded from the filtered ranking, and so E has filtered modified-object rank 2.

This has been a struggle to implement correctly, because it has been difficult to use the right nodes in the right places of the ranking procedure. For modified-object ranking, with E and E'' as above, comparing the score of E against the column of scores of every modified-object edge E'' needs to use o, but computing the known edges similar to E needs to use (s, r, _), not (o, r, _) (the latter is meaningless). (And similarly for modified-subject ranking.) It sounds obvious when written out like this, but it's somewhat difficult to keep track of which entity needs to go where in practice. (@kieranricardo had this key insight.)

The implementation works as follows: start with the raw "greater" matrix, where each column represents a test edge E, with a row for every node in the graph (i.e. row n represents swapping node n into the subject or object), and an element of a column is True if the score of that modified edge is greater than the score of E. For each edge/column, compute the indices of the similar known edges and set those elements to False, leaving only unknown edges with scores greater than E's.
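The masking step described above can be sketched like so; the shapes and the `known_ilocs_per_edge` input are hypothetical illustrations, not the actual stellargraph API:

```python
import numpy as np

def filtered_ranks(greater, known_ilocs_per_edge):
    """Hypothetical sketch: `greater` has shape (num_nodes, num_test_edges),
    where greater[n, i] is True iff swapping node n into test edge i scores
    higher than the true edge; known_ilocs_per_edge[i] lists the node ilocs
    of known (train/validation/test) edges similar to test edge i."""
    filtered = greater.copy()
    for i, known_ilocs in enumerate(known_ilocs_per_edge):
        # known edges can't count against the true edge's rank
        filtered[known_ilocs, i] = False
    # rank = 1 + number of unknown modified edges scoring higher
    return 1 + filtered.sum(axis=0)

greater = np.array([[False, True], [True, False], [True, True]])
# edge 0: nodes 1 and 2 score higher, but node 2 is a known edge -> rank 2
# edge 1: nodes 0 and 2 score higher, neither is known -> rank 3
print(filtered_ranks(greater, [np.array([2]), np.array([], dtype=int)]))  # -> [2 3]
```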

See: #1060


@kieranricardo kieranricardo self-requested a review March 16, 2020 02:08
@huonw huonw changed the base branch from feature/862-complex-notebook to develop March 16, 2020 22:31
@codeclimate

codeclimate bot commented Mar 16, 2020

Code Climate has analyzed commit 667c344 and detected 4 issues on this pull request.

Here's the issue category breakdown:

Security: 4


@huonw huonw marked this pull request as ready for review March 17, 2020 02:44
Contributor

@kieranricardo kieranricardo left a comment


Woooo, this is looking really good! Great to see the results for WN18 matching the paper!! Just a couple of minor things.

The FB15K numbers still seem a little low, but I don't think this is a blocking concern seeing as the WN18 results are matching - any idea what's going on?

Also not a blocking concern, just an idea for future work: I think users with a powerful CPU and/or GPU might notice a slowdown when using rank_edges_against_all_nodes, because the numpy ops are single-threaded. We might be able to speed this up by using tensorflow ops/layers/models for the bulk ops. What do you think?

stellargraph/layer/knowledge_graph.py
```diff
@@ -175,27 +172,66 @@ def rank_edges_against_all_nodes(model, test_data, known_edges_graph):

     num_nodes = known_edges_graph.number_of_nodes()

-    def ranks(pred, true_ilocs):
-        batch_size = len(true_ilocs)
+    def ranks(
```
Contributor


Could you add a docstring for ranks?

Also, I think ranks can be pulled out of ComplEx now - it doesn't seem to have anything ComplEx-specific, and I think the only outer-scope variables it uses are knowledge_graph and num_nodes.

Member Author


I'm generally resistant to adding too much documentation for scoped and private functions, because maintaining documentation is error-prone and it tends to drift. However, I will move it out of ComplEx to _ranks_from_score_columns (or something) and give it a docstring.

@huonw
Member Author

huonw commented Mar 17, 2020

The FB15K numbers still seem a little low, but I don't think this is a blocking concern seeing as the WN18 results are matching - any idea what's going on?

Unclear. I tried a few different parameters (e.g. increasing the embedding dimension from 150 to 200) and tweaked the learning rate, and they made a bit of difference (e.g. I kept the dimension at 200), but I got bored of waiting for this hyperparameter optimisation work. If it's deemed important, I can put more effort into it/write a script, or even just switch to the slower Adagrad optimiser. (One potential problem is that our regularisation doesn't quite match theirs, because Keras's BinaryCrossentropy takes the mean over the batch, but the paper and https://github.com/ttrouill/complex don't, so one needs to scale the L2 regularisation lambda appropriately.)
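For what it's worth, the lambda rescaling mentioned there can be sketched as follows; the numbers are hypothetical, and this is an illustration of the arithmetic rather than stellargraph code:

```python
# If the paper minimises  sum_i bce_i + lam * ||w||^2  and Keras's
# BinaryCrossentropy computes  mean_i bce_i, dividing the paper's whole
# objective by batch_size gives  mean_i bce_i + (lam / batch_size) * ||w||^2,
# which has the same minimiser. So a mean-reduced loss needs lambda scaled
# down by the batch size to match the paper's regularisation strength.
batch_size = 128      # hypothetical
paper_lambda = 0.01   # hypothetical
keras_lambda = paper_lambda / batch_size  # value to pair with a mean-reduced loss
```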

Also not a blocking concern, just an idea for future work: I think users with a powerful CPU and/or GPU might notice a slowdown when using rank_edges_against_all_nodes, because the numpy ops are single-threaded. We might be able to speed this up by using tensorflow ops/layers/models for the bulk ops. What do you think?

That's not a bad idea, but I'd prefer to do it as future work, to keep this PR focused on the filtered-ranks computation.

Contributor

@kieranricardo kieranricardo left a comment


looks good!

I don't think you need to worry about matching the FB15K results exactly - matching all the results from every paper would be extremely time-consuming, and the WN18 results are enough for me at least. Whadya reckon @PantelisElinas?

@kieranricardo
Contributor

@huonw oh one more thing, you should update the changelog because ComplEx is a new non-experimental algorithm 🎉

@huonw
Member Author

huonw commented Mar 18, 2020

I haven't been bothering to update the CHANGELOG patch-by-patch because I find it just results in merge conflicts. Since I'm "slacking off" in this respect, I instead make sure to go over the git log before each release, like #930.

@PantelisElinas
Contributor

@huonw just letting you know that I won't be able to review this until next week. So, if you are satisfied with the one review, then go ahead and merge.

P.

@huonw
Member Author

huonw commented Mar 18, 2020

Thanks for letting me know 👍

I'll merge this with just @kieranricardo's review.

@huonw huonw merged commit bc0388a into develop Mar 19, 2020
@huonw huonw deleted the feature/1060-complex-filtered branch March 19, 2020 04:37