Compute filtered ranks for evaluating ComplEx #1080
Conversation
Code Climate has analyzed commit 667c344 and detected 4 issues on this pull request. Here's the issue category breakdown:
Woooo this is looking really good! Great to see the results for WN18 matching the paper!! Just a couple of minor things.
The FB15K numbers still seem a little low, but I don't think this is a blocking concern seeing as the WN18 results are matching - any idea what's going on?
Also not a blocking concern, just an idea for future work: I think users with a powerful CPU and/or GPU might notice a slowdown when using `rank_edges_against_all_nodes`, because the numpy ops are single-threaded. We might be able to speed this up using tensorflow ops/layers/models for the bulk ops. What do you think?
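As a rough sketch of the idea (not this PR's code; the function names here are made up for illustration), the bulk "is this modified edge's score greater?" comparison could be expressed with TensorFlow ops, which can dispatch to TensorFlow's intra-op thread pool or a GPU, whereas plain NumPy elementwise comparison runs on one thread:

```python
import numpy as np
import tensorflow as tf

def greater_matrix_np(all_scores, true_scores):
    # all_scores: (num_nodes, batch) scores of modified edges;
    # true_scores: (batch,) score of each true test edge.
    # Single-threaded NumPy elementwise comparison with broadcasting.
    return all_scores > true_scores[np.newaxis, :]

def greater_matrix_tf(all_scores, true_scores):
    # The same comparison as a TensorFlow op; for large matrices this can
    # use multiple threads (or an accelerator), then convert back to NumPy.
    return tf.greater(all_scores, true_scores[tf.newaxis, :]).numpy()

scores = np.array([[1.3, 0.2], [1.5, 0.1], [1.0, 0.9]])
true = np.array([1.0, 0.9])

# Both formulations agree on the result.
assert (greater_matrix_np(scores, true) == greater_matrix_tf(scores, true)).all()
```

Whether this is actually faster would need benchmarking; the crossover point depends on matrix size and hardware.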
```diff
@@ -175,27 +172,66 @@ def rank_edges_against_all_nodes(model, test_data, known_edges_graph):

     num_nodes = known_edges_graph.number_of_nodes()

-    def ranks(pred, true_ilocs):
-        batch_size = len(true_ilocs)
+    def ranks(
```
Could you add a docstring for `ranks`?
Also I think `ranks` can be pulled out of `ComplEx` now - it doesn't seem to have anything `ComplEx`-specific, and I think the only outer-scope variables it uses are `knowledge_graph` and `num_nodes`.
I'm generally resistant to adding too much documentation for scoped and private functions, because maintaining documentation is error-prone and it just drifts. However, I will move it out of `ComplEx` to `_ranks_from_score_columns` (or something) and give it a docstring.
Unclear. I tried a few different parameters (e.g. increasing the embedding dimension from 150 to 200) and tweaked the learning rate; they made a bit of difference (e.g. I kept the dimension at 200), but I got bored of waiting to do this hyperparameter optimisation work. If it's deemed important, I can put more effort into it/write a script, or even just switch to the slower Adagrad optimiser (one potential problem is our regularisation doesn't quite match theirs, because Keras's
That's not a bad idea, but I'd prefer to do it as future work, to keep this PR focused on the filtered-ranks computation.
looks good!
I don't think you need to worry about matching the FB15K results exactly - matching all the results from every paper would be extremely time consuming and the WN18 results are enough for me at least. Whadya reckon @PantelisElinas?
@huonw oh one more thing, you should update the changelog because
I haven't been bothering to update the CHANGELOG patch-by-patch because I find it just results in merge conflicts. Since I'm "slacking off" in this respect, I've instead made sure to go over the git log before each release, like #930.
@huonw just letting you know that I won't be able to review this until next week. So, if you are satisfied with the one review, then go ahead and merge. P.
Thanks for letting me know 👍 I'll merge this with just @kieranricardo's review.
This adds the final piece of the evaluation required to make `ComplEx` non-`@experimental`. It extends the ranking procedure performed in #901 to also compute the "filtered" ranks. This gives the rest of the metrics in Table 2 of the ComplEx paper (http://jmlr.org/proceedings/papers/v48/trouillon16.pdf).

As a reminder from #901, the knowledge graph link prediction metrics for a test edge `E = (s, r, o)` connecting nodes `s` and `o` are calculated by ranking the prediction for that edge against all modified-subject `E' = (n, r, o)` and modified-object `E'' = (s, r, n)` edges (for all nodes `n` in the graph). The "raw" ranks are just the ranks of `E` against the `E'` and against the `E''`.

The "filtered" ranks exclude the modified edges `E'` and `E''` that are known, i.e. are in the train, validation or test sets. For instance, if `E = (A, x, B)` has score 1, but the modified edges `(A, x, C)` and `(A, x, D)` have scores 1.3 and 1.5 respectively, `E` has raw modified-object rank 3. If `(A, x, D)` is in the train set (or validation or test) but `(A, x, C)` is not, it is excluded from the filtered ranking, and so `E`
has filtered modified-object rank 2.

This has been a struggle to implement correctly, because it has been difficult to use the right nodes in the right places of the ranking procedure. For modified-object ranking, with `E` and `E''` as above, calculating the score of `E` in the column of scores of every modified-object edge `E''` needs to use `o`, but calculating the known edges similar to `E` needs to use `(s, r, _)`, not `(o, r, _)` (the latter is meaningless). (And similarly for modified-subject ranking.) It sounds obvious when written out like this, but it's somewhat difficult to keep track of which entity needs to go where in practice. (@kieranricardo had this key insight.)

The implementation works by starting with the raw `greater` matrix, where each column represents a test edge `E`, with a row for every node in the graph (i.e. row `n` represents swapping node `n` into the subject or object), and the elements of a column are `True` if the score of that modified edge is greater than the score of `E`. For each edge/column, compute the indices of the similar known edges and set those indices to `False`, leaving only unknown edges with scores greater than `E`'s.

See: #1060
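To make the `greater`-matrix procedure concrete, here is a minimal NumPy sketch of the worked example above (toy data and invented variable names, not the PR's actual code): `E = (A, x, B)` scores 1.0, the modified-object edges `(A, x, C)` and `(A, x, D)` score 1.3 and 1.5, and `(A, x, D)` is a known training edge.

```python
import numpy as np

# Nodes A, B, C, D have ilocs 0, 1, 2, 3. Scores of (A, x, n) for every
# candidate object n; the true edge (A, x, B) has score 1.0.
scores = np.array([0.5, 1.0, 1.3, 1.5])
true_score = scores[1]

# One column of the `greater` matrix: True where the modified edge
# out-scores the true edge. Raw rank = 1 + number of such edges.
greater = scores > true_score
raw_rank = 1 + np.count_nonzero(greater)  # (A, x, C) and (A, x, D) score higher

# Filtered rank: additionally zero out every (A, x, n) that is known from
# the train/validation/test sets. Here that's (A, x, B) itself and (A, x, D).
known_object_ilocs = np.array([1, 3])
filtered = greater.copy()
filtered[known_object_ilocs] = False
filtered_rank = 1 + np.count_nonzero(filtered)
```

This reproduces the numbers in the description: `raw_rank` is 3 and `filtered_rank` is 2, because filtering removes the known `(A, x, D)` but keeps the unknown `(A, x, C)`.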