Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interpretation of TOP property #25

Closed
oepen opened this issue Jan 5, 2020 · 5 comments · Fixed by #30
Closed

interpretation of TOP property #25

oepen opened this issue Jan 5, 2020 · 5 comments · Fixed by #30

Comments

@oepen
Copy link

oepen commented Jan 5, 2020

when comparing SMATCH results with the scorer from the 2019 CoNLL Shared Task on Cross-Framework Meaning Representation Parsing (MRP), we discovered that SMATCH will only consider the TOP property correct if the node labels also match. this appears to double-penalize for label mismatches and is maybe not the intended behavior? for more technical detail and a minimal test case, please see the MRP mtool issue.

@goodmami
Copy link
Contributor

goodmami commented Jan 6, 2020

Good catch. Here is an even-more-minimal test case:

$ cat i25a
(a / alpha
   :ARG0 (b / beta))
$ cat i25b  # only top concept different
(a / alternative
   :ARG0 (b / beta))
$ cat i25c  # only non-top concept different
(a / alpha
   :ARG0 (b / b-side))
$ python smatch.py --pr -f i25a i25b
Precision: 0.50
Recall: 0.50
F-score: 0.50
$ python smatch.py --pr -f i25a i25c
Precision: 0.75
Recall: 0.75
F-score: 0.75

When smatch is run with -v it shows how it considers the top triple an "attribute":

[...]
Instance triples of AMR 1: 2
[('instance', 'a0', 'alpha'), ('instance', 'a1', 'beta')]
Attribute triples of AMR 1: 1
[('TOP', 'a0', 'alpha')]
[...]

Note that the concept alpha is part of that TOP triple, when all it should probably do is indicate the top node. I tried making the attribute triple into a simple cycle on the top node (i.e., so the triple looks like ('TOP', 'a0', 'a0')) and seem to have resolved the error difference:

--- a/amr.py
+++ b/amr.py
@@ -418,7 +418,7 @@ class AMR(object):
             relation_list.append(node_rel_list)
             attribute_list.append(node_attr_list)
         # add TOP as an attribute. The attribute value is the top node value
-        attribute_list[0].append(["TOP", node_value_list[0]])
+        relation_list[0].append(["TOP", node_name_list[0]])
         result_amr = AMR(node_name_list, node_value_list, relation_list, attribute_list)
         return result_amr

Results in:

$ python smatch.py --pr -f i25a i25b
Precision: 0.75
Recall: 0.75
F-score: 0.75

Now it gets the score of 0.75 for both when the top label is different (i25b) and when the non-top label is different (i25c). In addition, it gets 0.75 when i25a is compared to the following variant which has the same triples but a different top:

(b / beta
   :ARG0-of (a / alpha))

@oepen
Copy link
Author

oepen commented Jan 12, 2020

i believe it has not consequences for the overall F1 score, but in my view the TOP property would better be considered a node attribute rather than a relation between a node and itself. treating it as an attribute would avoid (vacuously) stipulating a self-loop in the graph, and it preserves a one-to-one correspondence between relation triples and actual graph edges.

@goodmami
Copy link
Contributor

If that matters to you, then here's a diff that creates an attribute triple instead. It is probably equally unlikely to be confused with an actual graph triple (since we have no role called TOP). It leads to the same scores reported above for the TOP relation triple, at least for the 4 test files defined above.

--- a/amr.py
+++ b/amr.py
@@ -421,7 +421,7 @@ class AMR(object):
             relation_list.append(node_rel_list)
             attribute_list.append(node_attr_list)
         # add TOP as an attribute. The attribute value is the top node value
-        attribute_list[0].append(["TOP", node_value_list[0]])
+        attribute_list[0].append(["TOP", 'top'])
         result_amr = AMR(node_name_list, node_value_list, relation_list, attribute_list)
         return result_amr

@oepen
Copy link
Author

oepen commented May 28, 2020

hi again, @goodmami: the above looks just right to me (treating the TOP property as an attribute with a vacuous value). would you be set up to submit that as a pull request?

i am wondering whether we could try to work through some more of the known issues and hope to arrive at a re-release of an improved SMATCH sometime over the next few weeks?

@goodmami
Copy link
Contributor

Yes, I can put together a PR for that, but for a re-release what do people think of replacing amr.py with Penman? Doing so would solve several existing issues here. It's on PyPI, and I've recently released a 1.0 version to go with my ACL 2020 demo of the library. The 1.0 version isn't so much a big change from the previous version as it is a commitment to support its current API.

goodmami added a commit to goodmami/smatch that referenced this issue May 29, 2020
snowblink14 pushed a commit that referenced this issue May 30, 2020
* Add tests/ subdirectory with one test file

* Make TOP relation value constant

Fixes #25

* Clarify CHANGELOG entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants