Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smatch provide wrong and random scores #1870

Open
1 task done
flipz357 opened this issue Jan 21, 2024 · 2 comments
Open
1 task done

Smatch provide wrong and random scores #1870

flipz357 opened this issue Jan 21, 2024 · 2 comments
Assignees
Labels

Comments

@flipz357
Copy link

flipz357 commented Jan 21, 2024

Describe the bug
As also noted in the original Smatch repo issues, the Smatch score gives wrong and unverifiable results. This is also the case for HanLP.

Code to reproduce the issue

s = """(r / result-01
   :ARG1 (c / compete-01
            :ARG0 (w / woman)
            :mod (p / preliminary)
            :time (t / today)
            :mod (p2 / polo
                     :mod (w2 / water)))
   :ARG2 (a / and
            :op1 (d / defeat-01
                    :ARG0 (t2 / team
                              :mod (c2 / country
                                       :wiki +
                                       :name (n / name
                                                :op1 "Hungary")))
                    :ARG1 (t3 / team
                              :mod (c3 / country
                                       :wiki +
                                       :name (n2 / name
                                                 :op1 "Canada")))
                    :quant (s / score-entity
                              :op1 13
                              :op2 7))
            :op2 (d2 / defeat-01
                     :ARG0 (t4 / team
                               :mod (c4 / country
                                        :wiki +
                                        :name (n3 / name
                                                  :op1 "France")))
                     :ARG1 (t5 / team
                               :mod (c5 / country
                                        :wiki +
                                        :name (n4 / name
                                                  :op1 "Brazil")))
                     :quant (s2 / score-entity
                                :op1 10
                                :op2 9))
            :op3 (d3 / defeat-01
                     :ARG0 (t6 / team
                               :mod (c6 / country
                                        :wiki +
                                        :name (n5 / name
                                                  :op1 "Australia")))
                     :ARG1 (t7 / team
                               :mod (c7 / country
                                        :wiki +
                                        :name (n6 / name
                                                  :op1 "Germany")))
                     :quant (s3 / score-entity
                                :op1 10
                                :op2 8))
            :op4 (d4 / defeat-01
                     :ARG0 (t8 / team
                               :mod (c8 / country
                                        :wiki +
                                        :name (n7 / name
                                                  :op1 "Russia")))
                     :ARG1 (t9 / team
                               :mod (c9 / country
                                        :wiki +
                                        :name (n8 / name
                                                  :op1 "Netherlands")))
                     :quant (s4 / score-entity
                                :op1 7
                                :op2 6))
            :op5 (d5 / defeat-01
                     :ARG0 (t10 / team
                                :mod (c10 / country
                                          :wiki +
                                          :name (n9 / name
                                                    :op1 "United"
                                                    :op2 "States")))
                     :ARG1 (t11 / team
                                :mod (c11 / country
                                          :wiki +
                                          :name (n10 / name
                                                     :op1 "Kazakhstan")))
                     :quant (s5 / score-entity
                                :op1 10
                                :op2 5))
            :op6 (d6 / defeat-01
                     :ARG0 (t12 / team
                                :mod (c12 / country
                                          :wiki +
                                          :name (n11 / name
                                                     :op1 "Italy")))
                     :ARG1 (t13 / team
                                :mod (c13 / country
                                          :wiki +
                                          :name (n12 / name
                                                     :op1 "New"
                                                     :op2 "Zealand")))
                     :quant (s6 / score-entity
                                :op1 12
                                :op2 2))))
"""

if __name__ == "__main__":
     from hanlp.metrics.amr import smatch_eval
     path = "amr.tmp"
     with open(path, "w") as f:
         f.write(s)
     for _ in range(5):
        smatch_score = smatch_eval("amr.tmp", "amr.tmp")
        print(smatch_score)

Describe the current behavior
Totally wrong and random Smatch scores.

Expected behavior
A deterministic Smatch score of 100

System information

  • Linux Ubuntu 16.04
  • Python version: 3.8
  • HanLP version: current

Other info / logs
Not necessary. The problem is simply because using a hill-climber for graph matching is unsafe and intransparent, and lacks any upper-bound on the solution. This gets worse when graphs get more large than before, but can also occur on smaller graphs. A more detailed empirical study of the problem can be found here.

  • I've completed this form and searched the web for solutions.
@hankcs
Copy link
Owner

hankcs commented Jan 22, 2024

Thank you @flipz357 for reporting this. The randomness of Smatch implementations has been documented on our forum for 4 years and finally, you brought the community a solid solution. Your paper is quite dense, and I'll spend some time reading it then integrating your implementation soon.

@flipz357
Copy link
Author

Thanks @hankcs , apologies for any density in the paper, there's a few issues of current state of amr evaluation. But I think using a hill-climber for evaluation may clearly be the biggest current issue, since any of the scores from hill-climber are only lower-bounds and thus not verifiable (there are no upper-bounds), so we can never know if an output of the hill-climber is wrong, or correct (except of course if it returns 100 since then trivially it holds upper bound = lower bound).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants