Fixes Issue #294 #295

Merged
merged 1 commit into from Sep 17, 2012

4 participants
@DrDub
Contributor

DrDub commented Sep 14, 2012

Proper implementation of masi_distance per Passonneau (2006).

stevenbird Sep 17, 2012

Thanks. I'll go ahead and merge this, and ask Tom Lippincott to comment if he thinks anything is amiss.

drevicko Aug 9, 2016

I hate to say it, but this isn't the same as the Passonneau paper either!

She basically multiplied jaccard=(1-len_intersection/len_union) by the m factor as calculated here.

This means the function should return:

return (1 - len_intersection / float(len_union)) * m

To quote her paper (page 834):
"MASI = J*M"
"Jaccard (1908) metric (the J term) is used to weight the differences in size of two sets, independent of whether sets are monotonic."
"If two sets Q and P are identical, M is 1. If one set is a subset of the other, M is 2/3. If the intersection and the two set differences are all non-null, then M is 1/3. If the sets are disjoint, M is 0."

DrDub Nov 4, 2016

Owner

You're correct, I misplaced the parenthesis. Thanks for catching this.

@drevicko, do you want to open a bug and issue a PR for this? Alternatively, I guess the NLTK committers can do a point commit and fix this.

aliiae Nov 5, 2017

Hi! Correct me if I'm wrong, but while it is true that MASI = J*M, where J is the usual Jaccard index J = len_intersection/len_union, shouldn't the distance metric be equal to 1 - MASI = 1 - J*M, like in your initial version without parentheses?
This is mentioned in both papers:

  1. "Because Krippendorff’s Alpha measures disagreements, one minus Jaccard, and one minus MASI, are used in computing Alpha." (Passonneau, R. J. (2006) ‘Measuring Agreement on Set-valued Items (MASI) for Semantic and Pragmatic Annotation’)
  2. "For set-valued scales, we use MASI for the distance metric δ. It is equal to 1-JM." (Passonneau, R., Habash, N. and Rambow, O. (2006) ‘Inter-annotator agreement on a multilingual semantic annotation task’)
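For readers following along, here is a minimal sketch of the distance as 1 - J*M, using the M values quoted earlier in this thread (1, 2/3, 1/3, 0). The function name `masi_distance_sketch` is hypothetical and this is not necessarily the code that ended up in NLTK; treating two empty sets as identical is an assumption of this sketch.

```python
def masi_distance_sketch(label1, label2):
    """Hypothetical helper: return 1 - J*M for two collections treated as sets."""
    label1, label2 = set(label1), set(label2)
    len_intersection = len(label1 & label2)
    len_union = len(label1 | label2)

    if len_union == 0:
        # Assumption of this sketch: two empty sets count as identical.
        return 0.0

    # Monotonicity factor M, following the values quoted from the paper.
    if label1 == label2:
        m = 1.0          # identical sets
    elif label1 <= label2 or label2 <= label1:
        m = 2 / 3.0      # one set is a subset of the other
    elif len_intersection > 0:
        m = 1 / 3.0      # overlap, with both set differences non-empty
    else:
        m = 0.0          # disjoint sets

    # J is the usual Jaccard index; the distance is one minus MASI.
    j = len_intersection / float(len_union)
    return 1 - j * m
```

With this form, identical sets give a distance of 0 and disjoint sets give a distance of 1.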

Hi! Correct me if I'm wrong, but while it is true that MASI=J*M, where J is the usual Jaccard index J=len_intersection/len_union, shouldn't the distance metric equal to 1-MASI=1-J*M, like in your initial version without parentheses?
This is mentioned in both papers:

  1. "Because Krippendorff’s Alpha measures disagreements, one minus Jaccard, and one minus MASI, are used in computing Alpha." (Passonneau, R. J. (2006) ‘Measuring Agreement on Set-valued Items (MASI) for Semantic and Pragmatic Annotation’)
  2. "For set-valued scales, we use MASI for the distance metric δ. It is equal to 1-JM." (Passonneau, R., Habash, N. and Rambow, O. (2006) ‘Inter-annotator agreement on a multilingual semantic annotation task’)

This comment has been minimized.

Show comment
Hide comment
@drevicko

drevicko Nov 7, 2017

I believe you're right. (Perhaps I was confused by an earlier paper by Passonneau, "Computing Reliability for Coreference Annotation", where she used the values {0, .33, .67, 1} directly as a difference metric.)

A simple check: If you think of the case where the sets are disjoint, you want the distance to be at a maximum. With my proposal it'd be zero!
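The disjoint-set check can be written out as a short sketch. The example sets and the variable names `proposed` and `corrected` are illustrative only; `m` is set by hand to 0, the value quoted above for disjoint sets.

```python
# Two disjoint example sets (illustrative values).
a, b = {1, 2}, {3, 4}

len_intersection = len(a & b)  # 0, since nothing is shared
len_union = len(a | b)         # 4
m = 0.0                        # M for disjoint sets, per the quoted paper

# (1 - J) * M: the form with the extra parentheses.
proposed = (1 - len_intersection / float(len_union)) * m

# 1 - J * M: the distance as one minus MASI.
corrected = 1 - (len_intersection / float(len_union)) * m

print(proposed, corrected)
```

Here `proposed` comes out as 0.0 and `corrected` as 1.0, so only the second form puts disjoint sets at the maximum distance.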

Would you like to make a pull request or shall I?

stevenbird added a commit that referenced this pull request Sep 17, 2012

@stevenbird stevenbird merged commit bdbee1a into nltk:master Sep 17, 2012
