Allowed specification of the metric # dimensions #528
Conversation
In general, I think if we want to keep stats as-is, this statement should be adjusted:
ExplainaBoard/explainaboard/metrics/metric.py Line 415 in e3bca2a
Note that result.shape will be a tuple with multiple values.
Thanks! I had actually already fixed that in 8d5471d because tests were failing.
Cool! I found it! Last comment: do you think the following code will ignore the case where a metric uses_customized_aggregate but we don't perform the significance test (i.e., not batched)? Or is this on purpose?
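To illustrate with hypothetical values (not the actual code in metric.py): once sufficient statistics are allowed to be tensors, a check written against a 1-D vector shape no longer holds, because the shape tuple gains extra entries:

```python
import numpy as np

# Vector-valued sufficient statistics: shape has a single entry.
vector_stats = np.zeros((5,))
assert vector_stats.shape == (5,)
assert vector_stats.ndim == 1

# Tensor-valued sufficient statistics: shape is a tuple with multiple values,
# so any check of the form `result.shape == (n,)` would incorrectly fail.
tensor_stats = np.zeros((3, 4, 5))
assert tensor_stats.shape == (3, 4, 5)
assert tensor_stats.ndim == 3
```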
Yeah, that is intentional. It catches cases where the batch dimension differs, but doesn't do any other checks. We might want to add additional checks, but I'm not sure what they would be.
OK, then I think it should be fine. (Not sure if @odashi has other comments.)
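A minimal sketch of that idea, with hypothetical names (not the actual code in metric.py): validate only the leading batch dimension of the aggregated statistics and deliberately leave any metric-specific trailing dimensions unchecked:

```python
import numpy as np

def check_batch_dim(stats: np.ndarray, expected_batch: int) -> None:
    """Check only the batch dimension; trailing dims are metric-specific."""
    if stats.shape[0] != expected_batch:
        raise ValueError(
            f"batch dimension mismatch: {stats.shape[0]} != {expected_batch}"
        )

# 8 bootstrap batches, each holding a (2, 3) tensor of sufficient statistics
batched = np.zeros((8, 2, 3))
check_batch_dim(batched, 8)  # passes; the (2, 3) part is not inspected
```

The design choice here is that the framework can only meaningfully verify the dimension it controls (the batch), while the meaning of the remaining dimensions is known only to the individual metric.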
Thanks @odashi! To clarify: yes, I saw the mitigation strategies there, but I feel that they're relatively complicated. Even this part feels a bit hacky to me: https://github.com/neulab/ExplainaBoard/pull/528/files#diff-dc342aa901c2256e1da8308da557c5b23a17aab2d0383c0a2a795057ecbe61bdL116 The PR here seems to be a reasonable middle ground: it relaxes the dimension-matching requirement a little bit, but only in the case of uses_customized_aggregate.
@neubig This change grants unnecessarily wide permission to Metric (not only for the problem in #527, but for any Metrics in the future), which looks too dangerous to me. I understand that the problem in #527 is strictly metric-specific, and it should be resolved by the specific metric implementation itself wherever that is possible.
To be clear, it only gives more permission to metrics that use uses_customized_aggregate. We could also make the check above more stringent if some part of the relaxed checks is particularly worrisome.
I think
@neubig It means that |
I don't think this change is ideal, but we should go with it to avoid blocking the actual tasks. There is one comment that should be addressed before merging (see L412 in metric.py).
Thanks @odashi, and yes, I totally see what you mean. When I first designed the Metric interface, I didn't consider Metrics whose sufficient statistics aren't a "reduce" (sum or mean) of the per-example sufficient statistics, and I think a larger refactoring is in order. We can think of this PR as a temporary fix.
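To illustrate the distinction with made-up numbers (this is not ExplainaBoard code): accuracy-like metrics can aggregate per-example statistics with a simple reduce, while correlation-style metrics need the full set of statistics at once and cannot be reduced example by example:

```python
import numpy as np

# Two examples, each contributing two sufficient statistics.
per_example = np.array([[1.0, 2.0],
                        [3.0, 5.0]])

# Accuracy-like metrics: the aggregate IS a reduce (here, a mean).
reduced = per_example.mean(axis=0)  # -> array([2.0, 3.5])

# Correlation-style metrics: the score is a function of ALL examples
# jointly, so no per-example reduce can produce it.
pearson = np.corrcoef(per_example[:, 0], per_example[:, 1])[0, 1]
```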
This PR loosens the restriction that sufficient statistics must be a vector, and allows them to be a tensor with the number of dimensions equal to Metric.stats_ndim(). It also demonstrates how this works on the NLGMetaEvaluation metric.
@pfliu-nlp and @odashi: could you please check this PR as a potential solution to the discussion in #527?
(Sorry, after sending the review request I changed the naming from dim to ndim, which I think is more in line with the naming in numpy.)
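As a rough sketch of the relaxed contract described above (illustrative names only, not the actual ExplainaBoard API): each metric declares the dimensionality of its per-example sufficient statistics, and the batched statistics carry one extra leading dimension:

```python
import numpy as np

class Metric:
    def stats_ndim(self) -> int:
        # Default: per-example sufficient statistics form a vector.
        return 1

class MetaEvalLikeMetric(Metric):
    def stats_ndim(self) -> int:
        # Hypothetical example: each example contributes a matrix of stats.
        return 2

metric = MetaEvalLikeMetric()
# A batch of 10 examples, each with a (4, 3) matrix of statistics.
stats = np.zeros((10, 4, 3))
# The batch axis adds one dimension on top of the per-example ndim.
assert stats.ndim == metric.stats_ndim() + 1
```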