
Allowed specification of the metric #dimensions #528

Merged
neubig merged 7 commits into main from loosen_metric_restriction on Oct 5, 2022

Conversation

@neubig (Contributor) commented Oct 4, 2022

This PR loosens the restriction that sufficient statistics must be a vector, allowing them instead to be a tensor whose number of dimensions equals Metric.stats_ndim().

It also demonstrates how this works on the NLGMetaEvaluation metric.

@pfliu-nlp and @odashi: could you please check this PR as a potential solution to the discussion in #527?

(Sorry, after sending the review request I made a naming change from dim -> ndim, which I think is more in line with numpy's naming.)
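For a rough illustration of the idea (hypothetical shapes and names only, not the actual ExplainaBoard data structures): a conventional metric keeps a 1-D vector of sufficient statistics per example, while a metric with stats_ndim() == 2 would keep a 2-D block per example.

    import numpy as np

    # Hypothetical illustration only; not the real ExplainaBoard code.
    # Conventional metric (e.g. accuracy): one 1-D vector of sufficient
    # statistics per example, so the stacked array is (n_examples, n_stats).
    accuracy_stats = np.array([[1.0], [0.0], [1.0]])
    print(accuracy_stats.shape[1:])   # (1,)  -> per-example stats have ndim == 1

    # Metric with stats_ndim() == 2: one 2-D block of sufficient statistics
    # per example, so the stacked array is (n_examples, n_rows, n_stats).
    meta_eval_stats = np.zeros((3, 4, 2))
    print(meta_eval_stats.shape[1:])  # (4, 2) -> per-example stats have ndim == 2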

@neubig neubig requested a review from pfliu-nlp October 4, 2022 09:38
@neubig neubig requested a review from odashi as a code owner October 4, 2022 09:38
@pfliu-nlp (Collaborator) commented Oct 4, 2022

In general, I think that if we want to keep stats as-is, this statement should be adjusted:

    assert result.shape == result_shape

for example along the lines of:

    assert result.shape == result_shape, (
        ...
    )

Note that result.shape will be a tuple with multiple values.
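As a plain-numpy illustration of that point (not the project's code), the expected shape on the right-hand side has to spell out every dimension:

    import numpy as np

    # result.shape is a tuple covering every dimension, so the comparison
    # target must be a full tuple as well, not just the number of statistics.
    result = np.zeros((8, 5))            # e.g. (batch_size, num_statistics)
    result_shape = (8, 5)
    assert result.shape == result_shape  # tuple-to-tuple comparison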

@neubig (Contributor, Author) commented Oct 4, 2022

Thanks! I had actually already fixed that in 8d5471d because tests were failing.

@pfliu-nlp (Collaborator) commented:

Cool! I found it! Last comment:

Do you think the following code will ignore the case where we use uses_customized_aggregate() but don't perform the significance test (i.e., not batched)? Or is this on purpose?

    if self.uses_customized_aggregate():
        if stats.is_batched():
            assert stats.get_batch_data().shape[0] == result.shape[0], (
                "BUG: invalid operation: "
                f"{type(self).__name__}._aggregate_stats(): "
                f"Expected batch dimension {stats.get_batch_data().shape[0]}, but "
                f"got {result.shape[0]}."
            )
    else:
        result_shape = (
            (stats.get_batch_data().shape[0], stats.num_statistics())
            if stats.is_batched()
            else (stats.num_statistics(),)
        )
        assert result.shape == result_shape, (
            "BUG: invalid operation: "
            f"{type(self).__name__}._aggregate_stats(): "
            f"Expected shape {result_shape}, but got {result.shape}."
        )

@neubig (Contributor, Author) commented Oct 4, 2022

Yeah, that is intentional. It catches when the batch dimension differs, but doesn't do any checks otherwise. We might want to add additional checks, but I'm not sure what they would be.

@pfliu-nlp (Collaborator) commented:

> Yeah, that is intentional. It catches when the batch dimension differs, but doesn't do any checks otherwise. We might want to add additional checks, but I'm not sure what they would be.

OK, then I think it should be fine. (not sure if @odashi has other comments)

Review threads on explainaboard/metrics/metric.py and explainaboard/metrics/nlg_meta_evaluation.py were resolved.
@odashi (Contributor) commented Oct 5, 2022

@neubig I think this mitigation is not necessary at this point. Please take a look at the alternatives in #527.

@neubig (Contributor, Author) commented Oct 5, 2022

Thanks @odashi! To clarify: yes, I saw the mitigation strategies there, but I feel that they're relatively complicated. Even this part here feels a bit hacky to me: https://github.com/neulab/ExplainaBoard/pull/528/files#diff-dc342aa901c2256e1da8308da557c5b23a17aab2d0383c0a2a795057ecbe61bdL116
and adding the dimensions as stats seems even more hacky.

This PR seems like a reasonable middle ground: it relaxes the dimension-matching requirement a little, but only in the case of uses_customized_aggregate(), which we will only use sparingly.

@odashi (Contributor) commented Oct 5, 2022

@neubig This change gives unnecessarily wide permission to Metrics (not only for the problem in #527, but also for any Metric in the future), which looks too dangerous to me. I understand that the problem in #527 is metric-specific, so it should be resolved by the specific metric implementation itself as long as that can be done.

@neubig (Contributor, Author) commented Oct 5, 2022

To be clear, it only gives more permission to metrics that use uses_customized_aggregate(), not to every Metric. And I think the use of uses_customized_aggregate() should be infrequent, and also a reason for increased scrutiny of the implementation's correctness during code review (maybe the function should be documented as such?).

We could also perhaps make the check above more stringent if there is some part of the relaxed checks that is particularly worrisome.

@odashi (Contributor) commented Oct 5, 2022

I think nlg_meta_evaluation is no longer suitable to be implemented as a Metric; its requirements on the underlying data are completely different.

@odashi (Contributor) commented Oct 5, 2022

@neubig It means that uses_customized_aggregate() can become a way to hack the Metric.

@odashi (Contributor) left a review comment:

I don't think this change is appropriate, but we should go with it to avoid blocking the actual tasks. There is a comment that should be applied before merging this (see L412 in metric.py).

@neubig (Contributor, Author) commented Oct 5, 2022

Thanks @odashi, and yes, I totally see what you mean. When I first designed the Metric interface, I didn't think about Metrics whose sufficient statistics aren't a "reduce" (sum or mean) of each example's sufficient statistics, and I think a larger refactoring is in order. We can think of this PR as a temporary fix.
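As a rough sketch of that distinction (plain numpy with made-up numbers, not the ExplainaBoard implementation): a "reduce"-style metric only needs a sum or mean over the example axis, whereas something like a correlation-based meta-evaluation needs all per-example scores at once.

    import numpy as np

    # Hypothetical illustration; not the actual ExplainaBoard implementation.
    per_example_stats = np.array([[1.0], [0.0], [1.0], [1.0]])  # (n_examples, n_stats)

    # "Reduce"-style aggregation: the aggregate is just a mean (or sum) over
    # the example axis, so per-example statistics combine incrementally.
    reduce_aggregate = per_example_stats.mean(axis=0)  # array([0.75])

    # A correlation-style meta-evaluation metric cannot be written that way:
    # it needs every per-example score simultaneously, so a plain sum/mean of
    # per-example statistics would lose the information it depends on.
    human_scores = np.array([1.0, 2.0, 3.0, 4.0])
    system_scores = np.array([1.5, 1.0, 3.5, 4.0])
    correlation = np.corrcoef(human_scores, system_scores)[0, 1]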

@neubig neubig requested a review from odashi October 5, 2022 04:06
@neubig neubig merged commit 289d585 into main Oct 5, 2022
@odashi odashi deleted the loosen_metric_restriction branch October 5, 2022 11:26