
Allowed specification of the metric #dimensions #528

Merged
neubig merged 7 commits into main from loosen_metric_restriction on Oct 5, 2022

Conversation

@neubig (Contributor) commented Oct 4, 2022

This PR loosens the restriction that sufficient statistics must be a vector, allowing them instead to be a tensor whose number of dimensions equals Metric.stats_ndim().

It also demonstrates how this works on the NLGMetaEvaluation metric.

@pfliu-nlp and @odashi: could you please check this PR as a potential solution to the discussion in #527?

(Sorry, after sending the review request I made a naming change from dim -> ndim, which I think is more in line with numpy's naming.)
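For a rough illustration of the idea (hypothetical shapes and names only, not the actual ExplainaBoard data structures): a conventional metric keeps a 1-D vector of sufficient statistics per example, while a metric with stats_ndim() == 2 would keep a 2-D block per example.

    import numpy as np

    # Hypothetical illustration only; not the real ExplainaBoard code.
    # Conventional metric (e.g. accuracy): one 1-D vector of sufficient
    # statistics per example, so the stacked array is (n_examples, n_stats).
    accuracy_stats = np.array([[1.0], [0.0], [1.0]])
    print(accuracy_stats.shape[1:])   # (1,)  -> per-example stats have ndim == 1

    # Metric with stats_ndim() == 2: one 2-D block of sufficient statistics
    # per example, so the stacked array is (n_examples, n_rows, n_stats).
    meta_eval_stats = np.zeros((3, 4, 2))
    print(meta_eval_stats.shape[1:])  # (4, 2) -> per-example stats have ndim == 2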

@neubig neubig requested a review from pfliu-nlp October 4, 2022 09:38
@neubig neubig requested a review from odashi as a code owner October 4, 2022 09:38
@pfliu-nlp (Collaborator) commented Oct 4, 2022

In general, I think that if we want to keep stats as-is, this statement should be adjusted:

    assert result.shape == result_shape

for example along the lines of:

    assert result.shape == result_shape, (
        ...
    )

Note that result.shape will be a tuple with multiple values.
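As a plain-numpy illustration of that point (not the project's code), the expected shape on the right-hand side has to spell out every dimension:

    import numpy as np

    # result.shape is a tuple covering every dimension, so the comparison
    # target must be a full tuple as well, not just the number of statistics.
    result = np.zeros((8, 5))            # e.g. (batch_size, num_statistics)
    result_shape = (8, 5)
    assert result.shape == result_shape  # tuple-to-tuple comparison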

@neubig (Contributor, Author) commented Oct 4, 2022

Thanks! I had actually already fixed that in 8d5471d because tests were failing.

@pfliu-nlp (Collaborator) commented:

Cool! I found it! Last comment:

Do you think the following code will ignore the case where we use uses_customized_aggregate() but don't perform the significance test (i.e., not batched)? Or is this on purpose?

    if self.uses_customized_aggregate():
        if stats.is_batched():
            assert stats.get_batch_data().shape[0] == result.shape[0], (
                "BUG: invalid operation: "
                f"{type(self).__name__}._aggregate_stats(): "
                f"Expected batch dimension {stats.get_batch_data().shape[0]}, but "
                f"got {result.shape[0]}."
            )
    else:
        result_shape = (
            (stats.get_batch_data().shape[0], stats.num_statistics())
            if stats.is_batched()
            else (stats.num_statistics(),)
        )
        assert result.shape == result_shape, (
            "BUG: invalid operation: "
            f"{type(self).__name__}._aggregate_stats(): "
            f"Expected shape {result_shape}, but got {result.shape}."
        )

@neubig (Contributor, Author) commented Oct 4, 2022

Yeah, that is intentional. It catches when the batch dimension differs, but doesn't do any checks otherwise. We might want to add additional checks, but I'm not sure what they would be.

@pfliu-nlp (Collaborator) commented:

> Yeah, that is intentional. It catches when the batch dimension differs, but doesn't do any checks otherwise. We might want to add additional checks, but I'm not sure what they would be.

OK, then I think it should be fine. (not sure if @odashi has other comments)

Review threads on explainaboard/metrics/metric.py and explainaboard/metrics/nlg_meta_evaluation.py were resolved.
@odashi (Contributor) commented Oct 5, 2022

@neubig I think this mitigation is not necessary at this point. Please take a look at the alternatives in #527.

@neubig (Contributor, Author) commented Oct 5, 2022

Thanks @odashi! To clarify: yes, I saw the mitigation strategies there, but I feel that they're relatively complicated. Even this part here feels a bit hacky to me: https://github.com/neulab/ExplainaBoard/pull/528/files#diff-dc342aa901c2256e1da8308da557c5b23a17aab2d0383c0a2a795057ecbe61bdL116
and adding the dimensions as stats seems even more hacky.

This PR seems like a reasonable middle ground: it relaxes the dimension-matching requirement a little, but only in the case of uses_customized_aggregate(), which we will only use sparingly.

@odashi (Contributor) commented Oct 5, 2022

@neubig This change gives unnecessarily wide permission to Metrics (not only for the problem in #527, but also for any Metric in the future), which looks too dangerous to me. I understand that the problem in #527 is metric-specific, so it should be resolved by the specific metric implementation itself as long as that can be done.

@neubig (Contributor, Author) commented Oct 5, 2022

To be clear, it only gives more permission to metrics that use uses_customized_aggregate(), not to every Metric. And I think the use of uses_customized_aggregate() should be infrequent, and also a reason for increased scrutiny of the implementation's correctness during code review (maybe the function should be documented as such?).

We could also perhaps make the check above more stringent if there is some part of the relaxed checks that is particularly worrisome.

@odashi (Contributor) commented Oct 5, 2022

I think nlg_meta_evaluation is no longer suitable to be implemented as a Metric; its requirements on the underlying data are completely different.

@odashi (Contributor) commented Oct 5, 2022

@neubig It means that uses_customized_aggregate() can become a way to hack the Metric.

@odashi (Contributor) left a review comment:

I don't think this change is appropriate, but we should go with it to avoid blocking the actual tasks. There is a comment that should be applied before merging this (see L412 in metric.py).

@neubig (Contributor, Author) commented Oct 5, 2022

Thanks @odashi, and yes, I totally see what you mean. When I first designed the Metric interface, I didn't think about Metrics whose sufficient statistics aren't a "reduce" (sum or mean) of each example's sufficient statistics, and I think a larger refactoring is in order. We can think of this PR as a temporary fix.
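As a rough sketch of that distinction (plain numpy with made-up numbers, not the ExplainaBoard implementation): a "reduce"-style metric only needs a sum or mean over the example axis, whereas something like a correlation-based meta-evaluation needs all per-example scores at once.

    import numpy as np

    # Hypothetical illustration; not the actual ExplainaBoard implementation.
    per_example_stats = np.array([[1.0], [0.0], [1.0], [1.0]])  # (n_examples, n_stats)

    # "Reduce"-style aggregation: the aggregate is just a mean (or sum) over
    # the example axis, so per-example statistics combine incrementally.
    reduce_aggregate = per_example_stats.mean(axis=0)  # array([0.75])

    # A correlation-style meta-evaluation metric cannot be written that way:
    # it needs every per-example score simultaneously, so a plain sum/mean of
    # per-example statistics would lose the information it depends on.
    human_scores = np.array([1.0, 2.0, 3.0, 4.0])
    system_scores = np.array([1.5, 1.0, 3.5, 4.0])
    correlation = np.corrcoef(human_scores, system_scores)[0, 1]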

@neubig neubig requested a review from odashi October 5, 2022 04:06
@neubig neubig merged commit 289d585 into main Oct 5, 2022
@odashi odashi deleted the loosen_metric_restriction branch October 5, 2022 11:26