
Change order of updates in metric forward to increase efficiency #126

Closed
janvainer opened this issue Mar 24, 2021 · 16 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

@janvainer commented Mar 24, 2021

🚀 Feature

Refactor Metric.forward() to call update only once.

Motivation

The update() method of a Metric gets called twice in forward() when compute_on_step is True.
This means repeated computation, which can slow down execution. For example, I have a custom SmoothL1Metric whose update function calculates an element-wise L1 distance (see below). The problem arises when the tensors on which the metric is computed have many dimensions and the computation itself is slow.

class SmoothL1Metric(Metric):
    def __init__(self, mask_dim, dist_sync_on_step: bool = False, compute_on_step: bool = True):
        super().__init__(dist_sync_on_step=dist_sync_on_step, compute_on_step=compute_on_step)
        self.loss = torch.nn.SmoothL1Loss(reduction="sum")
        self.mask_dim = mask_dim

        self.add_state("sum", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("numel", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, input, target, lens):
        mask = get_mask(input, lens, self.mask_dim).type(input.dtype)
        # this is a heavy computation that should not be executed twice
        self.sum += self.loss(input * mask, target * mask)
        self.numel += mask.sum()

    def compute(self):
        return self.sum / self.numel

Suggestion

How about something like:

def forward(self, *args, **kwargs):

    if self.compute_on_step:
        self._to_sync = self.dist_sync_on_step

        # save context before switch
        cache = {attr: getattr(self, attr) for attr in self._defaults.keys()}

        # call reset, update, compute, on single batch
        self.reset()
        self.update(*args, **kwargs)
        self._forward_cache = self.compute()

        # merge new and old context without recomputing update
        for attr, val in cache.items():
            setattr(self, attr, self._reductions[attr](val, getattr(self, attr)))
    else:
        with torch.no_grad():
            self.update(*args, **kwargs)
        self._forward_cache = None

    return self._forward_cache

The code probably does not work now, but the idea should be clear. What do you think?

@janvainer added the enhancement and help wanted labels on Mar 24, 2021
@github-actions commented

Hi! Thanks for your contribution, great first issue!

@Borda (Member) commented Mar 24, 2021

@PyTorchLightning/core-metrics thoughts?

@maximsch2 (Contributor) commented

This makes sense to me. Another nit/question: self._forward_cache doesn't necessarily belong on self, since it can also be a heavy object we don't want to store long-term. I would suggest something like this:

def forward(self, *args, **kwargs):

    if not self.compute_on_step:
        with torch.no_grad():
            self.update(*args, **kwargs)
        return

    self._to_sync = self.dist_sync_on_step # why are we resetting this on every forward btw?

    # save context before switch
    cache = {attr: getattr(self, attr) for attr in self._defaults.keys()}

    # call reset, update, compute, on single batch
    self.reset()
    self.update(*args, **kwargs)
    result = self.compute()

    # merge new and old context without recomputing update
    for attr, val in cache.items():
        setattr(self, attr, self._reductions[attr](val, getattr(self, attr)))

    return result

@janvainer (Author) commented

@maximsch2 I think the self._forward_cache is used in tests.

@SkafteNicki (Member) commented

self._forward_cache is also used internally for logging in Lightning, see this file:
https://github.com/PyTorchLightning/pytorch-lightning/blob/555a6fea212e340f0b3a9684829e6027e4ba27c0/pytorch_lightning/core/step_result.py#L303
so it cannot be removed.
@janvainer I really like this suggestion. I cannot say for certain whether the change will work for all metrics (I am sure it will work for the majority). Could you try locally and see if you can get all tests passing with this change?

@janvainer (Author) commented

Yes, I will make a PR :) Probably not today, but over the weekend.

@janvainer (Author) commented

@SkafteNicki the current code is a bit problematic when it comes to the use of the dist_reduce_fx argument.
The reduction function should mimic what happens in the update function, but the current framework does not enforce that.
For example, in the tests, the DummyMetric uses dist_reduce_fx=None, yet in update there is self.x += x, so the reduction used for distributed sync and the one implied by update are not aligned. This is problematic if we want to use only a single update call in forward, because we don't know what reduction is used inside update. Check out the draft PR #141.
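
For illustration, the mismatch looks roughly like this (a sketch of the pattern described above, not the actual test code):

class DummyMetric(Metric):
    def __init__(self):
        super().__init__()
        # no distributed reduction is registered for this state...
        self.add_state("x", default=torch.tensor(0.0), dist_reduce_fx=None)

    def update(self, x):
        # ...yet update accumulates with a sum, so the two are not aligned
        self.x += x

    def compute(self):
        return self.x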

For this to work properly, the update method should calculate a single-step value of the metric, and the internal state should then be updated with the provided reduction function. What do you think? I am afraid this suggestion requires an API change, which will probably not be approved. Maybe there is some other way to accomplish what this issue suggests, but I don't see it yet. Any ideas?

@SkafteNicki (Member) commented

@janvainer I agree that this is problematic. I am pretty sure that dist_reduce_fx is aligned with the operations for all "real" implemented metrics (else tests should fail), and the dummy metric is a special case where we have not aligned them. With that in mind, we could make the change.

However, it is troublesome that we do not have an explicit check. To get this correct, we would need an API change; the simplest option is that instead of the user doing the operations in the update call, they return the values, and we then call dist_reduce_fx internally. Something like:

# accuracy example
def __init__(self, ...):
    self.add_state("correct", torch.tensor(0), dist_reduce_fx = 'sum')
    self.add_state("nobs", torch.tensor(0), dist_reduce_fx = 'sum')

def update(self, preds, target):
    equal = (preds==target).sum()
    n_obs = preds.numel()
    return equal, n_obs  # need to return in the same order as the states are initialized

def _wrap_update(self, update):

    @functools.wraps(update)
    def wrapped_func(*args, **kwargs):
        self._computed = None
        out = update(*args, **kwargs)
        for state, step_val in zip(self._defaults.keys(), out):
            setattr(self, state, self._reductions[state]([getattr(self, state), step_val]))
        return None
    return wrapped_func

@janvainer (Author) commented

Thanks, what you are suggesting makes sense to me. This looks relatively nice :) The question is how big an API change this would be for existing users. Should I implement this?

def __init__(self, ...):
    self.add_state("l1_distances", torch.tensor(0.0), dist_reduce_fx="sum")
    self.add_state("numel", torch.tensor(0), dist_reduce_fx="sum")

def update(self, prediction, target, mask):
    distances = l1_dist(prediction, target)
    numel = mask.sum()
    return distances, numel

def forward(self, *args, **kwargs):
    ...
    update_results = self.update(*args, **kwargs)
    # Update old state with new results with reduction functions
    ...
    # return single-step metric value (potentially differentiable scalar)
    return self.compute(*update_results)

@staticmethod
def compute(sum_of_distances, total_numel):
    return sum_of_distances / total_numel

@janvainer (Author) commented

OK, I like this solution. The API change seems quite big though. Is that okay?

@SkafteNicki (Member) commented

@janvainer as this is a fundamental API change we need more input on it.
@PyTorchLightning/core-metrics any opinions?

@maximsch2 (Contributor) commented

One potential issue here is that this prevents in-place updates, right?

In https://github.com/PyTorchLightning/metrics/pull/128/files#diff-a605698e7c4a7849117d5d944263ea2218cc58795426cd2c98165794dc31365eR70-R74 I'm going over each column separately to avoid constructing the full matrix, as it OOMs for us otherwise (very large number of classes).
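
The pattern in question looks roughly like this (a generic sketch with assumed names such as confmat and num_classes, not the code from the linked PR): the state tensor is mutated piecewise so that a full per-batch intermediate is never allocated, which only works if update is allowed to modify the state in place.

def update(self, preds, target):
    # accumulate one true-class slice at a time so a full
    # (num_classes x num_classes) intermediate is never materialized
    for c in range(self.num_classes):
        counts = torch.bincount(preds[target == c], minlength=self.num_classes)
        self.confmat[c] += counts.to(self.confmat.dtype)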

@SkafteNicki (Member) commented

@janvainer after some going back and forth internally, here is an API change that suits the needs without breaking backwards compatibility:

new inplace arg

The basic idea is that the base Metric class gets a new fifth argument called inplace, which defaults to True. The flag indicates whether metric states are updated in-place inside the update method, or are instead returned from the update method so that we internally do the reduction using some variation of:

out = update(*args, **kwargs)
for state, step_val in zip(self._defaults.keys(), out):
    setattr(self, state, self._reductions[state]([getattr(self, state), step_val]))

If the flag is True, nothing about the current code changes, so it stays backward compatible. The forward method would then look something like:

def forward(self, *args, **kwargs):
    if not self.inplace:
        return self.fast_forward(*args, **kwargs)
    # insert whatever is in forward now

Positive:

  • faster computation for metrics that can be calculated with inplace=False
  • still supports metrics that require in-place updates, such as the one @maximsch2 mentions
  • backwards compatible with all users' custom metrics

Negative:

  • more complex code base

The initial PR should implement the API changes in Metric and maybe redo a single metric such as Accuracy using the new, faster API. Then in follow-up PRs we can begin converting the remaining metrics.
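
A rough sketch of what the non-inplace path could look like, combining the reduction loop above with a compute that accepts explicit state values (as in the earlier example); none of this is final API, and fast_forward is just the placeholder name used above:

def fast_forward(self, *args, **kwargs):
    # update returns the per-batch state values instead of mutating self,
    # so a single call is enough
    step_states = self.update(*args, **kwargs)

    # the single-batch value is computed directly from the per-batch states,
    # no reset or second update needed
    batch_value = self.compute(*step_states)

    # fold the per-batch values into the accumulated states using the
    # registered dist_reduce_fx for each state
    for state, step_val in zip(self._defaults.keys(), step_states):
        setattr(self, state, self._reductions[state]([getattr(self, state), step_val]))

    return batch_value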

@janvainer (Author) commented

Hi @SkafteNicki, thanks for the follow-up. The suggested API changes make sense to me! Unfortunately, I do not have the capacity to look into it in the upcoming weeks, so feel free to reassign if anyone wants to pick this up (it should be possible to continue from my draft PR). I think I will be able to work on this some time in June.

@SkafteNicki (Member) commented

@janvainer thanks for letting us know. I am going to un-assign you and also close the PR you created; then we will see if someone feels like picking it up. Otherwise, feel free to ping me if you find the time to contribute again :]
Thanks!

stale bot commented Jun 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Jun 25, 2021
stale bot closed this as completed Jul 4, 2021