
Metrics support mask #54

Closed
YuxianMeng opened this issue Jul 11, 2020 · 27 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), wontfix

Comments

@YuxianMeng

YuxianMeng commented Jul 11, 2020

🚀 Feature

It would be better if current metrics like Accuracy/Recall supported a mask.

Motivation

For example, when I work on a sequence labeling task and pad sequences to a maximum length, I do not want to compute metrics at the padding positions.

Pitch

I guess a simple manipulation would work for accuracy (here is the original one for reference):

from typing import Any, Optional

import torch
from pytorch_lightning.metrics.functional.classification import (
    accuracy,
)
from pytorch_lightning.metrics.metric import TensorMetric


class MaskedAccuracy(TensorMetric):
    """
    Computes the accuracy classification score
    Example:
        >>> pred = torch.tensor([0, 1, 2, 3])
        >>> target = torch.tensor([0, 1, 2, 2])
        >>> mask = torch.tensor([1, 1, 1, 0])
        >>> metric = MaskedAccuracy(num_classes=4)
        >>> metric(pred, target, mask)
        tensor(1.)
    """

    def __init__(
        self,
        num_classes: Optional[int] = None,
        reduction: str = 'elementwise_mean',
        reduce_group: Any = None,
        reduce_op: Any = None,
    ):
        """
        Args:
            num_classes: number of classes
            reduction: a method for reducing accuracies over labels (default: takes the mean)
                Available reduction methods:
                - elementwise_mean: takes the mean
                - none: pass array
                - sum: add elements
            reduce_group: the process group to reduce metric results from DDP
            reduce_op: the operation to perform for ddp reduction
        """
        super().__init__(name='accuracy',
                         reduce_group=reduce_group,
                         reduce_op=reduce_op)
        self.num_classes = num_classes
        self.reduction = reduction

    def forward(self, pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        """
        Actual metric computation
        Args:
            pred: predicted labels
            target: ground truth labels
            mask: only calculate metrics where mask == 1
        Return:
            A Tensor with the classification score.
        """
        # positions where mask == 0 should be ignored by the metric
        mask_fill = ~mask.bool()
        # out-of-place masked_fill so the caller's tensors are not modified in place
        pred = pred.masked_fill(mask=mask_fill, value=-1)
        target = target.masked_fill(mask=mask_fill, value=-1)

        return accuracy(pred=pred, target=target,
                        num_classes=self.num_classes, reduction=self.reduction)

Alternatives

Additional context

@Borda
Member

Borda commented Jul 11, 2020

Looks nice! @YuxianMeng want to send it as a PR?
Cc: @justusschock @SkafteNicki

@YuxianMeng
Author

Looks nice! @YuxianMeng want to send it as a PR?
Cc: @justusschock @SkafteNicki

My pleasure :) A small question: should this PR contain only the masked precision metrics, or other metrics as well?

@Borda
Member

Borda commented Jul 11, 2020

I would say all of them. In fact, it would be nice to have an abstract function/class that does the masking, so the new metrics are just applications of it, for example:

  • for functional: make a wrapper that does the masking, and all the new masked-like functions will call the existing functions through this wrapper (see the sketch below)
  • for class: make an abstract class, and the masked-like metrics will be created by inheriting from both the mask class and the metric class

Does it make sense? @justusschock @SkafteNicki
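
A rough sketch of the functional wrapper idea (names here are just placeholders, nothing like this exists in the package yet):

from typing import Callable

import torch


def masked_metric(metric_fn: Callable[..., torch.Tensor]) -> Callable[..., torch.Tensor]:
    """Wrap a functional metric so it only sees positions where mask == 1."""

    def wrapped(pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor, **kwargs) -> torch.Tensor:
        keep = mask.bool()
        # drop the masked positions before the underlying metric ever sees them
        return metric_fn(pred[keep], target[keep], **kwargs)

    return wrapped


# usage (hypothetical): masked_accuracy = masked_metric(accuracy)
# value = masked_accuracy(pred, target, mask, num_classes=4)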

@justusschock
Member

@YuxianMeng But with your implementation, you also compute the metric for the values you set to -1, I think.

What you instead need to do is accuracy(pred[mask], target[mask]), which is why, to be honest, I wouldn't add extras for this. We can't include every special case here, and masking tensors is not much overhead, so I'd prefer not to include this in the metrics package. Thoughts @SkafteNicki?

@YuxianMeng
Author

YuxianMeng commented Jul 12, 2020

@YuxianMeng But with your implementation, you also compute the metric for the values you set to -1, I think.

What you instead need to do is accuracy(pred[mask], target[mask]), which is why, to be honest, I wouldn't add extras for this. We can't include every special case here, and masking tensors is not much overhead, so I'd prefer not to include this in the metrics package. Thoughts @SkafteNicki?

@justusschock As for accuracy, actually only the non-negative classes are calculated. I thought about using accuracy(pred[mask], target[mask]), but it may cause speed problems when training on a TPU, since boolean indexing produces tensors with data-dependent shapes.

@SkafteNicki
Member

I agree with @Borda that this should be an abstract function/class. The simplest option, in my opinion, would be a class that the user can wrap their already existing metric with: masked_accuracy = MaskedMetric(Accuracy()). This would add an additional argument to the call: value = masked_accuracy(pred, target, mask). The alternative, re-writing each metric to include this feature, is not feasible at the moment.
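
Something along these lines, just as a rough sketch (MaskedMetric is a hypothetical name, and the wrapped metric is assumed to be callable as metric(pred, target)):

import torch


class MaskedMetric:
    """Sketch of a wrapper that applies a mask before forwarding to an existing metric."""

    def __init__(self, base_metric):
        self.base_metric = base_metric

    def __call__(self, pred: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        keep = mask.bool()
        # only the unmasked positions reach the wrapped metric
        return self.base_metric(pred[keep], target[keep])


# usage (hypothetical): masked_accuracy = MaskedMetric(Accuracy())
# value = masked_accuracy(pred, target, mask)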

@Borda
Member

Borda commented Aug 3, 2020

@YuxianMeng mind sending a PR? I guess @SkafteNicki or @justusschock could help/guide you through 🐰

@justusschock
Member

I think I speak for both of us, saying that we'd for sure do that and really appreciate the PR :)

@SkafteNicki
Member

Yes just ping us in the PR when you are ready, and we will assist you.

@YuxianMeng
Author

Working on it, I will let you know when I'm ready :)

@stale stale bot closed this as completed Oct 28, 2020
@hadim

hadim commented Nov 19, 2020

This issue has been closed. Has the masked metrics feature landed, or has nobody worked on it yet?

@SkafteNicki
Member

It was closed due to no activity, so it is still not a part of lightning. @hadim please feel free to pick it up and send a PR :]

@davzaman

davzaman commented Mar 6, 2021

I have implemented a version of this in my own project, would anyone like to collaborate on making a PR for this?

@Borda Borda reopened this Mar 6, 2021
@SkafteNicki
Member

@davzaman please yes, would be a great addition :]

@davzaman

I didn't implement it as a class wrapper, but I have a few ideas on how to do it. It might take me a little while as I have deadlines for other things, but I will be working on this!

@Borda Borda transferred this issue from Lightning-AI/pytorch-lightning Mar 12, 2021
@Borda Borda added the enhancement and help wanted labels Mar 17, 2021
@davzaman

Hello, perhaps I'm missing something but I'm not sure that there's a one-size-fits-all answer to this that can just be implemented as a wrapper.

I may be breaking down the problem incorrectly. For metrics with a simple internal sum, just zeroing out the masked positions will suffice. For metrics that have an internal mean, the generic solution would be to sum over dim=1, replace n_obs with mask.sum(axis=1), and divide only where the denominator (the number of unmasked elements in the row) is not 0. However, I'm not quite sure how to cover all metrics; I feel like I could be missing scenarios. I'm also not sure if there are metrics that divide pred/target, or whether we want to support custom metrics.

What are your thoughts?
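
For the mean case, something like this rough sketch is what I have in mind (illustration only, not matched to the actual metric internals):

import torch


def masked_mean(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Row-wise mean over unmasked positions only.

    values: (batch, seq_len) per-element scores (e.g. 1.0 where pred == target, else 0.0)
    mask:   (batch, seq_len) with 1 where the position counts, 0 where it is padding
    """
    mask = mask.to(values.dtype)
    summed = (values * mask).sum(dim=1)  # masked positions contribute 0 to the sum
    n_obs = mask.sum(dim=1)              # number of unmasked elements per row
    # guard against rows that are fully masked (denominator of 0)
    return torch.where(n_obs > 0, summed / n_obs.clamp(min=1), torch.zeros_like(summed))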

@SkafteNicki
Member

@davzaman I definitely see the problem. My original idea for this feature was a simple wrapper that just internally does metric(pred[mask], target[mask]) when the user calls metric(pred, target, mask) (or something similar). However, that would not work for all metrics, I guess.

@davzaman

Yeah, I think each metric would need its own implementation. There isn't an insane number of metrics, but the overhead of including tests for all of them might be a lot. Should we just let users figure out masking on their own? Is there something we can at least include to make the process easier?

@stale stale bot added the wontfix label Jun 1, 2021
@Lightning-AI Lightning-AI deleted a comment from stale bot Jun 1, 2021
@stale stale bot removed the wontfix label Jun 1, 2021
@Lightning-AI Lightning-AI deleted a comment from stale bot Jul 16, 2021
@Lightning-AI Lightning-AI deleted a comment from github-actions bot Jul 16, 2021
@Borda
Member

Borda commented Jul 16, 2021

@davzaman @SkafteNicki how is it going here?

@davzaman

Hi @Borda, we ran into issues trying to follow a one-size-fits-all approach to adding masks to metrics, since the internal computations can be very different (which changes the logic required to properly compute a "masked" version of the metric). I wasn't sure how to proceed from here. From what I could tell, it would be best to have a masked version of each metric separately, even though that is more work. There's a chance there's a solution I didn't see.

@yassersouri
Contributor

I think this issue is related to #362.

@Borda Borda removed the help wanted label Sep 20, 2021
@Borda
Member

Borda commented Nov 3, 2021

@davzaman @yassersouri could you please open a draft PR so we have a more concrete discussion...
and eventually we can help find a solution? 🐰
cc: @justusschock @SkafteNicki

@yassersouri
Contributor

@Borda Sorry, but I am quite busy right now. I don't think I will have time to allocate to this or #362.

@davzaman

@Borda I don't think I have the time to allocate to this at the moment but I am happy to help move things along.

@Borda Borda added the help wanted label Nov 10, 2021
@stale

stale bot commented Jan 10, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ZhaofengWu

Bumping this. I believe ignore_index could be fragile in certain circumstances, and a mask would be more reliable. For example, while it is a sensible default to set ignore_index to the padding token index of a tokenizer, some tokenizers, such as GPT-2's, do not have a padding token. Masking is also how the AllenNLP metrics (https://github.com/allenai/allennlp/tree/main/allennlp/training/metrics) deal with this. I believe individually implementing a mask for each metric is probably a good idea. I definitely understand it's a non-trivial amount of work, but I believe it's worth it in the long run, and hopefully the AllenNLP implementations can be a useful reference for some common metrics.
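
For reference, the workaround available today is to mask before updating the metric; a sketch with the current torchmetrics API (assuming a recent version that provides MulticlassAccuracy):

import torch
from torchmetrics.classification import MulticlassAccuracy

acc = MulticlassAccuracy(num_classes=4, average="micro")

preds = torch.tensor([[0, 1, 2, 3]])
target = torch.tensor([[0, 1, 2, 2]])
mask = torch.tensor([[1, 1, 1, 0]], dtype=torch.bool)

# drop the padded positions before they ever reach the metric
acc.update(preds[mask], target[mask])
print(acc.compute())  # all unmasked predictions are correct -> 1.0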

@davzaman

There was also an attempt to support masked tensors, but I think it has died out: https://github.com/pytorch/maskedtensor
