There should be a metrics package #22439
I'm not sure what the difference would be with the loss functions from torch.nn?
Correct me if I'm wrong, but loss functions are intended for calling .backward() on the Tensors they return. For some metrics, like RSquared, Quantiles, ..., that doesn't make sense or isn't possible. So metrics don't need to calculate gradients and so on.
The autograd overhead is very small; even if you can do a .backward(), you don't have to. I agree though that standard metrics could be implemented, either in tnt or in the main package?
I get your point, you are correct. In my opinion, those metrics should be added to the main package, because, as you pointed out, it could be done within the loss functions from torch.nn. If it were done in the tnt project, how would you decide which metric or 'loss' should be implemented in the main package and which in tnt?
I'd also like to discuss where it should be implemented. As new loss functions? If so, how should, for example, a quantile look?

```python
from ..... import Quantile

quantile = Quantile(0.5, axis=0)
tensor = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
quantile(tensor)
# >>> [4, 5]
```
The place to put it, I guess, is to implement it as a function. I would say it's fine to have non-differentiable functions there.
I want to see a fuller proposal for metrics that expands on why metrics need to be metrics instead of loss functions. I think a lot (or at least a significant amount) of metrics actually belong elsewhere.
It'd take some time to write a full proposal, but for now I'd like to propose some ideas and hear your opinion.

```python
r_squared = RSquaredMetric()
for model_out, target_data in zip(model_outs, targets):
    r_squared.accumulate(model_out, target_data)
correlation_matrix = r_squared.get()
```
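The RSquaredMetric above is not an existing class; as a minimal self-contained sketch of what such an accumulating metric could look like (computing a single R² value rather than a full correlation matrix, with all names assumed for illustration):

```python
import torch


class RSquaredMetric:
    """Hypothetical accumulating R^2, illustrating the accumulate()/get() API sketched above."""

    def __init__(self):
        self.n = 0
        self.sum_y = 0.0
        self.sum_y_sq = 0.0
        self.sum_sq_res = 0.0

    def accumulate(self, preds: torch.Tensor, target: torch.Tensor):
        # Only running sums are kept, so batches can be streamed in one at a time.
        self.n += target.numel()
        self.sum_y += target.sum().item()
        self.sum_y_sq += (target ** 2).sum().item()
        self.sum_sq_res += ((target - preds) ** 2).sum().item()

    def get(self):
        mean = self.sum_y / self.n
        ss_tot = self.sum_y_sq - self.n * mean ** 2   # sum of (y - mean)^2
        return 1.0 - self.sum_sq_res / ss_tot
```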
What you present looks quite similar to the meters in tnt, just with new types of metrics.
I'm sorry, but I have only known about the existence of tnt since you mentioned it two days ago. Now, after a brief reading of its documentation, I think its meters package meets my needs, since it provides functionality for accumulating batched data. But there are only a few (really few) meters available, and probably none of them is useful to me. So now I think we need to implement more of them, including some losses from PyTorch (MSE, ...) and more.
Check out ignite, it is more feature-complete and maintained.
I was thinking about this recently as well and came across something similar in TensorFlow. I believe the idea is to have metrics such as mAP, precision (at k), nDCG, etc. easily available and with strong GPU and multi-dimensional tensor support, so that there is less reinventing of the wheel and users don't get hit by bugs when trying to implement multi-dimensional versions of them. Perhaps extending ignite's metrics module is the right way to go? Thoughts?
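As an illustration of the kind of batched, GPU-friendly metric being discussed, here is a rough sketch of precision@k on raw tensors; the function name and signature are made up for this example and are not an existing PyTorch API:

```python
import torch


def precision_at_k(scores: torch.Tensor, relevance: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Per-sample precision@k for a batch of score vectors (works on CPU or GPU tensors)."""
    topk = scores.topk(k, dim=-1).indices    # indices of the k highest-scoring items per sample
    hits = relevance.gather(-1, topk)        # ground-truth relevance of those items
    return hits.float().sum(dim=-1) / k

# Toy usage: 2 samples, 6 items each, binary relevance labels.
scores = torch.rand(2, 6)
relevance = torch.randint(0, 2, (2, 6))
print(precision_at_k(scores, relevance, k=3))
```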
@varunagrawal that's exactly the idea. I believe extending ignite's metrics is what PyTorch members want. And I might be wrong, but I still don't know why it should be in a separate project, as almost everyone using PyTorch wants to evaluate model accuracy.
@jeffreyksmithjr says that this issue has been filed multiple times. It's common in analogous libraries; it's a weird user experience that we don't have these metrics out of the box. (And it may become a requirement for some of our internal requirements.) We (in PyTorch core) may need to step up and do the design work for something like this. (In terms of internal priority, this might come after having non-neural-network baseline models working in PyTorch.) @fmassa says, if we can find someone to work on this, that would be great. But it will take some time to come up with a good design (echoing what @soumith said). Moving ignite metrics into PyTorch core is definitely a good idea; it's just not clear that ignite's design has stabilized to the extent that we can actually do this. Part of the trouble is that here in core PyTorch, most of us are not writing code that uses metrics (though @jeffreyksmithjr points out that we have plenty of internal people who have relevant experience here).
I continue to feel that the omission of metrics is just a hole in our feature set that should be filled. But I totally agree that some metrics belong in domain libraries (torchvision, etc.) and some belong completely outside of PT. There is some base level of functionality that I think could make a lot of sense for us to provide as the batteries obviously worth including.

Beyond the concerns of mine that @ezyang transcribed above, I would call out that we're increasing the staffing of folks who are expected to produce materials like tutorials. Not having even basic metrics built in creates yet more weird devX for tutorial authors. In my book on machine learning systems, I always relied upon first-party library implementations of metrics to demonstrate any concept other than the single section where the reader was learning how to implement metrics code. It's just more comprehensible for a user to not need a whole separate component to check the performance of their trained model.

That said, I don't have strong feelings about how we would structure such a solution. I think we could try to ensure a level of modularity, if it was deemed valuable, so long as we eventually pointed users to something that didn't introduce a lot of new cognitive overhead (which I would argue that tnt and ignite do if you're literally only trying to call a single function).

I'll take responsibility for seeing if we can find an internal team member who could at least get us to a draft design that made sense to discuss. Low-detail issues like this one tend not to really move the conversation forward. We need an actual list of metrics to work from.
@jeffreyksmithjr @ezyang I would be more than happy to help with the implementation.
@ezyang @jeffreyksmithjr I'm also willing to help.
I guess what should happen now is someone puts together some sort of
initial proposal, with reference to the existing out-of-PyTorch
implementations, and then we talk about it :)
@ezyang in what form should it be? As a comment to this issue? What should it contain?
A comment to this issue seems like the best format we have right now. I think at a first cut, a fuller description of the APIs that would be added, and some overall description about organization and philosophy, would be good.
**Basic idea**

Let me quote @varunagrawal:

> I believe the idea is to have metrics such as mAP, precision (at k), nDCG etc., easily available and with strong GPU and multi-dimensional tensor support, so that there is less reinventing of the wheel and users don't get hit by bugs when trying to implement multi-dimensional versions of them.

It's a somewhat weird user experience that you have to write them by hand or import another package (Ignite) to have such basic functionality. From my experience, I can tell that in pollution forecasting it's essential to measure the NumPy corrcoef to check the correlation between the model output and the target (measured) output, and I had to implement it myself to get batch accumulation and GPU support.

**Other libraries**
In TensorFlow 1.x there is a metrics package. Each metric returns two operations: the first to calculate the metric's output, the second to update the metric (e.g. accumulate over batches).

```python
# Placeholders to take in batches of data
tf_label = tf.placeholder(dtype=tf.int32, shape=[None])
tf_prediction = tf.placeholder(dtype=tf.int32, shape=[None])

# Define the metric and update operations
acc_metric, acc_metric_update = tf.metrics.accuracy(tf_label,
                                                    tf_prediction,
                                                    name="acc_metric")
```

That's how the metric is updated:

```python
for i in range(n_batches):
    # Update the running variables on a new batch of samples
    feed_dict = {tf_label: labels[i], tf_prediction: predictions[i]}
    session.run(acc_metric_update, feed_dict=feed_dict)
```

Now we can actually compute our acc_metric:

```python
score = session.run(acc_metric)
```

Because TensorFlow uses static graphs it's a bit complicated; we have to use sessions, and resetting requires re-initializing the metric's running variables:

```python
# Get variables created by the metric
running_vars = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope="acc_metric")

# Define an initializer to initialize/reset the running variables
running_vars_initializer = tf.variables_initializer(var_list=running_vars)

# Reset variables
session.run(running_vars_initializer)
```

As TensorFlow 2.0 is still in beta, this can (probably?) change slightly. But for now each metric provides three methods (metrics are classes from now on): `update_state`, `result`, and `reset_states`.

Let's look at the Accuracy metric as before, but from the TensorFlow 2.x perspective:

```python
# Define the metric as a class
m = tf.keras.metrics.Accuracy()

for i in range(n_batches):
    # Update the metric on a new batch of samples
    m.update_state(labels[i], predictions[i])

# Get accuracy:
m.result().numpy()
# >>> 0.99
```

In my opinion, PyTorch's metrics should be implemented in a similar way to TensorFlow 2.x's.

**Summary**

After looking at other libraries, and from my experience, I came to the conclusion that there should be a new package, `torch.metrics`.
It is not necessary for a Metric to accumulate gradients, nor to remember operation history, but exactly how to implement it needs a deeper look. There is also the matter of which device to use for computations. Maybe there should be another method or constructor parameter which sets the device for all contained Tensors (see the sketch after the prototype below)? I'm happy to contribute, and have already prototyped NumPy corrcoef and NumPy cov equivalents:

```python
import numpy as np
import torch


class CovMetric:
    def __init__(self):
        self.x_mean = 0
        self.c = 0
        self.n = 0

    @staticmethod
    def __concat_input(x, y, rowvar):
        if not rowvar and x.shape[0] != 1:
            x = x.t()
        if y is not None:
            if not rowvar and y.shape[0] != 1:
                y = y.t()
            x = torch.cat((x, y), dim=0)
        return x

    def reset(self):
        self.x_mean = 0
        self.c = 0
        self.n = 0

    def accumulate(self, x, y=None, rowvar=True):
        x = self.__concat_input(x, y, rowvar)
        self.n += x.size(1)
        xs = torch.sum(x, 1).unsqueeze(-1).expand_as(x)
        new_mean = self.x_mean + (xs - self.x_mean * x.size(1)) / self.n
        m1 = torch.sub(x, new_mean)
        m2 = torch.sub(x, self.x_mean)
        self.c += m1.mm(m2.t())
        self.x_mean = new_mean

    def compute(self):
        return self.c / (self.n - 1)


class CorrcoefMetric:
    def __init__(self):
        self.cov = CovMetric()

    def reset(self):
        self.cov.reset()

    def accumulate(self, x, y=None, rowvar=True):
        self.cov.accumulate(x, y, rowvar)

    def compute(self):
        c = self.cov.compute()
        # normalize the covariance matrix
        d = torch.diag(c)
        stddev = torch.sqrt(d)
        c /= stddev[:, None]
        c /= stddev[None, :]
        return torch.clamp(c, -1.0, 1.0)


if __name__ == '__main__':
    N = 1024
    M = 100
    batch_size = 64
    mat1 = torch.rand((N, M), dtype=torch.float64)
    mat2 = torch.rand((N, M), dtype=torch.float64)

    cor = CorrcoefMetric()
    for i in range(N // batch_size):
        cor.accumulate(
            mat1[i * batch_size:(i + 1) * batch_size, ],
            mat2[i * batch_size:(i + 1) * batch_size, ],
            False
        )
    accumulated_cor = cor.compute().numpy()
    numpy_cor = np.corrcoef(mat1.numpy(), mat2.numpy(), False)
    print(np.allclose(numpy_cor, accumulated_cor))
    # >>> True
```
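To make the device question above concrete, here is a minimal sketch of what a base class with a `to(device)` helper could look like; the class name, method names, and the `to()` behaviour are assumptions for discussion, not an agreed API:

```python
import torch


class Metric:
    """Hypothetical torch.metrics base class; names and semantics are assumptions."""

    def reset(self):
        raise NotImplementedError

    def accumulate(self, *args, **kwargs):
        raise NotImplementedError

    def compute(self):
        raise NotImplementedError

    def to(self, device):
        # Move every tensor held in the metric's state to `device`,
        # loosely mirroring nn.Module.to for buffers.
        for name, value in vars(self).items():
            if torch.is_tensor(value):
                setattr(self, name, value.to(device))
        return self
```

A concrete metric would then only implement reset/accumulate/compute on top of this.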
Is anyone working on it?
@prasunanand I believe the consensus right now is to first come up with a proposal for this subpackage. This will be taken care of by someone on the PyTorch internal team, and once there is an initial setup we can start adding things.
So is there any progress with this issue internally? |
@Darktex are you thinking we enable this in Lightning? How would that look for you? We could add a flag called metrics or something and we'd call it at the right times. But maybe I'm not understanding; it seems like you just want to import this package and call it at the end of your training_step or validation_step?
Reviving this thread to cross-link to the issue on Lightning here and to spur more discussion. Let's take Lightning as an example of a library on top of PyTorch that might want to handle some wrapping for the pretty printing. This would align well with the proposal I wrote a few comments above. I like reasoning from more concrete stuff, so let me try to refine that proposal in a more concrete manner (I am not married to anything here, I think it just helps ground things).
The biggest question I have is whether we can find a way to support computing metrics nicely once we factor in DDP, in a way that is completely independent. I like how Ignite is doing it and I would consider starting from their API for the one in PyTorch (potentially taking the implementation too and decoupling it from the Ignite specifics so everyone can use it!). Here's a primer from their API:

```python
from abc import ABCMeta, abstractmethod
import warnings

import torch
import torch.distributed as dist


class Metric(metaclass=ABCMeta):

    _required_output_keys = ("y_pred", "y")

    def __init__(self, output_transform=lambda x: x, device=None):
        self._output_transform = output_transform

        # Check device if distributed is initialized:
        if dist.is_available() and dist.is_initialized():
            # check if reset and update methods are decorated. Compute may not be decorated
            if not (hasattr(self.reset, "_decorated") and hasattr(self.update, "_decorated")):
                warnings.warn("{} class does not support distributed setting. Computed result is not collected "
                              "across all computing devices".format(self.__class__.__name__),
                              RuntimeWarning)
            if device is None:
                device = "cuda"
            device = torch.device(device)
        self._device = device
        self._is_reduced = False
        self.reset()

    @abstractmethod
    def reset(self):
        """
        Resets the metric to its initial state.

        This is called at the start of each epoch.
        """
        pass

    @abstractmethod
    def update(self, output):
        """
        Updates the metric's state using the passed batch output.

        This is called once for each batch.

        Args:
            output: this is the output from the engine's process function.
        """
        pass

    @abstractmethod
    def compute(self):
        """
        Computes the metric based on its accumulated state.

        This is called at the end of each epoch.

        Returns:
            Any: the actual quantity of interest.

        Raises:
            NotComputableError: raised when the metric cannot be computed.
        """
        pass
```

Source: ignite.metrics.metric. To make them amenable to DDP, they use decorators to mark what the metric does during each of these calls.

Another good thing that Ignite does very well here is making it easy for others to write their own DDP-compatible metrics without dealing with the DDP internals themselves. For example, take a look at the VariableAccumulation metric, which is a parent/mixin that can bootstrap other concrete metrics.

For the sake of argument, let's say our metrics package looks exactly the same as Ignite's. The next question is how we are going to use it: Ignite is events-based, so they just declare what metrics they want and stuff happens behind the scenes. Let's write a sample eval loop:

```python
macro_f1 = nn.F1Score("macro")

with torch.no_grad():
    for x, y in eval_dataloader:
        y_hat = F.softmax(model(x), dim=-1)
        macro_f1.update(y_hat, y)  # void method, just update state

m = macro_f1()  # forward calls compute(), takes no args. Compute from state
print(f"Macro f1: {m.item()}")  # or whatever else you want to do
```

With DDP, maybe this still works - normally we don't need to pass the loss through it:

```python
macro_f1 = nn.F1Score("macro")

torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...')
model = DistributedDataParallel(model, device_ids=[i], output_device=i)
macro_f1 = DistributedDataParallel(macro_f1, device_ids=[i], output_device=i)

with torch.no_grad():
    for x, y in eval_dataloader:
        y_hat = F.softmax(model(x), dim=-1)
        macro_f1.update(y_hat, y)  # void method, just update state

m = macro_f1()  # forward calls compute(), takes no args. Compute from state
print(f"Macro f1: {m.item()}")  # or whatever else you want to do
```

This is flexible, because we leave the responsibility of updating state and computing in the hands of the client, so they can do it however they please. This will not limit researchers; quite the opposite: it will liberate them from writing metrics code that is unsexy and hard to do efficiently (for example on DDP). Trainer libraries can then do further magic by using events to update and compute for you at the right time. Any feedback?
That looks really nice. And as I understand it, a replacement for the sklearn functions would be useful for small datasets, where you don't have to accumulate state because everything fits into memory, potentially increasing performance?
Yes, and we can factor out components as needed so that they share as much code as possible.
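For example, a hypothetical one-shot functional form (not an existing API) for data that fits in memory could skip the stateful update/compute cycle entirely:

```python
import torch


def r2_score(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # One-shot R^2 over the full dataset; no accumulate/compute cycle needed.
    ss_res = ((target - preds) ** 2).sum()
    ss_tot = ((target - target.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot
```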
Very nice! This is exactly what should be done!
So now we have everything ready to start?
@Darktex proposed to put metrics in torch.nn:
I don't think the metrics should be separated from the main package; everything is inside it. (You may ask: then why do we have separate packages at all?) For the implementation, it'd be obvious that the metrics should be implemented in a new subpackage.
1. Metrics should be different from torch.nn, because they are not differentiable.
2. Metrics need to fluidly interact with torch.distributed, and hence often in non-trivial ways with the training loop.
3. The APIs for a metrics package, especially because of (2), are not obvious.
4. Both 1 and 2 make it nicer to have a separate metrics package that can eventually stabilize and come into core PyTorch.
Hi, I'm a student who is a beginner at PyTorch, but I understand what you are trying to do here. This comes from personal experience: when I started the MNIST tutorial it was kind of weird that I had to calculate accuracy myself. Thus I would love to help out, hopefully under some kind of mentorship!
Thanks @Raikan10! We'll post here as there are developments. If you're interested in contributing to PyTorch generally then you could also check out issues with the "OSS contribution wanted" label. See this query: https://github.com/pytorch/pytorch/issues?q=is%3Aissue+is%3Aopen+label%3A%22oss+contribution+wanted%22.
@Raikan10 for the metrics package we are looking for help with some metrics in Lightning. Good place to start!
@soumith:
Interesting remarks. Maybe one can identify in more detail the couplings of metrics with core training:
- dataset: different from training
- compute: only forward pass?
- aggregation: distribution
- callback?
- training-time usage: scheduler, state of the NN
… On Apr 30, 2020, at 1:11, Soumith Chintala wrote:
Metrics should be different from torch.nn, because they are not differentiable
Metrics needs to fluidly interact with torch.distributed, and hence often in non-trivial ways with the training loop.
The APIs for a metrics package, especially because of (2), are not obvious.
Both 1 and 2 makes it nicer to have a separate metrics package that can eventually stabilize and come into core pytorch
I definitely agree. That said, I'd like to point out two problems I've run into with Ignite's current API.

**Inconsistent input format for binary classification and multiclass problems**

In the first case, Ignite's Accuracy expects labels as input, whilst in the second case it expects probabilities or logits. It was a big point of confusion to me.

**No shortcuts for saying "I want to pass logits/probabilities as input"**

Fundamentally, Accuracy is a metric that takes predicted and correct labels as input and returns the percentage of correct predictions as output. However, in practice neural networks trained for classification often return logits or probabilities. For example, in the case of binary classification, I have never written the following:

```python
accuracy = Accuracy()
```

Instead, I always have to write:

```python
accuracy = Accuracy(transform=lambda x: torch.round(torch.sigmoid(x)))
# or
accuracy = Accuracy(transform=lambda x: torch.round(x))
```

Suggested solution for both problems: let the user explicitly say in which form the input will be passed:

```python
import enum


class Accuracy(...):

    class Mode(enum.Enum):
        LABELS = enum.auto()
        PROBABILITIES = enum.auto()
        LOGITS = enum.auto()

    def __init__(self, mode=Mode.LABELS, ...):
        ...
```

The suggested interface can also be extended to support custom thresholds by adding a threshold parameter.

cc: @vfdev-5
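As a self-contained sketch of the suggestion above (a hypothetical class, not an existing API; only the binary case with a sigmoid/threshold is shown):

```python
import enum

import torch


class Accuracy:
    """Hypothetical accuracy metric with an explicit input mode (binary case only)."""

    class Mode(enum.Enum):
        LABELS = enum.auto()
        PROBABILITIES = enum.auto()
        LOGITS = enum.auto()

    def __init__(self, mode=Mode.LABELS, threshold=0.5):
        self.mode = mode
        self.threshold = threshold
        self.correct = 0
        self.total = 0

    def update(self, preds, target):
        if self.mode is Accuracy.Mode.LOGITS:
            preds = torch.sigmoid(preds)
        if self.mode is not Accuracy.Mode.LABELS:
            preds = (preds > self.threshold).long()
        self.correct += (preds == target).sum().item()
        self.total += target.numel()

    def compute(self):
        return self.correct / self.total


# Usage: pass raw logits straight from the model.
acc = Accuracy(mode=Accuracy.Mode.LOGITS)
acc.update(torch.tensor([2.0, -1.0, 0.5]), torch.tensor([1, 0, 0]))
print(acc.compute())
```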
@WeirdKeksButtonK this can be a nice enhancement for Ignite as well. Please open an issue in the Ignite repository.

@soumith

@Darktex you also have the same control over that in Ignite, with almost the same API:

```python
metric = ...

metric.reset()
for _ in range(n):
    metric.update(y_pred, y)
result = metric.compute()
```

Another point is about the distributed package: currently native torch.distributed supports only the nccl, gloo, and mpi backends. For users playing with XLA, the aggregation methods would have to be readapted.
https://github.com/chinokenochkan/torch-metrics
Feel free to suggest more metrics; contributions are welcome!
For those who are interested, I have implemented a package of IQA metrics:
Check out https://github.com/PyTorchLightning/metrics! Over 25 implementations and a simple API to build your own metric, optimized for distributed training!
@edenlightning
Thanks. What about end-to-end sparse tensor support?
… On Mar 24, 2021, at 8:56, edenlightning wrote:
Check out https://github.com/PyTorchLightning/metrics!
Over 25 implementations and a simple API to build your own metric, optimized for distributed training!
For those following along, there is now a third-party torchmetrics package that may be helpful: https://torchmetrics.readthedocs.io/en/latest/ (EDIT: edenlightning already mentioned this!)
Closing this issue out, since torchmetrics and torcheval fill this need now.
🚀 Feature
Why not implement some common metrics for evaluating models, with strong GPU acceleration?
Motivation
When I need to evaluate model accuracy (e.g. measure R squared) I have to write it by hand, or use NumPy or another library (without GPU support). It'd be cleaner and simpler to have a dedicated package with a Pythonic API. It could also provide performance benefits because of low-level (C++) optimizations.
Pitch
I'd like to have a package, e.g.:
torch.metrics
Then I could do something like the usage sketched below (or some variation of it).
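A hypothetical sketch of what that usage could look like; the `torch.metrics` module and the `RSquared` class name are assumptions for illustration, not an existing API:

```python
import torch
import torch.metrics as metrics  # hypothetical module, does not exist yet

r_squared = metrics.RSquared()   # assumed class name
for model_out, target in validation_batches:
    r_squared.accumulate(model_out, target)   # runs on GPU tensors directly
print(r_squared.compute())
```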
Alternatives
For now, I can use some Tensor manipulations or use NumPy, for example:
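A minimal example of the current workaround for the correlation case from the Motivation section, moving data off the GPU to use NumPy (the stand-in tensors are for illustration; in practice they would come from the model):

```python
import numpy as np
import torch

# Stand-in tensors for model output and target.
model_output = torch.rand(1024)
target = torch.rand(1024)

# Move everything to NumPy just to compute the metric.
corr = np.corrcoef(model_output.detach().cpu().numpy(), target.cpu().numpy())
print(corr[0, 1])
```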
If you agree, I'd like to contribute.
cc @ezyang @gchanan @zou3519 @albanD @mruberry