
Comparison to torchmetrics #82

Closed

rsokl opened this issue Nov 1, 2022 · 6 comments

Comments

rsokl commented Nov 1, 2022

Hello! torcheval looks great!

I'd be interested to know how torcheval compares to torchmetrics. Are there certain shortcomings in torchmetrics that torcheval hopes to address? Any other insights into what inspired the creation of torcheval might help users understand what makes this project unique 😄

yongen9696 commented Nov 3, 2022

what makes this project unique

I wonder as well; torchmetrics is quite mature and already offers a fairly complete set of metric tools.

ninginthecloud (Contributor) commented Nov 3, 2022

Hi @rsokl and @yongen9696, thanks for the great question!

Kudos to the community

Support for metrics and evaluation has been a long-running request from the PyTorch community. First, we would like to give kudos to scikit-learn metrics, Keras Metrics, Ignite Metrics, and TorchMetrics as existing projects in the ML community that have inspired TorchEval. In particular, we have discussed these design points on multiple occasions with the developers of TorchMetrics.

What makes TorchEval unique?

Philosophy for TorchEval

TorchEval is a library that enables easy and performant model evaluation for PyTorch. The library’s philosophy is to provide minimal interfaces bolstered by a robust toolkit, alongside a rich collection of performant, out-of-the-box implementations. Critically, we prioritize the following axes:

  • No surprises in behavior
  • Fast by default
  • Easily extensible
  • Works naturally in distributed applications

Components in TorchEval

Interface clarity

  1. Class-based metrics in TorchEval offer only update, compute, reset, and merge_state methods, which makes it obvious to callers which states are used for computing results. There is only one way to get results from a class-based metric, so there is no risk of inadvertent usage that slows down performance (see the sketch after this list).
  2. In the base Metric interface, TorchEval does not wrap the update() or compute() methods implemented by callers.
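
To make the first point concrete, here is a minimal sketch of a metric restricted to those four methods. The MeanMetric class and its internals below are illustrative assumptions, not TorchEval's actual base class or any of its shipped implementations:

```python
from typing import List

import torch


class MeanMetric:
    """Illustrative class-based metric exposing only update, compute, reset, and merge_state."""

    def __init__(self) -> None:
        self.reset()

    def update(self, values: torch.Tensor) -> "MeanMetric":
        # Accumulate local state only; no synchronization happens here.
        self.weighted_sum += values.sum()
        self.num_samples += values.numel()
        return self

    def compute(self) -> torch.Tensor:
        # The single way to read a result; uses whatever state has been accumulated so far.
        return self.weighted_sum / self.num_samples

    def reset(self) -> "MeanMetric":
        # Return the metric to its initial state.
        self.weighted_sum = torch.tensor(0.0)
        self.num_samples = torch.tensor(0.0)
        return self

    def merge_state(self, metrics: List["MeanMetric"]) -> "MeanMetric":
        # Fold in states from other metric objects (e.g. metrics gathered from other ranks).
        for other in metrics:
            self.weighted_sum += other.weighted_sum
            self.num_samples += other.num_samples
        return self
```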

Metric synchronization in distributed applications

  1. Metric synchronization is supported through the toolkit, not on the base interface.
  2. Metric synchronization does not change metric states in-place, which means users don’t need to worry about undefined transitions (e.g. calling update() after sync()) or rewinding to previous states.
  3. Metric synchronization must be explicitly opted into by users. This makes it easy for callers to distinguish between results on a particular rank vs global results. There is no default synchronization on step, which has a significant performance overhead in distributed applications.
  4. Explicit: Synchronizing states in TorchEval operates on the whole metric object, not per state. Specifically, TorchEval requires users to implement a merge_state() method that defines how states are combined, which avoids making assumptions about the state objects being synchronized (see the sketch after this list).
  5. Performance: When synchronizing a metric, the TorchEval toolkit runs only one collective per metric. In the near future, we will augment the toolkit with the ability to synchronize a collection of metrics, further reducing communication overhead.
  6. Extensible: The toolkit for metric synchronization today covers the typical SPMD use case, but can be extended to cover peer-to-peer use cases (e.g. via torch.distributed.rpc) without changing the base interface.
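
As an illustration of points 2–4 (opt-in synchronization combined via merge_state), the sketch below contrasts a purely local compute() with toolkit-driven synchronization. It assumes a distributed process group has already been initialized, and it assumes sync_and_compute is the relevant toolkit entry point; treat the exact names as an approximation rather than a guarantee:

```python
import torch

from torcheval.metrics import MulticlassAccuracy
from torcheval.metrics.toolkit import sync_and_compute

# Assumes a torch.distributed process group has already been initialized
# (e.g. launched via torchrun). Data below is dummy data for illustration.
metric = MulticlassAccuracy()

for _ in range(10):
    preds = torch.randn(32, 5)            # dummy logits: batch of 32, 5 classes
    targets = torch.randint(5, (32,))     # dummy labels
    metric.update(preds, targets)         # purely local: no collectives, no hidden sync

local_result = metric.compute()           # accuracy over this rank's data only
global_result = sync_and_compute(metric)  # explicit opt-in: one collective per metric,
                                          # states combined via merge_state; the local
                                          # metric object is left unchanged
```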

Performance

  1. We believe the biggest performance benefits come from the explicit interfaces offered. In addition to the points listed above on synchronization, the out-of-the-box implementations are optimized with:
    1. Vectorization (example; see the toy sketch after this list)
    2. JIT scripting (example)
    3. Binned metrics (example)
  2. Looking forward, TorchEval is exploring integrating custom kernels and/or Triton integrations to further accelerate computation.
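
As a toy illustration of the vectorization point above (not TorchEval's actual code), here are two ways an update() step might compute confusion counts; the second replaces a per-sample Python loop with a single bincount:

```python
import torch


def confusion_counts_loop(preds: torch.Tensor, targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    # Naive per-sample Python loop: correct, but slow for large batches.
    counts = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for p, t in zip(preds.tolist(), targets.tolist()):
        counts[t, p] += 1
    return counts


def confusion_counts_vectorized(preds: torch.Tensor, targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    # Same result from one bincount over flattened (target, pred) index pairs.
    flat = targets * num_classes + preds
    return torch.bincount(flat, minlength=num_classes * num_classes).reshape(num_classes, num_classes)
```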

Beyond Metrics

TorchEval also includes evaluation tools beyond metrics, such as FLOP counting and module summarization utilities.
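
For example, a module summary might be obtained roughly as follows; get_module_summary is assumed to be the relevant tools entry point, so check the current docs for the exact API:

```python
import torch

# Assumed entry point; the exact import path or name may differ between releases.
from torcheval.tools import get_module_summary

model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Summarize the module tree (parameter counts, sizes, etc.) without a training loop.
print(get_module_summary(model))
```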

We are open to your feedback about what else you'd find helpful in this library!

cc: @ananthsub @bobakfb @JKSenthil

ananthsub (Contributor) commented Nov 4, 2022

I think @ninginthecloud's reply summarizes the difference very well, so I'll close out this issue. @rsokl, please let us know if you have further questions about this though!

rsokl (Author) commented Nov 4, 2022

Thank you! This response was very useful.

rsokl (Author) commented Nov 4, 2022

(given the engagement on this thread, you might consider pinning it in your issues section so that other inquiring users can find it easily 😄)

williamFalcon commented Nov 5, 2022

Hi! William here from Lightning. The Lightning team led the development of torchmetrics. There was a period when @ananthsub was a close member of the torchmetrics team; the impression we were under was that he was contributing back to the Lightning TorchMetrics OSS effort, but it seems we have since diverged.

We developed TorchMetrics for the larger community (beyond Lightning), and it has become a de facto standard across the PyTorch community.

We valued API stability when Meta started engaging, to the point where we went back and forth on design decisions that didn’t bring crystal clear value, but that would break people’s code and not benefit the broad PyTorch community.

Meta pushed for changes that our team had championed but decided not to go ahead with, then decided to start their own very similar project, and is now very actively working to have projects adopt their solution. We don’t think that is fair: it fragments the community, and there is nothing we couldn’t fundamentally fix.

This mostly just fragments the ecosystem… The “differences” are so minor that one of our engineers will just address them in the next week…

I’m sure that eval is a good attempt at metrics, and you can be the judge of what you prefer to use, @rsokl. What I can say is that we have a whole company dedicated to making sure our software is the best in the world, and we are committed to providing first-class support and integrating feedback into torchmetrics. We’ve been working on this for years and have deep in-house expertise that you are leveraging through torchmetrics, not to mention a massive contributor ecosystem.

Thanks for the thorough comparison! We will be taking this feedback into consideration as we prepare for our next release.

cheers!
