
Macro vs micro-averaging switched up in user guide #28585

Open
uhoenig opened this issue Mar 6, 2024 · 10 comments

Comments

@uhoenig

uhoenig commented Mar 6, 2024

Describe the issue linked to the documentation

Hi guys,
In the "ROC curve using micro-averaged OvR" part of the doc (https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#roc-curve-using-micro-averaged-ovr)

it says:
"In a multi-class classification setup with highly imbalanced classes, micro-averaging is preferable over macro-averaging. In such cases, one can alternatively use a weighted macro-averaging, not demoed here."

I believe it should say:
In a multi-class classification setup with highly imbalanced classes, macro-averaging is preferable over micro-averaging. In such cases, one can alternatively use a weighted macro-averaging, not demoed here.

If correct, I believe it could spare users some confusion. Thanks for all your work, I'm just trying to help :) !!!

Suggest a potential alternative/fix

I believe it should say:
In a multi-class classification setup with highly imbalanced classes, macro-averaging is preferable over micro-averaging. In such cases, one can alternatively use a weighted macro-averaging, not demoed here.

@uhoenig uhoenig added Documentation Needs Triage Issue requires triage labels Mar 6, 2024
@adrinjalali
Member

cc @ogrisel @lorentzenchr @GaelVaroquaux who might have a better intuition here.

@fkdosilovic
Contributor

@uhoenig You are right.

In micro averaging, all examples are treated equally (we compare predictions and ground truth for each example and compute the necessary metrics), while for macro averaging all classes are treated equally (we compare prediction and ground truth of examples for each class, compute the metrics for each class, and average those class metrics to get the macro average). See slide 41.

As such, in a multi-class setting with imbalanced classes, if the overall performance of a classifier is important to us, we should opt for macro-based evaluation.
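To make the distinction concrete, here is a small self-contained sketch (toy data, plain Python rather than scikit-learn code) showing how micro-averaged recall is dominated by the majority class, while macro-averaged recall gives each class equal weight:

```python
# Imbalanced 3-class toy problem: class 0 dominates, and the classifier
# predicts the majority class for every sample.
y_true = [0] * 90 + [1] * 5 + [2] * 5
y_pred = [0] * 100

classes = sorted(set(y_true))

def recall_for(cls):
    # Per-class recall: correct predictions for this class / actual members.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    actual = sum(t == cls for t in y_true)
    return tp / actual

# Micro recall: pool all samples first -> dominated by the majority class.
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Macro recall: average the per-class recalls -> each class counts equally.
macro = sum(recall_for(c) for c in classes) / len(classes)

print(micro)  # 0.9
print(macro)  # 0.333...
```

The classifier is useless on the minority classes, yet the micro average still reports 0.9; the macro average of 1/3 exposes the failure.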

On that note, it seems that a few sentences above, it should also be macro instead of micro:

Micro-averaging aggregates the contributions from all the classes (using numpy.ravel) to compute the average metrics as follows:

should be

Macro-averaging aggregates the contributions from all the classes (using numpy.ravel) to compute the average metrics as follows:
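For reference, the ravel trick the quoted sentence refers to can be sketched as follows (toy data, plain Python standing in for numpy.ravel, with hypothetical classifier scores): one-hot encode the labels, then flatten labels and scores row-major so every (sample, class) pair becomes one entry of a single binary problem.

```python
y_true = [0, 1, 2, 0]
n_classes = 3

# "Label-binarized" ground truth, shape (n_samples, n_classes).
y_onehot = [[1 if c == t else 0 for c in range(n_classes)] for t in y_true]

# Hypothetical per-class scores from some classifier.
y_score = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]

# Equivalent of numpy.ravel on both arrays (row-major flattening).
flat_true = [v for row in y_onehot for v in row]
flat_score = [s for row in y_score for s in row]

print(flat_true)  # [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
```

A binary ROC curve computed on `flat_true` / `flat_score` pools every (sample, class) pair into one binary task, which is what the pooled OvR averaging in the example does.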

@glemaitre glemaitre removed the Needs Triage Issue requires triage label Mar 11, 2024
@glemaitre
Member

Micro-averaging aggregates all instances (samples from all classes) to compute the metric, so a data point from a "minority" or "majority" class has the same impact.

Macro-averaging groups samples by class and aggregates afterwards (using the mean). Therefore, you increase the importance of data points from under-represented classes, because you consider them as important as those from highly populated classes.

With these aspects in mind, I cannot say that one metric is particularly better than the other; it all boils down to the application and the setup where the classifier will be used.
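This re-weighting can be shown directly with made-up per-class recalls and supports (all numbers below are hypothetical, not from the scikit-learn example):

```python
# Hypothetical per-class recalls and supports for a 3-class problem.
recalls = {0: 0.95, 1: 0.40, 2: 0.50}
support = {0: 90, 1: 5, 2: 5}
n = sum(support.values())

# Macro: each class counts equally, regardless of how many samples it has.
macro = sum(recalls.values()) / len(recalls)

# Support-weighted: each class counts in proportion to its size, so the
# majority class dominates.
weighted = sum(recalls[c] * support[c] / n for c in recalls)

print(round(macro, 3))     # 0.617
print(round(weighted, 3))  # 0.9
```

The same per-class numbers yield very different aggregate scores depending on the weighting, which is exactly why the choice depends on the application.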

@glemaitre
Member

I assume that we should just remove the statement and instead make explicit what the consequences are of using one or the other type of averaging.

@GaelVaroquaux
Member

GaelVaroquaux commented Mar 11, 2024 via email

@GaelVaroquaux
Member

GaelVaroquaux commented Mar 11, 2024 via email

@jmarintur
Contributor

I'd love to contribute to this issue. It may also be worth mentioning that in a multi-class classification setup with balanced classes, both macro and micro averaging could produce comparable results when evaluating ROC curves (as can be seen for the Iris plants dataset in the documentation).

@uhoenig
Author

uhoenig commented Mar 12, 2024

I appreciate the discussion regarding the documentation on micro and macro averaging for classification in multiclass scenarios. I concur with glemaitre's viewpoint on neutrally explaining the effects of choosing either averaging method, as it allows users to make informed decisions based on their specific needs without swaying them towards one option.

However, I would like to emphasize the value of providing guidance, particularly on handling imbalanced datasets. Real-world datasets are seldom perfectly balanced, so tips on effective strategies (such as stratified sampling, class weighting, and the appropriate use of macro averaging in multiclass settings, all of which this library supports) are not just useful but crucial for achieving reliable results. These recommendations align with the library's consistent efforts to equip users to tackle imbalanced classes effectively.

Removing explicit statements about the superiority of one method over another is sensible. Yet, maintaining practical advice reflects the realities users face and supports the library's broader goal of fostering effective and informed machine learning practices.

@lorentzenchr
Member

I would like to emphasize the value of providing guidance, particularly regarding handling imbalanced datasets.

Agreed. In that case the recommendation is:

  • Choose a metric as close as possible to the business/use case outcome you hope to achieve.
  • To compare (classification) models and measure predictive performance, use consistent scoring rules. They are not negatively affected by imbalanced classes, nor do you need to choose between micro vs macro averaging.
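As an illustration of the second point, log loss is one such consistent scoring rule: it scores each predicted probability vector directly, with no micro/macro choice to make. A minimal sketch with toy data (the labels and probabilities below are made up):

```python
import math

y_true = [0, 1, 2, 0]

# Hypothetical predicted class probabilities, one row per sample.
y_prob = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]

# Mean negative log-probability assigned to the true class of each sample.
log_loss = -sum(math.log(p[t]) for t, p in zip(y_true, y_prob)) / len(y_true)

print(round(log_loss, 4))  # 0.5179
```

Because every sample contributes its own term regardless of class membership, there is no per-class aggregation step at which an averaging strategy would have to be chosen.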

@lorentzenchr
Member

PR welcome to fix the documentation.


7 participants