
Doubt Reason Based on Entropy #11

Closed
koaning opened this issue Nov 14, 2021 · 10 comments

Comments


koaning commented Nov 14, 2021

If a machine learning model is very "confident", the proba scores will have low entropy. The most uncertain outcome is a uniform distribution, which has maximal entropy. Therefore, it could be sensible to add entropy as a reason for doubt.
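For intuition, a minimal sketch of the quantity in question (plain NumPy on an assumed iris setup, not the actual implementation in this repo):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Fit any probabilistic classifier; iris is just an assumed example here.
X, y = load_iris(return_X_y=True)
probas = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)

# Row-wise Shannon entropy: near 0 for a one-hot row (confident),
# log(n_classes) for a uniform row (maximally uncertain).
H = -np.sum(probas * np.log(probas + 1e-12), axis=1)
```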


koaning commented Nov 23, 2021

I wonder ... what's a reasonable threshold here?


avvorstenbosch commented Nov 30, 2021

I see two ways to use this:

  • Return all predictions with high uncertainty: E > T1
  • Return all predictions with high certainty that don't match the dataset label: argmax(P) != Y, E < T2

I've been thinking about your question about the threshold, but I haven't been able to figure out a reasonable threshold value. I've been combing through some literature related to this, but when such a threshold is used, it is often just a hyperparameter that is tuned, without a theoretical argument.
One thing that might help is to use the normalized Shannon entropy, since entropy values for distributions with different numbers of classes are difficult to compare. A method that I could see working would be to determine the threshold relative to the entropy distribution of the dataset; a rough sketch follows below. The first thing that comes to mind would be to consider the lowest/highest percentiles, although I think there are cleverer tricks available.
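A rough sketch of both variants, with normalized entropy and percentile-based thresholds (probas and y as in the snippet above; the 5% quantile levels for T1/T2 are arbitrary assumptions):

```python
import numpy as np

def normalized_entropy(probas):
    """Shannon entropy divided by log(n_classes), so scores land
    in [0, 1] regardless of the number of classes."""
    H = -np.sum(probas * np.log(probas + 1e-12), axis=1)
    return H / np.log(probas.shape[1])

H_norm = normalized_entropy(probas)

# Variant 1: high uncertainty, thresholded relative to the dataset's
# own entropy distribution (here: the most uncertain 5%).
T1 = np.quantile(H_norm, 0.95)
uncertain = H_norm > T1

# Variant 2: high certainty (lowest 5% of entropies) that disagrees
# with the dataset label.
T2 = np.quantile(H_norm, 0.05)
confident_wrong = (probas.argmax(axis=1) != y) & (H_norm < T2)
```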


koaning commented Nov 30, 2021

Normalized entropy, as described here, seems like a sound idea! Thanks for the mention 👍 I think I'm fine with keeping the threshold as a hyperparameter in this entropy reason if that prevents adding an assumption to the stack. I think it'd be good to gather feedback anyway.

> Return all predictions with high certainty that don't match the dataset label: argmax(P) != Y, E < T2

I'm wondering ... is this something best addressed via WrongPredictionReason? We may want to add a hyperparameter there for this use case.


Garve commented Dec 11, 2021

Hi!

I created a PR for version 1 of the entropy reason here. I went with a threshold of 0.5, just because it worked well for the iris dataset; 0.2 would have produced way too many non-zeros.

Best
Robert


Garve commented Dec 11, 2021

Another way to tackle the "wtf should the threshold be" problem: maybe we can specify a quantile instead of an absolute threshold like 0.5. That is, we specify some quantile alpha and then flag only the alpha share of samples with the highest normalized Shannon entropies.
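A sketch of what such a quantile-based reason might look like (the EntropyQuantileReason name and calling convention are hypothetical, loosely mimicking the callable style of the existing reasons, not actual API):

```python
import numpy as np

class EntropyQuantileReason:
    """Hypothetical sketch: flag the `alpha` share of samples with
    the highest normalized Shannon entropy under `model`."""

    def __init__(self, model, alpha=0.05):
        self.model = model
        self.alpha = alpha

    def __call__(self, X, y=None):
        probas = self.model.predict_proba(X)
        H = -np.sum(probas * np.log(probas + 1e-12), axis=1)
        H_norm = H / np.log(probas.shape[1])
        cutoff = np.quantile(H_norm, 1.0 - self.alpha)
        # 1.0 flags a sample as doubtful, 0.0 leaves it alone.
        return (H_norm >= cutoff).astype(float)
```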


koaning commented Dec 12, 2021

> I'm wondering ... is this something best addressed via WrongPredictionReason? We may want to add a hyperparameter there for this use case.

We also have the ShortConfidence reason and the LongConfidence reason.


koaning commented Dec 12, 2021

> Maybe we can specify a quantile instead of an absolute threshold like 0.5.

Part of me likes the idea, but I'm worried that we may introduce a lot of hyperparameters, and at the moment it's unclear how much more useful doubt based on entropy will be compared to the margin-based reason.


glevv commented Dec 13, 2021

I think it's possible to use the Hoover index instead of entropy: it's easier to compute, it's always in the 0-1 range, and it has a clear interpretation (0 = equality/uniformity, 1 = inequality).

There is also a bigger problem with this approach in a multiclass setting: assume you have 4 classes. If your probas are 0.25-0.25-0.25-0.25, an entropy/uniformity measure will correctly flag the sample, but on something like 0-0.5-0.5-0 it will fail, even though that sample could still be mislabeled. This problem becomes even more severe with more classes. A straightforward solution would be to use a one-vs-rest scheme.
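A small sketch of both points, assuming the Hoover index here means half the total absolute deviation from the uniform share 1/n_classes:

```python
import numpy as np

def hoover_index(probas):
    """Hoover (Robin Hood) index per row: 0 for a perfectly uniform
    row; grows toward 1 as mass concentrates in a single class."""
    n_classes = probas.shape[1]
    return 0.5 * np.abs(probas - 1.0 / n_classes).sum(axis=1)

# The failure mode above: both rows are uncertain between some classes,
# but only the first is uniform over all four.
rows = np.array([[0.25, 0.25, 0.25, 0.25],
                 [0.00, 0.50, 0.50, 0.00]])
print(hoover_index(rows))  # [0.  0.5]
```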


koaning commented Dec 13, 2021

I'm wondering ... can we come up with a situation where entropy-based doubt can address issues that the other reasons cannot?


koaning commented Dec 15, 2021

Fixed by #24

koaning closed this as completed Dec 15, 2021