[MRG] Class weight imprecision #10515
Conversation
Thanks. This is a good start on improving my PR. I'm not sure that I'm entitled to approve it because it's mostly my work! Now, I'm sure this is helpful in the binary case. Yet it currently will apply jitter if there are differences in total weight between any two adjacent classes, which seems arbitrary, and the jitter in all classes may then be excessive. The former should be easy to solve; the latter I'm not sure about.
I hope this isn't complete overkill for a rare problem...
sklearn/utils/class_weight.py (Outdated)
@@ -54,13 +54,15 @@ def compute_class_weight(class_weight, classes, y):
        freq = np.bincount(y_ind).astype(np.float64)
        recip_freq = len(y) / (len(le.classes_) * freq)
        weight = recip_freq[le.transform(classes)]
+       if np.any(np.diff(freq * weight)):
+           freq_weight = np.reshape(freq * weight, (len(freq), 1))
I think this would be more readable with just len(np.unique(freq * weight)) > 1
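To make the two conditions concrete, here is a minimal sketch contrasting them, with a one-ulp perturbation injected via np.nextafter to stand in for the floating-point imprecision (the values are hypothetical, not the data from #10233):

import numpy as np

# Per-class totals (freq * weight) that should be identical, with the
# second one perturbed by a single ulp to simulate float imprecision.
products = np.array([5.0, np.nextafter(5.0, 6.0)])

print(np.any(np.diff(products)))      # condition as written in the diff: True
print(len(np.unique(products)) > 1)   # suggested, more readable form: True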
@@ -78,6 +80,26 @@ def compute_class_weight(class_weight, classes, y):
     return weight


+def _jitter_transform(true_order, bad_order):
wow, this got big :p
I feel I may have gone a little overboard by trying to have a smaller jitter :)
sklearn/utils/class_weight.py (Outdated)
+    # respect true order
+    k = len(true_order)
+    jitter = np.zeros(k)
+    for i in range(k // 2 + 1):
You're going to have to add some general comments on this algorithm!
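For readers without the full diff, here is a hedged reading of the general idea as a minimal sketch, not the PR's actual _jitter_transform: when float error makes the per-class totals tie or invert relative to the true class-frequency order, nudge them upward by single-ulp steps so ties break toward the true distribution. The helper name and the stepping strategy are assumptions.

import numpy as np

def jitter_toward_true_order(totals, freq):
    # Hypothetical helper, not the PR's implementation. `totals` holds
    # freq * weight per class; `freq` holds the true class counts.
    totals = np.asarray(totals, dtype=np.float64).copy()
    order = np.argsort(freq, kind="stable")  # true order, least frequent first
    prev = -np.inf
    for i in order:
        if totals[i] <= prev:                # tie or inversion vs. true order
            totals[i] = np.nextafter(prev, np.inf)  # smallest upward nudge
        prev = totals[i]
    return totals

The range(k // 2 + 1) loop in the diff suggests the actual PR distributes the jitter around the middle class to keep it small, which this sketch does not attempt.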
I have qualms about including this in practice. Let's say this is used for weighting samples in scoring, rather than training. Would that be a big problem?
I must admit that I am not sure if we should include this in practice. It is possible that other imprecision may lead to the same issue even if we had exact rationals for the weights. I tried using integers for the weights instead of fractions, and it seems that it is possible to recreate the same issue (or the opposite, with 0 accuracy).
Since nobody feels very comfortable with this change, maybe we could instead document the pitfall better?
During a triaging meeting, we decided to close this PR and move forward with #10233 (comment) |
Reference Issues/PRs
Fixes #10233
Continues the work of #10249
What does this implement/fix? Explain your changes.
Adds an epsilon to favour the true distribution when class_weight='balanced' is used and float imprecision does not allow the per-class sums of weights to be equal (float vs. rational).
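As a hedged illustration of the pitfall (with hypothetical class counts; the counts in #10233 may differ), the per-class totals are equal as exact rationals but need not be as float64 products:

from fractions import Fraction

import numpy as np

freq = np.array([3., 7.])               # hypothetical class counts
n, k = int(freq.sum()), len(freq)
weight = n / (k * freq)                 # float64 'balanced' weights
exact = [int(f) * Fraction(n, k * int(f)) for f in freq]
print(exact)          # exact rationals: both equal n/k == 5
print(freq * weight)  # float products; may differ in the last bit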
Any other comments?
Not sure if this is necessary.