Adds mechanism for calibrating probabilities for category and binary features #1949
Conversation
In the image I can't see the blue points. Do they overlap with the green ones?
That's right: in this example, matrix scaling got rolled back to the original uncalibrated probabilities.
ludwig/models/calibrator.py (outdated):

    self.batch_size = batch_size
    self.skip_save_model = skip_save_model

    def calibration(self, dataset, dataset_name: str, save_path: str):
Not 100% sure, but it seems to me that in some cases "calibration" is the act of calibrating outputs, in some cases it is the object/function that can calibrate, and in other cases it is the act of training the calibrator. We should probably be a bit more explicit to avoid confusion. Wdyt about train_calibrator, get_calibrated_probabilities_from_logits, or something else? Those may be too verbose, but we can brainstorm a bit on this.
I like train_calibrator, that is clear. get_calibrated_probabilities_from_logits is a good name IMO; where are you thinking it should go (which method would be renamed to this)? Right now it happens in the PredictModule, which just calls the forward method of the calibration module.
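To make the discussion concrete, here is a minimal sketch of what a calibration module's forward pass might look like. The class name and signature are my own illustration (not the PR's actual code): the module holds a learned temperature and its forward maps logits to calibrated probabilities.

```python
import numpy as np

class TemperatureCalibration:
    """Hypothetical sketch of a calibration module: forward() maps raw
    logits to calibrated probabilities by dividing by a learned temperature."""

    def __init__(self, temperature: float = 1.0):
        # temperature == 1.0 leaves probabilities unchanged (uncalibrated).
        self.temperature = temperature

    def forward(self, logits: np.ndarray) -> np.ndarray:
        z = logits / self.temperature
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)
```

A temperature above 1 softens the distribution, which is the usual fix for an overconfident model; the prediction (argmax) is unchanged because scaling by a positive constant preserves the ordering of logits.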
Implements:
- temperature scaling for binary and category outputs
- matrix scaling for category outputs

Example using the Twitter Bots dataset with a categorical output feature:
In this example, matrix scaling does not improve ECE, though it improves NLL (probably due to overfitting), so matrix scaling gets rolled back and we keep the original uncalibrated probabilities. On much larger datasets, matrix scaling may yield better results.
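The core of temperature scaling can be sketched in a few lines of numpy. This is an illustration under my own naming, not the PR's implementation: a single scalar temperature is fit on validation logits to minimize NLL, with grid search standing in for the gradient-based optimization a real implementation would use, and a rollback to the uncalibrated temperature of 1.0 when calibration does not help, mirroring the rollback behavior described above.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature; T > 1 softens, T < 1 sharpens probabilities."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    probs = softmax(logits, temperature)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.1, 5.0, 200)):
    """Pick the temperature minimizing validation NLL (grid search stands in
    for the L-BFGS-style optimization typically used in practice)."""
    best_t = min(grid, key=lambda t: nll(logits, labels, t))
    # Roll back to T = 1.0 (uncalibrated) if calibration does not improve NLL.
    if nll(logits, labels, best_t) >= nll(logits, labels, 1.0):
        return 1.0
    return best_t
```

For an overconfident model (high-confidence logits but imperfect accuracy), the fitted temperature comes out above 1, flattening the predicted distribution toward the model's true accuracy.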
Validation Set (used to determine softmax temperature):
![calibration_1_vs_all_1](https://user-images.githubusercontent.com/687280/168711948-16853a07-16de-4b70-909d-fdbde61312f8.png)
Test set:
![calibration_1_vs_all_1](https://user-images.githubusercontent.com/687280/168711933-82d41bb0-258a-4fea-8453-5c83a140ef3e.png)