
Fix issue 1093 loss value mismatch #1103

Merged

Conversation

@jimthompson5802 (Collaborator) commented Feb 21, 2021

Code Pull Requests

Fix Issue #1093

Before this PR, when a model consists of a single output feature, the loss reported for combined differs slightly from the loss reported for the feature itself. In the case of multiple output features, the sum of the individual losses for each feature differs slightly from the loss reported for combined. As best I can tell, this is just a reporting issue; it has no effect on model convergence.

At this point, this PR addresses the issue for the numeric, binary, and categorical output features. Additional work is needed to propagate the fix to other output feature types.
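
To make the kind of discrepancy concrete, here is a minimal sketch (my own illustration, not Ludwig code, and the aggregation scheme shown is only an assumed stand-in for the real cause): averaging per-batch mean losses can drift slightly from a mean taken over all examples whenever the last batch is a partial batch.

```python
# Illustration only (not Ludwig code): two reasonable ways of aggregating a
# per-example loss disagree slightly when the final batch is a partial batch.
import numpy as np

rng = np.random.default_rng(0)
# 21 full batches of 128 examples plus one final batch of 40 examples
batch_losses = [rng.random(128) for _ in range(21)] + [rng.random(40)]

mean_of_batch_means = np.mean([b.mean() for b in batch_losses])  # batches weighted equally
mean_over_examples = np.concatenate(batch_losses).mean()         # examples weighted equally

print(f"{mean_of_batch_means:.4f} vs {mean_over_examples:.4f}")  # small mismatch
```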

Before PR

Here is an excerpt of the Ludwig log that illustrates the issue.
b4_fix_log.txt

Here is a specific example for the numeric output feature. Note the difference between the reported loss for y1 and the combined loss.

Epoch 2
Training: 100%|██████████| 22/22 [00:00<00:00, 262.69it/s]
Evaluation train: 100%|██████████| 22/22 [00:00<00:00, 721.01it/s]
Evaluation vali : 100%|██████████| 3/3 [00:00<00:00, 625.74it/s]
Evaluation test : 100%|██████████| 7/7 [00:00<00:00, 800.83it/s]
Took 0.1817s
╒═══════╤════════╤═════════╤══════════════════════╤═══════════════════════╤═════════╕
│ y1    │   loss │   error │   mean_squared_error │   mean_absolute_error │      r2 │
╞═══════╪════════╪═════════╪══════════════════════╪═══════════════════════╪═════════╡
│ train │ 0.5693 │  0.2523 │               0.5693 │                0.5761 │ -5.4792 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ vali  │ 0.6719 │  0.2941 │               0.6719 │                0.6306 │ -6.6828 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ test  │ 0.7436 │  0.3089 │               0.7436 │                0.6677 │ -8.4869 │
╘═══════╧════════╧═════════╧══════════════════════╧═══════════════════════╧═════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.6402 │
├────────────┼────────┤
│ vali       │ 0.6639 │
├────────────┼────────┤
│ test       │ 0.6642 │
╘════════════╧════════╛

For the multiple output feature use case, note that the sum of the individual losses for the output features does not equal the combined loss:

Epoch 3
Training: 100%|██████████| 22/22 [00:00<00:00, 121.39it/s]
Evaluation train: 100%|██████████| 22/22 [00:00<00:00, 223.90it/s]
Evaluation vali : 100%|██████████| 3/3 [00:00<00:00, 556.86it/s]
Evaluation test : 100%|██████████| 7/7 [00:00<00:00, 483.78it/s]
Took 0.4030s
╒═══════╤════════╤═════════╤══════════════════════╤═══════════════════════╤═════════╕
│ y1    │   loss │   error │   mean_squared_error │   mean_absolute_error │      r2 │
╞═══════╪════════╪═════════╪══════════════════════╪═══════════════════════╪═════════╡
│ train │ 0.4639 │  0.2116 │               0.4639 │                0.5136 │ -4.2804 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ vali  │ 0.5617 │  0.2632 │               0.5617 │                0.5691 │ -5.4227 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ test  │ 0.6211 │  0.2713 │               0.6211 │                0.6050 │ -6.9327 │
╘═══════╧════════╧═════════╧══════════════════════╧═══════════════════════╧═════════╛
╒═══════╤════════╤════════════╕
│ y2    │   loss │   accuracy │
╞═══════╪════════╪════════════╡
│ train │ 0.7907 │     0.4682 │
├───────┼────────┼────────────┤
│ vali  │ 0.8694 │     0.4375 │
├───────┼────────┼────────────┤
│ test  │ 0.7454 │     0.5755 │
╘═══════╧════════╧════════════╛
╒═══════╤════════╤════════════╤═════════════╕
│ y3    │   loss │   accuracy │   hits_at_k │
╞═══════╪════════╪════════════╪═════════════╡
│ train │ 2.5125 │     0.0809 │      0.2688 │
├───────┼────────┼────────────┼─────────────┤
│ vali  │ 2.4445 │     0.0417 │      0.2917 │
├───────┼────────┼────────────┼─────────────┤
│ test  │ 2.4906 │     0.0566 │      0.2642 │
╘═══════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 3.7727 │
├────────────┼────────┤
│ vali       │ 3.9990 │
├────────────┼────────┤
│ test       │ 3.8001 │
╘════════════╧════════╛

After PR

With the PR, here is the same output for the single numeric output feature:

Epoch 2
Training: 100%|██████████| 22/22 [00:00<00:00, 236.35it/s]
Evaluation train: 100%|██████████| 22/22 [00:00<00:00, 1067.87it/s]
Evaluation vali : 100%|██████████| 3/3 [00:00<00:00, 361.55it/s]
Evaluation test : 100%|██████████| 7/7 [00:00<00:00, 679.90it/s]
Took 0.1920s
╒═══════╤════════╤═════════╤══════════════════════╤═══════════════════════╤═════════╕
│ y1    │   loss │   error │   mean_squared_error │   mean_absolute_error │      r2 │
╞═══════╪════════╪═════════╪══════════════════════╪═══════════════════════╪═════════╡
│ train │ 0.6367 │  0.3242 │               0.6367 │                0.5967 │ -7.4361 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ vali  │ 0.5193 │  0.1818 │               0.5193 │                0.5644 │ -5.7765 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ test  │ 0.5360 │  0.1913 │               0.5360 │                0.5402 │ -6.6955 │
╘═══════╧════════╧═════════╧══════════════════════╧═══════════════════════╧═════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.6367 │
├────────────┼────────┤
│ vali       │ 0.5193 │
├────────────┼────────┤
│ test       │ 0.5360 │
╘════════════╧════════╛

And with multiple output features, the sum of the individual feature losses now equals the combined loss:

Epoch 3
Training: 100%|██████████| 22/22 [00:00<00:00, 147.71it/s]
Evaluation train: 100%|██████████| 22/22 [00:00<00:00, 402.81it/s]
Evaluation vali : 100%|██████████| 3/3 [00:00<00:00, 444.30it/s]
Evaluation test : 100%|██████████| 7/7 [00:00<00:00, 497.02it/s]
Took 0.2890s
╒═══════╤════════╤═════════╤══════════════════════╤═══════════════════════╤═════════╕
│ y1    │   loss │   error │   mean_squared_error │   mean_absolute_error │      r2 │
╞═══════╪════════╪═════════╪══════════════════════╪═══════════════════════╪═════════╡
│ train │ 0.5158 │  0.2757 │               0.5158 │                0.5341 │ -5.8364 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ vali  │ 0.4241 │  0.1516 │               0.4241 │                0.5097 │ -4.5333 │
├───────┼────────┼─────────┼──────────────────────┼───────────────────────┼─────────┤
│ test  │ 0.4400 │  0.1593 │               0.4400 │                0.4871 │ -5.3179 │
╘═══════╧════════╧═════════╧══════════════════════╧═══════════════════════╧═════════╛
╒═══════╤════════╤════════════╕
│ y2    │   loss │   accuracy │
╞═══════╪════════╪════════════╡
│ train │ 0.7735 │     0.4682 │
├───────┼────────┼────────────┤
│ vali  │ 0.6530 │     0.6667 │
├───────┼────────┼────────────┤
│ test  │ 0.7751 │     0.4811 │
╘═══════╧════════╧════════════╛
╒═══════╤════════╤════════════╤═════════════╕
│ y3    │   loss │   accuracy │   hits_at_k │
╞═══════╪════════╪════════════╪═════════════╡
│ train │ 2.5176 │     0.0636 │      0.2514 │
├───────┼────────┼────────────┼─────────────┤
│ vali  │ 2.5683 │     0.1250 │      0.2500 │
├───────┼────────┼────────────┼─────────────┤
│ test  │ 2.5416 │     0.0660 │      0.2170 │
╘═══════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 3.8069 │
├────────────┼────────┤
│ vali       │ 3.6454 │
├────────────┼────────┤
│ test       │ 3.7568 │
╘════════════╧════════╛

Note: Due to rounding, hand-calculating the sum of the individual losses may not match the displayed combined loss in the 4th decimal place.
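For example, on the train split 0.5158 + 0.7735 + 2.5176 = 3.8069, which matches the reported combined loss exactly, while on the test split 0.4400 + 0.7751 + 2.5416 = 3.7567 versus the reported 3.7568.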
Here is the full log extract:
after_fix_log.txt

@jimthompson5802 (Collaborator, Author) commented:

Commit 6a10c10 resolves an issue I encountered with sampled_softmax_cross_entropy. In testing, I noticed the output feature and combined loss values still differ slightly. For this specific case, I think this is expected because sampling is involved in calculating the loss. Loss calculations occur in two different locations: once during training and a second time when the metric is updated. Since the loss computations occur at two different times, the samples used are most likely different.

@jimthompson5802 (Collaborator, Author) commented:

Given the changes for the sequence feature, this PR also fixes Issue #1096.

@jimthompson5802 (Collaborator, Author) commented:

@w4nderlust This PR is finally ready for review. The following summarizes the key changes:

Fix the mismatch in reported loss values:

  • Converted every output feature's eval_loss_function to be a subclass of tf.keras.losses.Loss; this is the primary change to resolve the mismatch in reported loss values (see the sketch below).
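
As a reference point, here is a minimal sketch of that kind of conversion (the class name and details are my own illustration, assuming an MSE-style numeric loss, not Ludwig's actual implementation):

```python
import tensorflow as tf

# Minimal sketch, assuming an MSE-style numeric eval loss; the class name and
# details are illustrative rather than Ludwig's actual implementation.
class MSEEvalLoss(tf.keras.losses.Loss):
    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Return per-example losses; the Loss base class applies the reduction,
        # so per-feature and combined reporting aggregate the same way.
        return tf.math.squared_difference(y_true, y_pred)

loss_fn = MSEEvalLoss(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE)
```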

Fix for sampled_softmax_cross_entropy for sequence-like features:

  • Added custom classes BasicDecoder and BasicDecoderOutput to support retrieval of the projection input tensor required for the sampled softmax calculation.
  • Modified the decoder_teacher_forcing() method to use the above custom classes to make the projection input tensor (PROJECTION_INPUT) available.
  • Created a sequence-specific sampled softmax loss class: SequenceSampledCrossEntropyLoss.
  • Renamed several classes to be more explicit.
  • Created a custom FixedUnigramCandidateSampler class that supports updating the counts for Laplace smoothing of the sampled candidate tensors.
  • Updated the sequence_sampled_softmax_cross_entropy() function to support how Ludwig passes tensors in TF2.
  • Created a sequence-specific function to sample sequence-like features: sampled_values_from_sequence.
  • Updated the OutputFeature.call() method to support passing the PROJECTION_INPUT tensor required for the sampled softmax calculation (see the sketch after this list).
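
For context, a rough sketch of why the projection input tensor has to be passed around (variable names and shapes are illustrative assumptions, not Ludwig's actual code): tf.nn.sampled_softmax_loss consumes the pre-projection hidden state together with the projection weights and biases, rather than the final logits.

```python
import tensorflow as tf

# Illustrative shapes; not Ludwig's actual decoder code.
num_classes, hidden_size, num_sampled = 1000, 256, 25
proj_w = tf.Variable(tf.random.normal([num_classes, hidden_size]))  # projection weights
proj_b = tf.Variable(tf.zeros([num_classes]))                       # projection biases

def sampled_loss(labels, projection_input):
    # labels: [batch, 1] int64 targets; projection_input: [batch, hidden] pre-projection state
    return tf.nn.sampled_softmax_loss(
        weights=proj_w,
        biases=proj_b,
        labels=labels,
        inputs=projection_input,
        num_sampled=num_sampled,
        num_classes=num_classes,
    )
```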

@jimthompson5802 (Collaborator, Author) commented Apr 25, 2021

Completed rework of categorical sampled softmax tensor passing.

@jimthompson5802 (Collaborator, Author) commented:

@w4nderlust

  • Resolved the last two comments. However, I took a different approach than the one suggested for this comment; the comment thread explains the approach.
  • Tested the num_reserved_ids=2 parameter and adjusted the unigrams parameter to use class_counts[2:] for the fixed_unigram sampler. It did not make a difference; I still saw zero entries in the true_expected_count tensor. So I backed off the use of num_reserved_ids and am still making use of Laplace smoothing to avoid numerical issues in the sampled softmax calculation of the logits (a minimal sketch of the smoothing idea follows this list).
  • Separated the learned_unigram sampler from the fixed_unigram sampler.
  • Updated comments in the sequence decoders to reflect our discussion on how to name and document the shapes of key tensors.
  • Cleaned up deprecated code in various modules.
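
A minimal sketch of the Laplace smoothing idea mentioned above (the counts and parameters are illustrative assumptions, not the actual FixedUnigramCandidateSampler code): add-one smoothing of the class counts passed as unigrams keeps every candidate's expected count away from zero.

```python
import tensorflow as tf

class_counts = [0, 0, 120, 47, 3, 0, 88]      # raw class counts; some are zero
smoothed = [c + 1 for c in class_counts]      # Laplace (add-one) smoothing

true_classes = tf.constant([[2], [3]], dtype=tf.int64)  # [batch, num_true]
sampled, true_expected, sampled_expected = tf.random.fixed_unigram_candidate_sampler(
    true_classes=true_classes,
    num_true=1,
    num_sampled=4,
    unique=True,
    range_max=len(smoothed),
    unigrams=smoothed,
)
```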

Assuming the GitHub Actions run is clean, the PR is ready for the next round of review.
