
Reported epoch loss value for single output feature model does not match reported epoch combined loss #1093

Closed
jimthompson5802 opened this issue Feb 7, 2021 · 1 comment

@jimthompson5802
Collaborator

Describe the bug
For a single output feature model, the per-epoch training/validation/test loss value reported for the output feature does not equal the corresponding combined training/validation/test loss value.

To Reproduce
Steps to reproduce the behavior:

  1. Run examples/mnist/simple_model_training.py with logging_level=logging.INFO
  2. See the following extract of the log file. The output feature label's loss values do not equal the combined loss values for any epoch. They are relatively close, but not equal.
╒══════════╕
│ TRAINING │
╘══════════╛


Epoch 1
Training: 100%|██████████| 469/469 [00:23<00:00, 19.83it/s]
Evaluation train: 100%|██████████| 469/469 [00:07<00:00, 62.35it/s]
Evaluation test : 100%|██████████| 79/79 [00:01<00:00, 77.71it/s]
Took 32.2061s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.1175 │     0.9683 │      0.9966 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.1067 │     0.9708 │      0.9970 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0960 │
├────────────┼────────┤
│ test       │ 0.0831 │
╘════════════╧════════╛


Epoch 2
Training: 100%|██████████| 469/469 [00:20<00:00, 22.43it/s]
Evaluation train: 100%|██████████| 469/469 [00:05<00:00, 86.99it/s]
Evaluation test : 100%|██████████| 79/79 [00:01<00:00, 42.91it/s]
Took 28.1646s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0675 │     0.9838 │      0.9989 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0662 │     0.9830 │      0.9983 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0527 │
├────────────┼────────┤
│ test       │ 0.0476 │
╘════════════╧════════╛


Epoch 3
Training: 100%|██████████| 469/469 [00:22<00:00, 21.27it/s]
Evaluation train: 100%|██████████| 469/469 [00:05<00:00, 81.35it/s]
Evaluation test : 100%|██████████| 79/79 [00:00<00:00, 95.46it/s]
Took 28.6584s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0472 │     0.9875 │      0.9991 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0472 │     0.9863 │      0.9993 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0428 │
├────────────┼────────┤
│ test       │ 0.0391 │
╘════════════╧════════╛


Epoch 4
Training: 100%|██████████| 469/469 [00:21<00:00, 22.08it/s]
Evaluation train: 100%|██████████| 469/469 [00:04<00:00, 98.50it/s] 
Evaluation test : 100%|██████████| 79/79 [00:00<00:00, 88.37it/s]
Took 26.9187s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0400 │     0.9894 │      0.9996 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0422 │     0.9867 │      0.9993 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0381 │
├────────────┼────────┤
│ test       │ 0.0375 │
╘════════════╧════════╛


Epoch 5
Training: 100%|██████████| 469/469 [00:19<00:00, 24.25it/s]
Evaluation train: 100%|██████████| 469/469 [00:04<00:00, 95.41it/s]
Evaluation test : 100%|██████████| 79/79 [00:00<00:00, 96.54it/s]
Took 25.9466s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0291 │     0.9931 │      0.9997 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0324 │     0.9907 │      0.9997 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0219 │
├────────────┼────────┤
│ test       │ 0.0224 │
╘════════════╧════════╛

Expected behavior
With only a single output feature, I was expecting the combined loss value to equal the single output feature's loss value for each epoch.

Screenshots
See log extract above

Environment (please complete the following information):

  • OS: MacOS 10.15.7, running Docker for Mac 3.04
  • Python version 3.6.9
  • Ludwig versions 0.3.2, 0.3.3, 0.4-dev0

Additional context
From what I can tell, this is solely a reporting issue and does not affect a model's ability to converge.

I have a possible root cause for the reporting issue. In the context of the categorical feature, the CategoryOutputFeature._setup_loss() and CategoryOutputFeature._setup_metrics() methods may have an unintended interaction. In CategoryOutputFeature._setup_loss() there is this code fragment:

        self.eval_loss_function = SoftmaxCrossEntropyMetric(
            num_classes=self.num_classes,
            feature_loss=self.loss,
            name='eval_loss'
        )

and in CategoryOutputFeature._setup_metrics():

        self.metric_functions[LOSS] = self.eval_loss_function

At the conclusion of output feature construction, both self.eval_loss_function and self.metric_functions[LOSS] point to the same instance of a Keras Metric object.

So during the training loop, this metric object is called twice: once in the context of calculating the loss for the output feature, and a second time to capture the combined loss. During training, the combined loss is accumulated in ECD.eval_loss_metric for reporting purposes. As a result, the epoch loss calculations end up using different accumulated values for the output feature loss and the combined loss.
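To illustrate the hazard, here is a minimal, self-contained sketch. It is not Ludwig's actual training loop: the call order and the per-batch loss values are assumptions for illustration only. It shows how one stateful Keras metric instance, updated at two call sites, can report a different epoch value than a second tracker that accumulates the values it hands out:

    import tensorflow as tf

    shared = tf.keras.metrics.Mean(name="eval_loss")        # stands in for eval_loss_function / metric_functions[LOSS]
    combined = tf.keras.metrics.Mean(name="combined_loss")  # stands in for ECD.eval_loss_metric

    for batch_loss in [0.9, 0.5, 0.3, 0.2]:                 # synthetic per-batch losses
        # Call site 1: the shared metric is updated while the feature loss is computed.
        shared.update_state(batch_loss)
        # The combined tracker accumulates the value read back from the shared metric,
        # which at this point is a running mean rather than the raw batch loss.
        combined.update_state(shared.result())
        # Call site 2: the same instance is updated again as metric_functions[LOSS].
        shared.update_state(batch_loss)

    print("feature loss :", float(shared.result()))    # 0.475
    print("combined loss:", float(combined.result()))  # ~0.700 -- the two reports disagree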

Based on testing I've done, the fix is to change CategoryOutputFeature.eval_loss_function to be a Keras Loss function rather than a metric, and to revise the metric specification so that CategoryOutputFeature.metric_functions[LOSS] no longer reuses CategoryOutputFeature.eval_loss_function.
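As a self-contained illustration of the intended behavior after that split (this is not the actual PR; it uses plain tf.keras stand-ins in place of Ludwig's SoftmaxCrossEntropy classes, and the targets/logits below are made up):

    import tensorflow as tf

    # Stateless loss object: calling it only computes a value, it holds no running state.
    eval_loss_function = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    # Independent stateful trackers for the feature-level LOSS metric and the combined loss.
    feature_loss_metric = tf.keras.metrics.Mean(name="feature_loss")    # metric_functions[LOSS]
    combined_loss_metric = tf.keras.metrics.Mean(name="combined_loss")  # ECD.eval_loss_metric

    batches = [  # synthetic targets and logits
        (tf.constant([0, 1]), tf.constant([[2.0, 0.1, 0.1], [0.2, 1.5, 0.3]])),
        (tf.constant([2, 2]), tf.constant([[0.1, 0.2, 1.8], [0.3, 0.4, 2.1]])),
    ]

    for targets, logits in batches:
        batch_loss = eval_loss_function(targets, logits)  # pure computation, no hidden state
        feature_loss_metric.update_state(batch_loss)
        combined_loss_metric.update_state(batch_loss)

    # With a single output feature, the two reported epoch losses now agree.
    assert abs(float(feature_loss_metric.result()) - float(combined_loss_metric.result())) < 1e-6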

Further, this reporting issue appears to be present in other output features. Similar changes are needed in the other output features' _setup_loss() and _setup_metrics() methods.

I'll start a PR to address this issue.

@jimthompson5802
Collaborator Author

Closed by #1103
