
Reported epoch loss value for single output feature model does not match reported epoch combined loss #1093

Closed
jimthompson5802 opened this issue Feb 7, 2021 · 1 comment

@jimthompson5802
Collaborator

Describe the bug
For a single output feature model, the per-epoch training/validation/test loss value reported for the output feature does not equal the corresponding combined training/validation/test loss value.

To Reproduce
Steps to reproduce the behavior:

  1. Run examples/mnist/simple_model_training.py with logging_level=logging.INFO
  2. See the following extract of the log file. The output feature label's loss values do not equal the combined loss values for any epoch. They are relatively close, but not equal.
╒══════════╕
│ TRAINING │
╘══════════╛


Epoch 1
Training: 100%|██████████| 469/469 [00:23<00:00, 19.83it/s]
Evaluation train: 100%|██████████| 469/469 [00:07<00:00, 62.35it/s]
Evaluation test : 100%|██████████| 79/79 [00:01<00:00, 77.71it/s]
Took 32.2061s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.1175 │     0.9683 │      0.9966 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.1067 │     0.9708 │      0.9970 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0960 │
├────────────┼────────┤
│ test       │ 0.0831 │
╘════════════╧════════╛


Epoch 2
Training: 100%|██████████| 469/469 [00:20<00:00, 22.43it/s]
Evaluation train: 100%|██████████| 469/469 [00:05<00:00, 86.99it/s]
Evaluation test : 100%|██████████| 79/79 [00:01<00:00, 42.91it/s]
Took 28.1646s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0675 │     0.9838 │      0.9989 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0662 │     0.9830 │      0.9983 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0527 │
├────────────┼────────┤
│ test       │ 0.0476 │
╘════════════╧════════╛


Epoch 3
Training: 100%|██████████| 469/469 [00:22<00:00, 21.27it/s]
Evaluation train: 100%|██████████| 469/469 [00:05<00:00, 81.35it/s]
Evaluation test : 100%|██████████| 79/79 [00:00<00:00, 95.46it/s]
Took 28.6584s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0472 │     0.9875 │      0.9991 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0472 │     0.9863 │      0.9993 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0428 │
├────────────┼────────┤
│ test       │ 0.0391 │
╘════════════╧════════╛


Epoch 4
Training: 100%|██████████| 469/469 [00:21<00:00, 22.08it/s]
Evaluation train: 100%|██████████| 469/469 [00:04<00:00, 98.50it/s] 
Evaluation test : 100%|██████████| 79/79 [00:00<00:00, 88.37it/s]
Took 26.9187s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0400 │     0.9894 │      0.9996 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0422 │     0.9867 │      0.9993 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0381 │
├────────────┼────────┤
│ test       │ 0.0375 │
╘════════════╧════════╛


Epoch 5
Training: 100%|██████████| 469/469 [00:19<00:00, 24.25it/s]
Evaluation train: 100%|██████████| 469/469 [00:04<00:00, 95.41it/s]
Evaluation test : 100%|██████████| 79/79 [00:00<00:00, 96.54it/s]
Took 25.9466s
╒═════════╤════════╤════════════╤═════════════╕
│ label   │   loss │   accuracy │   hits_at_k │
╞═════════╪════════╪════════════╪═════════════╡
│ train   │ 0.0291 │     0.9931 │      0.9997 │
├─────────┼────────┼────────────┼─────────────┤
│ test    │ 0.0324 │     0.9907 │      0.9997 │
╘═════════╧════════╧════════════╧═════════════╛
╒════════════╤════════╕
│ combined   │   loss │
╞════════════╪════════╡
│ train      │ 0.0219 │
├────────────┼────────┤
│ test       │ 0.0224 │
╘════════════╧════════╛

Expected behavior
With only a single output feature, I was expecting the combined loss value to equal the single output feature's loss value for each epoch.

Screenshots
See log extract above

Environment (please complete the following information):

  • OS: MacOS 10.15.7, running Docker for Mac 3.04
  • Python version 3.6.9
  • Ludwig versions 0.3.2, 0.3.3, 0.4-dev0

Additional context
From what I can tell, this is solely a reporting issue and does not affect a model's ability to converge.

I have a possible root cause for the reporting issue. In the context of the categorical feature, the CategoryOutputFeature._setup_loss() and CategoryOutputFeature._setup_metrics() methods may have an unintended interaction. In CategoryOutputFeature._setup_loss() there is this code fragment:

        self.eval_loss_function = SoftmaxCrossEntropyMetric(
            num_classes=self.num_classes,
            feature_loss=self.loss,
            name='eval_loss'
        )

and in CategoryOutputFeature._setup_metrics():

        self.metric_functions[LOSS] = self.eval_loss_function

At the conclusion of output feature construction, both self.eval_loss_function and self.metric_functions[LOSS] point to the same instance of a Keras Metric object.

So during the training loop, this metric object is called twice: once in the context of calculating the loss for the output feature, and a second time to capture the combined loss. During training, the combined loss is accumulated in ECD.eval_loss_metric for reporting purposes. As a result, the epoch loss calculations end up using different accumulated values for the output feature loss and the combined loss.
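To illustrate the hazard, here is a minimal, self-contained sketch. It is not Ludwig's actual training loop: the call order and the per-batch loss values are assumptions for illustration only. It shows how one stateful Keras metric instance, updated at two call sites, can report a different epoch value than a second tracker that accumulates the values it hands out:

    import tensorflow as tf

    shared = tf.keras.metrics.Mean(name="eval_loss")        # stands in for eval_loss_function / metric_functions[LOSS]
    combined = tf.keras.metrics.Mean(name="combined_loss")  # stands in for ECD.eval_loss_metric

    for batch_loss in [0.9, 0.5, 0.3, 0.2]:                 # synthetic per-batch losses
        # Call site 1: the shared metric is updated while the feature loss is computed.
        shared.update_state(batch_loss)
        # The combined tracker accumulates the value read back from the shared metric,
        # which at this point is a running mean rather than the raw batch loss.
        combined.update_state(shared.result())
        # Call site 2: the same instance is updated again as metric_functions[LOSS].
        shared.update_state(batch_loss)

    print("feature loss :", float(shared.result()))    # 0.475
    print("combined loss:", float(combined.result()))  # ~0.700 -- the two reports disagree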

Based on testing I've done, the fix is to change CategoryOutputFeature.eval_loss_function to be a Keras Loss function rather than a metric, and to revise the metric specification so that CategoryOutputFeature.metric_functions[LOSS] no longer reuses CategoryOutputFeature.eval_loss_function.
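As a self-contained illustration of the intended behavior after that split (this is not the actual PR; it uses plain tf.keras stand-ins in place of Ludwig's SoftmaxCrossEntropy classes, and the targets/logits below are made up):

    import tensorflow as tf

    # Stateless loss object: calling it only computes a value, it holds no running state.
    eval_loss_function = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    # Independent stateful trackers for the feature-level LOSS metric and the combined loss.
    feature_loss_metric = tf.keras.metrics.Mean(name="feature_loss")    # metric_functions[LOSS]
    combined_loss_metric = tf.keras.metrics.Mean(name="combined_loss")  # ECD.eval_loss_metric

    batches = [  # synthetic targets and logits
        (tf.constant([0, 1]), tf.constant([[2.0, 0.1, 0.1], [0.2, 1.5, 0.3]])),
        (tf.constant([2, 2]), tf.constant([[0.1, 0.2, 1.8], [0.3, 0.4, 2.1]])),
    ]

    for targets, logits in batches:
        batch_loss = eval_loss_function(targets, logits)  # pure computation, no hidden state
        feature_loss_metric.update_state(batch_loss)
        combined_loss_metric.update_state(batch_loss)

    # With a single output feature, the two reported epoch losses now agree.
    assert abs(float(feature_loss_metric.result()) - float(combined_loss_metric.result())) < 1e-6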

Further, this reporting issue appears to be present in other output features. Similar changes are needed in the other output features' _setup_loss() and _setup_metrics() methods.

I'll start a PR to address this issue.

@jimthompson5802
Collaborator Author

Closed by #1103
