Describe the bug
For a single output feature model, the per-epoch training/validation/test loss value of the output feature does not equal the corresponding combined training/validation/test loss value
To Reproduce
Steps to reproduce the behavior:
Run examples/mnist/simple_model_training.py with logging_level=logging.INFO
See the following extract of the log file. The output feature label's loss values do not equal the combined loss values for any epoch; they are relatively close, but not equal.
Expected behavior
With only a single output feature, I was expecting the combined loss value to equal the single output feature's loss value per epoch
Screenshots
See log extract above
Environment (please complete the following information):
OS: MacOS 10.15.7, running Docker for Mac 3.04
Python version 3.6.9
Ludwig versions 0.3.2, 0.3.3, 0.4-dev0
Additional context
From what I can tell, this is solely a reporting issue and does not affect a model's ability to converge.
I have a possible root cause for the reporting issue. In the context of the categorical feature, the CategoryOutputFeature._setup_loss() and CategoryOutputFeature._setup_metrics() methods may have an unintended interaction (see the sketch below).
At the conclusion of output feature construction, both self.eval_loss_function and self.metric_functions[LOSS] point to the same instance of a Keras Metric object.
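Roughly, the interaction looks like the following stand-in sketch (assuming TensorFlow 2.x; the toy class and the choice of metric are illustrative, not Ludwig's actual code):

```python
import tensorflow as tf

LOSS = "loss"

class ToyCategoryOutputFeature:
    """Stand-in for the pattern described above; not Ludwig's actual implementation."""

    def _setup_loss(self):
        # The eval loss is constructed as a *stateful* Keras Metric object.
        self.eval_loss_function = tf.keras.metrics.SparseCategoricalCrossentropy(
            from_logits=True
        )

    def _setup_metrics(self):
        # The very same instance is then registered as the feature's LOSS metric.
        self.metric_functions = {LOSS: self.eval_loss_function}

feature = ToyCategoryOutputFeature()
feature._setup_loss()
feature._setup_metrics()
assert feature.metric_functions[LOSS] is feature.eval_loss_function  # shared state
```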
So during the training loop, this metric object is called twice: once in the context of calculating the loss for the output feature, and a second time to capture the combined loss. During training, the combined loss is accumulated in ECD.eval_loss_metric for reporting purposes. As a result, the epoch loss calculations use different values for the output feature loss and the combined loss.
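A self-contained illustration of the kind of skew this can produce, assuming the combined metric is fed the shared metric's running result rather than the raw per-batch loss (the loss values are made up):

```python
import tensorflow as tf

per_batch_losses = [4.0, 2.0, 0.0]  # made-up per-batch loss values for one epoch

feature_loss_metric = tf.keras.metrics.Mean()   # stands in for metric_functions[LOSS]
combined_loss_metric = tf.keras.metrics.Mean()  # stands in for ECD.eval_loss_metric

for batch_loss in per_batch_losses:
    feature_loss_metric.update_state(batch_loss)
    # Because the eval loss is the same stateful metric, what flows into the
    # combined metric is its *running average*, not the raw batch loss.
    combined_loss_metric.update_state(feature_loss_metric.result())

print(float(feature_loss_metric.result()))   # 2.0 -> mean of 4.0, 2.0, 0.0
print(float(combined_loss_metric.result()))  # 3.0 -> mean of 4.0, 3.0, 2.0
```

The two values track each other but never agree exactly, which matches the "relatively close but not equal" behavior in the log extract.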
Based on testing I've done, the fix is to change CategoryOutputFeature.eval_loss_function to be a Keras Loss function rather than a Metric, and to revise the metric specification so that CategoryOutputFeature.metric_functions[LOSS] no longer reuses CategoryOutputFeature.eval_loss_function.
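Under the same stand-in class as above, the proposed change looks roughly like this (again a sketch, not the actual patch):

```python
import tensorflow as tf

LOSS = "loss"

class FixedToyCategoryOutputFeature:
    """Stand-in sketch of the proposed fix; not the actual Ludwig patch."""

    def _setup_loss(self):
        # A *stateless* Keras Loss: returns the per-batch value, accumulates nothing.
        self.eval_loss_function = tf.keras.losses.SparseCategoricalCrossentropy(
            from_logits=True
        )

    def _setup_metrics(self):
        # An independent stateful Metric, used only for per-feature epoch reporting.
        self.metric_functions = {
            LOSS: tf.keras.metrics.SparseCategoricalCrossentropy(from_logits=True)
        }
```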
Further, this reporting issue appears to be present in other output features. Similar changes are needed in the other output features' _setup_loss() and _setup_metrics() methods.
I'll start a PR to address this issue.