Check fill_with_const has fill_value for binary features #3278
Have a couple of questions about this PR:
Actually, I took a look and I see the problem. It's in this line here: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/data/preprocessing.py#L1393 The But after another look, we seem to have a default value for
and maybe timeseries? It would be great to add a parametrized test for all of these features to validate that this is the case, then update your function to raise a ConfigValidationError if it happens for any of these features. If you're also willing to make the extra effort, it would be super neat to have a meaningful resolution method for each feature type (like what to do for image features, or a vector feature).
Thanks for the deep dive @arnavgarg1! The only case where we'd encounter the same error ( The |
Ah yeah good call, I didn't factor in the default missing value strategy itself. However, I think this seems to also be true for:
since they also default to
Hmm, it is an interesting proposition, and to be honest, I do think it makes sense for binary feature types to just support one of |
Quite frankly, I'd expand the scope of this aux validation check to make sure there is a non-empty |
I got rid of The error also does not happen with For these features, the default is |
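The check discussed in this thread could be sketched roughly as follows. This is a standalone illustration, not Ludwig's actual code: `ConfigValidationError` is defined locally as a stand-in, `check_fill_with_const_has_fill_value` is a hypothetical name, and the config is assumed to be a plain dict whose `input_features` entries carry an optional `preprocessing` section.

```python
from typing import Any, Dict, List


class ConfigValidationError(Exception):
    """Local stand-in for a config validation error (not imported from Ludwig)."""


def check_fill_with_const_has_fill_value(config: Dict[str, Any]) -> None:
    """Raise if a feature uses fill_with_const without a usable fill_value.

    Hypothetical helper; the dict layout is an assumption for illustration.
    """
    features: List[Dict[str, Any]] = config.get("input_features", [])
    for feature in features:
        preprocessing = feature.get("preprocessing", {})
        if preprocessing.get("missing_value_strategy") != "fill_with_const":
            continue
        fill_value = preprocessing.get("fill_value")
        # Treat None and the empty string as "not provided"; 0 and False are valid.
        if fill_value is None or fill_value == "":
            raise ConfigValidationError(
                f"Feature '{feature.get('name')}' uses fill_with_const "
                "but does not specify a fill_value."
            )


# Example configs (made up for this sketch).
bad_config = {
    "input_features": [
        {
            "name": "is_active",
            "type": "binary",
            "preprocessing": {"missing_value_strategy": "fill_with_const"},
        }
    ]
}
good_config = {
    "input_features": [
        {
            "name": "is_active",
            "type": "binary",
            "preprocessing": {
                "missing_value_strategy": "fill_with_const",
                "fill_value": False,
            },
        }
    ]
}
```

Calling the helper on `good_config` returns silently, while `bad_config` raises the stand-in error. Failing at validation time is the point of the proposal: the user sees the problem before preprocessing starts.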
ludwig/config_validation/checks.py
@@ -322,6 +322,21 @@ def check_tagger_decoder_requirements(config: "ModelConfig") -> None: # noqa: F
)


@register_config_check
Discussed offline. No longer needed if FILL_WITH_CONST is removed as an option.
@@ -17,7 +26,7 @@ class BinaryPreprocessingConfig(BasePreprocessingConfig):
     """BinaryPreprocessingConfig is a dataclass that configures the parameters used for a binary input feature."""

     missing_value_strategy: str = schema_utils.StringOptions(
-        MISSING_VALUE_STRATEGY_OPTIONS + [FILL_WITH_FALSE],
+        [FILL_WITH_MODE, BFILL, FFILL, DROP_ROW, FILL_WITH_FALSE, FILL_WITH_TRUE],
Note: This will break older models that have missing_value_strategy: FILL_WITH_CONST. However, 1) the risk seems low since this wasn't the default value and 2) these models would be broken anyway unless they had also specified fill_value.
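To make the effect of the narrowed option list concrete, here is a minimal sketch of how an explicit allow-list rejects `fill_with_const` for a binary feature. It does not use Ludwig's schema utilities: `validate_binary_strategy` is a hypothetical helper, and the string values assigned to the constants are assumptions that mirror the names in the diff.

```python
from typing import List

# Assumed string values for the constants named in the diff.
FILL_WITH_MODE = "fill_with_mode"
BFILL = "bfill"
FFILL = "ffill"
DROP_ROW = "drop_row"
FILL_WITH_FALSE = "fill_with_false"
FILL_WITH_TRUE = "fill_with_true"

# The explicit option list from the diff; fill_with_const is intentionally absent.
BINARY_STRATEGIES: List[str] = [
    FILL_WITH_MODE,
    BFILL,
    FFILL,
    DROP_ROW,
    FILL_WITH_FALSE,
    FILL_WITH_TRUE,
]


def validate_binary_strategy(strategy: str) -> str:
    """Return the strategy if allowed for binary features, else raise ValueError.

    Hypothetical helper, standing in for whatever the schema layer does.
    """
    if strategy not in BINARY_STRATEGIES:
        raise ValueError(
            f"Invalid missing value strategy {strategy} for a binary feature; "
            f"expected one of {BINARY_STRATEGIES}"
        )
    return strategy
```

Under these assumptions, an older config with `missing_value_strategy: fill_with_const` would now be rejected when the config is validated, and a user who wants a constant fill for a binary feature would pick `fill_with_false` or `fill_with_true` instead.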
6eb18ef to be0f00d
fix for the following failure
RuntimeError: Caught exception during model preprocessing: Invalid missing value strategy fill_with_const
using the following config