[ODSC-76829/76830] : improve auto-select logic and handle missing data #1259
base: main
This comment was marked as resolved.
250585d to a57b872
A couple of comments. Just needs a bit of polish
    if target_col not in data.columns:
        raise ValueError(f"Target column '{target_col}' not found in DataFrame")

    data[target_col] = data[target_col].fillna(0)
Why fillna with 0? Why not backfill? Did we discuss this?
Don't we already have this covered in the pre-processing steps? What are we gaining from this?
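
To make the trade-off in this question concrete, here is a minimal sketch comparing zero-filling with backfilling on a toy series. The values and variable names are made up for illustration and are not taken from the PR or its test data.

```python
import pandas as pd

nan = float("nan")

# Hypothetical target series with a leading, an interior, and a trailing gap.
target = pd.Series([nan, 10.0, nan, 14.0, nan])

# What the diff does: every gap becomes a literal zero.
zero_filled = target.fillna(0)

# Backfill: each gap takes the next valid observation.
# A trailing gap has no later value, so it stays NaN.
back_filled = target.bfill()

# Backfill followed by forward fill closes the trailing gap as well.
both_filled = target.bfill().ffill()

print(zero_filled.tolist())
print(back_filled.tolist())
print(both_filled.tolist())
```

Zero-filling always removes every NaN but pulls the series toward zero, which can distort level and variance statistics. Backfilling keeps the level but can leave trailing NaNs unless it is combined with a forward fill.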
    ):

        operator_config.spec.model = AUTO_SELECT
        model = ForecastOperatorModelFactory.get_model(operator_config, datasets)
nice!
Can we reflect this in the report? Make sure it's still saying "auto-select-series".
Can we add a unit test for this?
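
One possible shape for the requested test and the report concern, as a self-contained sketch. `Spec` and `resolve_model` are hypothetical stand-ins, not the operator's real classes, and the string value of `AUTO_SELECT` is an assumption. The idea is to keep the requested model name for reporting while running the fallback.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AUTO_SELECT = "auto-select"                # assumed string value
AUTO_SELECT_SERIES = "auto-select-series"  # string quoted in the review comment


@dataclass
class Spec:
    """Hypothetical stand-in for the operator's spec object."""
    model: str
    target_category_columns: Optional[List[str]] = None


def resolve_model(spec: Spec) -> Tuple[str, str]:
    """Return (requested, effective) model names.

    The requested name is what a report could keep displaying; the effective
    name is what actually runs. Falls back to AUTO_SELECT only when
    AUTO_SELECT_SERIES is requested without target_category_columns.
    """
    requested = spec.model
    effective = requested
    if requested == AUTO_SELECT_SERIES and not spec.target_category_columns:
        effective = AUTO_SELECT
    return requested, effective


def test_fallback_without_category_columns():
    requested, effective = resolve_model(Spec(model=AUTO_SELECT_SERIES))
    assert requested == AUTO_SELECT_SERIES  # report still shows the original name
    assert effective == AUTO_SELECT


def test_no_fallback_with_category_columns():
    spec = Spec(model=AUTO_SELECT_SERIES, target_category_columns=["store"])
    assert resolve_model(spec) == (AUTO_SELECT_SERIES, AUTO_SELECT_SERIES)
```

Returning both names avoids overwriting the only record of what the user asked for, which is one way the report could keep saying "auto-select-series".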
Improve auto-select logic and handle missing data
This commit introduces the following changes to the forecasting operator:
- Improved `AUTO_SELECT_SERIES` Logic: The `AUTO_SELECT` model is now used as a fallback when `AUTO_SELECT_SERIES` is requested without specifying `target_category_columns`.
- Missing Data Handling: The `build_fforms_meta_features` function now fills missing values in the target column with zeros. This prevents errors during meta-feature calculation when the data contains NaNs.
- New Test Case: Verifies that the `auto-select-series` model functions correctly with datasets containing missing values.
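
The missing-data step described above can be sketched in isolation as follows. `fill_target_for_meta_features` is a hypothetical helper name, and filling a copy instead of the caller's DataFrame is an assumption of this sketch; the diff assigns back into `data` directly.

```python
import pandas as pd


def fill_target_for_meta_features(data: pd.DataFrame, target_col: str) -> pd.DataFrame:
    """Hypothetical helper: validate the target column, then zero-fill a copy."""
    if target_col not in data.columns:
        raise ValueError(f"Target column '{target_col}' not found in DataFrame")
    filled = data.copy()  # leave the caller's frame untouched
    filled[target_col] = filled[target_col].fillna(0)
    return filled


# Toy frame with one missing target value (made-up data).
frame = pd.DataFrame({"Sales": [5.0, float("nan"), 7.0]})
prepared = fill_target_for_meta_features(frame, "Sales")

print(prepared["Sales"].tolist())
print(int(frame["Sales"].isna().sum()))
```

Working on a copy keeps the zero-fill local to meta-feature calculation, so later stages still see the original gaps and can apply their own pre-processing.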