The 'group' parameter cannot be applied in the roles of the fit_predict function in TabularAutoML. #153

Closed
yuuniee opened this issue Jun 17, 2024 · 2 comments
@yuuniee

yuuniee commented Jun 17, 2024

🐛 Bug

  • The 'group' role passed in the roles argument of TabularAutoML.fit_predict has no effect.
  • Tracing back through the code, I found that the 'group' column is not passed on to the CV split step.

To Reproduce

Steps to reproduce the behavior:

  1. Run TabularAutoML().fit_predict(..., roles={'target': TARGET_NAME, 'group': GROUP_NAME}).
  2. In set_sklearn_folds() (lightautoml > reader > utils.py), check whether the 'group: Optional[np.ndarray]' argument has a value.

Expected behavior

  • The CV split method should change according to the 'group' column specified above.
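
A group-aware split is usually expected to keep all rows of one group inside a single fold. The sketch below is one possible way to check that on any fold assignment. It is not LightAutoML code: the helper name `groups_leaking_across_folds` and the toy data are hypothetical, and how the per-row fold ids are obtained from the library is left open.

```python
from collections import defaultdict


def groups_leaking_across_folds(fold_ids, groups):
    """Return the set of group labels whose rows land in more than one fold."""
    folds_per_group = defaultdict(set)
    for fold, group in zip(fold_ids, groups):
        folds_per_group[group].add(fold)
    return {g for g, folds in folds_per_group.items() if len(folds) > 1}


# Hypothetical toy data: six rows belonging to three groups.
groups = ["a", "a", "b", "b", "c", "c"]

# A split that ignores groups can scatter one group's rows over several folds.
plain_folds = [0, 1, 0, 1, 0, 1]

# A group-aware split keeps every group inside exactly one fold.
group_folds = [0, 0, 1, 1, 2, 2]

leaks_plain = groups_leaking_across_folds(plain_folds, groups)
leaks_grouped = groups_leaking_across_folds(group_folds, groups)

print("groups split across folds (plain):", sorted(leaks_plain))
print("groups split across folds (group-aware):", sorted(leaks_grouped))
```

If the 'group' role is honored, a check of this kind on the real fold assignment should report no leaking groups.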

Additional context

  • I resolved this locally by adding the following line at line 322 of "lightautoml > reader > base.py":
  • kwargs["group"] = train_data.loc[:, roles["group"]]

The issue was found in version 0.3.8.1; I have not checked other versions.

I am always grateful for your open source contributions.
Thank you.

Checklist

  • bug description
  • steps to reproduce
  • expected behavior
  • code sample / screenshots
@yuuniee yuuniee added the bug Something isn't working label Jun 17, 2024
@alexmryzhkov
Collaborator

Hi @yuuniee,

Thanks for the detailed description of the problem with model validation. We will certainly look into it.

Alex

@dev-rinchin dev-rinchin added this to the Release 0.4.0 milestone Jul 23, 2024
@screengreen
Contributor

screengreen commented Jul 23, 2024

Hi @yuuniee,

We checked the correctness of the CV split with the 'group' parameter. We tried it on several of the most popular task types, ['binary', 'multiclass', 'reg', 'multilevel'], and didn't find any issues.

What you tried to add, kwargs["group"] = train_data.loc[:, roles["group"]], is already implemented here:

kwargs[attrs_dict[r.name]] = train_data.loc[:, feat]
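
To illustrate why a generic line of this shape would already cover the 'group' case, here is a simplified sketch. It is an assumption-laden stand-in, not the library's reader: `collect_role_columns`, the plain-dict `train_data`, the column names, and the contents of `attrs_dict` are all hypothetical, and role names are plain strings here rather than role objects with a `.name` attribute.

```python
def collect_role_columns(train_data, roles, attrs_dict):
    """Collect columns for special roles into keyword arguments.

    Simplified stand-in: `roles` maps a role name to a column name, and
    `attrs_dict` maps a role name to the keyword argument it should fill.
    """
    kwargs = {}
    for role_name, feat in roles.items():
        if role_name in attrs_dict:
            # Same shape as the quoted line: the key comes from the role
            # name, the value is the column registered for that role.
            kwargs[attrs_dict[role_name]] = train_data[feat]
    return kwargs


# Hypothetical data, with a dict of lists standing in for a DataFrame.
train_data = {
    "y": [0, 1, 0, 1],
    "customer_id": ["a", "a", "b", "b"],
    "x1": [0.1, 0.2, 0.3, 0.4],
}
roles = {"target": "y", "group": "customer_id"}
attrs_dict = {"target": "target", "group": "group"}  # assumed mapping

kwargs = collect_role_columns(train_data, roles, attrs_dict)
print(sorted(kwargs))
print(kwargs["group"])
```

Under these assumptions, the 'group' entry is filled by the same loop that handles every other special role, so a dedicated kwargs["group"] assignment would be redundant.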

If you want to examine it yourself, you can check the attached .ipynb file, which shows the effect of the 'group' parameter.
Tutorial_1_basics_check_group_parameter.ipynb.zip

If this doesn't solve your problem, please provide additional information.

Andrew
