Feature/groupby2 #58

Merged · 23 commits merged into master on Jul 26, 2023

Conversation

VaBun (Collaborator) commented on Dec 20, 2022

Groupby features introduced: delta mean, delta median, min, max, std, and categorical mode. Config files are updated; the features are not enabled by default.
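The semantics of the listed transforms can be sketched in plain Python. This is only an illustration of what each feature computes, not LightAutoML's actual implementation; the helper name and row layout are hypothetical:

```python
from collections import Counter
from statistics import mean, median, pstdev

def groupby_features(rows, group_col, num_col, cat_col):
    """Illustrative groupby transforms: delta mean/median, min, max, std, cat mode."""
    # Bucket rows by the value of the group column.
    groups = {}
    for r in rows:
        groups.setdefault(r[group_col], []).append(r)
    out = []
    for r in rows:
        nums = [g[num_col] for g in groups[r[group_col]]]
        cats = [g[cat_col] for g in groups[r[group_col]]]
        out.append({
            "delta_mean": r[num_col] - mean(nums),      # value minus group mean
            "delta_median": r[num_col] - median(nums),  # value minus group median
            "min": min(nums),                           # group minimum
            "max": max(nums),                           # group maximum
            "std": pstdev(nums),                        # group standard deviation
            "cat_mode": Counter(cats).most_common(1)[0][0],  # most frequent category
        })
    return out

rows = [
    {"g": "a", "x": 1.0, "c": "u"},
    {"g": "a", "x": 3.0, "c": "u"},
    {"g": "b", "x": 5.0, "c": "v"},
]
feats = groupby_features(rows, "g", "x", "c")
```

For the first row, group "a" has values [1.0, 3.0], so its delta mean is 1.0 - 2.0 = -1.0.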

dev-rinchin (Collaborator) commented on Dec 26, 2022

@VaBun

  1. You need to add demo14 to https://github.com/sb-ai-lab/LightAutoML/tree/master/examples/README.md
  2. Add a short description for the PR.
  3. Maybe add this functionality to a tutorial, either an existing or a new one?
  4. demo14.py doesn't work — have you tried running it?
  5. My comment about "verbose" from the previous PR was ignored. Please fix it.
  6. Add type hints to all *.py files.

Resolved review threads: lightautoml/transformers/utils.py; examples/demo14.py (outdated); lightautoml/pipelines/features/base.py (3 threads, outdated); examples/demo13.py (outdated)
dev-rinchin (Collaborator) commented:

btw, I would like you to add some unit tests for the groupby features

github-actions commented:

Stale pull request message

@VaBun VaBun requested a review from dev-rinchin March 28, 2023 13:10
Review thread on examples/demo14.py (outdated):

train, test = train_test_split(data, test_size=2000, random_state=42)


Collaborator:

remove redundant empty lines

general_params={"use_algos": [["lgb"]]},
gbm_pipeline_params={"use_groupby": True, "groupby_triplets": groupby_triplets},
)
_ = automl.fit_predict(train, roles=roles)
Collaborator:

automl.fit_predict(train, roles=roles)

Review thread on examples/demo14.py (outdated, resolved)
VaBun and others added 3 commits April 13, 2023 21:59
Co-authored-by: Rinchin <57899558+dev-rinchin@users.noreply.github.com>
gbm_pipeline_params={"use_groupby": True, "groupby_triplets": groupby_triplets},
)
automl.fit_predict(train, roles=roles)

Collaborator:

add comment for feature_scores


# Custom pipeline with groupby features defined by importance
print("\nTry custom pipeline with groupby features defined by importance:\n")

Collaborator:

add comment for custom pipeline



pipe = LGBAdvancedPipeline(
use_groupby=True, pre_selector=selector, groupby_types=["delta_median", "std"], groupby_top_based_on="importance"
Collaborator:

feats_imp
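The `groupby_top_based_on="importance"` option quoted above suggests that candidate columns are ranked by feature importance before groupby pairs are formed. A hypothetical sketch of that selection step (the function name and scoring scheme are assumptions, not the library's API):

```python
def top_groupby_pairs(importances, group_cols, feature_cols, top_n=2):
    """Pick the top-N most important feature columns for each group column."""
    # Rank feature columns by importance score, highest first; unknown columns score 0.
    ranked = sorted(feature_cols, key=lambda c: importances.get(c, 0.0), reverse=True)
    # Pair every group column with the top-N ranked feature columns.
    return [(g, f) for g in group_cols for f in ranked[:top_n]]

imp = {"price": 0.9, "age": 0.5, "clicks": 0.1}
pairs = top_groupby_pairs(imp, ["city"], ["price", "age", "clicks"], top_n=2)
```

With these scores, "city" is paired with "price" and "age", and the low-importance "clicks" column is dropped.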

@@ -429,7 +429,7 @@ def get_gbms(
        pre_selector: Optional[SelectionPipeline] = None,
    ):

-        gbm_feats = LGBAdvancedPipeline(**self.gbm_pipeline_params)
+        gbm_feats = LGBAdvancedPipeline(**self.gbm_pipeline_params, feats_imp=pre_selector)
Collaborator:

add pre_selector to linear_l2_feats init in get_linear

@@ -348,7 +348,7 @@ def get_gbms(
    ):

        text_gbm_feats = self.get_nlp_pipe(self.gbm_pipeline_params["text_features"])
-        gbm_feats = LGBAdvancedPipeline(output_categories=False, **self.gbm_pipeline_params)
+        gbm_feats = LGBAdvancedPipeline(feats_imp=pre_selector, output_categories=False, **self.gbm_pipeline_params)
Collaborator:

do we need feats_imp=pre_selector in get_linear?

Collaborator:

rm file

Collaborator:

rm file

group_col: str,
numeric_cols: Optional[List[str]] = None,
categorical_cols: Optional[List[str]] = None,
used_transforms: Optional[List[str]] = None,
Collaborator:

rename

self._features = [f"{self._fname_prefix}__{self.group_col}__{t}__{f}" for f, t in self.transformations_list]
self._features_mapping = {self._features[i]: k for i, k in enumerate(self.transformations_list)}

self._group_ids_dict = self._calculate_group_ids(dataset)
Collaborator:

rm _group_ids_dict


def _set_feature_indices(self):
feat_idx = dict()
feat_idx[self.group_col] = 0
Collaborator:

do not hardcode values

VaBun (author):

The problem is that we don't have the original feature names there, because they are affected by different encoders. So hardcoding the index is the only way to link them to the original names. To fix this properly we need a more general improvement: preserving original feature names through one-to-one transformations.
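The improvement described here — carrying the original name alongside each one-to-one transform instead of relying on column position — could look roughly like this. A minimal sketch; the class and attribute names are hypothetical, not LightAutoML's actual transformer API:

```python
class OneToOneTransform:
    """Records original -> transformed name mapping so lookups don't need indices."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.name_map = {}  # transformed name -> original name

    def fit(self, feature_names):
        # Remember which original column each transformed column came from.
        for name in feature_names:
            self.name_map[f"{self.prefix}__{name}"] = name
        return self

    def original_name(self, transformed):
        # Look up the pre-transform name instead of hardcoding a column index.
        return self.name_map[transformed]

t = OneToOneTransform("le").fit(["city", "price"])
```

With such a mapping, a downstream groupby transformer could recover `"city"` from `"le__city"` regardless of how the encoder reordered columns.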

@dev-rinchin dev-rinchin merged commit a9a72e2 into master Jul 26, 2023
4 of 19 checks passed
Labels: enhancement (New feature or request)
Projects: None yet
Linked issues: None yet
2 participants