Feature/groupby2 #58
Conversation
Feature/groupby
Remove file to upload changed version
btw, I would like you to add some unit tests for the groupby features.
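For context, a minimal sketch of the kind of test being asked for, assuming "delta_mean" means "value minus its group mean"; the commented transformer call is hypothetical, and only its arguments mirror the constructor reviewed later in this PR:

import pandas as pd


def test_delta_mean_groupby_semantics():
    df = pd.DataFrame(
        {"city": ["a", "a", "b", "b"], "age": [10.0, 20.0, 30.0, 50.0]}
    )
    # Group means: a -> 15, b -> 40, so the expected deltas are [-5, 5, -10, 10].
    expected = [-5.0, 5.0, -10.0, 10.0]

    # Pandas reference for the "delta_mean" transform; a real test would call
    # the PR's transformer here instead, e.g. (hypothetical API):
    # out = GroupByTransformer(group_col="city", numeric_cols=["age"],
    #                          used_transforms=["delta_mean"]).fit_transform(df)
    out = df["age"] - df.groupby("city")["age"].transform("mean")

    assert out.tolist() == expected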
examples/demo14.py (Outdated)
train, test = train_test_split(data, test_size=2000, random_state=42)
remove redundant empty lines
examples/demo14.py (Outdated)
    general_params={"use_algos": [["lgb"]]},
    gbm_pipeline_params={"use_groupby": True, "groupby_triplets": groupby_triplets},
)
_ = automl.fit_predict(train, roles=roles)
Suggested change:
automl.fit_predict(train, roles=roles)
Co-authored-by: Rinchin <57899558+dev-rinchin@users.noreply.github.com>
examples/demo14.py (Outdated)
    gbm_pipeline_params={"use_groupby": True, "groupby_triplets": groupby_triplets},
)
automl.fit_predict(train, roles=roles)
add comment for feature_scores
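A hedged sketch of what that comment might look like in demo14.py; the get_feature_scores accessor is an assumption borrowed from LightAutoML tutorials, not confirmed by this diff:

# Feature importances after fitting; groupby-generated features can be
# identified by their name prefix.
feature_scores = automl.get_feature_scores("fast")
print(feature_scores.head(10))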
examples/demo14.py (Outdated)
# Custom pipeline with groupby features defined by importance
print("\nTry custom pipeline with groupby features defined by importance:\n")
add comment for custom pipeline
examples/demo14.py (Outdated)
pipe = LGBAdvancedPipeline(
    use_groupby=True, pre_selector=selector, groupby_types=["delta_median", "std"], groupby_top_based_on="importance"
feats_imp
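If this means the demo should pass the selector through LGBAdvancedPipeline's feats_imp argument (consistent with the feats_imp=pre_selector diffs below), the call would presumably become something like:

pipe = LGBAdvancedPipeline(
    use_groupby=True,
    feats_imp=selector,  # renamed from pre_selector=selector, per the review
    groupby_types=["delta_median", "std"],
    groupby_top_based_on="importance",
)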
@@ -429,7 +429,7 @@ def get_gbms(
     pre_selector: Optional[SelectionPipeline] = None,
 ):
-    gbm_feats = LGBAdvancedPipeline(**self.gbm_pipeline_params)
+    gbm_feats = LGBAdvancedPipeline(**self.gbm_pipeline_params, feats_imp=pre_selector)
add pre_selector to linear_l2_feats init in get_linear
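A sketch of what that might look like; the LinearFeatures name and its acceptance of feats_imp are assumptions mirroring the gbm-side change:

# Inside get_linear (assumed shape, mirroring the change above):
linear_l2_feats = LinearFeatures(feats_imp=pre_selector, **self.linear_pipeline_params)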
@@ -348,7 +348,7 @@ def get_gbms(
 ):
     text_gbm_feats = self.get_nlp_pipe(self.gbm_pipeline_params["text_features"])
-    gbm_feats = LGBAdvancedPipeline(output_categories=False, **self.gbm_pipeline_params)
+    gbm_feats = LGBAdvancedPipeline(feats_imp=pre_selector, output_categories=False, **self.gbm_pipeline_params)
do we need feats_imp=pre_selector in get_linear?
rm file
rm file
group_col: str,
numeric_cols: Optional[List[str]] = None,
categorical_cols: Optional[List[str]] = None,
used_transforms: Optional[List[str]] = None,
rename
self._features = [f"{self._fname_prefix}__{self.group_col}__{t}__{f}" for f, t in self.transformations_list]
self._features_mapping = {self._features[i]: k for i, k in enumerate(self.transformations_list)}

self._group_ids_dict = self._calculate_group_ids(dataset)
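To make the naming scheme concrete, an illustrative example (the prefix and column names here are made up):

# With _fname_prefix = "grb", group_col = "city", and
# transformations_list = [("age", "delta_mean"), ("income", "std")],
# the generated names are:
#   grb__city__delta_mean__age
#   grb__city__std__income
# and _features_mapping maps each generated name back to its
# (feature, transform) pair.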
rm _group_ids_dict
def _set_feature_indices(self):
    feat_idx = dict()
    feat_idx[self.group_col] = 0
do not hardcode values
The thing is that we don't have the original feature names there, because they are affected by different encoders. So hardcoding the indices is the only way to link them with the original names. Fixing this would require a more general improvement: preserving the original feature names through one-to-one transformations.
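A minimal sketch of that general improvement, with hypothetical names: each one-to-one transformer records a mapping from its output names back to the original names, so downstream code can look names up instead of hardcoding indices.

from typing import Dict, List


class OneToOneNameMixin:
    """Hypothetical mixin: preserve original names through 1-to-1 transforms."""

    def _record_names(self, input_names: List[str], output_names: List[str]) -> None:
        # A one-to-one transform keeps positional correspondence,
        # so the mapping is a simple zip.
        self.name_mapping: Dict[str, str] = dict(zip(output_names, input_names))

    def original_name(self, output_name: str) -> str:
        # Look up the original feature name instead of using a hardcoded index.
        return self.name_mapping[output_name]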
Groupby features introduced: delta mean, delta median, min, max, std, and cat mode. Config files updated; the features are not used by default.
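For reference, the listed transforms have direct pandas analogues; a hedged sketch of their intended semantics (the PR's implementation may differ in details such as NaN handling):

import pandas as pd

df = pd.DataFrame({
    "city": ["a", "a", "b", "b"],
    "age": [10.0, 20.0, 30.0, 50.0],
    "job": ["x", "y", "y", "y"],
})
g = df.groupby("city")

delta_mean = df["age"] - g["age"].transform("mean")          # delta mean
delta_median = df["age"] - g["age"].transform("median")      # delta median
group_min = g["age"].transform("min")                        # min
group_max = g["age"].transform("max")                        # max
group_std = g["age"].transform("std")                        # std
group_mode = g["job"].transform(lambda s: s.mode().iloc[0])  # cat mode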