
Bugfix for #158 (#159)

Merged — 13 commits merged into koaning:master on Jul 25, 2019

Conversation

pim-hoeven
Contributor

Fixes the bug that occurred when using the grouped_estimator with a string column as the grouping variable.

Solution: try the sklearn checks without adjustments; if they fail, remove the grouping column from the array or DataFrame and check again.

@koaning
Owner

koaning commented Jul 3, 2019

  1. first of all; thanks for picking this up!
  2. your work is much appreciated. ask matthijs for a reward (this project now has pretty stickers)
  3. i wonder ... when would we want to have the column in the subset? it should be a constant value. could it not be removed in all cases?

@pim-hoeven
Contributor Author

pim-hoeven commented Jul 3, 2019

  1. My pleasure!
  2. @MBrouns can I get a sticker?
  3. Always removing the grouping column was my first approach, but then some other tests failed, for example check_estimators_nan_inf and check_fit2d_predict1d (you cannot drop a column from a 1-d array in that case). That is why I adopted this EAFP approach. Do you agree?

@koaning
Owner

koaning commented Jul 3, 2019

ah yes. those are tricky.

i see two paths going forward and maybe a check with @MBrouns might be good.

  1. We can make an estimator that contains a GroupEstimator in a sklearn.pipeline.Pipeline. The idea is to have a preprocessing step that adds the column that we want to remove. This might cause the tests to pass ... this option is OK, but a bit meh.
  2. Alternatively we might also ignore those two tests in particular. Especially for meta estimators we cannot have the same rules as for a base estimator in sklearn. I would be fine with dropping those two tests in favour of, what I feel, is better behaviour for the model that is being implemented. This option feels best for me at the moment.

The sklearn tests are guidelines that are easy to automate, but they aren't divine gospel. @pim-hoeven feel free to question my thoughts here if you have a better alternative.
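For reference, option 1 (a preprocessing step that re-adds a constant grouping column) could be sketched roughly as follows. LinearRegression stands in for the GroupedEstimator here just to keep the sketch self-contained; in the real setup the inner step would be GroupedEstimator(..., groups=[0]):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LinearRegression

def add_constant_group(X):
    # Prepend a constant grouping column so the inner estimator
    # always has a column it can consume and drop.
    X = np.asarray(X, dtype=float)
    return np.hstack([np.zeros((X.shape[0], 1)), X])

pipe = Pipeline([
    ("add_group", FunctionTransformer(add_constant_group, validate=False)),
    # Stand-in for GroupedEstimator(...); see lead-in above.
    ("model", LinearRegression()),
])
```

Because the added column is constant, it carries no information and the wrapped model's behaviour is unchanged, which is also why the option feels "a bit meh".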

@pim-hoeven
Contributor Author

I think the second approach makes more sense for two reasons:

  1. It's easier to implement and keeps the GroupEstimator cleaner and less complicated
  2. check_fit2d_predict1d in particular makes no sense in this context: if we don't have a grouping column, why use a GroupEstimator at all?

Regarding check_estimators_nan_inf, we might want to build in checks on the grouping columns as well, specifically that they do not contain missing values.

So, (@MBrouns) my proposed solution would be:

  • Include a __check_group_columns method in the estimator that checks whether the grouping column(s) exist and are valid
  • Do the check_X_y on the non-grouping columns
  • Remove the check_fit2d_predict1d

Do you agree?

@koaning
Owner

koaning commented Jul 3, 2019

Sounds great to me.

sklego/meta.py (resolved)
sklego/meta.py (outdated):
def __validate(self, X, y=None):
    try:
        X_data = X.drop(columns=self.groups, inplace=False)
    except AttributeError:  # np.array
Owner:

it feels cleaner to test this using isinstance(X, pd.DataFrame), would you not agree?

sklego/meta.py (resolved)
tests/test_meta/test_grouped_model.py (resolved)
sklego/meta.py (outdated):

@@ -54,6 +54,47 @@ def __init__(self, estimator, groups, use_fallback=True):
        self.groups = groups
        self.use_fallback = use_fallback

    def __check_group_cols_exist(self, X):
        try:
Owner:

checking the type of X is probably better done via isinstance instead of waiting for an AttributeError.

Collaborator:

I agree that in this case the intent is clearer with an isinstance check.

@koaning
Owner

koaning commented Jul 11, 2019

  1. Great work! This is an improvement.
  2. I did have some comments on this, mainly around checking whether something is a pd.DataFrame or not. My view is that asserting this via isinstance is a better habit. Feel free to challenge me on this.
  3. There are some places where adding comments makes it easier to understand why a certain decision is made.

@pim-hoeven
Contributor Author

  1. Thanks, and thank you for the comments!
  2. I do not have a strong view on this personally. I used to do these checks using isinstance, but read somewhere that the try-except approach is preferred (because, for example, we might in the future want to pass something that behaves like a DataFrame but does not pass the isinstance check). See for example the answers to this SO post. Still, I'm not strongly opinionated on this point (and agree that isinstance is more readable), so if this argument does not convince you then I'll gladly change the try-excepts to isinstance checks.
  3. Agreed, I will add some comments


sklego/meta.py (outdated):

    except AttributeError:
        try:
            ncols = X.shape[1]
        except IndexError:
Collaborator:

is X ever only 1d?

Owner:

I guess it could be. groupby-mean as a model?

sklego/meta.py (outdated; resolved)
@pim-hoeven
Contributor Author

Apologies for the delay; I made the requested changes.

@koaning
Owner

koaning commented Jul 23, 2019

No worries about the delay, it is all volunteer work.

@koaning
Owner

koaning commented Jul 23, 2019

There is something interesting happening in the CI pipeline though. Does it run fine locally?

________ ERROR collecting tests/test_meta/test_estimatortransformer.py _________
ImportError while importing test module '/home/travis/build/koaning/scikit-lego/tests/test_meta/test_estimatortransformer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_meta/test_estimatortransformer.py:2: in <module>
    from pandas.tests.extension.numpy_.test_numpy_nested import np
E   ModuleNotFoundError: No module named 'pandas.tests.extension.numpy_'
___________ ERROR collecting tests/test_meta/test_outlier_remover.py ___________
ImportError while importing test module '/home/travis/build/koaning/scikit-lego/tests/test_meta/test_outlier_remover.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_meta/test_outlier_remover.py:2: in <module>
    from pandas.tests.extension.numpy_.test_numpy_nested import np
E   ModuleNotFoundError: No module named 'pandas.tests.extension.numpy_'

@koaning
Owner

koaning commented Jul 23, 2019

urgh. it seems unrelated to your work, i think pandas has a new version or something.

[screenshot]

@koaning
Owner

koaning commented Jul 23, 2019

yep. it's pandas.

[screenshot]

@pim-hoeven
Contributor Author

No, I didn't see these failures locally, but I only ran the relevant tests. Is the pandas issue something I can or should change?

@koaning
Owner

koaning commented Jul 23, 2019

i haven't looked at your code yet but i'll prolly make a fix tomorrow/day after. once that is merged to master we can try again. :)

@koaning
Owner

koaning commented Jul 23, 2019

@pim-hoeven do check the conversations. it feels like a comment here and there might still be missing.

@koaning
Owner

koaning commented Jul 25, 2019

I am fixing the failing tests here: #167

@koaning
Owner

koaning commented Jul 25, 2019

Also, just to check @pim-hoeven, you've got a sticker right?

@pim-hoeven
Contributor Author

pim-hoeven commented Jul 25, 2019

  • Added some more comments to code
  • Added sticker to laptop
  • Added GroupedEstimator to my project at the client

@koaning koaning merged commit aaaf485 into koaning:master Jul 25, 2019