-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix for #158 #159
Merged
koaning
merged 13 commits into
koaning:master
from
pim-hoeven:158-string-values-as-group
Jul 25, 2019
Merged
Bugfix for #158 #159
Changes from 9 commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
51241c6
Add test with column of strings as groups
pim-hoeven afffefb
Fixed bug in grouped estimator using string column as grouping variable
pim-hoeven 4766672
Merge branch 'master' into 158-string-values-as-group
koaning 17f6b70
Merge branch 'master' into 158-string-values-as-group
pim-hoeven 7103101
Proposed fix for the remarks
pim-hoeven ca9e5fe
Merge branch '158-string-values-as-group' of github.com:pim-hoeven/sc…
pim-hoeven 8f0a918
Merge branch 'master' into 158-string-values-as-group
pim-hoeven 0f6dcb5
Added some comments
pim-hoeven e099b67
Merge branch '158-string-values-as-group' of github.com:pim-hoeven/sc…
pim-hoeven 54abb09
Merge branch 'master' into 158-string-values-as-group
koaning 7a82b98
Changed try-except to isinstance checks
pim-hoeven a027ce6
Merge branch 'master' into 158-string-values-as-group
koaning 49d99b6
Added some more comments
pim-hoeven File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,6 +54,52 @@ def __init__(self, estimator, groups, use_fallback=True): | |
self.groups = groups | ||
self.use_fallback = use_fallback | ||
|
||
def __check_group_cols_exist(self, X): | ||
try: | ||
x_cols = set(X.columns) | ||
except AttributeError: | ||
try: | ||
ncols = X.shape[1] | ||
except IndexError: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess it could be. groupby-mean as a model? |
||
ncols = 1 | ||
x_cols = set(range(ncols)) | ||
|
||
# Check whether grouping columns exist | ||
diff = set(as_list(self.groups)) - x_cols | ||
|
||
if len(diff) > 0: | ||
raise KeyError(f'{diff} not in columns of X ({x_cols})') | ||
|
||
@staticmethod | ||
def __check_missing_and_inf(X): | ||
"""Check that all elements of X are non-missing and finite""" | ||
koaning marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if np.any(pd.isnull(X)): | ||
raise ValueError("X has NaN values") | ||
try: # if X cannot be converted to numeric, checking infinites does not make sense | ||
if np.any(np.isinf(X)): | ||
raise ValueError("X has infinite values") | ||
except TypeError: | ||
pass | ||
|
||
def __validate(self, X, y=None): | ||
# Split the model data from the grouping columns | ||
X_data = self.__remove_groups_from_x(X) | ||
|
||
# We want to use __validate in both fit and predict, so y can be None | ||
if y is not None: | ||
koaning marked this conversation as resolved.
Show resolved
Hide resolved
|
||
check_X_y(X_data, y) | ||
else: | ||
check_array(X_data) | ||
|
||
self.__check_missing_and_inf(X) | ||
self.__check_group_cols_exist(X) | ||
|
||
def __remove_groups_from_x(self, X): | ||
try: | ||
return X.drop(columns=self.groups, inplace=False) | ||
except AttributeError: # np.array | ||
return np.delete(X, self.groups, axis=1) | ||
|
||
def fit(self, X, y): | ||
""" | ||
Fit the model using X, y as training data. Will also learn the groups | ||
|
@@ -63,15 +109,17 @@ def fit(self, X, y): | |
:param y: array-like, shape=(n_samples,) training data. | ||
:return: Returns an instance of self. | ||
""" | ||
check_X_y(X, y) | ||
self.__validate(X, y) | ||
|
||
pred_col = 'the-column-that-i-want-to-predict-but-dont-have-the-name-for' | ||
if isinstance(X, np.ndarray): | ||
X = pd.DataFrame(X, columns=[str(_) for _ in range(X.shape[1])]) | ||
X = X.assign(**{pred_col: y}) | ||
|
||
self.group_colnames_ = [str(_) for _ in as_list(self.groups)] | ||
if any([c not in X.columns for c in self.group_colnames_]): | ||
raise ValueError(f"{self.group_colnames_} not in {X.columns}") | ||
raise KeyError(f"{self.group_colnames_} not in {X.columns}") | ||
|
||
self.X_colnames_ = [_ for _ in X.columns if _ not in self.group_colnames_ and _ is not pred_col] | ||
self.fallback_ = None | ||
if self.use_fallback: | ||
|
@@ -93,7 +141,8 @@ def predict(self, X): | |
:param X: array-like, shape=(n_columns, n_samples,) training data. | ||
:return: array, shape=(n_samples,) the predicted data | ||
""" | ||
check_array(X) | ||
self.__validate(X) | ||
|
||
check_is_fitted(self, ['estimators_', 'groups_', 'group_colnames_', | ||
'X_colnames_', 'fallback_']) | ||
if isinstance(X, np.ndarray): | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking the type of
X
is probably better to do viaisinstance
instead waiting for anAttributeError
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in this case the intent is clearer with an isinstance check.