Applying to iloc of pandas data frame (#500)
* bug fix and applying to the index of pandas

#499
Bug fix for selecting columns from a list; `cols` now works not only with a list but also with a pandas index.

* add error message and comment

add error message and comment for using it

* add error message and comment

add error message and comment for using it

* Update for pandas iloc

Update for pandas iloc

* Update for pandas iloc

Update for pandas iloc

* comment

comment

* if to elif

for coverage

* elif to if 

elif to if

* For PEP8

only new lines in the parenthesis

* add unit test and changelog note
tetrar124 authored and rasbt committed Feb 13, 2019
1 parent 6b57a73 commit 1ca3059
Showing 3 changed files with 32 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/sources/CHANGELOG.md
@@ -25,8 +25,8 @@ The CHANGELOG for the current development version is available at

##### Bug Fixes

- The `feature_selection.ColumnSelector` now also supports column names of type `int` (in addition to `str` names) if the input is a pandas DataFrame. ([#500](https://github.com/rasbt/mlxtend/pull/500) via [tetrar124](https://github.com/tetrar124))
- Fix unreadable labels in `plot_confusion_matrix` for imbalanced datasets if `show_absolute=True` and `show_normed=True`. ([#504](https://github.com/rasbt/mlxtend/pull/504))

- Raises a more informative error if a `SparseDataFrame` is passed to `apriori` and the dataframe has integer column names that don't start with `0` due to current limitations of the `SparseDataFrame` implementation in pandas. ([#503](https://github.com/rasbt/mlxtend/pull/503))

### Version 0.15.0 (01-19-2019)
23 changes: 19 additions & 4 deletions mlxtend/feature_selection/column_selector.py
@@ -18,7 +18,8 @@ class ColumnSelector(BaseEstimator):
     ----------
     cols : array-like (default: None)
         A list specifying the feature indices to be selected. For example,
-        [1, 4, 5] to select the 2nd, 5th, and 6th feature columns.
+        [1, 4, 5] to select the 2nd, 5th, and 6th feature columns, and
+        ['A','C','D'] to select the name of feature columns A, C and D.
         If None, returns all columns in the array.
     drop_axis : bool (default=False)
@@ -75,9 +76,23 @@ def transform(self, X, y=None):
         """

-        # We use the loc accessor if the input is a pandas dataframe
-        if hasattr(X, 'loc'):
-            t = X.loc[:, self.cols].values
+        # We use the loc or iloc accessor if the input is a pandas dataframe
+        if hasattr(X, 'loc') or hasattr(X, 'iloc'):
+            if type(self.cols) == tuple:
+                self.cols = list(self.cols)
+            types = {type(i) for i in self.cols}
+            if len(types) > 1:
+                raise ValueError(
+                    'Elements in `cols` should be all of the same data type.'
+                )
+            if isinstance(self.cols[0], int):
+                t = X.iloc[:, self.cols].values
+            elif isinstance(self.cols[0], str):
+                t = X.loc[:, self.cols].values
+            else:
+                raise ValueError(
+                    'Elements in `cols` should be either `int` or `str`.'
+                )
         else:
             t = X[:, self.cols]
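
The dispatch in this hunk can be tried outside the class. The following is a minimal sketch, not the mlxtend implementation: `select_columns` is a hypothetical helper name, the sample DataFrame is made up, and the helper copies `cols` into a local list instead of reassigning `self.cols`. As in the hunk, integers are treated as column positions (`iloc`) and strings as column labels (`loc`). `cols=None` and an empty `cols` are not handled here.

```python
import numpy as np
import pandas as pd


def select_columns(X, cols):
    """Hypothetical helper mirroring the int/str dispatch in the hunk above."""
    # Inputs without pandas accessors (e.g. NumPy arrays) use plain indexing.
    if not (hasattr(X, "loc") or hasattr(X, "iloc")):
        return X[:, cols]

    cols = list(cols)  # local copy; a tuple becomes a list
    if len({type(c) for c in cols}) > 1:
        raise ValueError(
            "Elements in `cols` should be all of the same data type."
        )
    if isinstance(cols[0], int):
        return X.iloc[:, cols].values  # integers: positional selection
    if isinstance(cols[0], str):
        return X.loc[:, cols].values  # strings: label-based selection
    raise ValueError("Elements in `cols` should be either `int` or `str`.")


# Made-up sample data: 3 rows, columns named A to D.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])
by_name = select_columns(df, ("B", "D"))
by_position = select_columns(df, (1, 3))
```

Both calls select the same two columns, so `by_name` and `by_position` hold identical arrays. Mixing types, as in `select_columns(df, [1, "B"])`, raises the first `ValueError`.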

12 changes: 12 additions & 0 deletions mlxtend/feature_selection/tests/test_column_selector.py
@@ -63,6 +63,18 @@ def test_ColumnSelector_with_dataframe():
     assert df_out.shape == (506, 2)


+def test_ColumnSelector_with_dataframe_and_int_columns():
+    boston = datasets.load_boston()
+    df_in = pd.DataFrame(boston.data, columns=boston.feature_names)
+    df_out_str = ColumnSelector(cols=('INDUS', 'CHAS')).transform(df_in)
+    df_out_int = ColumnSelector(cols=(2, 3)).transform(df_in)
+
+    np.testing.assert_array_equal(df_out_str[:, 0],
+                                  df_out_int[:, 0])
+    np.testing.assert_array_equal(df_out_str[:, 1],
+                                  df_out_int[:, 1])
+
+
 def test_ColumnSelector_with_dataframe_drop_axis():
     boston = datasets.load_boston()
     df_in = pd.DataFrame(boston.data, columns=boston.feature_names)
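
The new test relies on `load_boston`, which was removed in scikit-learn 1.2. The sketch below checks only the underlying pandas equivalence that the test depends on, using stand-in data. It assumes a random 10x4 frame whose columns reuse the first four Boston feature names, so that `'INDUS'` and `'CHAS'` sit at positions 2 and 3 as in the test. It does not exercise `ColumnSelector` itself.

```python
import numpy as np
import pandas as pd

# Stand-in data (an assumption, not the Boston dataset): random values,
# with column names placed so INDUS and CHAS are at positions 2 and 3.
rng = np.random.default_rng(0)
names = ["CRIM", "ZN", "INDUS", "CHAS"]
df_in = pd.DataFrame(rng.random((10, 4)), columns=names)

# Label-based and positional selection should return the same values.
out_str = df_in.loc[:, ["INDUS", "CHAS"]].values
out_int = df_in.iloc[:, [2, 3]].values

np.testing.assert_array_equal(out_str, out_int)
```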