Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applying to iloc of pandas data frame #500

Merged
merged 11 commits into from
Feb 13, 2019
Merged

Applying to iloc of pandas data frame #500

merged 11 commits into from
Feb 13, 2019

Conversation

tetrar124
Copy link
Contributor

@tetrar124 tetrar124 commented Feb 10, 2019

Description

Bug fix for selecting columns and add new features for the name of the columns of pandas data frame.

Related issues or pull requests

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran nosetests ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., nosetests ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

rasbt#499
Bug fix for selecting columns from the list and it works not only a list but the index of pandas.
@pep8speaks
Copy link

pep8speaks commented Feb 10, 2019

Hello @tetrar124! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 13, 2019 at 07:57 Hours UTC

add error message and comment for using it
add error message and comment for using it
@tetrar124 tetrar124 changed the title bug fix and applying to the index of pandas bug fix and applying to the name of the columns of pandas data frame Feb 10, 2019
@rasbt
Copy link
Owner

rasbt commented Feb 10, 2019

Thanks for the PR. However, instead doing a try-except, I think the following could solve the problem better? (Haven't tested it, but what do you think? Does that cover your use case issue?)

        if hasattr(X, 'loc') or hasattr(X, 'iloc'):
            types = {type(i) for i in self.cols}
            if len(types) > 1:
                raise ValueError('Elements in `cols` should be all of the same data type.')
            if isinstance(cols[0], int):
                t = X.iloc[:, self.cols].values
            elif isinstance(cols[0], str):
                t = X.loc[:, self.cols].values
           else:
               raise ValueError('Elements in `cols` should be either `int` or `str`.')

@tetrar124
Copy link
Contributor Author

tetrar124 commented Feb 11, 2019

I checked your code on my environment, your code works well (I changed only cols[0] to self.cols[0]).
And I added 2 lines in this code for using a tuple with iloc because of using a tuple in the user's guide.
http://rasbt.github.io/mlxtend/user_guide/feature_selection/ColumnSelector/

        if hasattr(X, 'loc') or hasattr(X, 'iloc'):
            if type(self.cols) == tuple:
               self.cols = list(self.cols)
            types = {type(i) for i in self.cols}
            if len(types) > 1:
                raise ValueError('Elements in `cols` should be all of the same data type.')
            if isinstance(self.cols[0], int):
                t = X.iloc[:, self.cols].values
            elif isinstance(self.cols[0], str):
                t = X.loc[:, self.cols].values
            else:
               raise ValueError('Elements in `cols` should be either `int` or `str`.')

Thank you for your cooperation.

Update for pandas iloc
@tetrar124 tetrar124 changed the title bug fix and applying to the name of the columns of pandas data frame Applying to iloc of pandas data frame Feb 12, 2019
Update for pandas iloc
@rasbt
Copy link
Owner

rasbt commented Feb 12, 2019

Thanks for working on this PR! Just as a tip to make the debugging smoother/faster (since the CI is very slow), you can also test your code locally by running

nosetests mlxtend/feature_selection

to see if there are any issues with the code.

comment
@coveralls
Copy link

coveralls commented Feb 12, 2019

Coverage Status

Coverage decreased (-0.009%) to 91.587% when pulling 09169ef on tetrar124:patch-4 into 6b57a73 on rasbt:master.

for coverage
mlxtend/feature_selection/column_selector.py Outdated Show resolved Hide resolved
elif to if
@tetrar124
Copy link
Contributor Author

Sorry, I return elif to if. The reason for using elif is only for the coverage. I checked new lines with nose and flask8, and there are no errors other than 'line too long' from the new comment lines for ValueError.

@rasbt
Copy link
Owner

rasbt commented Feb 12, 2019

Ok thanks :)! Btw don't worry about minor changes regarding the coverage too much. Can you please make the line < 79 chars though (splitting it over two lines) -- we decided to strictly enforce PEP8 at some point so that the library won't become too messy over time by making exceptions.

@rasbt
Copy link
Owner

rasbt commented Feb 12, 2019

Could you also add an additional unit test for using a DataFrame with integer column names to make sure it works correctly? (I think we don't have that one yet) :)

only new lines in the parenthesis
@tetrar124
Copy link
Contributor Author

tetrar124 commented Feb 12, 2019

I made the data frame with columns whose names were only integer values, and I checked the type of these columns names showed int.
In this situation, there was no bug but this code has worked only as the mode of iloc.
(I think it was correct work by this code)

@rasbt
Copy link
Owner

rasbt commented Feb 13, 2019

Thanks for testing. We should add a unit test though so that we can ensure that the behavior doesn't break in future.

I just added a unit test and changelog entry to this PR. Let me know if you have any further comments. Otherwise, I think it's good to merge if/when the CI tests pass. Thanks for the PR!

@rasbt rasbt merged commit 1ca3059 into rasbt:master Feb 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants