Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH dataframe in/out for all samplers #644

Merged
merged 5 commits into from Nov 17, 2019

Conversation

@glemaitre
Copy link
Member

glemaitre commented Nov 16, 2019

Reference Issue

closes #639

What does this implement/fix? Explain your changes.

Let dataframe out if they got in.

TODO:

  • Add some documentation.

Any other comments?

@lgtm-com

This comment has been minimized.

Copy link

lgtm-com bot commented Nov 16, 2019

This pull request introduces 1 alert when merging e936bdd into 45b538c - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment
@codecov

This comment has been minimized.

Copy link

codecov bot commented Nov 17, 2019

Codecov Report

Merging #644 into master will increase coverage by 0.54%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #644      +/-   ##
=========================================
+ Coverage   97.96%   98.5%   +0.54%     
=========================================
  Files          83      82       -1     
  Lines        4867    4888      +21     
=========================================
+ Hits         4768    4815      +47     
+ Misses         99      73      -26
Impacted Files Coverage Δ
imblearn/over_sampling/_smote.py 97.23% <100%> (ø) ⬆️
imblearn/utils/estimator_checks.py 96.11% <100%> (+7.22%) ⬆️
imblearn/over_sampling/tests/test_smote_nc.py 100% <100%> (ø) ⬆️
...ling/_prototype_selection/_random_under_sampler.py 100% <100%> (ø) ⬆️
imblearn/over_sampling/_random_over_sampler.py 100% <100%> (ø) ⬆️
...rototype_selection/_instance_hardness_threshold.py 100% <0%> (ø) ⬆️
...election/tests/test_instance_hardness_threshold.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 45b538c...2f93b40. Read the comment docs.

@chkoar

This comment has been minimized.

Copy link
Member

chkoar commented Nov 17, 2019

I thought that you were against on using pandas. With your implementation you assume that the user has pandas installed. Of course it is passing pandas pandas df. Shouldn't we try/catch in case that pandas in not installed and is passing something that has a loc property?

@glemaitre

This comment has been minimized.

Copy link
Member Author

glemaitre commented Nov 17, 2019

@lgtm-com

This comment has been minimized.

Copy link

lgtm-com bot commented Nov 17, 2019

This pull request introduces 1 alert when merging b1d2d56 into 45b538c - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment
glemaitre added 2 commits Nov 17, 2019
@lgtm-com

This comment has been minimized.

Copy link

lgtm-com bot commented Nov 17, 2019

This pull request introduces 1 alert when merging 9431bb8 into 45b538c - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment
@glemaitre

This comment has been minimized.

Copy link
Member Author

glemaitre commented Nov 17, 2019

@chkoar Would you have a bit of time too look at it.

@lgtm-com

This comment has been minimized.

Copy link

lgtm-com bot commented Nov 17, 2019

This pull request introduces 1 alert when merging 2f93b40 into 153f6e0 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment
@chkoar

This comment has been minimized.

Copy link
Member

chkoar commented Nov 17, 2019

I will give it a go soon

if self._columns is not None:
import pandas as pd
X_ = pd.DataFrame(output[0], columns=self._columns)
else:
X_ = output[0]

Comment on lines +83 to +88

This comment has been minimized.

Copy link
@chkoar

chkoar Nov 17, 2019

Member

Why not doing this in _check_X_y?

This comment has been minimized.

Copy link
@glemaitre

glemaitre Nov 17, 2019

Author Member

This block required the resampled data so _check_X_y is called before that the resampling happened.

This comment has been minimized.

Copy link
@chkoar

chkoar Nov 17, 2019

Member

Ok. It is like we need to extracted it in an internal transformation method. But I am ok as is.

@chkoar

This comment has been minimized.

Copy link
Member

chkoar commented Nov 17, 2019

Is there a case where something that will pass, it will have .columns, .loc and will not be a DataFrame? Probably is a corner case. In that case we will have an ugly error. Is there a need to test/catch this? If no then I think I am ok.

@glemaitre

This comment has been minimized.

Copy link
Member Author

glemaitre commented Nov 17, 2019

Is there a case where something that will pass, it will have .columns, .loc and will not be a DataFrame? Probably is a corner case. In that case we will have an ugly error. Is there a need to test/catch this? If no then I think I am ok.

We always ducktype like this in scikit-learn for dataframe. So if it passes in imblearn it will break in scikit-learn just with the _safe_indexing.

@chkoar

This comment has been minimized.

Copy link
Member

chkoar commented Nov 17, 2019

Go ahead!

@glemaitre glemaitre merged commit 158258e into scikit-learn-contrib:master Nov 17, 2019
7 checks passed
7 checks passed
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python 1 new alert
Details
ci/circleci: python3 Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 97.96%)
Details
codecov/project 98.5% (+0.54%) compared to 45b538c
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.