Conversation

@dvro (Member) commented Jul 14, 2016

  • Added AllKNN under-sampling method
    • imblearn/under_sampling/edited_nearest_neighbours.py.
  • added AllKNN to under_sampling/__init__.py for import purposes;
  • added a plot example comparing ENN, RENN and AllKNN
    • examples/under-sampling/plot_allknn.py

python plot_allknn.py

ENN
Reduced 5.64%
RENN
Reduced 8.36%
AllKNN
Reduced 6.94%

(figure: allknn comparison plot of ENN, RENN and AllKNN)
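For context, the ENN/AllKNN cleaning idea can be sketched in plain numpy (a mode-vote variant; the names `enn_keep_mask` and `all_knn` are illustrative, not imblearn's API):

```python
import numpy as np

def enn_keep_mask(X, y, minority_label, k=3):
    """ENN pass: keep all minority samples; drop any non-minority sample
    whose k nearest neighbours (self excluded) mostly carry another label."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbour
    keep = np.ones(len(y), dtype=bool)
    for i in np.where(y != minority_label)[0]:
        nn = np.argsort(dist[i])[:k]
        if np.sum(y[nn] == y[i]) <= k // 2:  # neighbourhood vote disagrees
            keep[i] = False
    return keep

def all_knn(X, y, minority_label, k_max=3):
    """AllKNN: apply ENN repeatedly with k = 1, 2, ..., k_max,
    shrinking the majority class a little further on each pass."""
    for k in range(1, k_max + 1):
        keep = enn_keep_mask(X, y, minority_label, k)
        X, y = X[keep], y[keep]
    return X, y

# tiny demo: a majority point sitting inside the minority cluster gets removed
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.],             # minority (0)
              [5., 5.], [5., 6.], [6., 5.], [6., 6.], [5.5, 5.5],  # majority (1)
              [0.5, 0.5]])                                         # noisy majority
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
X_res, y_res = all_knn(X, y, minority_label=0, k_max=3)
print(len(y), "->", len(y_res))  # the noisy majority sample is dropped
```

This mirrors why AllKNN's reduction rate lands between a single ENN pass and RENN: each pass uses a larger neighbourhood rather than repeating the same one.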

@glemaitre (Member):

@dvro Could you also write the test? You can check the other test files to see how it is currently done.

@glemaitre (Member):

You should also avoid merging and use rebase instead, I think. The merge will be done on our side.

@dvro (Member, Author) commented Jul 15, 2016

Ok, I'll do that today!

@dvro dvro force-pushed the allknn branch 3 times, most recently from 02beef0 to 4bfbe2a Compare July 16, 2016 22:00
@dvro (Member, Author) commented Jul 16, 2016

@glemaitre I believe it is fine now; let me know if it still needs any improvement!

@glemaitre (Member):

Code-wise it is fine. Just one last addition: can you add an entry to the changelog in doc/whats_new.rst?

@dvro (Member, Author) commented Jul 17, 2016

@glemaitre done

updated README.rst, doc/whats_new.rst, and doc/todo.rst
changed the commit name to a simple "AllKNN under-sampling technique"

@glemaitre (Member) commented Jul 18, 2016

I ran into an issue with the tests on my repo after merging into master: the arrays were mismatching. Could you investigate? I put it in a branch; you can check Travis.

@dvro (Member, Author) commented Jul 18, 2016

That's weird ... everything seems to be fine on my end ...

Doctest: imblearn.pipeline.Pipeline ... ok
Doctest: imblearn.under_sampling.cluster_centroids.ClusterCentroids ... ok
Doctest: imblearn.under_sampling.condensed_nearest_neighbour.CondensedNearestNeighbour ... ok
Doctest: imblearn.under_sampling.edited_nearest_neighbours.AllKNN ... ok
Doctest: imblearn.under_sampling.edited_nearest_neighbours.EditedNearestNeighbours ... ok
Doctest: imblearn.under_sampling.edited_nearest_neighbours.RepeatedEditedNearestNeighbours ... ok
Doctest: imblearn.under_sampling.instance_hardness_threshold.InstanceHardnessThreshold ... ok
Doctest: imblearn.under_sampling.nearmiss.NearMiss ... ok
Doctest: imblearn.under_sampling.neighbourhood_cleaning_rule.NeighbourhoodCleaningRule ... ok
Doctest: imblearn.under_sampling.one_sided_selection.OneSidedSelection ... ok
Doctest: imblearn.under_sampling.random_under_sampler.RandomUnderSampler ... ok
Doctest: imblearn.under_sampling.tomek_links.TomekLinks ... ok

I'll look into it!

@glemaitre glemaitre force-pushed the master branch 6 times, most recently from 2fe4121 to 0e643dd Compare July 19, 2016 12:22
@dvro dvro force-pushed the allknn branch 2 times, most recently from 5b5a016 to 28a6ee1 Compare July 19, 2016 12:52
@glemaitre glemaitre force-pushed the master branch 3 times, most recently from 6671082 to e2f6f25 Compare July 19, 2016 14:07
currdir = os.path.dirname(os.path.abspath(__file__))
X_gt = np.load(os.path.join(currdir, 'data', 'allknn_x.npy'))
y_gt = np.load(os.path.join(currdir, 'data', 'allknn_y.npy'))
assert_array_equal(X_resampled, X_gt)
@glemaitre (Member) commented on the diff above:

@dvro Try to use assert_array_almost_equal to see if this is not a problem of rounding.
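The distinction matters for floating-point fixtures; a minimal illustration with numpy.testing:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

a = np.array([0.1 + 0.2, 1.0 / 3.0])
b = np.array([0.3, 0.33333333])

# Exact comparison fails here: 0.1 + 0.2 == 0.30000000000000004 in binary
# floats, so assert_array_equal(a, b) would raise.
assert_array_almost_equal(a, b, decimal=6)  # passes: equal to 6 decimal places
print("arrays agree to 6 decimals")
```

With `decimal=6`, elements are accepted when the absolute difference stays below 1.5e-6, which absorbs ordinary rounding noise without hiding real mismatches.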

@dvro (Member, Author) commented Jul 20, 2016

@glemaitre check out this weird thing:
On my env, test_edited_nearest_neighbours.py fails:

(clean-env) dayvid:tests$ nosetests test_edited_nearest_neighbours.py
.....FFF.
======================================================================
FAIL: Test the fit sample routine
----------------------------------------------------------------------

and test_allknn.py passes:

(clean-env) dayvid:tests$ nosetests test_allknn.py 
.........
----------------------------------------------------------------------
Ran 9 tests in 6.501s

OK

I created a clean env with the same Python, numpy, and scipy versions, but the problem persists; specifically: Arrays are not equal (mismatch 2.82770406112%).

Do you think you could generate the allknn_*.npy files on your machine and pull-request them to my allknn repo?

Thanks,

@glemaitre (Member):

@dvro I am doing that.

@glemaitre (Member):

@dvro done

@coveralls commented Jul 20, 2016

Coverage increased (+0.02%) to 99.325% when pulling 4c8954c on dvro:allknn into e2f6f25 on scikit-learn-contrib:master.

@glemaitre (Member):

OK, so that's pretty weird. Which OS are you using?

>>> from collections import Counter
>>> from sklearn.datasets import fetch_mldata
>>> from imblearn.under_sampling import AllKNN
>>> pima = fetch_mldata('diabetes_scale')
@glemaitre (Member) commented on the diff above:

Can you change the example to avoid fetching data? It can be a problem during testing.

@dvro (Member, Author) replied:

done.
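The actual replacement is not shown in this thread; a common way to build such an example without network access is sklearn's synthetic data generator (a sketch, not the code the PR ended up using):

```python
from collections import Counter
from sklearn.datasets import make_classification

# Generate an imbalanced two-class dataset locally: no network fetch needed,
# so doctests stay deterministic and runnable offline.
X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=5, n_redundant=2,
                           weights=[0.9, 0.1], random_state=42)
print(Counter(y))  # roughly a 9:1 class imbalance
```

Fixing `random_state` is what makes the resulting doctest output reproducible across machines, unlike a fetched dataset whose hosting can change or disappear.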

@coveralls

Coverage increased (+0.02%) to 99.325% when pulling be2d864 on dvro:allknn into e2f6f25 on scikit-learn-contrib:master.

@dvro (Member, Author) commented Jul 21, 2016

@glemaitre what's up with this AppVeyor? It seems like several tests are failing...

If everything is OK now, I'll squash the commits, pull, and (finally) merge.
Let me know!

@glemaitre (Member):

@dvro AppVeyor is still not working because of the .npy files. That is why we have to move away from this solution.

This is fine to merge for me. Do you think we need X_ = X.copy() in RENN as well?
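The point of the `X_ = X.copy()` question is whether the sampler mutates the caller's arrays. A minimal sketch of the difference (hypothetical function names, not the RENN code itself):

```python
import numpy as np

def sample_in_place(X):
    # Without a copy, this mutates the caller's array as a side effect.
    X[0] = -1.0
    return X

def sample_with_copy(X):
    X_ = X.copy()   # defensive copy: the caller's data stays untouched
    X_[0] = -1.0
    return X_

X = np.ones((4, 2))
out = sample_with_copy(X)
print(X[0, 0], out[0, 0])  # original preserved, returned result modified
```

Skipping the copy saves memory but makes `_sample(X, y)` surprising to callers who reuse `X` afterwards, which is the trade-off being weighed here.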

@glemaitre glemaitre merged commit 26c7ee8 into scikit-learn-contrib:master Jul 21, 2016
glemaitre pushed a commit to glemaitre/imbalanced-learn that referenced this pull request Jul 21, 2016
* added AllKNN under sampling technique

* test_allknn using assert_array_almost_equal

* Add data

* changing allknn doctest and removing internal data copy in _sample(X, y)
@dvro dvro deleted the allknn branch July 23, 2016 20:44
christophe-rannou pushed a commit to christophe-rannou/imbalanced-learn that referenced this pull request Apr 3, 2017
glemaitre pushed a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017
glemaitre pushed a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017