Conversation

@dvro (Member) commented Jul 14, 2016

  • Added AllKNN under-sampling method
    • imblearn/under_sampling/edited_nearest_neighbours.py.
  • added AllKNN to under_sampling/__init__.py for import purposes;
  • added a plot example comparing ENN, RENN and AllKNN
    • examples/under-sampling/plot_allknn.py

python plot_allknn.py

ENN
Reduced 5.64%
RENN
Reduced 8.36%
AllKNN
Reduced 6.94%

(figure: allknn comparison plot of ENN, RENN and AllKNN)
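For context, the ENN/AllKNN cleaning idea can be sketched in plain numpy (a mode-vote variant; the names `enn_keep_mask` and `all_knn` are illustrative, not imblearn's API):

```python
import numpy as np

def enn_keep_mask(X, y, minority_label, k=3):
    """ENN pass: keep all minority samples; drop any non-minority sample
    whose k nearest neighbours (self excluded) mostly carry another label."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbour
    keep = np.ones(len(y), dtype=bool)
    for i in np.where(y != minority_label)[0]:
        nn = np.argsort(dist[i])[:k]
        if np.sum(y[nn] == y[i]) <= k // 2:  # neighbourhood vote disagrees
            keep[i] = False
    return keep

def all_knn(X, y, minority_label, k_max=3):
    """AllKNN: apply ENN repeatedly with k = 1, 2, ..., k_max,
    shrinking the majority class a little further on each pass."""
    for k in range(1, k_max + 1):
        keep = enn_keep_mask(X, y, minority_label, k)
        X, y = X[keep], y[keep]
    return X, y

# tiny demo: a majority point sitting inside the minority cluster gets removed
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.],             # minority (0)
              [5., 5.], [5., 6.], [6., 5.], [6., 6.], [5.5, 5.5],  # majority (1)
              [0.5, 0.5]])                                         # noisy majority
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
X_res, y_res = all_knn(X, y, minority_label=0, k_max=3)
print(len(y), "->", len(y_res))  # the noisy majority sample is dropped
```

This mirrors why AllKNN's reduction rate lands between a single ENN pass and RENN: each pass uses a larger neighbourhood rather than repeating the same one.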

@glemaitre (Member):

@dvro Could you also write the test? You can check the other test files to see how it is currently done.

@glemaitre (Member):

You should also avoid merging and use rebase instead, I think. The merge will be done on our side.

@dvro (Member, Author) commented Jul 15, 2016

Ok, I'll do that today!

@dvro dvro force-pushed the allknn branch 3 times, most recently from 02beef0 to 4bfbe2a Compare July 16, 2016 22:00
@dvro (Member, Author) commented Jul 16, 2016

@glemaitre I believe it is fine now; let me know if it still needs any improvement!

@glemaitre (Member):

Code-wise it is fine. Just one last addition: can you add an entry to the changelog in doc/whats_new.rst?

@dvro (Member, Author) commented Jul 17, 2016

@glemaitre done

updated README.rst, doc/whats_new.rst, and doc/todo.rst
changed the commit name to a simple "AllKNN under-sampling technique"

@glemaitre (Member) commented Jul 18, 2016

I ran into an issue with the tests on my repo after merging into master: the arrays were mismatching. Could you investigate? I put it in a branch; you can check Travis.

@dvro (Member, Author) commented Jul 18, 2016

That's weird ... everything seems to be fine on my end ...

Doctest: imblearn.pipeline.Pipeline ... ok
Doctest: imblearn.under_sampling.cluster_centroids.ClusterCentroids ... ok
Doctest: imblearn.under_sampling.condensed_nearest_neighbour.CondensedNearestNeighbour ... ok
Doctest: imblearn.under_sampling.edited_nearest_neighbours.AllKNN ... ok
Doctest: imblearn.under_sampling.edited_nearest_neighbours.EditedNearestNeighbours ... ok
Doctest: imblearn.under_sampling.edited_nearest_neighbours.RepeatedEditedNearestNeighbours ... ok
Doctest: imblearn.under_sampling.instance_hardness_threshold.InstanceHardnessThreshold ... ok
Doctest: imblearn.under_sampling.nearmiss.NearMiss ... ok
Doctest: imblearn.under_sampling.neighbourhood_cleaning_rule.NeighbourhoodCleaningRule ... ok
Doctest: imblearn.under_sampling.one_sided_selection.OneSidedSelection ... ok
Doctest: imblearn.under_sampling.random_under_sampler.RandomUnderSampler ... ok
Doctest: imblearn.under_sampling.tomek_links.TomekLinks ... ok

I'll look into it!

@glemaitre glemaitre force-pushed the master branch 6 times, most recently from 2fe4121 to 0e643dd Compare July 19, 2016 12:22
@dvro dvro force-pushed the allknn branch 2 times, most recently from 5b5a016 to 28a6ee1 Compare July 19, 2016 12:52
@glemaitre glemaitre force-pushed the master branch 3 times, most recently from 6671082 to e2f6f25 Compare July 19, 2016 14:07
currdir = os.path.dirname(os.path.abspath(__file__))
X_gt = np.load(os.path.join(currdir, 'data', 'allknn_x.npy'))
y_gt = np.load(os.path.join(currdir, 'data', 'allknn_y.npy'))
assert_array_equal(X_resampled, X_gt)
@glemaitre (Member) commented on the diff above:

@dvro Try to use assert_array_almost_equal to see if this is not a problem of rounding.
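The distinction matters for floating-point fixtures; a minimal illustration with numpy.testing:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal

a = np.array([0.1 + 0.2, 1.0 / 3.0])
b = np.array([0.3, 0.33333333])

# Exact comparison fails here: 0.1 + 0.2 == 0.30000000000000004 in binary
# floats, so assert_array_equal(a, b) would raise.
assert_array_almost_equal(a, b, decimal=6)  # passes: equal to 6 decimal places
print("arrays agree to 6 decimals")
```

With `decimal=6`, elements are accepted when the absolute difference stays below 1.5e-6, which absorbs ordinary rounding noise without hiding real mismatches.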

@dvro (Member, Author) commented Jul 20, 2016

@glemaitre check out this weird thing:
On my env, test_edited_nearest_neighbours.py fails:

(clean-env) dayvid:tests$ nosetests test_edited_nearest_neighbours.py
.....FFF.
======================================================================
FAIL: Test the fit sample routine
----------------------------------------------------------------------

and test_allknn.py passes:

(clean-env) dayvid:tests$ nosetests test_allknn.py 
.........
----------------------------------------------------------------------
Ran 9 tests in 6.501s

OK

I created a clean env with the same Python, numpy, and scipy versions, but the problem persists; specifically: Arrays are not equal (mismatch 2.82770406112%).

Do you think you could generate the allknn_*.npy files on your machine and pull-request them to my allknn repo?

Thanks,

@glemaitre (Member):

@dvro I am doing that.

@glemaitre (Member):

@dvro done

@coveralls commented Jul 20, 2016

Coverage increased (+0.02%) to 99.325% when pulling 4c8954c on dvro:allknn into e2f6f25 on scikit-learn-contrib:master.

@glemaitre (Member):

OK, so that's pretty weird. Which OS are you using?

>>> from collections import Counter
>>> from sklearn.datasets import fetch_mldata
>>> from imblearn.under_sampling import AllKNN
>>> pima = fetch_mldata('diabetes_scale')
@glemaitre (Member) commented on the diff above:

Can you change the example to avoid fetching data? It can be a problem during testing.

@dvro (Member, Author) replied:

done.
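The actual replacement is not shown in this thread; a common way to build such an example without network access is sklearn's synthetic data generator (a sketch, not the code the PR ended up using):

```python
from collections import Counter
from sklearn.datasets import make_classification

# Generate an imbalanced two-class dataset locally: no network fetch needed,
# so doctests stay deterministic and runnable offline.
X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=5, n_redundant=2,
                           weights=[0.9, 0.1], random_state=42)
print(Counter(y))  # roughly a 9:1 class imbalance
```

Fixing `random_state` is what makes the resulting doctest output reproducible across machines, unlike a fetched dataset whose hosting can change or disappear.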

@coveralls

Coverage increased (+0.02%) to 99.325% when pulling be2d864 on dvro:allknn into e2f6f25 on scikit-learn-contrib:master.

@dvro (Member, Author) commented Jul 21, 2016

@glemaitre what's up with this AppVeyor? It seems like several tests are failing...

If everything is OK now, I'll squash the commits, pull, and (finally) merge.
Let me know!

@glemaitre (Member):

@dvro AppVeyor is still not working because of the .npy files. That is why we have to move away from this solution.

This is fine to merge for me. Do you think we need X_ = X.copy() in RENN as well?
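The point of the `X_ = X.copy()` question is whether the sampler mutates the caller's arrays. A minimal sketch of the difference (hypothetical function names, not the RENN code itself):

```python
import numpy as np

def sample_in_place(X):
    # Without a copy, this mutates the caller's array as a side effect.
    X[0] = -1.0
    return X

def sample_with_copy(X):
    X_ = X.copy()   # defensive copy: the caller's data stays untouched
    X_[0] = -1.0
    return X_

X = np.ones((4, 2))
out = sample_with_copy(X)
print(X[0, 0], out[0, 0])  # original preserved, returned result modified
```

Skipping the copy saves memory but makes `_sample(X, y)` surprising to callers who reuse `X` afterwards, which is the trade-off being weighed here.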

@glemaitre glemaitre merged commit 26c7ee8 into scikit-learn-contrib:master Jul 21, 2016
glemaitre pushed a commit to glemaitre/imbalanced-learn that referenced this pull request Jul 21, 2016
* added AllKNN under sampling technique

* test_allknn using assert_array_almost_equal

* Add data

* changing allknn doctest and removing internal data copy in _sample(X, y)
@dvro dvro deleted the allknn branch July 23, 2016 20:44
christophe-rannou pushed a commit to christophe-rannou/imbalanced-learn that referenced this pull request Apr 3, 2017
glemaitre pushed a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017
glemaitre pushed a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017