Closed
Description
Hello,
I think there is a bug in NearMiss version 3 (NearMiss-3).
Here is a minimal example that reproduces it.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from imblearn.under_sampling import NearMiss
X = np.array([[-0.64994222, 0.04118058],
[ 3.8412067, -2.99400188],
[-1.20138471, -0.36572523],
[ 0.4781401, 1.88990383],
[-0.02652392, 1.51770219],
[-1.14731928, -1.39040574],
[-0.7018478, -1.45178366],
[ 1.14251608, 2.56114558],
[-0.94415193, 0.36672943],
[-0.63234148, 2.02266682],
[-2.31299362, -0.97552173],
[-0.29746135, -1.27207336],
[-2.31091098, 0.58882533],
[-1.32011229, -0.8209863 ],
[ 0.02731873, -1.35198314],
[ 2.2667989, -0.8029873 ],
[-0.69871584, -0.99712093],
[-1.8479131, -0.401391 ],
[ 0.28506844, 0.57768767],
[ 1.66413975, -1.00265841],
[-0.18265191, 0.90493259]])
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1])
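# Split the samples by class: X_1 holds class 0 (minority), X_2 holds class 1 (majority)
X_1 = X[y == 0]
X_2 = X[y == 1]
# For each minority sample, find its single nearest majority sample
nn = NearestNeighbors(n_neighbors=1)
nn.fit(X_2)
dist, ind = nn.kneighbors(X_1)
print("Nearest neighbors of points in X_1\n", X_2[np.unique(ind)])
# NearMiss-3 (older imblearn API) with size_ngh=1 and ver3_samp_ngh=1
nm = NearMiss(ratio=1.0, size_ngh=1, ver3_samp_ngh=1, version=3, random_state=1)
X_rs, y_rs = nm.fit_sample(X, y)
print("Resampled points from nearmiss 3\n", X_rs[y_rs == 1])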
Output
Nearest neighbors of points in X_1
[[-1.20138471 -0.36572523]
[ 0.4781401 1.88990383]
[-1.14731928 -1.39040574]
[ 0.02731873 -1.35198314]]
Resampled points from nearmiss 3
[[ 3.8412067 -2.99400188]
[-0.64994222 0.04118058]
[-1.20138471 -0.36572523]
[ 0.4781401 1.88990383]]
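These two sets of points should be the same, but they are not. NearMiss-3 should only select majority-class (class 1) samples that are nearest neighbors of at least one minority-class (class 0) sample. In particular, the point (3.84, -2.99) is not the nearest neighbor of any point in X_1, so it should not be selected. Please let me know if I am missing something.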
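To make the expectation concrete, here is a simplified sketch of the two-step NearMiss-3 selection as it is usually described. It is not imblearn's implementation: the helper `nearmiss3_sketch` and its parameters are hypothetical, and it assumes a binary problem. Step 1 short-lists, for every minority sample, its `n_neighbors_ver3` nearest majority samples; step 2 keeps the short-listed samples with the largest average distance to their `n_neighbors` nearest minority samples. Whatever step 2 keeps is therefore always a subset of the short-list from step 1.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def nearmiss3_sketch(X, y, minority_label, n_neighbors_ver3=1, n_neighbors=1, n_keep=None):
    """Hypothetical, simplified NearMiss-3 selection for a binary problem.

    Returns the selected majority samples and the indices (into the
    majority subset) of the step-1 candidates.
    """
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]  # assumption: every other label is "majority"
    if n_keep is None:
        n_keep = len(X_min)  # aim for a balanced result

    # Step 1: short-list the nearest majority samples of every minority sample
    nn_maj = NearestNeighbors(n_neighbors=n_neighbors_ver3).fit(X_maj)
    _, idx = nn_maj.kneighbors(X_min)
    candidates = np.unique(idx)

    # Step 2: rank candidates by average distance to their nearest minority samples
    nn_min = NearestNeighbors(n_neighbors=n_neighbors).fit(X_min)
    dist, _ = nn_min.kneighbors(X_maj[candidates])
    order = np.argsort(dist.mean(axis=1))[::-1]  # largest average distance first

    # Never more than the short-list: slicing stops at len(candidates)
    selected = candidates[order[:n_keep]]
    return X_maj[selected], candidates
```

Under this sketch, running it on the `X` and `y` above with `minority_label=0` can only return points from the short-list, so (3.84, -2.99) cannot appear in the result.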