
Possible bug in NearMiss version 3 #124

@integrallyclosed

Hello,

I think there is a bug in NearMiss version 3.

Here is a sample.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from imblearn.under_sampling import NearMiss

X = np.array([[-0.64994222,  0.04118058],
 [ 3.8412067,  -2.99400188],
 [-1.20138471, -0.36572523],
 [ 0.4781401,   1.88990383],
 [-0.02652392,  1.51770219],
 [-1.14731928, -1.39040574],
 [-0.7018478, -1.45178366],
 [ 1.14251608,  2.56114558],
 [-0.94415193,  0.36672943],
 [-0.63234148,  2.02266682],
 [-2.31299362, -0.97552173],
 [-0.29746135, -1.27207336],
 [-2.31091098,  0.58882533],
 [-1.32011229, -0.8209863 ],
 [ 0.02731873, -1.35198314],
 [ 2.2667989,  -0.8029873 ],
 [-0.69871584, -0.99712093],
 [-1.8479131,  -0.401391  ],
 [ 0.28506844,  0.57768767],
 [ 1.66413975, -1.00265841],
 [-0.18265191,  0.90493259]])
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1])
X_1 = X[y==0]
X_2 = X[y==1]

nn = NearestNeighbors(n_neighbors=1)
nn.fit(X_2)
dist, ind = nn.kneighbors(X_1)

print("Nearest neighbors of points in X_1")
print(X_2[np.unique(ind)])

nm = NearMiss(ratio=1.0, size_ngh=1, ver3_samp_ngh=1, version=3, random_state=1)
X_rs, y_rs = nm.fit_sample(X, y)

print("Resampled points from nearmiss 3")
print(X_rs[y_rs == 1])

Output

Nearest neighbors of points in X_1
[[-1.20138471 -0.36572523]
 [ 0.4781401   1.88990383]
 [-1.14731928 -1.39040574]
 [ 0.02731873 -1.35198314]]
Resampled points from nearmiss 3
[[ 3.8412067  -2.99400188]
 [-0.64994222  0.04118058]
 [-1.20138471 -0.36572523]
 [ 0.4781401   1.88990383]]

These two sets of points should be the same, but they are not. In particular, the point (3.84, -2.99) is not the nearest neighbor of any point in X_1, so it should not be selected. Please let me know if I am missing something.
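For anyone who wants to check the mismatch without rerunning the sampler, here is a minimal sketch that compares the two printed arrays directly. The helper name `rows_not_in` and the tolerance `tol` are my own (hypothetical, not part of imblearn or scikit-learn), and the arrays are copied by hand from the output above.

```python
import numpy as np


def rows_not_in(selected, expected, tol=1e-8):
    """Return the rows of `selected` that match no row of `expected`.

    Hypothetical helper for this report, not a library function.
    """
    selected = np.asarray(selected, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # gap[i, j] is the largest absolute coordinate difference
    # between selected[i] and expected[j].
    gap = np.abs(selected[:, None, :] - expected[None, :, :]).max(axis=2)
    # A selected row is "missing" if it is not within tol of any expected row.
    missing = gap.min(axis=1) > tol
    return selected[missing]


# Nearest neighbors of the points in X_1, copied from the output above.
expected = np.array([[-1.20138471, -0.36572523],
                     [0.4781401, 1.88990383],
                     [-1.14731928, -1.39040574],
                     [0.02731873, -1.35198314]])

# Class-1 points returned by NearMiss version 3, copied from the output above.
resampled = np.array([[3.8412067, -2.99400188],
                      [-0.64994222, 0.04118058],
                      [-1.20138471, -0.36572523],
                      [0.4781401, 1.88990383]])

unexpected = rows_not_in(resampled, expected)
print(unexpected)
```

On the arrays above this reports two rows, (3.84, -2.99) and (-0.65, 0.04), as selected by NearMiss but absent from the nearest-neighbor set.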
