- A feature extraction library for Python based on the k-nearest neighbor (k-NN) algorithm
- k-NN based features were used in the 1st place solution of a Kaggle competition (see references)
- The backend of the k-NN algorithm can be switched
  - scikit-learn (default)
  - annoy
- FYI: "gokinjo" means "neighborhood" in Japanese.
- Python 3.6 or later
- setuptools >= 30.0.3.0
$ pip install gokinjo
$ pip install "gokinjo[annoy]"
$ pip install git+https://github.com/momijiame/gokinjo.git
step 1: generate example data
import numpy as np
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
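# Stack the two coordinates into an (n_samples, 2) array
X = np.column_stack((x0, x1))
# Label 1 when both coordinates share the same sign (XOR-like pattern)
y = (X[:, 0] * X[:, 1] > 0).astype(int)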
step 2: plot the above
from matplotlib import pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
The data is obviously not linearly separable.
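As a quick sanity check of that claim, the sketch below fits a linear classifier to the same kind of data and measures its cross-validated accuracy. It is not part of gokinjo: the fixed seed, the use of `LogisticRegression`, and 5-fold cross-validation are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Recreate the XOR-like example data with a fixed seed (assumption: seed 42)
rng = np.random.default_rng(42)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# A linear model on the raw coordinates; accuracy is averaged over 5 folds
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
baseline_accuracy = scores.mean()
print(f"linear baseline accuracy: {baseline_accuracy:.3f}")
```

On data like this, a linear model should score close to chance level, which is what "not linearly separable" means in practice.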
step 3: extract k-NN feature with K-Fold
from gokinjo import knn_kfold_extract
X_knn = knn_kfold_extract(X, y)
step 4: plot the above
plt.scatter(X_knn[:, 0], X_knn[:, 1], c=y)
plt.show()
The extracted features look almost linearly separable.
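To make the idea behind the extracted features concrete, here is a minimal sketch of one common way to compute k-NN features with K-Fold, written directly against scikit-learn. It is an illustration under stated assumptions, not gokinjo's actual implementation: the helper name `knn_kfold_features_sketch`, its parameters, and the exact feature definition (summed distance to the k nearest training points of each class) are all hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import NearestNeighbors


def knn_kfold_features_sketch(X, y, k=1, n_splits=5, seed=0):
    """Hypothetical helper (not gokinjo's API): out-of-fold k-NN distance features."""
    classes = np.unique(y)
    features = np.zeros((len(X), len(classes) * k))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        X_train, y_train = X[train_idx], y[train_idx]
        for ci, cls in enumerate(classes):
            # Index only the training points that belong to this class
            nn = NearestNeighbors(n_neighbors=k).fit(X_train[y_train == cls])
            # Distances come back sorted ascending, shape (n_test, k)
            dist, _ = nn.kneighbors(X[test_idx])
            # Column j holds the summed distance to the j + 1 nearest neighbors
            features[test_idx, ci * k:(ci + 1) * k] = np.cumsum(dist, axis=1)
    return features


# Same XOR-like data as in step 1, with a fixed seed (assumption: seed 0)
rng = np.random.default_rng(0)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_knn_sketch = knn_kfold_features_sketch(X, y)

# With k=1 and two classes there is one distance column per class.
# The straight line "column 0 == column 1" already acts as a classifier:
# predict class 1 when the nearest class-1 point is closer than the nearest class-0 point.
rule_accuracy = np.mean((X_knn_sketch[:, 1] < X_knn_sketch[:, 0]).astype(int) == y)
print(X_knn_sketch.shape, f"rule accuracy: {rule_accuracy:.3f}")
```

Computing the features out-of-fold means a point is never its own neighbor, so the label of a sample does not leak into its own features.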
- Please see the examples in the GitHub repository.
$ pip install -e ".[develop]"
$ pytest
- The competition in which k-NN features were used in the 1st place solution
- R implementation
- Another, highly respected Python implementation