What is this?
- A Python library for feature extraction based on the k-nearest neighbor (k-NN) algorithm
- k-NN based features were used in the 1st place solution of a Kaggle competition (see references)
- The backend of the k-NN algorithm can be switched
- FYI: "gokinjo" means "neighborhood" in Japanese.
Prerequisites
- Python 3.6 or later
- setuptools >= 18.104.22.168
How to install
$ pip install gokinjo
With annoy backend
$ pip install "gokinjo[annoy]"
From source code
$ pip install git+https://github.com/momijiame/gokinjo.git
step 1: generate example data
import numpy as np
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.array(list(zip(x0, x1)))
y = np.array([1 if i0 * i1 > 0 else 0 for i0, i1 in X])
step 2: plot the above
from matplotlib import pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
The two classes are clearly not linearly separable.
step 3: extract k-NN features with K-Fold
from gokinjo import knn_kfold_extract
X_knn = knn_kfold_extract(X, y)
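To give a feel for what a K-Fold k-NN feature can look like, here is a minimal NumPy-only sketch. It is not gokinjo's code: the function name `knn_kfold_sketch`, its parameters, and the choice of feature (summed distance to the k nearest training samples of each class, taken only from the other folds) are assumptions made for illustration.

```python
import numpy as np


def knn_kfold_sketch(X, y, k=1, n_splits=5, seed=0):
    """Hypothetical illustration of a K-Fold k-NN feature (not gokinjo's code).

    For every sample and every class c, compute the summed Euclidean distance
    to the k nearest training samples of class c. Training samples always come
    from the other folds, so a sample never sees itself or its own label.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    features = np.zeros((len(X), len(classes)))

    for test_idx in folds:
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        for col, c in enumerate(classes):
            train_c = X[train_mask & (y == c)]
            # Pairwise distances, shape (n_test, n_train_c)
            diff = X[test_idx][:, None, :] - train_c[None, :, :]
            dist = np.sqrt((diff ** 2).sum(axis=2))
            # Sum of the k smallest distances for each held-out sample
            features[test_idx, col] = np.sort(dist, axis=1)[:, :k].sum(axis=1)
    return features


# Same kind of XOR-like data as in step 1, with a fixed seed
rng = np.random.default_rng(42)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_knn = knn_kfold_sketch(X, y)
print(X_knn.shape)  # (500, 2): one column per class
```

Computing the features out-of-fold matters because the labels are used to build them; using a sample's own fold would leak its label into its feature.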
step 4: plot the above
plt.scatter(X_knn[:, 0], X_knn[:, 1], c=y)
plt.show()
The data now looks almost linearly separable.
- Please see the examples in the GitHub repository.
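The visual impression can also be checked numerically. The sketch below fits a least-squares linear classifier on the raw coordinates and on distance-based features and compares training accuracy. The helper `linear_accuracy` is hypothetical, and the features are a leave-one-out stand-in computed with NumPy, not the output of `knn_kfold_extract`.

```python
import numpy as np


def linear_accuracy(F, y):
    """Training accuracy of a least-squares linear classifier on features F."""
    A = np.column_stack([F, np.ones(len(F))])  # append a bias column
    target = 2.0 * y - 1.0                      # map labels {0, 1} to {-1, +1}
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.mean((A @ w > 0) == (y == 1)))


rng = np.random.default_rng(0)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Stand-in features (an assumption, not gokinjo output): leave-one-out
# distance to the nearest other sample of class 0 and of class 1.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)  # a sample must not match itself
F = np.column_stack([dist[:, y == c].min(axis=1) for c in (0, 1)])

acc_raw = linear_accuracy(X, y)
acc_knn = linear_accuracy(F, y)
print(f"raw: {acc_raw:.2f}, k-NN features: {acc_knn:.2f}")
```

On data like this, a linear model on the raw coordinates should stay close to chance, while the same model on the distance features should do much better.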
How to set up a development environment
$ pip install -e ".[develop]"
$ pytest
References
- The competition in which k-NN features were used in the 1st place solution
- R implementation
- Another, highly respected Python implementation