# momijiame/gokinjo

gokinjo: A feature extraction library based on the k-nearest neighbor algorithm in Python
Latest commit 5ab16e3 Feb 8, 2019
# gokinjo

### What is this?

• A feature extraction library for Python based on the k-nearest neighbor (k-NN) algorithm
• k-NN based features were used in the 1st place solution of a Kaggle competition (see References)
• The backend of the k-NN algorithm can be switched
• FYI: "gokinjo" means "neighborhood" in Japanese.

### Prerequisite

• Python 3.6 or later
• setuptools >= 30.3.0

### How to install

#### From PyPI

`$ pip install gokinjo`

##### With annoy backend

`$ pip install "gokinjo[annoy]"`

#### From source code

`$ pip install git+https://github.com/momijiame/gokinjo.git`

### Quick start

step 1: generate example data

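```python
import numpy as np

# Two features drawn uniformly from [-0.5, 0.5)
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.column_stack([x0, x1])
# Label is 1 when both features share the same sign (XOR-like pattern)
y = np.array([1 if i0 * i1 > 0 else 0 for i0, i1 in X])
```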

step 2: plot the above

```python
from matplotlib import pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
```

The two classes are clearly not linearly separable.
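
To put a number on that observation, the following minimal sketch fits a plain linear classifier to the raw inputs. It assumes scikit-learn is installed (it is not listed in the prerequisites above), and the fixed seed is an addition for reproducibility. The accuracy is expected to stay far below what a separable problem would give.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Regenerate data with the same distribution as step 1, using a fixed seed
# (the seed is an assumption added here so the result is reproducible)
rng = np.random.default_rng(42)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# 5-fold cross-validated accuracy of a linear model on the raw features
linear_scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="accuracy")
mean_linear_accuracy = linear_scores.mean()
print(f"linear baseline accuracy: {mean_linear_accuracy:.3f}")
```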

step 3: extract k-NN feature with K-Fold

```python
from gokinjo import knn_kfold_extract

X_knn = knn_kfold_extract(X, y)
```
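
For intuition, here is a rough sketch of what out-of-fold k-NN feature extraction can look like. This is not gokinjo's actual implementation: `knn_kfold_features_sketch` is a hypothetical helper written with scikit-learn (assumed to be installed), and the feature definition (summed distance to the nearest training points of each class) follows the general idea behind k-NN features, so details may differ from the library.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def knn_kfold_features_sketch(X, y, k=1, n_folds=5, seed=0):
    """Hypothetical helper: out-of-fold k-NN distance features.

    For every class c and every j in 1..k, the feature is the summed
    distance from a sample to its j nearest training points of class c.
    """
    classes = np.unique(y)
    features = np.zeros((X.shape[0], len(classes) * k))
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X, y):
        X_train, y_train = X[train_idx], y[train_idx]
        for ci, c in enumerate(classes):
            # Search neighbors only among training-fold points of class c,
            # so a sample never sees itself or its own label
            nn = NearestNeighbors(n_neighbors=k).fit(X_train[y_train == c])
            distances, _ = nn.kneighbors(X[test_idx])
            # Cumulative sum turns sorted distances into "sum of the j nearest"
            features[test_idx, ci * k:(ci + 1) * k] = np.cumsum(distances, axis=1)
    return features


# Same distribution as the quick-start data, with a fixed seed (an assumption)
rng = np.random.default_rng(0)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_sketch = knn_kfold_features_sketch(X, y, k=1)
print(X_sketch.shape)
```

Computing the features fold by fold keeps each sample's own label out of its features, which is why the K-Fold variant is the one to use on training data.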

step 4: plot the above

```python
plt.scatter(X_knn[:, 0], X_knn[:, 1], c=y)
plt.show()
```

The extracted features look almost linearly separable.

### Usage example

• Please see the examples in the GitHub repository.

### How to set up a development environment

```sh
$ pip install -e ".[develop]"
$ pytest
```

### References

• The competition in which a k-NN feature was used in the 1st place solution
• R implementation
• Another, highly respected Python implementation