gokinjo: A feature extraction library based on k-nearest neighbor algorithm in Python

gokinjo

What is this?

  • A feature extraction library for Python based on the k-nearest neighbor (k-NN) algorithm
    • k-NN based features have been used in a 1st place solution of a Kaggle competition (see References)
  • The backend of the k-NN algorithm is switchable
  • FYI: "gokinjo" means "neighborhood" in Japanese.

Prerequisite

  • Python 3.6 or later
  • setuptools >= 30.3.0

How to install

From PyPI

$ pip install gokinjo
With the annoy backend
$ pip install "gokinjo[annoy]"

From source code

$ pip install git+https://github.com/momijiame/gokinjo.git

Quick start

step 1: generate example data

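import numpy as np
# two features drawn uniformly from [-0.5, 0.5)
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.array(list(zip(x0, x1)))
# label is 1 when both features have the same sign (an XOR-like pattern)
y = np.array([1 if i0 * i1 > 0 else 0 for i0, i1 in X])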

step 2: plot the above

from matplotlib import pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

[Figure: scatter plot of data that is not linearly separable]

The data is clearly not linearly separable.
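One way to back up the visual impression is to fit a plain linear decision function and look at its accuracy. The snippet below is a minimal sketch using only NumPy and is not part of gokinjo. It regenerates the same kind of data with a fixed seed (an assumption made for reproducibility) and uses a least-squares fit as a stand-in for a linear classifier.

```python
import numpy as np

# Same kind of data as in step 1, with a fixed seed for reproducibility
rng = np.random.default_rng(0)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Least-squares fit of a linear decision function w0*x0 + w1*x1 + b
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Threshold the linear output at 0.5 to get class predictions
pred = (A @ w > 0.5).astype(int)
accuracy = (pred == y).mean()
print(f"linear baseline accuracy: {accuracy:.2f}")
```

For this XOR-like pattern the accuracy should stay well short of 1.0, since no single straight line can separate the four quadrants.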

step 3: extract k-NN feature with K-Fold

from gokinjo import knn_kfold_extract
X_knn = knn_kfold_extract(X, y)

step 4: plot the above

plt.scatter(X_knn[:, 0], X_knn[:, 1], c=y)
plt.show()

[Figure: scatter plot of almost linearly separable data]

The extracted features look almost linearly separable.
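To give an intuition for what step 3 does, here is a rough NumPy-only sketch of k-NN feature extraction with K-Fold. It assumes the commonly used formulation in which each sample gets one feature per class: the summed distance to its k nearest neighbors of that class, computed only from samples in the other folds so that a sample never sees its own label. The function names, parameters, and defaults below are hypothetical and chosen for illustration; gokinjo's actual implementation and signature may differ.

```python
import numpy as np


def knn_distance_features(X_ref, y_ref, X_query, classes, k=1):
    """For every query row, sum the distances to its k nearest
    reference points of each class (one output column per class)."""
    features = np.zeros((len(X_query), len(classes)))
    for col, cls in enumerate(classes):
        members = X_ref[y_ref == cls]
        # Pairwise Euclidean distances, shape (n_query, n_members)
        dists = np.linalg.norm(X_query[:, None, :] - members[None, :, :], axis=2)
        # Keep the k smallest distances per query row and add them up
        features[:, col] = np.sort(dists, axis=1)[:, :k].sum(axis=1)
    return features


def kfold_knn_features(X, y, k=1, n_folds=5, seed=0):
    """Out-of-fold extraction: each sample only sees neighbors from other folds."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Random, balanced assignment of every sample to one of n_folds folds
    fold_ids = rng.permutation(len(X)) % n_folds
    out = np.zeros((len(X), len(classes)))
    for fold in range(n_folds):
        held_out = fold_ids == fold
        out[held_out] = knn_distance_features(
            X[~held_out], y[~held_out], X[held_out], classes, k
        )
    return out


# Same kind of data as in step 1, with a fixed seed for reproducibility
rng = np.random.default_rng(42)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_knn_sketch = kfold_knn_features(X, y)
print(X_knn_sketch.shape)
```

With two classes and k=1 the sketch yields two columns, the distance to the nearest class-0 point and to the nearest class-1 point. A sample is usually closer to points of its own class, which is why such features tend to split along a diagonal line.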

Usage example

  • Please see the examples directory in the GitHub repository.

How to set up a development environment

$ pip install -e ".[develop]"
$ pytest

References