gokinjo: A feature extraction library based on k-nearest neighbor algorithm in Python

momijiame/gokinjo

gokinjo

What is this?

  • A feature extraction library based on the k-nearest neighbor algorithm in Python
    • k-NN based features have been used in a 1st place solution of a Kaggle competition (see References)
  • The backend of the k-NN algorithm can be switched
  • FYI: "gokinjo" means "neighborhood" in Japanese.

Prerequisite

  • Python 3.6 or later
  • setuptools >= 30.3.0

How to install

From PyPI

$ pip install gokinjo

With the annoy backend:

$ pip install "gokinjo[annoy]"

From source code

$ pip install git+https://github.com/momijiame/gokinjo.git

Quick start

step 1: generate example data

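import numpy as np

# Two features drawn uniformly from [-0.5, 0.5)
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.array(list(zip(x0, x1)))
# Label is 1 when both features share the same sign (an XOR-like pattern)
y = np.array([1 if i0 * i1 > 0 else 0 for i0, i1 in X])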

step 2: plot the above

from matplotlib import pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

[Figure: not linearly separable data]

The two classes are clearly not linearly separable.
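As a quick sanity check of that claim, the sketch below fits a plain least-squares linear classifier to the same kind of data. It is not part of gokinjo; the fixed seed and the helper name `linear_accuracy` are assumptions made purely for illustration. The accuracy it reports should stay close to chance level.

```python
import numpy as np

# Assumption: a fixed seed, only to make this illustration reproducible
rng = np.random.default_rng(42)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)


def linear_accuracy(features, labels):
    """Fit a least-squares linear classifier and return its training accuracy."""
    # Append a bias column so the decision boundary may be offset from the origin
    design = np.column_stack([features, np.ones(len(features))])
    # Regress targets in {-1, +1} on the features
    targets = 2 * labels - 1
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    predictions = (design @ weights > 0).astype(int)
    return float(np.mean(predictions == labels))


print(f"linear accuracy on raw features: {linear_accuracy(X, y):.2f}")
```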

step 3: extract k-NN features with K-Fold

from gokinjo import knn_kfold_extract
X_knn = knn_kfold_extract(X, y)

step 4: plot the above

plt.scatter(X_knn[:, 0], X_knn[:, 1], c=y)
plt.show()

[Figure: linearly separable data]

The extracted features look almost linearly separable.
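For intuition about why the plot changes, here is a minimal NumPy sketch of one common formulation of k-NN feature extraction: for each class, the feature is the summed distance to the k nearest training points of that class, computed out-of-fold so that a sample never contributes to its own feature. This is an assumption about the general technique, not gokinjo's actual implementation; `knn_kfold_features_sketch` and its parameters are hypothetical names.

```python
import numpy as np


def knn_kfold_features_sketch(X, y, k=1, folds=5, seed=0):
    """Out-of-fold k-NN distance features, one column per class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    features = np.zeros((len(X), len(classes)))
    # Shuffle the indices and split them into roughly equal folds
    fold_indices = np.array_split(rng.permutation(len(X)), folds)
    for test_idx in fold_indices:
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        for col, label in enumerate(classes):
            # Training points of this class, excluding the held-out fold
            reference = X[train_mask & (y == label)]
            # Pairwise Euclidean distances: held-out points vs. reference points
            diffs = X[test_idx][:, None, :] - reference[None, :, :]
            dists = np.sqrt((diffs ** 2).sum(axis=2))
            # Sum of the k smallest distances for each held-out point
            features[test_idx, col] = np.sort(dists, axis=1)[:, :k].sum(axis=1)
    return features


# Assumption: a fixed seed, only to make this illustration reproducible
rng = np.random.default_rng(42)
X = rng.random((500, 2)) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_sketch = knn_kfold_features_sketch(X, y)
print(X_sketch.shape)  # one column per class

# A point is usually closer to its own class, so comparing the two columns
# already acts as a simple linear decision rule in the new feature space.
nearest_class = np.argmin(X_sketch, axis=1)
print(f"agreement with labels: {np.mean(nearest_class == y):.2f}")
```

With k=1 and two classes the sketch yields two columns, which is consistent with the two-dimensional scatter plot in step 4.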

Usage example

  • Please see the examples in the GitHub repository.

How to set up a development environment

$ pip install -e ".[develop]"
$ pytest

References
