scikit-dimension

scikit-dimension is a Python module for intrinsic dimension estimation, built according to the scikit-learn API and distributed under the 3-Clause BSD license. It is still a work in progress.

Installation

Using pip:

pip install git+https://github.com/sysbio-curie/scikit-dimension.git

From source:

git clone https://github.com/sysbio-curie/scikit-dimension
cd scikit-dimension
pip install .

Quick start

Local and global estimators are used as follows:

import skdim
import numpy as np

# Generate data: np.array (n_points x n_dim).
# Here, a uniformly sampled 5-ball embedded in 10 dimensions.
data = np.zeros((1000, 10))
data[:, :5] = skdim.gendata.hyperBall(n_points=1000, n_dim=5, radius=1, random_state=0)

# Fit an estimator of global intrinsic dimension (gid).
danco = skdim.gid.DANCo().fit(data)
# Fit an estimator of local intrinsic dimension (lid).
# Local estimators assume the input data comes from a local data neighborhood.
fishers = skdim.lid.FisherS().fit(data)
# Fit a global or local estimator in the k-nearest-neighborhood of each point.
lpca_pw = skdim.asPointwise(data=data,
                            class_instance=skdim.lid.lPCA(),
                            n_neighbors=100,
                            n_jobs=1)

# Get the estimated intrinsic dimension.
print(danco.dimension_, fishers.dimension_, np.mean(lpca_pw))
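
To give a feel for what the global and pointwise estimates above measure, here is a minimal NumPy-only sketch. It is not how scikit-dimension implements its estimators. The helpers `sample_ball`, `pca_dimension` and `pointwise_pca_dimension` are hypothetical names introduced for illustration, and the 5% variance threshold is an arbitrary assumption. The sketch counts the principal directions that carry non-negligible variance, first on the whole dataset and then in the k-nearest-neighborhood of each point.

```python
import numpy as np


def sample_ball(n_points, n_dim, radius=1.0, seed=0):
    """Draw points uniformly from an n_dim-ball (hypothetical helper, not skdim's generator)."""
    rng = np.random.default_rng(seed)
    # Random directions on the unit sphere.
    directions = rng.standard_normal((n_points, n_dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radii distributed so that the points fill the ball uniformly.
    radii = radius * rng.random(n_points) ** (1.0 / n_dim)
    return directions * radii[:, None]


def pca_dimension(points, threshold=0.05):
    """Count principal directions whose variance exceeds threshold * largest variance.

    The threshold value is an assumption chosen for this illustration.
    """
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the variance along each principal axis.
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    return int(np.sum(variances > threshold * variances[0]))


def pointwise_pca_dimension(points, n_neighbors=100, threshold=0.05):
    """Apply pca_dimension in the k-nearest-neighborhood of every point (brute force)."""
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    # Indices of the n_neighbors closest points for each row.
    neighbor_idx = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    return np.array([pca_dimension(points[idx], threshold) for idx in neighbor_idx])


# Same setup as the quick start: a 5-ball embedded in 10 dimensions.
data = np.zeros((1000, 10))
data[:, :5] = sample_ball(n_points=1000, n_dim=5)

global_dim = pca_dimension(data)
local_dims = pointwise_pca_dimension(data, n_neighbors=100)
print(global_dim, local_dims.mean())
```

The brute-force distance matrix keeps the sketch short but scales quadratically with the number of points, so it is only suitable for small datasets like this one.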

Please refer to the documentation for detailed API and examples.

Credits and links to original implementations:

R

MATLAB

C++ TwoNN

Python TwoNN
