Skip to content

Finding top n nearest neighbor points by hyperplane hashing. LSH

License

Notifications You must be signed in to change notification settings

stepping1st/hyperplane-hash

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hyperplane-Hash

A Java implementation of point-to-hyperplane search.

  • Bilinear Hyperplane(BH) Neighbor Search
  • Embedding Hyperplane(EH) Neighbor Search
  • Multilinear Hyperplane(MH) Neighbor Search
  • Nearest Hyperplane(NH) Neighbor Search
  • Furthest Hyperplane(FH) Neighbor Search

Hyperplane Hash is a technique for hashing multidimensional vectors through linear transformation. This implementation follows the approach of point-to-hyperplane search to partition data into the dimension space in neighborhoods.

Why we need nearest neighbor search ?

  • Approximate near-neighbor search(NNS)

    • For low-dimensional points, spatial decomposition and tree based search algorithms can provide the exact neighbors in sub-linear time. 1
    • The higher the dimensional data, the more computing cost is required to search. Accordingly, a method for finding a point having a small euclidean distance within a linear time is required.
  • Margin-based active learning

    • Active classifier learning methods for pool-based selection generally scan all database instances before selecting which to have labeled next. 2
    • NNS is useful in large-scale active learning with SVM, maximum margin clustering, and large-margin dimensionality reduction.

Performance

  • BH and MH have been proved to outperform EH. 2
  • If the data is normalized, BH and MH also show good enough performance.
  • NH and FH search faster than BH and MH, but require more computing power.

Download

Using maven:

<dependency>
    <groupId>io.github.stepping1st</groupId>
    <artifactId>hyperplane-hash</artifactId>
    <version>0.1.0</version>
</dependency>

Examples

import org.apache.commons.math.random.JDKRandomGenerator;
import org.apache.commons.math.random.RandomData;
import org.apache.commons.math.random.RandomDataImpl;
import io.github.stepping1st.hh.hash.BHHash;
import io.github.stepping1st.hh.search.HashSearch;

import java.util.List;


public class RunHash {

    public static void main(String[] args) {
        double[][] data; // load real data array
        double[] q = new double[]{0.13032633, 0.6227648, -0.33633736, 0.37559462, -0.29248887};
        int dim = q.length;
        // single hasher of the compond hasher
        int m = 4;
        // hash tables
        int l = 4;
        // top n search
        int top = 100;
        // candidate search limit
        int limit = 10000;

        JDKRandomGenerator rg = new JDKRandomGenerator();
        RandomData rd = new RandomDataImpl(rg);

        // generate hash algorithm
        BHHash hash = new BHHash(dim, m, l, rd);
        HashBucket bucket = new HashBucket(dim, l);
        // generate searcher
        HashSearch search = new HashSearch(hash, data, bucket);
        Query query = new Query(q, data, top, limit, Dist.ABS_DOT);

        // return search index of data and distance from query
        List<IdxVal> found = search.nns(query);
        for (IdxVal idx : found) {
            // get data from index
            double[] ds = data[idx.idx()];
        }
    }
}

Check the examples

Papers

Embedding Hyperplane(EH) Neighbor Search

Bilinear Hyperplane(BH) Neighbor Search

Multilinear Hyperplane(MH) Neighbor Search

Nearest Hyperplane(NH), Furthest Hyperplane(FH) Neighbor Search

Reference

source code is originally based on https://github.com/HuangQiang/P2HNNS

ref1

  • J. Freidman, J. Bentley, and A. Finkel. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Transactions on Mathematical Software, 3(3):209–226, September 1977.

ref2

  • Prateek Jain, Sudheendra Vijayanarasimhan, and Kristen Grauman. 2010. Hashing hyperplane queries to near points with applications to large-scale active learning. In NeurIPS. 928–936.