Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to return more points? #104

Open
Dronablo opened this issue Jun 18, 2018 · 2 comments
Open

How to return more points? #104

Dronablo opened this issue Jun 18, 2018 · 2 comments

Comments

@Dronablo
Copy link

Hi!

Let's start with the code:

import falconn
import numpy as np

params_cp = falconn.LSHConstructionParameters()
params_cp.dimension = 300
params_cp.lsh_family = falconn.LSHFamily.CrossPolytope
params_cp.distance_function = falconn.DistanceFunction.EuclideanSquared
params_cp.l = 50
params_cp.num_rotations = 2
params_cp.seed = 5721840

params_cp.num_setup_threads = 1
params_cp.storage_hash_table = falconn.StorageHashTable.BitPackedFlatHashTable
# 2**17 <= N_SAMPLES
falconn.compute_number_of_hash_functions(17, params_cp)

table = falconn.LSHIndex(params_cp)

qdata = np.zeros((1000, 300), dtype=np.float32)

for en, i in enumerate(range(0, 1000, 100)):
    qdata[i:i+100][:,en*30:(en+1)*30] = 1

# qdata[0]
# qdata[200]

table.setup(qdata)

centroid_index = table.construct_query_object()
centroid_index.set_num_probes(50)

print(len(centroid_index.find_k_nearest_neighbors(qdata[0], k=200)))

print(len(centroid_index.find_near_neighbors(qdata[0], threshold=10e10)))

So, we have 1000 points in 300-dimensional space, grouped in 10 extremely dense areas (in the example above - grouped into dots). When I make a query to get k nearest neigbours, I would never get more points, than in nearest dense area. So, when I ask for k<100 it's ok, but if I want k=200, or k=1000 falconn returns only 100 points.

Is there any way to configure params in such a way, so above script would print

200
1000

?

Thank you!

@ludwigschmidt
Copy link
Collaborator

Have you tried increasing the number of probes?

@Dronablo
Copy link
Author

Sure!

Here's little bit simpler code, goal is to get more than 3 results:

import falconn
import numpy as np

params_cp = falconn.LSHConstructionParameters()
params_cp.dimension = 6
params_cp.lsh_family = falconn.LSHFamily.CrossPolytope
params_cp.distance_function = falconn.DistanceFunction.EuclideanSquared
params_cp.l = 50
params_cp.num_rotations = 3
params_cp.seed = 5721840

params_cp.num_setup_threads = 1
params_cp.storage_hash_table = falconn.StorageHashTable.BitPackedFlatHashTable
# 2**17 <= N_SAMPLES
falconn.compute_number_of_hash_functions(17, params_cp)

table = falconn.LSHIndex(params_cp)

qdata = np.array([[1,1,1,1,0,0],
                  [1,2,1,2,0,0],
                  [2,1,2,1,0,0],
                  [0,0,0,1,1,1],
                  [0,0,0,1,2,2],
                 ], dtype=np.float32)

table.setup(qdata)

centroid_index = table.construct_query_object()
centroid_index.set_num_probes(5000)

print(len(centroid_index.find_k_nearest_neighbors(qdata[0], k=200)))
print(len(centroid_index.find_near_neighbors(qdata[0], threshold=10e10)))

Also tried to play with storage_hash_table, num_rotations, l, k - nothing changed.

A-Guldborg pushed a commit to duckth/FOENNIX that referenced this issue May 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants