Big K - output empty result #61

Closed
ccalvin97 opened this issue Jun 6, 2023 · 3 comments

Comments

ccalvin97 commented Jun 6, 2023

Hi, when I set a large k (k=50,000), the model's output is empty.
When I set k=10,000, the result is fine.
My dataset has about 0.1 billion (100 million) rows.
My final goal is to set k between 50,000 and 100,000.

My settings are below:

```python
from pyspark_hnsw.knn import HnswSimilarity

hnsw = HnswSimilarity(
    identifierCol='ITEM_ID',
    queryIdentifierCol='ITEM_ID',
    featuresCol='EMBEDDINGS',
    distanceFunction='euclidean',
    m=128,
    ef=25,
    k=50000,
    efConstruction=1000,
    numPartitions=9000,
    numReplicas=50,
    excludeSelf=True,
    similarityThreshold=0.2,
    predictionCol='pred',
)
```

jelmerk (Owner) commented Jun 7, 2023

You are trying to find the best 50k results for each item in the set? That would be a huge dataset. I can speculate about the problem, but that's not something that's going to be easy to reproduce, and it's a bit of an unusual scenario.
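To put the "huge dataset" remark in numbers, here is a rough back-of-envelope sketch of how many (query, neighbor) entries an all-items self-join produces for the row count and k values mentioned in this issue. The per-entry size (an 8-byte ID plus a 4-byte distance) and the helper names `neighbor_pairs` and `estimated_terabytes` are illustrative assumptions, not part of hnswlib-spark, and the figures ignore compression and Spark row overhead.

```python
# Back-of-envelope size estimate for an all-items k-NN self-join.
# Assumption (hypothetical, for illustration only): each neighbor entry is an
# int64 ID plus a float32 distance; compression and per-row overhead are ignored.

BYTES_PER_NEIGHBOR = 8 + 4  # assumed: 8-byte ID + 4-byte distance


def neighbor_pairs(num_rows: int, k: int, exclude_self: bool = True) -> int:
    """Upper bound on (query, neighbor) pairs when every row asks for k neighbors."""
    # With excludeSelf the query item cannot be its own neighbor,
    # so at most num_rows - 1 candidates exist per query.
    max_k = num_rows - 1 if exclude_self else num_rows
    return num_rows * min(k, max_k)


def estimated_terabytes(num_rows: int, k: int) -> float:
    """Rough output size in terabytes (10**12 bytes) under the assumption above."""
    return neighbor_pairs(num_rows, k) * BYTES_PER_NEIGHBOR / 1e12


if __name__ == "__main__":
    rows = 100_000_000  # "about 0.1b rows" from the issue
    for k in (10_000, 50_000, 100_000):
        pairs = neighbor_pairs(rows, k)
        print(f"k={k:>7,}: {pairs:.1e} pairs, ~{estimated_terabytes(rows, k):,.0f} TB")
```

Under these assumptions, 100 million rows at k=50,000 yields about 5 × 10^12 neighbor pairs (roughly 60 TB), and k=100,000 doubles that to about 120 TB, compared with roughly 12 TB at k=10,000.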

ccalvin97 (Author) commented

Does this model have a limit on k? I found that when I set k=50,000, it still outputs an empty DataFrame.

jelmerk (Owner) commented Jun 10, 2023

No, it should never give an empty DataFrame.

jelmerk closed this as not planned on Aug 8, 2023.