Why can't I get the top k? #26

Closed
Alisaincd opened this issue Apr 20, 2017 · 1 comment

Comments


Alisaincd commented Apr 20, 2017

Hi, I want to get the top k elements with MinHashLSHForest, but it doesn't work. For example, I set 'k=3', but I got ('result: ', ['21', '28', '51', '1', '82', '3', '91', '69', '86', '85']), which has more than 3 elements. My demo is below:
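from datasketch import MinHashLSHForest

def query_topk(l, query_doc, k):
    # Build an LSH Forest; the MinHashes in l should use the same num_perm.
    forest = MinHashLSHForest(num_perm=256)
    for count, m in enumerate(l):
        # Use each MinHash's position in l (as a string) as its key.
        forest.add(str(count), m)
    # index() must be called before the forest can be queried.
    forest.index()
    # Return the keys of (up to) k approximate nearest neighbours of query_doc.
    return forest.query(query_doc, k)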

Here l is a list of MinHash objects and query_doc is a single MinHash.
Is there anything wrong?
By the way, must the input be a list of strings? What if my input is a vector?
And another question: does this implementation only support text? If each of my inputs is a list of floats, e.g. [[1,2,3],[1.2,2.3,2.1]], will it work properly?
Thanks for your patience,

Sincerely,

ekzhu added a commit that referenced this issue Apr 26, 2017
Owner

ekzhu commented Apr 26, 2017

Thanks for raising the issue. I just fixed it in 1.2.1.
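For anyone stuck on a release older than 1.2.1, one possible workaround is to re-rank whatever the forest returns by estimated Jaccard similarity and keep only the first k. The sketch below is hypothetical and does not call datasketch: it assumes each MinHash signature is available as a plain list of integers, estimates Jaccard similarity as the fraction of positions where two signatures agree, and truncates to k. With real datasketch objects, MinHash.jaccard would give the same kind of estimate.

```python
from typing import Dict, List, Sequence


def estimate_jaccard(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching MinHash slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of permutations")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def rerank_topk(
    candidates: List[str],
    signatures: Dict[str, Sequence[int]],
    query_sig: Sequence[int],
    k: int,
) -> List[str]:
    """Hypothetical helper: sort candidate keys by estimated similarity, keep k."""
    ranked = sorted(
        candidates,
        key=lambda key: estimate_jaccard(signatures[key], query_sig),
        reverse=True,  # most similar first; ties keep their original order
    )
    return ranked[:k]


# Toy signatures with 4 "permutations" each (made-up values for illustration).
signatures = {
    "1": [5, 9, 2, 7],
    "3": [5, 9, 2, 8],
    "21": [1, 1, 1, 1],
    "28": [5, 0, 2, 7],
}
query = [5, 9, 2, 7]
print(rerank_topk(["21", "28", "1", "3"], signatures, query, k=3))
# prints ['1', '28', '3']
```

Re-ranking rather than simply slicing the returned list matters because an over-long candidate list is not guaranteed to be ordered by similarity.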

As for your other question: MinHash takes bytes as input, so as long as you can convert each object (e.g., integers, strings, floats, lists) into bytes, it works with MinHash. For example:

import struct
from datasketch import MinHash

minhash = MinHash()

# For a set of floats, e.g. {1.3, 123.4, 32.9, 3.1415926, ...}
minhash.update(struct.pack("f", 3.1415926))

# EVERY ELEMENT in your input set is a LIST of floats
# e.g. {[1.34, 1.3, 343.0, 123.9], [2.3, 23.2, 86.8], ...}
minhash.update(struct.pack("4f", *[1.34, 1.3, 343.0, 123.9]))
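The "4f" format packs exactly four floats, but the example set also contains a three-element list, so in general the format string has to be built from each list's length. The stdlib-only sketch below does that with a hypothetical vector_to_bytes helper. Two choices here are assumptions rather than anything datasketch requires: "<" pins the byte order so the bytes do not depend on the machine, and "d" keeps 64-bit precision, whereas "f" rounds each value to 32-bit single precision (so nearly equal floats can collapse into the same bytes).

```python
import struct
from typing import Sequence


def vector_to_bytes(vec: Sequence[float]) -> bytes:
    """Hypothetical helper: pack a list of floats of any length into bytes.

    '<' fixes little-endian byte order; 'd' packs each value as a
    64-bit double, so the byte length is 8 * len(vec).
    """
    return struct.pack("<%dd" % len(vec), *vec)


# Each element of the input set is a list of floats of varying length.
vectors = [[1.34, 1.3, 343.0, 123.9], [2.3, 23.2, 86.8]]
encoded = [vector_to_bytes(v) for v in vectors]
for b in encoded:
    print(len(b))  # prints 32, then 24

# With datasketch installed, each encoded element would then be fed in:
#     for b in encoded:
#         minhash.update(b)
```

Because every element becomes one byte string, the MinHash still treats the input as a set of vectors, which matches the second case in the comment above.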

ekzhu closed this as completed Apr 26, 2017
pombredanne added a commit to pombredanne/datasketch that referenced this issue Jun 17, 2017