How to use two data sets to compute their intersection? #4

Closed
GoogleCodeExporter opened this issue Sep 2, 2015 · 1 comment

Comments

@GoogleCodeExporter

Roberto: Sorry for the late reply but for whatever reason, the first
notification about your Jan 2nd question got lost in my spam filter.
Since you closed the original ticket I am opening a new one with
clarifications.

What I meant is the ability to provide as input not one dataset but two
datasets.

In this setting, one dataset would be a "reference" and the second
dataset a "query" dataset.
The goal would be to find all items in the "query" set that are similar to
items in the "reference" data set above a certain threshold: basically
returning the similarity intersection between the two sets, as opposed to
the current setting, where only pairs within the same set are considered. I
guess one way could be to merge the sets and discard pairs whose items both
come from the same set, though that does seem pretty naive.
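
For illustration, the merge-and-discard idea could look roughly like the sketch below. It is not the project's code: `all_pairs` is a hypothetical brute-force stand-in for the existing single-dataset search, records are assumed to be Python sets of tokens, and Jaccard is an assumed similarity measure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def all_pairs(records, threshold):
    """Hypothetical stand-in for the single-dataset search.

    Yields (i, j, sim) with i < j for every pair at or above the threshold.
    """
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        sim = jaccard(a, b)
        if sim >= threshold:
            yield i, j, sim


def cross_pairs_by_merging(reference, query, threshold):
    """Merge both datasets, search once, keep only reference/query pairs."""
    merged = list(reference) + list(query)
    n_ref = len(reference)
    result = []
    for i, j, sim in all_pairs(merged, threshold):
        # i < j always holds, so a cross pair has i in the reference part
        # and j in the query part of the merged list.
        if i < n_ref <= j:
            result.append((i, j - n_ref, sim))
    return result
```

The filter is cheap, but the search still spends time on pairs inside each dataset, which is probably why the approach feels naive.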

Original issue reported on code.google.com by pombreda...@gmail.com on 26 Jan 2010 at 6:59

@GoogleCodeExporter

Sorry for the incredibly late reply, I obviously need to set up notifications.

Yes, your solution of merging the data would work. You could also implement this 
by having the algorithm build an index over the "reference" dataset only. Then 
you could iterate over the elements of the "query" dataset and probe the index 
as before.
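
A minimal sketch of that index-then-probe idea, under assumptions not stated in this thread: records are Python sets of tokens, similarity is Jaccard, and the index is a plain inverted index without the pruning a real implementation would likely use. The names `build_index` and `probe` are hypothetical.

```python
from collections import defaultdict


def build_index(reference):
    """Inverted index: token -> list of reference record ids containing it."""
    index = defaultdict(list)
    for ref_id, items in enumerate(reference):
        for token in items:
            index[token].append(ref_id)
    return index


def probe(index, reference, query, threshold):
    """Return (ref_id, query_id, sim) for cross-dataset pairs >= threshold.

    Records must be sets, so each shared token is counted exactly once.
    """
    matches = []
    for q_id, q_items in enumerate(query):
        # Count tokens shared with each candidate reference record.
        overlap = defaultdict(int)
        for token in q_items:
            for ref_id in index.get(token, ()):
                overlap[ref_id] += 1
        for ref_id, shared in sorted(overlap.items()):
            # |A ∪ B| = |A| + |B| - |A ∩ B|; shared >= 1, so union > 0.
            union = len(q_items) + len(reference[ref_id]) - shared
            sim = shared / union
            if sim >= threshold:
                matches.append((ref_id, q_id, sim))
    return matches
```

Because query records are never added to the index, pairs within the query set are never generated, and pairs within the reference set are never probed.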

Original comment by roberto....@gmail.com on 15 Sep 2010 at 11:27

  • Changed state: WontFix
