Optimized matching #144

Merged
merged 24 commits into from
Nov 7, 2014


AlexisEidelman
Contributor

No description provided.

The SDtOM is the most relevant distance.
'''
def __init__(self, set1filter, set2filter, score, orderby, pool_size=None):
# Why not to have a second order ?
Member

Because it wouldn't make sense: we take the best score, independently of the order of individuals in set 2.

Contributor Author

True. :)

@gdementen gdementen added this to the 0.9 milestone Sep 28, 2014
@gdementen gdementen self-assigned this Sep 28, 2014
@gdementen
Member

This looks like a great contribution (again). I wonder if it is not possible to make the optimized version return exactly the same results as the old method (possibly as an option if it has a non-negligible cost). If my understanding is correct, I guess it is a matter of getting "df_by_cell" to return a list of ids in the "original" order within the cell, no?

If that is indeed possible, we could simply remove the old method. There are probably cases where there are enough different combinations of variables that the additional groupby makes the new method slower rather than faster. I believe those cases should be relatively rare, but it would be nice to know where the threshold lies, i.e. at what point the optimization is no longer worth it. Have you done some tests in that area?
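
To make the idea of keeping the "original" order within each cell concrete, here is a minimal sketch in plain Python. It is not the PR's `df_by_cell`: the helper name `ids_by_cell`, its parameters, and the sample rows are all hypothetical and only illustrate grouping ids by their combination of variable values while preserving the order in which they were encountered.

```python
def ids_by_cell(rows, key_fields, id_field="id"):
    """Group ids by "cell" (the tuple of values of key_fields).

    Hypothetical helper, not part of the PR. Within each cell, ids are
    kept in the order in which they appear in `rows`.
    """
    cells = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # setdefault creates the list on first sight of a cell;
        # append then preserves the original row order inside it.
        cells.setdefault(key, []).append(row[id_field])
    return cells


# Made-up sample data: two cells, two individuals each.
rows = [
    {"id": 7, "age": 30, "gender": "F"},
    {"id": 3, "age": 25, "gender": "M"},
    {"id": 9, "age": 30, "gender": "F"},
    {"id": 1, "age": 25, "gender": "M"},
]
cells = ids_by_cell(rows, ["age", "gender"])
```

With ids ordered this way, a matching step that breaks ties by taking the first candidate in a cell would pick the same individual as a scan over the ungrouped data. Whether that is enough to reproduce the old method's results exactly is the open question raised above.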

@gdementen
Member

Merged into the optimized_matching branch. I will clean it up before merging to master.

@gdementen gdementen merged commit 67fc8e6 into liam2:master Nov 7, 2014