Optimized matching #144
Conversation
No tests, because there is nothing to do except adding other matching options in simulation.yml.
Oops, there is something wrong with change/version_0_8_2.rst; I'll fix it in an upcoming commit.
TODO:
- deal with orderby
- keep only optimized matching
- test performance
…d_matching
Conflicts: doc/usersguide/source/changes.rst
The SDtOM is the most relevant distance.
'''
def __init__(self, set1filter, set2filter, score, orderby, pool_size=None):
Why not have a second order?
Because it doesn't make sense: we take the best score, independently of the order of individuals in set 2.
True. :)
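For readers of the thread, here is a minimal sketch of the point being made: when each individual of set 1 (processed in `orderby` order) simply takes the best-scoring remaining candidate from set 2, the order of set 2 cannot influence the result. The function name and array layout below are purely illustrative, not LIAM2's actual implementation.

```python
import numpy as np

def greedy_best_score_match(set1_order, score_matrix):
    """Illustrative greedy matching: set1_order is set 1 sorted by `orderby`,
    score_matrix[i, j] is the score of pairing set1[i] with set2[j]."""
    available = np.ones(score_matrix.shape[1], dtype=bool)
    matches = {}
    for i in set1_order:                     # the order of set 1 matters
        candidates = np.flatnonzero(available)
        if candidates.size == 0:
            break
        # pick the best remaining score; reordering set 2 would not change it
        j = candidates[np.argmax(score_matrix[i, candidates])]
        matches[i] = j
        available[j] = False
    return matches
```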
This looks like a great contribution (again). I wonder whether it is possible to make the optimized version return exactly the same results as the old method (possibly as an option, if it has a non-negligible cost). If my understanding is correct, it is a matter of getting "df_by_cell" to return a list of ids in the "original" order within the cell, no? If that is indeed possible, we could simply remove the old method.

There are probably cases where there are enough different combinations of variables that the additional groupby makes the new method slower rather than faster. I believe those cases should be relatively rare, but it would be nice to know the threshold, i.e. at what point it is not worth it. Have you done any tests in that area?
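To make the suggestion concrete, here is a sketch, under stated assumptions, of what returning the ids in their original order within each cell could look like. The `id` column and the signature of `df_by_cell` are hypothetical; the PR's real helper may differ.

```python
import pandas as pd

def df_by_cell(df, cell_vars):
    """Group set-2 rows by their cell (combination of matching variables),
    keeping ids in their original row order within each cell (sketch only)."""
    # groupby(sort=False) does not reorder rows inside a group, so each id
    # list preserves the original order, which is what would let the
    # optimized method reproduce the old method's results exactly.
    grouped = df.groupby(cell_vars, sort=False)
    cells = grouped['id'].apply(list).reset_index(name='ids')
    cells['count'] = cells['ids'].apply(len)
    return cells

set2 = pd.DataFrame({'id':   [10, 11, 12, 13, 14],
                     'age':  [25, 30, 25, 30, 25],
                     'work': [1, 0, 1, 1, 1]})
print(df_by_cell(set2, ['age', 'work']))
# the (25, 1) cell keeps its ids as [10, 12, 14], i.e. in the original order
```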
Merged in the optimized_matching branch. I will clean it up before merging to master.