New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use vectorization for categorical distance #5147
Use vectorization for categorical distance #5147
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5147 +/- ##
=======================================
Coverage 89.42% 89.43%
=======================================
Files 205 205
Lines 15160 15170 +10
=======================================
+ Hits 13557 13567 +10
Misses 1603 1603 ☔ View full report in Codecov by Sentry. |
Could you review this PR? |
consider_prior = parameters.consider_prior or len(observations) == 0 | ||
choices = search_space.choices | ||
n_choices = len(choices) | ||
n_samples = observations.size + (parameters.consider_prior or observations.size == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the name n_samples
might not be appropriate, because this includes prior. Maybe n_mixture
may be better?
Nit: Is there any reason you prefer |
The reason why I take |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great PR with benchmarking results. I added a small comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
It returns the total number of elements, regardless of the dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This pull request has not seen any recent activity. |
Motivation
As the current implementation for the categorical distance is not vectorized, I modified it to the vectorized version for a quicker runtime.
Furthermore, I reuse some results to reduce the computational overheads as well.
Description of the changes
The changes are to:
Benchmarking Results
As the time complexity of the new algorithm is
O(n_trials + n_choices * n_unique_occurrences)
while that of the old one isO(n_trials * n_choices)
, the new algorithm is quicker in all the instances listed in the table.Plus, all the instances passed
assert np.allclose(res_old, res_new)
.