Compute average distance matrices more efficiently #454
Apologies for implementing all of this in a piecemeal manner, but I just noticed that the same trick I'd used in PR #432 to make `ExponentialError::lnPdf` more performant can also be exploited in `TreeUtilities::getAverageDistanceMatrix`. (In contrast, the even faster `std::sort`-based solution eventually implemented in PR #434 wouldn't work here, because we are working with string vectors of different lengths.) It results in an approximately 16-fold speed-up when computing 10,000-by-10,000 average distance matrices.
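The description above does not spell out what the trick from PR #432 is, so the following is only an illustrative sketch, not the code in this PR. It assumes the expensive step is matching taxon names across matrices whose label vectors differ in length, and it replaces repeated linear searches with a single `std::unordered_map` lookup table. `LabelledMatrix` and `averageDistanceMatrix` are hypothetical names, not RevBayes types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a labelled distance matrix (not the RevBayes class).
struct LabelledMatrix {
    std::vector<std::string> names;          // taxon labels
    std::vector<std::vector<double>> d;      // square matrix, rows/cols ordered as `names`
};

// Average several matrices over the union of their labels.
// Each pair is averaged only over the matrices that contain both taxa.
LabelledMatrix averageDistanceMatrix(const std::vector<LabelledMatrix>& mats) {
    // Assumption: a hash map from name to output index avoids a linear
    // search through the label vector for every matrix entry.
    std::unordered_map<std::string, std::size_t> index;
    LabelledMatrix out;
    for (const auto& m : mats) {
        assert(m.d.size() == m.names.size());
        for (const auto& name : m.names) {
            // emplace only inserts if the name is new; order of first appearance is kept
            if (index.emplace(name, out.names.size()).second) {
                out.names.push_back(name);
            }
        }
    }

    const std::size_t n = out.names.size();
    std::vector<std::vector<double>> sum(n, std::vector<double>(n, 0.0));
    std::vector<std::vector<unsigned>> count(n, std::vector<unsigned>(n, 0));

    for (const auto& m : mats) {
        // Resolve each label to its output index once per matrix.
        std::vector<std::size_t> pos(m.names.size());
        for (std::size_t i = 0; i < m.names.size(); ++i) {
            pos[i] = index.at(m.names[i]);
        }
        for (std::size_t i = 0; i < m.names.size(); ++i) {
            for (std::size_t j = 0; j < m.names.size(); ++j) {
                sum[pos[i]][pos[j]] += m.d[i][j];
                ++count[pos[i]][pos[j]];
            }
        }
    }

    out.d.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (count[i][j] > 0) {
                out.d[i][j] = sum[i][j] / count[i][j];
            }
        }
    }
    return out;
}
```

Because the label vectors have different lengths and orders, sorting them into a common order does not line up the entries, which is consistent with the remark about the `std::sort`-based approach; a per-name lookup sidesteps that. Whether the actual change uses a hash map or some other structure is not stated in the description.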