
Improving Mahalanobis distance KNN speed #17813

@PGuti

Description


Describe the workflow you want to enable:

KNN with the Mahalanobis metric can become rather slow (several seconds per test data point) when the feature space is large (e.g. 1500 features), even when the training set is small (hundreds of images).
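
A minimal sketch of the kind of setup described, assuming scikit-learn's KNeighborsClassifier with the mahalanobis metric and an explicit VI; the sizes and the random VI below are purely illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative sizes only: small training set, wide feature space.
rng = np.random.default_rng(0)
n_train, n_features = 300, 1500
X = rng.standard_normal((n_train, n_features))
y = rng.integers(0, 2, size=n_train)

# Stand-in for an inverse covariance matrix: any symmetric positive-definite
# matrix works as the VI parameter of the mahalanobis metric.
A = rng.standard_normal((n_features, n_features))
VI = A @ A.T / n_features + np.eye(n_features)

knn = KNeighborsClassifier(
    n_neighbors=5,
    metric="mahalanobis",
    metric_params={"VI": VI},
)
knn.fit(X, y)
pred = knn.predict(X[:10])  # each query pays the full (x - y)^T.VI.(x - y) cost per training point
```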

Describe your proposed solution:

The Mahalanobis distance computes d = (x - y)^T.VI.(x - y) for each x in the training set, where VI is the inverse covariance matrix.

If we expand this, we get d = x^T.VI.x + y^T.VI.y - x^T.VI.y - y^T.VI.x.
Now the first term could be computed at init time for all x in the training set, as could x^T.VI and VI.x.

This way, at test time (see the sketch after this list):

  • y^T.VI.y needs to be computed only once; it is not necessary to recompute it for every x.
  • for each x in the training set, only two dot products are left to compute the distance.
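
A minimal sketch of this precomputation, assuming a symmetric VI and plain NumPy arrays (the names X, VI, X_VI, x_sq and the helper function are illustrative, not an existing scikit-learn API):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_features = 300, 1500
X = rng.standard_normal((n_train, n_features))   # training set
A = rng.standard_normal((n_features, n_features))
VI = A @ A.T / n_features + np.eye(n_features)   # stand-in for the inverse covariance

# Computed once, at init time:
X_VI = X @ VI                                    # x^T.VI for every training x
x_sq = np.einsum("ij,ij->i", X_VI, X)            # x^T.VI.x for every training x

def squared_mahalanobis_to_train(y):
    """Squared Mahalanobis distances from one query y to all training points."""
    y_sq = y @ VI @ y                            # y^T.VI.y, computed once per query
    cross = X_VI @ y                             # x^T.VI.y for every training x
    # VI is symmetric, so x^T.VI.y == y^T.VI.x and the two cross terms merge.
    return x_sq + y_sq - 2.0 * cross

y_query = rng.standard_normal(n_features)
d2 = squared_mahalanobis_to_train(y_query)
```

As a sanity check, np.sqrt(d2) should agree (up to floating-point error) with scipy.spatial.distance.cdist(X, y_query[None, :], metric="mahalanobis", VI=VI).ravel(), since cdist returns the non-squared Mahalanobis distance.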

The issue with this is that the distance computation can no longer be written as fully parallel (per training-set element) code in the KNN implementation.

Describe alternatives you've considered, if relevant

A similar idea can be found here: https://www.ece.ucsb.edu/scl/html/database/RelevanceFeedbackIndexing_ICIP2008.pdf

Metadata

Labels: Moderate, Needs Benchmarks, Performance, module:metrics
