How can I cluster data using a distance matrix with the ELKI library? #60

Closed
Vagger opened this issue Jun 5, 2019 · 4 comments

@Vagger

Vagger commented Jun 5, 2019

I have a distance matrix and I want to use that distance matrix when clustering my data.

I've read the ELKI documentation, and it states that I can override the distance method by extending the AbstractNumberVectorDistanceFunction class.

The distance method, however, receives coordinates: it computes the distance from coordinate vector x to coordinate vector y. This is troublesome because my distance matrix contains only distance values, and I use the indexes to look up the distance from index x to index y. Here's the code from the documentation:

public class TutorialDistanceFunction extends AbstractNumberVectorDistanceFunction {
  @Override
  public double distance(NumberVector o1, NumberVector o2) {
    double dx = o1.doubleValue(0) - o2.doubleValue(0);
    double dy = o1.doubleValue(1) - o2.doubleValue(1);
    return dx * dx + Math.abs(dy);
  }
}

My question is how to correctly use the distance matrix when clustering with ELKI.

@kno10
Member

kno10 commented Jun 5, 2019

AbstractNumberVectorDistanceFunction is the appropriate parent class only if your input data are number vectors. If your data type is abstract object identifiers, subclass AbstractDBIDRangeDistanceFunction instead. You then have to implement double distance(int i1, int i2);

There are already several implementations of a distance function for precomputed distances, for example DiskCacheBasedDoubleDistanceFunction, which memory-maps a distance matrix stored on disk. We should add a DoubleMatrixDistanceFunction, though, for direct use from Java.
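
To illustrate the index-based lookup described above, here is a minimal standalone sketch. It deliberately does not extend any ELKI class so that it compiles without the library; the class name PrecomputedMatrixDistance and its size() helper are hypothetical and not part of ELKI. Only the shape of distance(int i1, int i2) is taken from the comment above. In ELKI itself, that method would presumably live in a subclass of AbstractDBIDRangeDistanceFunction.

```java
/**
 * Standalone sketch (hypothetical class, not part of ELKI): a distance
 * that is looked up by integer offset in a precomputed matrix instead of
 * being computed from coordinates.
 */
final class PrecomputedMatrixDistance {
  /** Defensive copy of the full n x n distance matrix. */
  private final double[][] matrix;

  PrecomputedMatrixDistance(double[][] matrix) {
    int n = matrix.length;
    this.matrix = new double[n][];
    for (int i = 0; i < n; i++) {
      // Every row must have n entries, otherwise lookups could fail later.
      if (matrix[i].length != n) {
        throw new IllegalArgumentException("Row " + i + " has length " + matrix[i].length + ", expected " + n);
      }
      this.matrix[i] = matrix[i].clone();
    }
  }

  /** Same signature shape as the method named in the comment above. */
  double distance(int i1, int i2) {
    return matrix[i1][i2];
  }

  /** Number of objects covered by the matrix. */
  int size() {
    return matrix.length;
  }
}
```

With a 3 x 3 matrix, new PrecomputedMatrixDistance(m).distance(0, 2) simply returns m[0][2]; no coordinates are involved at any point.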

See also: https://elki-project.github.io/howto/precomputed_distances

@kno10 kno10 closed this as completed Jun 5, 2019
@Vagger
Author

Vagger commented Jun 5, 2019

Does this support asymmetric matrices?

@kno10
Member

kno10 commented Jun 5, 2019

A distance is supposed to be symmetric; it may or may not work with asymmetric distances.
The current implementations will likely assume symmetry to reduce memory usage by 50%.

But it depends even more on the algorithm. Many will assume that distances are symmetric, and asymmetric distances can likely cause problems such as infinite loops. Some parts of the code may also swap x and y if they have reason to assume that one order is faster than the other, for example because of caches.
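
The memory saving mentioned above comes from storing each pair only once. The following standalone sketch shows one way to do that; it is not ELKI's actual implementation. The class name SymmetricDistanceStore and its methods are hypothetical, and it assumes a square input matrix and that the distance of an object to itself is 0. It rejects asymmetric input up front, since a lookup from the packed triangle cannot distinguish d(x, y) from d(y, x).

```java
/**
 * Standalone sketch (hypothetical class, not ELKI code): stores only the
 * strict lower triangle of a symmetric distance matrix, i.e. n*(n-1)/2
 * values instead of n*n.
 */
final class SymmetricDistanceStore {
  /** Packed strict lower triangle, row by row. */
  private final double[] lower;

  SymmetricDistanceStore(double[][] full) {
    int n = full.length;
    for (int i = 0; i < n; i++) {
      if (full[i].length != n) {
        throw new IllegalArgumentException("Matrix must be square, row " + i + " has length " + full[i].length);
      }
    }
    lower = new double[n * (n - 1) / 2];
    for (int i = 1; i < n; i++) {
      for (int j = 0; j < i; j++) {
        // An asymmetric matrix cannot be represented by one triangle.
        if (full[i][j] != full[j][i]) {
          throw new IllegalArgumentException("Asymmetric entry at (" + i + ", " + j + ")");
        }
        lower[offset(i, j)] = full[i][j];
      }
    }
  }

  /** Position of pair (i, j) in the packed array; requires i > j. */
  private static int offset(int i, int j) {
    return i * (i - 1) / 2 + j;
  }

  /** Argument order does not matter; the diagonal is assumed to be 0. */
  double distance(int i1, int i2) {
    if (i1 == i2) {
      return 0.0;
    }
    return i1 > i2 ? lower[offset(i1, i2)] : lower[offset(i2, i1)];
  }

  /** Number of values actually kept in memory. */
  int storedValues() {
    return lower.length;
  }
}
```

Because distance(i1, i2) and distance(i2, i1) read the same array slot, any code that swaps its arguments gets the same answer, which is exactly the property an asymmetric matrix would break.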

@Vagger
Author

Vagger commented Jun 5, 2019

Alright. Thanks a lot!
