haversine distance for OPTICS #12480

Open
koushiksaha89 opened this issue Oct 29, 2018 · 10 comments

Comments

@koushiksaha89

When I try to run `OPTICS(min_samples=2, max_eps=epsilon, metric='precomputed', algorithm='ball_tree', rejection_ratio=0.1, n_jobs=3).fit(distance_matrix)`, where `distance_matrix` is a NumPy array of haversine distances, I get: `Metric 'precomputed' not valid. Use sorted(sklearn.neighbors.VALID_METRICS['ball_tree']) to get valid options. Metric can also be a callable function.`

Likewise, when I try `OPTICS(min_samples=2, max_eps=epsilon, metric='haversine', algorithm='ball_tree', rejection_ratio=0.1, n_jobs=3).fit(np.radians(coordinates[:, [0, 1]]))`, I get `ValueError: Unknown metric haversine`.

Using Version: 0.21.dev0

@koushiksaha89
Author

possible duplicate of #11954.
Hence closing this.

@adrinjalali
Member

Not all tree algorithms accept a precomputed distance matrix. If you still want to pass a precomputed matrix, you can set algorithm='brute'.
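A minimal sketch of that suggestion, assuming a scikit-learn release in which OPTICS accepts metric='precomputed' together with algorithm='brute'. The helper names `haversine_matrix` and `cluster_precomputed`, the Earth radius, and the min_samples value are illustrative assumptions, not part of the original report.

```python
import math


def haversine_matrix(points_deg, radius_km=6371.0):
    """Pairwise great-circle distances in km for (lat, lon) pairs given in degrees."""
    rad = [(math.radians(lat), math.radians(lon)) for lat, lon in points_deg]
    n = len(rad)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (lat1, lon1), (lat2, lon2) = rad[i], rad[j]
            a = (math.sin((lat2 - lat1) / 2) ** 2
                 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
            # Clamp to 1.0 to guard against rounding error near antipodal points.
            d = 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))
            dist[i][j] = dist[j][i] = d
    return dist


def cluster_precomputed(distance_matrix, max_eps_km):
    """Run OPTICS on a precomputed distance matrix (hypothetical helper)."""
    # Imported lazily so haversine_matrix stays usable without scikit-learn.
    import numpy as np
    from sklearn.cluster import OPTICS

    model = OPTICS(min_samples=2, max_eps=max_eps_km,
                   metric="precomputed", algorithm="brute")
    return model.fit(np.asarray(distance_matrix)).labels_
```

With this sketch, `cluster_precomputed(haversine_matrix(points), max_eps_km=50.0)` would return one label per input point; because the matrix is in km, max_eps is in km as well.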

@qinhanmin2014
Member

Note that our OPTICS is experimental. The API is likely to change in the future, and there are some known issues (especially with the extraction methods), so be careful when using it.

@jnothman
Member

jnothman commented Oct 30, 2018 via email

@SantiagoOrdonez

So if I want to cluster lat/lng points, OPTICS is not a good approach because it does not support haversine?

@espg
Contributor

espg commented Nov 4, 2018

@jnothman I think that ball_tree provides the best performance on large datasets, and also has the best compatibility with various distance metrics. It's probably more consistent to default to 'auto', on the assumption that it will infer when ball_tree is appropriate based on the distance metric and data size, and also for compatibility going forward as additional neighbors algorithms are added. My reasoning for setting it to ball_tree was that the algorithm performs well at large data sizes, while at small sizes the gap between brute and ball_tree is unlikely to be noticed because the call returns quickly anyway. As long as 'auto' can pick the appropriate algorithm for the distance metric, I don't see an issue with switching to it.

@espg
Contributor

espg commented Nov 4, 2018

@SantiagoOrdonez OPTICS should support haversine if the ball_tree algorithm is used. If you supply 'haversine' as the metric, the 'auto' algorithm should default to something that supports that metric (i.e., 'ball_tree'), but you can also be explicit in the call and specify both 'ball_tree' and 'haversine' if you like. I have used custom distance metrics with OPTICS and gotten valid results in my own testing and use; any distance metric that obeys the triangle inequality should work with the 'ball_tree' algorithm, although performance may be worse than with other distance metrics. Note that if your input is lat/long coordinates in degrees, you will need to convert them to radians first, as haversine assumes generic spherical coordinates in radians and not lat/long values in degrees.
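The radians conversion mentioned above can be sketched as follows. This assumes scikit-learn's haversine convention of [latitude, longitude] in radians with distances on the unit sphere, so a max_eps given in km has to be divided by the Earth radius. The helper names and the radius constant are illustrative, and whether OPTICS accepts metric='haversine' depends on the scikit-learn version (this thread reports it failing on 0.21.dev0).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def to_radians(points_deg):
    """Convert (lat, lon) pairs in degrees to radians, keeping [lat, lon] order."""
    return [[math.radians(lat), math.radians(lon)] for lat, lon in points_deg]


def km_to_radians(distance_km):
    """Express a surface distance in km as an angle on the unit sphere."""
    return distance_km / EARTH_RADIUS_KM


def cluster_haversine(points_deg, max_eps_km):
    """Run OPTICS with the built-in haversine metric (hypothetical helper)."""
    # Imported lazily so the conversion helpers work without scikit-learn.
    import numpy as np
    from sklearn.cluster import OPTICS

    model = OPTICS(min_samples=2, max_eps=km_to_radians(max_eps_km),
                   metric="haversine", algorithm="ball_tree")
    return model.fit(np.array(to_radians(points_deg))).labels_
```

Reachability distances from such a fit are in radians as well; multiplying by the same radius converts them back to km.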

Does 'haversine' raise an error when used with 'ball_tree' in OPTICS? If so, that is strange behavior, as custom metrics work fine.

@SantiagoOrdonez

Yes, 'haversine' is not supported with 'ball_tree'. It is really odd, since it does work with a custom function. I am just passing in haversine as a custom function to work around this.
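A sketch of what such a workaround might look like; the actual function used here is not shown in the thread, so `haversine` and `cluster_with_callable` below are hypothetical. It assumes the callable receives two [lat, lon] points in radians and returns the great-circle distance on the unit sphere.

```python
import math


def haversine(u, v):
    """Great-circle distance on the unit sphere between two [lat, lon] points in radians."""
    dlat = v[0] - u[0]
    dlon = v[1] - u[1]
    a = (math.sin(dlat / 2) ** 2
         + math.cos(u[0]) * math.cos(v[0]) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def cluster_with_callable(points_rad, max_eps_rad):
    """Pass the function above to OPTICS as a callable metric (hypothetical helper)."""
    # Imported lazily so haversine() can be used without scikit-learn.
    import numpy as np
    from sklearn.cluster import OPTICS

    model = OPTICS(min_samples=2, max_eps=max_eps_rad,
                   metric=haversine, algorithm="ball_tree")
    return model.fit(np.asarray(points_rad)).labels_
```

A Python callable is evaluated one pair of points at a time, so this is likely to be noticeably slower than a built-in metric on large datasets.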

@jnothman
Member

jnothman commented Nov 6, 2018 via email

@rth
Member

rth commented Nov 9, 2018

> yes, 'haversine' is not supported with 'ball_tree'.

Wait, it does work with ball_tree for NearestNeighbors, while it doesn't for auto. Re-opening, as I think there is a related bug in NearestNeighbors that might need fixing (#12552).


8 participants