haversine distance for OPTICS #12480
Comments
Possible duplicate of #11954.
Not all tree algorithms accept a |
Note that our OPTICS is experimental. The API is likely to change in the future, and there are some known issues (especially related to the extraction methods). So be careful when using it.
But perhaps we should be defaulting to algorithm='auto'. @espg, why are we defaulting to ball_tree?
So if I want to cluster lat/lng points, OPTICS is not a good approach because it does not support haversine?
@jnothman I think that ball_tree provides the best performance on large datasets and also has the best compatibility with various distance metrics. It's probably more consistent to have it default to 'auto', on the assumption that it will infer when ball_tree is appropriate based on distance metric and data size, and also for compatibility going forward as additional neighbors algorithms are added. My reasoning for setting it to ball_tree was that the algorithm performs well at large data sizes, while at small sizes the gap between brute and ball_tree is unlikely to be noticed because the call returns quickly anyway. As long as 'auto' can pick the appropriate algorithm based on the distance metric, I don't see an issue with switching to it.
@SantiagoOrdonez OPTICS should support haversine if the ball_tree algorithm is used. If you supply 'haversine' as the metric, the 'auto' algorithm should default to something that supports that distance metric (i.e., 'ball_tree'), but you could also be explicit in the function call and specify both 'ball_tree' and 'haversine' if you like. I have used custom distance metrics with OPTICS and gotten valid results in my own testing and use; any distance metric that obeys the triangle inequality should work with the 'ball_tree' algorithm, although performance may be worse than with other distance metrics. Note that if your input is lat/long coordinates in degrees, you will need to convert them to radians first, as haversine assumes generic spherical coordinates and not lat/long values. Does 'haversine' error when called with 'ball_tree' in OPTICS? If so, that is strange behavior, as custom metrics work fine.
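The explicit call described above could look roughly like the following. This is only a sketch: it assumes a scikit-learn version in which OPTICS accepts metric='haversine' together with algorithm='ball_tree' (the report in this thread says 0.21.dev0 does not), and the coordinates and the 50 km radius are made-up illustration values.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Hypothetical sample points as [lat, lon] in degrees:
# three near Paris, three near Berlin.
coords_deg = np.array([
    [48.8566, 2.3522],
    [48.8600, 2.3400],
    [48.8500, 2.3600],
    [52.5200, 13.4050],
    [52.5300, 13.3900],
    [52.5100, 13.4200],
])

# Haversine works on radians, so convert before fitting.
coords_rad = np.radians(coords_deg)

# Distances come back in radians on the unit sphere, so a radius in
# kilometres is divided by an approximate Earth radius (6371 km).
max_eps = 50.0 / 6371.0

model = OPTICS(
    min_samples=2,
    max_eps=max_eps,
    metric="haversine",
    algorithm="ball_tree",
)
labels = model.fit_predict(coords_rad)
print(labels)
```

If this raises "Unknown metric haversine" on your installation, that matches the behaviour reported in this issue rather than a mistake in the call.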
Yes, 'haversine' is not supported with 'ball_tree'. It is really odd, since it does work with a custom function. I am just passing in haversine as a custom function to bypass this.
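A custom haversine function of the kind mentioned here might be sketched as below. The name haversine_rad is hypothetical, not the commenter's actual code; it assumes each point is a [lat, lon] pair already in radians and returns the great-circle distance in radians on the unit sphere.

```python
import math


def haversine_rad(a, b):
    """Great-circle distance in radians between two [lat, lon] points in radians."""
    lat1, lon1 = a[0], a[1]
    lat2, lon2 = b[0], b[1]
    sin_dlat = math.sin((lat2 - lat1) / 2.0)
    sin_dlon = math.sin((lon2 - lon1) / 2.0)
    h = sin_dlat ** 2 + math.cos(lat1) * math.cos(lat2) * sin_dlon ** 2
    # Clamp to 1.0 to guard against rounding error before the square root.
    return 2.0 * math.asin(math.sqrt(min(1.0, h)))


# A quarter of the way around the equator is pi/2 radians.
quarter_turn = haversine_rad([0.0, 0.0], [0.0, math.pi / 2.0])
print(quarter_turn)
```

Such a callable would then be passed as metric=haversine_rad instead of the string 'haversine'. Multiply the result by an Earth radius (about 6371 km) to get kilometres. A pure-Python callable is evaluated once per pair of points, so expect it to be much slower than a built-in metric.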
Let's switch the default to auto, please.
Wait, it does work with |
When I try to run
OPTICS(min_samples=2, max_eps=epsilon, metric='precomputed', algorithm='ball_tree', rejection_ratio=0.1, n_jobs=3).fit(distance_matrix)
I get: Metric 'precomputed' not valid. Use sorted(sklearn.neighbors.VALID_METRICS['ball_tree']) to get valid options. Metric can also be a callable function.
Here distance_matrix is a NumPy array of haversine distances. Again, when trying
OPTICS(min_samples=2, max_eps=epsilon, metric='haversine', algorithm='ball_tree', rejection_ratio=0.1, n_jobs=3).fit(np.radians(coordinates[:, [0, 1]]))
I get: ValueError: Unknown metric haversine.
Using version: 0.21.dev0
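For the precomputed case, the error message says that 'precomputed' is not a valid metric for ball_tree. One possible alternative, sketched below, is to pair a precomputed matrix with algorithm='brute'. This assumes a scikit-learn release in which OPTICS accepts that combination and in which sklearn.metrics.pairwise.haversine_distances is available; the coordinates and epsilon are made-up values, and rejection_ratio and n_jobs from the original call are left out.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.metrics.pairwise import haversine_distances

# Hypothetical sample points as [lat, lon] in degrees.
coordinates = np.array([
    [48.8566, 2.3522],
    [48.8600, 2.3400],
    [48.8500, 2.3600],
    [52.5200, 13.4050],
    [52.5300, 13.3900],
    [52.5100, 13.4200],
])

# Square matrix of pairwise haversine distances, in radians.
distance_matrix = haversine_distances(np.radians(coordinates))

# 50 km expressed in radians, using an approximate Earth radius of 6371 km.
epsilon = 50.0 / 6371.0

clustering = OPTICS(
    min_samples=2,
    max_eps=epsilon,
    metric="precomputed",
    algorithm="brute",  # tree-based algorithms cannot index a distance matrix
).fit(distance_matrix)
print(clustering.labels_)
```

A full distance matrix needs memory quadratic in the number of points, so this route is only practical for modest dataset sizes.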