-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added ability to use custom DistanceMetric #692
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hooray! Just a few changes are needed to gracefully handle sklearn versions before late 2021.
trackpy/linking/linking.py
Outdated
dist_func : function, optional | ||
a custom distance function that takes two 1D arrays of coordinates and | ||
returns a float. Must be used with the 'BTree' neighbor_strategy. | ||
dist_func : function or ```sklearn.metrics.DistanceMetric``` class, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is supposed to be an instance of DistanceMetric
, not a class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My new commits should hopefully resolve my poor choice of documentation here.
trackpy/linking/subnet.py
Outdated
@@ -10,6 +10,7 @@ | |||
|
|||
try: | |||
from sklearn.neighbors import BallTree | |||
from sklearn.metrics import DistanceMetric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line appears to raise an ImportError
in many of our automated tests, which in turn breaks BallTree
even though sklearn
is installed and detected. Here is an example: https://github.com/soft-matter/trackpy/runs/5127629137?check_suite_focus=true#step:6:2336 . Is DistanceMetric
absent from earlier sklearn versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it appears to be the difference between e.g. scikit-learn 1.0.2 and scikit-learn 0.24.2 . Since DistanceMetric was only released in late 2021, this code should check for its presence separately from BallTree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for interpreting the CI output, I was seeing a failure but was having a hard time deducing the cause without further investigation. Looking through the API versions, it looks like they recently moved DistanceMetric
over to sklearn.metrics
. Before, it was in sklearn.neighbors
. I'll update the PR to resolve this- first try importing DistanceMetric
from sklearn.metrics
and if an ImportError
is raised, import it from sklearn.neighbors
. It looks like sklearn.neighbors.DistanceMetric
should work as far back as at least scikit-learn 0.16 (March 2015); should I still have it separately check for presence, or is it safe to assume that users will have a version newer than this? Happy to do either, but of course assuming that it is there is easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!! We don't have a formal policy but 6 years is definitely past our horizon for backwards compatibility. My own personal preference is that it be roughly 1/2 of the time to complete a PhD :). Thanks for clearing that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now resolved this, and it looks to be running on my end. Hopefully the CI likes it too :).
It seems like I have increased my percentage of CI tests passed, but am still failing a few. It seems like the CI is failing on some code I didn't touch, but perhaps I caused it to fail from my edits. I'm happy to keep revising my PR as needed, but I would appreciate any insights on why the CI is failing. Thanks for your help on this! |
Those are (almost) certainly errors due to a new Pandas or matplotlib release, not in your code. To be safe, let's plan to merge once that's resolved. Here's an instructive failed test that leads me to believe that: https://github.com/soft-matter/trackpy/runs/5276655344?check_suite_focus=true#step:5:41 |
Thanks much for the insight and for your help with this PR! |
@freemansw1 the CI and compatibility problems that were holding this PR back have (hopefully) been fixed in the |
I'm happy to rebase, but I don't see any changes to the main branch (actually I don't see |
Sorry, I just want to check that you aren't waiting for me on anything @nkeim |
Looks like whatever dependency (or GHA) issues were at play in February are now gone; all checks pass. Thanks for your contribution @freemansw1 and for your patience! |
Per #691, I added the ability to use a custom scikit-learn
DistanceMetric
function as adist_func
parameter intrackpy.link
. I also added a new test for this method. I think that this would allow users to extend theDistanceMetric
class, too, with compiled distance functions (and therefore faster than passing in a python function).