FEA add ValueDifferenceMetric as a pairwise metric #796
glemaitre merged 23 commits into scikit-learn-contrib:master
Hello @glemaitre! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-02-14 18:28:41 UTC |
Could you add in this branch some bench code so we could test vdm performance in order to improve it? |
Yep. We could do that. I did some profiling and I am actually not sure that we can speed up the computation.
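As a rough idea of what such bench code could look like, here is a minimal timing harness. It is only a sketch: `bench_pairwise` and `mismatch_pairwise` are hypothetical names, and the stand-in metric (a count of differing features) is not the VDM from this PR. Any callable that takes an ordinal-encoded `X` and returns a pairwise matrix could be passed in its place.

```python
import time

import numpy as np


def mismatch_pairwise(X):
    """Stand-in metric (hypothetical): number of features on which two rows differ."""
    return (X[:, None, :] != X[None, :, :]).sum(axis=2)


def bench_pairwise(metric, sizes=(100, 200, 400), n_features=5,
                   n_categories=4, seed=0):
    """Time `metric(X)` on random ordinal-encoded data of growing size."""
    rng = np.random.default_rng(seed)
    timings = {}
    for n_samples in sizes:
        # Integers in [0, n_categories) mimic ordinal-encoded categories.
        X = rng.integers(0, n_categories, size=(n_samples, n_features))
        start = time.perf_counter()
        metric(X)
        timings[n_samples] = time.perf_counter() - start
    return timings


timings = bench_pairwise(mismatch_pairwise)
for n_samples, seconds in timings.items():
    print(f"n_samples={n_samples}: {seconds:.4f}s")
```

Since the pairwise matrix grows quadratically with the number of samples, timing several sizes makes it easier to see whether an implementation is slow by a constant factor or scales worse.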
Codecov Report
@@           Coverage Diff           @@
##           master     #796   +/-   ##
=======================================
  Coverage   98.62%   98.62%
=======================================
  Files          89       89
  Lines        5881     5881
  Branches      494      494
=======================================
  Hits         5800     5800
  Misses         80       80
  Partials        1        1

Continue to review full report at Codecov.
@chkoar I am wondering if this metric should be public or private. Indeed, it required some |
We could add it without a leading |
It seems x20 slower than the current implementation. I think that we are fine to go indeed. |
I am starting to think that it could live in the documentation as well.
If we use fit, we could probably inherit from the base estimator, since it estimates from data.

    class ValueDifferenceMetric(BaseEstimator):
        def __init__(self, k=1, r=2): ...
        def fit(self, X, y):
            # learn the unique classes here
            ...
        def pairwise(self, X, Y=None): ...

On the other hand, another option would be to require the data in the init.

    class ValueDifferenceMetric:
        def __init__(self, X, y, k=1, r=2): ...
        def pairwise(self, X, Y=None): ...

Additionally, we could implement the callable API:

    vdm = ValueDifferenceMetric(...)
    distance = vdm(x1, x2)

Design-wise, I would favor the following, which makes it possible to use the DistanceMetric API, but it is way too slow.

    class ValueDifferenceMetric:
        def __init__(self, X, y): ...
        def __call__(self, x1, x2, k=1, r=2): ...

    vdm = ValueDifferenceMetric(X, y)
    # a Python callable is wrapped through the "pyfunc" metric
    metric = DistanceMetric.get_metric("pyfunc", func=vdm)
    metric.pairwise(X)
    # or pass the callable directly to a nearest-neighbors estimator
    knn = KNeighborsClassifier(metric=vdm)

All of this assumes that X is ordinal-encoded with ints.
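To make the first option (learn in `fit`, compute in `pairwise`) more concrete, here is a minimal NumPy sketch. It is not the implementation merged in this PR. It assumes the usual VDM definition: for each feature, the distance between two category values is the sum over classes of |P(c | x_f) - P(c | y_f)|^k, and the per-feature distances are raised to the power r and summed. The class name `ValueDifferenceMetricSketch` and the attribute `proba_per_class_` are hypothetical.

```python
import numpy as np


class ValueDifferenceMetricSketch:
    """Toy VDM for ordinal-encoded (non-negative int) categorical features."""

    def __init__(self, k=1, r=2):
        self.k = k
        self.r = r

    def fit(self, X, y):
        X = np.asarray(X, dtype=int)
        classes, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
        self.proba_per_class_ = []
        for f in range(X.shape[1]):
            n_categories = X[:, f].max() + 1
            # counts[v, c] = number of samples with value v in feature f and class c
            counts = np.zeros((n_categories, classes.size))
            np.add.at(counts, (X[:, f], y_idx), 1)
            totals = counts.sum(axis=1, keepdims=True)
            # P(c | x_f = v); category codes never observed keep probability 0
            proba = np.divide(counts, totals, out=np.zeros_like(counts),
                              where=totals > 0)
            self.proba_per_class_.append(proba)
        return self

    def pairwise(self, X, Y=None):
        X = np.asarray(X, dtype=int)
        Y = X if Y is None else np.asarray(Y, dtype=int)
        dist = np.zeros((X.shape[0], Y.shape[0]))
        for f, proba in enumerate(self.proba_per_class_):
            px = proba[X[:, f]]  # shape (n_X, n_classes)
            py = proba[Y[:, f]]  # shape (n_Y, n_classes)
            # per-feature distance: sum over classes of |P(c|x) - P(c|y)|^k
            delta = (np.abs(px[:, None, :] - py[None, :, :]) ** self.k).sum(axis=2)
            dist += delta ** self.r
        return dist


# One feature whose value perfectly predicts the class.
X_demo = np.array([[0], [0], [1], [1]])
y_demo = np.array([0, 0, 1, 1])
D_demo = ValueDifferenceMetricSketch().fit(X_demo, y_demo).pairwise(X_demo)
print(D_demo)
```

In this toy example, samples sharing a category are at distance 0, and samples with different categories are at distance (|1 - 0| + |0 - 1|)^2 = 4. Because the probabilities are tabulated once in `fit`, `pairwise` only does table lookups and array arithmetic, which avoids the per-pair Python call that makes the callable design slow.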
Can we get rid of that after the sampling? |
yes, it is just temporary for the sampling for the NN search. |
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
@chkoar I think that I would like to see this PR merged as is and open another one for
Do you see anything else to add?
Agreed. |
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
OK I think this is good to be merged. I fixed the issue with what's new. @chkoar Feel free to merge. |
Done |