[MRG] Adding SAX+MINDIST to KNN #152
Conversation
Hello @GillesVandewiele! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-03-16 15:28:59 UTC
It might also be important to note that the technique doesn't work all that well. The results are quite consistent with the paper, but I expected SAX to be extremely fast, which currently turns out not to be the case. EDIT: I made a mistake in the code when printing times for SAX; updated results are below (I'll commit the fix soon).
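A rough sketch of how such a timing comparison could be set up (random data and illustrative parameters; this is not the benchmark behind the numbers discussed here):

```python
# Hypothetical benchmark: SAX+MINDIST kNN vs. plain Euclidean kNN.
import time

import numpy
from tslearn.generators import random_walks
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

# SAX assumes standard-normalized series, so scale the random walks first.
X = TimeSeriesScalerMeanVariance().fit_transform(random_walks(n_ts=100, sz=128))
y = numpy.random.randint(2, size=100)

# 'alphabet_size_avg' is assumed to be forwarded to the SAX transformer.
for metric, params in [("euclidean", None),
                       ("sax", {"n_segments": 16, "alphabet_size_avg": 10})]:
    clf = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric=metric,
                                         metric_params=params)
    start = time.time()
    clf.fit(X, y).predict(X)
    print(metric, time.time() - start)
```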
…f-bounds-error in kneighbors
Great @GillesVandewiele! I will try to review this PR ASAP, probably early next week. I am still unsure whether we should target a fast SAX+MINDIST in this PR or delay that to a later PR.
Hi @rtavenar, there's no rush in reviewing, I'd say. I still need to fix Travis etc., so maybe reviewing is more appropriate after that passes. EDIT: Another TODO is to update docs + tests (mostly putting this here for myself so I don't forget ;))
Hi Romain, It seems Travis is now building successfully. I did not add any tests yet though, as this doesn't really fit in the test_neighbors.py script currently (its output will not be equal to that of the Euclidean kNN). Any suggestions on what type of tests would be interesting to add? Kind regards,
That looks great @GillesVandewiele! Maybe a minimal test would be to check that the distances returned by kNN's kneighbors are indeed lower-bounding L2 (I think they should be, if I remember the original paper correctly, but maybe it would be worth checking :)). Hope this helps!
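A minimal sketch of such a test could look like this (random data and illustrative parameters; it assumes kneighbors returns MINDIST values when metric="sax"):

```python
# Check that MINDIST, as returned by kneighbors, lower-bounds the
# Euclidean distance between the original standardized series.
import numpy
from tslearn.generators import random_walks
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

X = TimeSeriesScalerMeanVariance().fit_transform(random_walks(n_ts=10, sz=64))
y = numpy.zeros(10)

clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, metric="sax",
                                     metric_params={"n_segments": 8})
dist, ind = clf.fit(X, y).kneighbors(X, return_distance=True)

# For every query i and returned neighbor j: MINDIST(i, j) <= ||x_i - x_j||_2
for i in range(len(X)):
    for d, j in zip(dist[i], ind[i]):
        assert d <= numpy.linalg.norm(X[i] - X[j]) + 1e-9
```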
Codecov Report
```diff
@@            Coverage Diff             @@
##              dev     #152      +/-   ##
==========================================
+ Coverage   93.62%   93.79%   +0.17%
==========================================
  Files          25       25
  Lines        3404     3419      +15
==========================================
+ Hits         3187     3207      +20
+ Misses        217      212       -5
```
Continue to review full report at Codecov.
Hi Romain, I added the test. I'm not sure how this codecov report is actually generated; it seems odd that it reports on barycenters.py in this PR... There's also one other point that may be worth discussing. Currently, I always perform a SAX transformation on the provided input data, but this assumes that the user has not already done so. Moreover, the current SAX transformation will fail pretty badly if the data is not standard-normalized. So maybe it would be better to actually change the
This is a very good question. I do not think we can use Pipelines here, because we need parameters of the previous step for the distance computation, so I guess the way you did it is reasonable; at least I do not see a more straightforward approach. Maybe having a Pipeline (standardize + SAX kNN) in your gallery example (together with a comment saying that it is good practice to standardize your data beforehand for SAX) would be a good idea, and a note in the kNN doc (under the metric param) as well. Let me know when I should start my review; I could probably do it next week if you feel your code is in a sufficiently advanced state. Have a nice week-end,
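A sketch of what that gallery example could look like (parameter values are placeholders):

```python
# Standardize, then classify with SAX-based kNN, chained in a Pipeline.
# Standardizing first matters because SAX assumes normalized input.
from sklearn.pipeline import Pipeline
from tslearn.neighbors import KNeighborsTimeSeriesClassifier
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

pipeline = Pipeline(steps=[
    ("scale", TimeSeriesScalerMeanVariance(mu=0., std=1.)),
    ("knn", KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="sax",
                                           metric_params={"n_segments": 10})),
])
# Usage: pipeline.fit(X_train, y_train); pipeline.predict(X_test)
```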
Two other options would be the following:
What do you want me to do about the codecov, btw?
Hi, I would be in favor of option 1.
Concerning codecov, I can have a look when I do my review.
Forgot to mention it yesterday, but I think the code is now ready to be reviewed, whenever you have time @rtavenar. I refactored quite a lot of code in
I did not find time to review it this week, sorry. I hope I can do it next week.
I handled most of the feedback and left some comments where needed.
… _transform of SAX
One test is failing for KNN with SAX. My guess would be that

```python
X = to_time_series_dataset([[1, 2, 3, 4],
                            [1, 2, 3],
                            [2, 5, 6, 7, 8, 9],
                            [3, 5, 6, 7, 8]])
```

is transformed into SAX with two segments

```python
clf = KNeighborsTimeSeriesClassifier(metric="sax", n_neighbors=1,
                                     metric_params={'n_segments': 2})
```

and all the new time series are equal to

Could you try to print the SAX transformation for this dataset? You could also try to make the time series for class 1 decreasing instead of increasing; it should fix this.
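For instance, the requested printout could be obtained with something like the sketch below (the alphabet size is an arbitrary, illustrative choice):

```python
# Print the 2-segment SAX transform of the test dataset, as suggested above.
from tslearn.piecewise import SymbolicAggregateApproximation
from tslearn.utils import to_time_series_dataset

X = to_time_series_dataset([[1, 2, 3, 4],
                            [1, 2, 3],
                            [2, 5, 6, 7, 8, 9],
                            [3, 5, 6, 7, 8]])

# alphabet_size_avg is an assumed/illustrative value, not from the thread.
sax = SymbolicAggregateApproximation(n_segments=2, alphabet_size_avg=4)
print(sax.fit_transform(X))
```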
Thanks Johann! Let me see if I can fix it :) |
LGTM, thanks Gilles! @rtavenar
No, thank you again for some great feedback! It's always a great learning experience to do a PR here! Next up should be some improvements to the ShapeletModel; I hope I can do it faster than this one ;)
This looks good to me too. I had a simple comment regarding the documentation, and my last remark is that you can document your changes in the changelog before the merge. Great job once again, thanks @GillesVandewiele!
Hi Romain, Thanks! Where/how do you want me to add it to the changelog? Do I add the following just on top?
Yes, you can create a
Alright done! :) |
This PR contains the following changes:

- Added `BaseEstimator` to classes in the `preprocessing` module so that they can be used within a `Pipeline` (errors were raised when using `TimeSeriesScalerMeanVariance`).
- Fixed a bug in the `kneighbors` method, which would always return [0] as the nearest neighbor for every sample.
- Changed `kneighbors` so that its result is consistent with sklearn. There was a small difference in breaking ties (tslearn would pick the largest index while sklearn would pick the smallest one). Now the following code is equivalent (see the sketch after this list):

Some remarks:

- Adding the `njit` decorator to `cdist_sax` did not work immediately; I could perhaps use some help with that.
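As an illustration of the tie-breaking consistency mentioned in the list above (a sketch with illustrative toy data, not the snippet from the PR description):

```python
# Sketch: exact duplicates create distance ties; both implementations
# should now resolve the tie by picking the smallest index.
import numpy
from sklearn.neighbors import KNeighborsClassifier
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

X = numpy.array([[1., 2., 3.],
                 [1., 2., 3.],   # duplicate of the first series -> ties
                 [4., 5., 6.]])
y = numpy.array([0, 0, 1])

sk_ind = KNeighborsClassifier(n_neighbors=1).fit(X, y).kneighbors(
    X, return_distance=False)
ts_ind = KNeighborsTimeSeriesClassifier(
    n_neighbors=1, metric="euclidean").fit(X, y).kneighbors(
    X, return_distance=False)

numpy.testing.assert_array_equal(sk_ind, ts_ind)
```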