[ENH, BUG] Distance refactor and bug fix #2268
Merged
…puted for lloyds based algorithms
MatthewMiddlehurst previously approved these changes on Apr 9, 2022
Note to self: wait until after Franz does the pydata thing to merge this.
# Conflicts:
#   sktime/_contrib/distance_refactor.py
#   sktime/_contrib/tests/test_data_io.py
#   sktime/_contrib/tests/test_experiments.py
#   sktime/_contrib/timing_comparisons.py
MatthewMiddlehurst approved these changes on Apr 13, 2022
Indeed, the pydata thing is done now 😄
srggrs added a commit to Gridsight/sktime that referenced this pull request on Apr 19, 2022
* upstream/main:
  [ENH] more forecaster scenarios for testing: using `X` (sktime#2462)
  less uncertainty samples (sktime#2497)
  [ENH] remove error message on exogeneous X from DirRec reducer (sktime#2463)
  [BUG] fix accidental overwrite of default method/arg sequences in test scenarios (sktime#2457)
  changed references to fit-in-transform to fit_is_empty (sktime#2494)
  [MNT] loosen strict upper bound on `scipy` to 1.9.0 (sktime#2474)
  [DOC] Added clustering module to API docs (sktime#2429)
  [ENH] Add prediction intervals for `UnobservedComponets` forecaster (sktime#2454)
  [BUG] remove `alpha` arg from `_boxcox`, remove private method dependencies, ensure scipy 1.8.0 compatibility (sktime#2468)
  [BUG] removed metric integration notebook (sktime#2476)
  [ENH] refactor `_predict_moving_cutoff` and bugfix, outer `update_predict_single` should be called (sktime#2466)
  [ENH, BUG] Distance refactor and bug fix (sktime#2268)
  [ENH] add new argument return_tags to all_estimators (sktime#2410)
Labels
module:distances&kernels
dists_kernels and distances modules: time series distances, kernels, pairwise transforms
refactor
Restructuring without changing its external behavior. Neither fixing a bug nor adding a feature.
A range of internal changes to the distances module and how the distances are used. The primary change is to the internal representation. From the beginning, we assumed an instance had shape (m,d), where m is the series length and d is the number of dimensions. We did this because that is how others did it, and because some native Python operations were more efficient that way. However, it was later decided to use (n,d,m) throughout sktime for numpy representations (n is the number of cases). This meant that any estimator using the distances needed to transpose internally, since it is not possible to tell whether data has been passed in (d,m) or (m,d) shape. This has caused errors in the code (e.g. KNN was broken for a while in 2021 because the transform was removed), and it is generally far too brittle.

This PR converts all distances to expect input series of shape (d,m). Given that it is all compiled, there will be no performance hit; it is all just loops in the end. This means we can remove all transposing from the distance-based clusterers and classifiers.
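To make the shape convention concrete, here is a minimal sketch using plain numpy. The helper `squared_euclidean` is hypothetical and written only for illustration; it is not the sktime implementation. It assumes each series is a 2D array of shape (d,m) and shows that a case taken from an (n,d,m) collection can be passed straight in, while (m,d) data needs an explicit transpose first.

```python
import numpy as np


def squared_euclidean(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Euclidean distance between two series of shape (d, m).

    Hypothetical helper for illustration only, not the sktime code.
    """
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("expected two 2D arrays of identical shape (d, m)")
    n_dims, series_length = x.shape
    total = 0.0
    # Plain loops over dimensions and time points.
    for dim in range(n_dims):
        for t in range(series_length):
            diff = x[dim, t] - y[dim, t]
            total += diff * diff
    return float(total)


# A collection in the (n, d, m) layout: 4 cases, 2 dimensions, length 5.
X = np.arange(40, dtype=float).reshape(4, 2, 5)

# Indexing a case yields a (d, m) series directly, so no transpose is needed.
first, second = X[0], X[1]
dist = squared_euclidean(first, second)

# Data held in the old (m, d) layout must be transposed before the call.
first_md = first.T  # shape (5, 2)
dist_from_md = squared_euclidean(first_md.T, second)
```

Nothing in a bare 2D array records which axis is time, which is why a single fixed convention is less brittle than guessing inside each estimator.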
This PR has grown a bit too big, apologies. A few other things it includes:
This has been done by introducing another function, `average_of_slope_transform`, in ddtw rather than as a transformer. We could make it a transformer if that is thought desirable.
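For readers unfamiliar with the idea, here is a rough sketch of an average-of-slope derivative on a (d,m) series. It assumes the estimate from Keogh and Pazzani's derivative DTW, where each interior point gets the mean of the slope to its left neighbour and half the slope across both neighbours. The name `average_of_slope_sketch` is hypothetical, and this may differ in detail from the function in the PR.

```python
import numpy as np


def average_of_slope_sketch(x: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: average-of-slope derivative of a (d, m) series.

    For each interior point i, averages (x[i] - x[i-1]) with
    (x[i+1] - x[i-1]) / 2. The two end points are dropped, so the
    result has shape (d, m - 2).
    """
    if x.ndim != 2 or x.shape[1] < 3:
        raise ValueError("expected shape (d, m) with m >= 3")
    prev, curr, nxt = x[:, :-2], x[:, 1:-1], x[:, 2:]
    return ((curr - prev) + (nxt - prev) / 2.0) / 2.0


series = np.array([[0.0, 1.0, 4.0, 9.0, 16.0]])  # d = 1, m = 5
slopes = average_of_slope_sketch(series)
```

Keeping this as a plain function on arrays, rather than a transformer object, means it can be called directly from inside a distance computation.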