Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Make TimeSince trafo faster by changing period diff calculation #4018

Merged
merged 1 commit into from Dec 30, 2022

Conversation

KishManani
Copy link
Contributor

@KishManani KishManani commented Dec 29, 2022

What does this implement/fix? Explain your changes.

Inspired by how @ltsaprounis calculated the integer time difference when using a PeriodIndex in the FourierFeatures transformer, I've switched to using this alternative method which is much faster. I measured a factor of ~100 speed up in _transform.

An example to measure the difference:

from sktime.utils._testing.hierarchical import _make_hierarchical
from sktime.transformations.series.time_since import TimeSince

df =_make_hierarchical(hierarchy_levels=(10000,), min_timepoints=200, max_timepoints=200)

trafo = TimeSince(freq="D")
trafo._fit(df)
trafo._transform(df)

This PR will have conflicting changes with #4015. Once #4015 has merged I will resolve the conflicts in this PR.

What should a reviewer concentrate their feedback on?

Code quality. I'm having to write a lot of comments to explain the context of what's going on. I'm unsure whether this is a bad smell.

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • I've added unit tests and made sure they pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

@fkiraly
Copy link
Collaborator

fkiraly commented Dec 29, 2022

Merged #4015 so avoiding conflicts should now be easier.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm the speedup.
In a fresh python 3.10 env on windows, the example is reduced from 7 sec to 300 msec.

Functionality seems unaffected.

@KishManani
Copy link
Contributor Author

It appears there are no conflicts 🤷‍♂️ , that saves some time then!

@fkiraly
Copy link
Collaborator

fkiraly commented Dec 30, 2022

Re comments: I think this is a bit of a smell, but I would say, ultimately it's coming from pandas (and how it handles indices). The dependent code isn't bad.

@fkiraly fkiraly merged commit 79977b2 into sktime:main Dec 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants