Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Implementing SplitterSummarizer Transformer #4759

Merged
merged 16 commits into from
Jul 15, 2023

Conversation

BensHamza
Copy link
Contributor

@BensHamza BensHamza commented Jun 24, 2023

Reference Issues/PRs

Fixes #4664 .

What does this implement/fix? Explain your changes.

This PR introduces a new series-to-series transformer SplitterSummarizer.

Does your contribution introduce a new dependency? If yes, which one?

No

Any other comments?

The testing isn't working

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

@yarnabrina
Copy link
Collaborator

Thanks for your contribution.

The testing isn't working

Unfortunately, your code style is a bit different the rest of the codebase, and this is being checked by the configured hooks in .pre-commit-config.yaml file.

What you can do are the following:

  1. install pre-commit python package from in your local environment (pip install pre-commit)
  2. install configured git hooks (pre-commit install)
  3. run configured git hooks manually once now for all files (pre-commit run --all-files)
  4. stage all of the changes (git add .)
  5. commit the staged changes (git commit -m "code quality changes")
  6. push the committed changes (git push)

Please review the changes before you stage. Ideally, they should match what you see here.

Also, feel free to use any commit message of your choice.

For your next commits, all the configured git hooks will run automatically for the staged files while you run git commit in future. You may read up more about pre-commit here.

@fkiraly fkiraly added the module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing label Jun 25, 2023
@fkiraly fkiraly added implementing algorithms Implementing algorithms, estimators, objects native to sktime enhancement Adding new functionality labels Jun 25, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Jul 7, 2023

do you still need help on this, or input, @BensHamza?

@BensHamza
Copy link
Contributor Author

do you still need help on this, or input, @BensHamza?

Sorry for the time it takes and thank you for the follow-up. I needed some time to get used to the codebase.

I had first a problem understanding how to make the pipeline mentionned in #4664 work since PCA() is a series-to-series transformer and we first wanted a series-to-primitives transformer as param. From what I understood in discord, we want SplitterSummarizer to work in a general context ( eg. SplitterSummarizer(SummaryTransformer(),ExpandingWindowSplitter()) ).

This last code version works fine on this last example, Can you please tell me if it is the right guess? Also, there are surely some improvements to this version. I would appreciate any feedback. Thank you !

@BensHamza BensHamza marked this pull request as ready for review July 9, 2023 09:56
@fkiraly
Copy link
Collaborator

fkiraly commented Jul 9, 2023

I had first a problem understanding how to make the pipeline mentionned in #4664 work since PCA() is a series-to-series transformer and we first wanted a series-to-primitives transformer as param.

That's an excellent point!

I made a mistake here, I was thinking of a (hypothetical) PCA transformer that produces PCA loadings probably (a primitive/row vector), rather than the actual PCA transforme that is currently in the code base (this produces scores, column vectors).

From what I understood in discord, we want SplitterSummarizer to work in a general context ( eg. SplitterSummarizer(SummaryTransformer(),ExpandingWindowSplitter()) ).

Yes, you understand correctly imo!

fkiraly
fkiraly previously approved these changes Jul 9, 2023
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BensHamza, looks great! This is exactly what we were looking for!

Let's see whether the tests run through, nice contribution!

I left some comments above. We could merge as is, but it would be great to straighten out the docstrings and clarify the index parameter.

Additional features we can leave for later, e.g., an option to remember data seen in fit and use the union of data seen in fit / transform - that might be important for forecasting use cases.

@fkiraly
Copy link
Collaborator

fkiraly commented Jul 10, 2023

Re test failures, this is due to the test parameters using the catch22 transformer, which has some soft dependencies (numba I think). Optimally, the test parameters do not rely on additional packages - so you could replace it with the SummaryTransformer.

fkiraly
fkiraly previously approved these changes Jul 12, 2023
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Would approve if all tests pass.

fkiraly
fkiraly previously approved these changes Jul 13, 2023
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great, good to go!

I've made two minor changes to expedite this for the release:

  • added the estimator to the API reference, so users can find it easily
  • fixed the docstring - for multi-line cells, use ... and not >>>

(see the latest commits for this)

@fkiraly
Copy link
Collaborator

fkiraly commented Jul 14, 2023

FYI, I think you did not change the default from ExpandingWindowSplitter to SlidingWindowSplitter as you suggested?

@BensHamza
Copy link
Contributor Author

I've made two minor changes to expedite this for the release

Thank you @fkiraly for these changes and the explanations.

FYI, I think you did not change the default from ExpandingWindowSplitter to SlidingWindowSplitter as you suggested?

My bad. I did the changes.

I was having a problem passing the test-nosoftdeps-full , I didn't get where it fails.

@fkiraly
Copy link
Collaborator

fkiraly commented Jul 14, 2023

I was having a problem passing the test-nosoftdeps-full , I didn't get where it fails.

sure - let's check where/if it still fails.

@fkiraly
Copy link
Collaborator

fkiraly commented Jul 15, 2023

FYI, I shortened the windows in some test cases in get_test_params, as the test data is short and longer window length will raise errors.

@fkiraly fkiraly merged commit c2dd60c into sktime:main Jul 15, 2023
23 of 24 checks passed
@BensHamza BensHamza deleted the add_splittersummarizer branch July 15, 2023 20:32
@BensHamza
Copy link
Contributor Author

Thank you @fkiraly , I learned a lot in the process.

@fkiraly
Copy link
Collaborator

fkiraly commented Jul 15, 2023

well, @BensHamza, hopefully this is just the start of a long and fruitful collaboration!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality implementing algorithms Implementing algorithms, estimators, objects native to sktime module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] SplitterSummarizer which applies a series-to-primitive transformer per split
3 participants