[ENH] Implementing SplitterSummarizer Transformer #4759

BensHamza · 2023-06-24T16:26:59Z

Reference Issues/PRs

Fixes #4664 .

What does this implement/fix? Explain your changes.

This PR introduces a new series-to-series transformer SplitterSummarizer.

Does your contribution introduce a new dependency? If yes, which one?

No

Any other comments?

The testing isn't working

PR checklist

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
See here for full badge reference
Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
See here for further details on the algorithm maintainer role.
The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
dependency isolation, see the estimator dependencies guide.

yarnabrina · 2023-06-25T07:56:06Z

Thanks for your contribution.

The testing isn't working

Unfortunately, your code style is a bit different the rest of the codebase, and this is being checked by the configured hooks in .pre-commit-config.yaml file.

What you can do are the following:

install pre-commit python package from in your local environment (pip install pre-commit)
install configured git hooks (pre-commit install)
run configured git hooks manually once now for all files (pre-commit run --all-files)
stage all of the changes (git add .)
commit the staged changes (git commit -m "code quality changes")
push the committed changes (git push)

Please review the changes before you stage. Ideally, they should match what you see here.

Also, feel free to use any commit message of your choice.

For your next commits, all the configured git hooks will run automatically for the staged files while you run git commit in future. You may read up more about pre-commit here.

fkiraly · 2023-07-07T11:16:18Z

do you still need help on this, or input, @BensHamza?

BensHamza · 2023-07-09T09:44:52Z

do you still need help on this, or input, @BensHamza?

Sorry for the time it takes and thank you for the follow-up. I needed some time to get used to the codebase.

I had first a problem understanding how to make the pipeline mentionned in #4664 work since PCA() is a series-to-series transformer and we first wanted a series-to-primitives transformer as param. From what I understood in discord, we want SplitterSummarizer to work in a general context ( eg. SplitterSummarizer(SummaryTransformer(),ExpandingWindowSplitter()) ).

This last code version works fine on this last example, Can you please tell me if it is the right guess? Also, there are surely some improvements to this version. I would appreciate any feedback. Thank you !

fkiraly · 2023-07-09T09:56:52Z

I had first a problem understanding how to make the pipeline mentionned in #4664 work since PCA() is a series-to-series transformer and we first wanted a series-to-primitives transformer as param.

That's an excellent point!

I made a mistake here, I was thinking of a (hypothetical) PCA transformer that produces PCA loadings probably (a primitive/row vector), rather than the actual PCA transforme that is currently in the code base (this produces scores, column vectors).

From what I understood in discord, we want SplitterSummarizer to work in a general context ( eg. SplitterSummarizer(SummaryTransformer(),ExpandingWindowSplitter()) ).

Yes, you understand correctly imo!

sktime/transformations/series/summarize.py

fkiraly

@BensHamza, looks great! This is exactly what we were looking for!

Let's see whether the tests run through, nice contribution!

I left some comments above. We could merge as is, but it would be great to straighten out the docstrings and clarify the index parameter.

Additional features we can leave for later, e.g., an option to remember data seen in fit and use the union of data seen in fit / transform - that might be important for forecasting use cases.

sktime/transformations/series/summarize.py

fkiraly · 2023-07-10T21:35:33Z

Re test failures, this is due to the test parameters using the catch22 transformer, which has some soft dependencies (numba I think). Optimally, the test parameters do not rely on additional packages - so you could replace it with the SummaryTransformer.

fkiraly

Looks great!

Would approve if all tests pass.

fkiraly

Really great, good to go!

I've made two minor changes to expedite this for the release:

added the estimator to the API reference, so users can find it easily
fixed the docstring - for multi-line cells, use ... and not >>>

(see the latest commits for this)

fkiraly · 2023-07-14T06:06:15Z

FYI, I think you did not change the default from ExpandingWindowSplitter to SlidingWindowSplitter as you suggested?

BensHamza · 2023-07-14T09:18:38Z

I've made two minor changes to expedite this for the release

Thank you @fkiraly for these changes and the explanations.

FYI, I think you did not change the default from ExpandingWindowSplitter to SlidingWindowSplitter as you suggested?

My bad. I did the changes.

I was having a problem passing the test-nosoftdeps-full , I didn't get where it fails.

fkiraly · 2023-07-14T11:53:05Z

I was having a problem passing the test-nosoftdeps-full , I didn't get where it fails.

sure - let's check where/if it still fails.

…/sktime into pr/4759

fkiraly · 2023-07-15T11:39:27Z

FYI, I shortened the windows in some test cases in get_test_params, as the test data is short and longer window length will raise errors.

BensHamza · 2023-07-15T20:34:26Z

Thank you @fkiraly , I learned a lot in the process.

fkiraly · 2023-07-15T23:04:18Z

well, @BensHamza, hopefully this is just the start of a long and fruitful collaboration!

BensHamza added 2 commits June 24, 2023 16:38

First version

ec47ab0

Some changes in get_test_params()

12706ce

fkiraly added the module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing label Jun 25, 2023

code quality changes

8ceb1d6

fkiraly added implementing algorithms Implementing algorithms, estimators, objects native to sktime enhancement Adding new functionality labels Jun 25, 2023

new code version

6f4b8c2

BensHamza marked this pull request as ready for review July 9, 2023 09:56

BensHamza requested review from achieveordie, benHeid, fkiraly and yarnabrina as code owners July 9, 2023 09:56