Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transformers refactor: summarizer and base class #1663

Merged
merged 11 commits into from Dec 7, 2021

Conversation

fkiraly
Copy link
Collaborator

@fkiraly fkiraly commented Nov 28, 2021

This refactors the SummaryTransformer to use the new transformer template.
Some minor changes there - transform was still present and overwrote _transform, and the "inner type" tags were set incorrectly.

In addition, this fixes two bugs:

  • in the base template, appears when a Panel is passed to a Series-wise transform that outputs Primitives - in this case, thet different rows were not concatenated properly (some index issues).
  • the base class now enforces that fit is called before transform, even if the fit-in-transform flag as True (which means fit is empty and can be skipped, but fit still needs to be called). As a shorthand for "transform right away", fit_transform should be called, not transform.

@fkiraly fkiraly requested a review from RNKuhns November 28, 2021 22:04
@fkiraly fkiraly added bug Something isn't working module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing labels Nov 28, 2021
Copy link
Contributor

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested the SummaryTransformer for panel data portion, it all works fine with univariate data. With an input of shape (20, 1, 24) I get an output of shape (20, 7) and I can build a sklearn classifier with no issue.

Multivariate data seems to have an issue, however. With an input of (10, 6, 100) I get an output of shape (60, 7). The conversion process has given me 50 extra cases, I would expect (10, 42).

If we need this in to be on-track for release, I'm not going to push that the above be sorted now, as you could probably reshape the above output for the correct format (even though that would be annoying). Will approve as the other changes are good and let you decide if more should be added to the PR.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Dec 7, 2021

Multivariate data seems to have an issue, however. With an input of (10, 6, 100) I get an output of shape (60, 7). The conversion process has given me 50 extra cases, I would expect (10, 42).

What should the result be in the multivariate case, in your opinion? Would you expect multi-index across rows, or across columns? Or, something else?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Dec 7, 2021

@MatthewMiddlehurst, I opened an issue here, I think it's primarily a design question:
#1719

FYI, the vectorization over the variables isn't coming from me, i.e., not automated by the base class.

If you didn't intend it, it's probably coming from apply-like pandas commands which also happen to work seamlessly in the multivariate case, without you taking it into account.

@fkiraly fkiraly merged commit e4549b4 into main Dec 7, 2021
@fkiraly fkiraly deleted the transformers-primitives-fixes branch December 7, 2021 17:31
Bugfixing automation moved this from Under review to Fixed/resolved Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Development

Successfully merging this pull request may close these issues.

None yet

3 participants