[ENH] refactored `ColumnConcatenator`, rewrite using `pd-multiindex` inner type #2379

fkiraly · 2022-04-03T17:20:44Z

This PR is a rewrite of the ColumnConcatenator, using pd-multiindex.
Originally in #2369.
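Here is a minimal sketch of what "concatenate columns" means on the pd-multiindex inner type. It assumes a frame with a 2-level (instance, time) row index and one column per variable. The function name `concatenate_columns` and the output column name are hypothetical, and this is not the `sktime` implementation. It only illustrates the target behavior: each instance's variables are joined end to end in time, and the time index is replaced by an integer range per instance.

```python
import pandas as pd


def concatenate_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: join all variables of each instance end to end in time.

    Assumes X has a 2-level row MultiIndex (instance, time) and one column
    per variable. Returns a single-column frame whose time level is a fresh
    integer range within each instance.
    """
    pieces = []
    for instance, group in X.groupby(level=0, sort=False):
        # Column-major flatten: all of the first variable, then the second, ...
        values = group.to_numpy().flatten(order="F")
        index = pd.MultiIndex.from_arrays(
            [[instance] * len(values), list(range(len(values)))],
            names=X.index.names,
        )
        pieces.append(pd.DataFrame({"concatenated": values}, index=index))
    return pd.concat(pieces)


# Two instances, three time points, two variables
idx = pd.MultiIndex.from_product([[0, 1], [0, 1, 2]], names=["instance", "time"])
X = pd.DataFrame(
    {"var_0": [1, 2, 3, 4, 5, 6], "var_1": [10, 20, 30, 40, 50, 60]}, index=idx
)
Xt = concatenate_columns(X)
print(Xt.xs(0, level="instance")["concatenated"].tolist())  # [1, 2, 3, 10, 20, 30]
```

The per-instance integer time index is a design choice in this sketch. After concatenation, the original time stamps of the second and later variables would collide with those of the first, so they cannot be reused as-is.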

The conversion issues are fixed now, should all be good to go.

Relies on the fix to data loaders to ensure unique input index:
#3031

From the first post:

Unfortunately, this has dredged up all kinds of issues with type casting and conversion, so I am isolating it to one PR and removing it from #2369.

The failures in this refactor are linked to the failures in the conversions here: #2375, since in the tests we make conversions between the nested data frame format and the pd-multiindex format.
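For orientation, here is a rough sketch of the nested → pd-multiindex direction of that conversion in plain pandas. The helpers `make_nested` and `nested_to_multiindex` are hypothetical stand-ins, not `sktime`'s converters. They assume every cell of the nested frame is a `pd.Series` and that all variables of one instance share a time index.

```python
import numpy as np
import pandas as pd


def make_nested(cells_by_column, index=None):
    """Hypothetical helper: build a nested_univ-style frame (each cell a pd.Series)."""
    data = {}
    for col, series_list in cells_by_column.items():
        # Fill an object array element by element so numpy stores each
        # Series as a single object instead of expanding it into 2D.
        cells = np.empty(len(series_list), dtype=object)
        for i, s in enumerate(series_list):
            cells[i] = s
        data[col] = cells
    return pd.DataFrame(data, index=index)


def nested_to_multiindex(X_nested):
    """Hypothetical converter: nested frame -> long frame with (instance, time) index."""
    frames = []
    for pos, instance in enumerate(X_nested.index):
        # Align the per-variable series of one instance on their time index
        df = pd.concat(
            {col: X_nested.iat[pos, j] for j, col in enumerate(X_nested.columns)},
            axis=1,
        )
        df.index = pd.MultiIndex.from_product(
            [[instance], df.index], names=["instance", "time"]
        )
        frames.append(df)
    return pd.concat(frames)


X_nested = make_nested({
    "var_0": [pd.Series([1.0, 2.0]), pd.Series([3.0, 4.0])],
    "var_1": [pd.Series([10.0, 20.0]), pd.Series([30.0, 40.0])],
})
X_long = nested_to_multiindex(X_nested)
print(X_long.shape)  # (4, 2)
```

Nothing here guards against repeated instance labels. If two rows of `X_nested` share a label, the long frame ends up with duplicated (instance, time) pairs.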

fkiraly · 2022-07-16T12:16:16Z

Test failure is due to #1290: the old ColumnConcatenator, with its byzantine code, was apparently able to cope with repeated instance indices, whereas the pandas-native functionality breaks on them. Fix in #3029.
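The following illustrates one way repeated instance labels can break pandas-native reshaping. It is an assumption that this is representative of the call that failed in the refactored code, not necessarily identical to it. Once two instances share a label, the (instance, time) pairs in a pd-multiindex frame are no longer unique, and `unstack` refuses to reshape.

```python
import pandas as pd

# Two *different* instances that were both loaded with instance label 0
idx = pd.MultiIndex.from_arrays(
    [[0, 0, 0, 0, 0, 0], [0, 1, 2, 0, 1, 2]], names=["instance", "time"]
)
X = pd.DataFrame({"var_0": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

try:
    # Reshape to one row per instance, one column per time point
    X["var_0"].unstack(level="time")
    reshape_failed = False
except ValueError as err:
    # pandas rejects the reshape because (instance, time) pairs repeat
    reshape_failed = True
    print(err)

# With unique instance labels the same reshape succeeds
X_unique = X.copy()
X_unique.index = pd.MultiIndex.from_arrays(
    [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]], names=["instance", "time"]
)
wide = X_unique["var_0"].unstack(level="time")
print(wide.shape)  # (2, 3)
```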


This fixes a recurring issue with duplicated indices in the time series classification datasets, which has repeatedly broken things downstream and made refactors difficult. Fixes #1290, fixes #2331 for the instance index, addresses the example in #1893, and fixes the problem that prevented #2379 from being refactored. The fix changes the `datasets` module so that loaded time series classification datasets have a unique index, via `reset_index(drop=True)`. Note: this does *not* prohibit duplicate time indices, only duplicate instance names (the index of the outer `DataFrame` in `nested_univ`).
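Here is a minimal sketch of what `reset_index(drop=True)` does to a nested_univ-style frame. It uses a hand-built frame in place of a real loader, and the column name `dim_0` and the data are illustrative only. The nested cells are untouched; only the outer instance labels are replaced by a fresh `RangeIndex`.

```python
import numpy as np
import pandas as pd

# Build a nested_univ-style frame: each cell holds a pd.Series.
# Filling an object array element-wise keeps numpy from expanding the Series.
cells = np.empty(3, dtype=object)
for i, values in enumerate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]):
    cells[i] = pd.Series(values)

# Simulate a loader result with a repeated instance label
X_loaded = pd.DataFrame({"dim_0": cells}, index=[0, 0, 1])
print(X_loaded.index.is_unique)  # False

# Discard the old labels; drop=True avoids keeping them as a new column
X_fixed = X_loaded.reset_index(drop=True)
print(X_fixed.index.tolist())  # [0, 1, 2]
```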

…nstance index in `sktime` datasets (#3029)

Initial bug report about duplicate indices: #1290

Nature of the fix:

* adds a check for `nested_univ` mtype that prohibits duplicate instance index
* fixes an instance of duplicate indices in the benchmarking module
* ensures that any `nested_univ` returns of the old `SeriesToSeriesRowTransformer` have unique instance index
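Here is a sketch of a check that prohibits duplicate instance index for `nested_univ`, as listed under "Nature of the fix". It assumes the check only needs to inspect the outer index and the cell types. `check_nested_instance_index` is a hypothetical name, not the check that was added to `sktime`. Duplicate time indices inside the cells are deliberately left alone.

```python
import numpy as np
import pandas as pd


def check_nested_instance_index(X: pd.DataFrame) -> None:
    """Hypothetical check: raise if a nested_univ-style frame is malformed.

    Only the outer (instance) index must be unique; the time index inside
    each cell may repeat across instances.
    """
    if not isinstance(X, pd.DataFrame):
        raise TypeError("expected a pandas DataFrame")
    if not X.index.is_unique:
        dupes = X.index[X.index.duplicated()].unique().tolist()
        raise ValueError(f"duplicate instance index labels: {dupes}")
    for j, col in enumerate(X.columns):
        for pos in range(len(X)):
            if not isinstance(X.iat[pos, j], pd.Series):
                raise TypeError(f"cell ({pos}, {col!r}) is not a pd.Series")


cells = np.empty(2, dtype=object)
cells[0] = pd.Series([1.0, 2.0])
cells[1] = pd.Series([3.0, 4.0])
good = pd.DataFrame({"dim_0": cells}, index=[0, 1])
bad = pd.DataFrame({"dim_0": cells}, index=[0, 0])

check_nested_instance_index(good)  # passes silently
try:
    check_nested_instance_index(bad)
    bad_rejected = False
except ValueError:
    bad_rejected = True
```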

ColumnConcatenator refactored

928a9f1

fkiraly added module:transformations refactor labels Apr 3, 2022

fkiraly requested review from aiwalter and TonyBagnall as code owners April 3, 2022 17:20

fkiraly added 2 commits April 20, 2022 00:03

Merge branch 'main' into Columnconcatenator-refactored

6384782

Merge branch 'main' into Columnconcatenator-refactored

7c2dd65

fkiraly mentioned this pull request Jul 10, 2022

DIAGNOSTIC: pandas input with nullable dtype-s - do not merge #2966

Draft

fkiraly added 4 commits July 16, 2022 11:38

fixed index

2cae578

add check for nested_univ dupicate index

f54f352

ensure index is unique

514ffb9

fix loaders index reset

bc8c4ea

fkiraly mentioned this pull request Jul 16, 2022

[ENH] add check for nested_univ duplicate index and ensure unique instance index in sktime datasets #3029

Merged

reset index only if pandas

42a70ee

Revert "add check for nested_univ dupicate index"

5c033d4

This reverts commit f54f352.

fkiraly changed the title ~~[ENH] refactored ColumnConcatenator, rewrite using pd-multiindex~~ [ENH] refactored ColumnConcatenator, rewrite using pd-multiindex inner type Jul 16, 2022

fkiraly added 2 commits July 16, 2022 14:55

Merge branch 'main' into Columnconcatenator-refactored

64f6f07

Merge branch 'loaders-unique-index' into Columnconcatenator-refactored

b6919eb

fkiraly merged commit ec41ee5 into main Jul 24, 2022

fkiraly deleted the Columnconcatenator-refactored branch July 24, 2022 21:45
