Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fix write_ndarray_to_tsfile for classLabel = False #3303

Merged
merged 1 commit into from
Aug 21, 2022
Merged

[BUG] Fix write_ndarray_to_tsfile for classLabel = False #3303

merged 1 commit into from
Aug 21, 2022

Conversation

paulbauriegel
Copy link
Contributor

Reference Issues/PRs

Fixes the following issue when reading a tsfile written with write_ndarray_to_tsfile

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
/tmp/ipykernel_1092/2889448795.py in <module>
----> 1 df = load_from_tsfile_to_dataframe("./amex_train_1.ts/AMEX-Data/AMEX-Data.ts", replace_missing_vals_with='NaN', return_separate_X_and_y=False)

/tmp/ipykernel_1092/3989994789.py in load_from_tsfile_to_dataframe(full_file_path_and_name, return_separate_X_and_y, replace_missing_vals_with)
    159                         print(has_data_tag)
    160                         raise IOError(
--> 161                             "a full set of metadata has not been provided "
    162                             "before the data"
    163                         )

OSError: a full set of metadata has not been provided before the data

What does this implement/fix? Explain your changes.

The issue in load_from_tsfile_to_dataframe happend because has_class_labels_tag = False. This is because the line

elif line.startswith("@classlabel"):
     ....
     has_class_labels_tag = True

is never executed since the line.startswith("@classlabel") is False

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

How should the tag "@classlabel" be named, write_ndarray_to_tsfile and load_from_tsfile_to_dataframe depend on each other here. Another solution would be to allow multiple way of writing this tag in load_from_tsfile_to_dataframe

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 20, 2022

@achieveordie, I think you recently looked at these I/O utilities - do you have any comments?

@fkiraly fkiraly added bugfix Fixes a known bug or removes unintended behavior module:datasets&loaders data sets and data loaders labels Aug 20, 2022
@fkiraly fkiraly changed the title [BUG] Fix write_ndarray_to_tsfile for classLabel = false [BUG] Fix write_ndarray_to_tsfile for classLabel = false Aug 20, 2022
@fkiraly
Copy link
Collaborator

fkiraly commented Aug 20, 2022

@paulbauriegel, thanks for the fix!

Just started the CI to check.

Btw, do you know whether there is a test that checks for the classLabel=False case? If not, would appreciate a test, or advice what the test should check.

@paulbauriegel
Copy link
Contributor Author

@fkiraly As far as I understand you have two test functions in sktime/datasets/tests/test_data_io.py

  • test_load_from_tsfile_to_dataframe to test the load function
  • test_write_dataframe_to_ts_success/test_write_dataframe_to_ts_fail to test if writing works in general

So far there is no function that actually checks what is written. So I think it would make sense to implement a test that verifies the output of write_ndarray_to_tsfile.
I can write such a test if you would like me to.

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 20, 2022

I can write such a test if you would like me to.

that would be neat, much obliged!

@fkiraly fkiraly changed the title [BUG] Fix write_ndarray_to_tsfile for classLabel = false [BUG] Fix write_ndarray_to_tsfile for classLabel = False Aug 20, 2022
@achieveordie
Copy link
Collaborator

I believe this is due to a typo that this PR fixes. In write_ndarray_to_tsfile():

    if class_label is not None:
        space_separated_class_label = " ".join(str(label) for label in class_label)
        file.write(f"@classLabel true {space_separated_class_label}\n")
    else:
        file.write("@class_label false\n")

We're passing two conflicting names: @classLabel and @class_label; which itself is different from:

elif line.startswith("@classlabel"):
...

in load_from_tsfile_to_dataframe().

The right way would be to change these to @classlabel as @paulbauriegel has done. And I agree that we require a test that loads content stored by our write method, for all such properties. I see a similar problem with @equalLength and @equallength which I think are referring to the same metadata spec.

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 21, 2022

Hm - I think this discrepancy between specifier strings indicates that we should write a file format specification for ts and tsf. We had people asking for this before. Unfortunately, the time series classification crowd has never properly documented the file formats, as far as I know.

We already have an issue here: #2043

FYI @ltsaprounis, @TNTran92 who were involved in another related discussion.

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 21, 2022

I'll merge this PR so it is in the next days' release.
Later additions that would be much appreciated:

  • test for the classLabel=False case
  • file format specification doc

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@fkiraly fkiraly merged commit 82f75e7 into sktime:main Aug 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes a known bug or removes unintended behavior module:datasets&loaders data sets and data loaders
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants