Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT] Changed line endings of ElectricDevices.csv and GunPoint.csv from CRLF to LF #4452

Merged
merged 1 commit into from
Apr 12, 2023
Merged

[MNT] Changed line endings of ElectricDevices.csv and GunPoint.csv from CRLF to LF #4452

merged 1 commit into from
Apr 12, 2023

Conversation

yarnabrina
Copy link
Collaborator

Reference Issues/PRs

See also #4446

What does this implement/fix? Explain your changes.

Most datasets in sktime have LF line endings, except ElectricDevices.csv and GunPoint.csv. This triggered auto-fix if configured in global gitattributes or global gitconfig. This PR attempts to convert those into LF line endings.

Before
git ls-files --eol | grep -E "*.csv"
i/lf    w/lf    attr/                   sktime/datasets/data/Airline/Airline.cs
i/lf    w/lf    attr/                   sktime/datasets/data/Longley/Longley.cs
i/lf    w/lf    attr/                   sktime/datasets/data/Lynx/Lynx.csv
i/lf    w/lf    attr/                   sktime/datasets/data/PBS_dataset/PBS_dataset.csv
i/lf    w/lf    attr/                   sktime/datasets/data/ShampooSales/ShampooSales.csv
i/lf    w/lf    attr/                   sktime/datasets/data/Uschange/Uschange.csv
i/mixed w/mixed attr/                   sktime/datasets/data/segmentation/ElectricDevices.csv
i/mixed w/mixed attr/                   sktime/datasets/data/segmentation/GunPoint.csv
i/lf    w/lf    attr/                   sktime/datasets/data/solar/solar.csv
After
git ls-files --eol | grep -E "*.csv"
i/lf    w/lf    attr/text               sktime/datasets/data/Airline/Airline.csv
i/lf    w/lf    attr/text               sktime/datasets/data/Longley/Longley.csv
i/lf    w/lf    attr/text               sktime/datasets/data/Lynx/Lynx.csv
i/lf    w/lf    attr/text               sktime/datasets/data/PBS_dataset/PBS_dataset.csv
i/lf    w/lf    attr/text               sktime/datasets/data/ShampooSales/ShampooSales.csv
i/lf    w/lf    attr/text               sktime/datasets/data/Uschange/Uschange.csv
i/lf    w/lf    attr/text               sktime/datasets/data/segmentation/ElectricDevices.csv
i/lf    w/lf    attr/text               sktime/datasets/data/segmentation/GunPoint.csv
i/lf    w/lf    attr/text               sktime/datasets/data/solar/solar.csv

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

  • Does any of the functionality break that is not covered by unit tests?

Did you add any tests for the change?

No

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@yarnabrina yarnabrina marked this pull request as ready for review April 10, 2023 10:10
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks simple enough, although the diff says there is a change in 13405 lines.
Can I ask for the clarification: the problem is in every single line, not just in the final line?

@yarnabrina
Copy link
Collaborator Author

Can I ask for the clarification: the problem is in every single line, not just in the final line?

I would guess so, not certain. To be specific, this is all I did:

  1. open old file VS Code
  2. change from CRLF to LF in bottom right
  3. stage + commit

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 10, 2023

hm, I see.

Can I ask what is, to your knowledge, the easiest way to compare the content to be equal pre and post PR?
These are moderatly important benchmarking data sets, so we should avoid changing the content by accident (and I'm not sure how to best check that)

@yarnabrina
Copy link
Collaborator Author

Can I ask what is, to your knowledge, the easiest way to compare the content to be equal pre and post PR?

Something like this?

git difftool --tool=vimdiff origin/fix-line-endings origin/fix-line-endings~1

It shows me this (it's in LESS mode)

image

And ^M is carriage return character only, so not unexpected to see that difference from CRLF -> LF.

(There are probably better ways, I just don't know.)

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these have the same content except for line endings changed.
Line endings are also consiste across data files post-PR.

@fkiraly fkiraly added maintenance Continuous integration, unit testing & package distribution module:datasets&loaders data sets and data loaders labels Apr 12, 2023
@fkiraly fkiraly changed the title Changed line endings of ElectricDevices.csv and GunPoint.csv from CRLF to LF [MNT] Changed line endings of ElectricDevices.csv and GunPoint.csv from CRLF to LF Apr 12, 2023
@fkiraly fkiraly merged commit a095631 into sktime:main Apr 12, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Apr 12, 2023

Does any of the functionality break that is not covered by unit tests?

I don't think so - loading these is actually also tested, with some dependent functionality.

@yarnabrina yarnabrina deleted the fix-line-endings branch April 12, 2023 15:25
fkiraly pushed a commit that referenced this pull request Apr 22, 2023
… `statsmodels` based forecasters to reduce code duplication (#4465)

Closes #4447. Depends on #4439 and #4452.

The implementation of `_predict_interval` in `AutoETS`, `SARIMAX` and
`UnobservedComponents` were very similar. This PR helps to define these
at a single place at base level which should help maintaining easier.

Observations on multivariate estimators as being discussed in #4447.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Continuous integration, unit testing & package distribution module:datasets&loaders data sets and data loaders
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants