[DOC] Programmatically fix (all) typos #5424

Merged
merged 7 commits into sktime:main Oct 22, 2023

Conversation

kianmeng (Contributor)

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR fixes typos found via:

  • codespell -S docs,./sktime/datasets -L bu,fpr,binay,afer,nd,methodd,strat,mape,lik,mapp,ser,te,ot,ue,fof,fwe,oen,vie,yur,ned,ths,als,mor,foor,reasy,dum,fo,yhe,mke,bui,som,caf,wqs,slq,veiw,yse
  • typos --format brief

Does your contribution introduce a new dependency? If yes, which one?

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.


@@ -181,7 +181,7 @@ See docstring of ``AggrDist`` and ``FlatDist``.
AggrDist
FlatDist

Advanced time series kernels that cannot be expressed as aggrgates or flat applicates:
Collaborator

not a typo

Contributor

aggrgates is

Collaborator

no, applicates

Collaborator

I mean, it is correcting applicates to applicants, which is a false positive.

It is not correcting aggrgates to aggregates, which is a false negative.

Contributor

I mean that on the same line, "aggrgates" is a typo; "aggregates" is the correct spelling.

Collaborator

yes, that's what I was trying to say in my last comment.

"applicates" -> "applicants" should not be corrected; "applicates" was correct.
So this is a false positive, because the spellchecker detects it but should not.
In my initial comment I was referring to this one.

"aggrgates" -> "aggregates" should be corrected, "aggrgates" is incorrect.
So this is a false negative, because the spellchecker does not detect it but should.

In my initial comment I overlooked this one, thank you for pointing this out, @szepeviktor!

Having said that, this is a very very specialized PR and a very peculiar location to help out at, but thanks nonetheless. Might I convince you to contribute to sktime in the future?

Contributor

> Might I convince you to contribute to sktime in the future?

I am highly blind to business features - which are the most important to you.
All software is the same cake to me; they just have different (thin) icing on top.

What I can do is

  • care about every byte in the repo
  • listen to what the (Python) ecosystem says
  • be in contact with dependency authors
  • run never before seen tests
  • find design errors
  • find implementation errors
  • invent end-to-end tests
All my thoughts and tools are public here on GitHub.

The submerged part of your iceberg is my territory.

Contributor

🥊 cakes vs. icebergs

Collaborator

all of these sound very nice - everyone wants to eat the cake but not enough iceberg miners...

the largest impact is perhaps bugfixes?
https://github.com/sktime/sktime/projects/18

There are also some larger projects and design questions if you are interested - we can chat at one of the meetups.

@@ -783,7 +783,7 @@ def is_flat(obj):


class _ColumnEstimator:
"""Mixin class with utilities for by-column applicates."""
"""Mixin class with utilities for by-column applicants."""
Collaborator

not a typo

@@ -10,7 +10,7 @@
from sktime.utils.validation._dependencies import _check_soft_dependencies


-def is_initalised_estimator(estimator: BaseEstimator) -> bool:
+def is_initialised_estimator(estimator: BaseEstimator) -> bool:
Collaborator

public function names should not be changed, even if they contain a typo
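
For illustration, the deprecation-friendly alternative would be to add the correctly spelt name and keep the misspelt public name as a deprecated alias for one release cycle. A minimal sketch, in which the stub body and warning text are illustrative rather than sktime's actual implementation:

```python
import warnings


def is_initialised_estimator(estimator) -> bool:
    """Check whether estimator is an initialised instance (illustrative stub)."""
    # placeholder logic for the sketch; the real check lives in sktime
    return not isinstance(estimator, type)


def is_initalised_estimator(estimator) -> bool:
    """Deprecated alias of is_initialised_estimator, kept for one deprecation cycle."""
    warnings.warn(
        "is_initalised_estimator is deprecated and will be removed in a future "
        "release; use is_initialised_estimator instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return is_initialised_estimator(estimator)
```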

@@ -39,7 +39,7 @@
"EditDist",
"CNNClassifier",
"FCNClassifier",
"InceptionTimeClassifer",
"InceptionTimeClassifier",
Collaborator

this has an impact on testing, but it is fine (the estimator should be excluded but was not)

@@ -79,7 +79,7 @@ def _make_series(
index = _make_index(n_timepoints, index_type)
if n_columns == 1 and return_mtype is None or return_mtype == "pd.Series":
return pd.Series(data.ravel(), index)
-elif return_mtype is None or return_mtype == "pd.DataFrane":
+elif return_mtype is None or return_mtype == "pd.DataFrame":
Collaborator

oh, this changes the logic. I hope it does not break the test...

@fkiraly (Collaborator) left a comment

Thanks a lot, very useful!

Blocking:

  • a small number of suggested changes are false positives, I marked them
  • small number of instances: we must not change names of public functions or variables without deprecation - that's perhaps something for a separate PR where we respect the deprecation procedure: https://www.sktime.net/en/stable/developer_guide/deprecation.html
  • but just to reinforce, changes to private function and variable names, including tests that do not appear in public checker modules, are fine.

Non-blocking, but perhaps an idea for another PR:

it would be nice to have this as an automated pre-commit action or a regularly, automatically opened PR! We would of course have to think about protecting function names, known false positives, etc.
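
For illustration, one lightweight way to approximate this without a separate workflow would be a test that shells out to codespell with the same skip and ignore lists used in this PR. A rough sketch, assuming codespell is available on the runner; the ignore list shown is abbreviated and would need curating against the false positives discussed above:

```python
import shutil
import subprocess

import pytest


@pytest.mark.skipif(shutil.which("codespell") is None, reason="codespell not installed")
def test_no_new_typos():
    """Fail if codespell reports typos outside the skipped paths and ignore list."""
    result = subprocess.run(
        [
            "codespell",
            "-S", "docs,./sktime/datasets",  # paths to skip, as in this PR
            "-L", "fpr,mape,ser,te,ot",  # abbreviated ignore list of known false positives
        ],
        capture_output=True,
        text=True,
    )
    # codespell exits with a non-zero status when it finds anything
    assert result.returncode == 0, f"codespell found typos:\n{result.stdout}"
```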

@fkiraly fkiraly changed the title from [DOC] Fix typos to [DOC] Programmatically fix (all) typos on Oct 14, 2023
@fkiraly fkiraly added the documentation (Documentation & tutorials) and maintenance (Continuous integration, unit testing & package distribution) labels on Oct 14, 2023
@fkiraly (Collaborator) commented Oct 14, 2023

@kianmeng, I'd say you can get both the doc and maintenance badges for this - the maintenance one as this is a programmatic way to fix all the typos.

@szepeviktor (Contributor)

@kianmeng There is a mistake in the last commit.
"applicates" is okay.

fkiraly previously approved these changes Oct 21, 2023

@fkiraly (Collaborator) left a comment

Approved, thanks a lot!

Would be nice to have this as a regular CI element.

@fkiraly (Collaborator) commented Oct 21, 2023

FYI @yarnabrina, I updated this PR with the new CI workflow, and (as expected) it triggers tests for all modules.

However, there are a number of timeout failures which I cannot diagnose - any idea what is going on here?
Noting that the timeouts do not seem to occur on the full diagnostic run, or on other PRs.

@yarnabrina (Collaborator)

@fkiraly, as far as I can see, these are the errors in the new CI workflow: https://github.com/sktime/sktime/actions/runs/6599294180

There are two clear patterns:

  1. transformations failed in python 3.10 on all operating systems, complaining that SignatureTransformer does not support 3.10 and above. This looks like a proper error, as the python_version tag seems to be ignored.

  2. networks failed on all python versions on all operating systems, as pytest failed to collect any unit tests. It seems that directory indeed has no unit tests in its tests sub-directory, which is very surprising to me. I'm not very familiar with this component, so can you comment on that please? If it's decided to skip tests on this, we can remove it from CI.

  3. I couldn't find any timeout, can you please share a link?

Regarding failures in the old CI workflow, one seems to come from an occasional pytest-xdist parallelisation problem, and the other, from classification, looks like a proper error. But given that it didn't occur in the new CI or in any other job even in the old CI, it may be something of a sporadic nature? I'm not sure, so I cannot really comment.


Also, I note a few other things:

  1. The new CI is not testing all directories, as I created extras for a few major ones only. Should we create new extras and/or plan how to integrate these into the new CI, if you and other core developers agree to eventually replace the old CI?

  2. This PR fixes a lot of typos automatically in all components. I did the same thing in two of my open diagnostic PRs to test the new CI: [MNT] fix typos in base module #5313 and [MNT] fix typos in forecasting module #5314, which I planned to merge. Do you want me to close those in favour of this PR ([DOC] Programmatically fix (all) typos #5424), which handles a lot more, to avoid conflict/confusion/etc.?

  3. This is a question for @kianmeng. I see you are using both codespell and typos. I'm not familiar with typos, but codespell does have pre-commit hooks. In my experience, though, it's difficult to automatically control skipping the spell check for a few words or lines with codespell, as a word may indeed be wrong by typographical mistake in one place while being intentionally introduced somewhere else (not saying it's recommended, but I know it happens sometimes, and indeed I've seen it in our codebase as well). As @fkiraly has asked above as well, do you have any suggestions on how to handle all these cases automatically? Does typos help here in any way?

@kianmeng (Contributor, Author)

@yarnabrina Unfortunately, nothing I can think of - perhaps @szepeviktor has a better approach?

@fkiraly (Collaborator) commented Oct 22, 2023

hm, since the "other things" discussion is off-topic for this PR, should we move to #5101? I'll reply there to the discussion that is not specific to failures in this PR's CI.

@fkiraly (Collaborator) commented Oct 22, 2023

> @fkiraly, as far as I can see, these are the errors in the new CI workflow: https://github.com/sktime/sktime/actions/runs/6599294180

Yes.

> 1. transformations failed in python 3.10 on all operating systems, complaining that SignatureTransformer does not support 3.10 and above. This looks like a proper error, as the python_version tag seems to be ignored.

Hm, yes. So it is either an error with the new CI, or there are at least two errors, since it does not error out on the old CI. I.e., it is either:

  • an error in the new CI: it collects estimators that it should not
  • an error in the old CI: it does not collect estimators that it should, so it did not detect a second error relating to the python version check.

A third scenario, where the trafo is used as a component, is unlikely, as the error message is not from the constructor.

> 2. networks failed on all python versions on all operating systems, as pytest failed to collect any unit tests. It seems that directory indeed has no unit tests in its tests sub-directory, which is very surprising to me. I'm not very familiar with this component, so can you comment on that please? If it's decided to skip tests on this, we can remove it from CI.

The networks module is under construction, @luca-miniati and @achieveordie are afaik the most knowledgeable about it. It contains networks used in estimators in other modules, so most - if not all - tests will be indirect, by integration in other estimators.

Mid-term, clearly, we want to have systematic tests for the network module, too.

> I couldn't find any timeout, can you please share a link?

Sorry, confusion on my side - I did not interpret the error logs correctly.

@fkiraly (Collaborator) commented Oct 22, 2023

FYI @kianmeng, most of the above does not have to do with your PR, so there is no action needed on your side - we are just using it to test new CI elements.

@fkiraly (Collaborator) commented Oct 22, 2023

> Do you want me to close those in favour of this PR (#5424), which handles a lot more, to avoid conflict/confusion/etc.?

I was aware, @yarnabrina - my idea was, we first use this PR to test the full CI - like you were doing in the two other PRs - and then merge the "typo fixing" PRs in this sequence:

  • this
  • the two of yours

In particular, I would not close yours.
(I also hoped this would work silently if there are no issues coming from the new CI, but it didn't...)

Why: the fixes may not be identical, and in merging, any conflicts have a good chance of indicating actual errors in the typo fixes.
This also tests the new CI triggers for file changes.

@fkiraly (Collaborator) commented Oct 22, 2023

Re SignatureTransformer, this check works as it should:

from sktime.utils.validation._dependencies import _check_estimator_deps
from sktime.transformations.panel.signature_based import SignatureTransformer

_check_estimator_deps(SignatureTransformer)

so I now find it more likely that there's something in collection in the new CI?

@achieveordie (Collaborator)

> It contains networks used in estimators in other modules, so most - if not all - tests will be indirect, by integration in other estimators.

That's correct. The networks module is an abstraction of the TensorFlow / PyTorch implementations of various estimators that are used by the classifier/regressor modules. The networks contain the model implementation, which will be the same for both kinds of estimators, minus the head.

Therefore, there are no tests for this module.

@fkiraly (Collaborator) commented Oct 22, 2023

> Therefore, there are no tests for this module.

Well, there could be...

  • testing build_network runs
  • testing tags and the like
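
For illustration, such a smoke test could be parametrised over the network classes. A rough sketch, in which the class list and the build_network signature are assumptions to be checked against the networks module:

```python
import pytest

# hypothetical list of network classes to cover; the real list would come from
# the sktime.networks module or a registry lookup
NETWORK_CLASSES = []  # e.g. [CNNNetwork, FCNNetwork, ...]


@pytest.mark.parametrize("network_cls", NETWORK_CLASSES)
def test_build_network_runs(network_cls):
    """Smoke test: constructing the network and calling build_network should not error."""
    network = network_cls()
    # assumed call signature build_network(input_shape); adjust to the actual interface
    layers = network.build_network((10, 3))
    assert layers is not None
```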

@achieveordie (Collaborator)

Wouldn't we end up testing the same module twice during a single run? Once directly via networks/tests and once indirectly by other modules that use these networks?

@yarnabrina (Collaborator) commented Oct 22, 2023

> so I now find it more likely that there's something in collection in the new CI?

I am biased for sure, so if you don't mind, can you please share your python version and operating system? And whether esig is installed in the environment or not?


My reasoning for why the old CI does not fail is this:

'esig==0.9.7; python_version < "3.10"'

This is from all_extras_pandas2, which does not even install esig in case the python version is >=3.10. So the test is not collected.

Based on my understanding, this should not be the general solution, as the same soft dependency may be required for other estimators. I was thinking that python version support in extras should not be affected by specific estimators and should be handled based on tags only, which does not seem to be working. Let me know if I am making sense.
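
For illustration, the tag-based route would read the version bound off the estimator itself rather than encoding it in the extras. A small sketch, assuming the tag is named python_version as discussed above and that get_class_tag is available on the estimator class:

```python
from sktime.transformations.panel.signature_based import SignatureTransformer

# the python_version tag holds a version specifier such as "<3.10" (value shown is assumed)
bound = SignatureTransformer.get_class_tag("python_version", tag_value_default=None)
print(bound)
```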

@fkiraly (Collaborator) commented Oct 22, 2023

> can you please share your python version and operating system? And whether esig is installed in the environment or not?

here's my show_versions output:

System:
    python: 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]
executable: c:\ProgramData\anaconda3\envs\sktime-skbase-311\python.exe
   machine: Windows-10-10.0.22621-SP0

Python dependencies:
          pip: 23.2.1
       sktime: 0.24.0
      sklearn: 1.3.1
       skbase: 0.6.0
        numpy: 1.25.2
        scipy: 1.11.2
       pandas: 2.1.1
   matplotlib: 3.8.0
       joblib: 1.3.2
        numba: None
  statsmodels: 0.14.0
     pmdarima: 2.0.3
statsforecast: None
      tsfresh: None
      tslearn: None
        torch: None
   tensorflow: None
tensorflow_probability: None

I do not have esig installed.

The error message I get is this, as expected:
ModuleNotFoundError: type requires python version to be <3.10, but system python version is 3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)].

(there seems to be an issue with the print, it should be estimator name, but logic seems ok)

@yarnabrina (Collaborator)

> (there seems to be an issue with the print, it should be estimator name, but logic seems ok)

I agree. The new CI fails with the same error as well, so I assumed the logic in test collection is okay.

=========================== short test summary info ============================
FAILED sktime/transformations/panel/signature_based/tests/test_method.py::test_generalised_signature_method - ModuleNotFoundError: SignatureTransformer requires python version to be <3.10, but system python version is 3.10.13 (main, Aug 28 2023, 08:28:42) [GCC 11.4.0].
FAILED sktime/transformations/panel/signature_based/tests/test_method.py::test_window_error - ModuleNotFoundError: SignatureTransformer requires python version to be <3.10, but system python version is 3.10.13 (main, Aug 28 2023, 08:28:42) [GCC 11.4.0].

My interpretation is this needs to be changed:

@pytest.mark.skipif(
    not _check_soft_dependencies("esig", severity="none"),
    reason="skip test if required soft dependency esig not available",
)
def test_generalised_signature_method():

Since SignatureTransformer has a specific python version requirement, viz <3.10, it should use _check_estimator_deps instead of _check_soft_dependencies, and that should solve this error.
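
For illustration, the suggested change would look roughly as follows. A sketch assuming _check_estimator_deps accepts a severity argument analogous to _check_soft_dependencies; fkiraly suggests run_test_for_class as the preferred mechanism in the next comment:

```python
import pytest

from sktime.transformations.panel.signature_based import SignatureTransformer
from sktime.utils.validation._dependencies import _check_estimator_deps


@pytest.mark.skipif(
    # checks both soft dependencies and the python version bound of the estimator
    not _check_estimator_deps(SignatureTransformer, severity="none"),
    reason="skip test if esig or the required python version is not available",
)
def test_generalised_signature_method():
    ...
```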

@fkiraly (Collaborator) commented Oct 22, 2023

> it should use _check_estimator_deps instead of _check_soft_dependencies, and that should solve this error.

Yes, it should use run_test_for_class, actually.

My explanation though would be, can the python bound on SignatureTransformer be relaxed? But that's off-topic.

Fix is here: #5474

@fkiraly (Collaborator) left a comment

Failures are due to the new CI, and we have now understood them - so, merging.

@fkiraly fkiraly merged commit 416acec into sktime:main Oct 22, 2023
147 of 165 checks passed
@fkiraly (Collaborator) commented Oct 22, 2023

@kianmeng, you could get the doc and maintenance badges if you would like.

@kianmeng (Contributor, Author)

🥳 🥳 🥳 🥳 🥳

yarnabrina added a commit to yarnabrina/sktime-fork that referenced this pull request Oct 25, 2023
…heck

* origin/main:
  [ENH] Add tecator dataset for time series regression as `sktime` onboard dataset (sktime#5428)
  [MNT] fix typos in `forecasting` module (sktime#5314)
  [MNT] fix typos in `base` module (sktime#5313)
  [DOC] Programmatically fix (all) typos (sktime#5424)
yarnabrina pushed a commit that referenced this pull request Oct 27, 2023
This PR fixes a minor issue in one of the error messages in
`_check_python_version`.

Example failure here:
#5424 (comment)
fkiraly pushed a commit that referenced this pull request Oct 29, 2023
#### Reference Issues/PRs

Follow-up #5424
Thank you @kianmeng

#### What does this implement/fix? Explain your changes.

Found a few more misspellings.
@fkiraly (Collaborator) commented Oct 29, 2023

> @kianmeng, you could get the doc and maintenance badges if you would like.

For the badges, you have to make a PR to .all-contributorsrc, @kianmeng

Labels
documentation (Documentation & tutorials), maintenance (Continuous integration, unit testing & package distribution)
Projects
None yet
5 participants