Fix mypy errors #16726

Merged
merged 19 commits into from Mar 30, 2020

Member

@rth rth commented Mar 19, 2020

Solves mypy errors and closes #12953

Solving these errors is a prerequisite for considering the addition of some light type annotations (#16705).

The full error log on master can be found below (58 errors):

mypy sklearn/ --ignore-missing-imports
sklearn/externals/_arff.py:331: error: Incompatible types in assignment (expression has type "str", base class "ArffException" defined the type as "None")
sklearn/externals/_arff.py:335: error: Incompatible types in assignment (expression has type "str", base class "ArffException" defined the type as "None")
sklearn/externals/_arff.py:349: error: Incompatible types in assignment (expression has type "str", base class "ArffException" defined the type as "None")
sklearn/externals/_arff.py:386: error: Incompatible types in assignment (expression has type "str", base class "ArffException" defined the type as "None")
sklearn/externals/_arff.py:390: error: Incompatible types in assignment (expression has type "str", base class "ArffException" defined the type as "None")
sklearn/externals/_arff.py:394: error: Incompatible types in assignment (expression has type "str", base class "ArffException" defined the type as "None")
sklearn/utils/fixes.py:54: error: Name 'pinvh' already defined (possibly by an import)
sklearn/utils/fixes.py:151: error: All conditional function variants must have identical signatures
sklearn/utils/fixes.py:173: error: Name 'MaskedArray' already defined on line 159
sklearn/__init__.py:63: error: Cannot determine type of '__SKLEARN_SETUP__'
sklearn/utils/_pprint.py:327: error: "Type[PrettyPrinter]" has no attribute "_dispatch"
sklearn/model_selection/_split.py:2147: error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no attribute "__test__"
sklearn/svm/_classes.py:973: error: Decorated property not supported
sklearn/svm/_classes.py:980: error: Decorated property not supported
sklearn/svm/_classes.py:1307: error: Decorated property not supported
sklearn/svm/_classes.py:1314: error: Decorated property not supported
sklearn/svm/_base.py:6: error: Module 'sklearn.svm' has no attribute '_libsvm'
sklearn/svm/_base.py:7: error: Module 'sklearn.svm' has no attribute '_liblinear'
sklearn/svm/_base.py:8: error: Module 'sklearn.svm' has no attribute '_libsvm_sparse'
sklearn/linear_model/_stochastic_gradient.py:290: error: Decorated property not supported
sklearn/linear_model/_stochastic_gradient.py:296: error: Decorated property not supported
sklearn/linear_model/_stochastic_gradient.py:302: error: Decorated property not supported
sklearn/linear_model/_stochastic_gradient.py:308: error: Decorated property not supported
sklearn/linear_model/_least_angle.py:22: error: Module 'sklearn.utils' has no attribute 'arrayfuncs'
sklearn/linear_model/_coordinate_descent.py:28: error: Module 'sklearn.linear_model' has no attribute '_cd_fast'
sklearn/covariance/_graph_lasso.py:23: error: Module 'sklearn.linear_model' has no attribute '_cd_fast'
sklearn/manifold/_isomap.py:170: error: Decorated property not supported
sklearn/manifold/_t_sne.py:25: error: Module 'sklearn.manifold' has no attribute '_barnes_hut_tsne'
sklearn/ensemble/_base.py:109: error: Need type annotation for '_required_parameters' (hint: "_required_parameters: List[<type>] = ...")
sklearn/dummy.py:398: error: Decorated property not supported
sklearn/dummy.py:625: error: Decorated property not supported
sklearn/tests/test_random_projection.py:28: error: Unsupported operand types for + ("List[Callable[[Any, Any, Any, Any], Any]]" and "List[Callable[[Any, Any, Any], Any]]")
sklearn/tests/test_random_projection.py:33: error: Unsupported operand types for + ("List[Type[SparseRandomProjection]]" and "List[Type[GaussianRandomProjection]]")
sklearn/experimental/enable_iterative_imputer.py:18: error: Module has no attribute "IterativeImputer"
sklearn/svm/tests/test_svm.py:31: error: Module 'sklearn.svm' has no attribute '_libsvm'
sklearn/manifold/tests/test_t_sne.py:24: error: Module 'sklearn.manifold' has no attribute '_barnes_hut_tsne'
sklearn/impute/tests/test_impute.py:21: error: Module 'sklearn.impute' has no attribute 'IterativeImputer'
sklearn/impute/tests/test_common.py:12: error: Module 'sklearn.impute' has no attribute 'IterativeImputer'
sklearn/experimental/enable_hist_gradient_boosting.py:29: error: Module has no attribute "HistGradientBoostingClassifier"; maybe "GradientBoostingClassifier"?
sklearn/experimental/enable_hist_gradient_boosting.py:30: error: Module has no attribute "HistGradientBoostingRegressor"; maybe "GradientBoostingRegressor"?
sklearn/ensemble/tests/test_forest.py:105: error: Argument 1 to "update" of "dict" has incompatible type "Dict[str, Type[ForestRegressor]]"; expected "Mapping[str, Type[ForestClassifier]]"
sklearn/ensemble/tests/test_forest.py:106: error: Argument 1 to "update" of "dict" has incompatible type "Dict[str, Type[RandomTreesEmbedding]]"; expected "Mapping[str, Type[ForestClassifier]]"
sklearn/ensemble/tests/test_forest.py:109: error: Argument 1 to "update" of "dict" has incompatible type "Dict[str, Type[ForestRegressor]]"; expected "Mapping[str, Type[ForestClassifier]]"
sklearn/ensemble/tests/test_forest.py:1262: error: Variable "sklearn.ensemble.tests.test_forest.DEFAULT_JOBLIB_BACKEND" is not valid as a type
sklearn/ensemble/tests/test_forest.py:1262: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
sklearn/ensemble/tests/test_forest.py:1262: error: Invalid base class "DEFAULT_JOBLIB_BACKEND"
sklearn/tests/test_pipeline.py:38: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingClassifier'; maybe "GradientBoostingClassifier"?
sklearn/tests/test_pipeline.py:1146: error: "_BaseComposition" has no attribute "steps"
sklearn/tests/test_docstring_parameters.py:36: error: Module has no attribute "__path__"
sklearn/model_selection/tests/test_search.py:71: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingClassifier'; maybe "GradientBoostingClassifier"?
sklearn/feature_selection/tests/test_from_model.py:14: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingClassifier'; maybe "GradientBoostingClassifier"?
sklearn/ensemble/_hist_gradient_boosting/tests/test_warm_start.py:12: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingRegressor'; maybe "GradientBoostingRegressor"?
sklearn/ensemble/_hist_gradient_boosting/tests/test_warm_start.py:13: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingClassifier'; maybe "GradientBoostingClassifier"?
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py:12: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingRegressor'; maybe "GradientBoostingRegressor"?
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py:13: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingClassifier'; maybe "GradientBoostingClassifier"?
sklearn/ensemble/_hist_gradient_boosting/tests/test_compare_lightgbm.py:9: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingRegressor'; maybe "GradientBoostingRegressor"?
sklearn/ensemble/_hist_gradient_boosting/tests/test_compare_lightgbm.py:10: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingClassifier'; maybe "GradientBoostingClassifier"?
sklearn/inspection/tests/test_partial_dependence.py:19: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingClassifier'; maybe "GradientBoostingClassifier"?
sklearn/inspection/tests/test_partial_dependence.py:20: error: Module 'sklearn.ensemble' has no attribute 'HistGradientBoostingRegressor'; maybe "GradientBoostingRegressor"?
Found 58 errors in 31 files (checked 600 source files)

In general, the errors are due to mypy not being able to determine a type. That happens very rarely, since by default anything unknown is of type Any. In particular, the following types of errors are found:

  • mypy needing some help with specifying types manually
  • unsupported features that can also be ignored, such as:
    • decorated properties
    • importing C extensions (only some, as far as I can tell). Maybe it's due to the way we build them, and it could deserve more investigation.
    • some non-standard edge cases in tests that don't matter and where typing can be ignored.
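
As an illustration of the first category, here is a minimal sketch of how an explicit annotation can resolve an error like "Need type annotation for '_required_parameters'". The class names are hypothetical stand-ins, not the actual scikit-learn classes, and this is not necessarily the exact change made in this PR.

```python
from typing import List


class BaseEnsembleSketch:
    # Hypothetical stand-in class. Without the annotation, mypy cannot infer
    # the element type of the empty list and reports:
    #   Need type annotation for '_required_parameters'
    _required_parameters: List[str] = []


class VotingSketch(BaseEnsembleSketch):
    # Subclasses can override the attribute without repeating the annotation.
    _required_parameters = ["estimators"]
```

The annotation only helps the type checker; runtime behaviour is unchanged.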

mypy is also added in CI. I don't have a strong feeling about it. pandas does this. I would propose to try it and, if it's too annoying for any reason in PRs, disable it.

CC maybe @thomasjpfan @jnothman

@@ -271,7 +271,7 @@
 _FILE_CONTENT_TEMPLATE = """
 # THIS FILE WAS AUTOMATICALLY GENERATED BY deprecated_modules.py
 import sys
-from . import {new_module_name}
+from . import {new_module_name} # type: ignore
Member Author

These are often C extensions (which raise mypy type errors), and since this module is deprecated it doesn't matter anyway.

@@ -395,7 +395,7 @@ def score(self, X, y, sample_weight=None):
 X = np.zeros(shape=(len(y), 1))
 return super().score(X, y, sample_weight)

-@deprecated(
+@deprecated( # type: ignore
Member Author

Generally, decorators applied to properties are not supported by mypy.
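
A minimal sketch of the pattern being silenced here. The `deprecated_property` decorator and the `Model` class are hypothetical and much simpler than scikit-learn's own `deprecated` helper; the point is only that a decorator stacked on top of `@property` works at runtime while mypy reports "Decorated property not supported" unless the line is ignored.

```python
import functools
import warnings


def deprecated_property(message):
    """Hypothetical decorator that wraps a property and warns on access."""
    def decorator(prop):
        # `prop` is the property object produced by @property below.
        @functools.wraps(prop.fget)
        def getter(self):
            warnings.warn(message, FutureWarning)
            return prop.fget(self)
        return property(getter)
    return decorator


class Model:
    # mypy error: Decorated property not supported
    @deprecated_property("old_attr is deprecated")  # type: ignore
    @property
    def old_attr(self):
        return 42
```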

HistGradientBoostingClassifier # type: ignore
)
ensemble.HistGradientBoostingRegressor = (
HistGradientBoostingRegressor # type: ignore
Member Author

mypy errors with a message that the module has no attribute HistGradientBoostingRegressor.

Member Author

rth commented Mar 19, 2020

If we want to use type hints, we should also add some documentation for them, similar to https://pandas.pydata.org/docs/development/contributing.html?highlight=mypy#type-hints. I would rather not do it in this PR, however, so maybe we should disable mypy in CI in the meantime...

Member

@adrinjalali adrinjalali left a comment

I don't know why a bunch of the places in this PR need the type: ignore directive. Should we put a comment before every single one, or should we put a note somewhere in maintainer.rst or something and explain where those are needed?

- bash: conda create --name flake8_env --yes flake8
- bash: |
conda create --name flake8_env --yes python=3.8
source activate flake8_env
Member

source activate is deprecated though.

Comment on lines 7 to 13
if typing.TYPE_CHECKING:
    # Workaround for type checkers (e.g. mypy) to avoid
    # import errors for experimental estimators.
    # TODO: remove the above check once the estimator is no longer
    # experimental.
    from ._iterative import IterativeImputer  # noqa

Member

Should we add this to where we explain how to have an experimental module? Is it in maintainer.rst?

Member Author

The section on experimental estimators says to copy and adapt the code from HistGradientBoosting and/or IterativeImputer, so I think there is no need for more specific documentation.
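
For readers unfamiliar with the `typing.TYPE_CHECKING` workaround discussed in this thread, here is a small standalone sketch. It uses a standard-library import in place of the experimental estimator, and the `describe` function is hypothetical. The guarded import is seen by static type checkers but never executed at runtime.

```python
import typing

if typing.TYPE_CHECKING:
    # Only static type checkers (e.g. mypy) evaluate this branch; at runtime
    # typing.TYPE_CHECKING is False, so the import is skipped.
    from fractions import Fraction  # noqa


def describe(value: "Fraction") -> str:
    # The annotation is a string, so the name does not need to exist at runtime.
    return f"value={value}"
```

In the experimental-estimator case this lets mypy resolve the name, while users still have to import the enabling module explicitly at runtime.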

Member Author

rth commented Mar 21, 2020

Thanks @adrinjalali !

I don't know why a bunch of the places in this PR need the type: ignore directive. Should we put a comment before every single one, or should we put a note somewhere in maintainer.rst or something and explain where those are needed?

I added a comment above each line where type: ignore was needed. I also applied workarounds to remove some of them.

In the end I also added a sentence about the workarounds needed for experimental modules to maintainer.rst, and a short section about mypy and type annotations to the contributing guide.

Member

@ogrisel ogrisel left a comment

LGTM but I do not understand the following:

pckg[1] for pckg in walk_packages(
prefix='sklearn.',
# mypy error: Module has no attribute "__path__"
path=sklearn.__path__) # type: ignore
Member

Why is this causing a typing error?

Member Author

It's a known mypy issue, python/mypy#1422. For now the workaround is to ignore it. I'll add an inline comment.
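
A self-contained sketch of the same pattern, using the standard-library `json` package instead of `sklearn` so that it runs without scikit-learn installed. Whether a given mypy version still reports the `__path__` error is not something this thread establishes; the ignore comment simply mirrors the workaround above.

```python
import json
from pkgutil import walk_packages

# Collect the dotted names of all submodules of the `json` package.
module_names = [
    pckg[1] for pckg in walk_packages(
        prefix="json.",
        # mypy error (per the linked issue): Module has no attribute "__path__"
        path=json.__path__)  # type: ignore
]
```

Each item yielded by `walk_packages` is a `(module_finder, name, ispkg)` tuple, so index 1 is the module name.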

@@ -22,7 +22,8 @@
 from ..decomposition import PCA
 from ..metrics.pairwise import pairwise_distances
 from . import _utils
-from . import _barnes_hut_tsne
+# mypy error: Module 'sklearn.manifold' has no attribute '_barnes_hut_tsne'
+from . import _barnes_hut_tsne # type: ignore
Member

Why is _barnes_hut_tsne causing a typing error?

Member Author

Why is _barnes_hut_tsne causing a typing error?

There is some mypy issue with importing full C extension modules (but not individual objects from these modules). I couldn't find relevant information in the mypy issue tracker, so I opened python/mypy#8575.

Member Author

The summary of the mypy issue linked above is that this is expected behaviour:

Mypy doesn't actually import or interact with C extension code. It can only parse and gain type information from .py and .pyi files.

So it sounds like all C extensions need to be ignored for typing in some way, either explicitly with type: ignore or via the --ignore-missing-imports option.

@WillAyd since you worked on this in pandas, what's the current approach for mypy and C extensions there? As far as I can tell, pandas uses neither of the above options, and yet I don't see any C-extension-related errors when running mypy pandas, nor errors about missing stubs for dependencies, such as

sklearn/cluster/_spectral.py:10: error: Skipping analyzing 'numpy': found module but no type hints or library stubs

which we currently skip in this PR using --ignore-missing-imports. Thanks!

For better or worse we set ignore_missing_imports=True in our config, so that this problem is "solved" globally rather than having to ignore individual imports

Member Author

Ahh, indeed, I finally found the pandas mypy config. I was previously grepping for option names with "-" instead of "_". Thanks for confirming @WillAyd!

For the record, the above-mentioned # type: ignore is still necessary, since this particular import doesn't get skipped by the global --ignore-missing-imports option, as discussed in the mypy issue linked above.
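
To make the "-" versus "_" distinction concrete: the command-line flag is `--ignore-missing-imports`, while the config-file key is `ignore_missing_imports` in the `[mypy]` section. The sketch below parses a small INI snippet with Python's `configparser` purely to show the key's shape. The snippet is an assumption for illustration, not the actual pandas or scikit-learn configuration file.

```python
import configparser

# Hypothetical contents of a mypy config section (e.g. in setup.cfg or mypy.ini).
MYPY_CONFIG = """
[mypy]
ignore_missing_imports = True
"""

parser = configparser.ConfigParser()
parser.read_string(MYPY_CONFIG)

# The config key uses underscores, unlike the CLI flag --ignore-missing-imports.
ignore_missing = parser.getboolean("mypy", "ignore_missing_imports")
```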

Member

@thomasjpfan thomasjpfan left a comment

Thank you @rth

This feels like it can be maintainable. If passing mypy becomes a significant burden for contributors, we can turn it off.

doc/developers/maintainer.rst (review comment resolved)
Member Author

rth commented Mar 24, 2020

Thanks @thomasjpfan, I added a comment to the documentation about mypy workarounds as you suggested. Please let me know if you have other comments. Also cc @adrinjalali

Member

@adrinjalali adrinjalali left a comment

Thank you @rth

Member Author

rth commented Mar 30, 2020

Merging with +2, also because I didn't hear many objections to this when it was mentioned in the dev meeting today.

Please ping me if mypy linting becomes too annoying in PRs.

@rth rth merged commit 5a43408 into scikit-learn:master Mar 30, 2020
@rth rth deleted the fix-mypy-errors branch March 30, 2020 13:41
Member Author

rth commented Mar 30, 2020

Also, we should have a discussion in #16705 about which parts of the code should get (moderate) type annotations first.

gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020
Development

Successfully merging this pull request may close these issues.

Investigate mypy errors
5 participants