Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure mypy and typing stubs for custom_ops, fix errors #711

Merged
merged 11 commits into from Apr 15, 2022
Merged

Conversation

mcwitt
Copy link
Collaborator

@mcwitt mcwitt commented Apr 8, 2022

  • Adds mypy check to pre-commit. This adds linting based on type annotations, e.g. ensuring annotations are correct and catching logic errors exposed by annotations. Initially configured with the following options:
  • Fixes errors exposed by the mypy check, e.g.
  • Adds auto-generated stub file for the custom_ops extension module and updates the CMake config to generate the stub file using mypy-stubgen
    • the stub file contains signatures extracted from the compiled .so file, and enables typechecking and auto-completion for timemachine.lib.custom_ops
    • Note that the docstrings defined in wrap_kernels.cpp are currently not included. There is an alternative to mypy-stubgen, pybind11-stubgen, that does include docstrings in the stubs -- however, there were other issues with this (e.g. needing to replace s/timemachine:://g for syntactically-valid stubs), and it seemed better to stick with the official mypy-stubgen
    • Adds pre-commit check that the stub file is up-to-date (example failure output) and a step in the CI unit-tests stage to verify this Note: the pre-commit check for stubs is skipped in CI, because it requires the .so file to be built, and this is not feasible in the linting stage. Instead, this adds a check to unit-tests, after the module has been built building the extension module as part of a pre-commit check does work and doesn't seem as unreasonable as I'd originally thought -- this is the method the PR is currently using

Review notes

  • Most of additions are from the auto-generated custom_ops.pyi file. This probably doesn't need careful review, as it's only used for typechecking and completions

  • Even if we choose not to adopt mypy as a pre-commit check, we might consider merging the generated stub file and the annotation fixes. The mypy check should be fully contained in 157cd7f.

Some common warnings and fixes

Using the same name for variables of different types

See docs

a = [1, 2, 3]
a = np.array(a)

Fix

a_ = [1, 2, 3]
a = np.array(a_)

Initializing empty container and appending

See docs

a = []
a.append(1)

Fix

a: List[int] = []
a.append(1)

or

a: List[Any] = []
a.append(1)

or (not recommended)

a: Any = []
a.append(1)

@mcwitt mcwitt force-pushed the mypy branch 5 times, most recently from aa44804 to 028a261 Compare April 9, 2022 02:33
timemachine/potentials/jax_utils.py Outdated Show resolved Hide resolved
setup.py Outdated
@@ -104,6 +104,7 @@ def build_extension(self, ext):
"isort==5.10.1",
"mypy==0.942",
"pre-commit==2.17.0",
"pybind11-stubgen==0.10.5",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks useful, independent of whether we adopt mypy

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up switching to use stubgen bundled with mypy; unfortunately this doesn't include docstrings defined in the pybind11 wrapping code, but seemed more reliable in my testing (for example, pybind11-stubgen generated syntactically-invalid .pyi files with timemachine:: in types)

generate_stubs.sh Outdated Show resolved Hide resolved
timemachine/md/minimizer.py Outdated Show resolved Hide resolved
@mcwitt mcwitt force-pushed the mypy branch 4 times, most recently from 46a947e to 666788c Compare April 12, 2022 15:59
@mcwitt mcwitt changed the title Configure mypy in pre-commit, fix errors Configure mypy and typing stubs for custom_ops, fix errors Apr 12, 2022
@mcwitt mcwitt force-pushed the mypy branch 10 times, most recently from a85214c to 0e86f68 Compare April 12, 2022 22:32
.gitlab-ci.yml Outdated
@@ -44,6 +44,7 @@ lint:
- if: $CI_EXTERNAL_PULL_REQUEST_IID
- if: $NIGHTLY_TESTS
script:
- pip install -r ci/requirements.txt
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unfortunate, since we already specify the mypy dependency in 2 places (dev extras in setup.py and build system depends in pyproject.toml), but neither of these can be used here because I think we want to avoid building the package (this would add significant time to the linting stage)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is no longer required, since it is in the docker container?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; removed in 829e253

requires = ["setuptools>=43.0.0", "wheel", "cmake==3.22.1", "versioneer-518"]
requires = [
"cmake==3.22.1",
"mypy==0.942",
Copy link
Collaborator Author

@mcwitt mcwitt Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to add mypy here because we now use stubgen (part of mypy) in the CMake build to generate a stub file. Dependencies listed here are installed before the extension module builds, while those in setup.py are installed after.

@mcwitt mcwitt marked this pull request as ready for review April 12, 2022 23:17
@mcwitt mcwitt requested review from maxentile and badisa April 12, 2022 23:18
timemachine/md/moves.py Outdated Show resolved Hide resolved
@@ -75,6 +75,10 @@ RUN pip install --no-cache-dir pre-commit==2.17.0
COPY .pre-commit-config.yaml /code/timemachine/
RUN cd /code/timemachine && git init . && pre-commit install-hooks

# Install CI requirements
COPY ci/requirements.txt /code/timemachine/ci/requirements.txt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to expand this dockerfile to support more CI logic, we should create a separate layer just for CI. Otherwise right now these ci requirements will get baked into production containers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds good to me. We could also move over the pre-commit installation. Do you think it's worth doing in this PR or a separate one? (not sure how much work would be involved in updates to CI infra config)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be relatively straight forward, just replicate the 'final' docker stage and run the two CI specific commands within it and change the docker target in CI.

Seems best to do it in this PR, unless it isn't as trivial as I am imagining

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done in 5d36eb4 (and fixed ordering in ba2d125)

@@ -0,0 +1,150 @@
from typing import List, Optional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Does this need to be checked in? It seems like CMAKE will create this file on the fly?

Copy link
Collaborator Author

@mcwitt mcwitt Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like CMAKE will create this file on the fly?

Correct; the original reason to check it in was so it can be used by the mypy check during linting. But since we're now building the extension during linting anyway to verify that the stubs are up to date, seems like we might be able to avoid checking it in. I'll look into this.

Copy link
Collaborator Author

@mcwitt mcwitt Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another benefit of checking it in is to get typechecking and completions to work when developing the Python code on non-Linux machines. Not sure how common of a use case that is, though.. It seems like most of us are either developing locally on Linux, or running a remote IDE server on a Linux box?

@maxentile @proteneer @jkausrelay
do you know if having the stub file locally would be useful for your dev setups?

Copy link
Collaborator

@maxentile maxentile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes this suggests mostly look like improvements to me!

@@ -158,15 +158,6 @@ void declare_context(py::module &m) {
py::arg("store_du_dl_interval") = 0,
py::arg("store_x_interval") = 0)
// .def("multiple_steps", &timemachine::Context::multiple_steps)
.def(
"get_x_t",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice -- didn't realize this was duplicated!

Comment on lines 23 to 24
host_schedule: np.ndarray,
host_schedule: List[float],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mild surprise: I thought this was often an array

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call; I don't think we have any cases of passing an array in the timemachine code, but there are plenty of examples in the tests (we don't currently run mypy on tests; this seemed like maybe too much overhead).

I think a good general approach might be to use the most general type that has the capabilities exercised by the implementation. In this case, we use __getitem__ and __len__, which suggests Sequence might be appropriate. numpy.ndarray does not implement one of the required methods of Sequence (.index()), but we don't use that anyway. So, an appropriate type here might be something like Union[Sequence[float], NDArray].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this in ff5c69b

def validate_map(n_nodes: int, relative_inds: np.array, absolute_inds: np.array) -> bool:
def validate_map(n_nodes: int, relative_inds: List[Tuple[int, int]], absolute_inds: List[int]) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangent / curiosity: It's often convenient to use array containers and python containers interchangeably in this sort of code.

Is there a more generic type for this situation? (should accept: int array of shape (n_pairs, 2), a list of 2-tuples, etc... Initial thought is something like Union[Array, List[Tuple[int, int]]], but that seems counterproductive since more verbose, and since array typing is less precise)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial thought is something like Union[Array, List[Tuple[int, int]]]

I think this is a reasonable compromise -- we lose some power to ensure that the input is the correct shape, but this still contains useful information (if not an array, we know the input is a list of 2-tuples)

Copy link
Collaborator Author

@mcwitt mcwitt Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I think the most general type compatible with the implementation might be something like

validate_map(n_nodes: int, relative_inds: Iterable[Tuple[int, int]], absolute_inds: Collection[int]) -> bool

For example, this would work with an array for relative_inds and a tuple for absolute_inds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think this sort of generalization might be best done as an iterative process; trying to come up with the most general types initially is a fair amount of work and error-prone. For example, I might make the above update in a PR where I add some code that passes an array to relative_inds)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice: Iterable[Tuple[int, int]] seems both generic and precise

(I think this sort of generalization might be best done as an iterative process; trying to come up with the most general types initially is a fair amount of work and error-prone.

Agreed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants