Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stubgen: unify C extension and pure python stub generators with object oriented design #15770

Merged
merged 12 commits into from Oct 15, 2023

Conversation

chadrik
Copy link
Contributor

@chadrik chadrik commented Jul 27, 2023

This MR is a major overhaul to stubgen. It has been tested extensively in the process of creating stubs for multiple large and varied libraries (detailed below).

User story

The impetus of this change is as follows: as a maintainer of third-party stubs I do not want to use stubgen as a starting point for hand-editing stub files, I want a framework to regenerate stubs against upstream changes to a library.

Summary of Changes

  • Introduces an object-oriented design for C extension stub generation, including a common base class that is shared between inspection-based and parsing-based stub generation.
  • Generally unifies and harmonizes the behavior between inspection and parsing approaches. For example, function formatting, import tracking, signature generators, and attribute filtering are now handled with the same code.
  • Adds support for --include-private and --export-less to c-extensions (inspection-based generation).
  • Adds support for force enabling inspection-based stub generation (the approach used for C extensions) on pure python code using a new --inspect-mode flag. Useful for packages that employ dynamic function or class factories. Also makes it possible to generate stubs for pyc-only modules (yes, this is a real use case)
  • Adds an alias --no-analysis for --parse-only to clarify the purpose of this option.
  • Removes filtering of __version__ attribute from modules: I've encountered a number of cases in real-world code that utilize this attribute.
  • Adds a number of tests for inspection mode. Even though these run on pure python code they increase coverage of the C extension code since it shares much of hte same code base.

Below I've compiled some basic information about each stub library that I've created using my changes, and a link to the specialized code for procedurally generating the stubs.

Library code type other notes
USD boost-python integrates types from doxygen
katana pyc and C extensions uses epydoc docstrings. has pyi-only packages
mari pure python and C extensions uses epydoc docstrings
opencolorio pybind11
pyside2 shiboken
substance_painter pure python basic / non-custom. reads types from annotations
pymel pure python integrates types parsed from custom docs

I know that this is a pretty big PR, and I know it's a lot to go through, but I've spent a huge amount of time on it and I believe this makes mypy's stubgen tool the absolute best available. If it helps, I also have 13 merged mypy PRs under my belt and I'll be around to fix any issues if they come up.

@github-actions

This comment has been minimized.

@chadrik chadrik changed the title stubgen: unify inspection and parsing based stub generators with object oriented design stubgen: unify C extension and pure python stub generators with object oriented design Aug 1, 2023
@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@@ -59,7 +64,68 @@ def __eq__(self, other: Any) -> bool:
class FunctionSig(NamedTuple):
name: str
args: list[ArgSig]
ret_type: str
ret_type: str | None
Copy link
Contributor Author

@chadrik chadrik Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret_type is now optional to match ArgSig.type. When either of these is None, FunctionSig .format_sig will omit the type from the signature.

mypy/stubgen.py Show resolved Hide resolved
mypy/stubgen.py Show resolved Hide resolved
mypy/stubgen.py Show resolved Hide resolved
mypy/stubgen.py Show resolved Hide resolved
def __hash__(self) -> int: ...
def __index__(self) -> int: ...
def __int__(self) -> int: ...
def __ne__(self, other: object) -> bool: ...
def __setstate__(self, state: int) -> None: ...
Copy link
Contributor Author

@chadrik chadrik Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the AST-based stub generator specifically excludes __getstate__/__setstate__, so I think it makes sense for the inspection-based generator to also exclude these. I can't think of a circumstance where exposing these within stubs would be useful, but if we decide that it is then I think it makes sense to do it for both types of stub generators (which is sort of the theme of this PR).

mypy/stubgen.py Show resolved Hide resolved
) -> None:
"""Ensure that objects which implement old-style iteration via __getitem__
are considered iterable.
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add a test for this

@github-actions

This comment has been minimized.

4 similar comments
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

mypy/stubdoc.py Outdated
"""Return if this signature is the catchall identity: (*args, **kwargs) -> Any"""
return self.has_catchall_args() and self.ret_type in (None, "Any", "typing.Any")

def format_sig(self, any_val: str | None = None, suffix: str = ": ...") -> str:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previous the two types of stub generators each had their own code for formatting signatures. now they both use FunctionSig.format_sig

@github-actions

This comment has been minimized.

@@ -94,6 +94,7 @@
from mypy.visitor import NodeVisitor


@trait
@mypyc_attr(allow_interpreted_subclasses=True)
class TraverserVisitor(NodeVisitor[None]):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this a trait because ASTStubGenerator inherits from both BaseStubGenerator and this.

@@ -0,0 +1 @@
from . import basics as basics
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is a result of the improved inspection capabilities, and it is accurate. As you can see the sub-module is imported by default:

>>> import pybind11_mypy_demo
>>> dir(pybind11_mypy_demo)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'basics']

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why the content of the __init__.pyi actually differs for the two generated output folders:

stubgen-include-docs/pybind11_mypy_demo/__init__.pyi (this one is empty)

vs

stubgen/pybind11_mypy_demo/__init__.pyi (this one is not empty)

Is that expected / on purpose?

@github-actions

This comment has been minimized.

@chadrik
Copy link
Contributor Author

chadrik commented Aug 5, 2023

@ikonst @hauntsaninja @JelleZijlstra Can someone give this PR a look, please! It's totally awesome, I promise.

@@ -112,7 +112,6 @@ def run(self):
"stubtest.py",
"stubgenc.py",
"stubdoc.py",
"stubutil.py",
Copy link
Contributor Author

@chadrik chadrik Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One remaining question is "which stubgen modules to compile?"

Currently, stubutil contains the base classes and utilities for stubgen and stubgenc, however, it's a bit asymmetrical, because of the two leaf modules, only stubgen is compiled with mypyc.

            stubutil.so*
            /        \
     stubgenc.py    stubgen.so*

The reason for skipping compilation of these is provided in the comment above: "Skip these to reduce the size of the build".

Now that many of the supporting classes have moved into stubutil, which is now compiled, I'm tempted to skip compilation of stubgen, to strike a balance between size and performance:

            stubutil.so*
            /        \
     stubgenc.py    stubgen.py

Is there any concern about hurting performance of stubgen if I do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably fine to skip compiling stubgen. stubtest is already non-compiled, and in my experience it's plenty fast enough. Moreover, for stubgen, performance isn't nearly as much as an issue as with mypy proper or stubtest. Unlike with mypy proper or stubtest, you're unlikely to be running stubgen repeatedly in CI -- you generally generate stubs once, and then you're done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those were my thoughts as well. stubgen does perform an analysis of pure python source files (if not disabled), but that will use the compiled mypy modules.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Aug 5, 2023

Thanks for working on this, and your previous work on stubgen! I probably won't be able to review this soon — this is a big PR and I'm not that familiar with stubgen. However, if someone else takes a look and feels confident, I'd be happy to take a secondary look, just tag me.

@chadrik
Copy link
Contributor Author

chadrik commented Aug 20, 2023

Thanks for working on this, and your previous work on stubgen! I probably won't be able to review this soon — this is a big PR and I'm not that familiar with stubgen. However, if someone else takes a look and feels confident, I'd be happy to take a secondary look, just tag me.

Hi all, we're already getting merge conflicts. Can someone have a look at this, please! Thanks in advance.

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good and I'm happy to merge it once my small comments are addressed and CI is green.

Two bigger points:

  • It seems we don't really handle the case where a name we want to use is locally shadowed, e.g. we want to emit type[X] from builtins but type is something else in this module. I assume that's out of scope, though.
  • There aren't any test cases for the pyc support as far as I can see. It's probably a pain to do in general, but maybe we can add a test case where we generate a pyc from some Python code, then generate a stub from that pyc.

mypy/stubdoc.py Outdated Show resolved Hide resolved
mypy/stubdoc.py Outdated Show resolved Hide resolved
mypy/stubgen.py Show resolved Hide resolved
mypy/stubgen.py Outdated Show resolved Hide resolved
mypy/stubgenc.py Outdated Show resolved Hide resolved
mypy/stubutil.py Outdated Show resolved Hide resolved
mypy/stubutil.py Outdated Show resolved Hide resolved
mypy/stubutil.py Outdated Show resolved Hide resolved
PI: float
__version__: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we previous excluded this on purpose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this change. I think I've seen several typeshed PRs where adding __version__ to the stubs fixed mypy false positives on user code (according to mypy_primer). I don't think there's a good reason to exclude __version__ from the generated stubs.

test-data/unit/stubgen.test Show resolved Hide resolved
@chadrik chadrik force-pushed the stubgen/shared-sig-gen-mr branch 2 times, most recently from 2780a54 to a714309 Compare October 13, 2023 02:22
@github-actions

This comment has been minimized.

2 similar comments
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@chadrik
Copy link
Contributor Author

chadrik commented Oct 15, 2023

@JelleZijlstra @AlexWaygood Sorry for the delay, I recently changed jobs. I've rebased the changes and addressed the notes. Hopefully we can get this merged. Thanks for your help!

@github-actions

This comment has been minimized.

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some minor changes in the docs; I'll apply that change and then land this.

docs/source/stubgen.rst Outdated Show resolved Hide resolved
mypy/stubgen.py Show resolved Hide resolved
@github-actions

This comment has been minimized.

@chadrik
Copy link
Contributor Author

chadrik commented Oct 15, 2023

I've noticed that mypyc-38-macos is flaky, so I rebased and pushed.

@JelleZijlstra
Copy link
Member

Thanks, was just going to retry it after the other jobs finish.

@chadrik
Copy link
Contributor Author

chadrik commented Oct 15, 2023

I poked around looking for an option to retry a single failed job. Is there a way to do that?

@JelleZijlstra
Copy link
Member

If you're a maintainer, after all CI has finished there is a button for it. But I don't think you can retry until all other jobs are done.

@AlexWaygood
Copy link
Member

I poked around looking for an option to retry a single failed job. Is there a way to do that?

Yes, but only maintainers can do it, and the "retry" button only appears after all jobs have finished

@github-actions
Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@JelleZijlstra JelleZijlstra merged commit e435594 into python:master Oct 15, 2023
20 checks passed
@chadrik
Copy link
Contributor Author

chadrik commented Oct 15, 2023

Thanks!

JelleZijlstra pushed a commit that referenced this pull request Nov 29, 2023
This addresses several regressions identified in
#16486

The primary regression from #15770 is
that pybind11 properties with docstrings were erroneously assigned
`typeshed. Incomplete`.

The reason for the regression is that as of the introduction of the
`--include-docstring` feature
(#13284, not my PR, ftr),
`./misc/test-stubgenc.sh` began always reporting success. That has been
fixed.

It was also pointed out that `--include-docstring` does not work for
C-extensions. This was not actually a regression as it turns out this
feature was never implemented for C-extensions (though the tests
suggested it had been), but luckily my efforts to unify the pure-python
and C-extension code-paths made fixing this super easy (barely an
inconvenience)! So that is working now.

I added back the extended list of `typing` objects that generate
implicit imports for the inspection-based stub generator. I originally
removed these because I encountered an issue generating stubs for
`PySide2` (and another internal library) where there was an object with
the same name as one of the `typing` objects and the auto-import created
broken stubs. I felt somewhat justified in this decision as there was a
straightforward solution -- e.g. use `list` or `typing.List` instead of
`List`. That said, I recognize that the problem that I encountered is
more niche than the general desire to add import statements for typing
objects, so I've changed the behavior back for now, with the intention
to eventually add a flag to control this behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants