Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stub files for main cython classes #8427

Merged
merged 7 commits into from
Aug 7, 2021
Merged

Add stub files for main cython classes #8427

merged 7 commits into from
Aug 7, 2021

Conversation

ezorita
Copy link
Contributor

@ezorita ezorita commented Jun 17, 2021

Description

This PR adds .pyi stub files to achieve broader compatibility with type checkers (based on discussion #8373 ). Adds type hint annotations with the following stub files:

  • spacy/lexeme.pyi
  • spacy/strings.pyi
  • spacy/vocab.pyi
  • spacy/matcher/matcher.pyi
  • spacy/tokens/doc.pyi
  • spacy/tokens/morphanalysis.pyi
  • spacy/tokens/_retokenize.pyi
  • spacy/tokens/span_group.pyi
  • spacy/tokens/span.pyi
  • spacy/tokens/token.pyi

This aims to raise the discussion on the convenience of using manually-created stub files to improve the compatibility of type hints with Pylance.

Types of change

Enhancement on type hints compatibility.

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@polm polm changed the base branch from develop to master June 17, 2021 15:19
@ezorita ezorita changed the base branch from master to develop June 17, 2021 16:27
@ezorita ezorita changed the base branch from develop to master June 17, 2021 16:27
@svlandeg svlandeg added the types Issues related to typing or typing tools label Jun 18, 2021
@ezorita
Copy link
Contributor Author

ezorita commented Jun 18, 2021

I am missing two things:

  • What is the expected return type for hash()? I assume int.
  • I used thinc.types.Floats#d and thinc.types.Ints#d for ndarray type, is the thinc interface compatible with all vector types here?

@ldorigo
Copy link
Contributor

ldorigo commented Jun 19, 2021

Great initiative and thanks for the work. Also interested in seeing this working. Stubs look good but I'd suggest using generics and overloads rather than Any in some instances: for example, in Matcher.pipe, the second element of the tuples taken as input should have the same type as the tuples produced as output when as_tuples is true. I will suggest some changes if I can get spacy to compile from source.

@ldorigo
Copy link
Contributor

ldorigo commented Jun 22, 2021

I can't push to this PR directly (or if I can I haven't found how, feel free to point me to instructions. I think you might have to make me a contributor on your own fork? Or maybe I need to open a PR on your fork?). I'm using your annotations now and will make some comments/suggestions as I encounter problems.

One thing I have noted so far is the __getitem__ function on Doc should be overloaded to properly express that you will get a Token when doing doc[i] and a Span when doing doc[i:j]:

    @overload
    def __getitem__(self, i: int) -> Token: ...
    @overload
    def __getitem__(self, i: slice) -> Span: ...

@ldorigo
Copy link
Contributor

ldorigo commented Jun 22, 2021

Another thing is that Doc.__iter__ should be annotated as

    def __iter__(self) -> Iterator[Token]: ...

It is currently marked as returning a Token, which is incorrect

@adrianeboyd
Copy link
Contributor

We really appreciate the effort here! How much of this is automatically generated vs. created by hand? We'd be happy to include automatically-generated type stubs if it's easy for us to incorporate those steps into the packaging, but we're a bit hesitant to include type stubs that would require a lot of manual maintenance and syncing every time changes are made.

@ezorita
Copy link
Contributor Author

ezorita commented Jun 30, 2021

Thanks @adrianeboyd .These files have no automation whatsoever, but only target the API interface. So unless some interface is overloaded or made backward-incompatible there should be no need to update it.

@ldorigo
Copy link
Contributor

ldorigo commented Jul 1, 2021

While I'd obviously like to see these stubs become part of spaCy, one issue that's important to solve IMO is that, at least on VScode, they result in tooltips that show the typing information but not the documentation related to each function. We should check what happens in at least a few other major editors and maybe jupyter notebooks/jupyter lab to make sure that the stubs don't shadow the source files and prevent the documentation from being displayed.

@joostkremers
Copy link

While I'd obviously like to see these stubs become part of spaCy, one issue that's important to solve IMO is that, at least on VScode, they result in tooltips that show the typing information but not the documentation related to each function. We should check what happens in at least a few other major editors and maybe jupyter notebooks/jupyter lab to make sure that the stubs don't shadow the source files and prevent the documentation from being displayed.

Does Emacs fall under your definition of "major editor"? 😄 I use Emacs together with pyright and I'd be happy to try out the stub files.

Mind you, I don't get any documentation popups on methods from classes that pyright currently flags as "unknown import symbol" such as Doc, Token or Span. I do get them for methods of, e.g., the Language class, which pyright doesn't flag. Are those the ones you don't see anymore?

@ezorita
Copy link
Contributor Author

ezorita commented Jul 2, 2021

@ldorigo good point, we should make sure these don't interfere in other editors. However, as @joostkremers suggests, no interface docs were available so far with vscode & pylance either.

Sure emacs is a major editor ❤️

@joostkremers
Copy link

I took the .pyi files and dumped them into my venv's spaCy 3.1 installation (wholly unprofessional, I know...) and they seem to essentially work. On the line:

from spacy.tokens import Doc, Span, Token

Doc, Span and Token are no longer flagged as unknown imports.

I do have some issues, though. First, documentation popups only seem to show whatever pyright can glean from the stub file. That is, for classes I get just the word "Class" and for methods I get the argument list as provided in the stub file. No documentation strings.

Second, some attributes appear to be missing, specifically the doc attribute of Span and the strings attribute of Vocab.

I also get an error on the method argument for Span.set_extension saying:

Argument of type "(sent: Span) -> List[List[Token]]" cannot be assigned to parameter "method" of type "SpanMethod | None" in function "set_extension"

The signature of the method argument of set_extension is Optional[Callable[[Span, ...], Any]], which AFAICT my function adheres to (plus, the code works as intended), but feel free to correct me if I'm wrong.

@jakebailey
Copy link

Just in relation to cython support, I think you'd also be interested in getting cython to emit pyi files in addition to its compiled code (which would then allow for a single definition of types/documentation). I know there was cython/cython#3818, but that unfortunately stalled.

While I'd obviously like to see these stubs become part of spaCy, one issue that's important to solve IMO is that, at least on VScode, they result in tooltips that show the typing information but not the documentation related to each function.

I'm interested in the cases you're seeing with this; the intent is that even when stubs are present, we can go to the real source to find docstrings, do navigation, etc; the only case where that doesn't work is where the "actual source" is compiled, and therefore not statically available without executing the library (which we don't do).

@svlandeg svlandeg added the enhancement Feature requests and improvements label Jul 14, 2021
@ldorigo
Copy link
Contributor

ldorigo commented Aug 2, 2021

I'm interested in the cases you're seeing with this; the intent is that even when stubs are present, we can go to the real source to find docstrings, do navigation, etc; the only case where that doesn't work is where the "actual source" is compiled, and therefore not statically available without executing the library (which we don't do).

@jakebailey: since you're from microsoft, I assume "we" is vscode's python extension. Since pylance doesn't currently support cython, having only typing information from the stubs rather than the full docstrings is expected behaviour (although it would definitely be great if pylance were to support cython :-)).

What I meant to say is that we should check that in other editors/language servers, who currently do show docstrings for all of spacy's functions and clases, the addition of these stubs does not prevent them from being shown.

@svlandeg svlandeg assigned svlandeg and unassigned svlandeg Aug 3, 2021
@svlandeg svlandeg self-requested a review August 3, 2021 21:40
@svlandeg
Copy link
Member

svlandeg commented Aug 6, 2021

Hi all,

Thanks a lot @ezorita and @ldorigo for your work on this, and to @joostkremers and @jakebailey for testing and chiming in!

We really do appreciate the effort here, but were still a bit relunctant to merge as it might introduce some maintenance burden. It would be great to find a more automated solution in the future to keep this in-sync, but for now I think having this really will help many developers.

What I meant to say is that we should check that in other editors/language servers, who currently do show docstrings for all of spacy's functions and clases, the addition of these stubs does not prevent them from being shown.

I checked with Pycharm Professional, this is what I see on the current master branch:

And this is what I get with the current PR:

So it looks like the docstring at least is still there, just the "accessor kind" bit got lost. I'm not sure that's such a big problem, given the fact that for other editors there is a big win here.

Copy link
Member

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favour of merging this - cf comment above.

@jakebailey
Copy link

since you're from microsoft, I assume "we" is vscode's python extension. Since pylance doesn't currently support cython, having only typing information from the stubs rather than the full docstrings is expected behaviour (although it would definitely be great if pylance were to support cython :-)).

There are many variants of "support cython". The one that matters is getting docstrings from libraries that are installed that were produced by cython, and that's the one that we can't do without either executing the library to inspect it (not a good idea), or getting cython to emit pyi files in addition to its compiled code (as it already has the docstrings and types in pyx files). I'm not really sure how to get that going; my impression is that it has stalled upstream.

Supporting pyx files themselves is a different story, but only helps library authors (and doesn't help end users who want to use the libraries themselves).

@svlandeg svlandeg merged commit 439f30f into explosion:master Aug 7, 2021
@ezorita ezorita mentioned this pull request Jan 21, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests and improvements types Issues related to typing or typing tools
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants