feat: add example stubs (3) #12801

Merged
merged 17 commits into from Aug 2, 2023

Conversation

svlandeg
Member

@svlandeg svlandeg commented Jul 6, 2023

Third time's a charm (maybe) - PR following up on #12679 after GitHub sync & auth issues.

Description

This PR adds a stubs file for spacy.training.example. It also fixes a few typing-related issues.

As this is targeting develop, the history will look bad until the branches are synced. Done

Types of change

enhancement

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

Comment on lines 23 to 25
class ReaderProtocol(Protocol):
    def __call__(self, nlp: "Language") -> Iterable[Example]:
        pass
Member Author

@adrianeboyd : regarding your comment saying that Iterator was correct here.

The ReaderProtocol introduced in this PR aligns with the status of the create_X_reader methods before this PR, which were typed as returning a Callable[["Language"], Iterable[Example]]. So introducing this ReaderProtocol here does not change the existing typing in any way.

There was however an inconsistency between the typing of these create_X_reader methods and the implementations of the XCorpus.__call__ methods, the latter returning Iterator[Example]. As an Iterator is also an Iterable, we can make the types consistent by typing the latter ones as Iterable too.
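The relationship between the two types can be checked with a minimal sketch. The function name `read_numbers` is hypothetical and only serves the illustration; it is not part of spaCy.

```python
from collections.abc import Iterable, Iterator
from typing import Iterator as IteratorType, List


def read_numbers(limit: int) -> IteratorType[int]:
    """Hypothetical generator function; calling it returns an iterator."""
    for i in range(limit):
        yield i


gen = read_numbers(3)
items: List[int] = [1, 2, 3]

# A generator is both an Iterator and an Iterable.
gen_is_iterator = isinstance(gen, Iterator)
gen_is_iterable = isinstance(gen, Iterable)

# A list is an Iterable, but not an Iterator (it has no __next__).
list_is_iterable = isinstance(items, Iterable)
list_is_iterator = isinstance(items, Iterator)
```

Because every Iterator is also an Iterable, a function that yields values can be annotated with either return type without changing its runtime behaviour.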

That said - I'm fine with taking this contribution out of this PR and having this one focus only on adding the example.pyi.

Contributor

I think it would make sense to move this to a separate PR. I still think that Iterable[Example] should be Iterator[Example] throughout since Iterator[Example] is the correct type?

Member Author

Why is Iterator the correct type? You could also argue that it's nice to define a ReaderProtocol that is more generic/permissive in case you want to implement a different reader.
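As a rough sketch of the "more permissive" argument: a protocol typed with Iterable accepts readers that return a list as well as readers that yield. The classes `Example` and `Language` below are empty stand-ins, and `list_reader`, `generator_reader` and `count_examples` are hypothetical names, not spaCy functions.

```python
from typing import Iterable, Iterator, List, Protocol


class Language:
    """Stand-in for spacy.language.Language (assumption for this sketch)."""


class Example:
    """Stand-in for spacy.training.Example (assumption for this sketch)."""

    def __init__(self, text: str) -> None:
        self.text = text


class ReaderProtocol(Protocol):
    def __call__(self, nlp: "Language") -> Iterable[Example]:
        ...


def list_reader(nlp: Language) -> List[Example]:
    # Returns a concrete list, which is an Iterable but not an Iterator.
    return [Example("a"), Example("b")]


def generator_reader(nlp: Language) -> Iterator[Example]:
    # Generator function: the returned object is an Iterator.
    yield Example("c")


def count_examples(reader: ReaderProtocol, nlp: Language) -> int:
    # Only relies on iteration, so both readers satisfy the protocol.
    return sum(1 for _ in reader(nlp))


counts = (
    count_examples(list_reader, Language()),
    count_examples(generator_reader, Language()),
)
```

With Iterator[Example] as the protocol's return type, a type checker would be expected to reject `list_reader`, since a list is not an Iterator.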

Anyway I don't want this discussion to hold up the PR and the unrelated changes & improvements of adding the example.pyi so I'm reverting those changes.

Member Author

Removing the changes in corpus.py makes mypy break on the CI. It's unclear to me why this incompatibility between Iterable and Iterator only shows up when adding example.pyi, I guess because otherwise the statements were left unchecked. So that's why Basile had these as part of this PR in the first place.

So, let's make a final decision and make things consistent. I felt like typing to Iterable is the most generic and least breaking because you can still return an iterator, while narrowing down to Iterator for all readers might be more breaking. What's your counterargument for typing everything as Iterator, @adrianeboyd?

Contributor

I think what isn't breaking is to widen argument types or to narrow return types? This is widening the return type, which could potentially break code if someone was indeed counting on it being an Iterator. I don't think that we're using it as an Iterator anywhere in our code or projects or examples, but I'm not even 100% sure.
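A small sketch of how widening a return type could break a caller, assuming the caller uses `next()` on the result. Both function names are hypothetical.

```python
from typing import Iterable, Iterator


def corpus_as_iterator() -> Iterator[str]:
    """Hypothetical reader with the narrower return type."""
    yield "a"
    yield "b"


def corpus_as_iterable() -> Iterable[str]:
    """Hypothetical reader with the widened return type.

    Under this annotation it is free to return any iterable, e.g. a list.
    """
    return ["a", "b"]


# Code that relies on the Iterator contract works as before.
first = next(corpus_as_iterator())

# The same call on a plain Iterable is no longer guaranteed to work:
# a type checker is expected to flag it, and a list fails at runtime.
try:
    next(corpus_as_iterable())  # type: ignore
    widened_error = ""
except TypeError as err:
    widened_error = str(err)
```

Callers that only loop over the result with `for` are unaffected by the wider type; only code that depends on `__next__` is at risk.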

For the *Corpus classes modified here, the correct type is Iterator[Example] and I don't see why this needs to be modified?

For the readers the ReaderProtocol as proposed seems fine to me.

Member Author

Ok, looking at this again with fresh eyes, it appears that the whole Iterator/Iterable changes were made chasing a red herring. mypy started failing, and rightfully so, after introducing example.pyi because there was a wrongly typed return type Iterable[Doc] in spacy.PlainTextCorpus.v1, which should have been Iterable[Example]. Fixing that makes the CI green with no other edits needed.
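One plausible reading of why the mismatch only surfaced with the stub file, sketched under the assumption that a name from a module without type information is treated as Any by the type checker. `Doc`, `Example`, `UntypedExample` and `read_docs` are stand-ins for this illustration only.

```python
from typing import Any, Callable, Iterable


class Doc:
    """Stand-in for spacy.tokens.Doc (assumption for this sketch)."""


class Example:
    """Stand-in for spacy.training.Example (assumption for this sketch)."""


# How a class imported from a module without stubs may look to a type checker.
UntypedExample = Any


def read_docs() -> Iterable[Doc]:
    # Wrongly typed reader: it produces Doc objects, not Example objects.
    return [Doc()]


# Compatible with Iterable[Any], so the mistake goes unnoticed.
untyped_reader: Callable[[], Iterable[UntypedExample]] = read_docs

# Once Example is a real type, the same assignment is expected to be flagged.
typed_reader: Callable[[], Iterable[Example]] = read_docs  # type: ignore

produced = [type(item).__name__ for item in typed_reader()]
```

The runtime behaviour is identical in both cases; only the static check changes once the type is known.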

We might still want to introduce the ReaderProtocol, I thought it was nice too, but let's do that in a separate PR to keep the changes minimal here and so we can get this merged in hopefully soonish.

@svlandeg svlandeg mentioned this pull request Jul 7, 2023
3 tasks
@svlandeg svlandeg added enhancement Feature requests and improvements types Issues related to typing or typing tools labels Jul 7, 2023
@svlandeg svlandeg marked this pull request as draft July 10, 2023 18:21
@svlandeg svlandeg marked this pull request as ready for review August 1, 2023 12:46
@property
def ents(self) -> Sequence[Span]: ...
@ents.setter
def ents(self, value: Sequence[Span]) -> None: ...
Contributor

The setter should also accept a wider range of values including tuples, so this is incorrect in many ways. I don't really think we should encourage the use of tuples here, but the general bug/constraint from mypy is pretty limiting.
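A minimal sketch of the gap described here, assuming a type checker that requires the setter's value type to match the getter's return type. The `Container` class is hypothetical and uses plain strings instead of Span objects.

```python
from typing import List, Sequence


class Container:
    """Hypothetical class with a property like the stubbed `ents`."""

    def __init__(self) -> None:
        self._ents: List[str] = []

    @property
    def ents(self) -> Sequence[str]:
        return tuple(self._ents)

    @ents.setter
    def ents(self, value: Sequence[str]) -> None:
        # At runtime any iterable works, but the annotation has to mirror
        # the getter's type under the assumed checker constraint.
        self._ents = list(value)


container = Container()
container.ents = ["a", "b"]  # matches the annotation
container.ents = ("c",)      # a tuple is also a Sequence

# Accepted at runtime, but outside the declared Sequence[str] type.
container.ents = (s for s in ["d", "e"])  # type: ignore
final_ents = container.ents
```

The annotation therefore describes only part of what the setter accepts at runtime, which is the limitation the comment points out.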

@adrianeboyd adrianeboyd merged commit 0737443 into develop Aug 2, 2023
18 checks passed
@adrianeboyd adrianeboyd deleted the feature/type_fixes branch August 2, 2023 06:15
3 participants