Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make mypy more strict for prototype datasets #4513

Merged
merged 18 commits into from
Oct 21, 2021
Merged

Conversation

pmeier
Copy link
Collaborator

@pmeier pmeier commented Sep 30, 2021

Instead of retrofitting more strictness for mypy later, we can start off with strict settings.

cc @pmeier @mthrok @bjuncek

Comment on lines +9 to +21
; untyped definitions and calls
disallow_untyped_defs = True

; None and Optional handling
no_implicit_optional = True

; warnings
warn_unused_ignores = True
warn_return_any = True
warn_unreachable = True

; miscellaneous strictness flags
allow_redefinition = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the default values for those options?
If they're not the default, do we have a strong reason to use them instead of the defaults? Is this going to be clearly beneficial to the code-base and to us as developers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the default values for those options?

Nope.

If they're not the default, do we have a strong reason to use them instead of the defaults? Is this going to be clearly beneficial to the code-base and to us as developers?

Let's go through them one by one:

  • disallow_untyped_defs: by default mypy simply accepts untyped functions and uses Any for the input and output annotations. If our ultimate goal is to declare torchvision typed, we should make sure that we don't miss some functions. This flag enforces that.

  • no_implicit_optional: By default mypy allows this:

    def foo(bar: int = None) -> int:
        pass

    With this option enabled, it has to be

    def foo(bar: Optional[int] = None) -> int:
        pass

    Given that None is a valid input, we should also explicitly mention it in the annotation.

  • warn_unused_ignores: Sometimes we use # type: ignore directives on stuff that is actually wrong in other libraries. For example fix annotation for Demultiplexer pytorch#65998 will make some ignore directives obsolete that are needed now. Without this flag, we would never know.

  • warn_return_any: If a function does something with dynamic types, mypy usually falls back to treating the output as Any. This will warn us if something like this happened, but we specified a more concrete output type.

  • warn_unreachable: This is more a test functionality, as mypy will now warn us if some code is unreachable. For example, with this flag set, mypy will warn that the if branch is unreachable.

    def foo(bar: str) -> str:
        if isinstance(bar, int):
            bar = str(bar)
        return bar
  • allow_redefinition: See Set allow_redefinition = True for mypy #4531. If we have this globally, we can of course remove it here.

Apart from warn_return_any and warn_unreachable I think these flags are clearly beneficial. For the other two, they were beneficial for me in the past, but I can others object to them.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmeier !
Just some questions for my own understanding, but LGTM

for line in lines[1:-1]
]
return tuple(zip(*sorted(categories_and_labels, key=lambda category_and_label: int(category_and_label[1]))))[0]
categories_and_labels = cast(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering why we need to cast anything here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pattern.match(line).groups() returns a Tuple[Optional[str], ...]. So we need to cast to tell it that this will be a tuple of length 2 and every group was actually matched.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to cast because of the Optional bit or because of the exact length of the tuple? Or both?
Would List[Tuple[str, ...]], be enough?

Also can we remove the # type: ignore[union-attr] below now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to cast because of the Optional bit or because of the exact length of the tuple? Or both?
Would List[Tuple[str, ...]], be enough?

List[Tuple[str, ...]] seems to work out. I assumed I needed a two element tuple due to the assignment in L177.

Also can we remove the # type: ignore[union-attr] below now?

Nope. re.match returns Optional[Match] and since we don't check for match is None because we are sure that we will always match, mypy complains that None has no attribute groups.

@@ -12,4 +13,4 @@ def raw(buffer: io.IOBase) -> torch.Tensor:


def pil(buffer: io.IOBase, mode: str = "RGB") -> torch.Tensor:
return pil_to_tensor(PIL.Image.open(buffer).convert(mode.upper()))
return cast(torch.Tensor, pil_to_tensor(PIL.Image.open(buffer).convert(mode.upper())))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to call cast because pil_to_tensor is not typed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. For untyped functions mypy assumes Any and then complains because we return the more specific torch.Tensor here. I've added a warn_redundant_casts = True option that will emit a warning that this cast can be removed as soon as pil_to_tensor is typed.

@@ -8,7 +8,7 @@


# FIXME
def compute_sha256(_) -> str:
def compute_sha256(path: pathlib.Path) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol I'm afraid to ask

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file needs heavy refactoring as soon as the torchdata download API is stable-ish. Adding the type was just faster than adding an ignore.

@pmeier pmeier merged commit 4ba91bf into pytorch:main Oct 21, 2021
@pmeier pmeier deleted the prototype-mypy branch October 21, 2021 15:24
facebook-github-bot pushed a commit that referenced this pull request Oct 26, 2021
Summary:
* make mypy more strict for prototype datasets

* fix code format

* apply strictness only to datasets

* fix more mypy issues

* cleanup

* fix mnist annotations

* refactor celeba

* warn on redundant casts

* remove redundant cast

* simplify annotation

* fix import

Reviewed By: NicolasHug

Differential Revision: D31916328

fbshipit-source-id: 55eac940a3ed5bc3197debeb8b7bdb20ea543578
cyyever pushed a commit to cyyever/vision that referenced this pull request Nov 16, 2021
* make mypy more strict for prototype datasets

* fix code format

* apply strictness only to datasets

* fix more mypy issues

* cleanup

* fix mnist annotations

* refactor celeba

* warn on redundant casts

* remove redundant cast

* simplify annotation

* fix import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants