Type annotations #2025

Open
pmeier opened this issue Mar 28, 2020 · 44 comments

Comments

@pmeier
Collaborator

pmeier commented Mar 28, 2020

🚀 Feature

Type annotations.

Motivation

Right now, if a project depends on torchvision and you want to statically type check it, you have no choice but to either ignore torchvision or write your own stubs. torch already has partial support for type annotations.

Pitch

We could add type annotations for torchvision as well.

Additional context

If we want this, I would take that up.

@fmassa
Member

fmassa commented Mar 30, 2020

Yes, please! Python-3 type annotations to torchvision would be great!

@pmeier
Collaborator Author

pmeier commented Mar 30, 2020

Two questions:

  1. Should I stick to stubs, i.e. create a .pyi file for every .py file, or do we want the annotations inline?
  2. Do you want me to do this in one gigantic PR or should I split it by module or package?

@fmassa
Member

fmassa commented Mar 30, 2020

@pmeier
1 - let's do the annotations inline. I think the reason PyTorch used the .pyi file for the annotations was due to Python2 compatibility (but I might be wrong). @t-vi do you have thoughts on this?
2 - let's split it in smaller chunks which are easier to review. It doesn't even need to be the full module at once, for example transforms is very big, so we can send a few functions at a time.

@pmeier
Collaborator Author

pmeier commented Mar 31, 2020

Type annotations ToDo

@pmeier
Collaborator Author

pmeier commented Mar 31, 2020

I don't know if there is already an internal discussion about this, but in light of inline type annotations I think we really should talk about the code format. With type annotations the signatures get quite big, and in our current style this not only looks ugly but also hinders legibility:

def save_image(tensor: Union[torch.Tensor, Sequence[torch.Tensor]], fp: Union[str, io.FileIO, io.BytesIO],
               nrow: int = 8, padding: int = 2, normalize: bool = False, range: Optional[Tuple[int, int]] = None,
               scale_each: bool = False, pad_value: int = 0, format: Optional[str] = None) -> None:

The same signature formatted with black looks like this

def save_image(
    tensor: Union[torch.Tensor, Sequence[torch.Tensor]],
    fp: Union[str, io.FileIO, io.BytesIO],
    nrow: int = 8,
    padding: int = 2,
    normalize: bool = False,
    range: Optional[Tuple[int, int]] = None,
    scale_each: bool = False,
    pad_value: int = 0,
    format: Optional[str] = None,
) -> None:

which (subjectively) looks better and is more legible.


Note that I'm not advocating specifically for black, but rather for a code formatter or, more generally, a unified code format.

@t-vi
Contributor

t-vi commented Mar 31, 2020

I'm probably not the best person to ask because I never liked the pyi files for Python-implemented bits in the first place. That said, my understanding is that it's fine to do them inline now.

Regarding the formatting: I do agree that cramming as much as possible in each line probably isn't the best way. You might see if PyTorch has a formatting preference in its formatting checks and what's the generally recommended Python way (I never know).

@fmassa
Member

fmassa commented Mar 31, 2020

Thanks for the input @t-vi!

@pmeier good point about the code getting messy with type annotations. I don't have a good answer now, although indeed the output of black does seem more readable.

Let me add @cpuhrsch @vincentqb and @zhangguanheng66 for discussion as well

@vincentqb
Contributor

vincentqb commented Mar 31, 2020

I second the use of black-style inline type hint formatting.

@zhangguanheng66
Contributor

Yes. black makes sense to me as well.

@pmeier
Collaborator Author

pmeier commented Apr 1, 2020

This would also make reviewing easier. If you have a look at this review comment, it is unclear at first glance which parameter is meant, since GitHub only allows commenting on a complete line.

@fmassa
Member

fmassa commented Apr 6, 2020

Although black is a nice formatter, I would be careful with re-formatting the whole codebase at once -- this effectively messes up blame, so that we no longer know what was added when and by whom.

@pmeier
Collaborator Author

pmeier commented Apr 6, 2020

The author talked very briefly about integrating black into an existing codebase. I've had no contact with blame, so it's hard for me to assess if hyper-blame could work for us. Could you have a look?

@fmassa
Member

fmassa commented Apr 6, 2020

The problem is that most users (and the GitHub UI) most probably only use git blame, so asking them to switch to hyper-blame is a bit hard.

@pmeier
Collaborator Author

pmeier commented Apr 7, 2020

As we will be touching every signature while adding annotations, we could simply format them with black and leave everything else as is. This way git blame should still work, but we would enhance legibility. If we want to adopt black for the complete codebase, we could do so one commit at a time later.

@fmassa
Member

fmassa commented Apr 7, 2020

@pmeier I think we could adopt the function-signature style of black, but leave the rest of the code-base unchanged? If that's what you proposed, I'm ok with it!

@datumbox
Contributor

datumbox commented Aug 16, 2021

My 2 cents are the following:

  • I agree it might be worth pausing for now the efforts of typing methods that lead to conflicts between JIT, ONNX and mypy OR require lots of workarounds to address their limitations (such as Fixed typing annotations of torchvision/ops #4237)
  • I believe that typing the rest of the methods is still valuable. As long as the typing annotations that we use are correct and there are no shenanigans, the typing information improves code readability (for example see Added typing annotations to models/video #4229 where it exposes the fact that in many methods it's not obvious what the intended variable types are).
  • I think we should continue reviewing open PRs (such as Added typing annotations to io/__init__ #4224) that introduced typing annotations on non-problematic cases. The effort of reviewing them is typically low and can help keeping our contributor community engaged.

@NicolasHug
Member

Thanks for your comment @datumbox , I will try to reply below. First, let me point out a few things, in order to explain where I'm coming from:

  • The goal of this issue is for our users to be able to type-check their code. This won't happen until we have a proper py.typed file, and this won't be possible until all of torchvision is properly typed. We won't be able to fully type torchvision until torchscript and mypy aren't conflicting, and for now we don't have an ETA on that. Unfortunately, partially typing a codebase has very little benefit for its users: mypy just ignores untyped interfaces, and so there is a risk that our annotations are just incorrect. [1]
  • As a result, for now, and until we have a py.typed file, all annotations are only available and usable to us, the torchvision contributors. It does almost nothing for our users for now, since they can't type-check their code. This is an important thing to keep in mind for the ongoing discussion.
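For readers unfamiliar with the py.typed file mentioned above: under PEP 561, type checkers only use the inline annotations of an installed package if the package ships an empty marker file named py.typed next to its __init__.py. A minimal sketch of checking for that marker follows; the helper name and the "fakepkg" directory are hypothetical and only serve the illustration.

```python
from pathlib import Path
import tempfile


def has_py_typed_marker(package_dir: Path) -> bool:
    """Return True if the package directory ships a PEP 561 ``py.typed`` marker."""
    return (package_dir / "py.typed").is_file()


def demo() -> tuple:
    # Build a throwaway package layout; "fakepkg" is a hypothetical name.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "fakepkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        before = has_py_typed_marker(pkg)
        (pkg / "py.typed").write_text("")  # the marker file itself is empty
        after = has_py_typed_marker(pkg)
    return before, after
```

Without the marker, a downstream project's type checker treats the package as untyped regardless of how many annotations it contains, which is why partial annotations do not reach users.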

Regarding the improved readability of typed code: I will agree that there is a degree of subjectivity here, but IMHO, I don't find this to be the case. For readability, there's not much that an annotation will bring compared to a decent docstring. And those annotations are intended for us: torchvision developers, not users (for now). I completely agree with you that the signatures in https://github.com/pytorch/vision/pull/4229/files are unclear and need to be improved; but to me this is better solved by clarifying / adding docstrings than by adding annotations.

Taking the example of #4224 that you mentioned, to be fair, I don't see how this is a net positive for the library. Those annotations are either redundant with the docstrings, or really obvious [2]. In some cases mypy is being extra picky / difficult and requires a fair amount of change for very little added value (#4171 (comment) is an example). Sometimes, mypy is just plain wrong as well, and it significantly raises the barrier to entry when it comes to contributions [3].

I don't share the point of view that reviewing typing PRs is a low reviewing effort: like documentation PRs, typing PRs require extra vigilance and very thorough checking, as an incorrect annotation is much more harmful than a lack of annotation. On top of that, we'll get a lot of typing PRs, leading to a quite significant effort overall.

I also fully agree about keeping the community engaged. As you know, I've been pushing strongly towards this myself. Personally, I believe there's a lot of potential towards having better docstrings (or having docstrings at all in a lot of cases :p ).


[1] a simple example of this is this code, for which annotations are completely wrong, and yet mypy is perfectly happy:

def _g(a):
    return a + 0.5

def f(a : int) -> int:
    return _g(a)

[2] I want to emphasize that my comments here do not undermine the quality of the contributions from @frgfm or @oke-aditya and the rest of our contributors. I'm only criticizing the mypy / type annotations value, not the contributions or the contributors, which we are very grateful for.

[3] On a PR I submitted to pandas, I got into this:

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

mypy complains with

lol.py:1: error: Missing return statement  [return]
    def return_4() -> int:
    ^
Found 1 error in 1 file (checked 1 source file)

Wanna make mypy happy? easy, just add

raise StopIteration("This thing never gets raised anyway")  # or whatever exception

at the very end of the code, without changing anything else. Ironically, we didn't change the return, but mypy is still happy. It's also a net negative as this exception will never be raised and we're decreasing test coverage.

As another workaround, we can write something like

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    return next(x for x in l if cond(x))

Which is arguably more pythonic, but also less readable for a beginner. Also, this form doesn’t make it more obvious at all that the condition is always hit for a reader (if we ignore the obvious assert above). This is the kind of stuff that makes me believe more typing = much greater contribution barrier. This stuff really isn't obvious for a non-experienced dev IMHO.

@datumbox
Contributor

Thanks for the comments @NicolasHug.

The goal of this issue is for our users to be able to type-check their code. This won't happen until we have a proper py.typed file, and this won't be possible until all of torchvision is properly typed.

Adding types to our code-base will take time and will require significant effort which we can start early. I agree that JIT, ONNX and mypy can give us a hard time sometimes and certainly some places will require too many workarounds with dubious benefits initially. My proposal is to merge those PRs which have a net positive on readability such as the ones mentioned earlier.

I will agree that there is a degree of subjectivity here, but IMHO, I don't find this to be the case. For readability, there's not much that an annotation will bring compared to a decent docstring.

I think using typing annotations is a much more natural way to encode this information and additionally it future proofs our code-base. Having said that, I too acknowledge that the notion of readability has some subjective parts. The same applies for what one considers obvious.

I don't share the point of view that reviewing typing PRs is a low reviewing effort: like documentation PRs, typing PRs require extra vigilance and very thorough checking, as an incorrect annotation is much more harmful than a lack of annotation.

I definitely agree on the fact that incorrect/misleading annotations (or docstrings) are harmful. Personally I don't mind reviewing PRs that introduce typing where it makes sense, provided they use good judgement. The maintainers listed here are experienced and given the above clarifications they can apply their judgement on where typing is beneficial and where it will cause more confusion than help.

@NicolasHug
Member

My proposal is to merge those PRs which have a net positive on readability such as the ones mentioned earlier.

Well, I'm trying to argue above that IMO, these aren't net positives.

I think using typing annotations is a much more natural way to encode this information

I quite firmly disagree here. First, annotations aren't part of the docstring, so in the html docs, they're not rendered next to the parameters description. Second, annotations are meant to be machine-readable, not human-readable. Let's take the colors parameter for example: https://github.com/pytorch/vision/blob/master/torchvision/utils.py#L144:L144

The annotation for it is Optional[Union[List[Union[str, Tuple[int, int, int]]], str, Tuple[int, int, int]]].
Its description is:

List containing the colors or a single color for all of the bounding boxes. The colors can be represented as str or Tuple[int, int, int].

I would find it hard to argue that the annotation is clearer than the human description :)

@datumbox
Contributor

@NicolasHug I understand that you prefer docstrings over typing annotations, that you really don't like mypy and that you prefer a less explicit approach for defining types. This is clearly documented here and on your previous comments on other PRs.

The reason why I share my opinion here is because I want to engage with all the contributors and solicit their feedback. The decision to stop adding typing annotations in favour of docstrings is quite major and can't be made without quorum. This is especially true when the matter boils down to code readability and aesthetics where there is a certain degree of subjectivity.

@pmeier
Collaborator Author

pmeier commented Aug 16, 2021

@NicolasHug and I had a similar offline discussion a while back and I want to just add my two cents to the examples:

def _g(a):
    return a + 0.5

def f(a : int) -> int:
    return _g(a)

The problem stems from the fact that you are mixing typed and untyped definitions here. In these cases, mypy is very lenient by default. Multiple options out here:

  1. Run with --disallow-untyped-calls:
lol.py:6: error: Call to untyped function "_g" in typed context
Found 1 error in 1 file (checked 1 source file)
  2. Run with --warn-return-any:
lol.py:6: error: Returning Any from function declared to return "int"
Found 1 error in 1 file (checked 1 source file)

So if you want this strictness, it is only one flag away.
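Independently of those flags, the leniency disappears once the helper itself is annotated. The following is a minimal sketch of the same two functions with `_g` typed; with the helper typed, mypy would flag a `-> int` declaration on `f` as an incompatible return type, so the sketch declares `-> float` instead.

```python
def _g(a: int) -> float:
    # Annotating the helper exposes its real return type to the checker.
    return a + 0.5


def f(a: int) -> float:
    # The declared return type now matches what _g actually returns.
    return _g(a)
```

The runtime behavior is unchanged; only the information available to the type checker differs.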


def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

This example looks quite nonsensical to me. Could you link the original code so I can have a look at the "real" example?

In general, if you hit such an obscure case this either warrants a refactor of the current implementation or simply a # type: ignore[foo] comment.


Speaking of # type: ignore comments, I'm not sure why you basically insist on mypy being perfect. If the implementation is logically correct and you are happy with it but mypy is not, just silence it. I mean, for other static checkers such as flake8 we also flat-out ignore some error codes and this doesn't seem to bother you.

ignore = F401,E402,F403,W503,W504,F821

Plus git grep '# noqa' | wc -l reveals 14 instances where we silence flake8 locally.

@NicolasHug
Member

NicolasHug commented Aug 16, 2021

The reason why I share my opinion here is because I want to engage with all the contributors and solicit their feedback. The decision to stop adding typing annotations in favour of docstrings is quite major and can't be made without quorum. This is especially true when the matter boils down to code readability and aesthetics where there is a certain degree of subjectivity.

And I appreciate you sharing your thoughts, as I'm the one that started this discussion and invited people to comment. It's not just about readability and aesthetics though: I hope I properly described above how the ongoing work is a) not as useful as we originally thought it would be, b) hard work, c) will raise the barrier to entry for the less experienced contributors. If you disagree with these points I'm happy to discuss further, but at the beginning of your comment in #2025 (comment) it looked to me like you were simply re-stating what you had said earlier, without taking my replies into account. Apologies if I'm mistaken here.

So if you want this strictness, it is only one flag away.

Well, no. We can't use the flags you mentioned until everything is typed, which brings us back to my original point: we have no ETA w.r.t. torchscript / mypy compatibility, so we don't know when we'll be able to type everything properly.

This example looks quite non-sensical to me. Could you link the original code so I can have a look at the "real" example?

I agree the example may look strange, but it happens in real code: pandas-dev/pandas#29944 (comment)

Speaking of # type: ignore comments, I'm not sure why you basically insist on mypy being perfect. If the implementation is logically correct and you are happy with it but mypy is not, just silence it

I'm sorry, it's not that simple. It sometimes takes quite a while to decipher mypy's error message, and to realize that mypy is actually wrong. And for inexperienced contributors, it can take time to properly silence mypy as well. A recent example is @vmoens who struggled to find a solution for #4256 - look at what mypy is complaining about, and the number of "amend" commits. The mypy error message makes little sense. We finally sorted it out after a bit of back and forth on the PR and on a private chat. Again: higher barrier to contribute.

@pmeier
Collaborator Author

pmeier commented Aug 16, 2021

Well, no. We can't use the flags you mentioned until everything is typed, which brings us back to my original point: we have no ETA w.r.t. torchscript / mypy compatibility, so we don't know when we'll be able to type everything properly.

Not true. You can set these flags on a module level in the configuration file like we are currently doing with the ignores:

vision/mypy.ini

Lines 7 to 9 in 8e2bd0e

[mypy-torchvision.io._video_opt.*]
ignore_errors = True

So if we go module by module in adding annotations that should not be an issue.

I agree the example may look strange, but it happens in real code: pandas-dev/pandas#29944 (comment)

From reading just the snippet, I would say it is non-obvious that self.subplots always contains col_idx. If you are sure that is the case at runtime, just ignore the error. This brings us to the last point:

It sometimes takes quite a while to decipher mypy's error message, and to realize that mypy is actually wrong.

In the PR you linked, mypy is not wrong, we are using it wrong. It is a static type checker. So if you do shenanigans with dynamic attributes, I wouldn't expect it to work. Neither nn.Conv2d nor nn.Linear has a stddev attribute and this is where the error stems from. The confusion comes from the fact that an nn.Module is set up to return Union[Tensor, nn.Module] as the default type for unknown attributes:

from torch import nn

m = nn.Module()
reveal_type(m.foo)

main.py:4: note: Revealed type is 'Union[torch._tensor.Tensor, torch.nn.modules.module.Module]'

A remedy for this, is to simply tell mypy "this dynamic attribute that I'm accessing is actually a float":

diff --git a/torchvision/models/inception.py b/torchvision/models/inception.py
index b5cefefa7..cb67aea08 100644
--- a/torchvision/models/inception.py
+++ b/torchvision/models/inception.py
@@ -4,7 +4,7 @@ import torch
 from torch import nn, Tensor
 import torch.nn.functional as F
 from .._internally_replaced_utils import load_state_dict_from_url
-from typing import Callable, Any, Optional, Tuple, List
+from typing import Callable, Any, Optional, Tuple, List, cast
 
 
 __all__ = ['Inception3', 'inception_v3', 'InceptionOutputs', '_InceptionOutputs']
@@ -120,7 +120,7 @@ class Inception3(nn.Module):
         if init_weights:
             for m in self.modules():
                 if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
-                    stddev = float(m.stddev) if hasattr(m, 'stddev') else 0.1  # type: ignore
+                    stddev = cast(float, m.stddev) if hasattr(m, 'stddev') else 0.1
                     torch.nn.init.trunc_normal_(m.weight, mean=0.0, std=stddev, a=-2, b=2)
                 elif isinstance(m, nn.BatchNorm2d):
                     nn.init.constant_(m.weight, 1)
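
One detail of the diff above is worth spelling out: `typing.cast` only informs the type checker and performs no conversion at runtime, whereas the replaced line called `float(...)`. A minimal, torch-free sketch of the pattern follows; `Layer` and `init_std` are hypothetical stand-ins, not torchvision code.

```python
from typing import cast


class Layer:
    """Hypothetical stand-in for a module with optional dynamic attributes."""


def init_std(m: Layer, default: float = 0.1) -> float:
    # cast() returns its argument unchanged; it neither converts nor
    # validates the value at runtime.
    if hasattr(m, "stddev"):
        return cast(float, getattr(m, "stddev"))
    return default
```

If the dynamic attribute could hold something other than a float, keeping the explicit `float(...)` conversion alongside the cast would preserve the original runtime behavior.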

So yes, mypy increases the burden for an inexperienced contributor. But then, given its complexity, PyTorch is not really a beginner project. I'm not saying this to discourage new contributors, but I think it is fair in such a case to say "This seems right to me, but I don't know how to make mypy happy. Can someone with more experience please tell me what is going on?".

@datumbox
Contributor

If you disagree with these points I'm happy to discuss further, but at the beginning of your comment in #2025 (comment) it looked to me like you were simply re-stating what you had said earlier, without taking my replies into account.

I don't think this remark is fair. Please, when in doubt, assume positive intent and seek clarifications.

Coming to the discussion, my original position was that mypy is the de facto static analysis tool used for validating typing and that we should use it as much as possible (except in cases where it makes mistakes, where we turn it off). This is pretty much the way we use it right now in TorchVision. By taking into account your remarks along with those of @pmeier and @oke-aditya, it became clear that sometimes it's very hard to please it and this could lead to convoluted workarounds (see #4237). Given that, perhaps a more reasonable approach is to try to adjust, configure and use mypy with caution, as @pmeier advocates.

@NicolasHug
Member

Philip,

That we can set flags on a per-file basis does not change the fact that these annotations won't help users until mypy doesn't conflict with torchscript, thus (IMO) drastically reducing the benefits of these annotations.

Also, yes, there are technical explanations and technical solutions to all of our problems here. Thank you for taking the time and effort to clarify these. But just because we can logically explain something doesn't mean that this thing is easy, or natural, or expected for most contributors. The barrier to contribute is something that I care deeply about, and it's also something that our upper management cares about (it came up various times in internal meetings). We're all trying to foster external contributions in this very issue, so surely, we all care about it here. And indeed, contributors can ask for help, but from my own experience these mypy-related issues pop up so frequently that this isn't just something we can ignore, or disregard, or just wave off as a minor hindrance.

Vasilis,

my apologies if my comment came across as unfair. Please rest assured that I am assuming positive intent and seeking clarifications. On my end, I believe I missed your position on a few points that I tried to make, which left me with the impression that my comments had been ignored. In that regard, in order for me to better understand where we all stand, would you mind clarifying your position on the following items:

Do you disagree with my POV that type annotations won't help users much until mypy doesn't conflict with torchscript?

From my understanding, you believe that annotations are still relevant info for us (the developers), and you prefer annotations to docstrings. Which leads me to these next questions:
Do you always find annotations to be better than docstrings, even though docstrings contain more info / descriptions about the parameters? Would you mind providing an example where a docstring is not as clear or not as convenient as a type annotation? Also, do you find the annotation of the colors parameter clearer than its English description? #2025 (comment)

What is your position regarding mypy raising the barrier for contributors?

Thanks a lot for your patience and feedback.

@datumbox
Contributor

Do you disagree with my POV that type annotations won't help users much until mypy doesn't conflict with torchscript?

I believe that anyone who reads the code, either they are a user or a developer of the library, can benefit from seeing good typing information. Provided that the info is not wrong or misleading, they improve code readability and make the code more explicit. You did convince me though that in cases where mypy requires lots of workarounds, we should omit them.

you prefer annotations to docstrings

I'm not sure why docstrings are even considered an option as they are meant for documentation, not for defining types (perhaps when Python did not support typing it was a workaround, but not anymore). Python's effort of introducing typing is still in its infancy and the static analysis tools like mypy are not mature. Still, I believe it's a step in the right direction to make the language more precise and potentially faster. For me, typing annotations are the right tool to encode typing and using them will future-proof our code (covering the code-base will take time and it's not an all-or-nothing thing, will unlock potential speed benefits as the language evolves, will improve readability etc).

Also, do you find the annotation of the colors parameter clearer than its English description?

We can use type aliases if we want to reduce complexity.
But I think this example hints that we got a very overloaded API and an implementation that does a lot of things.
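
As a rough illustration of the type-alias idea, the long colors annotation quoted earlier in the thread can be split into two named pieces. The alias names `Color` and `Colors` and the helper `expand_colors` are hypothetical and not part of torchvision; this is a sketch of the technique only.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical alias names; they are not part of torchvision.
Color = Union[str, Tuple[int, int, int]]
Colors = Optional[Union[List[Color], Color]]


def expand_colors(colors: Colors, num_boxes: int) -> List[Optional[Color]]:
    """Return one color entry per box (None means "use a default")."""
    if colors is None:
        return [None] * num_boxes
    if isinstance(colors, list):
        return list(colors)
    # A single color (str or RGB tuple) applies to every box.
    return [colors] * num_boxes
```

`Colors` expands to the same type as the original nested annotation, so the alias changes how the signature reads without changing what it accepts.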

What is your position regarding mypy raising the barrier for contributors?

I don't think it's going to be a problem. We are a friendly bunch and we can help getting this fixed during the code reviews. Moreover, contributors can still submit code without annotations and we can take care of that in a follow-up (a good trade-off to reduce the back-and-forth and frustration for new contributors). I would even say that adding typing info is a good "bootcamp" task for someone who wants to learn the code-base and get involved.

@pmeier
Collaborator Author

pmeier commented Aug 17, 2021

@NicolasHug

That we can set flags on a per-file basis does not change the fact that these annotations won't help users until mypy doesn't conflict with torchscript, thus (IMO) drastically reducing the benefits of these annotations.

True, not debating that. Same for the legibility debate in the docstrings. There are definitely cases where the annotation is less legible than a short description. Unfortunately, this will usually be the case for heavily overloaded keyword arguments in convenience functions. These functions are probably used by a lot of users, so good documentation is key here. We need to find a solution for that. An easy solution would be to let the annotations and the docstring diverge. We are not enforcing equality here.

My point is that in the examples you have shown the nuisance stems from the fact that either mypy is misconfigured or used in a dynamic context which it is not built for. In these cases it is up to us to configure it properly, refactor the code to remove the problematic parts, or simply silence it. mypy is usually (yes, sometimes it is actually a mypy issue) not to blame. In doing so, we are missing the point.

But just because we can logically explain something doesn't mean that this thing is easy, or natural, or expected for most contributors. The barrier to contribute is something that I care deeply about, and it's also something that our upper management cares about (it came up various times in internal meetings).

Note that you are switching your argument here from "we should hold adding type annotations until mypy and torchscript have converged" to "we should not add type annotations". Barrier of entry will only slightly change in the future and all the "weirdness" you gave examples for above will still be there.

And indeed, contributors can ask for help, but from my own experience these mypy-related issues pop up so frequently that this isn't just something we can ignore, or disregard, or just wave off as a minor hindrance.

I don't think asking for help is an issue. This is true not only for the contributors, but also for us maintainers. If you see some "weird" mypy behavior feel free to reach out to me. There is no need for you or anyone else to battle this if they don't want to get into the guts of it.

@NicolasHug
Member

NicolasHug commented Aug 17, 2021

Thank you both for taking the time to reply. I will address a few last points below that I believe are important to clarify, but I think it's probably best to move on. I've made my case and I understand that I didn't convince you yet, so I probably won't be able to convince you with more discussion. On my side, I still fail to foresee the benefits of this current typing effort, but I'll be patient and hopefully the benefits will appear clearer in the future, should you decide to move forward with it.

Thanks again for your input :)


I'm not sure why docstrings are even considered an option as they are meant for documentation not for defining types (perhaps when Python did not support typing it was a workaround but not anymore)

It seems that we're referring to different things here. I will definitely agree that docstrings aren't a substitute for annotations, and annotations aren't a substitute for docstrings either. They have different purposes (documentation vs type-checking) and a different audience (humans vs computers). My original point here, which was a reply to an argument of yours #2025 (comment), is that when it comes to code readability, annotations add no value over a docstring. Again: the point of an annotation is not to specify types for human readers, they specify types for the type-checker; types should be documented for humans as well, but this is the docstrings' job.

My point is that in the examples you have shown the nuisance stems from the fact that either mypy is misconfigured or used in a dynamic context which it is not built for. In these cases it is up to us to configure it properly, refactor the code to remove the problematic parts, or simply silence it. mypy is usually (yes, sometimes it is actually a mypy issue) not to blame. In doing so, we are missing the point.

I generally agree with your opinion, but I would like to add a bit of nuance. The first one is that the word "simply" in "simply silence it" looks out of place for me :). Getting to understand and silence mypy was never simple or obvious to me. The second is that, while I agree that we should ideally take the time to properly refactor the code when mypy flags some code smell, the reality is that this is usually not the mode that we operate in. For better or worse, we tend to get sh!t done and silence / merge fast, rather than fully address the underlying issues.

Note that you are switching your argument here from "we should hold adding type annotations until mypy and torchscript have converged" to "we should not add type annotations". Barrier of entry will only slightly change in the future and all the "weirdness" you gave examples for above will still be there.

To clarify: you're right that the barrier of entry will hardly change in the future. However, once mypy and torchscript are compatible, we'll be able to fully type-check torchvision. I can understand that fully type-checking torchvision is a strong enough reason to eventually raise the contribution bar. But until we can properly type check, I believe that annotations have a very limited value, and so I believe that it's not enough of a reason at the moment for raising the bar.

@pmeier
Collaborator Author

pmeier commented Aug 17, 2021

The first one is that the word "simply" in "simply silence it" looks out of place for me :). Getting to understand and silence mypy was never simple or obvious to me.

Maybe we are finally getting to the source of your aversion. Let's take your example from above:

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

Running mypy on this gives you:

main.py:1: error: Missing return statement  [return]
    def return_4() -> int:
    ^
Found 1 error in 1 file (checked 1 source file)

If you just want to silence mypy here, you only need two pieces of information from this:

  1. Where did this happen? main.py:1
  2. What is the error code? [return]

If you now place a # type: ignore[return] on line 1 in the file main.py, mypy is silenced.

def return_4() -> int:  # type: ignore[return]
    ...

Success: no issues found in 1 source file

IMHO, this is as simple as locally silencing other linters such as flake8.
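For comparison, here is a minimal sketch of the refactoring route mentioned earlier in the thread, as opposed to silencing. It is only an illustration and not code from torchvision: the loop is kept, the local list is renamed to `values`, and an explicit `raise` is added after the loop so that every code path either returns an `int` or raises. Under that assumption mypy should no longer report `[return]`, and no `# type: ignore` is needed.

```python
from typing import List


def return_4() -> int:
    values: List[int] = [2, 4, 6]
    # Same precondition as the original example: 4 is known to be present.
    assert 4 in values

    for x in values:
        if x == 4:
            return x

    # Never reached when the assert holds, but it makes the fall-through
    # path explicit, so the function no longer ends without a return.
    raise AssertionError("unreachable: 4 is always in values")


result = return_4()
```

Whether this is preferable to a one-line ignore is a judgment call; it costs two extra lines but keeps the assumption visible in the code.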

The second is that, while I agree that we should ideally take the time to properly refactor the code when mypy flags some code smell, the reality is that this is usually not the mode that we operate in. For better or worse, we tend to get sh!t done and silence / merge fast, rather than fully address the underlying issues.

I can fully get behind that. On anything that is not user facing, we can just slap a # type: ignore on it and fix it later.

But until we can properly type check, I believe that annotations have a very limited value, and so I believe that it's not enough of a reason at the moment for raising the bar.

Again, I can get behind that. @oke-aditya and @frgfm have put quite some effort into the PRs, so I think it is fair to properly review them. After that I wouldn't push further until torchscript finally gets support for fundamental stuff. Of course, if the development of torchscript is halted, we need to revisit this and see if it is still worth adding the annotations.

@frgfm
Contributor

frgfm commented Aug 23, 2021

Hey everyone 👋

Just making sure you guys don't trouble yourself too much for @oke-aditya & me:

  • for my part, I do use PyTorch & torchvision quite a lot and develop many things with it. When I'm implementing / tuning / modifying PyTorch stuff, it's extremely useful to have typing annotations (saving a lot of time to match the interface of the core deep learning framework, and providing the same level of information easily on my end). So I do see the advantages for developers, but they are admittedly decreased for plain users 🤷‍♂️
  • seeing that we still have many open issues on the vision side, I think bringing some help is the least I could do! That being said, this is open source and I fully support the common decision if we don't move forward with this. I'm just here to help :)
  • one last part that we shouldn't neglect: in the long term, having enforced mypy checks does indeed raise the difficulty for new contributions. Either we'll need to direct new contributors to some material to understand typing & submit PRs that properly integrate it, or each review will need to consider typing carefully (and suggest modifications).

Again, you all have brought us an extremely useful framework and related resources, I'm far from being the only one willing to help 👌 Happy to discuss this further, we all just want to continue making the PyTorch ecosystem as user/developer-friendly and useful as possible 😀

@mauvilsa

mauvilsa commented Oct 14, 2022

Hello. I would like to add type annotations to the file torchvision/transforms/transforms.py. The reason is not just static type checking. The goal is to make torchvision.transforms.Compose work with pytorch-lightning's LightningCLI, which makes classes automatically configurable by inspecting signatures at run time.

I am asking here because I saw the torch.jit types limitation, #2025 (comment). From a quick look at the code, the type for the transforms parameter of the Compose class seems to be List[Union[torch.nn.Module, Callable]]. Maybe I don't have the correct type yet, but that is not important. The actual question is, would List and Callable be a problem for torch.jit?
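To illustrate what "inspecting signatures at run time" relies on, here is a rough sketch using only the standard library. `ComposeSketch` is a hypothetical stand-in and not torchvision's `Compose`, and the annotation `List[Callable[[Any], Any]]` is an assumption chosen for the example rather than the type that was eventually agreed on. The sketch does not show how LightningCLI works internally; it only shows that annotations are readable at run time via `inspect.signature` and `typing.get_type_hints`.

```python
import inspect
from typing import Any, Callable, List, get_type_hints


class ComposeSketch:
    """Hypothetical stand-in for a Compose-like class (not torchvision's)."""

    def __init__(self, transforms: List[Callable[[Any], Any]]) -> None:
        self.transforms = transforms

    def __call__(self, value: Any) -> Any:
        # Apply each transform in order, feeding the result to the next one.
        for transform in self.transforms:
            value = transform(value)
        return value


# A signature-driven tool can read the parameter names and their types.
signature = inspect.signature(ComposeSketch.__init__)
hints = get_type_hints(ComposeSketch.__init__)

pipeline = ComposeSketch([abs, lambda v: v + 1])
output = pipeline(-3)
```

Without the annotation, `hints` would have no entry for `transforms`, so a tool that depends on it would have nothing to work with.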

@pmeier
Collaborator Author

pmeier commented Oct 14, 2022

The actual question is, would List and Callable be a problem for torch.jit?

List no, Callable yes. Unfortunately, torchscript only supports a very limited subset of types.

We are currently in the process of revamping the transforms. With this, we are likely to drop JIT scriptability for the transform classes (although there are options to retain it, see #6711). Thus, there is no reason to keep wrong or unnecessarily strict annotations on these classes. See #5626 for a discussion.

@frgfm
Contributor

frgfm commented Oct 15, 2022

On this topic, should we add / update the TODO list in the issue description? Or perhaps open another issue? I can easily see this GH issue sticking around forever otherwise :)

@pmeier
Collaborator Author

pmeier commented Oct 17, 2022

@frgfm Yes, it would be a good idea to get a better summary of what needs to be done and what is already finished. The core team is swamped at the moment, so we won't be making progress on that soon. If you are willing, could you post a comment like I did in #2025 (comment) (probably finer grained) and link open PRs? I freely admit that I have no idea what is still open and blocked by something (I vaguely remember there was a PR from you where the JIT simply didn't respond on core).

After that we can make a decision how to move forward. For datasets and transforms it is probably harder since they are still in prototype mode. We already have #5626 and #6668 that deal with annotations there so you might have a look at them first.

@frgfm
Contributor

frgfm commented Oct 26, 2022

Sure! Here is where we are at:

I'll update this message with future evolutions

@oke-aditya
Contributor

A bunch of PRs that could actually be useful:
#4630
#4612
#4599
#4323
