Type annotations #2025
Yes, please! Python-3 type annotations to torchvision would be great!
Two questions:
@pmeier
Type annotations ToDo
I don't know if there is already an internal discussion about this, but in light of inline type annotations I think we really should talk about the code format. With type annotations the signatures get quite long, and with our current style this not only looks ugly but also hinders legibility:

def save_image(tensor: Union[torch.Tensor, Sequence[torch.Tensor]], fp: Union[str, io.FileIO, io.BytesIO],
               nrow: int = 8, padding: int = 2, normalize: bool = False, range: Optional[Tuple[int, int]] = None,
               scale_each: bool = False, pad_value: int = 0, format: Optional[str] = None) -> None:

The same signature formatted with

def save_image(
    tensor: Union[torch.Tensor, Sequence[torch.Tensor]],
    fp: Union[str, io.FileIO, io.BytesIO],
    nrow: int = 8,
    padding: int = 2,
    normalize: bool = False,
    range: Optional[Tuple[int, int]] = None,
    scale_each: bool = False,
    pad_value: int = 0,
    format: Optional[str] = None,
) -> None:

which (subjectively) looks better and is more legible. Note that I'm not advocating specifically for
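A side note on the `format` parameter: `typing.Optional` accepts exactly one type argument, so a two-argument spelling such as `Optional[str, io.FileIO]` is rejected with a `TypeError` as soon as the annotation is evaluated. The short sketch below checks this and shows two valid spellings. Which one matches the intended meaning is an assumption here: if `format` is a file-format name such as "png", `Optional[str]` is the natural choice.

```python
import io
from typing import Optional, Union


def optional_accepts_two_args() -> bool:
    """Return True if Optional[str, io.FileIO] is accepted, False if it raises."""
    try:
        Optional[str, io.FileIO]  # two type arguments: not a valid Optional
    except TypeError:
        return False
    return True


# Valid alternatives (which one is intended is an assumption):
FormatName = Optional[str]                      # a format name such as "png", or None
FormatOrFile = Optional[Union[str, io.FileIO]]  # if a file object were also allowed

print(optional_accepts_two_args())  # False
```

`Optional[X]` is simply shorthand for `Union[X, None]`, so any "or" between several types has to go inside a `Union`.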
I'm probably not the best person to ask because I never liked the pyi files for Python-implemented bits in the first place. That said, my understanding is that it's fine to do them inline now. Regarding the formatting: I do agree that cramming as much as possible into each line probably isn't the best way. You might see if PyTorch has a formatting preference in its formatting checks and what the generally recommended Python way is (I never know).
Thanks for the input @t-vi! @pmeier good point about the code getting messy with type annotations. I don't have a good answer now, although indeed the output of Let me add @cpuhrsch @vincentqb and @zhangguanheng66 for discussion as well.
I second the use of
Yes.
This would also make reviewing easier. If you have a look at this review comment, it is unclear at first glance which parameter is meant, since GitHub only allows reviewing a complete line.
Although black is a nice formatter, I would be careful with re-formatting the whole codebase at once -- this effectively messes up with
The author talked very briefly about integrating
The problem is that most of the users (including the GitHub UI) most probably only use
As we will be touching every signature while adding annotations, we could simply format them with
@pmeier I think we could adopt the function-signature style of black, but leave the rest of the code-base unchanged? If that's what you proposed, I'm ok with it!
My 2 cents are the following:
Thanks for your comment @datumbox , I will try to reply below. First, let me point out a few things, in order to explain where I'm coming from:
Regarding the improved readability of typed code: I will agree that there is a degree of subjectivity here, but IMHO, I don't find this to be the case. For readability, there's not much that an annotation will bring compared to a decent docstring. And those annotations are intended for us: torchvision developers, not users (for now). I completely agree with you that the signatures in https://github.com/pytorch/vision/pull/4229/files are unclear and need to be improved; but to me this is better solved by clarifying / adding docstrings than by adding annotations. Taking the example of #4224 that you mentioned, to be fair, I don't see how this is a net positive for the library. Those annotations are either redundant with the docstrings, or really obvious [2]. In some cases mypy is being extra picky / difficult and requires a fair amount of change for very little added value (#4171 (comment) is an example). Sometimes, mypy is just plain wrong as well, and it significantly raises the barrier to entry when it comes to contributions [3]. I don't share the point of view that reviewing typing PRs is a low reviewing effort: like documentation PRs, typing PRs require extra vigilance and very thorough checking, as an incorrect annotation is much more harmful than a lack of annotation. On top of that, we'll get a lot of typing PRs, leading to a quite significant effort overall. I also fully agree about keeping the community engaged. As you know, I've been pushing strongly towards this myself. Personally, I believe there's a lot of potential in having better docstrings (or having docstrings at all in a lot of cases :p ).

[1] A simple example of this is the following code, for which the annotations are completely wrong, and yet mypy is perfectly happy:

def _g(a):
    return a + 0.5


def f(a: int) -> int:
    return _g(a)

[2] I want to emphasize that my comments here do not undermine the quality of the contributions from @frgfm or @oke-aditya and the rest of our contributors. I'm only criticizing the value of mypy / type annotations, not the contributions or the contributors, which we are very grateful for.

[3] On a PR I submitted to pandas, I got into this:

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l  # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

mypy complains with

Wanna make mypy happy? Easy, just add

    raise StopIteration("This thing never gets raised anyway")  # or whatever exception

at the very end of the code, without changing anything else. Ironically, we didn't change the return, but mypy is still happy. It's also a net negative, as this exception will never be raised and we're decreasing test coverage. As another workaround, we can write something like

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l  # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    return next(x for x in l if cond(x))

Which is arguably more pythonic, but also less readable for a beginner. Also, this form doesn't make it any more obvious to a reader that the condition is always hit (if we ignore the obvious assert above). This is the kind of stuff that makes me believe more typing = much greater contribution barrier. This stuff really isn't obvious for an inexperienced dev IMHO.
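To make footnote [1] concrete: annotations are not enforced at runtime, and mypy treats the result of calling the unannotated `_g` as `Any`, which its default settings accept as a valid `int` return. The sketch below runs the original example next to a fully annotated variant (`_g_typed` and `f_typed` are hypothetical names added for illustration). The thread does not show which "one flag" is meant in the reply further down; `--disallow-untyped-calls` and `--warn-return-any` are two existing mypy options that would flag the original pattern.

```python
def _g(a):
    # Unannotated: mypy treats both the parameter and the return value as Any.
    return a + 0.5


def f(a: int) -> int:
    # Accepted by mypy's default settings: _g(a) has type Any, and Any is
    # compatible with the declared return type int.
    return _g(a)


def _g_typed(a: int) -> float:
    return a + 0.5


def f_typed(a: int) -> int:
    # With _g_typed annotated, mypy reports an incompatible return value here
    # (float vs. int). The runtime behaviour is the same as f.
    return _g_typed(a)


result = f(1)
print(type(result).__name__, result)  # float 1.5
```

Either way the annotation `-> int` is only a promise to the checker; nothing stops `f` from returning a float when the program actually runs.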
Thanks for the comments @NicolasHug.
Adding types to our code-base will take time and will require significant effort, which we can start early. I agree that JIT, ONNX and mypy can give us a hard time sometimes, and certainly some places will require too many workarounds with dubious benefits initially. My proposal is to merge those PRs which have a net positive effect on readability, such as the ones mentioned earlier.
I think using type annotations is a much more natural way to encode this information, and additionally it future-proofs our code-base. Having said that, I too acknowledge that the notion of readability has some subjective parts. The same applies to what one considers obvious.
I definitely agree that incorrect/misleading annotations (or docstrings) are harmful. Personally I don't mind reviewing PRs that introduce typing where it makes sense, provided they use good judgement. The maintainers listed here are experienced and, given the above clarifications, they can apply their judgement on where typing is beneficial and where it will cause more confusion than help.
Well, I'm trying to argue above that IMO, these aren't net positives.
I quite firmly disagree here. First, annotations aren't part of the docstring, so in the html docs, they're not rendered next to the parameters description. Second, annotations are meant to be machine-readable, not human-readable. Let's take the The annotation for it is
I would find it hard to argue that the annotation is clearer than the human description :)
@NicolasHug I understand that you prefer docstrings over type annotations, that you really don't like mypy and that you prefer a less explicit approach for defining types. This is clearly documented here and in your previous comments on other PRs. The reason why I share my opinion here is that I want to engage with all the contributors and solicit their feedback. The decision to stop adding type annotations in favour of docstrings is quite major and can't be made without a quorum. This is especially true when the matter boils down to code readability and aesthetics, where there is a certain degree of subjectivity.
@NicolasHug and I had a similar offline discussion a while back and I just want to add my two cents to the examples:

def _g(a):
    return a + 0.5


def f(a: int) -> int:
    return _g(a)

The problem stems from the fact that you are mixing typed and untyped definitions here. In these cases,
So if you want this strictness, it is only one flag away.

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l  # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

This example looks quite nonsensical to me. Could you link the original code so I can have a look at the "real" example? In general, if you hit such an obscure case this either warrants a refactor of the current implementation or simply a Speaking of Line 12 in 8e2bd0e
Plus
And I appreciate you sharing your thoughts, as I'm the one that started this discussion and invited people to comment. It's not just about readability and aesthetics though: I hope I properly described above how the ongoing work is a) not as useful as we originally thought it would be, b) hard work, c) will raise the barrier to entry for the less experienced contributors. If you disagree with these points I'm happy to discuss further, but at the beginning of your comment in #2025 (comment) it looked to me like you were simply re-stating what you had said earlier, without taking my replies into account. Apologies if I'm mistaken here.
Well, no. We can't use the flags you mentioned until everything is typed, which brings us back to my original point: we have no ETA w.r.t. torchscript / mypy compatibility, so we don't know when we'll be able to type everything properly.
I agree the example may look strange, but it happens in real code: pandas-dev/pandas#29944 (comment)
I'm sorry, it's not that simple. It sometimes takes quite a while to decipher mypy's error messages and to realize that mypy is actually wrong. And for inexperienced contributors, it can take time to properly silence mypy as well. A recent example is @vmoens, who struggled to find a solution for #4256: look at what mypy is complaining about, and the number of "amend" commits. The mypy error message makes little sense. We finally sorted it out after a bit of back and forth on the PR and in a private chat. Again: a higher barrier to contribute.
Not true. You can set these flags on a module level in the configuration file like we are currently doing with the ignores: Lines 7 to 9 in 8e2bd0e
So if we go module by module in adding annotations that should not be an issue.
From reading just the snippet, I would say it is non-obvious that
In the PR you linked,

from torch import nn

m = nn.Module()
reveal_type(m.foo)

A remedy for this is to simply tell

diff --git a/torchvision/models/inception.py b/torchvision/models/inception.py
index b5cefefa7..cb67aea08 100644
--- a/torchvision/models/inception.py
+++ b/torchvision/models/inception.py
@@ -4,7 +4,7 @@ import torch
 from torch import nn, Tensor
 import torch.nn.functional as F
 from .._internally_replaced_utils import load_state_dict_from_url
-from typing import Callable, Any, Optional, Tuple, List
+from typing import Callable, Any, Optional, Tuple, List, cast
 
 
 __all__ = ['Inception3', 'inception_v3', 'InceptionOutputs', '_InceptionOutputs']
@@ -120,7 +120,7 @@ class Inception3(nn.Module):
         if init_weights:
             for m in self.modules():
                 if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
-                    stddev = float(m.stddev) if hasattr(m, 'stddev') else 0.1 # type: ignore
+                    stddev = cast(float, m.stddev) if hasattr(m, 'stddev') else 0.1
                     torch.nn.init.trunc_normal_(m.weight, mean=0.0, std=stddev, a=-2, b=2)
                 elif isinstance(m, nn.BatchNorm2d):
                     nn.init.constant_(m.weight, 1)

So yes,
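One behavioural detail of the `cast` remedy is worth spelling out: `typing.cast` only informs the type checker and returns its argument unchanged at runtime, whereas `float(...)` actually converts the value. The sketch below uses a hypothetical `FakeModule` as a stand-in for `nn.Module`: its `__getattr__` is annotated with a broad `Union`, mimicking the situation in the diff where a checker cannot tell that `stddev` is a number. For a plain number passed on as a standard deviation the difference is probably harmless, but after the change `stddev` is no longer guaranteed to be a `float`.

```python
from typing import Dict, Union, cast


class FakeModule:
    """Hypothetical stand-in for nn.Module: unknown attributes are looked up
    in a dict, and __getattr__ declares a broad Union return type."""

    def __init__(self, **extra: int) -> None:
        self._extra: Dict[str, int] = dict(extra)

    def __getattr__(self, name: str) -> Union[float, "FakeModule"]:
        # Only called when normal attribute lookup fails.
        try:
            return self._extra[name]
        except KeyError:
            raise AttributeError(name) from None


m = FakeModule(stddev=1)  # note: an int, not a float

# float(...) converts; a checker such as mypy is expected to reject this line
# because the Union also contains FakeModule, which float() does not accept.
converted = float(m.stddev) if hasattr(m, "stddev") else 0.1

# cast(...) silences the checker but performs no conversion at runtime.
trusted = cast(float, m.stddev) if hasattr(m, "stddev") else 0.1

print(repr(converted), repr(trusted))  # 1.0 1
```

If an actual conversion is wanted, keeping `float(...)` and narrowing the type first (for example with an `isinstance` check) avoids both the blanket ignore and the silent no-op.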
I don't think this remark is fair. Please, when in doubt, assume positive intent and seek clarification. Coming to the discussion, my original position was that mypy is the de facto static analysis tool used for validating typing and that we should use it as much as possible (except in cases where it makes mistakes, where we turn it off). This is pretty much the way we use it right now in TorchVision. Taking into account your remarks along with those of @pmeier and @oke-aditya, it became clear that sometimes it's very hard to please and this could lead to convoluted workarounds (see #4237). Given that, perhaps a more reasonable approach is to try to adjust, configure and use mypy with caution, as @pmeier advocates.
Philip, that we can set flags on a per-file basis does not change the fact that these annotations won't help users until mypy no longer conflicts with torchscript, thus (IMO) drastically reducing the benefits of these annotations. Also, yes, there are technical explanations and technical solutions to all of our problems here. Thank you for taking the time and effort to clarify these. But just because we can logically explain something doesn't mean that this thing is easy, or natural, or expected for most contributors. The barrier to contribute is something that I care deeply about, and it's also something that our upper management cares about (it came up various times in internal meetings). We're all trying to foster external contributions in this very issue, so surely, we all care about it here. And indeed, contributors can ask for help, but from my own experience these mypy-related issues pop up so frequently that this isn't just something we can ignore, or disregard, or just wave off as a minor hindrance. Vasilis, my apologies if my comment came across as unfair. Please rest assured that I am assuming positive intent and seeking clarification. On my end, I believe I missed your position on a few points that I tried to make, which left me with the impression that my comments had been ignored. In that regard, in order for me to better understand where we all stand, would you mind clarifying your position on the following items: Do you disagree with my POV that type annotations won't help users much until mypy no longer conflicts with torchscript? From my understanding, you believe that annotations are still relevant info for us (the developers), and you prefer annotations to docstrings. Which leads me to these next questions: What is your position regarding mypy raising the barrier for contributors? Thanks a lot for your patience and feedback.
I believe that anyone who reads the code, whether they are a user or a developer of the library, can benefit from seeing good typing information. Provided that the info is not wrong or misleading, annotations improve code readability and make the code more explicit. You did convince me, though, that in cases where mypy requires lots of workarounds, we should omit them.
I'm not sure why docstrings are even considered an option, as they are meant for documentation, not for defining types (perhaps when Python did not support typing they were a workaround, but not anymore). Python's effort of introducing typing is still in its infancy and static analysis tools like mypy are not mature. Still, I believe it's a step in the right direction to make the language more precise and potentially faster. For me, type annotations are the right tool to encode types, and using them will future-proof our code (covering the code-base will take time and it's not an all-or-nothing thing; it will unlock potential speed benefits as the language evolves, will improve readability, etc.).
We can use type aliases if we want to reduce complexity.
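A minimal sketch of that idea, using the `fp` parameter from the `save_image` signature quoted at the top of the thread: a module-level alias gives a long `Union` a readable name and keeps the signature short. The alias `FilePathOrBuffer` and the `write_bytes` helper are hypothetical, not existing torchvision API, and the alias members are an assumption about what such a parameter would accept.

```python
import io
import pathlib
import tempfile
from typing import BinaryIO, Union

# Hypothetical alias: a filesystem path or an already-open binary file object.
FilePathOrBuffer = Union[str, pathlib.Path, BinaryIO]


def write_bytes(data: bytes, fp: FilePathOrBuffer) -> None:
    """Write ``data`` to ``fp``, which may be a path or a binary file object."""
    if isinstance(fp, (str, pathlib.Path)):
        with open(fp, "wb") as f:
            f.write(data)
    else:
        fp.write(data)


buffer = io.BytesIO()
write_bytes(b"png-bytes", buffer)
print(buffer.getvalue())  # b'png-bytes'

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "out.bin"
    write_bytes(b"abc", target)
    from_path = target.read_bytes()
```

The alias is an ordinary module-level assignment, so it can be reused across several signatures and changed in one place.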
I don't think it's going to be a problem. We are a friendly bunch and we can help get this fixed during code reviews. Moreover, contributors can still submit code without annotations and we can take care of that in a follow-up (a good trade-off to reduce the back-and-forth and frustration for new contributors). I would even say that adding typing info is a good "bootcamp" task for someone who wants to learn the code-base and get involved.
True, not debating that. Same for the legibility debate about the docstrings. There are definitely cases where the annotation is less legible than a short description. Unfortunately, this will usually be the case for heavily overloaded keyword arguments in convenience functions. These functions are probably used by a lot of users, so good documentation is key here. We need to find a solution for that. An easy solution would be to let the annotations and the docstring diverge. We are not enforcing equality here. My point is that in the examples you have shown the nuisance stems from the fact that either
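To illustrate letting the annotation and the docstring diverge: for an overloaded argument, the annotation can stay coarse and machine-checkable while the docstring explains the accepted forms in prose. The helper `expand_padding` below is hypothetical; its padding convention (one int for all borders, two values for left/right and top/bottom, four values for left, top, right and bottom) mirrors the one documented for torchvision's `Pad` transform, but the function itself is not torchvision code.

```python
from typing import Sequence, Tuple, Union


def expand_padding(padding: Union[int, Sequence[int]]) -> Tuple[int, int, int, int]:
    """Expand a padding specification to (left, top, right, bottom).

    Args:
        padding (int or sequence): If a single int, it is used for all four
            borders. If a sequence of length 2, it is the padding on
            left/right and top/bottom respectively. If a sequence of length 4,
            it is the padding for the left, top, right and bottom borders.
    """
    if isinstance(padding, int):
        return (padding, padding, padding, padding)
    if len(padding) == 2:
        horizontal, vertical = padding
        return (horizontal, vertical, horizontal, vertical)
    if len(padding) == 4:
        left, top, right, bottom = padding
        return (left, top, right, bottom)
    raise ValueError(f"padding must be an int or a sequence of length 2 or 4, got {padding!r}")


print(expand_padding(3))       # (3, 3, 3, 3)
print(expand_padding((1, 2)))  # (1, 2, 1, 2)
```

The annotation `Union[int, Sequence[int]]` cannot express the length constraint at all; the docstring carries that part for human readers, and the runtime check enforces it.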
Note that you are switching your argument here from "we should hold adding type annotations until
I don't think asking for help is an issue. This is true not only for the contributors, but also for us maintainers. If you see some "weird"
Thank you both for taking the time to reply. I will address a few last points below that I believe are important to clarify, but I think it's probably best to move on. I've made my case and I understand that I didn't convince you yet, so I probably won't be able to convince you with more discussion. On my side, I still fail to foresee the benefits of this current typing effort, but I'll be patient, and hopefully the benefits will appear clearer in the future, should you decide to move forward with it. Thanks again for your input :)
It seems that we're referring to different things here. I will definitely agree that docstrings aren't a substitute for annotations, and annotations aren't a substitute for docstrings either. They have different purposes (documentation vs type-checking) and a different audience (humans vs computers). My original point here, which was a reply to an argument of yours #2025 (comment), is that when it comes to code readability, annotations add no value over a docstring. Again: the point of an annotation is not to specify types for human readers, they specify types for the type-checker; types should be documented for humans as well, but this is the docstrings' job.
I generally agree with your opinion, but I would like to add a bit of nuance. The first is that the word "simply" in "simply silence it" looks out of place to me :). Understanding and silencing mypy was never simple or obvious to me. The second is that, while I agree that we should ideally take the time to properly refactor the code when mypy flags some code smell, the reality is that this is usually not the mode we operate in. For better or worse, we tend to get sh!t done and silence / merge fast, rather than fully address the underlying issues.
To clarify: you're right that the barrier of entry will hardly change in the future. However, once mypy and torchscript are compatible, we'll be able to fully type-check torchvision. I can understand that fully type-checking torchvision is a strong enough reason to eventually raise the contribution bar. But until we can properly type-check, I believe that annotations have very limited value, so they are not enough of a reason at the moment for raising the bar.
Maybe we are finally getting to the source of your aversion. Let's take your example from above:

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l  # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

Running
If you just want to silence
If you now place a

def return_4() -> int:  # type: ignore[return]
    ...

IMHO, this is as simple as locally silencing other linters such as
I can fully get behind that. On anything that is not user facing, we can just slap a
Again, I can get behind that. @oke-aditya and @frgfm have put quite some effort into the PRs, so I think it is fair to properly review them. After that I wouldn't push further until torchscript finally gets support for fundamental stuff. Of course, if the development of torchscript is halted, we need to revisit this and see if it is worth still adding the annotations.
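For reference, a complete, runnable version of the error-code-scoped ignore discussed above. The `# type: ignore[return]` comment on the `def` line silences only mypy's missing-return error (error code `return`) for that function, so other errors reported on the same line would still show up. The annotations on `cond` are an addition beyond the original snippet. mypy's `--warn-unused-ignores` option can additionally flag such comments once a later refactor makes them unnecessary.

```python
def return_4() -> int:  # type: ignore[return]
    l = [2, 4, 6]
    assert 4 in l  # we *know* at least one element satisfies cond

    def cond(x: int) -> bool:
        return x == 4

    for x in l:
        if cond(x):
            return x
    # In mypy's view control can fall off the end here, which would implicitly
    # return None; that is the error the scoped ignore above silences.


print(return_4())  # 4
```

Scoping the ignore to one error code keeps the escape hatch narrow compared with a bare `# type: ignore`.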
Hey everyone 👋 Just making sure you guys don't trouble yourself too much for @oke-aditya & me:
Again, you all have brought us an extremely useful framework and related resources; I'm far from being the only one willing to help 👌 Happy to discuss this further, we all just want to continue making the PyTorch ecosystem as user/developer-friendly and useful as possible 😀
Hello. I would like to add type annotations to the file torchvision/transforms/transforms.py. The reason is not just for static type checking. The goal is to make I am asking here because I saw the
We are currently in the process of revamping the transforms. With this, we are likely to drop JIT scriptability for the transform classes (although there are options to retain it #6711). Thus, there is no reason to keep wrong or unnecessarily strict annotations on these classes. See #5626 for a discussion.
On this topic, should we add / update the TODO list in the issue description? Or open another one perhaps? I can easily see this GH issue sticking around forever otherwise :)
@frgfm Yes, it would be a good idea to get a better summary of what needs to be done and what is already finished. The core team is swamped at the moment, so we won't be making progress on that soon. If you are willing, could you post a comment like I did in #2025 (comment) (probably finer grained) and link open PRs? I freely admit that I have no idea what is still open and blocked by something (I vaguely remember there was a PR from you where the JIT simply didn't respond on core). After that we can make a decision on how to move forward. For datasets and transforms it is probably harder since they are still in prototype mode. We already have #5626 and #6668 that deal with annotations there, so you might have a look at them first.
🚀 Feature
Type annotations.
Motivation
Right now, if a project depends on torchvision and you want to statically type-check it, you have no choice but to either ignore it or write your own stubs. torch already has partial support for type annotations.
Pitch
We could add type annotations for torchvision as well.
Additional context
If we want this, I would take that up.