Type annotations #2025

Closed
pmeier opened this issue Mar 28, 2020 · 49 comments
@pmeier
Collaborator

pmeier commented Mar 28, 2020

🚀 Feature

Type annotations.

Motivation

Right now, if a project depends on torchvision and you want to statically type check it, you have no choice but to either ignore torchvision or write your own stubs. torch already has partial support for type annotations.

Pitch

We could add type annotations for torchvision as well.

Additional context

If we want this, I would take that up.

@fmassa
Member

fmassa commented Mar 30, 2020

Yes, please! Python-3 type annotations to torchvision would be great!

@pmeier
Collaborator Author

pmeier commented Mar 30, 2020

Two questions:

  1. Should I stick to stubs, i.e. create a .pyi file for every .py file, or do we want the annotations inline?
  2. Do you want me to do this in one gigantic PR or should I split it by module or package?

@fmassa
Member

fmassa commented Mar 30, 2020

@pmeier
1 - let's do the annotations inline. I think the reason PyTorch used the .pyi file for the annotations was due to Python2 compatibility (but I might be wrong). @t-vi do you have thoughts on this?
2 - let's split it into smaller chunks which are easier to review. It doesn't even need to be the full module at once; for example, transforms is very big, so we can send a few functions at a time.
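
To make the two options from question 1 concrete, here is a minimal sketch. The function `clamp_pixel` is hypothetical and not part of torchvision; the stub variant is shown only as a comment, since it would live in a separate `.pyi` file next to an unannotated `.py` file.

```python
from typing import get_type_hints


# Inline style: the annotations live next to the implementation.
def clamp_pixel(value: int, low: int = 0, high: int = 255) -> int:
    """Clamp `value` to the inclusive range [low, high] (hypothetical helper)."""
    return max(low, min(value, high))


# Stub style: the .py file would stay unannotated and a sibling .pyi file
# would carry only the signature, for example:
#
#     def clamp_pixel(value: int, low: int = ..., high: int = ...) -> int: ...

# Inline annotations are also available at run time.
hints = get_type_hints(clamp_pixel)
print(hints)
```

One practical difference is that inline annotations cannot drift away from the implementation unnoticed as easily as a separate stub file can.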

@pmeier
Collaborator Author

pmeier commented Mar 31, 2020

Type annotations ToDo

@pmeier
Collaborator Author

pmeier commented Mar 31, 2020

I don't know if there is already an internal discussion about this, but in light of inline type annotations I think we really should talk about the code format. With type annotations the signatures get quite big, and with our current style this not only looks ugly but also hinders legibility:

def save_image(tensor: Union[torch.Tensor, Sequence[torch.Tensor]], fp: Union[str, io.FileIO, io.BytesIO],
               nrow: int = 8, padding: int = 2, normalize: bool = False, range: Optional[Tuple[int, int]] = None,
               scale_each: bool = False, pad_value: int = 0, format: Optional[str] = None) -> None:

The same signature formatted with black looks like this:

def save_image(
    tensor: Union[torch.Tensor, Sequence[torch.Tensor]],
    fp: Union[str, io.FileIO, io.BytesIO],
    nrow: int = 8,
    padding: int = 2,
    normalize: bool = False,
    range: Optional[Tuple[int, int]] = None,
    scale_each: bool = False,
    pad_value: int = 0,
    format: Optional[str] = None,
) -> None:

which (subjectively) looks better and is more legible.


Note that I'm not advocating specifically for black, but rather for a code formatter or, more generally, a unified code format.

@t-vi
Contributor

t-vi commented Mar 31, 2020

I'm probably not the best person to ask because I never liked the pyi files for Python-implemented bits in the first place. That said, my understanding is that it's fine to do them inline now.

Regarding the formatting: I do agree that cramming as much as possible in each line probably isn't the best way. You might see if PyTorch has a formatting preference in its formatting checks and what's the generally recommended Python way (I never know).

@fmassa
Member

fmassa commented Mar 31, 2020

Thanks for the input @t-vi!

@pmeier good point about the code getting messy with type annotations. I don't have a good answer now, although indeed the output of black does seem more readable.

Let me add @cpuhrsch @vincentqb and @zhangguanheng66 for discussion as well

@vincentqb
Contributor

vincentqb commented Mar 31, 2020

I second the use of black-style inline type hint formatting.

@zhangguanheng66
Contributor

Yes. black makes sense to me as well.

@pmeier
Collaborator Author

pmeier commented Apr 1, 2020

This would also make reviewing easier. If you have a look at this review comment, it is unclear at first glance which parameter is meant, since GitHub only allows reviewing a complete line.

@fmassa
Member

fmassa commented Apr 6, 2020

Although black is a nice formatter, I would be careful with re-formatting the whole codebase at once -- this effectively messes up blame, so that we no longer know what was added when and by whom.

@pmeier
Collaborator Author

pmeier commented Apr 6, 2020

The author talked very briefly about integrating black into an existing codebase. I've had no contact with blame, so it's hard for me to assess if hyper-blame could work for us. Could you have a look?

@fmassa
Member

fmassa commented Apr 6, 2020

The problem is that most users (and the GitHub UI) most probably only use git blame, so asking them to switch to hyper-blame is a bit hard.

@pmeier
Collaborator Author

pmeier commented Apr 7, 2020

As we will be touching every signature while adding annotations, we could simply format them with black and leave everything else as is. This way git blame should still work, but we would enhance legibility. If we want to adopt black for the complete codebase, we could do so later, one commit at a time.

@fmassa
Member

fmassa commented Apr 7, 2020

@pmeier I think we could adopt the function-signature style of black, but leave the rest of the code-base unchanged? If that's what you proposed, I'm ok with it!

@datumbox
Contributor

If you disagree with these points I'm happy to discuss further, but at the beginning of your comment in #2025 (comment) it looked to me like you were simply re-stating what you had said earlier, without taking my replies into account.

I don't think this remark is fair. Please, when in doubt, assume positive intent and seek clarifications.

Coming to the discussion, my original position was that mypy is the de facto static analysis tool used for validating typing and that we should use it as much as possible (except in cases where it makes mistakes, where we turn it off). This is pretty much the way we use it right now in TorchVision. By taking into account your remarks along with those of @pmeier and @oke-aditya, it became clear that sometimes it's very hard to please it and this could lead to convoluted workarounds (see #4237). Given that, perhaps a more reasonable approach is to try to adjust, configure and use mypy with caution, as @pmeier advocates.

@NicolasHug
Member

Philip,

That we can set flags on a per-file basis does not change the fact that these annotations won't help users until mypy doesn't conflict with torchscript, thus (IMO) drastically reducing the benefits of these annotations.

Also, yes, there are technical explanations and technical solutions to all of our problems here. Thank you for taking the time and effort to clarify these. But just because we can logically explain something doesn't mean that this thing is easy, or natural, or expected for most contributors. The barrier to contribute is something that I care deeply about, and it's also something that our upper management cares about (it came up various times in internal meetings). We're all trying to foster external contributions in this very issue, so surely, we all care about it here. And indeed, contributors can ask for help, but from my own experience these mypy-related issues pop up so frequently that this isn't just something we can ignore, or disregard, or just wave off as a minor hindrance.

Vasilis,

my apologies if my comment came across as unfair. Please rest assured that I am assuming positive intent and seeking clarifications. On my end, I believe I missed your position on a few points that I tried to make, which left me with the impression that my comments had been ignored. In that regard, in order for me to better understand where we all stand, would you mind clarifying your position on the following items:

Do you disagree with my POV that type annotations won't help users much until mypy doesn't conflict with torchscript?

From my understanding, you believe that annotations are still relevant info for us (the developers), and you prefer annotations to docstrings. Which leads me to these next questions:
Do you always find annotations to be better than docstrings, even though docstrings contain more info / descriptions about the parameters? Would you mind providing an example where a docstring is not as clear or not as convenient as a type annotation? Also, do you find the annotation of the colors parameter clearer than its English description? #2025 (comment)

What is your position regarding mypy raising the barrier for contributors?

Thanks a lot for your patience and feedback.

@datumbox
Contributor

Do you disagree with my POV that type annotations won't help users much until mypy doesn't conflict with torchscript?

I believe that anyone who reads the code, whether they are a user or a developer of the library, can benefit from seeing good typing information. Provided that the annotations are not wrong or misleading, they improve code readability and make the code more explicit. You did convince me though that in cases where mypy requires lots of workarounds, we should omit them.

you prefer annotations to docstrings

I'm not sure why docstrings are even considered an option as they are meant for documentation not for defining types (perhaps when Python did not support typing it was a workaround but not anymore). Python's effort of introducing typing is still in its infancy and static analysis tools like mypy are not mature. Still, I believe it's a step in the right direction to make the language more precise and potentially faster. For me, typing annotations are the right tool to encode typing, and using them will future-proof our code (covering the code-base will take time and it's not an all-or-nothing thing, will unlock potential speed benefits as the language evolves, will improve readability etc).

Also, do you find the annotation of the colors parameter clearer than its English description?

We can use type aliases if we want to reduce complexity.
But I think this example hints that we got a very overloaded API and an implementation that does a lot of things.
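
As a rough illustration of the type-alias idea, the sketch below names each layer of a heavily overloaded "colors" parameter. The aliases `RGB`, `Color` and `Colors` and the helper `normalize_colors` are hypothetical names chosen for this example; they are not torchvision's API.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical aliases; each one names a single layer of the overloaded type.
RGB = Tuple[int, int, int]
Color = Union[str, RGB]
Colors = Union[Color, List[Color]]


def normalize_colors(colors: Optional[Colors], num_items: int) -> List[Color]:
    """Expand `colors` into one color per item (hypothetical helper)."""
    if colors is None:
        return ["white"] * num_items
    if isinstance(colors, list):
        if len(colors) != num_items:
            raise ValueError("expected one color per item")
        return colors
    # A single color (a name or an RGB tuple) is broadcast to every item.
    return [colors] * num_items


print(normalize_colors("red", 2))
print(normalize_colors([(255, 0, 0), "blue"], 2))
```

With the aliases, the signature reads as `Optional[Colors]` rather than a nested `Optional[Union[List[Union[...]], ...]]`, although the underlying overloading is of course still there.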

What is your position regarding mypy raising the barrier for contributors?

I don't think it's going to be a problem. We are a friendly bunch and we can help getting this fixed during the code reviews. Moreover, contributors can still submit code without annotations and we can take care of that in a follow-up (a good trade-off to reduce the back-and-forth and frustration for new contributors). I would even say that adding typing info is a good "bootcamp" task for someone who wants to learn the code-base and get involved.

@pmeier
Collaborator Author

pmeier commented Aug 17, 2021

@NicolasHug

That we can set flags on a per-file basis does not change the fact that these annotations won't help users until mypy doesn't conflict with torchscript, thus (IMO) drastically reducing the benefits of these annotations.

True, not debating that. Same for the legibility debate in the docstrings. There are definitely cases where the annotation is less legible than a short description. Unfortunately, this will usually be the case for heavily overloaded keyword arguments in convenience functions. These functions are probably used by a lot of users so good documentation is key here. We need to find a solution for that. An easy solution would be to let the annotations and the docstring diverge. We are not enforcing equality here.

My point is that in the examples you have shown the nuisance stems from the fact that either mypy is misconfigured or used in a dynamic context which it is not built for. In these cases it is up to us to configure it properly, refactor the code to remove the problematic parts, or simply silence it. mypy is usually (yes, sometimes it is actually a mypy issue) not to blame. In doing so, we are missing the point.

But just because we can logically explain something doesn't mean that this thing is easy, or natural, or expected for most contributors. The barrier to contribute is something that I care deeply about, and it's also something that our upper management cares about (it came up various times in internal meetings).

Note that you are switching your argument here from "we should hold adding type annotations until mypy and torchscript have converged" to "we should not add type annotations". Barrier of entry will only slightly change in the future and all the "weirdness" you gave examples for above will still be there.

And indeed, contributors can ask for help, but from my own experience these mypy-related issues pop up so frequently that this isn't just something we can ignore, or disregard, or just wave off as a minor hindrance.

I don't think asking for help is an issue. This is true not only for the contributors, but also for us maintainers. If you see some "weird" mypy behavior feel free to reach out to me. There is no need for you or anyone else to battle this if they don't want to get into the guts of it.

@NicolasHug
Member

NicolasHug commented Aug 17, 2021

Thank you both for taking the time to reply. I will address a few last points below that I believe are important to clarify, but I think it's probably best to move on. I've made my case and I understand that I didn't convince you yet, so I probably won't be able to convince you with more discussion. On my side, I still fail to foresee the benefits of this current typing effort, but I'll be patient and hopefully the benefits will appear clearer in the future, should you decide to move forward with it.

Thanks again for your input :)


I'm not sure why docstrings are even considered an option as they are meant for documentation not for defining types (perhaps when Python did not support typing it was a workaround but not anymore)

It seems that we're referring to different things here. I will definitely agree that docstrings aren't a substitute for annotations, and annotations aren't a substitute for docstrings either. They have different purposes (documentation vs type-checking) and a different audience (humans vs computers). My original point here, which was a reply to an argument of yours #2025 (comment), is that when it comes to code readability, annotations add no value over a docstring. Again: the point of an annotation is not to specify types for human readers, they specify types for the type-checker; types should be documented for humans as well, but this is the docstrings' job.

My point is that in the examples you have shown the nuisance stems from the fact that either mypy is misconfigured or used in a dynamic context which it is not built for. In these cases it is up to us to configure it properly, refactor the code to remove the problematic parts, or simply silence it. mypy is usually (yes, sometimes it is actually a mypy issue) not to blame. In doing so, we are missing the point.

I generally agree with your opinion, but I would like to add a bit of nuance. The first one is that the word "simply" in "simply silence it" looks out of place for me :). Getting to understand and silence mypy was never simple or obvious to me. The second is that, while I agree that we should ideally take the time to properly refactor the code when mypy flags some code smell, the reality is that this is usually not the mode that we operate in. For better or worse, we tend to get sh!t done and silence / merge fast, rather than fully address the underlying issues.

Note that you are switching your argument here from "we should hold adding type annotations until mypy and torchscript have converged" to "we should not add type annotations". Barrier of entry will only slightly change in the future and all the "weirdness" you gave examples for above will still be there.

To clarify: you're right that the barrier of entry will hardly change in the future. However, once mypy and torchscript are compatible, we'll be able to fully type-check torchvision. I can understand that fully type-checking torchvision is a strong enough reason to eventually raise the contribution bar. But until we can properly type check, I believe that annotations have a very limited value, and so I believe that it's not enough of a reason at the moment for raising the bar.

@pmeier
Collaborator Author

pmeier commented Aug 17, 2021

The first one is that the word "simply" in "simply silence it" looks out of place for me :). Getting to understand and silence mypy was never simple or obvious to me.

Maybe we are finally getting to the source of your aversion. Let's take your example from above:

def return_4() -> int:
    l = [2, 4, 6]
    assert 4 in l   # basically something that you *know* is True for at least one element in l

    def cond(x):
        return x == 4

    for x in l:
        if cond(x):
            return x

Running mypy on this gives you:

main.py:1: error: Missing return statement  [return]
    def return_4() -> int:
    ^
Found 1 error in 1 file (checked 1 source file)

If you just want to silence mypy here, you only need two pieces of information from this:

  1. Where did this happen? main.py:1
  2. What is the error code? [return]

If you now place a # type: ignore[return] on line 1 in the file main.py, mypy is silenced.

def return_4() -> int:  # type: ignore[return]
    ...

Success: no issues found in 1 source file

IMHO, this is as simple as locally silencing other linters such as flake8.
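
For comparison, the "refactor the code" option mentioned earlier can be sketched as follows. This is only one possible rewrite and not something proposed in this thread: the fall-through path that mypy complains about is made explicit with a `raise`, so every path ends in a `return` or a `raise` and no `# type: ignore` should be needed.

```python
def return_4() -> int:
    values = [2, 4, 6]

    def cond(x: int) -> bool:
        return x == 4

    for x in values:
        if cond(x):
            return x

    # The loop always returns for this input; the raise documents that
    # assumption and closes the fall-through path the checker flagged.
    raise AssertionError("unreachable: 4 is always in values")


print(return_4())
```

The trade-off is an extra line that can never run, in exchange for not having to silence the checker on the signature.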

The second is that, while I agree that we should ideally take the time to properly refactor the code when mypy flags some code smell, the reality is that this is usually not the mode that we operate in. For better or worse, we tend to get sh!t done and silence / merge fast, rather than fully address the underlying issues.

I can fully get behind that. On anything that is not user facing, we can just slap a # type: ignore on it and fix it later.

But until we can properly type check, I believe that annotations have a very limited value, and so I believe that it's not enough of a reason at the moment for raising the bar.

Again, I can get behind that. @oke-aditya and @frgfm have put quite some effort into the PRs, so I think it is fair to properly review them. After that I wouldn't push further until torchscript finally gets support for fundamental stuff. Of course, if the development of torchscript is halted, we need to revisit this and see if it is still worth adding the annotations.

@frgfm
Contributor

frgfm commented Aug 23, 2021

Hey everyone 👋

Just making sure you guys don't trouble yourself too much for @oke-aditya & me:

  • for my part, I do use PyTorch & torchvision quite a lot and develop many things with it. When I'm implementing / tuning / modifying PyTorch stuff, it's extremely useful to have typing annotations (saving a lot of time to match the interface of the core deep learning framework, and providing the same level of information easily on my end). So I do see the advantages for developers, but they are admittedly decreased for plain users 🤷‍♂️
  • seeing that we still have many open issues on the vision side, I think bringing some help is the least I could do! that being said, this is open source and I fully support the common decision if we don't move forward with this. I'm just here to help :)
  • one last part that we shouldn't neglect: in the long term, having enforced mypy checks indeed raises the difficulty for new contributions. Either we'll need to direct new contributors to some material to understand typing & submit PRs that properly integrate it, or each review will need to consider typing carefully (and suggest modifications).

Again, you all have brought us an extremely useful framework and related resources, I'm far from being the only one willing to help 👌 Happy to discuss this further, we all just want to continue making the PyTorch ecosystem as user/developer-friendly and useful as possible 😀

@mauvilsa

mauvilsa commented Oct 14, 2022

Hello. I would like to add type annotations to file torchvision/transforms/transforms.py. The reason is not just for static type checking. The goal is to make torchvision.transforms.Compose work with pytorch-lightning's LightningCLI which makes classes automatically configurable by inspecting signatures at run time.

I am asking here because I saw the torch.jit types limitation, #2025 (comment). From a fast look at the code, the type for the transforms parameter of the Compose class seems to be List[Union[torch.nn.Module, Callable]]. Maybe I don't have the correct type yet, but not important. The actual question is, would List and Callable be a problem for torch.jit?
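
To illustrate what "inspecting signatures at run time" relies on, here is a small standard-library sketch. The `Compose` class below is a hypothetical stand-in written for this example; it is neither torchvision's class nor a description of how LightningCLI is implemented, and the `List[Callable]` annotation is simply the guess discussed above.

```python
import inspect
from typing import Callable, List, get_type_hints


class Compose:
    """Hypothetical stand-in for a Compose-like class; not torchvision's code."""

    def __init__(self, transforms: List[Callable]) -> None:
        self.transforms = transforms

    def __call__(self, x):
        # Apply each transform in order.
        for t in self.transforms:
            x = t(x)
        return x


# A tool that builds configuration from signatures can read these at run time.
params = inspect.signature(Compose.__init__).parameters
hints = get_type_hints(Compose.__init__)
print(list(params))
print(hints["transforms"])
```

Without the annotation on `transforms`, `hints` would contain no entry for it, and such a tool would have nothing to derive the expected type from.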

@pmeier
Collaborator Author

pmeier commented Oct 14, 2022

The actual question is, would List and Callable be a problem for torch.jit?

List no, Callable yes. Unfortunately, torchscript only supports a very limited subset of types.

We are currently in the process of revamping the transforms. With this, we are likely to drop JIT scriptability for the transform classes (although there are options to retain it #6711). Thus, there is no reason to keep wrong or unnecessary strict annotations on these classes. See #5626 for a discussion.

@frgfm
Contributor

frgfm commented Oct 15, 2022

On this topic, should we add / update the TODO list in the issue description? Or perhaps open another issue? I can easily see this GH issue sticking around forever otherwise :)

@pmeier
Collaborator Author

pmeier commented Oct 17, 2022

@frgfm Yes, it would be a good idea to get a better summary of what needs to be done and what is already finished. Core team is swamped at the moment so we won't be making progress on that soon. If you are willing, could you post a comment like I did in #2025 (comment) (probably finer grained) and link open PRs. I freely admit that I have no idea what is still open and blocked by something (I vaguely remember there was a PR from you where the JIT simply didn't respond on core).

After that we can make a decision how to move forward. For datasets and transforms it is probably harder since they are still in prototype mode. We already have #5626 and #6668 that deal with annotations there so you might have a look at them first.

@frgfm
Contributor

frgfm commented Oct 26, 2022

Sure! Here is where we are at:

I'll update this message with future evolutions

@oke-aditya
Contributor

There are actually a bunch of PRs that could be useful:
#4630
#4612
#4599
#4323

@scm-aiml

Hello all, I recently stumbled across this thread as I have been spending the past few days developing stubs for torchvision as PyTorch already had py.typed incorporated to support the code base. I spent a good deal of time reviewing this thread, the CONTRIB readme, and it appears that there is still value in moving forward with contributing to the type hinting while keeping in mind a few key points:

  1. Keep PRs small (roughly one per file).
  2. There are still compatibility issues with torchscript.

I was going through the dev setup for Mac, but given some recent issues in #8421, I have a dev container for linux up and running.

One small question though: I notice mypy quite often throwing errors over an import type of Any (which can be suppressed), but there also seems to be an issue with not having a proper annotation of -> None: on __init__ methods of class definitions. I have seen some examples of code here that do not have that, but many of the updates from @frgfm use the correct annotation as pointed out in PEP 484. It's a small thing, but I just wanted to clarify before moving forward.
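
For reference, the `-> None` convention can be sketched like this. The two classes are hypothetical; the relevant background is that PEP 484 recommends annotating `__init__` with `-> None`, and that an `__init__(self)` with no annotations at all counts as an unannotated function for a type checker such as mypy.

```python
from typing import get_type_hints


class Unannotated:
    def __init__(self):  # no annotations at all: treated as an untyped function
        self.size = 0


class Annotated:
    def __init__(self) -> None:  # explicit return type, as PEP 484 recommends
        self.size: int = 0


# The difference is also visible at run time.
print(get_type_hints(Unannotated.__init__))
print(get_type_hints(Annotated.__init__))
```

When `__init__` takes at least one annotated argument, mypy allows the `-> None` to be omitted, which may explain why both styles appear in existing code.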

@NicolasHug
Member

Hi @scm-aiml ,

Thanks for working on this. We're not planning on adding any new type annotation at this point. Those that currently exist in torchvision are often wrong because they are incompatible with torchscript, so type-checking torchvision code has little value: it'll error for irrelevant reasons.

Since there's no plan on adding more annotations, I'll close this issue to avoid misleading contributors. Sorry about that @scm-aiml

@NicolasHug
Member

one last thing: if there's a community-supported stub package for torchvision that gets regular updates, I'd be more than happy to point users to it (without BC guarantees from torchvision's part though)

@scm-aiml

scm-aiml commented Aug 19, 2024

Hi @NicolasHug,

I didn't intend to disregard your comments in the thread. My confusion just comes from seeing your comment in August about 3 years ago and having strong feelings opposing typing torchvision, but then there continued to be a large series of contribution(s) that were approved and brought in from PRs.

I guess my primary question would be are PRs not supported/received which make these contributions or are they just not going to be labeled for any priority as a "feature request"?

@NicolasHug
Member

No worries @scm-aiml

A lot of time has passed since the old discussion was started. I was never convinced that type annotations made much sense for torchvision, and I haven't changed my mind on that. The discussion above was never truly resolved in the sense that active maintainers at the time just... agreed to disagree. So, with inertia, the status quo continued: PRs were submitted and those who were in favor of annotating torchvision reviewed and merged those PRs.
These days, I'm the only maintainer left in charge of torchvision. I have much less bandwidth to review stuff than when there were more of us, and as a result I can only allocate time to high-priority issues.
