Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the way imports are done to be more correct for static type checkers #47027

Open
gramster opened this issue Oct 28, 2020 · 31 comments
Open
Assignees
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: typing Related to mypy type annotations triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@gramster
Copy link

gramster commented Oct 28, 2020

🐛 Bug

There are a number of imports in torch of submodules that are implicitly reexported. E.g. you can do (and it is commonly done in examples):

import torch

torch.nn.<something>

In __init__.py, the submodules are imported as (using the nn example):

import torch.nn

But to be properly re-exported for static type checkers, this is not correct. Instead, something like:

import torch.nn as nn

is needed.

cc @ezyang @malfet @rgommers @xuzhao9 @gramster

@albanD albanD added enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: typing Related to mypy type annotations triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Oct 29, 2020
@jakebailey
Copy link
Contributor

jakebailey commented Oct 29, 2020

For context on this, there's a typing-sig thread here about settling on some common behaviors in py.typed packages: https://mail.python.org/archives/list/typing-sig@python.org/thread/YLJPWECBNPD2K4TRIBRIPISNUZJCRREY/

Which we distilled into a doc here: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md

It might be the case that the form should be from torch import nn as nn to fit the PEP 484 pattern.

@ezyang
Copy link
Contributor

ezyang commented Oct 30, 2020

Thanks for the report, we should definitely fix these.

@gramster
Copy link
Author

We'd like this fix to get in for Pylance to work better with Pytorch in Visual Studio Code; should we do a PR?

@rgommers
Copy link
Collaborator

Thanks for asking @gramster, a PR would be great.

@nurpax
Copy link

nurpax commented Dec 18, 2020

@gramster, did you by any chance make a PR for this in pytorch?

@jakebailey Is the conclusion that the correct way to import would be from torch import nn as nn or is import torch.nn as nn ok too with pyright? It feels somewhat of a difficult guideline to follow given that Python developers will most likely encounter this only if they're using a specific static type checking tool that's strict about public imports. For example, it seems like mypy is more relaxed with these imports -- so it's easy to introduce this problem if you don't test with pyright. The harm in being super strict about imports means that a single wrong import line in a library means it's hard/impossible to use pylance/pyright with the problematic library. (EDIT: reading through https://mail.python.org/archives/list/typing-sig@python.org/thread/YLJPWECBNPD2K4TRIBRIPISNUZJCRREY/ it looks like import torch.nn as nn is the way to go.)

If no one's done a PR for this yet and there's a clear obviously standard way to do the imports, I could take a crack at it.

@gramster
Copy link
Author

gramster commented Dec 18, 2020 via email

@jakebailey
Copy link
Contributor

Doing a quick pass over the pyright code for this, I'd think the form would have to be from torch import nn as nn. I don't think the clarified PEP says anything about this case, and I don't recall it being mentioned in that thread. Maybe @erictraut can give an answer more quickly than me.

I know mypy has implemented this new strictness for stubs for a while, but I don't know if they've started to enforce this for py.typed modules yet. I also don't know if they attempt to emulate this side effect behavior or not. (Things start to get messy if you imagine importing a nested module like torch.foo.bar.baz as baz and asking both what you export and what you expect each module to contain.)

I wouldn't use __all__ here. You'd need to list everything out (and I mean everything), and it will have a runtime effect (unless you do want to enforce what is copied during a * import). There's a specific subset of operations that are supported to help build the list, but I don't think it scales too well.

(I'm on break for the next two weeks; I'm not sure if I'll be able to take a look or submit something for it at the moment.)

@erictraut
Copy link

The rules pyright uses for re-exported imports in a py.typed library can be found here. In summary: you need to use either the redundant form of import (from torch import nn as nn) or use __all__.

@MHDante
Copy link

MHDante commented Jan 17, 2021

Is it enough to update the imports on torch's __init__.py? would we have to recursively apply the same treatment to the __init__.py files in submodules?

If not, I've submitted #50665 for review.

@erictraut
Copy link

erictraut commented Jan 17, 2021

Thanks for the change @MHDante.

The answer is that it depends on which imported symbols you intend to re-export from which submodules. That's for the library authors & maintainers to decide. This mechanism gives you the ability to express that intent to the tooling.

This same change would apply to any imported symbol that you intend to re-export from any submodule. For example, if submodule "torch.a.b.c" imports symbol "foo", from another module and you want to allow consumers to import foo directly from torch.a.b.c (i.e. from torch.a.b.c import foo), then you'd need to apply a similar change to torch.a.b.c.__index__.py. If your intent is not to re-export "foo" in this case, then no change is needed because the tooling assumes no re-export by default.

Hope that makes sense.

@MHDante
Copy link

MHDante commented Jan 17, 2021

Right. I assume it would require a thorough revision by the torch team regarding the intentionality of exporting specific modules.

In the short-term, I will update PR #50665 to expose all submodules (and intermediate submodules) that are documented in API section of the documentation here: https://pytorch.org/docs/stable/index.html , As it is clear that the authors want those to be accessible by consumers. Hopefully with that change, at least the api as is listed will work with pylance/pyright in torch v1.7.(2/3)

@MHDante
Copy link

MHDante commented Jan 18, 2021

Hmm. In the process of writing these imports, I came across a few oddities.

  1. I assume calling from foo import bar as bar causes bar.__init__.py to be evaluated.
  2. Some init.py files have significant overhead/side-effects to them during runtime.

As a result, I would hesitate to modify the code to import additional submodules in order to avoid additional overhead or regressions. This can lead to some inconsistencies in the API:

for example, currently these two statements work (in python, not pyright) :

torch.backends.cuda.is_built()
torch.backends.cudnn.is_built()

however, only one of the two is imported.

My key assumption here is that torch.backends.cudnn.__init__.py is evaluated at the time that this
line is run: torch.backends.cudnn.is_built(). Whereas torch.backends.cuda.__init__.py was evaluated when import torch was run.

if my key assumption is correct, I would have 2 questions.

  1. For @erictraut (or anyone more familiar with the import system): Is my key assumption correct? Or am I misunderstanding the way that __init_.py works?
  2. For @erictraut : Is there a way in pylance/pyright/mypy to expose submodules without importing them?
  3. For the torch team: Would you be ok importing the rest of the submodules (listed in the docs as accessible from torch) during torch/__init__.py ?

So far, I've updated #50665 to the best of my ability without changing any of the loading orders. If there is no other way, I would like to move forward with those changes.

@jakebailey
Copy link
Contributor

jakebailey commented Jan 18, 2021

Imports are never done at an expression like that (the call succeeding in your example implies the module has already been executed). If the module is there at runtime but doesn't appear to be imported in the code, then it's getting imported as a side effect of another import eventually importing that module and it getting added to backends, i.e. some other file writing import torch.backends.cudann and then every other module who has access to backends now has that symbol in their own backends because they're all the same object.

You'd have to decide what you really want the API to be (since maybe one day, code gets rewriten and someone who was relying on this gets a nice surprise).

@jakebailey
Copy link
Contributor

jakebailey commented Jan 18, 2021

FWIW back a good year or so ago, I described this in microsoft/pyright#439 (as the old MPLS sort of handled this sort of thing to handle torch, which was the biggest reliant on this behavior; this was a behavior difference between the analyses).

@jakebailey
Copy link
Contributor

Is there a way in pylance/pyright/mypy to expose submodules without importing them?

I'm a bit confused by this question; if the module isn't actually imported or accessible, then I wouldn't expect a type checker or editor to accept or suggest it to me, no? Otherwise the code may crash at runtime because the editor or type checker told me I could use it but I couldn't.

@erictraut
Copy link

As Jake said, torch.backends.cudnn.is_built() will not implicitly execute cudnn/__init__.py. Since this doesn't crash at runtime, it must be relying on the side effect of some other import statement — a dangerous and fragile assumption.

Digging into this further, I think I understand the side effect that makes this work. The main module is importing torch.jit which imports torch.jit._async which imports torch.jit._builtins which imports torch.backends.cudnn. So the code is relying on this very tenuous and fragile assumption. Small refactoring changes could easily break this assumption and result in runtime errors. I'd consider this a bad bug if this were in my code base even though it happens to work at the moment.

The good news is that this is easy to fix, and it won't impact runtime performance. You can simply add an explicit import of torch.backends.cudnn in the module that depends on it. Once imported, the results will be cached, and any subsequent attempt to import it will effectively be "free".

@ezyang
Copy link
Contributor

ezyang commented Jan 19, 2021

Yeah, all of this is a really good reason why we should fix all of this.

My general thinking is that when doing this work, we should work to avoid breaking other people's code. So if there is code that is invisibly causing more modules to come into scope, we should simply pave the cowpaths, rather than try to fix it at the same time. We can always, alter, try to reduce the number of imports, but in general that has to be done more deliberately. Of course, fine to add comments to this effect.

@jakebailey
Copy link
Contributor

FWIW I have been working on this in an unpushed branch on my fork (based on some of #50665); there are some serious evaluation order issues at play here. Each time I fix one thing, another breaks, as things are very circular and are depending on the current ordering.

I'm considering sending a PR that at least fixes the top-level API of the torch module, e.g. from . import nn as nn and so on, without any semantic changes, and potentially other "safe" ones. Would such a PR be accepted as an incremental improvement? It would at least help out editors to know that nn is intended to be part of the API.

@rgommers
Copy link
Collaborator

That sounds fine @jakebailey, probably even preferred to break it up rather than a single PR that touched almost every module.

@jakebailey
Copy link
Contributor

Sure, I can send a low-risk small one to get the easy ones out of the way.

Out of curiosity, is it too far gone to attempt to some of this into your next release? I don't quite know what satisfies the requirement for a cherry-pick, and unfortunately I didn't have time to work on this before what appears to be the 1.8.0 branching 😞

@rgommers
Copy link
Collaborator

Out of curiosity, is it too far gone to attempt to some of this into your next release? I don't quite know what satisfies the requirement for a cherry-pick, and unfortunately I didn't have time to work on this before what appears to be the 1.8.0 branching

That's for the release team to decide, but I think the chance is quite low. Only critical changes are backported normally I believe.

@gchanan
Copy link
Contributor

gchanan commented Feb 17, 2021

One thing I'm trying to understand is when you are talking about "type checkers" if you are talking about python-level guidance or pyright-level guidance.

E.g. the import rules https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface list PEP-561 but that doesn't make any claims about how to treat "redundant" import statements like "from A import X as X".

Am I correct that this guidance is pyright specific?

@jakebailey
Copy link
Contributor

It's not pyright specific, no, though some type checkers may be more lenient about things (I don't know to what extent mypy is run here on internal modules, or if it even understands some of the nuances of these side effects and what is "intended" to be visible).

The "redundant form" is something that's listed in PEP 484's section on stubbing, and the discussion on the typing-sig (which led to that doc) had general agreement that py.typed libraries with inlined types should behave in this way as well.

#47027 (comment) has a few links to the other sources, I just re-linked the clearer guidance doc here for reference.

@erictraut
Copy link

erictraut commented Feb 17, 2021

We haven't formalized these rules in the form of a PEP, but we have discussed them with the maintainers of other Python type checkers (mypy, pyre, pytype) in the typing-sig, as Jake mentioned. We've incorporated their feedback and have general agreement on this guidance.

The treatment of redundant imports is consistent with PEP 484. Admittedly, PEP 484 was talking specifically about type stubs, not ".py" files that contain inlined types, but this is a natural extension of PEP 484 guidance. Since it's solving the same problem (namely, differentiating re-exports that are intended versus those that are not), it makes sense to use the same consistent mechanism.

@erictraut
Copy link

The problem is that there's no good way for a type checker to intuit the difference between an imported symbol that is meant to be re-exported and an imported symbol that is not. Only the author of the module knows what was intended, and there needs to be some agreed-upon way to distinguish between these two cases. Such a mechanism was introduced in PEP 484 for stubs. Since it's the same problem faced by "py.typed" (PEP 571) packages that include inlined types, it makes sense to adopt the same approach rather than inventing a new one.

I'm sympathetic to the argument that it makes the code more a little more verbose, but I'd argue that humans are probably not typically reading these long lists of import statements anyway.

@jakebailey
Copy link
Contributor

jakebailey commented Feb 19, 2021

FWIW before all of the stubs were deleted and merged in as inline types, this pattern was done in them to document what is re-exported:

from ._tensor_str import set_printoptions as set_printoptions
from .functional import *
from .serialization import save as save, load as load
from .autograd import no_grad as no_grad, enable_grad as enable_grad, \
set_grad_enabled as set_grad_enabled
from ._ops import ops
from ._classes import classes
from . import autograd as autograd
from . import cuda as cuda
from . import optim as optim
from . import nn as nn
from . import multiprocessing as multiprocessing
from . import sparse as sparse
from . import onnx as onnx
from . import jit as jit
from . import hub as hub
from . import random as random
from . import distributions as distributions
from . import testing as testing
from . import quantization as quantization
from . import __config__ as __config__
from . import __future__ as __future__

From my experience, I've found re-exports like this to be pretty rare across projects. I think most libraries would have expected users to have to write import torch.nn themselves if they really wanted to use it (and we had to add hacks into MPLS specifically because torch users were expecting the opposite). I think that instances of this are going to be rare within pytorch, and __init__.py is just a hotspot.

@t-vi
Copy link
Collaborator

t-vi commented Feb 19, 2021

I'm sympathetic to the argument that it makes the code more a little more verbose, but I'd argue that humans are probably not typically reading these long lists of import statements anyway.

Heya, thank you for your answer. Clearly, the Zen of Python isn't what it used to be when we put secret handshakes into our code for the benefit of machines that read it. And I would argue that this is a stark contrast between pyi that never were aimed at people much and so the hacks for the sake of the machines had much less impact than in the py files. IMHO a # type: export would signal much clearer what is going on, even if it probably would not be considered a work of art on its own.

facebook-github-bot pushed a commit that referenced this issue Mar 2, 2021
Summary:
For #47027.

Some progress has been made in #50665, but in my testing trying to unwrap the circular dependencies is turning into a neverending quest.

This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect I would like to further review. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.

Pull Request resolved: #52339

Reviewed By: smessmer

Differential Revision: D26694909

Pulled By: malfet

fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1
malfet pushed a commit to malfet/pytorch that referenced this issue Mar 10, 2021
…#52339)

Summary:
For pytorch#47027.

Some progress has been made in pytorch#50665, but in my testing trying to unwrap the circular dependencies is turning into a neverending quest.

This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect I would like to further review. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.

Pull Request resolved: pytorch#52339

Reviewed By: smessmer

Differential Revision: D26694909

Pulled By: malfet

fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1
seemethere pushed a commit that referenced this issue Mar 10, 2021
…53675)

Summary:
For #47027.

Some progress has been made in #50665, but in my testing trying to unwrap the circular dependencies is turning into a neverending quest.

This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect I would like to further review. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.

Pull Request resolved: #52339

Reviewed By: smessmer

Differential Revision: D26694909

Pulled By: malfet

fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1

Co-authored-by: Jake Bailey <5341706+jakebailey@users.noreply.github.com>
aocsa pushed a commit to Quansight/pytorch that referenced this issue Mar 15, 2021
…#52339)

Summary:
For pytorch#47027.

Some progress has been made in pytorch#50665, but in my testing trying to unwrap the circular dependencies is turning into a neverending quest.

This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect I would like to further review. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.

Pull Request resolved: pytorch#52339

Reviewed By: smessmer

Differential Revision: D26694909

Pulled By: malfet

fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1
xsacha pushed a commit to xsacha/pytorch that referenced this issue Mar 31, 2021
…#52339)

Summary:
For pytorch#47027.

Some progress has been made in pytorch#50665, but in my testing trying to unwrap the circular dependencies is turning into a neverending quest.

This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect I would like to further review. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.

Pull Request resolved: pytorch#52339

Reviewed By: smessmer

Differential Revision: D26694909

Pulled By: malfet

fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1
@pmeier
Copy link
Collaborator

pmeier commented Jul 1, 2021

@gramster Is this still relevant for torch>=1.8.1 (especially #53675) and the current pyright==1.1.154? I don't see any failures for

import torch

model = torch.nn.Conv2d(3, 3, 1)

Do you have another example where pyright failed?

@pmeier pmeier self-assigned this Jul 1, 2021
@jakebailey
Copy link
Contributor

nn works because I sent a PR to do the easy ones, but there are a load of modules that I couldn't do because they were really difficult to fix. If you check the __init__.py you'll see the easy ones changed but some of the more complicated ones left as-is.

@pmeier
Copy link
Collaborator

pmeier commented Jul 1, 2021

@jakebailey Could you give me a simple example to get this started?

@jakebailey
Copy link
Contributor

import torch.backends.cuda

Is an example; the init imports this to effectively make backends "export" cuda so you can do torch.backends.cuda. This should really be in backends as "from . import cuda as cuda".

This same thing applied to all of the other stuff in this area is the goal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: typing Related to mypy type annotations triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests