Tensorflow Type Stubs #7144

Closed
hmc-cs-mdrissi opened this issue Feb 6, 2022 · 22 comments
Labels: stubs: request (OUTDATED! Request to add stubs for a new package to typeshed)

Comments

@hmc-cs-mdrissi
Contributor

I would be interested in contributing and working on type stubs for tensorflow. It has a fairly massive API, though, and I am unsure whether it would be reasonable to have the stubs start in typeshed and then eventually move back to tensorflow one day when they decide to resolve this issue.

So the first question is: would working on them here be acceptable? If yes, what would be the recommended way to start? Start with a partial stub and just work on getting the directory structure reasonable, with a lot of __getattr__(name: str) -> Any (a minimal sketch of this follows below)? The other main thing, I think, would be the common type aliases that will be used for most functions. I have already been working on stubs for tensorflow for work usage and could start by cutting that up into small PRs.
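For concreteness, such a partial stub could be little more than a module-level fallback (a sketch, not typeshed's actual layout):

# tensorflow/__init__.pyi
from typing import Any

def __getattr__(name: str) -> Any: ...  # anything not yet stubbed falls back to Any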

@Akuli
Collaborator

Akuli commented Feb 6, 2022

have the stubs start in typeshed and then eventually move back to tensorflow one day when they decide to resolve this issue.

This seems like a good idea to me. I think there are tools that automatically merge type annotations from .pyi files into .py files.

So the first question is: would working on them here be acceptable?

Yes, it is acceptable :)

If yes, what would be the recommended way to start?

We usually auto-generate the stubs (see CONTRIBUTING.md for instructions). They are initially very incomplete and full of Any and the like, but we then add types to the most commonly used parts, or whatever parts of the library the contributor happens to need.
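For example, with mypy's stubgen (output directory illustrative):

stubgen -p tensorflow -o stubs/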

I have already been working on stubs for tensorflow for work usage and could start by cutting that up into small PRs.

Awesome :)

@hmc-cs-mdrissi
Contributor Author

hmc-cs-mdrissi commented Feb 7, 2022

Trying stubgen on tensorflow, you have to raise the timeout, as the default is too short given the library's size. Even with a high timeout, stubgen crashes:

...
Critical error during semantic analysis: /Users/mdrissi/.virtual_envs/mypy/lib/python3.8/site-packages/tensorflow/python/ops/parallel_for/pfor.py:4657: error: invalid syntax 

I think that file is valid Python, given that importing it works fine. So it looks like a mypy bug (mypy also crashes on that file). I'll open a bug report for mypy later.

After tweaking the file a bit I was able to get stubs. From past experience using generated tf stubs (previously pyright's), I'd recommend against relying on them. The generated stubs look better today, but still have some key issues I encountered before. Tensorflow does a lot of import/method-patching magic that, even with Any used liberally, still produces many false positive errors.

Most tensorflow types are not defined where they are publicly exported. Instead, they're defined in private/internal files. For example, tensorflow's Layer class is defined in keras.engine.base_layer, and keras.engine is not a public part of the API. It's defined like this:

@keras_export('keras.layers.Layer')
class Layer(tf.Module, version_utils.LayerVersionSelector):
  ...

and the decorator (keras_export or tf_export) dynamically exports the class to a different public location. Most tensorflow objects rely on a decorator to modify how imports work. If you generate stubs, they will document the private API matching the source code, but will not describe the public API/documentation properly, and most classes will be defined in internal locations instead. The worst case is that occasionally an object is re-exported to a module that doesn't even exist in the source code and is only defined dynamically. An example is this class: at runtime it is available as tf.initializers.Zero, but generated stubs lack that definition, since it only exists through decorator magic.
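Hand-written stubs can instead define the class at its documented public path. A minimal sketch (module paths illustrative):

# tensorflow/keras/layers/__init__.pyi
from tensorflow import Module

class Layer(Module): ...  # defined at the public path, not at keras.engine.base_layer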

This also raises the question of whether the stubs should document the internal API at all. I would prefer not to, as there are no backwards-compatibility expectations for importing undocumented paths. Instead, I plan to have the stubs match the public documentation. Almost all of the generated stubs are internal files. A secondary challenge with stub generation is that tensorflow uses some dynamic method definition. Several basic methods are not defined in the class normally but are monkey patched in elsewhere, including fundamental ones like __add__/__sub__ on tensors. This produces a lot of false positives; I remember my team writing our code in weirder ways to make the generated stubs happy, until I decided to stop using them.
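One way hand-written stubs can deal with this is to declare the patched operators directly on the class. A sketch (signatures simplified):

class Tensor:
    def __add__(self, other: "Tensor | float") -> "Tensor": ...
    def __sub__(self, other: "Tensor | float") -> "Tensor": ...
    def __mul__(self, other: "Tensor | float") -> "Tensor": ...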

A couple of design questions before I start. These cover the main things I've run into so far working on these stubs:

  1. How are recursive types handled? Recursive types are very common in tf's API, but my understanding is that mypy doesn't support them well, pyright is happy with them, and I'm unsure about pytype/pyre. For the stubs I already have, I've been using recursive type aliases and type-ignoring any mypy-specific errors. Should I use recursive types or avoid them (either with Any or by only describing them to one level)? TensorCollection = Tensor | Mapping[str, TensorCollection] | Sequence[TensorCollection] is one example of a very common type needed for accurate type hints (see the sketch after this list). My experience is that mypy requires ignores where the alias is defined/used, but code calling those functions is fine (I'm guessing recursive types are treated like Any). If recursive types are not allowed, I'll probably use something like TensorCollection = Tensor | Mapping[str, Tensor] | Sequence[Tensor] | Mapping[str, Any] | Sequence[Any]. Most of the time one level of recursion is enough, and deeper levels can collapse to Any.

2-4 are about type hinting tensors. If you're unfamiliar with tensorflow: tensors are very similar to numpy arrays, and I expect numpy has the same challenges here.

  2. Generic typing is the next biggest issue. Tensors are tf's basic object, and accurate annotations require a lot of variadic type hints for describing shapes. My current stubs avoid variadic types entirely. PEP 646 is a great first step, but it is insufficient for a lot of shape types (broadcasting is not possible to type hint and is very common). My current plan is to avoid TypeVarTuple entirely until a few years have passed, it's better supported by the ecosystem, and more tensor-typing features are present.

  3. Tensors are also generic in their data types (int vs float vs string vs bool tensors vs ...). That is possible to type hint today, but it is complex enough that I plan to skip it for the initial stubs. I would welcome people writing accurate data type annotations, but the complexity they add is very high (type conversion rules would create a ton of overloads for almost all functions). There are also some wrinkles: Literal only works for certain types, and the data type of a tensor is not a valid Literal argument. There are workarounds (pretend tf.DType is an enum), but handling data types properly will probably require some lies in the types. I lean towards not making Tensor generic for the initial stubs; later follow-up work can explore handling genericness properly.

  4. Without shape types, it'll be necessary to make most functions accept a tensor. An int is treated as equivalent to a rank-0 int tensor; similarly, a bool is usually equivalent to a rank-0 bool tensor. Rank-0 tensors are treated as equivalent to their data type by nearly all tensorflow functions. This is non-ideal but, without rank/shape types, I think it's necessary. Similarly, a rank-1 tensor is usually accepted any time a list/tuple is accepted.

  5. Tensorflow has two APIs, v1 and v2. v1 was deprecated about 2.5 years ago and lives in the tf.compat.v1 namespace. I plan not to work on the v1 API, as I'm trying to discourage use of deprecated methods in my codebase, but I would be open to accepting contributions to the v1 namespace. The v1 namespace is part of the public documentation and is stable.
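The sketch promised in question 1 (Tensor is a stand-in class here; at the time of writing, mypy needs ignores at the definition while pyright accepts it as-is):

from collections.abc import Mapping, Sequence
from typing import Any, Union

class Tensor: ...  # stand-in for tf.Tensor

# Recursive form:
TensorCollection = Union[Tensor, Mapping[str, "TensorCollection"], Sequence["TensorCollection"]]

# One-level fallback if recursive aliases must be avoided:
FlatTensorCollection = Union[Tensor, Mapping[str, Any], Sequence[Any]]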

There are a couple more design questions I've encountered, but they're smaller issues; I think PRs can discuss them.

A couple of people who may be interested in this, or whose advice would be helpful: @BvB93, @mdanatg, @acvander, @shuttle1987, and @pradeep90. I've included people I know have worked on tensor types before, even if not on tensorflow (assuming the numpy/pytorch challenges overlap).

@shuttle1987

It has been quite some time since I looked at the type stub situation with tensorflow, but I clearly remember the issues with the imports being a major challenge. It is highly likely others have more up-to-date knowledge of this than I do, but I'll share what I did before in case it's helpful.

What I had considered doing for the project I was working on was to write a code generation script that extracted the symbols created by decorators like @keras_export, so that there could be a better mapping of importable symbols to the Python code that generated them. The idea was that it might be easier to generate the stubs given a mapping between the code that actually implements the functionality and the paths that can be imported. I did something along those lines here: https://github.com/persephone-tools/tensorflow-stubs/blob/master/tensorflow-stubs/__init__.pyi; the comments in that stubs file are an example of trying to maintain information about where things were defined. Maybe a similar code generation approach could be used for the monkey-patched-methods issue too? This abuse of dynamic code certainly makes it harder to get type stubs working correctly without some tooling that can scan the source code to track it all down.
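A minimal sketch of such a scan, using ast (decorator handling simplified; real tensorflow code has more cases, e.g. attribute-style decorators):

from __future__ import annotations

import ast

def exported_symbols(path: str) -> list[tuple[str, str]]:
    """Return (public_export_path, defining_name) pairs for @keras_export/@tf_export."""
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read())
    pairs: list[tuple[str, str]] = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            for dec in node.decorator_list:
                if (
                    isinstance(dec, ast.Call)
                    and isinstance(dec.func, ast.Name)
                    and dec.func.id in {"keras_export", "tf_export"}
                ):
                    for arg in dec.args:
                        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                            pairs.append((arg.value, node.name))
    return pairs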
Regarding shapes and dimensions the approach used by numpy for its included types might be of interest: https://numpy.org/doc/stable/reference/typing.html

@Akuli
Collaborator

Akuli commented Feb 7, 2022

Instead of generating new stubs, it would be better to move existing stubs to typeshed, if that's possible. And again, it doesn't matter if the stubs are outdated or incomplete at first :)

@mdanatg

mdanatg commented Feb 7, 2022

Internally, we generated some type stubs with pytype; I don't know if those are exported (they might be for Keras, though).

Making Tensor a generic class is blocked by the C++ implementation of EagerTensor. I think it'd unlock reasonable dtype-generic types, at least for the core APIs. Shapes are more complicated; in many instances they'd quickly collapse to Shape[Any], I think.

@BvB93
Contributor

BvB93 commented Feb 7, 2022

  1. How are recursive types handled? Recursive types are very common in tf's API, but my understanding is that mypy doesn't support them well, pyright is happy with them, and I'm unsure about pytype/pyre. ...

There are some limitations, but mypy does currently support recursive protocols, as outlined in python/mypy#731 (comment). I'd say this approach has worked quite satisfactorily in numpy thus far.
Something to be mindful of here: using protocols means switching to a purely structural typing approach, so Mapping/Sequence classes that do not fully implement the ABC-prescribed interface can fail to type check (case in point: range.index, which lacks the start and stop kwargs).
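A minimal sketch of such a recursive protocol (roughly what numpy does with its internal _NestedSequence; simplified):

from __future__ import annotations

from typing import Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)

class NestedSequence(Protocol[_T_co]):
    # Purely structural: anything with __len__ and an int-indexed __getitem__
    # returning either an item or another nested level matches.
    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> _T_co | NestedSequence[_T_co]: ...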

  2. Generic typing is the next biggest issue. ...
  3. Tensors are also generic in their data types (int vs float vs string vs bool tensors vs ...).
    ...
    I lean towards not making Tensor generic for the initial stubs; later follow-up work can explore handling genericness properly.

I would recommend this approach as well. While the likes of dtype (and, to a lesser extent, shape) typing support are highly valuable, I can tell from experience that it adds a notable layer of additional complexity (i.e. overloads). There is also the practical issue that many type checkers currently do not deal well with combinations of Any-parametrized generics and overloads, whether due to bugs or by design.

  4. Without shape types, it'll be necessary to make most functions accept a tensor.

It could be helpful here to define (and potentially expose) a common alias for array-like objects as accepted by tensorflow, similar to what was done with numpy.typing.ArrayLike. It won't resolve the lack of 0D/ND shape-support, but it will make it easier to deal with a very common pattern.
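A sketch of what such an alias could look like (membership illustrative, not tensorflow's actual definition):

from __future__ import annotations

from collections.abc import Sequence
from typing import Union

import numpy as np

class Tensor: ...  # stand-in for tf.Tensor

TensorLike = Union[Tensor, int, float, bool, str, np.ndarray, Sequence["TensorLike"]]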

@mdanatg

mdanatg commented Feb 7, 2022

a common alias for array-like objects as accepted by tensorflow

There is https://www.tensorflow.org/api_docs/python/tf/types/experimental/TensorLike. It probably needs a few extra items.

@hmc-cs-mdrissi
Contributor Author

hmc-cs-mdrissi commented Feb 7, 2022

TensorLike there is an incomplete type alias. It does not specify the type arguments of list/tuple, so those components are effectively list[Any] and tuple[Any, ...].

Specifying the arguments is also problematic, from experience. list[X] is a very inconvenient type because of covariance/invariance rules: list[float | str | int] is incompatible with list[int]. This invariance makes list types a pain to work with. My current leaning is that the type should be Sequence[TensorLike], which is broader than the truth, as not all Sequence[X] are valid in tensorflow; but list[X] is also a bad choice when tensorflow functions act covariantly on X, and there's no way to tell the type system that.

The ideal solution would be to adjust the convert_to_tensor function to work with any sequence (converting non-list/tuple sequences to lists) so the types match the implementation. For now I plan to use Sequence[X] in the stubs to avoid invariance problems.
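A small example of the invariance problem being avoided (checker behavior, not runtime):

from __future__ import annotations

from collections.abc import Sequence

def takes_list(xs: list[float | str | int]) -> None: ...
def takes_seq(xs: Sequence[float | str | int]) -> None: ...

ints: list[int] = [1, 2, 3]
takes_list(ints)  # error: list is invariant, so list[int] != list[float | str | int]
takes_seq(ints)   # ok: Sequence is covariant (read-only), so Sequence[int] is accepted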

The rest of that alias I have been using successfully.

Looking at persephone-tools/tensorflow-stubs, they are smaller than my current stubs, and they also document the tensorflow v1 API. tensorflow made a large backwards-incompatible change in v2, so a fair number of those stubs are no longer accurate. I will read over them more closely to see if I can re-use parts.

On imports, I lean towards manual being good enough for a first attempt. So far, manually reading the aliases on each docs page has been working for me. We can improve them later if needed. If we did generate imports, I'd likely generate them by parsing the public docs.

@hmc-cs-mdrissi
Contributor Author

I think this is blocked by #5768: tensorflow has a heavy type dependency on numpy, and it looks like it's not yet possible for stubs here to depend on non-stub packages.

@hauntsaninja
Collaborator

While I very much appreciate you working on #5952, typing all of tensorflow is already a lot to chew on. If you ever feel like you're going down a rabbit hole, just wanted to say that typeshed maintainers are usually happy to help with stubs that live outside of typeshed, so hacking on some stubs in another repo and moving them here later is a viable option :-)

@hmc-cs-mdrissi
Contributor Author

The main value of typeshed is, I think, visibility. I'm happy to make a public repo with my draft tensorflow stubs; I'm just worried that if I made one, few people would ever make a PR to it. Here I expect to work on the initial PRs myself, but I'm hoping that once the basic types are usable, others will also contribute. I'll probably continue to do occasional PRs, but most of my motivation comes from my full-time job working on a library built heavily on top of tensorflow. It's small relative to tensorflow, but still O(10k) lines, and most of our type issues are because of tensorflow. So I would consider it a victory once my team's codebase passes --strict, and I'd be happy to leave the less common parts of tensorflow to other contributors.

There's a secondary bonus: I can rely on the CI typeshed already has for checking and uploading new stubs, instead of copying that over.

So for now I'm thinking I'll try to see what progress I can make on #5952. It's probably a good idea to upload my existing tf stubs somewhere public too, in case anyone wants to use them as rough draft stubs.

@pradeep90

cc @mrahtz, who created some stubs for tensorflow here and might have useful advice for non-variadic stubs (since you said you don't want to use PEP 646 shape types just yet).

How are recursive types handled? Recursive types are very common in tf's API, but my understanding is that mypy doesn't support them well, pyright is happy with them, and I'm unsure about pytype/pyre.

Pyre supports recursive aliases.

@AlexWaygood added the labels stubs: request (OUTDATED! Request to add stubs for a new package to typeshed) and size-large on Jun 12, 2022
@alkatar21

alkatar21 commented Feb 10, 2023

I see there is a start here now with #8974. With pylance v2023.2.20, the first tensorflow stubs from here are included, and after that almost no autocompletion works anymore. I don't know if this is a typeshed or a pylance problem, so I wanted to ask whether someone can point me in the right direction: would an issue here or at pylance be useful?

@hmc-cs-mdrissi
Contributor Author

The typeshed tensorflow stubs are very incomplete. Pylance by default infers types from the library implementation if there are no stubs; if there are stubs, it uses them. So the existence of some stubs is causing pylance not to inspect the rest of the implementation.

The main trade-off with inferring types from the implementation is that it will be less accurate than actual stubs. For autocompletion the errors don't matter, but if you enable type errors with pyright, then you will get false positives when tensorflow relies on inferred types.

I think this is more an issue for pylance, which could maybe merge types inferred from the implementation with the stubs available. Over time the stubs here should get more filled out, which will help, but I'd be very surprised if the stubs covered the majority of common methods/classes within a month. If there are specific methods/classes you commonly use and want stubs for, I'd be happy to review that PR here.

@alkatar21

I have understood that so far, I think. My question was aimed at the fact that pyright no longer looks at the implementation as soon as stubs are present. Unfortunately, I know too little about the details of typeshed so far. With the pandas stubs there was something like partial, where I think existing stubs were used and what was missing came from the actual code, but that may also have been a https://github.com/microsoft/python-type-stubs quirk.

I use pyright strict typing in my projects and currently have a lot of pyright: ignore comments for tensorflow. I think it's very good that there is progress in this area; it's just limiting, because autocompletion, and at least seeing the untyped parameters, is an essential tool for my work. I will create an issue at pylance indicating that this is a problem during the transition; hopefully it will be resolved before there are issues at Tensorflow itself about broken autocompletion again (like tensorflow/tensorflow#56231).

I would like to help, but unfortunately I am completely occupied for the near future. Currently I mainly use tf.keras... and tf.data....

@hmc-cs-mdrissi
Contributor Author

That's helpful. I'll prioritize the tf.keras/tf.data folders. For tf.keras I'll cover the main common base classes first (like here, defining Layer/Initializer/etc.), and then it should be easier to fill in any specific subclasses you need. Stub-wise, for a specific layer (dense, batch norm, etc.) or optimizer, often the only thing needed is the constructor signature.

@Avasam
Sponsor Collaborator

Avasam commented Feb 11, 2023

I don't know if we have a way to mark stubs as partial in typeshed (I mean adding the partial keyword to the py.typed marker), but some stubs, like tensorflow's, should definitely be marked as such.
(If we currently don't, it could follow the ignore_missing_stub setting, which would only require a stub uploader update, but I digress.)

Also make sure that you have "python.analysis.useLibraryCodeForTypes": true in Pylance/VSCode settings as well as "useLibraryCodeForTypes": true in your pyright settings.

Finally, any stubs found on https://github.com/microsoft/python-type-stubs will give you different results between Pylance and pyright, as they are included in the former but not the latter. (tensorflow isn't one of them, so that doesn't apply here.)

@hmc-cs-mdrissi
Contributor Author

Stub packages currently do not have a py.typed file, as they don't require one. Checking the pylance issue, it also looks like it wouldn't make a difference if they had partial in it. So while adjusting the stub uploader to add a partial py.typed sounds fine, I don't think it would help with the autocompletion issue.

@alkatar21

If I understand PEP 561 correctly, typeshed is the last fallback, and the PEP does not anticipate partial stubs in typeshed.

@AlexWaygood
Member

We now have tensorflow stubs in typeshed, thanks to @hmc-cs-mdrissi's diligent work!

I'm going to close this as completed for now. If anybody has any specific issues with our tensorflow stubs, please feel free to open a followup issue.

@alkatar21

alkatar21 commented Nov 21, 2023

I just tried the stubs via the types-tensorflow package with VSCode and pyright, and don't quite understand why, in the following example, both layers and Dense are Incomplete:
[screenshot: layers and Dense shown as Incomplete]
As this is implemented:

class Dense(Layer[tf.Tensor, tf.Tensor]):

@Avasam
Sponsor Collaborator

Avasam commented Nov 21, 2023

If anybody has any specific issues with our tensorflow stubs, please feel free to open a followup issue.

I'd prefer moving the discussion to a new issue. That being said, I wonder if pyright is picking up def __getattr__(name: str) -> Incomplete: ... as overriding explicit re-exports
https://github.com/python/typeshed/blob/main/stubs/tensorflow/tensorflow/keras/__init__.pyi#L14
(Untested here, on my phone)
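For reference, the pattern under discussion looks roughly like this (a sketch based on the quoted line; the re-export shown is illustrative):

# tensorflow/keras/__init__.pyi
from _typeshed import Incomplete

from tensorflow.keras import layers as layers  # explicit re-export

def __getattr__(name: str) -> Incomplete: ...  # module-level catch-all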
