Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore building third party stubs as packages #2491

Closed
srittau opened this issue Sep 27, 2018 · 115 comments
Closed

Explore building third party stubs as packages #2491

srittau opened this issue Sep 27, 2018 · 115 comments
Labels
project: policy Organization of the typeshed project

Comments

@srittau
Copy link
Collaborator

srittau commented Sep 27, 2018

This is an alternative to #2440 (disallowing third-party stubs). The idea is that typeshed remains/becomes a central repository for third-party stubs that are not bundled with the parent package, similar to DefinitelyTyped. In the future I expect type checkers will not want to bundle all third-party stubs for a variety of reasons, so third-party stubs would be distributed as separate PEP 561 stub-only packages, one per upstream package.

(I tried to integrate points raised there into this issue, especially those by @JukkaL in this comment.)

Advantages

  • Due to typeshed's tests, packages in typeshed will continue to work with the latests versions of mypy and pytype.
  • Basic level of consistency and standards due to review by typeshed maintainers.
  • Consistent naming scheme for third-party stubs packages, allowing users to just "pip install <guessed name>" and it will work when there are stubs.
  • Tooling (like tests) is easier to manage as it can remain part of the typeshed package.
  • Easier to contribute to stubs, since contributors don't need to learn the intricacies of multiple stubs projects.
  • No need to start a separate project just to distribute stubs for a new package.

Issues

  • Workload issues for typeshed maintainers.
  • Typeshed maintainers will often not be familiar with the package for which pull requests are opened.
  • Publishing stubs takes longer due to the necessary reviews.

Further Considerations

What should the generated packages be called? @ethanhs's PEP 561 actually requires stubs-only package to be named <package>-stubs. typeshed could squat these names and release them (and remove the stubs) on the request of upstream maintainers. Alternatively, typeshed could add a common prefix or suffix (ts, typeshed) or in addition to or instead of the -stubs suffix. This would be in violation of PEP 561, so we'd need to get broader consensus to amend the PEP. My personal favorite would be <package>-ts.

To guarantee a fairly quick turnaround on stubs, to minimize work for publishing stubs, and to prevent all third-party stub packages to be updated whenever a new typeshed version is released, stubs for a specific third-party package should be published automatically when it changes.

Possible Implementation

  • Add a generic setup-tp.py to typeshed that takes its package name from the directory it's in and uses the current date and time as version number.
  • Amend the CI process for master only so that after a successful test run, for every third-party package that was changed since the last successful run, the following is done automatically:
    1. Copy setup-tp.py into the third-party module directory as setup.py.
    2. Build the package in that directory.
    3. Upload to pypi.
@ethanhs
Copy link
Contributor

ethanhs commented Sep 27, 2018

I haven't had time to think over the full proposal yet but I did want to correct a misunderstanding and add some relevant information.

First a minor (but important!) nitpick: PEP 561 is about package names. The things you install off of PyPi are distributions, which have zero or more packages. The example I always give to explain this is that you pip install Django but import django. django is the package, Django is the distribution.

Therefore, it would be possible to install, say setuptools-stubs, along with any number of other stub packages from the typeshed distribution. So we don't really need to squat names.

Speaking of squatting names, there has already been some discussion of this topic in pypi/warehouse#4164. Donald Stufft is planning on namespacing packages in some form, so it is likely flask will end up with flask-stubs as a reserved distribution name anyway.

@JukkaL
Copy link
Contributor

JukkaL commented Sep 27, 2018

I like the proposal! Here are a few additional thoughts.

If we split (most) third party stubs to separate stub packages, type checkers could propose a relevant stub package to install if they can't find stubs for some package. This would be easy to do if third party stubs are managed centrally, but hard if they are spread out across numerous repositories.

I think that it would make sense to request permission proactively for the most popular packages on PyPI to be included on typeshed (I mentioned this in #2440). This would make it easier to contribute new stubs, as asking for permission is something many people are uncomfortable doing. I'm happy to try doing this if others think that this is a good idea.

I think that review workload is the biggest potential issue. I'm ready to volunteer to do more code reviews if we can pull some of the proposed changes off, in part because I expect they could well make code reviews easier. I have some ideas about making core review workload manageable below.

First, I think that it would be a big help if we had tests for third party stubs. At the very least, we should have a .py file that exercises the most important functionality in the stub. A test would pass if mypy (or some other tool) could type check the test file without errors. This can be refined later, but I think that this would be sufficient in the beginning.

Second, if we have support for tests, we can require that any new third party stubs have decent test coverage. Also, most fixes to a stub should include a test. Code reviews will be simpler if we can trust that tests catch any egregious errors at least. Also, the presence of tests makes it more likely that the PR author has done a reasonable job, thus hopefully requiring fewer review iterations. We can ask the creator of a new stub to actually verify that the test file can be run against the library (not just checked). It's probably impractical to execute tests in typeshed CI, though.

Third, since third party stubs wouldn't be installed by default, errors in those stubs would be less serious than now, as it would be easy to uninstall the stubs if there are problems, and it would be easy to revert back to an earlier stub version. Reporting bugs in stubs back to typeshed would perhaps also be simpler, as it would be easy to attribute blame to a certain typeshed stub package. Due to these reasons, code reviews could be less strict for third party packages (i.e. no need to double check against the implementation if something is unclear), excepting maybe the most popular packages. Here my rationale is that it will be much easier to encourage users to contribute small fixes to not-very-polished stubs than to create a new set of stubs from scratch. Automatically having the fixes available on PyPI immediately after the PR has been merged could be another big motivating factor.

Finally, maybe we should consider requiring the use of a code auto-formatter for stubs (at least new ones), if there is one that works for stubs. Or we could maybe introduce more aggressive linting for new stubs. These would improve the readability of contributed stubs.

@ilevkivskyi
Copy link
Member

There are few questions I wanted to clarify about the proposal:

  • What to do with stubs for large frameworks that require plugins to work correctly? (SQLAlchemy, Django, NumPy, etc.)
  • How are we going to manage the versioning of stub packages?
  • Will incomplete stubs be allowed? Will we require __getattr__ for such to avoid false positives, or just not allow them at all?

@JukkaL
Copy link
Contributor

JukkaL commented Sep 27, 2018

What to do with stubs for large frameworks that require plugins to work correctly? (SQLAlchemy, Django, NumPy, etc.)

I think that these could be decided on a case-by-case basis. If the stubs don't generate false positives without a plugin, maybe the stubs can be included in typeshed. We don't need to include all stubs in typeshed, but I think that it's a good place for the "long tail" of stubs where nobody may be motivated enough to set up a separate repository just for stubs.

How are we going to manage the versioning of stub packages?

I proposed that we could automatically generate a version number from the last modified date.

Will incomplete stubs be allowed? Will we require __getattr__ for such to avoid false positives, or just not allow them at all?

In my opinion __getattr__ should be acceptable for large or complex packages, but we should try to avoid it for small and medium sized stubs.

@JelleZijlstra
Copy link
Member

Finally, maybe we should consider requiring the use of a code auto-formatter for stubs (at least new ones), if there is one that works for stubs. Or we could maybe introduce more aggressive linting for new stubs. These would improve the readability of contributed stubs.

https://github.com/ambv/black supports auto-formatting stubs. (I added the support.)

@sproshev
Copy link
Contributor

sproshev commented Sep 27, 2018

I proposed that we could automatically generate a version number from the last modified date.

It means that version number would have no connection with runtime package version and non-descriptive as a result.
Let's imagine some environment requires numpy=1.14.1 to be installed, how a developer will figure out what stub package version to use?

@sproshev
Copy link
Contributor

We at PyCharm met this problem while developing stub packages advertiser. There is no way to determine what stub package version fits installed runtime package without installing them from the newest one to the eldest.

@JukkaL
Copy link
Contributor

JukkaL commented Sep 27, 2018

@sproshev Fair point. I'm not sure if there's anything we can do that would always work reliably.

We could perhaps include the package version in typeshed as metadata. The stub package version could be derived from that (and incremented by 1 for each update). For example, the stub package versions for numpy 1.14.1 could be something like 1.14.1, 1.14.1+1, 1.14.1+2, etc. However, in practice it would be quite likely that stubs wouldn't fully conform to any package version. Maybe the version would only be a best-effort hint, and would roughly correspond the latest package version that is known to have some support by the stub (i.e. works in a reasonable fashion, but is not necessarily complete).

I don't think that we need anything perfect here. I'd be happy with an approach that works well for simple cases (most modules probably have pretty simple interfaces) and supports gradual refinement for more complex cases. The key would be making it easy for users to contribute. Crowdsourcing seems the only feasible approach for stubs.

Other ideas for guidelines:

  1. We'd normally only actively maintain one set of stubs per package (the latest supported). Stubs for older versions would only be available on PyPI (and git history). Maybe we can make exceptions for certain very popular libraries if they break backward compatibility without changing the package name, but hopefully this is pretty rare.
  2. For existing stubs we should perhaps initially pick some arbitrary version (such as 0.0.1) until somebody validates which version the stubs best conform to.

@srittau srittau added the project: policy Organization of the typeshed project label Oct 1, 2018
@ethanhs
Copy link
Contributor

ethanhs commented Oct 4, 2018

Okay, so after thinking this over, I decided I really like this idea. I definitely think it would be a good idea to reach out to the typescript folks and hear about their experiences, since they likely have already hit problems we will in executing this plan.

As for versioning, this is actually covered in PEP 561. Essentially, the stub package declares which version(s) of the runtime package it supports in the install requires. Therefore we should be able to tell what package fulfill requirements based on the requires_dist metadata in a package.

At worst, you have to download (but not install!) several wheels. I think this gets a lot easier if we work with the warehouse folks to make sure the metadata we need is available through their JSON API (some if not all of it already is).

With regards to partial packages, I would say people can start with their own incomplete packages, and once they pass a minimum bar, they can be brought into typeshed.

As for plugins, I don't really see an issue with including them, because they are not likely to cause issues with other type checkers, and they don't take that much disk space. Alternatively, we could do some packaging magic and make a separate plugin package that gets installed if you ask for that extra (e.g. pip install pkg-stubs[plugin]). That wouldn't be too painful as we can automate a lot of it.

Lastly, and probably most importantly, reviewing the stubs. I am also happy to step up and do more review as well. I also think pulling something like the mypy test suite out of mypy and making it more generic is a good idea, as is requiring new packages/improvements to be tested.

Related to reviewing, I have been inspired by Marietta to hold "office hours". The idea is to help people with static typing and PEP 561 packaging for probably about an hour every week. I am hoping to start holding hours sometime this month. If there are others who want to do this too perhaps we could coordinate.

@srittau
Copy link
Collaborator Author

srittau commented Oct 21, 2018

For reference, better tests are discussed in #754.

@gvanrossum
Copy link
Member

+1

srittau added a commit to srittau/typeshed that referenced this issue Oct 23, 2018
@srittau
Copy link
Collaborator Author

srittau commented Oct 23, 2018

I have created a new branch third-party-dist for code related to this issue and submitted a first pull request (#2545) against that branch that adds a script to build wheels for individual third-party packages.

@srittau
Copy link
Collaborator Author

srittau commented Oct 26, 2018

I suggest we start adding METADATA files to third party stubs that for now only contain a Requires-Dist field per @ethanhs. This could then be merged into the wheel's METADATA file on build. This would mean we also need to convert simple modules into packages (itsdangerous.pyi to itsdangerous/__init__.pyi), but the build script needs to do that anyway.

Another question is how strict the version specifier is supposed to be. For example, if a package uses semantic versioning and the stub was written for version 1.4.1, what should we use? Possibilities are:

  • >= 1, < 2
  • >= 1.4, < 2
  • >= 1.4, < 1.5
  • == 1.4.1

I would suggest using the current minor level as lower bound, since using the stub with older versions will not catch some problems, such as using API additions from newer versions. On the other hand using the next major version as upper bound seems fine. This could give false positives when using API not yet supported in the stub, but you can use newer versions without using newer API and it can be encouragement to contribute to typeshed. So the second example above.

srittau added a commit that referenced this issue Oct 26, 2018
@srittau
Copy link
Collaborator Author

srittau commented Oct 26, 2018

An additional idea about the generated version number. Currently the build script uses 0.YYYYMMDD.HHMM. We could add a custom field Stubs-API-version or similar to METADATA, defaulting to 0.0. This will then get preprended to the date/time in the stub package version number. In our example above, this version would be 1.4 and a generated version could be 1.4.20181026.0815. This would allow users to add this to the their requirements.txt:

foo == 1.4.1
foo-ts >= 1.4, < 1.5

@bluetech
Copy link
Contributor

Some questions (mostly for @srittau's build-dist.py script):

  1. Does it allow dependencies between stub packages? There are cases where package A's stubs wants to refer to package B's stubs.

  2. IIUC the proposed name for stub packages is $package-ts. Leaving bikeshed aside, how would "squatting" be prevented? As far as I know, PyPI doesn't support package namespaces or reserving a range of names *-ts.

I also have a comment regarding the structure of the repository once every third party stub is a package. The current structure is third_party/{2,3,3.5,2and3}/$package and the script parses the path and writes Requires-Python and trove classifiers accordingly. In my opinion it would be better to do away with the 2,3,3.5,2and3 part and move that into the regular package metadata instead. This way the location for a package is consistently third_party/$package (and this can also be set in the Home-page field).

@srittau
Copy link
Collaborator Author

srittau commented Oct 27, 2018

One thing I'd like to do is "METADATA merging". I think it makes sense to have a METADATA.tmpl file that provides sensible defaults for stub packages. But this could then be amended with a per-package METADATA file that adds or overwrites the default fields. This METADATA file could then express dependencies.

I also agree that moving all packages to the third_party directory and removing the Python version directories would be good. But we can only do that after the type checkers have stopped distributing third-party stubs, otherwise we would break them.

Regarding the suffix and squatting: This is something that we have to check with the warehouse maintainers. They might have opinions and ideas about that.

@bluetech
Copy link
Contributor

METADATA merging

This sounds good to me. Although I think outright duplicating the file in every package should also be considered. That would be more direct and easy to understand, and since the packages are managed centrally in a single repository, every mass change can simply be applied at once across all of them. And it also allows for gradual (intentionally non atomic) changes if that's ever needed.

But we can only do that after the type checkers have stopped distributing third-party stubs, otherwise we would break them.

Are there currently any type checkers that install typeshed dynamically rather than vendor or pin it? If they all vendor it, they only need to adapt when they update typeshed, no?

@srittau
Copy link
Collaborator Author

srittau commented Oct 27, 2018

Not a mypy expert, but from my understanding at least mypy is vendoring typeshed using git submodule. Changing it now would break mypy's build process and would need quite a bit work to fix, for what is a temporary situation.

@srittau
Copy link
Collaborator Author

srittau commented Oct 27, 2018

I have now reached out to the pypa team in pypi/warehouse#4967.

@ethanhs
Copy link
Contributor

ethanhs commented Oct 27, 2018

@srittau Mypy uses typeshed as a submodule, but submodules are pinned to certain commits (which we manually update). The tests within typeshed would probably be quite broken since those use the master branch of typeshed instead of the submodule, but I expect changing the format of third_party would require corresponding lock-step changes in both pytype and mypy anyway.

As for package versioning, I think there are two fundamental features we want from a version:

  1. major version: This allows us to differentiate and support different major versions of big packages. (DefinitelyTyped has different folders for packages with two major versions)

  2. some incremental patch version: I think type stubs are likely to be some best effort approximation of the latest minor version (since I don't think anyone wants a different copy of the stubs for each minor version). Therefore, we can increment a patch number, so that if there is an error in the stubs that is serious or breaks something, people can role back to an earlier release.

Therefore I propose just MAJOR.patch, where MAJOR is not changed.

JukkaL added a commit to python/mypy that referenced this issue Jan 26, 2021
Support the new structure of typeshed (python/typeshed#2491) and only
bundle stdlib stubs with mypy. 

Most stubs for third-party packages now need to be installed using pip (for 
example, `pip install types-requests`). Add stubs for `typing_extensions` 
and `mypy_extensions` as mypy dependencies, since these are 
needed for basic operation.

Suggest a pip command to run if we encounter known missing stubs.

Add `--install-types` option that installs all missing stub packages. This
can be used as `mypy --install-types` to install missing stub packages from
the previous mypy run (no need to provide a full mypy command line).

This also replaces the typeshed git submodule with a partial copy of the
typeshed that only includes stdlib stubs. 

Add a script to sync stubs from typeshed (`misc/sync-typeshed.py`). 
This is still incomplete since typeshed hasn't actually migrated to the new 
structure yet.

Work towards #9971.
ilevkivskyi added a commit that referenced this issue Jan 27, 2021
See discussion in #2491

Co-authored-by: Ivan Levkivskyi <ilevkivskyi@dropbox.com>
@ilevkivskyi
Copy link
Member

OK, I merged the big PR with directory re-structure. I will continue working on typeshed -> PyPI auto-upload later this evening.

@ilevkivskyi
Copy link
Member

ilevkivskyi commented Jan 27, 2021

Oh, I forgot to update README.md and CONTRIBUTING.md they still reference some things from old directory structure (and typeshed being a submodule in mypy). I will make a follow-up PR later unless someone beats me to it.

rchen152 added a commit to google/pytype that referenced this issue Jan 27, 2021
See #821 and its linked issue for context.
The typeshed directory structure is changing significantly, so we need to
update pytype accordingly.
python/typeshed#2491 (comment) contains
a nice diagram of the new structure.

Note that I first developed this change on GitHub, then imported the PR. I'm
asking for a review on the import (rather than the PR) because the import
contains additional BUILD file changes (especially to third_party/py/toml - see
the diffbase).

PiperOrigin-RevId: 354138398
@ilevkivskyi
Copy link
Member

In the meantime, I have manually tested PyPI upload GitHub actions, they seem to work. After @JukkaL double-checks the code tomorrow morning, I will switch the cron diff upload action from --dry-run to normal. Then it should start auto-uploading wheels for modified third-party stubs every three hours. I will then make a trivial PR to check the integration end-to-end.

@ilevkivskyi
Copy link
Member

Thanks @rchen152 for updating the pytype test! Now we only need to fix stubtest and mypy_primer, cc @hauntsaninja.

@erictraut
Copy link
Contributor

Thanks @ilevkivskyi and @srittau for making this change. I just updated the import logic in pyright, and it now works with the new typeshed directory structure.

@ilevkivskyi
Copy link
Member

I think we are almost ready, sorry for a little delay, we were not able to enable the auto-upload today, we will enable it Monday morning. In the meantime, I think it is already OK to start merging PRs again.

@ymotongpoo
Copy link

First of all, thanks for working for the better type hinting. I look forward to all type information packages coming out to PyPI. Now I have a question on the upcoming updates on the type information metadata in all libraries in this repository.

How does the metadata file (METADATA.toml) will include the supporting runtime package versions? I saw a discussion around the versioning of stub packages but wasn't confirmed the concrete mention on the description on the metadata file.

If I missed the exact part that tells the answer for them, apologies in advance.

@ilevkivskyi
Copy link
Member

I finally switched the auto-upload task from --dry-run to normal typeshed-internal/stub_uploader@96cb137, so I think this issue may now be closed as implemented.

@ilevkivskyi
Copy link
Member

How does the metadata file (METADATA.toml) will include the supporting runtime package versions?

Currently you can include version = "x.y" in metadata which means that this stub package only valid for runtime package version x.y or newer.

@ymotongpoo
Copy link

@ilevkivskyi Thank you for the reply. Then now the stub packages will be automatically uploaded to PyPI based on that version information?

For example, the stub for paramiko is version "0.1", whose runtime package's current latest version is "2.7.2". In this example, given paramiko is following semver, 2 major version diff means huge interface breaking changes happened during the update, which means at least the stub version should be 2.0, preferably 2.7. And I see other similar cases.

Can we assume that now all stub packages are ready to accept the pull requests for such version number corrections?

@hauntsaninja
Copy link
Collaborator

Yes, we'd accept PRs that update METADATA.toml to the oldest version of the package that the stubs reflect support for (#4981)

@JukkaL
Copy link
Contributor

JukkaL commented Feb 3, 2021

When adding new stubs, the version should reflect the actual version of the package that stubs were created for. Since this will be hard to determine reliably for existing stubs, it's okay for the version to refer to some older version with which they are still compatible.

We probably also don't want the version to refer to a more recent version version of the package, or a version that is poorly supported. If a stub is created for version 2.3 of a library but also contains a few 2.4 features (but we know/suspect that some important 2.4 features are missing), I think it's fine to continue to use 2.3 as the version of the stubs.

@ymotongpoo
Copy link

@hauntsaninja @JukkaL Thank you for the reply.

I understand that we need to avoid the confusion that the version changes will bring and we can keep using versions in the existing METADATA.toml files. However, that applies only to the case the stub version is in the same major version as the package current, and not to the case I referred to. (i.e. 2 major version diffs)

Anyway, it's great that the discussion is already happening in #4981 and I'm happy to contribute to this part.

@ofek
Copy link
Sponsor

ofek commented Aug 18, 2024

FYI I just opened a PEP about package repository namespaces and I used typeshed as the first motivating example 🙂 https://discuss.python.org/t/pep-752-package-repository-namespaces/61227

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project: policy Organization of the typeshed project
Projects
None yet
Development

No branches or pull requests