Reviewing stubs for correctness is hard, and the existing stubs have many bugs (which are gradually being fixed). We could perhaps improve both of these by having better tests for stubs. Here's a concrete idea for that (inspired by a proposal I remember having seen on Gitter):
We could blacklist the modules that currently don't pass the test and gradually burn down the blacklist.
This wouldn't be perfect, and several things couldn't be checked automatically:
However, this could still be pretty valuable, and it shouldn't be too hard to implement. We could start with very rudimentary checks and gradually improve the testing tool as we encounter new bugs in the stubs that we could automatically guard against.
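To make this concrete, here's a minimal sketch of such a rudimentary check, assuming a hypothetical burn-down blacklist and a hypothetical stub path; it only flags stub-level names that don't exist at runtime:

```python
import ast
import importlib

# Hypothetical burn-down list of modules that don't pass the check yet.
BLACKLIST = {"macpath", "os2emxpath"}

def stub_names(stub_path):
    """Collect the top-level names defined in a .pyi file."""
    with open(stub_path) as f:
        tree = ast.parse(f.read())
    names = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names

def check_module(module_name, stub_path):
    """Return public stub names that don't exist in the real module at runtime."""
    if module_name in BLACKLIST:
        return []
    runtime = set(dir(importlib.import_module(module_name)))
    return sorted(n for n in stub_names(stub_path) - runtime
                  if not n.startswith("_"))

# Example run (the stub path depends on the typeshed layout):
print(check_module("textwrap", "stdlib/3/textwrap.pyi"))
```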
IIUC this is similar to comparing the stub to what stubgen (mypy's stub generator) can easily discover, right? So maybe that would be an implementation strategy?
In general I worry that this would still be incredibly imprecise (since stubgen doesn't discover types). I also worry that the amount of test code per module, just to specify exceptions/improvements, could easily be larger than the size of the stub for the module, which would give it poor scaling behavior.
We might be able to reuse parts of stubgen.
This would have some nice benefits over just using stubgen:
This would certainly be imprecise, but I believe that it would still be useful, similar to how the existing typeshed tests are useful and prevent an interesting set of errors. We'd have to experiment to see whether the number of exceptions required would make this impractical.
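For illustration, the stubgen-based comparison could start as simply as generating a baseline stub and diffing top-level names; the stubgen flags and paths below are assumptions and may differ between mypy versions:

```python
import ast
import subprocess
import tempfile
from pathlib import Path

def top_level_names(pyi_path):
    """Top-level functions and classes defined in a stub file."""
    tree = ast.parse(Path(pyi_path).read_text())
    return {node.name for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.ClassDef))}

def missing_vs_stubgen(module_name, typeshed_stub):
    """Names stubgen can discover that the typeshed stub doesn't mention."""
    with tempfile.TemporaryDirectory() as out:
        # stubgen ships with mypy; -m selects a module, -o the output directory.
        subprocess.run(["stubgen", "-m", module_name, "-o", out], check=True)
        generated = Path(out) / (module_name.replace(".", "/") + ".pyi")
        return sorted(top_level_names(generated) - top_level_names(typeshed_stub))
```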
pytype has an option to test a .pyi file against the .py implementation, too:
```
pytype --check file.py:file.pyi
```
It won't do everything on your list, but it will find:
Cool! I wonder if we could use that in the CI scripts?
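Assuming the invocation above works as advertised, a CI step could pair each stub with the interpreter's own .py source and run pytype over the pairs. This is only a sketch, and the stub directory layout is an assumption:

```python
import subprocess
import sys
import sysconfig
from pathlib import Path

def check_stubs(stub_dir):
    """Run `pytype --check impl.py:stub.pyi` for every stub whose implementation we can find."""
    stdlib = Path(sysconfig.get_paths()["stdlib"])
    failures = []
    for stub in sorted(Path(stub_dir).glob("*.pyi")):
        impl = stdlib / (stub.stem + ".py")
        if not impl.exists():
            continue  # C extension modules have no .py source to check against
        result = subprocess.run(["pytype", "--check", f"{impl}:{stub}"])
        if result.returncode != 0:
            failures.append(stub.name)
    return failures

if __name__ == "__main__":
    sys.exit(1 if check_stubs("stdlib/3") else 0)
```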
Another idea would be to check for things present only in a Python 2 or 3 stub, but not both, and give a warning if the same thing is present at runtime in both versions.
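A sketch of that check, assuming the split stdlib/2 and stdlib/3 layout (verifying that the name really exists at runtime under both interpreters would need an extra introspection step under each version):

```python
import ast
from pathlib import Path

def top_level_names(pyi_path):
    """Top-level functions and classes defined in a stub file."""
    tree = ast.parse(Path(pyi_path).read_text())
    return {node.name for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.ClassDef))}

def stub_asymmetries(module):
    """Names present in only one of the Python 2 and Python 3 stubs."""
    two = Path("stdlib/2") / f"{module}.pyi"    # assumed layout
    three = Path("stdlib/3") / f"{module}.pyi"
    if not (two.exists() and three.exists()):
        return set(), set()
    names2, names3 = top_level_names(two), top_level_names(three)
    return names2 - names3, names3 - names2

only2, only3 = stub_asymmetries("textwrap")
print("only in the Python 2 stub:", sorted(only2))
print("only in the Python 3 stub:", sorted(only3))
```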
I'm working on using typeshed stubs in PyCharm and I must admit I have very little confidence in many stub files. I'm especially worried about their incompleteness since it will result in false warnings about missing methods.
As part of the idea of testing typeshed stubs better, I propose adding Python files that use the API from the stubs into a test_data directory, alongside the stubs. See PR #862.
The DefinitelyTyped repo for TypeScript uses this approach for testing their stubs.
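For illustration, such a test file might look like the toy example below (the file name and conventions here are placeholders, not what PR #862 actually specifies). Running a type checker over it should produce no errors once the stub is right:

```python
# test_data/textwrap_usage.py -- exercises the textwrap API the way a user would.
import textwrap

wrapper = textwrap.TextWrapper(width=40, drop_whitespace=True)
for line in wrapper.wrap("some long text that needs wrapping"):
    # would be a false error if wrap() weren't annotated as returning List[str]
    print(line.upper())

# shorten() must be annotated as returning str for this concatenation to check
print(textwrap.shorten("hello world", width=8) + "!")
```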
It seems odd to hand-craft a new .py file for every .pyi file, given that the .pyi already models an existing .py file.
@matthiaskramm A .py file shows the usage of the API in terms known to the users of a library. It serves as a test that is red before the fix to the .pyi and green after it. Just adding a new type hint doesn't mean that the false error seen by the user went away.
In addition, having common test data that requires an analyzer to actually resolve references to symbols and check types makes our type checkers more compatible with each other and compliant with PEP 484.
When I see a typeshed stub, how can I be sure that it's correct to any extent? And since a stub overrides the whole contents of a .py file, I'm worried about adding any untested typeshed stubs to PyCharm. Luckily, we have many internal PyCharm tests for at least some stdlib modules. So we'll add typeshed stubs to PyCharm gradually as our tests cover more of them over time.
If someone changes things in typeshed in an incompatible way, we at PyCharm will at least notice the regressions. It would be better to check not only for regressions, but for incompatibilities between type checkers as well. This is one of the main reasons I'm proposing to make static tests for stubs a part of typeshed.
Most fixes to the stubs come from real examples of false errors. It's not enough to just fix a stub and forget about it. We have to run a type checker manually in order to make sure the problem is fixed. And even then there may be incompatibilities between the results of one type checker and the others. Since we already have a code example that contains the problem, why don't we add it to the automated tests so that there are no regressions in the future?
Static tests for type checkers could co-exist with checks by introspection. We don't have to pick just one option.
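As a hedged sketch of the static half (the test_data layout is an assumption, and another checker could be swapped in the same way), the whole suite could be a single parametrized pytest using mypy's programmatic API:

```python
# test_stubs.py -- type-check every hand-written usage example in test_data/.
from pathlib import Path

import pytest
from mypy import api  # mypy's documented programmatic entry point

SAMPLES = sorted(Path("test_data").glob("*.py"))

@pytest.mark.parametrize("sample", SAMPLES, ids=lambda p: p.name)
def test_sample_typechecks(sample):
    # --custom-typeshed-dir points mypy at the local checkout instead of its bundled stubs.
    stdout, stderr, exit_status = api.run(["--custom-typeshed-dir", ".", str(sample)])
    assert exit_status == 0, f"type errors in {sample.name}:\n{stdout}"
```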
Meanwhile I'll be sending my PRs on top of the master branch without any tests.
I think that hand-crafted .py test files would be valuable, but I don't think that we should expect them to be complete or available for every module. They wouldn't replace other, more automatic tests, but they could be a useful addition in some cases.
Here are some examples where I think that they would be useful:
Mypy already has a small number of tests like this (https://github.com/python/mypy/blob/master/test-data/unit/pythoneval.test) and they've been occasionally useful.
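For reference, the cases in that file look roughly like this: each one is type-checked and then actually executed, with the program's output compared against the [out] section:

```
[case testHelloWorld]
print('hello, world')
[out]
hello, world
```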
The idea of making contributions easier to review is a good point. I've already mentioned other points above; I think they are all valid.
Here are some more ideas for automated checks that might be pretty easy to implement, at least for an interesting subset of packages:
Sent a PR (#917) that suggests both static and run-time checking (via pytest).