Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use strict optional checking in req_install.py #11379

Merged
merged 9 commits into from
Aug 4, 2023

Conversation

hauntsaninja
Copy link
Contributor

@hauntsaninja hauntsaninja commented Aug 14, 2022

Suggested by pradyunsg in #11374

The new assertions look sound to me in that they would currently raise TypeError if self.req was ever None. It's pretty straightforward to check this yourself, the only case I found tricky is the assertion in install where you have to go a couple layers deep to find where it would fail if self.name was None. link.hash_name is minorly tricky too, but there are plenty of places in Hashes where things would fail if it was ever None.

Suggested by pradyunsg in pypa#11374

Since half of the API of this class depends on self.req not being None,
it seems like we should just prevent users from passing None here.
However, I wasn't able to make that change.

Rather than sprinkle asserts everywhere, I added "checked" properties.
I find this less ad hoc and easier to adapt if e.g. we're able to make
self.req never None in the future.

There are now some code paths where we have asserts that we didn't
before. I relied on other type hints in pip's code base to be accurate.
If that is not the case and we actually relied on some function being
able to accept None when not typed as such, we may hit these asserts.
But hopefully tests would catch such a thing.
def checked_req(self) -> Requirement:
assert self.req is not None
return self.req

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m sorry but I don’t really like this approach. If we can’t (via type checks) guarantee that the input is never None, then that means we expect the assert to trigger. And we shouldn’t ever be adding asserts that we expect to get triggered - IMO asserts are just for cases that “cannot happen”.

If the problem is that the requirement can be in two states, “under construction” where it’s not safe to assume these values are set up, and “ready”, where everything is definitely not None, then we should work out how to express that in the type system - and if we can’t, raise the issue with the typing people.

Copy link
Contributor Author

@hauntsaninja hauntsaninja Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, these are all asserts that "cannot happen". I've only added these asserts on code paths that mypy indicates you'd get e.g. a TypeError or AttributeError if something was actually None. The current PR should preserve all non-exceptional semantics. (For completeness, in the PR description I mention that mypy could be wrong if existing annotations in pip are wrong, but I validated several of them and found no issues. I'm quite confident in this change)

I can't quite tell exactly what you're objecting to... If it's just the use of property I'd be happy to inline these asserts wherever they get used. Note that this sort of defensive programming / use of self-documenting preconditions is already commonly used in this file, e.g. see the use of assert self.source_dir on all the methods that touch it.

If the problem is that the requirement can be in two states,

It is possible to express this kind of thing in the type system, but a) it would require much larger changes than this one, b) knowing what I know of your feelings about typing, I suspect you wouldn't like it either ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I'm here because @pradyunsg indicated such a PR would be welcome and because I do think this is an improvement over unsound type checking.

I'd be happy to inline asserts instead of using properties and address uranusjr's comments. But if your hesitation about the approach goes beyond that, feel free to close this PR :-)

Copy link
Member

@pradyunsg pradyunsg Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually say that tackling a different file than this one would be a better place to start. But, too late for that. :)

That said, inline asserts are definitely a better thing to do here instead of these "checked_foo" properties.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AAARGH!!!! I had an almost-complete extended explanation of my position here, and github (or my browser) threw it away when I went to check a detail of the PR. Apologies if this is a bit disjointed, I'm trying to remember the points I made before...

It is possible to express this kind of thing in the type system, but a) it would require much larger changes than this one, b) knowing what I know of your feelings about typing, I suspect you wouldn't like it either ;-)

lol, thanks for your understanding :-) After looking more closely, I see a bit better what you're trying to do here. I am trying to look at this from the perspective that type checks are a good thing, apologies if I don't manage to completely hide my reservations1.

First of all, I'm honestly rather appalled by mypy's current behaviour here. We tell the type checker that req is an Optional[Requirement], and yet it still lets us try to access properties of req without ensuring it's not None? That's basically just not doing its job. I'm sorry, I don't want to hear any "practicality" arguments here, either check the damn types or don't claim to be a type checker...

So yes, I guess that means I do support making this change. My concern is that we do it in a way that doesn't simply ignore the issue that type checking has found. And my worry is that asserts do precisely that - they simply add a runtime check that replicates the type check, and yet doesn't give any more user friendly result than the exception you get if you try to access an attribute of None. If we're willing to dump a traceback on the user anyway, why not just use cast to ignore the type2?

More realistically, I do think we should consider refactoring InstallRequirement so that we can somehow say "until this happens, req is optional, and we can't use methods that expect req to be set". As things currently stand, I don't even know when that point would be, so that would definitely be a benefit. But that involves a potentially major refactor of one of the ugliest bits of pip, so I'm certainly not suggesting it's something we should tackle lightly.

So where does that leave us? Apparently, mypy knew about a problem in our code, so that's good - sort of. It seems like we have a few options:

  1. Defer fixing the issue until we can fix it "properly". This could take some time, and in the meantime we have to allow mypy to continue ignoring the issue.
  2. Try to find a way to check and report the problem at runtime, in a way that gives the user proper feedback on what went wrong (that would likely need to be done on a case by case basis).
  3. Patch over the problem with a runtime assert, to satisfy mypy but without improving the user experience if something goes wrong.
  4. Add complicated type machinery to express what's happening properly. I'm not against this, despite your suspicions, but I am worried that the more we use complex type features, the more we add to the maintenance burden as few of the maintainers will have the skills to work with that code.
  5. Do something adhoc like getattr(self.req, "name", "<unknown requirement>"). Is that better than an error? Who knows?
  6. Do nothing. Accept that we can't type this properly, but we don't want to spend time fixing the underlying problem.

At the moment, I'm not sure this problem is important enough to need anything more than (6). But if it does, I think we should at least consider a better solution than just adding asserts (which let us fix the types, but otherwise gain us nothing).

@pypa/pip-committers what do you think? Am I being too strict3 here?

I'd be happy to inline asserts instead of using properties and address uranusjr's comments. But if your hesitation about the approach goes beyond that, feel free to close this PR :-)

If we go with asserts, I agree that a property is a good way of handling it. And I certainly don't intend to just close this PR - your work is appreciated even if I have questions about the approach. Ultimately, all I wanted to do here was to raise the question of whether we should do something to tackle the underlying issue, and to make the point that if we're going to implement type checking, we should take type errors seriously - and at least consider other options before asserting.

Footnotes

  1. I'm slowly coming around to understanding the benefits of type checking, but I'm not all the way there yet...

  2. Don't take that too seriously, I know cast isn't meant to be used like that...

  3. Pun intended 🙂

Copy link
Member

@pradyunsg pradyunsg Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch over the problem with a runtime assert, to satisfy mypy but without improving the user experience if something goes wrong.

IMO, this is the cleanest thing to do here to make things explicit at the point of use. The underlying problem here is of bad design of this object -- something that we've discussed at length in #6607. :)

Having these asserts would make it easier to identify the underlying assumptions and make it easier to spot what bits we need to refactor, when we finally come around to #6607.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying that we take the type error as flagging an implied contract that we've not been enforcing, rather than an actual error? OK, I can see that. And if that's the way we're looking at it, I agree that asserting at the point of use is more appropriate than having a checked_* property.

There's still something that makes me uncomfortable about this approach to type errors. I'm finding it very hard to put my reservation into words, but essentially it feels like we're not actually fixing type errors, but rather just finding ways to shut the checks up. Maybe that's just an artifact of the process of adding type annotations to an established codebase. But if we establish a "the code is right, get the type checker to understand that" mindset, we're not getting the benefit from type annotations.

🤷 At this point, I've probably spent more time (of mine and everyone else's) on this than it could ever justify. So I'll leave it there.

Copy link
Contributor Author

@hauntsaninja hauntsaninja Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree that replacing static type checks with runtime type checks alone doesn't get anyone very far. Static analysis works best when it confirms your invariants for you, which is not what happens with assert.

In fact, my first attempt at this PR was to disallow InstallRequirement from taking req=None, but I lack familiarity with the codebase and it ended up becoming a daunting change.

I do think there is still some incremental benefit to option 3 over your option 1:

  1. It lets us get rid of # mypy: strict-optional=False, which I agree is appalling. Doing so allows the type checker to point out potential problems when modifying or refactoring, even if we paper over existing issues to some extent.
  2. It states our invariants more clearly. Right now one has to trace code to know whether a method can be called when self.req is None.
  3. The user experience is marginally better with an assert, as opposed to e.g. "None has no attribute" on some line somewhere and having to figure out how far back in the stack that None comes from.

I'll go ahead and inline the checked* properties. I agree that an inline assert more clearly documents implied contracts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience while I thought this through in public 🙂

Comment on lines 243 to 250
return self.req.specifier
return self.checked_req.specifier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more appropriate fix would be

if self.req is None:
    return SpecifierSet()
return self.req.specifier

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather that we have assert self.req here, making the existing contract that req needs to be non-None explicit.

Comment on lines 293 to 301
if link and link.hash:
if link and link.hash and link.hash_name:
good_hashes.setdefault(link.hash_name, []).append(link.hash)
Copy link
Member

@uranusjr uranusjr Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder if this is actually an error in the annotation and good_hashes should really be typed as Dict[Optional[str], List[str]] instead.

Copy link
Member

@pradyunsg pradyunsg Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to have None be the hash algorithm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be assert link.hash_name is not None or equivalent here, instead of changing the if statement here.

@pradyunsg
Copy link
Member

Thanks for picking this up @hauntsaninja!

Rather than sprinkle asserts everywhere, I added "checked" properties.

Personally, I don't think checked properties are a good way to do this. I'd much prefer to have asserts sprinkled in various places for this. That'll do a better job of making it explicit what the contract is, and can help guide refactors as well. FWIW, I'll say that you picked the most difficult one first. ;)

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Aug 15, 2022

Sure thing! Sounds good and can do. I'll wait to hear back from Paul to make sure he's onboard with adding asserts, just so I don't end up wasting my time with this!

@pradyunsg
Copy link
Member

First of all, I'm honestly rather appalled by mypy's current behaviour here. We tell the type checker that req is an Optional[Requirement], and yet it still lets us try to access properties of req without ensuring it's not None? That's basically just not doing its job.

That's because we'd have disabled the check that it does via the strict-optional=False comment that this PR removes! :)

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Aug 15, 2022

Okay, I've gotten rid of the checked properties. The one exception is that the pre-existing self.specifier is functionally doing the same things the checked properties were (it's used in a couple other files, and that use depends on it not returning None, so I didn't get rid of it).

@pfmoore
Copy link
Member

pfmoore commented Aug 16, 2022

That's because we'd have disabled the check that it does via the strict-optional=False comment that this PR removes! :)

Hmm, I’d got the impression this was the default for mypy, and we had enabled the check explicitly but then re-disabled it on a file by file basis. Regardless, I think it’s bizarre to be able to disable this check. Why just this one? Maybe people want to disable checks that things are str, so they can support str-like classes too…

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Aug 16, 2022

Since you seem curious for the history here / why the appalling behaviour existed. It did use to be the default for mypy a long time ago (like >5 years).

  • A lot of mypy decisions revolve around the idea of allowing users to gradually introduce types to a codebase and notch up strictness over time, rather than having to make huge sweeping changes all at once. mypy's goal was to be usable in large existing codebases by users who were unused to and resistant to type checking. And a lot of its success there is attributable to reducing the upfront pain of adoption and the pragmatic spirit of "starting with some checking is better than no checking".

  • mypy trailblazed a lot of the way the type system should work. Things that are clear today probably weren't so clear when static typing was niche. The appalling behaviour is basically how null works in many typed languages today (and I'm sure some of the initial audience for mypy was users from such languages). I think there was concern that Optional[int] was too verbose and users would revolt. Before PEP 526, the idiomatic way of typing instance variables invited confusion with None. And in the earlier days of mypy, I think there was also some concern that mypy didn't yet understand enough idioms that users use to ensure things are not None.

  • But yes, it is appalling, not the default behaviour in recent history, and generally unsupported by other type checkers

Here's the pip commit where the relevant mypy config was changed: #6704

@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Jun 19, 2023
@hauntsaninja
Copy link
Contributor Author

Bump! :-)

Copy link
Contributor Author

@hauntsaninja hauntsaninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

src/pip/_internal/req/req_install.py Show resolved Hide resolved
src/pip/_internal/req/req_install.py Show resolved Hide resolved
@uranusjr
Copy link
Member

Looks like there’s some edge case you’re not handling correctly. I think self.req can be None in some cases you assume it can’t. That’s causing the tests to fail.

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Jun 20, 2023

Fixed the bug, I'd incorrectly shortened an if condition. Wasn't related to case where self.req is None

I went through and audited all the assertions again. They all look sound to me in that all of the relevant ones would currently raise TypeError if self.req was ever None. It's actually pretty straightforward to check this yourself, the only case I found tricky is the assertion in install where you have to go a couple layers deep to find where it would fail if self.name was None

@@ -457,6 +459,7 @@ def is_wheel_from_cache(self) -> bool:
# Things valid for sdists
@property
def unpacked_source_directory(self) -> str:
assert self.source_dir, f"No source dir for {self}"
Copy link
Contributor Author

@hauntsaninja hauntsaninja Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is one place where we could raise an AssertionError where the existing code would not fail: if self.source_dir was an empty string. But assert self.source_dir appears many other times in this file, so I think is okay?

@hauntsaninja
Copy link
Contributor Author

Okay great. test_pip_wheel_ext_module_with_tmpdir_inside is the only test failing in CI currently, and it looks like that is failing on several other PRs

@hauntsaninja
Copy link
Contributor Author

Merged master and the spurious test failure has gone away :-)

@pradyunsg pradyunsg merged commit a19ade7 into pypa:main Aug 4, 2023
22 checks passed
@pradyunsg
Copy link
Member

Thanks @hauntsaninja!

@hauntsaninja hauntsaninja deleted the strict-optional-install branch August 5, 2023 00:28
cosmicexplorer pushed a commit to cosmicexplorer/pip that referenced this pull request Aug 9, 2023
* Use strict optional checking in req_install.py

Suggested by pradyunsg in pypa#11374

Since half of the API of this class depends on self.req not being None,
it seems like we should just prevent users from passing None here.
However, I wasn't able to make that change.

Rather than sprinkle asserts everywhere, I added "checked" properties.
I find this less ad hoc and easier to adapt if e.g. we're able to make
self.req never None in the future.

There are now some code paths where we have asserts that we didn't
before. I relied on other type hints in pip's code base to be accurate.
If that is not the case and we actually relied on some function being
able to accept None when not typed as such, we may hit these asserts.
But hopefully tests would catch such a thing.

* news

* black

* inline asserts

* code review

* fix up merge issue

* fix specifier bug
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants