Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve is_namespace() check for modules where __spec__ is None #1802

Merged
merged 1 commit into from
Sep 23, 2022

Conversation

skshetry
Copy link
Contributor

@skshetry skshetry commented Sep 21, 2022

Steps

  • For new features or bug fixes, add a ChangeLog entry describing what your PR does.
  • Write a good description on what the PR does.

Description

See https://stackoverflow.com/a/42962529.

Let's take the following contents as an example:

import celery.result

From #1777, astroid started to use processed_components for namespace check. In the above case, the modname is celery.result, it first checks for celery and then celery.result. Before that PR, it'd always check for celery.result.

But if you have imported it as celery.result,
sys.modules["celery"].__spec__ is going to be None, and hence the function will return True, but below where we try to load get submodule_path/__path__ for celery.result will fail as it is not a package. Also celery is not a namespace package.

celery is recreating module to make it lazily load, which does not have __spec__ set. Reading through Python's docs, it seems that __spec__ can be set to None, so it seems like it's not a thing that we can depend upon for namespace checks.


The celery.result gets imported for me when pylint-pytest plugin tries to load fixtures, but this could happen anytime if any plugin imports packages. In that case, importlib.util._find_spec_from_path("celery") will raise ValueError since it's already in sys.modules and does not have a spec.

Type of Changes

Type
🐛 Bug fix
✨ New feature
🔨 Refactoring
📜 Docs

Related Issue

Fixes pylint-dev/pylint#7488.

@DanielNoord
Copy link
Collaborator

Thanks for that through investigation! I don't have the time for a full review right now, but this would definitely need a test in order not to regress in the future. It also needs a small entry in the changelog of the next patch release.

@coveralls
Copy link

coveralls commented Sep 21, 2022

Pull Request Test Coverage Report for Build 3099164269

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.0007%) to 92.425%

Totals Coverage Status
Change from base Build 3087318088: 0.0007%
Covered Lines: 9797
Relevant Lines: 10600

💛 - Coveralls

See https://stackoverflow.com/a/42962529.

Let's take the following contents as an example:
```python
import celery.result

```

From pylint-dev#1777, astroid started to use `processed_components` for
namespace check. In the above case, the `modname` is `celery.result`,
it first checks for `celery` and then `celery.result`.
Before that PR, it'd always check for `celery.result`.

`celery` is recreating module to make it lazily load.
See
https://github.com/celery/celery/blob/34533ab44d2a6492004bc3df44dc04ad5c6611e7/celery/__init__.py#L150.

This module does not have `__spec__` set.

Reading through Python's docs, it seems that `__spec__` can be set
to None, so it seems like it's not a thing that we can depend upon
for namespace checks.

See https://docs.python.org/3/reference/import.html#spec__.

---

The `celery.result` gets imported for me when pylint-pytest plugin tries
to load fixtures, but this could happen anytime if any plugin imports
packages. In that case, `importlib.util._find_spec_from_path("celery")` will raise ValueError
since it's already in `sys.modules` and does not have a spec.

Fixes pylint-dev/pylint#7488.
@skshetry
Copy link
Contributor Author

skshetry commented Sep 21, 2022

this would definitely need a test in order not to regress in the future. It also needs a small entry in the changelog of the next patch release.

Thanks, I have added a changelog and a test. I was wrong before, the celery module does not have __spec__ set due to which this fails. See celery/__init__.py#L150 where they recreate module to make it lazily load, which does not have __spec__ set. I added a test for a similar scenario.

@DanielNoord
Copy link
Collaborator

We probably should support this but we should probably open an issue against celery for this. I think it's a bit up to semantics but the documentation for __spec__ does seem to imply that it should be set where possible.

@skshetry
Copy link
Contributor Author

skshetry commented Sep 22, 2022

I was/am a bit hesitant to create a PR due to unclear semantics (especially since it's doing a lot of questionable stuff with modules). My understanding is __spec__ can be None, so I believe it's pylint/astroid responsibility to handle here. Anyway, I can try proposing a PR and see.

@DanielNoord
Copy link
Collaborator

The PEP is really unclear:
https://peps.python.org/pep-0451/#module-objects

Other than adding __spec__, none of the import-related module attributes will be changed or deprecated, though some of them could be; any such deprecation can wait until Python 4.

Similarly the docs state, https://docs.python.org/3/reference/import.html#spec__:

The __spec__ attribute must be set to the module spec that was used when importing the module.`

The must and add seem to imply some sort of requirement but I think this is somewhat relaxed due to backward compatibility concerns.

I did a deep dive into typeshed and how it types __spec__.
First commit:
python/typeshed@05e02c1
Second commit:
python/typeshed@43c3406

The first commit is linked to python/typeshed#321, which is part of python/typeshed#189.

There is a crucial reply in that thread by Brett:

I'm going to fix this issue step-by-step to make sure I'm doing it correctly, so I'm starting w/ importlib.__init__ and I will build from there.

I also noticed that the types module has the wrong details for modules, so I will fix that as part of this issue as well (__package__, __loader__, and __spec__ as initially set to None as of Python 3.4; __doc__ and __name__ have always existed).

My (perhaps somewhat biased) conclusion is that __spec__ is ModuleType | None because it was only introduced in 3.4. However, the language with which the documentation and PEP refer to it seem to imply that ideally it should be set. I completely agree that astroid should correctly deal with those cases where it isn't, but I think the evidence here makes a compelling case for celery to at least consider adding it as well. Where possible it seems best practice to make it available.

@skshetry
Copy link
Contributor Author

Great points. Thank you for the great research, and all the help. :)

I have opened celery/celery#7773.

@jacobtylerwalls jacobtylerwalls changed the title improve is_namespace check Improve is_namespace() check for modules where __spec__ is None Sep 22, 2022
Copy link
Member

@jacobtylerwalls jacobtylerwalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and investigation @skshetry, I appreciate having another pair of eyes on a very difficult bit of astroid's code.

I agree we need to make astroid more robust against unexpected inputs. Your change looks sensible (and for this branch you edited, all we are trying to do is make sure that we still pass the test cases for old-style namespace packages such as test_identify_old_namespace_package_protocol).

@jacobtylerwalls jacobtylerwalls added this to the 2.12.11 milestone Sep 22, 2022
return (
sys.modules[processed_components[0]].__spec__ is None
mod.__spec__ is None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we even need to check mod.__spec__ is None, but I don't fully understand all different styles of namespace packages. The test does pass even if we remove this.

@DanielNoord DanielNoord merged commit 8559936 into pylint-dev:main Sep 23, 2022
@skshetry skshetry deleted the fix-namespace-check branch September 23, 2022 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pylint crashes when importing submodule from a module of same name as the current file
5 participants