Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove site-packages from packages search tree #723

Merged

Conversation

tkrabel
Copy link
Contributor

@tkrabel tkrabel commented Nov 1, 2023

Description

Remove site-packages from the python package search tree to avoid duplicate entries in the autoimport database. This also speeds up generating the modules cache by a factor of ~2.

Fixes #722

Checklist (delete if not relevant):

  • I have added tests that prove my fix is effective or that my feature works
  • I have updated CHANGELOG.md
  • I have made corresponding changes to user documentation for new features
  • I have made corresponding changes to library documentation for API changes

@bagel897
Copy link
Contributor

bagel897 commented Nov 2, 2023

I should have written logic to handle this already, it may just be missing the regex rules

@bagel897
Copy link
Contributor

bagel897 commented Nov 2, 2023

Here's what I found: 2182bfe (utils.py)
I may have assumed all virtual environments began with .. This is convention, not guaranteed. Thanks for catching that.
You may want to change this logic instead - IE add a check that name is not site-packages in the existing checking logic. This way its more consistent with the other filter mechanisms
Did we merge the database versioning update? I would purge and re-create all dbs after this change regardless of the approach

@bagel897
Copy link
Contributor

bagel897 commented Nov 2, 2023

Honestly if I was to rewrite this again, I'd abstract the db, parsing and discovery (separate layers for package and module/file discovery) layers out of the main class - it's a bit of a monstrosity. They honestly can be made single threaded. There's also some poor logic in one of the cache generation functions I fixed in a PR I never merged.

@tkrabel
Copy link
Contributor Author

tkrabel commented Nov 2, 2023

There's also some poor logic in one of the cache generation functions I fixed in a PR I never merged.

So why not merge it then? :)

@bagel897
Copy link
Contributor

bagel897 commented Nov 2, 2023

There's also some poor logic in one of the cache generation functions I fixed in a PR I never merged.

So why not merge it then? :)

I never finished the work, it was part of a much broader change that broke API compatibility. I don't see the problem in the current rope code, but my idea was to use the same function to generate the cache and have both cache generation functions call it instead of implementing the logic twice.
Here's the PR: #485
But it's not necessary, just an idea.

Copy link
Member

@lieryan lieryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't tested this, but from bird's eye view, this looks reasonable. @bagel897 mentioned that she did a similar fix in an unmerged branch, so if this looks ok to her, it is ok to me as well.

@bagel897
Copy link
Contributor

bagel897 commented Nov 7, 2023

Does this handle cases where venv is in the project root? I am running into that on windows with the autodetect branch. Maybe this is a venv vs virtualenv thing? I think our implementation should be recursive (IE no Lib.site-packages.xyz modules) but I'm not sure how to implement that efficiently

@tkrabel-db
Copy link

Does this handle cases where venv is in the project root? I am running into that on windows with the autodetect branch. Maybe this is a venv vs virtualenv thing? I think our implementation should be recursive (IE no Lib.site-packages.xyz modules) but I'm not sure how to implement that efficiently

I think this PR doesn't change the way we treat virtual environments. The PR only doesn't treat site-packages like a python package, which makes sense to me.

That said: @lieryan I saw you approved the PR. Can you merge this?

@bagel897 if you have any concerns related virtual environments, can we please address this in a separate issue/PR?

@bagel897
Copy link
Contributor

bagel897 commented Nov 8, 2023

Autoimport scans for packages in the python path
image
In this example, we can see both ../.venv/lib64/python3.12/site-packages and also the cwd ''
If we detect site-packages, that means a directory called site-packages in the search path - which means a failure to detect venvs. We shouldn't be searching the venv directory from the project root. The current method is to exclude directories prefixed with a ., which is insufficient.

Here's another failure example:
https://github.com/python-rope/rope/actions/runs/6739504275/job/18321101941?pr=516
image
Since it searches the venv, it indexes its contents twice - under the Lib package. We need to exclude all instances of site-packages anywhere in the package or module path. Maybe the solution is to exclude it both at the package and module level?

@tkrabel-db
Copy link

@bagel897 @lieryan can we discuss this in a follow up issue and merge this PR?
IIUC, my Pr provides an incremental improvement, while not causing any regressions, so I'd not want to delay this improvement unnecessarily.

@lieryan
Copy link
Member

lieryan commented Nov 16, 2023

Hi @tkrabel-db, thanks for creating this PR. I'm currently travelling on vacation, so my availability to look into rope issues and review PRs will be limited until December.

So don't worry, but I hadn't forgotten about this 🙂, but I won't be able to action the PR or make a new release until December.

@tkrabel
Copy link
Contributor Author

tkrabel commented Nov 19, 2023

@bagel897 coming back to your comment:

We need to exclude all instances of site-packages anywhere in the package or module path

I tested my code creating a virtual environment using venv and I am fairly confident that my PR does exactly that. It may help looking at my problem statement again.

There, you can see we have duplications of the form (and only of the form)

Name(name=<import_name>, modname=<mod_name>, package=<package>, ...)
Name(name=<import_name>, modname="site-packages."<mod_name>, package='site-packages', ...)

Let's use an example to facilitate the conversation.

Name(name='BaseName', modname='jedi.api.classes', package='jedi', source=<Source.SITE_PACKAGE: 4>, name_type=<NameType.Class: 7>)
Name(name='BaseName', modname='site-packages.jedi.api.classes', package='site-packages', source=<Source.SITE_PACKAGE: 4>, name_type=<NameType.Class: 7>)

Here, you see that we attribute the BaseName object to jedi in the first entry - the correct match. In the second entry, however, we attribute the same object to a non-existing package we call site-packages.

The change I propose (with your help of pointing me to the correct util; thank you!) captures all cases in which we attribute any object/module to be of root (i.e. the "package") site-packages. This means it also captures the failure case you have eluded to earlier, namely ('from Lib.site-packages.pytoolconfig.sources import pytool', 'pytool').

Again, I tested my code and found that it also captures the case you presented.

If you think my PR still has some gaps, pls show them in a reproducible example based on this PR. And thank you for engaging in this PR! It's truly helpful having the original author around :)

@lieryan
Copy link
Member

lieryan commented Dec 16, 2023

Thanks for fixing this issue @tkrabel

@lieryan lieryan merged commit f11f1c1 into python-rope:master Dec 16, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AutoImport.generate_modules_cache can be speeded up by 2x
4 participants