E0401 (import-error) checks perform repeated _has_init and stat calls #9613

Closed
correctmost opened this issue May 11, 2024 · 0 comments · Fixed by pylint-dev/astroid#2429
Labels: Enhancement ✨, Needs astroid update, performance

correctmost commented May 11, 2024

Bug description

In astroid, there's a _has_init function that looks for the presence of __init__.pyi, __init__.py, and other __init__.* files in a directory.

https://github.com/pylint-dev/astroid/blob/098438683cac8d53e67be75856d7d7aab446bb49/astroid/modutils.py#L669-L678

This function is called repeatedly with the same directory arguments. When running pylint on the yt-dlp codebase, _has_init ends up performing ~43,000 stats, almost all of which are redundant.

Applying a cache to the function brings the number of stat calls down to ~80 and reduces execution time by ~300 ms (~34.1 s -> ~33.8 s).
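For illustration, the caching idea might look roughly like the sketch below. This is not the astroid implementation: the name has_init_cached, the INIT_EXTENSIONS tuple, and the maxsize value are assumptions made for the example, and astroid derives its own extension list.

```python
import os
from functools import lru_cache
from typing import Optional

# Hypothetical extension list for this sketch; astroid builds its own.
INIT_EXTENSIONS = ("pyi", "py", "pyc", "pyo")


@lru_cache(maxsize=1024)  # maxsize is an arbitrary choice for the example
def has_init_cached(directory: str) -> Optional[str]:
    """Return the path of the first __init__.* file found in directory, else None.

    Results are memoized per directory string, so repeated lookups for the
    same directory do not touch the filesystem again.
    """
    base = os.path.join(directory, "__init__")
    for ext in INIT_EXTENSIONS:
        candidate = f"{base}.{ext}"
        if os.path.exists(candidate):
            return candidate
    return None
```

The trade-off is that a cached answer can go stale if an __init__ file is created or deleted while the process is running. That seems acceptable for a single lint run, but a long-lived process would probably want to call has_init_cached.cache_clear() at suitable points.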

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp
git checkout 5904853ae5788509fdc4892cb7ecdfa9ae7f78e6

cat << EOF > ./profile_pylint.py
import cProfile
import pstats
import sys

sys.argv = ['pylint', '--recursive=y', '.']
cProfile.run('from pylint import __main__', filename='stats')

with open('profiler_stats', 'w', encoding='utf-8') as file:
    stats = pstats.Stats('stats', stream=file)
    stats.sort_stats('tottime')
    stats.print_stats()
EOF

cat << EOF > .pylintrc
[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no
EOF

python ./profile_pylint.py

Analysis

_has_init calls exists ~43,000 times

import pstats

stats = pstats.Stats('stats')
stats.print_callees('_has_init')

Function                             called...
                                            ncalls  tottime  cumtime
astroid/modutils.py:669(_has_init)      ->   42696    0.039    0.236  <frozen genericpath>:16(exists)
                                             21348    0.051    0.086  <frozen posixpath>:71(join)
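The profile above shows roughly two exists calls per join call, but it does not show how many distinct directories were involved. One way to check the redundancy directly would be to count calls per argument. The count_arguments decorator below is a hypothetical helper written for this sketch, not part of astroid or pylint.

```python
from collections import Counter
from functools import wraps


def count_arguments(func):
    """Wrap func so that every positional-argument tuple it receives is tallied."""
    calls = Counter()

    @wraps(func)
    def wrapper(*args):
        calls[args] += 1
        return func(*args)

    # Expose the tally on the wrapper so it can be inspected after the run.
    wrapper.calls = calls
    return wrapper


# Hypothetical usage, assuming astroid is importable and the patch is applied
# before pylint starts:
#
#   import astroid.modutils
#   astroid.modutils._has_init = count_arguments(astroid.modutils._has_init)
#   ... run pylint ...
#   tally = astroid.modutils._has_init.calls
#   print(len(tally), "distinct directories,", sum(tally.values()), "calls")
```

If the number of distinct directories is far smaller than the total number of calls, most of the stat calls are repeats and a cache should help.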

Pylint output

There may be some import errors depending on your (virtual) environment, but the output is less important than the performance numbers.

Expected behavior

Improved performance via reduced _has_init and stat calls

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@2c38c0275b790265ab450b79e8dc602e651ca9d3
pylint @ git+https://github.com/pylint-dev/pylint.git@7521eb1dc6ac89fcf1763bee879d1207a87ddefa
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response

correctmost added the Needs triage 📥 label on May 11, 2024
correctmost added a commit to correctmost/astroid that referenced this issue May 11, 2024
_has_init can end up checking for the presence of the same files
over and over.

For example, when running pylint's import-error checks on a
codebase like yt-dlp, ~43,000 redundant stats were performed prior
to caching.

Closes pylint-dev/pylint#9613.
Pierre-Sassoulas added the Enhancement ✨, performance, and Needs astroid update labels and removed the Needs triage 📥 label on May 12, 2024
Pierre-Sassoulas pushed a commit to pylint-dev/astroid that referenced this issue May 17, 2024