Stat reloader CPU usage too high on Docker for Mac #2141

Closed
martinsvoboda opened this issue May 21, 2021 · 7 comments · Fixed by #2322

@martinsvoboda

martinsvoboda commented May 21, 2021

After migrating from Werkzeug 1.0.1 to 2.0.1, the stat reloader consumes 100% of the CPU. I use the stat reloader because inotify isn't well supported in Docker for Mac. I'm using Werkzeug through the runserver_plus command from django-extensions. Going back to Werkzeug 1.0.1 solves the problem (CPU usage seems OK).

Environment:

  • Python version: 3.9.4
  • Werkzeug version: 2.0.1
  • django-extensions: 3.1.3
@davidism davidism changed the title Stab reloader CPU usage (docker for mac) Stat reloader CPU usage too high on Docker for Mac May 26, 2021
@mgu

mgu commented Jun 14, 2021

It seems that the reloader generates a lot of stat() syscalls when testing isfile().
Do you think we could cache the result of isfile(path)? That would benefit both StatReloader and WatchdogReloader.

@davidism
Member

davidism commented Aug 8, 2021

Caching calls to os.path.isfile() doesn't appear to change the number of fstatat syscalls for either reloader.

run_simple("localhost", 5000, app, use_reloader=True, reloader_type="stat")
$ strace -c -f --trace=%%stat python example.py

I think it's because most of the filesystem work is done by two calls. The first is os.walk(), which we can't cache because then new or removed files wouldn't be detected. The second is os.stat(), which gets the mtimes; that is central to how the reloader works and can't be cached either.
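
To illustrate why neither call can be cached, here is a minimal sketch of stat-based polling. It is not Werkzeug's actual implementation; the helper names `snapshot_mtimes` and `changed_paths` are hypothetical and only the standard library is used.

```python
import os
import tempfile


def snapshot_mtimes(root):
    """Walk root and record the modification time of every file found."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtimes[path] = os.stat(path).st_mtime_ns
            except OSError:
                # The file vanished between the walk and the stat call.
                continue
    return mtimes


def changed_paths(old, new):
    """Return paths that were added, removed, or modified between snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "a.py")
    with open(first, "w") as f:
        f.write("x = 1\n")
    before = snapshot_mtimes(root)

    # A new file is only noticed because the tree is walked again.
    second = os.path.join(root, "b.py")
    with open(second, "w") as f:
        f.write("y = 2\n")
    after = snapshot_mtimes(root)

    detected = changed_paths(before, after)
```

Every poll repeats the full walk and one stat per file, so the cost grows with the number of files under the watched trees rather than with the number of files that actually change.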

@davidism davidism closed this as completed Aug 8, 2021
@davidism davidism reopened this Aug 8, 2021
@davidism
Member

davidism commented Aug 9, 2021

Werkzeug 2.0 makes the stat reloader more accurate by walking all path directories, rather than only loaded module directories. This makes the stat reloader behavior match the watchdog behavior more, at the expense of higher CPU usage. I'm reluctant to undo that.

Here's some performance stats:

  • If I run the reloader for an example.py file in the werkzeug repo directory and venv, the stat reloader watches about 300 files every second, at 3.5% CPU.
  • If instead I run it from somewhere very generic, such as my home directory with tons of unrelated folders and files, it is understandably much slower, watching 168388 files at 90% CPU.
  • If I go back to the Werkzeug directory after running tox, which created a bunch of venvs under .tox/, it jumps up to 8000 files at 10% CPU.

There's probably a way for you to optimize what the reloader is watching for your project in your Docker container. Perhaps you copied your code to /, when it could be installed in a /project/code directory that doesn't add a very broad tree to sys.path. Or perhaps you have a large number of files in the tree that can be ignored by pattern, such as */.tox/*. If you're not concerned about installed modules, only your own code, you could exclude the virtualenv site-packages path, which will basically go back to the 1.0.x behavior.

You can see how many paths the stat reloader is watching by running the following from the same venv and directory your project will run from:

from werkzeug._reloader import _find_stat_paths
print(len(_find_stat_paths(set(), set())))

You can print out the list as well and perhaps identify a subtree that has files that aren't relevant to reloader and can be excluded.
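
One way to spot a large irrelevant subtree in that list is to count files per ancestor directory. The following is a rough sketch using only the standard library; `count_by_subtree` is a hypothetical helper, and the sample paths are made up for illustration (in practice you would pass the paths returned by the reloader).

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_subtree(paths, depth):
    """Count files under each ancestor directory truncated to `depth` parts."""
    counts = Counter()
    for path in paths:
        parts = PurePosixPath(path).parent.parts
        counts[str(PurePosixPath(*parts[:depth]))] += 1
    return counts


# Hypothetical sample; replace with the reloader's own path list.
sample = [
    "/app/project/manage.py",
    "/app/project/backend/wsgi.py",
    "/app/project/frontend/node_modules/a/index.js",
    "/app/project/frontend/node_modules/b/index.js",
    "/app/project/frontend/node_modules/b/lib/util.js",
]

counts = count_by_subtree(sample, depth=4)
for directory, total in counts.most_common():
    print(f"{total:6d}  {directory}")
```

Directories near the top of the output that hold no code you care about are candidates for exclusion.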

You can pass a list of fnmatch patterns to run_simple(..., exclude_patterns=[...]) (or the second set() above to test it out) which will tell the reloader to ignore all files matching those patterns. I'm not sure if runserver_plus exposes that, but flask run will expose that soon, or you can call run_simple directly for now.
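
Before passing patterns to run_simple, it can help to check what they match. This sketch uses only the standard library fnmatch module; the `is_excluded` helper, the paths, and the patterns are hypothetical, and it assumes each pattern is matched against the full file path, which may differ in detail from what the reloader does internally.

```python
import fnmatch

# Hypothetical watched paths and patterns, for illustration only.
watched = [
    "/app/project/manage.py",
    "/app/project/backend/wsgi.py",
    "/app/project/.tox/py39/lib/site.py",
    "/app/project/frontend/node_modules/pkg/index.js",
]
exclude_patterns = ["*/.tox/*", "*/node_modules/*"]


def is_excluded(path, patterns):
    """Return True if the path matches any of the fnmatch patterns."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in patterns)


kept = [path for path in watched if not is_excluded(path, exclude_patterns)]
print(kept)
```

Note that in fnmatch patterns `*` also matches path separators, so `*/.tox/*` covers files at any depth below a `.tox` directory.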

The reloader already does a bit of optimization by always ignoring __pycache__, .git, and .hg folders. I can add some folders for common Python tools such as tox, nox, pytest, and mypy, as well as the Flask instance folder. Projects will still need to use exclude_patterns to tailor the list to their specific project structure.

I'm inclined to close this based on the reasoning and capabilities above, but post back about the number of files watched and if there's any exclude_patterns you can add.

@davidism
Member

davidism commented Aug 9, 2021

I can do a few things to cut down on the number of files scanned by the stat reloader without meaningfully changing what is watched:

  • Ignore common Python tool cache directories (tox, nox, pytest, mypy)
  • Always ignore sys.base_prefix (these are the system Python files, not the venv, and they shouldn't be changing)
  • Don't recurse into any directories if they or their parent do not have any Python files (not a package or data within a package)

In the case of running from my home directory, this cuts down from 16k to 800 files. In the case of running from a project directory, it cuts down from 300 (or 8k with tox) to 150.
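
The third idea in the list above can be sketched with os.walk by pruning `dirnames` in place. This is only an illustration under stated assumptions, not the code that was merged: `find_python_files` is a hypothetical helper, it only looks at `.py` files, and it always scans the starting directory.

```python
import os
import tempfile


def find_python_files(root):
    """Collect .py files, pruning subtrees that look unrelated to Python."""
    dirs_with_py = set()
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        py_files = [name for name in filenames if name.endswith(".py")]
        found.extend(os.path.join(dirpath, name) for name in py_files)
        if py_files:
            dirs_with_py.add(dirpath)
        elif dirpath != root and os.path.dirname(dirpath) not in dirs_with_py:
            # Neither this directory nor its parent holds Python files,
            # so stop descending (os.walk honours in-place edits of dirnames).
            dirnames.clear()
    return found


with tempfile.TemporaryDirectory() as root:
    layout = [
        "manage.py",
        os.path.join("backend", "wsgi.py"),
        os.path.join("frontend", "node_modules", "pkg", "index.js"),
        os.path.join("frontend", "node_modules", "pkg", "deep", "setup.py"),
    ]
    for relative in layout:
        full = os.path.join(root, relative)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    found = sorted(os.path.relpath(p, root) for p in find_python_files(root))
```

In this layout the walk stops at `frontend/node_modules`, so nothing below it is visited. The trade-off is that a Python file buried under two or more levels of non-Python directories, like `deep/setup.py` here, is no longer seen.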

@martinsvoboda
Author

@davidism thank you for the investigation. In my case, I see the problem in

for root, dirs, files in os.walk(path):

os.walk is very CPU intensive here because it walks through all the frontend files in the project (node_modules). Your third suggestion, "Don't recurse into any directories if they or their parent do not have any Python files (not a package or data within a package)", would probably solve the issue.

@davidism
Member

davidism commented Aug 9, 2021

Node modules shouldn't be in your Flask project. Separate the Python and Node project files.

@martinsvoboda
Author

It is Django, not Flask, but that doesn't matter. Yes, they are separate, but obviously not in the way Werkzeug would expect:

/app/project/backend/wsgi.py
/app/project/frontend/node_modules/
/app/project/manage.py

It would be great if the Werkzeug reloader could handle any project structure and ignore folders without Python files, or if the docs at least explained how to prevent high CPU usage. For instance, it is possible to mount an empty Docker volume over big directories:

volumes:
  - /var/empty:/app/project/frontend/node_modules

@davidism davidism added this to the 2.1.0 milestone Jan 26, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 11, 2022