Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job list view makes multiple ensure_git_repository (slow) calls on the same repo #4142

Closed
u1735067 opened this issue Jul 20, 2023 · 3 comments · Fixed by #4978 or #5043
Closed

Job list view makes multiple ensure_git_repository (slow) calls on the same repo #4142

u1735067 opened this issue Jul 20, 2023 · 3 comments · Fixed by #4978 or #5043
Labels
type: bug Something isn't working as expected

Comments

@u1735067
Copy link
Contributor

Environment

  • Nautobot version (Docker tag too if applicable): 1.5.23
  • Python version: 3.10
  • Database platform, version: pgsql
  • Middleware(s):

Steps to Reproduce

  1. Create multiple jobs from the same unique Git repo
  2. Access the Job list page

Expected Behavior

At most one ensure_git_repository call (per repo) is made

Observed Behavior

One ensure_git_repository per job is made

Additional informations

A little bit like #1515, when a repo contains many (~10-20) jobs, and the git directory is hosted on a slow filesystem (like glusterfs ..), it makes loading the job list page very slow (about 5 secs). Also there's no need to refresh the same repo multiple time in the same request. The code calling this refresh is runnable() (for the buttons):

[..]
  File "/usr/local/lib/python3.10/site-packages/django/template/base.py", line 837, in _resolve_lookup
    current = getattr(current, bit)
  File "/usr/local/lib/python3.10/site-packages/nautobot/extras/models/jobs.py", line 363, in runnable
    and self.job_class is not None
  File "/usr/local/lib/python3.10/site-packages/nautobot/extras/models/jobs.py", line 308, in job_class
    ensure_git_repository(
  File "/usr/local/lib/python3.10/site-packages/nautobot/extras/datasources/git.py", line 289, in ensure_git_repository
    traceback.print_stack()

One way (maybe not the best?) to fix this might be to store in the redis cache a key with the current request id + repo id (with a low expiry time, like 300s, as it's per request) to check if the key exists and in this case prevent the refresh, but this requires to be able to access the request, which only seems feasible through a middleware (https://stackoverflow.com/questions/7267977/is-the-global-request-variable-in-python-django-available)? But in this case maybe the middleware could handle everything (patching the function)? Or instead of the cache, use a variable in that middleware?

@glennmatthews
Copy link
Contributor

Thanks for the report! I think this is fixed by some of the more structural changes to Git and Jobs in 2.0, but we can definitely look at whether this can be addressed by a more targeted fix in the LTM release.

@glennmatthews glennmatthews added the type: bug Something isn't working as expected label Jul 20, 2023
@Kircheneer
Copy link
Contributor

Kircheneer commented Oct 19, 2023

Would it be acceptable if instead we opted for the cache key to be "hostname" + "repo id"? This would make implementation significantly easier and hostname to file system should be at worst a many-to-one, so this should be an immediate performance improvement.

@Kircheneer
Copy link
Contributor

At most one ensure_git_repository call (per repo) is made

#4844 doesn't do exactly this, but it rather ends calls to ensure_git_repository early in case the latest HEAD is already checked out with git on the filesystem

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: bug Something isn't working as expected
Projects
No open projects
Status: Done
3 participants