Performance considerations #486
Hey @rossbeehler! I was thinking about adding concurrency a few times in the past, but I felt that it's not needed: for Egnyte, where we use a self-hosted GitLab instance, applying the config for a bit over 1000 projects and over 30 groups takes about 16 minutes now. I assumed that this is acceptable at a pretty large scale. It's also not trivial because of the output: you'd have to implement some buffering solution to prevent the output for all groups and projects from getting mixed up. Anyway, because I don't have much time for the project these days (see #343), I am open to PRs adding it. As for the other things that you might do:
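The buffering concern above can be illustrated with a minimal sketch (not gitlabform's actual code; `process_project` is a hypothetical stand-in): each worker writes its log lines into a per-project buffer, and the buffer is printed whole, so logs from concurrent projects never interleave.

```python
# Hypothetical sketch: process projects concurrently, but buffer each
# project's output so log lines are emitted contiguously per project.
import concurrent.futures
import io


def process_project(name: str) -> str:
    """Apply config to one project; return its log output as one string."""
    buf = io.StringIO()
    # ... here the real work would write progress into buf instead of stdout ...
    buf.write(f"* {name}\n")
    buf.write("  config applied\n")
    return buf.getvalue()


def run_all(projects, workers=8):
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        for output in pool.map(process_project, projects):
            # each project's buffer is printed as a whole, so nothing interleaves
            print(output, end="")


run_all(["group/project-a", "group/project-b"])
```

`ThreadPoolExecutor.map` also preserves input order, so the report stays deterministic even though the work itself runs concurrently.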
Perhaps to be continued... Let me know what you think!
Thanks for the detailed response, @gdubicki. It may take me a while, but I'll see if I can test in GCP. I will say we see a fairly consistent amount of time spent processing each project, but we'll also turn on verbose output at some point to see if anything stands out.
Keep in mind that with a lot of API calls you probably start to hit GitLab's rate limiting rather than just network performance issues: https://docs.gitlab.com/ee/user/gitlab_com/index.html#gitlabcom-specific-rate-limits At least that would be my assumption, as I also work with a large self-hosted instance. So even with concurrency, since urllib3's Retry respects Retry-After headers by default, I think it would slow down after 429 responses. I may be wrong though; I'd have to benchmark that. So one aspect of optimization would be for gitlabform to make as few requests as possible (e.g. ensure the maximum of 100 items per page for pagination, avoid repeating GET calls if the data has already been fetched, etc.). Just an idea though!
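A minimal sketch of both points above, using `requests` with urllib3's `Retry` (the GitLab host URL and token handling are placeholders, not gitlabform's actual implementation): retries on 429 honor the server's Retry-After header, and pagination requests the maximum of 100 items per page to cut the total number of calls.

```python
# Hypothetical sketch: a GitLab API session that backs off on 429s
# (honoring Retry-After, which is urllib3's default) and paginates
# with per_page=100 to minimize the request count.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(token: str) -> requests.Session:
    retry = Retry(
        total=5,
        status_forcelist=[429, 500, 502, 503],
        backoff_factor=1,
        respect_retry_after_header=True,  # the default; shown for clarity
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.headers["PRIVATE-TOKEN"] = token
    return session


def iter_projects(session, base_url="https://gitlab.example.com/api/v4"):
    """Yield all projects, fetching the maximum 100 per page."""
    page = 1
    while True:
        resp = session.get(f"{base_url}/projects",
                           params={"per_page": 100, "page": page})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return
        yield from batch
        page += 1
```

Caching responses (so the same GET is never issued twice in one run) would layer naturally on top of `iter_projects`.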
We have this challenge with 10k+ repos, with this tool, Danger, triage-bot, and Renovate: basically anything we want to run across the estate. What we found was that running any of them as one continuous run took hours. What we did was take the approach that the tool is not the issue; at a repo level it is fast enough. Ergo, our execution approach was suboptimal at scale. So what we do now is:
Doing this took our Renovate run-time from over 24 hours to under 3 hours, which was certainly a win for us. We also use the Audit Stream to run on triggers. This isn't the code we use now, but you can see an earlier iteration I shared over at the Renovate repo: renovatebot/renovate#13172 (reply in thread)
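The sharding idea described above can be sketched in a few lines (a hypothetical illustration, not the poster's actual code): split the full project list into N roughly equal slices, one per parallel CI job, so each job only processes its share of the estate.

```python
# Hypothetical sketch: deterministic sharding of a project list
# across N parallel CI jobs.
def shard(projects, num_jobs, job_index):
    """Return the slice of `projects` owned by job `job_index` (0-based)."""
    return [p for i, p in enumerate(projects) if i % num_jobs == job_index]


projects = [f"group/project-{n}" for n in range(10)]
# In GitLab CI, a job using the `parallel:` keyword could derive
# num_jobs/job_index from CI_NODE_TOTAL and CI_NODE_INDEX.
for job in range(3):
    print(job, shard(projects, 3, job))
```

Because the assignment depends only on the index, every job sees a disjoint slice and the union of all slices covers every project exactly once.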
I think that makes sense; if I remember correctly we had a similar approach but just with
Yeah, it's a risk, but the exploit opportunity is minimised as much as is currently possible, and you need an active SSO session on our IdP. Token scopes will definitely be better, but only once support lands for adding the email address and signing key, and for regenerating rather than recreating PRaTs, can it be truly secured.
What target are you using to run the app, @rossbeehler?
We're running it against our top-level group for our organization. |
We're on GitLab SaaS with over 3000 repos, and climbing, and our GitLabForm nightly process takes many, many hours to run. Are there any tips, tricks, etc. to make this faster? I know I could use the group structure and just run concurrent CI/CD jobs per second-level group, but I wondered if there were any other ideas, configurations, etc. that I might be missing. For example, it would be nice if there were a concurrency setting, with all groups in the hierarchy processed separately/concurrently based on that setting.