Performance considerations #486

Open
rossbeehler opened this issue Jan 25, 2023 · 8 comments

@rossbeehler

rossbeehler commented Jan 25, 2023

We're on GitLab SaaS with over 3000 repos and climbing, and our GitLabForm nightly process takes many, many hours to run. Are there any tips, tricks, etc. to make this faster? I know I could use the group structure and just run concurrent CI/CD jobs per 2nd-level group, but I wondered if there were any other ideas, configurations, etc. that I might be missing. For example, it would be nice if there were a concurrency setting, so that all groups in the hierarchy are processed separately/concurrently based on that setting.

gdubicki added a commit that referenced this issue Jan 25, 2023
as reported in #486
@gdubicki
Member

gdubicki commented Jan 25, 2023

Hey @rossbeehler!

I have thought about adding concurrency a few times in the past, but:
a) I always ended up concluding that there is no need,
b) it's not trivial.

I felt that it's not needed because at Egnyte, where we use a self-hosted GitLab instance, applying the config for a bit over 1000 projects and over 30 groups takes about 16 minutes now. I assumed that this is not much for a pretty large scale.

And it's not trivial because of the output - you'd have to implement some buffering to prevent the output for all groups and projects from getting mixed up.

Anyway, because I don't have much time for the project lately (see #343), I am open to PRs adding this.

As for the other things that you might do:

  • run your CI/CD worker as close to GitLab.com as possible for minimal latency - according to their public docs they are hosted in GCP in us-east1,
  • whenever available, use group-level configurations to apply settings once per group instead of for each project individually - e.g. group_variables instead of project_variables,
  • review the verbose output of your run and look for:

Perhaps to be continued...

Let me know what you think!

@rossbeehler
Author

Thanks for the detailed response, @gdubicki. It may take me a while, but I'll see if I can test in GCP us-east1 and report back on how much it improves performance. We are in Azure East US 2 at the moment, so only a couple of states away, but I'm sure co-locating on the same cloud/region would make a significant difference.

I will say we see a fairly consistent amount of time spent processing each project, but we'll also turn on verbose output at some point to see if anything stands out.

@nejch
Contributor

nejch commented Jan 29, 2023

Keep in mind that with a lot of API calls you probably start to hit GitLab's rate limiting rather than just network performance issues:

https://docs.gitlab.com/ee/user/gitlab_com/index.html#gitlabcom-specific-rate-limits

At least that would be my assumption, as I also work with a large self-hosted instance. So even with concurrency, since urllib3's Retry respects Retry-After headers by default, I think it would slow down after 429 responses. I may be wrong though, I would have to benchmark that.

So one aspect of optimization would be for gitlabform to make as few requests as possible (e.g. ensure the maximum of 100 items per page for pagination, avoid repeating the same GET calls if the data has already been fetched, etc.). Just an idea though!
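
A rough sketch of both points, assuming plain requests + urllib3 (this is not gitlabform's actual code, and the token handling is a placeholder):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry honours the Retry-After header on 429s by default
# (respect_retry_after_header=True), so rate-limited calls wait
# for the advertised interval instead of failing outright.
retry = Retry(
    total=5,
    status_forcelist=[429, 500, 502, 503, 504],
    backoff_factor=1,
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))
session.headers["PRIVATE-TOKEN"] = "<token>"  # placeholder


def list_all(url, **params):
    """Paginate with the maximum per_page=100 to keep the request count down."""
    params = {"per_page": 100, **params}
    while url:
        resp = session.get(url, params=params)
        resp.raise_for_status()
        yield from resp.json()
        url = resp.links.get("next", {}).get("url")  # follow the Link header
        params = {}  # the "next" URL already carries the query string


projects = list(list_all("https://gitlab.com/api/v4/projects", membership=True))
```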

@adam-moss

We have this challenge with 10k+ repos, with this tool, danger, triage-bot, and renovate - basically anything we want to run across the estate. What we found was that running any of them as one continuous run took hours.

What we did was take the view that the tool is not the issue; at a repo level it is fast enough, so it was our execution approach that was suboptimal at scale. What we do now is:

  1. hit the GitLab API for a list of all groups & projects
  2. use that to generate child pipelines, running 1 instance of the job for each repo
  3. batch them into blocks of size n, whatever you're comfortable with within the rate limits
  4. use resource_group in the .gitlab-ci.yml to ensure only X child pipelines run in parallel.

Doing this took our renovate run-time from > 24hrs to < 3hrs, which was certainly a win for us.
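
For illustration only (this is not the script linked below), a minimal Python sketch of steps 1-4 above, with made-up names, batch size and image: the generated YAML is a single pipeline with one job per batch, and a small pool of resource_group names caps how many run at once.

```python
import os
import requests
import yaml  # PyYAML, assumed to be available in the generator job's image

GITLAB_URL = os.environ.get("CI_SERVER_URL", "https://gitlab.com")
BATCH_SIZE = 50     # projects per generated job
MAX_PARALLEL = 5    # distinct resource_group names => max jobs running at once

session = requests.Session()
session.headers["PRIVATE-TOKEN"] = os.environ["GITLAB_TOKEN"]  # placeholder CI variable


def all_project_paths(group_id):
    """Step 1: list every project under the top-level group (per_page=100)."""
    url = f"{GITLAB_URL}/api/v4/groups/{group_id}/projects"
    params = {"per_page": 100, "include_subgroups": True, "archived": False}
    while url:
        resp = session.get(url, params=params)
        resp.raise_for_status()
        yield from (p["path_with_namespace"] for p in resp.json())
        url = resp.links.get("next", {}).get("url")
        params = {}


def generated_pipeline(paths):
    """Steps 2-4: one job per batch; resource_group limits concurrency."""
    jobs = {}
    batches = [paths[i:i + BATCH_SIZE] for i in range(0, len(paths), BATCH_SIZE)]
    for n, batch in enumerate(batches):
        jobs[f"gitlabform-batch-{n}"] = {
            "image": "ghcr.io/gitlabform/gitlabform:latest",  # adjust to the image you use
            "resource_group": f"slot-{n % MAX_PARALLEL}",
            # assumes config.yml is present in the job's working directory
            "script": [f"gitlabform {path}" for path in batch],
        }
    return jobs


if __name__ == "__main__":
    paths = list(all_project_paths(os.environ["TOP_LEVEL_GROUP_ID"]))  # placeholder
    print(yaml.safe_dump(generated_pipeline(paths)))
```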

We also use the Audit Stream to run on triggers.

This isn't the code we use now, but you can see an earlier iteration I shared over at the renovate repo: renovatebot/renovate#13172 (reply in thread).

@nejch
Contributor

nejch commented Feb 16, 2023

I think that makes sense; if I remember correctly we had a similar approach, but just with parallel: and the CI_NODE_* variables. Btw, you might want to be careful with renovating 10k repos from a single renovate instance :P https://docs.renovatebot.com/gitlab-bot-security/ - it should get better after 15.9 with the job token scope.
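
For completeness, a tiny sketch (file name and command are placeholders) of how a driver script inside a parallel: job could shard the project list using those variables:

```python
import os
import subprocess

# GitLab sets these when a job uses `parallel: N` in .gitlab-ci.yml:
# CI_NODE_TOTAL = N and CI_NODE_INDEX = 1..N (1-based).
node_index = int(os.environ.get("CI_NODE_INDEX", 1))
node_total = int(os.environ.get("CI_NODE_TOTAL", 1))

# The full project list could come from an artifact of a previous job;
# the file name here is made up.
with open("all-projects.txt") as f:
    projects = sorted(line.strip() for line in f if line.strip())

# Deterministic round-robin shard: node 1 takes items 0, N, 2N, ...
my_shard = projects[node_index - 1::node_total]

for project in my_shard:
    subprocess.run(["gitlabform", project], check=True)
```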

@adam-moss

Yeah, it's a risk, but the exploit opportunity is minimised as much as is currently possible, and you need an active SSO session on our IdP. Token scopes will definitely be better, but it is only once support lands for adding the email address and signing key, and for regenerating rather than recreating PRaTs, that it can be truly secured.

@gdubicki
Member

What target are you using to run the app, @rossbeehler? ALL_DEFINED? ALL? Something else?

@rossbeehler
Author

We're running it against our top-level group for our organization.
