High CPU consumption makes Gitea very slow #15143

Closed
2 of 6 tasks
0000005 opened this issue Mar 24, 2021 · 25 comments
Labels
performance/bigrepo Performance Issues affecting Big Repositories

Comments

@0000005

0000005 commented Mar 24, 2021

  • Gitea version (or commit ref): 1.13.2
  • Git version: 1.8.3.1
  • Operating system:
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
  • Log gist:

Description

...
I don't yet know which action causes this problem, but it happens frequently.

Screenshots

image
image
The high CPU usage lasts for tens of minutes.

@0000005
Author

0000005 commented Mar 24, 2021

I found a related issue, #9642, but there has been no further feedback on it.

@zeripath
Contributor

Please update - this is likely fixed by #14649 which is in 1.13.3

@zeripath
Contributor

The latest point release for 1.13 is 1.13.6 - you should always try the latest point release.

@0000005
Author

0000005 commented Mar 24, 2021

Thank you.
I will try the latest stable version to see if the problem remains.

@techknowlogick
Member

Closing, please re-open if it remains.

@0000005
Author

0000005 commented Mar 25, 2021

Please re-open this issue: the problem remains after upgrading to 1.13.6.

@0000005
Author

0000005 commented Mar 25, 2021

image

@0000005
Author

0000005 commented Mar 25, 2021

Tree view:
image

@0000005
Author

0000005 commented Mar 25, 2021

@techknowlogick

@0000005
Author

0000005 commented Mar 25, 2021

What should I do?

@0000005
Author

0000005 commented Mar 25, 2021

I am pretty sure that merging a PR causes this problem, even though this PR is very small.
image

@noerw
Member

noerw commented Mar 26, 2021

The PR may be small, but does that repo have a long history (i.e. > 10.000 commits)?

@0000005
Author

0000005 commented Mar 26, 2021

> The PR may be small, but does that repo have a long history (i.e. > 10.000 commits)?

image

@noerw noerw added the performance/bigrepo Performance Issues affecting Big Repositories label Mar 26, 2021
@0000005
Author

0000005 commented Mar 26, 2021

The merge is very fast if I do it locally.

@zeripath
Contributor

zeripath commented Mar 27, 2021

I think pprof would be useful here:

[server]
ENABLE_PPROF = true

Just before the slow merge please start:

wget -O cpuprofile.out "http://localhost:6060/debug/pprof/profile?seconds=60"

Then do the merge and either attach the profile here or talk to me on discord.

I will need to know what version of gitea you are running exactly.


We also need logs. Read the issue template. We do not ask for htop as essentially it is useless. Give us some logs and the results of the pprof I've asked for above.
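
For anyone who prefers to script the capture, here is a minimal Go sketch of the same download that the wget command above performs. The host `localhost:6060` and the 60-second window are taken from the comment above; the function names `profileURL` and `fetchProfile` and the extra 30-second client timeout are assumptions made for illustration and are not part of Gitea.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// profileURL builds the CPU profile URL exposed by the pprof HTTP handler.
// The host is an assumption; use the address your pprof listener binds to.
func profileURL(host string, seconds int) string {
	u := url.URL{
		Scheme:   "http",
		Host:     host,
		Path:     "/debug/pprof/profile",
		RawQuery: url.Values{"seconds": {fmt.Sprint(seconds)}}.Encode(),
	}
	return u.String()
}

// fetchProfile downloads a CPU profile and writes it to outPath.
// The request blocks for roughly `seconds` while the server samples.
func fetchProfile(host string, seconds int, outPath string) error {
	// Allow some slack beyond the sampling window (arbitrary 30s margin).
	client := &http.Client{Timeout: time.Duration(seconds+30) * time.Second}
	resp, err := client.Get(profileURL(host, seconds))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	f, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// Start this just before triggering the slow merge.
	if err := fetchProfile("localhost:6060", 60, "cpuprofile.out"); err != nil {
		fmt.Fprintln(os.Stderr, "profile capture failed:", err)
	}
}
```

The resulting file can then be opened with `go tool pprof cpuprofile.out` to see where the sampled CPU time went.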

@0000005
Author

0000005 commented Mar 30, 2021

cpuprofile.zip

@0000005
Author

0000005 commented Mar 30, 2021

gitea.0330.log
The merge action happened at 2021-03-30 14:35:38.

@0000005
Author

0000005 commented Mar 30, 2021

@zeripath Can you analyze this problem for me? Thank you very much!

@zeripath
Contributor

zeripath commented Mar 30, 2021

OK you've only sent me the xorm logs which are the least useful logs. Please read: https://docs.gitea.io/en-us/logging-configuration/#debugging-problems.

And the profile doesn't actually contain all of the merge - however, it probably does show where the inefficiency is:

43% of the time in that profile is spent in

// GetLanguageStats calculates language stats for git repository at specified commit
func (repo *Repository) GetLanguageStats(commitID string) (map[string]int64, error) {
	r, err := git.PlainOpen(repo.Path)
	if err != nil {
		return nil, err
	}

	rev, err := r.ResolveRevision(plumbing.Revision(commitID))
	if err != nil {
		return nil, err
	}

	commit, err := r.CommitObject(*rev)
	if err != nil {
		return nil, err
	}

	tree, err := commit.Tree()
	if err != nil {
		return nil, err
	}

	sizes := make(map[string]int64)
	err = tree.Files().ForEach(func(f *object.File) error {
		if f.Size == 0 || enry.IsVendor(f.Name) || enry.IsDotFile(f.Name) ||
			enry.IsDocumentation(f.Name) || enry.IsConfiguration(f.Name) {
			return nil
		}

		// If content can not be read or file is too big just do detection by filename
		var content []byte
		if f.Size <= bigFileSize {
			content, _ = readFile(f, fileSizeLimit)
		}
		if enry.IsGenerated(f.Name, content) {
			return nil
		}

		// TODO: Use .gitattributes file for linguist overrides
		language := analyze.GetCodeLanguage(f.Name, content)
		if language == enry.OtherLanguage || language == "" {
			return nil
		}

		// group languages, such as Pug -> HTML; SCSS -> CSS
		group := enry.GetLanguageGroup(language)
		if group != "" {
			language = group
		}

		sizes[language] += f.Size
		return nil
	})
	if err != nil {
		return nil, err
	}

	// filter special languages unless they are the only language
	if len(sizes) > 1 {
		for language := range sizes {
			langtype := enry.GetLanguageType(language)
			if langtype != enry.Programming && langtype != enry.Markup {
				delete(sizes, language)
			}
		}
	}

	return sizes, nil
}

In particular, 27.3% (3.46s) is spent in the enry.IsVendor check at the top of the ForEach callback above (line 47 in the linked source), which calls go-enry's IsVendor:

// IsVendor returns whether or not path is a vendor path.
func IsVendor(path string) bool {
	return matchRegexSlice(data.VendorMatchers, path)
}

This is remarkably inefficient: it simply iterates over a list of regexps, checking each one in turn, rather than combining them into a single regexp.

There are a number of other inefficiencies that this flame graph identifies but I think this is a particularly glaring issue.
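
To make the difference concrete, here is a small Go sketch that contrasts the one-by-one approach with a single combined alternation. The patterns in `vendorPatterns` are hypothetical stand-ins chosen for illustration and are not go-enry's actual `data.VendorMatchers`; the helper names `compileAll`, `matchAny` and `combine` are also made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical vendor-style patterns for illustration only;
// these are NOT go-enry's actual data.VendorMatchers.
var vendorPatterns = []string{
	`(^|/)vendor/`,
	`(^|/)node_modules/`,
	`(^|/)third[-_]?party/`,
	`\.min\.js$`,
}

// compileAll compiles each pattern separately (the one-by-one approach).
func compileAll(patterns []string) []*regexp.Regexp {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		res = append(res, regexp.MustCompile(p))
	}
	return res
}

// matchAny checks each regexp in turn and stops at the first match.
func matchAny(res []*regexp.Regexp, path string) bool {
	for _, re := range res {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

// combine joins the patterns into one alternation so that a path is
// scanned by a single compiled regexp. Each pattern is wrapped in a
// non-capturing group to keep its own alternations and anchors intact.
func combine(patterns []string) *regexp.Regexp {
	groups := make([]string, len(patterns))
	for i, p := range patterns {
		groups[i] = "(?:" + p + ")"
	}
	return regexp.MustCompile(strings.Join(groups, "|"))
}

func main() {
	separate := compileAll(vendorPatterns)
	combined := combine(vendorPatterns)
	for _, path := range []string{"vendor/lib/a.go", "src/main.go", "static/app.min.js"} {
		fmt.Println(path, matchAny(separate, path), combined.MatchString(path))
	}
}
```

Both forms should give the same answers; how much time the combined form saves depends on the patterns and on the number of files, so it is worth confirming with a benchmark or a fresh profile.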

zeripath added a commit to zeripath/gitea that referenced this issue Mar 30, 2021
`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR adjusts the regexps to combine them to make this process a
little quicker.

Related go-gitea#15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Contributor

@81519434 would it be possible for you to give us a profile file for this push on 1.14-rc2 and master?

(I think at least a few things should have been improved even without the isvendor pr I've proposed above but it would be good to know for sure.)

@0000005
Author

0000005 commented Apr 1, 2021

The high CPU usage continues for 10-20 minutes after the merge action completes.
Do you mean that I should capture the profile for that long?
@zeripath

@0000005
Author

0000005 commented Apr 1, 2021

From htop we can see that the git rev-list process is what keeps CPU consumption high.
Does that mean Gitea triggered the git command line incorrectly?
If so, I think pprof will not find the cause, because the git command line is not managed by Go.
Of course, I am not familiar with Go. This is all my guess.

6543 pushed a commit that referenced this issue Apr 1, 2021
`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR adjusts the regexps to combine them to make this process a
little quicker.

Related #15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
zeripath added a commit to zeripath/gitea that referenced this issue Apr 1, 2021
Backport go-gitea#15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR adjusts the regexps to combine them to make this process a
little quicker.

Related go-gitea#15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
zeripath added a commit to zeripath/gitea that referenced this issue Apr 1, 2021
Backport go-gitea#15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR adjusts the regexps to combine them to make this process a
little quicker.

Related go-gitea#15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
6543 pushed a commit that referenced this issue Apr 1, 2021
Backport #15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR adjusts the regexps to combine them to make this process a
little quicker.

Related #15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
6543 pushed a commit that referenced this issue Apr 1, 2021
Backport #15213

`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR adjusts the regexps to combine them to make this process a
little quicker.

Related #15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
@johanvdw
Contributor

johanvdw commented Apr 3, 2021

What happens if you create a new repo on the same server, push the original data and open the same pull request? Your git directory (on the server) might be damaged if git itself is very slow.

More recent versions of git also contain speedups to the mentioned functions, but it should be a fast operation anyway.

@0000005
Author

0000005 commented Apr 6, 2021

I will try upgrading to the latest git client version and see whether the problem remains.

AbdulrhmnGhanem pushed a commit to kitspace/gitea that referenced this issue Aug 10, 2021
`enry.IsVendor` is kinda slow as it simply iterates across all regexps.
This PR adjusts the regexps to combine them to make this process a
little quicker.

Related go-gitea#15143

Signed-off-by: Andrew Thornton <art27@cantab.net>
@wxiaoguang
Contributor

@0000005 Has the problem been resolved? Since Gitea 1.16 is coming and a lot of code has been changed (and improved), I think we can close this one. If there are any remaining performance problems, we can continue to improve things on 1.16.

@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022

6 participants