Serv command consumes all server resources #4450
Comments
We have upgraded to v1.4.3. If it happens again we will try to come up with some more details.
Still seems to be an issue multiple times a day with v1.4.3.
I think it is because we have only assigned 16 GB memory and some swap. Every connection uses ~1500 MB of virtual memory and ~500 MB of resident memory. This is an insane amount of memory.
Is there anything useful in gitea.log at such times?
I have had a look at the logfiles but nothing special is in there. I will keep an eye on it. But what about dumping debug pprof profiles to disk for the serv command to see the actual memory usage hotspots?
Hi guys, I have no exact explanation, but the problem has not been seen since we updated Jenkins. Still, I think it is good to have the ability to create pprof dumps per new serv process to figure out the memory usage, because the Linux OOM killer can come by and reap other processes and make the server unusable in cases where SSH is requested at a high rate (big deployments).
Currently it is hard to find out the cause for this without the ability to reproduce such behaviour.
I totally understand. Would it be possible for me to add a feature for pprof memory dumps to disk on exit, behind a special flag? Currently there is only an HTTP pprof endpoint, which cannot be used with short-lived serv executions under SSH. Then we can profile CPU and heap and draw better conclusions about this weird "smoking server" problem. https://github.com/go-gitea/gitea/search?q=pprof&unscoped_q=pprof
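For illustration, a minimal sketch of what such a "dump on exit" option could look like for a short-lived command, assuming a hypothetical `-pprof-dir` flag (the actual flag and paths added in #4560 may differ):

```go
package main

import (
	"flag"
	"log"
	"os"
	"path/filepath"
	"runtime/pprof"
)

func main() {
	// Hypothetical flag; illustrative only, not the real Gitea option.
	dumpDir := flag.String("pprof-dir", "", "write CPU and heap profiles to this directory on exit")
	flag.Parse()

	if *dumpDir != "" {
		cpuFile, err := os.Create(filepath.Join(*dumpDir, "cpu.pprof"))
		if err != nil {
			log.Fatal(err)
		}
		if err := pprof.StartCPUProfile(cpuFile); err != nil {
			log.Fatal(err)
		}
		defer func() {
			// Stop CPU profiling and write a heap snapshot when the process exits.
			pprof.StopCPUProfile()
			cpuFile.Close()

			heapFile, err := os.Create(filepath.Join(*dumpDir, "heap.pprof"))
			if err != nil {
				log.Print(err)
				return
			}
			defer heapFile.Close()
			if err := pprof.WriteHeapProfile(heapFile); err != nil {
				log.Print(err)
			}
		}()
	}

	// ... the short-lived serv work would run here ...
}
```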
Sure, it could help to have such an option to pprof this.
If no one else will work on this, I can try to implement dumping pprof to disk in the upcoming weeks.
Yes, go ahead
I have the same issue - from time to time, hundreds of "gitea serv" processes eat all my system memory. How can I help to track it down?
Hi @micw I started working on profiling this problem in feature #4560. Currently we can not trace exactly what is going on so I added pprof dumps to disk. More can be read at https://blog.golang.org/profiling-go-programs. I am using the https://github.com/google/pprof web interface visualizer.
When #4560 is merged (and released) we could share and offline analyze the pprof files and investigate further.
Hi all, I use Gitea 1.3.2 and have the same issue: it uses almost all of the CPU resources on an AWS EC2 m4.medium.
@chihchi This issue causes high memory/IO consumption and occurs on 1.4.3, so I'm not sure that yours is the same issue. If so, upgrading to 1.4.3 won't help.
The issue occurred several times this week. It looks like the situation occurs if more "serv" processes are running than the server can handle. If no more requests follow, the server returns to normal after a while. But if clients keep querying git (like Jenkins does), more and more processes are launched which consume all available resources (mainly memory). Basically it's expected behaviour if clients consume more resources than the server can provide (a kind of DoS). But in this case, the amount of client activity is quite small (just a few Jenkins jobs polling for changes every minute) - that should be handled without problems. As a workaround, I moved our git to a better server (fast SSD storage, 64 GB RAM, 12 CPU cores) - the issue has not occurred again since then.
@chihchi you should try upgrading anyway; even if there has been no direct change to fix this, it could be that fixing another bug has also fixed it.
I need to update my experience - the issue also occurred on the bigger server today. It can handle the load better but already has a load of >60. Since the system is still responsive, I can do some analysis:
-> only 40 gitea serv processes are running
-> only a few processes (~15) last longer and consume a lot of resources (some consume 1 GB or more of memory). After a while (~10 minutes) all those processes are finished and everything returned to normal.
@lafriks is it possible my pprof dump feature could be backported to 1.4.x?
@xor-gate not for 1.4.x as 1.5.0 was just released. We don't usually backport features to minor releases.
I understand
@lafriks Hi all, I upgraded to 1.4.3 and it seems to resolve the high CPU load problem.
Just want to point out that @chihchi's issue was a different one (completely different, much older version). The issue described in this ticket still exists.
@micw if you update to 1.5.x it is possible to enable profiling
Upgrade is scheduled for Wednesday. But I have no idea how to enable profiling and how to profile, so please let me know how I can provide information to you.
I upgraded to 1.5.3; the problem still exists and gets worse. I have a bunch of git repos and a few (~30) Jenkins jobs polling them. This causes a fat server (10 CPU cores, 60 GB of RAM, fully SSD) to overload. Here's what top looks like after a few minutes - almost all of the 60 GB of memory are consumed:
This happens several times a day.
@micw You can try out an RC for 1.6.0 which contains this fix. I am currently running an RC in production anyway.
We are not backporting the profiling patch to 1.5.x as 1.6.0 will be released soon.
Hi, I ran the profiling on 1.6.0-rc2 but I cannot interpret the results.
Here's the result: CPU profiling says that the process ran for only 16 ms, which is definitely wrong. Any ideas?
I think you are analyzing the incorrect profiles. I have loaded the profiles into pprof (downloaded https://github.com/go-gitea/gitea/releases/download/v1.6.0-rc2/gitea-1.6.0-rc2-linux-amd64) using https://github.com/google/pprof. See screenshots:
All profiles that were written look like that. I did a short pprof -text on each file and grepped for the duration; the one I uploaded had by far the longest duration. If you look into the memory profile under alloc_space, you see that almost 3 GB are allocated. There are other profiles with >10 GB allocated. Could this be the cause of the enormous load? And could it be that memory allocation (which is done by the OS, not by the Go code) is not part of the profiling?
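As a side note on that last question: a heap profile's alloc_space counts cumulative Go heap allocations, while the footprint that top and the OOM killer see is closer to what the runtime has obtained from the OS. A minimal sketch (not Gitea code) that prints both views using runtime.MemStats:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// HeapAlloc:  live Go heap objects (comparable to pprof's inuse_space).
	// TotalAlloc: cumulative heap allocations over the process lifetime
	//             (comparable to pprof's alloc_space).
	// Sys:        total memory the Go runtime has obtained from the OS,
	//             closer to the process footprint reported by top.
	fmt.Printf("HeapAlloc=%d MiB  TotalAlloc=%d MiB  Sys=%d MiB\n",
		m.HeapAlloc>>20, m.TotalAlloc>>20, m.Sys>>20)
}
```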
I'm not that familiar with golang, but to me it looks like https://github.com/go-gitea/gitea/blob/master/modules/log/file.go#L147 reads the whole logfile into memory during gitea serv.
I wiped the serv.log and the problem is gone...
Reviewed the code. Indeed the whole logfile is read. The purpose is log rotation. This should not be done in the logger, especially not if multiple processes log to the same file.
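To illustrate why that hurts, here is a minimal sketch of the two patterns (not the actual modules/log/file.go code): a line-count check forces every process to read the whole logfile, while a size check is a single stat call.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

// countLines has to read through the entire file just to learn how many
// lines it contains -- the kind of work that makes every short-lived
// `gitea serv` process pull a large serv.log through memory.
func countLines(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	n := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

// exceedsSize only asks the OS for file metadata; no file content is read.
func exceedsSize(path string, maxBytes int64) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return info.Size() > maxBytes, nil
}

func main() {
	lines, err := countLines("serv.log")
	if err != nil {
		log.Fatal(err)
	}
	rotate, err := exceedsSize("serv.log", 256<<20) // e.g. rotate above 256 MiB
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("lines=%d rotateBySize=%v\n", lines, rotate)
}
```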
Wow, that is crazy. Good find
It was a long story, but finally 🎉
I'd say the story ends when there is a patch ;-) I suggest the following changes:
Oh boy, what a find. I agree file size should be used
Or move log rotation totally out of gitea's responsibility
I have submitted PR #5282 to remove logfile max line count functionality as max file size should be more than enough for file rotation.
Yeah I think this option is also not "normal": https://manpages.debian.org/jessie/logrotate/logrotate.8.en.html
@xor-gate absolutely, I never saw a line-based log rotation before. Removing the "feature" should cause no issues. I can barely imagine any use case where line-based rotation is required ;)
I think we can close this issue when #5282 is merged. I'll create a new one to rethink/rework log rotation.
As @lafriks mentions, the size would be enough to keep things within reasonable proportions without blowing up the storage. Thanks all for this teamwork!
I created #5285 to track the rest of the log rotation work that does not affect performance/memory usage.
Description
We use Jenkins to poll for changes in git. It uses OpenSSH for access and Gitea spawns a serv command per poll. There are many concurrent connections (Jenkins is set to 10 concurrent SCM pollers) and the server's memory and CPU get blown up. It doesn't happen very often, but I'm not sure whether it's a deadlock/race condition in Gitea. We are going to upgrade to 1.4.3, but according to the changelog it will probably not fix the problem for us.
If you need some more information, we can hopefully provide it, as it only happens once a day or so.
Screenshots