High CPU usage by droned process #1099

Closed
ghost opened this issue Jul 15, 2015 · 12 comments

@ghost

ghost commented Jul 15, 2015

I am running Drone (version 0.3.0-alpha-1436428588) in a Docker container, integrated with GitLab.

CPU utilization of the 'droned' process generally reaches 99.9% after about 24 hours of uptime. The following is the output of 'top' on the Ubuntu 14.04.2 LTS (Trusty Tahr) host running Drone:

top - 18:09:56 up 1 day, 17:32, 1 user, load average: 1.00, 1.01, 0.97
Tasks: 75 total, 3 running, 72 sleeping, 0 stopped, 0 zombie
%Cpu0 : 99.0 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem: 2050060 total, 813368 used, 1236692 free, 262560 buffers
KiB Swap: 0 total, 0 used, 0 free. 349076 cached Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2724 root 20 0 487928 17616 7676 R 99.9 0.9 55:54.18 droned

There are hardly 2-3 builds running per day, so Drone is idle most of the time, yet it still uses 99% of the CPU.

Please look into this issue. I can provide more details for debugging if needed.

@bscott

bscott commented Jul 16, 2015

Have you tried using the sysdig utility (http://www.sysdig.org/) to gather more debug info about the droned process while it's running?

@ghost

ghost commented Jul 22, 2015

The following is the sysdig process trace I get when CPU utilization is at 99%:

32 01:24:45.558975540 0 droned (11153) < clock_gettime
33 01:24:45.559026435 0 droned (11153) > switch next=5830(sysdig) pgft_maj=12 pgft_min=1772 vm_size=568116 vm_rss=33116 vm_swap=0
34 01:24:45.561364006 0 droned (11150) < select res=0
35 01:24:45.561365538 0 droned (11150) > clock_gettime
36 01:24:45.561365860 0 droned (11150) < clock_gettime
37 01:24:45.561366467 0 droned (11150) > epoll_wait maxevents=128
38 01:24:45.561367111 0 droned (11150) < epoll_wait res=0
39 01:24:45.561367883 0 droned (11150) > select
40 01:24:45.561370586 0 droned (11150) > switch next=11153(droned) pgft_maj=0 pgft_min=26 vm_size=568116 vm_rss=33116 vm_swap=0
41 01:24:45.561479761 0 droned (11153) > clock_gettime
42 01:24:45.561480348 0 droned (11153) < clock_gettime
43 01:24:45.561546667 0 droned (11153) > clock_gettime
44 01:24:45.561546907 0 droned (11153) < clock_gettime
45 01:24:45.561677894 0 droned (11153) > clock_gettime
46 01:24:45.561678224 0 droned (11153) < clock_gettime
47 01:24:45.561743967 0 droned (11153) > clock_gettime
48 01:24:45.561744246 0 droned (11153) < clock_gettime
49 01:24:45.561875380 0 droned (11153) > clock_gettime
50 01:24:45.561875623 0 droned (11153) < clock_gettime
51 01:24:45.561941489 0 droned (11153) > clock_gettime
52 01:24:45.561941700 0 droned (11153) < clock_gettime
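
A trace like this is easier to read as a tally. The following is a rough Go sketch, not part of Drone or sysdig, that counts enter events (`>`) per thread ID and event name. It assumes the line layout shown in the trace above (event number, time, CPU, process name, thread ID in parentheses, direction, event name), and `tally` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// tally counts enter events (direction ">") keyed by "tid eventname".
// Lines that do not match the assumed sysdig layout are skipped.
func tally(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		f := strings.Fields(line)
		// Expected fields: num, time, cpu, proc, (tid), direction, event, args...
		if len(f) < 7 || f[5] != ">" {
			continue
		}
		tid := strings.Trim(f[4], "()")
		counts[tid+" "+f[6]]++
	}
	return counts
}

func main() {
	// Read a saved trace from standard input.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	// Map iteration order is not stable, so the output order varies.
	for key, n := range tally(lines) {
		fmt.Printf("%6d  %s\n", n, key)
	}
}
```

Fed the excerpt above, it would report six clock_gettime entries for thread 11153 against a single select, epoll_wait and clock_gettime for thread 11150. That suggests one thread repeatedly reading the clock without blocking, though the excerpt is too short to say why.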

@chengweiv5

I think I hit this issue too. I'm using the latest 0.3.0-alpha release, and after Drone has been running for a day, page loads take more than 1 minute, even for a user login.

In the Chrome debug console, I see that the get-user request from angular.min.js takes more than 15 seconds and that /api/user returns 401.

However, after I added some debug logging to GetUserCurrent(), no such logs were output; it seems angular.min.js didn't access /api/user at all.

Weird! But if I click login to login, GetUserCurrent() does print some logs.

@chengweiv5

Hmm, it seems there is some magic in the Go web server: even when I run http GET http://host/api/user multiple times, GetUserCurrent only prints logs for the first request. It seems there is some cache in the web server that filters out the subsequent requests directly.

@chengweiv5

OK, it's first handled by RequireUser.

@bradrydzewski

No magic on the server side. The angular app caches the user for the duration of the session (in JavaScript):
https://github.com/drone/drone/blob/master/server/app/scripts/services/auth.js#L5

The appropriate headers are set server-side to prevent browser caching:
https://github.com/drone/drone/blob/master/server/middleware/header.go#L12
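
For readers who have not seen this kind of middleware, here is a minimal sketch of the general technique using only net/http. It is not the code in header.go: the exact header values Drone sets are not shown in this thread, so the values below are just a common choice, and `noCache` and `headersFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// noCache wraps a handler and sets response headers that ask browsers
// and proxies not to cache the response. Header values are assumptions.
func noCache(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "no-cache, no-store, must-revalidate")
		w.Header().Set("Pragma", "no-cache")
		w.Header().Set("Expires", "0")
		next.ServeHTTP(w, r)
	})
}

// headersFor sends an in-process GET for path through the wrapped
// handler and returns the response headers.
func headersFor(path string) http.Header {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	rec := httptest.NewRecorder()
	noCache(ok).ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Header()
}

func main() {
	fmt.Println("Cache-Control:", headersFor("/api/user").Get("Cache-Control"))
}
```

With headers like these in place, a repeated GET to /api/user should reach the server every time, so a handler that logs only once is more likely short-circuited by an earlier handler than by a cache.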

This issue could be the result of a race (in Drone or in the underlying http library) or even a race in the sqlite database code (unlikely). I was able to reproduce once, but not reliably, which has limited the ability to diagnose and fix.

If you are using MySQL, it could also be this known issue with the driver timing out when MySQL kills idle connections:
go-sql-driver/mysql#257

@chengweiv5

I'm using sqlite3, and below is a Chrome debugger screenshot.

[Screenshot: drone-load]

@benschumacher

We have seen this behavior, too. I created a build with the Go profiler enabled but didn't have much luck identifying the problem. I'd been intending to instrument the code to try to identify where it might be occurring, but haven't gotten around to it.

In the meantime, I've been hoping that 0.4 won't exhibit the same behavior, and we're looking to move to it aggressively soon.
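
For anyone who wants to try the profiling route, the sketch below shows one common way to expose the Go profiler in a long-running process via the standard net/http/pprof package. It is only an illustration and not how the build mentioned above was made; `profileIndexStatus` is a hypothetical helper, and the localhost:6060 address in the comments is an arbitrary choice.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

// profileIndexStatus requests the pprof index page from the default mux
// in-process and returns the HTTP status code.
func profileIndexStatus() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest("GET", "/debug/pprof/", nil)
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("pprof index status:", profileIndexStatus())
	// In a daemon, one would instead serve the default mux on a private
	// address, for example:
	//   go http.ListenAndServe("localhost:6060", nil)
	// and, once the CPU spike appears, collect a 30-second CPU profile with:
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
}
```

Because the spike here reportedly shows up only after a day or so, an always-on endpoint like this may be more practical than a one-off profiling run, since the profile can be taken at the moment the process is spinning.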

@bradrydzewski

@benschumacher agreed, the problem may not even exist in 0.4 which is why I haven't pursued it more

@ghost

ghost commented Aug 13, 2015

FYI,

I am now running version 0.3.0-alpha-1437596183 and no longer see high CPU usage. The 'droned' process has been up for the last 13 days without any spike in CPU usage.

@chengweiv5

After running it for a full day, I couldn't reproduce this issue today. I'll wait another day.

bradrydzewski added this to the v0.4.0 milestone Aug 18, 2015
@bradrydzewski

marking as fixed in 0.4
