Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU usage at 100% #1594

Closed
doppioandante opened this issue Sep 26, 2017 · 20 comments
Closed

CPU usage at 100% #1594

doppioandante opened this issue Sep 26, 2017 · 20 comments

Comments

@doppioandante
Copy link
Contributor

doppioandante commented Sep 26, 2017

I've been having kakoune instances running at 100% for a while.
Common factor in all these is that they were processes that I had closed since long (probably by just killing the terminal, but I'm not that sure).
Today I profiled a running instance with perf and this is what I've got (~50MB):
https://mega.nz/#!mQskAQrJ!YgvpHaGdeZgBrNcKC5oI9NCom3jjxfltrcTdMl2nXD4
[Use perf report to see the call graph]
I could still attach to the session with kak -c and it looks like everything is normal, that old buffer was still open (asking for reload due to changes) and I could navigate it and so on.
I'm leaving the process open for a few hours in case someone wants to chime in and tell me what to profile exactly.
Some debug info:

pid: 5959
session: 5959
build: release
Memory usage:
  Undefined: 13112
  String: 221807
  SharedString: 4691
  BufferContent: 1408
  BufferMeta: 1208
  Options: 25376
  Highlight: 212320
  Display: 47488
  Mapping: 0
  Commands: 82312
  Hooks: 5360
  Aliases: 2560
  EnvVars: 20960
  Faces: 10504
  Values: 760
  Registers: 3144
  Client: 160
  WordDB: 0
  Selections: 432
  History: 320
  Remote: 16581
  Events: 192
  Completion: 24
  Total: 670719
  Malloced: 5228128
Shared Strings stats:
  data size: 21, mean: 21,000000
  refcounts: 2147483648, mean: 2147483648,000000
UnorderedMap (1000) -- inserts: 131us, reads: 22us, remove: 41us, find: 11us (356)
     HashMap (1000) -- inserts: 45us, reads: 20us, remove: 28us, find: 4us (374)
UnorderedMap (10000) -- inserts: 698us, reads: 240us, remove: 452us, find: 120us (3728)
     HashMap (10000) -- inserts: 592us, reads: 203us, remove: 294us, find: 51us (3685)
UnorderedMap (100000) -- inserts: 7769us, reads: 2990us, remove: 5753us, find: 1345us (36644)
     HashMap (100000) -- inserts: 6207us, reads: 2804us, remove: 3395us, find: 617us (36838)
UnorderedMap (1000000) -- inserts: 152132us, reads: 124162us, remove: 178223us, find: 29725us (367648)
     HashMap (1000000) -- inserts: 69402us, reads: 75602us, remove: 83459us, find: 13917us (367812)
UnorderedMap (10000000) -- inserts: 2697819us, reads: 1565052us, remove: 2127715us, find: 372260us (3668829)
     HashMap (10000000) -- inserts: 947272us, reads: 1498496us, remove: 1099262us, find: 202232us (3670735)
Buffers:
media.py
Flags: File (/home/enrico/Documents/media.py) 
Used mem: content=2052 additional=376
*debug*
Flags: NoUndo 
Used mem: content=2625 additional=1100
Options:
 * tabstop: 8
 * indentwidth: 4
 * scrolloff: 0,0
 * eolformat: lf
 * BOM: none
 * incsearch: true
 * autoinfo: command|onkey
 * autoshowcompl: true
 * aligntab: false
 * ignored_files: ^(\..*|.*\.(o|so|a))$
 * disabled_hooks: 
 * filetype: 
 * path: ./:/usr/include
 * completers: filename:word=all
 * static_words: 
 * autoreload: ask
 * idle_timeout: 50
 * fs_check_timeout: 500
 * ui_options: 
 * modelinefmt: %val{bufname} %val{cursor_line}:%val{cursor_char_column} {{context_info}} {{mode_info}} - %val{client}@[%val{session}]
 * debug: 
 * readonly: false
 * extra_word_chars: 
 * docsclient: 
 * git_blame_flags: 0:
 * git_diff_flags: 0:
 * editorconfig_trim_trailing_whitespace: false
 * autorestore_purge_restored: true
 * modelines: 5
 * go_complete_tmp_dir: 
 * gocode_completions: :
 * go_format_tmp_dir: 
 * go_doc_tmp_dir: 
 * racer_tmp_dir: 
 * racer_completions: :
 * jedi_tmp_dir: 
 * jedi_completions: :
 * jedi_python_path: 
 * clang_options: 
 * clang_tmp_dir: 
 * clang_completions: :
 * clang_flags: 0:
 * clang_errors: 0:
 * lintcmd: 
 * lint_flags: 0:
 * lint_errors: 0:
 * spell_regions: 0:
 * spell_lang: 
 * spell_tmp_file: 
 * ctagsfiles: tags
 * ctagsopts: -R
 * ctagspaths: .
 * autowrap_column: 80
 * autowrap_format_paragraph: false
 * autowrap_fmtcmd: fold -s -w %c
 * termcmd: gnome-terminal -e      
 * grepcmd: grep -RHn
 * toolsclient: 
 * grep_current_line: 0
 * jumpclient: 
 * makecmd: make
 * make_error_pattern:  (?:fatal )?error:
 * make_current_error_line: 0
 * comment_line: #
 * comment_block: 
 * formatcmd: 
 * manpage: 
 * c_include_guard_style: ifdef
 * alt_dirs: .:..
@doppioandante
Copy link
Contributor Author

I debugged the program a bit and nothing wrong occurs to me, except the fact that the redraw_clients method is always called.
From IRC:

04:11 <mawww> lumo_e: redraw_clients is called in a loop, but only after we handle an event, like some input arrived from a client, or a timer fired
04:13 <mawww> so here my best guess is that we have a bug and end up with a file descriptor in the event manager thats continually ready for read (maybe its a connection that got dropped, but for some reason we dont close the socket locally)

@doppioandante
Copy link
Contributor Author

It looks like my terminal has a bug and doesn't shut processes properly. I attached strace to a running process and closed it in two different terminals.
This is qterminal, which works correctly:

strace: Process 4552 attached
pselect6(4, [0 3], [], [], {0, 211048650}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499910140}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499858890}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499870679}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499842518}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499862736}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499912573}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499903277}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499877391}, {NULL, 8}) = 1 (in [0], left {0, 21155014})
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLIN|POLLERR|POLLHUP}])
read(0, "", 1)                          = 0
--- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[HUP]})              = 0
rt_sigaction(SIGWINCH, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, {0x5635c619194d, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, 8) = 0
rt_sigaction(SIGCONT, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, {0x5635c619194d, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, 8) = 0
rt_sigreturn({mask=[]})                 = 94788985323824
write(1, "\33]2;*scratch* 1:1  1 sel - unnam"..., 53) = -1 EIO (Input/output error)
rt_sigaction(SIGTSTP, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, {0x5635c616141b, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, 8) = 0
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLIN|POLLERR|POLLHUP}])
write(1, "\33[2d\33[34m~  \33[30m\33[43m        \342\225"..., 1655) = -1 EIO (Input/output error)
write(1, "6G\342\224\202\33[20;1H\33[39;49m\33[34m~  \33[30"..., 231) = -1 EIO (Input/output error)
rt_sigaction(SIGTSTP, {0x5635c616141b, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, NULL, 8) = 0
pselect6(4, [0 3], [], [], {0, 19458886}, {NULL, 8}) = 1 (in [0], left {0, 19456869})
geteuid()                               = 1000
open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 4
lseek(4, 0, SEEK_CUR)                   = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=2339, ...}) = 0
mmap(NULL, 2339, PROT_READ, MAP_SHARED, 4, 0) = 0x7f9324129000
lseek(4, 2339, SEEK_SET)                = 2339
munmap(0x7f9324129000, 2339)            = 0
close(4)                                = 0
unlink("/tmp/kakoune/enrico/4552")      = 0
close(3)                                = 0
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, {0x5635c6161289, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, 8) = 0
rt_sigaction(SIGTSTP, {0x7f9323cf8210, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, {0x5635c616141b, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, 8) = 0
write(1, "\33[?1000l", 8)               = -1 EIO (Input/output error)
write(1, "\33[?1004l\33[?1002l", 16)    = -1 EIO (Input/output error)
write(1, "\33[21;1H\33[?12l\33[?25h", 19) = -1 EIO (Input/output error)
write(1, "\33[?1049l\r\33[?1l\33>", 16) = -1 EIO (Input/output error)
ioctl(1, TCGETS, 0x7fff45cc4260)        = -1 EIO (Input/output error)
ioctl(1, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = -1 EIO (Input/output error)
rt_sigaction(SIGWINCH, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, 8) = 0
rt_sigaction(SIGCONT, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, {SIG_DFL, [], SA_RESTORER|SA_RESTART, 0x7f9322e757f0}, 8) = 0
exit_group(0)                           = ?
+++ exited with 0 +++

On the other hand, Tilix does this:

pselect6(4, [0 3], [], [], {0, 47778921}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499898111}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499919831}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499883100}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499902143}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499875717}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499913544}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499809702}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499924971}, {NULL, 8}) = 1 (in [0], left {0, 276186918})
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLIN}])
read(0, "\33", 1)                       = 1
poll([{fd=0, events=POLLIN}], 1, 25)    = 1 ([{fd=0, revents=POLLIN}])
read(0, "[", 1)                         = 1
poll([{fd=0, events=POLLIN}], 1, 25)    = 1 ([{fd=0, revents=POLLIN}])
read(0, "I", 1)                         = 1
poll([{fd=0, events=POLLIN}], 1, 0)     = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 275628933}, {NULL, 8}) = 1 (in [0], left {0, 228287026})
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLIN}])
read(0, "\33", 1)                       = 1
poll([{fd=0, events=POLLIN}], 1, 25)    = 1 ([{fd=0, revents=POLLIN}])
read(0, "[", 1)                         = 1
poll([{fd=0, events=POLLIN}], 1, 25)    = 1 ([{fd=0, revents=POLLIN}])
read(0, "O", 1)                         = 1
poll([{fd=0, events=POLLIN}], 1, 0)     = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 227905903}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499866425}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499882898}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499871977}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499873104}, {NULL, 8}) = 0 (Timeout)
pselect6(4, [0 3], [], [], {0, 499879705}, {NULL, 8}) = 0 (Timeout)

(and it goes on and on).

@doppioandante
Copy link
Contributor Author

Putting a breakpoint on Kakoune::ClientManager::redraw_clients periodically stops kakoune even if I'm not focusing the terminal, AFAICT that timer firing isn't a problem when kakoune is open, but it pegs the CPU when the terminal is closed.

@doppioandante
Copy link
Contributor Author

doppioandante commented Sep 27, 2017

The following line seems to work around the problem Nope it doesn't

diff --git a/src/main.cc b/src/main.cc
index c69c72a3..1b3dfabe 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -676,6 +676,9 @@ int run_server(StringView session, StringView server_init,
                     throw convert_to_client_mode{ std::move(session), std::move(buffer_name), std::move(selections) };
                 }
             }
+
+            if (not isatty(1))
+                terminate = true;
         }
     }
     catch (const kill_session&) {}

@doppioandante
Copy link
Contributor Author

After a reboot I cannot reproduce the high CPU usage anymore,
The perf profile is now split at 50/50 among the redraw_clients method and EventManager::handle_next_events method.
It looks like it depends from some particular system state, closing for now, but I'll reopen in case I get new data.

@doppioandante
Copy link
Contributor Author

doppioandante commented Sep 29, 2017

This time after a logout and relogin, a kak process was still open and doing this at a blazing speed:

pselect6(4, [0 3], [], [], {0, 35083407}, {NULL, 8}) = 1 (in [0], left {0, 35082522})
pselect6(4, [0 3], [], [], {0, 35051878}, {NULL, 8}) = 1 (in [0], left {0, 35050987})
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLIN|POLLERR|POLLHUP}])
read(0, "", 1)                          = 0
pselect6(4, [0 3], [], [], {0, 34987806}, {NULL, 8}) = 1 (in [0], left {0, 34986944})
pselect6(4, [0 3], [], [], {0, 34956435}, {NULL, 8}) = 1 (in [0], left {0, 34955593})
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLIN|POLLERR|POLLHUP}])
read(0, "", 1)                          = 0
pselect6(4, [0 3], [], [], {0, 34891926}, {NULL, 8}) = 1 (in [0], left {0, 34891067})
pselect6(4, [0 3], [], [], {0, 34860403}, {NULL, 8}) = 1 (in [0], left {0, 34859555})
poll([{fd=0, events=POLLIN}], 1, 0)     = 1 ([{fd=0, revents=POLLIN|POLLERR|POLLHUP}])
read(0, "", 1)                          = 0

@lenormf
Copy link
Contributor

lenormf commented Sep 29, 2017

Something is flooding the process' stdin with empty strings.

@mawww
Copy link
Owner

mawww commented Sep 30, 2017

not flooding it with empty strings, read returning zero means we hit the end of file, which for stdin would mean the terminal is gone, but we should have received a SIGHUP if that was the case.

@lenormf
Copy link
Contributor

lenormf commented Sep 30, 2017

Wouldn't a broken pipe occur if the terminal was gone, because the file descriptor wouldn't be reading from anything anymore? I'm surprised no error events are produced by select in this particular case.

@doppioandante
Copy link
Contributor Author

It is in fact because SIGHUP is not received when the terminal is closed (elvish bug) but I remember this high CPU usage with hanging processes happening even with other shells, probably due to other causes. Would it be difficult to make the fd handling more resilient?

@Screwtapello
Copy link
Contributor

I just had the same thing happen with the JSON UI, where Kakoune doesn't have a controlling terminal. The parent process exited, so Kakoune's stdin was closed, so it sat there buzzing at 100% CPU until I killed it.

@doppioandante doppioandante reopened this Apr 26, 2018
@doppioandante
Copy link
Contributor Author

doppioandante commented Apr 26, 2018

I just had the same thing happen with the JSON UI, where Kakoune doesn't have a controlling terminal. The parent process exited, so Kakoune's stdin was closed, so it sat there buzzing at 100% CPU until I killed it.

I remember this happening sometimes using my QML ui. I'm reopening this for the moment, but I have no time to look into it.

@Jackojc
Copy link

Jackojc commented Jan 11, 2019

I seem to be having this issue at times aswell.

@Jens0512
Copy link

Jens0512 commented Mar 5, 2019

Just got kakoune some nice 20 minutes ago, apart for the fact that CPU usage's been stuck at 100% the whole time, I'm very impressed. Typing this, I realized I miss kakoune already!

@mawww
Copy link
Owner

mawww commented Mar 6, 2019

@Jackojc @Jens0512 Which version of Kakoune did that happen on, were you doing anything special ?

@Jens0512
Copy link

Jens0512 commented Mar 6, 2019

% kak -version
Kakoune v2019.01.20

It was the first time I've ever opened kak, apart for that nothing special.

@mawww
Copy link
Owner

mawww commented Mar 6, 2019

What machine is that running on ? Would it be possible to run a profiler on it to see whats taking time ?

@Jens0512
Copy link

Jens0512 commented Mar 6, 2019

Well, unfortunately now it is only running at ~0,0% CPU usage :D

My machine is a MacBookAir4,1 1.0
running Linux 5.0.0-arch1-1-ARCH.

I've no idea how to profile kakoune, but with some instructions to refer to, I'll happily do so if this happens again.

@mawww
Copy link
Owner

mawww commented Mar 6, 2019

On linux, install perf, then run perf record --call-graph lbr -p <kakoune pid> for say 20 seconds (do that when Kakoune is stuck at 100% cpu), then post the output of perf report -G --stdio.

@Jens0512
Copy link

Jens0512 commented Mar 7, 2019

Excellent! I will do so if the situation arises again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants