
High CPU usage (slow thread lock) after reading gzip file #3697

Closed
wangqr opened this issue Dec 2, 2018 · 13 comments

wangqr commented Dec 2, 2018

  • Your Windows build number:

Microsoft Windows [Version 10.0.17763.134] with Arch Linux and Python 3.7.1

Also reproducible on 10.0.18290.1000 with Ubuntu 18.04 and Python 3.6.5, freshly installed from the Microsoft Store.

  • How to reproduce:
  1. Create a dummy gzip file:
dd if=/dev/urandom bs=256M count=1 iflag=fullblock | gzip > dummy.gz
  2. Run the following Python script:
#!/usr/bin/python3

import gzip
from time import sleep
from multiprocessing.pool import ThreadPool
import subprocess

def sudo_loop():
    while True:
        # Issue can be reproduced with any process, so use true instead of sudo
        subprocess.Popen(['true']).communicate()
        sleep(1)


def run_with_sudo_loop(function):
    with ThreadPool(processes=2) as pool:
        main_thread = pool.apply_async(function, ())
        pool.apply_async(sudo_loop)
        pool.close()
        result = main_thread.get()  # cProfile shows that this line is using abnormally high CPU
        pool.terminate()
    return result


def dummy():
    # Read the gzip file
    with gzip.open('dummy.gz') as fp:
        fp.read()
    # Then sleep
    print('begin sleep')
    sleep(100000)

if __name__ == '__main__':
    run_with_sudo_loop(dummy)

This Python script runs 2 threads. One thread, sudo_loop(), spawns a subprocess every second. The other thread, dummy(), loads a gzip file and then sleeps for a long time.
On a Linux machine, after the second thread goes to sleep, the CPU usage is nearly 0.
On WSL, the python3 process constantly shows high CPU usage. This can be seen in htop, or in taskmgr if WSL becomes too laggy or hangs.

This issue was originally opened as actionless/pikaur#317. In the original issue, the first thread runs sudo -v every second. The second thread calls some functions in libalpm, which eventually loads a gzip file with libarchive (a C library for reading archives; the gzip module in Python does NOT depend on libarchive, but both depend on zlib to open gzip files) and then waits for user input.

strace: strace.zip
This strace was generated on Windows 18290 with Ubuntu 18.04. strace cannot be used on 17763 due to #3695

@wangqr wangqr changed the title High CPU usage after reading gzip file High CPU usage (slow thread lock) after reading gzip file Dec 4, 2018
@Brian-Perkins

From the strace this is likely because gettimeofday() on WSL is a real syscall, whereas on regular Linux it is serviced from the vDSO (a read from a special memory location). There are multiple issues in #77, but this is one of them.
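As a rough, standalone illustration (mine, not from the thread), the syscall-vs-vDSO gap can be estimated by timing time.time() in a tight loop; on regular Linux each call is typically tens of nanoseconds, while a trip through a real syscall is far more expensive:

```python
import time

def time_per_call(n=200_000):
    """Average cost of one time.time() call, in nanoseconds."""
    t0 = time.perf_counter()
    for _ in range(n):
        time.time()  # vDSO read on regular Linux; a full syscall on WSL 1
    return (time.perf_counter() - t0) / n * 1e9

print(f'{time_per_call():.0f} ns per gettimeofday-style call')
```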

@therealkenc
Collaborator

therealkenc commented Jan 26, 2019

Again fwiw (two in one day), the unmentionable other thread was/is dbus, which on 18.10 Cosmic hangs for a few minutes on sudo service dbus start with the same high-CPU symptoms.

I have a better-than-even suspicion there is something "new" here, and it won't be explained (in whole) by gettimeofday(), which has been like that since the beginning of time. There would have been noise way sooner than a pikaur complaint in Dec 2018. The OP repro is on Arch only, and they trip new patterns in glibc (and other libs) sooner than Store Ubuntu 18.04. I never tracked the dbus regress myself because I didn't keep Cosmic around long enough to look, and I don't run Arch. But I suspect there is something to this. [ed, specificity] I think some new pattern is spinning on gettimeofday() that didn't use to spin; as opposed to a gettimeofday() divergence proper like #77.

@therealkenc
Collaborator

Here is the dbus-daemon strace log, or I'll end up forgetting for another three months. dbus-daemon doesn't spin like I assumed, but it does sit on a ppoll() for 90 seconds at a stretch, for a total of 3m 13s to start. [#77 is a 2x-3x slowdown on gettimeofday(), which, fine, gets a 'whah'. Three minutes to start dbus is a different matter.]

Blockage looks like the snippet below. I strongly suspect the ppoll() timeout is the same as the select() timeout in the OP strace log.

This is dbus-daemon 1.12.10 with glibc 2.28 on 18323.

44    19:10:02.160669 sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0AUTH EXTERNAL ", iov_len=15}, {iov_base="30", iov_len=2}, {iov_base="\r\nBEGIN\r\n", iov_len=9}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 26
44    19:10:02.161108 gettid()          = 44
44    19:10:02.161269 getrandom("\xeb\xfc\x3f\x2a\xba\x95\xb9\xae\xec\xaf\xb5\x8b\x8f\x49\x12\x6b", 16, GRND_NONBLOCK) = 16
44    19:10:02.161735 clock_gettime(CLOCK_MONOTONIC, {tv_sec=138, tv_nsec=296991600}) = 0
44    19:10:02.161891 recvmsg(5, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
44    19:10:02.162026 clock_gettime(CLOCK_MONOTONIC, {tv_sec=138, tv_nsec=297269600}) = 0
44    19:10:02.162170 ppoll([{fd=5, events=POLLIN}], 1, {tv_sec=89, tv_nsec=998036000}, NULL, 8) = 0 (Timeout)
44    19:11:32.161269 clock_gettime(CLOCK_MONOTONIC, {tv_sec=228, tv_nsec=296544100}) = 0
44    19:11:32.161485 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
44    19:11:32.161662 clock_gettime(CLOCK_MONOTONIC, {tv_sec=228, tv_nsec=296946700}) = 0
44    19:11:32.161858 close(5)          = 0

The OP trace looks like:

206   mprotect(0x7f121c1b4000, 8192, PROT_READ|PROT_WRITE) = 0
206   read(3, "9N\330\f\277]e=\322j\214\356sa\177\337$\247\377\275\322\32\256\305\203\336`\323\0205\216\177"..., 7680) = 7680
206   read(3, "_\341\336\370\30:K\243\233n\300L\24\260V}\337?{\261\7\2019\300\357\22i8[t\25\35"..., 512) = 512
207   <... select resumed> )            = 0 (Timeout)
206   mprotect(0x7f121c1b6000, 8192, PROT_READ|PROT_WRITE <unfinished ...>
207   gettimeofday( <unfinished ...>
206   <... mprotect resumed> )          = 0

dbus-daemon.strace.gz

@Brian-Perkins

Thanks @therealkenc -- I took a closer look at this and I believe the difference is due to the fact that we are delivering the signal to a different thread than Linux does. Our signal policy is essentially first available thread in the thread group, but running a test on Linux it appears to favor the parent thread for SIGCHLD.

205   clone( <unfinished ...>
205   <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f122c0309d0) = 211
...
204   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=211, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
205   <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 211

@Brian-Perkins Brian-Perkins removed the bug label Feb 1, 2019
@Brian-Perkins

I am going to call the signal-targeting issue a red herring (interesting, but not causing this issue). It looks like the problem here is simply that clone() is really slow. Now, clone() on WSL is certainly going to be slower than on Linux, because this is one of the most challenging emulation points. But slow should be on the order of one second, whereas in the strace it is 8-9 seconds. Maybe this is a one-processor VM, where the repeated futex attempts by the other thread are exacerbating the slowness. At any rate, the strace from this perspective looks like:

205   clone( <unfinished ...>
207   <... select resumed> )            = 0 (Timeout)
207   gettimeofday({tv_sec=1543787712, tv_usec=373567}, NULL) = 0
207   futex(0xa8b1e8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1543787712, tv_nsec=378567000}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
207   gettimeofday({tv_sec=1543787712, tv_usec=388973}, NULL) = 0
207   futex(0xa8b180, FUTEX_WAKE_PRIVATE, 1) = 0
207   futex(0xa8b1e8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1543787712, tv_nsec=393973000}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)

~8.5 seconds pass spinning on the futex with 5 ms sleeps until the clone completes and the futex is signaled.

207   gettimeofday({tv_sec=1543787720, tv_usec=847956}, NULL) = 0
207   futex(0xa8b180, FUTEX_WAKE_PRIVATE, 1) = 0
207   futex(0xa8b1e8, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1543787720, tv_nsec=852956000}, 0xffffffff <unfinished ...>
205   <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f122c0309d0) = 247
// clone finishes and finally signals the futex
205   futex(0xa8b1e8, FUTEX_WAKE_PRIVATE, 1) = 1
207   <... futex resumed> )             = 0
205   futex(0xa8b168, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
207   futex(0xa8b168, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
205   <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
207   <... futex resumed> )             = 0
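As an aside (my observation, not stated in the thread), the 5 ms cadence of those timed futex waits likely corresponds to CPython's GIL machinery: a thread blocked on the GIL times out and re-requests it every sys.getswitchinterval() seconds, which is 5 ms by default:

```python
import sys

# CPython's default thread switch interval; a thread waiting for the GIL
# times out and retries at this cadence, which can appear under strace as
# repeated ~5 ms futex timeouts while another thread holds the GIL.
print(sys.getswitchinterval())  # 0.005 by default
```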

@therealkenc
Collaborator

Alright -- probably disregard the dbus segue for now then. There are plenty of clone()s in there, but nothing that takes on the order of seconds (or even near that long). Someone will probably report the dbus blockage eventually, but whatever the problem is, it isn't looking like this issue and will need a different cover.

@wangqr
Author

wangqr commented Feb 1, 2019

test.strace.zip

Here is the strace with timestamps. The clone() takes ~6 ms on a real Linux machine.
On WSL, it is fast at first, gradually becomes slower, and finally takes ~10 s.
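A minimal standalone sketch (my own, not from the thread) for charting this growth: time each spawn+wait cycle in a loop; on an affected WSL 1 build the per-spawn latency should climb over time, while on regular Linux it stays roughly constant.

```python
import subprocess
import time

def spawn_latencies(n=10, cmd=('true',)):
    """Time n sequential spawn+wait cycles; returns a list of seconds."""
    latencies = []
    for _ in range(n):
        t0 = time.monotonic()
        subprocess.Popen(list(cmd)).communicate()
        latencies.append(time.monotonic() - t0)
    return latencies

if __name__ == '__main__':
    for i, dt in enumerate(spawn_latencies()):
        print(f'spawn {i}: {dt * 1000:.1f} ms')
```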

@Brian-Perkins

@wangqr How many processors does the repro environment have? If more than one, can you try disabling any anti-virus software and rerunning the test?

@wangqr
Author

wangqr commented Feb 1, 2019

The WSL repro environment is a freshly installed Windows Insider build, with Ubuntu from the Store, on an i5-4258U (2C4T), with Windows Defender turned on. The Linux repro environment is Arch Linux on an N3050.

I can rerun the test tomorrow with Defender turned off.

@wangqr
Author

wangqr commented Feb 2, 2019

test.strace.wsl_18329_defender_off.zip
Defender was disabled using Group Policy, according to this link.
After turning off Defender, clone() still takes ~8 s.

$ grep clone test.strace
19    14:59:27.738875 clone(child_stack=0x7f0d53e6ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f0d53e709d0, tls=0x7f0d53e70700, child_tidptr=0x7f0d53e709d0) = 20 <0.000144>
19    14:59:27.741747 clone(child_stack=0x7f0d5365ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f0d536609d0, tls=0x7f0d53660700, child_tidptr=0x7f0d536609d0) = 21 <0.000160>
19    14:59:27.744718 clone(child_stack=0x7f0d52e4ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f0d52e509d0, tls=0x7f0d52e50700, child_tidptr=0x7f0d52e509d0) = 22 <0.000168>
19    14:59:27.747439 clone(child_stack=0x7f0d5263ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f0d526409d0, tls=0x7f0d52640700, child_tidptr=0x7f0d526409d0) = 23 <0.000131>
19    14:59:27.750231 clone(child_stack=0x7f0d51e2ffb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f0d51e309d0, tls=0x7f0d51e30700, child_tidptr=0x7f0d51e309d0) = 24 <0.000162>
20    14:59:27.754465 clone( 
20    14:59:27.770005 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 25 <0.015512>
20    14:59:28.805802 clone( 
20    14:59:28.902467 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 26 <0.096614>
20    14:59:29.931249 clone( 
20    14:59:30.534634 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 27 <0.596846>
20    14:59:31.686543 clone( 
20    14:59:33.058788 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 28 <1.372192>
20    14:59:34.292344 clone( 
20    14:59:36.345903 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 42 <2.053426>
20    14:59:37.376535 clone( 
20    14:59:39.871484 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 43 <2.494900>
20    14:59:40.919529 clone( 
20    14:59:43.077866 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 44 <2.158284>
20    14:59:44.113102 clone( 
20    14:59:47.442746 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 45 <3.329588>
20    14:59:48.471756 clone( 
20    14:59:50.928630 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 47 <2.455550>
20    14:59:52.026043 clone( 
20    14:59:55.611221 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 48 <3.585129>
20    14:59:56.635860 clone( 
20    14:59:59.141481 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 49 <2.505576>
20    15:00:00.174982 clone( 
20    15:00:02.782544 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 50 <2.607512>
20    15:00:03.804078 clone( 
20    15:00:06.554074 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 51 <2.749950>
20    15:00:07.575590 clone( 
20    15:00:11.011709 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 52 <3.436073>
20    15:00:12.030540 clone( 
20    15:00:15.546079 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 53 <3.515489>
20    15:00:16.567787 clone( 
20    15:00:20.178102 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 54 <3.610270>
20    15:00:21.206610 clone( 
20    15:00:25.338300 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 55 <4.131645>
20    15:00:26.359392 clone( 
20    15:00:30.806924 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 56 <4.447493>
20    15:00:31.825784 clone( 
20    15:00:36.740248 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 57 <4.914418>
20    15:00:37.762260 clone( 
20    15:00:42.770947 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 58 <5.008641>
20    15:00:43.791102 clone( 
20    15:00:51.052857 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 59 <7.261682>
20    15:00:52.077099 clone( 
20    15:00:58.730689 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 60 <6.653541>
20    15:00:59.775518 clone( 
20    15:01:07.764343 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 61 <7.988764>
20    15:01:08.787023 clone( 
20    15:01:16.003229 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 62 <7.216154>
20    15:01:17.022932 clone( 
20    15:01:23.637983 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 63 <6.615008>
20    15:01:24.659841 clone( 
20    15:01:31.378765 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 64 <6.718875>
20    15:01:32.401456 clone( 
20    15:01:39.580912 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 65 <7.179411>
20    15:01:40.613041 clone( 
20    15:01:48.662118 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 66 <8.049015>
20    15:01:49.686539 clone( 
20    15:01:57.513741 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 67 <7.827153>
20    15:01:58.537691 clone( 
20    15:02:06.155386 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 68 <7.617653>
20    15:02:07.175481 clone( 
20    15:02:14.994761 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 69 <7.819224>
20    15:02:16.017592 clone( 
20    15:02:24.114733 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 70 <8.097083>
20    15:02:25.143484 clone( 
20    15:02:33.595337 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 71 <8.451802>
20    15:02:34.619535 clone( 
20    15:02:44.322015 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 72 <9.702431>
20    15:02:45.708881 clone( 
20    15:02:53.268223 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 73 <7.559287>
20    15:02:54.286629 clone( 
20    15:03:00.846601 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 74 <6.559912>
20    15:03:01.862316 clone( 
20    15:03:07.860921 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 75 <5.998529>
20    15:03:08.877983 clone( 
20    15:03:14.696141 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 76 <5.818031>
20    15:03:15.713806 clone( 
20    15:03:23.020814 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 78 <7.306902>
20    15:03:24.035446 clone( 
20    15:03:29.846315 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 79 <5.810815>
20    15:03:30.862845 clone( 
20    15:03:36.720649 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 80 <5.857723>
20    15:03:37.736573 clone( 
20    15:03:43.576207 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 81 <5.839561>
20    15:03:44.592914 clone( 
20    15:03:51.543132 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 82 <6.950138>
20    15:03:52.558935 clone( 
20    15:03:58.429637 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 83 <5.870618>
20    15:03:59.445356 clone( 
20    15:04:05.515981 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 84 <6.070550>
20    15:04:06.536120 clone( 
20    15:04:12.393031 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 85 <5.856856>
20    15:04:13.410141 clone( 
20    15:04:19.196490 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 86 <5.786255>
20    15:04:20.215090 clone( 
20    15:04:26.495931 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 87 <6.280624>
20    15:04:27.537266 clone( 
20    15:04:38.405305 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 88 <10.867984>
20    15:04:39.420375 clone( 
20    15:04:48.752688 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 89 <9.332252>
20    15:04:49.768646 clone( 
20    15:04:56.937563 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 90 <7.168848>
20    15:04:57.954537 clone( 
20    15:05:04.249592 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 91 <6.294992>
20    15:05:05.266078 clone( 
20    15:05:12.736971 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 92 <7.470819>
20    15:05:13.757793 clone( 
20    15:05:21.107248 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 93 <7.349314>
20    15:05:22.124361 clone( 
20    15:05:29.295666 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 94 <7.171231>
20    15:05:30.310607 clone( 
20    15:05:37.046414 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 95 <6.735665>
20    15:05:38.068415 clone( 
20    15:05:46.211427 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 96 <8.142917>
20    15:05:47.226771 clone( 
20    15:05:53.226165 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 97 <5.999313>
20    15:05:54.242766 clone( 
20    15:06:00.285956 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 98 <6.043128>
20    15:06:01.306604 clone( 
20    15:06:07.240449 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 99 <5.933772>
20    15:06:08.256116 clone( 
20    15:06:14.568528 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 100 <6.312317>
20    15:06:15.585045 clone( 
20    15:06:23.787992 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 101 <8.202889>
20    15:06:24.815066 clone( 
20    15:06:34.568494 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 102 <9.753364>
20    15:06:35.587702 clone( 
20    15:06:43.426815 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 103 <7.838885>
20    15:06:44.444212 clone( 
20    15:06:52.036196 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 104 <7.591889>
20    15:06:53.058508 clone( 
20    15:07:00.057223 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 105 <6.998634>
20    15:07:01.073059 clone( 
20    15:07:07.954777 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 106 <6.881664>
20    15:07:08.973558 clone( 
20    15:07:14.888812 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 107 <5.915126>
20    15:07:15.909595 clone( 
20    15:07:23.126763 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 108 <7.217079>
20    15:07:24.142570 clone( 
20    15:07:32.927551 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 123 <8.784933>
20    15:07:33.946257 clone( 
20    15:07:39.911348 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 126 <5.964975>
20    15:07:40.928480 clone( 
20    15:07:46.991722 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 130 <6.063143>
20    15:07:48.007579 clone( 
20    15:07:55.027148 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 134 <7.019519>
20    15:07:56.044416 clone( 
20    15:08:01.988022 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0d53e709d0) = 135 <5.943499>
20    15:08:03.006698 clone( 
20    15:08:09.377830 <... clone resumed> ) = ?

@wangqr
Author

wangqr commented Jun 15, 2019

WSL 2 on build 18917 does not have this issue.

@raggi

raggi commented Dec 16, 2019

I just diagnosed a similar pathology with GN building Fuchsia under WSL 1, and found that swapping in tcmalloc mitigates the issue. It seems that the way glibc's allocator marks pages is pathological for the clone emulation, whereas tcmalloc's doesn't have the problem.
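For reference, one typical way to try this mitigation without rebuilding anything is to preload tcmalloc via LD_PRELOAD; the package name, library path, and script name below are assumptions that vary by distro and setup:

```shell
# Package/library names are assumptions for Debian/Ubuntu; adjust for your distro.
sudo apt install libtcmalloc-minimal4
# Preload tcmalloc in place of glibc's malloc for one run of the repro script.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 python3 test.py
```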


This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!
