clone() syscall infinitely restarts because of SIGPROF signals #97

Closed
advancedxy opened this Issue Mar 20, 2018 · 6 comments

advancedxy commented Mar 20, 2018

When profiling a Spark application that uses a large amount of memory and launches subprocesses (from either the JVM or a native library), the whole process hung forever in fork.

After some debugging, I believe it's similar to https://bugzilla.redhat.com/show_bug.cgi?id=645528 .
The workaround is simple: increase the sampling interval to 20 ms.

You can add this to the README, or I can send a PR for it.
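
For reference, the 20 ms workaround could be written out as below. This is only a sketch: it assumes the profiler's `interval` argument takes a value in nanoseconds, and `interval_arg` is a hypothetical helper, not part of async-profiler.

```python
NANOS_PER_MILLI = 1_000_000


def interval_arg(interval_ms: int) -> str:
    """Format a sampling interval given in milliseconds as `interval=<ns>`.

    Assumption: the profiler expects the interval in nanoseconds.
    """
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return f"interval={interval_ms * NANOS_PER_MILLI}"


# The 20 ms interval suggested as a workaround in this issue.
print(interval_arg(20))
```

The resulting string would be appended to whatever agent options are already in use; the exact option syntax should be checked against the README of the profiler version you run.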

apangin commented Mar 20, 2018

Interesting. Can you reproduce the issue on a reduced test case? (to make sure this is exactly the problem you referred to).
A paragraph in the troubleshooting section will be helpful then.
Thanks.

advancedxy commented Mar 20, 2018

Let's set up a minimal reproduction case first, then. I will post back when I have one.

advancedxy commented Mar 20, 2018

Sorry, I tried to set up a minimal reproduction case in the JVM only, but the scenario I described could not be reproduced that way.

I did reproduce the problem by simplifying my real workload, and it does indeed hang at clone. I cannot post my workload here, but I can share the important part of the debugging process:

After I found the thread that was hanging forever, I ran strace -p $lwpid, which produced the following output:

--- SIGPROF (Profiling timer expired) @ 0 (0) ---
read(52, "w\322\3\0\0\0\0\0", 8)        = 8
gettid()                                = 32558
ioctl(52, 0x2403, 0)                    = 0
ioctl(52, 0x2402, 0x1)                  = 0
rt_sigreturn(0x34)                      = 56
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7feae49b19bc) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF (Profiling timer expired) @ 0 (0) ---
read(52, "\345\323\3\0\0\0\0\0", 8)     = 8
gettid()                                = 32558
ioctl(52, 0x2403, 0)                    = 0
ioctl(52, 0x2402, 0x1)                  = 0
rt_sigreturn(0x34)                      = 56
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7feae49b19bc) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF (Profiling timer expired) @ 0 (0) ---
read(52, "l\321\3\0\0\0\0\0", 8)        = 8
gettid()                                = 32558
ioctl(52, 0x2403, 0)                    = 0
ioctl(52, 0x2402, 0x1)                  = 0
rt_sigreturn(0x34)                      = 56
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7feae49b19bc) = ? ERESTARTNOINTR (To be restarted)

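It looks like clone() is repeatedly interrupted by SIGPROF: each attempt returns ERESTARTNOINTR and is restarted, only to be interrupted again by the next signal.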
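
The restart loop in the trace above can be illustrated with a toy model. This is a simplified sketch, not a measurement: it assumes clone() needs a fixed amount of uninterrupted time and that SIGPROF arrives at a fixed period, and the 15 ms fork cost and the `simulate_clone` helper are hypothetical.

```python
def simulate_clone(fork_cost_ms, interval_ms, first_signal_in_ms, max_restarts=1000):
    """Return how many times clone() is restarted before it completes.

    Returns None if it is still being restarted after max_restarts attempts.
    Model assumptions: every attempt needs fork_cost_ms of uninterrupted
    time, and an interrupted attempt starts over from scratch.
    """
    restarts = 0
    window = first_signal_in_ms  # time until the first SIGPROF arrives
    while restarts <= max_restarts:
        if fork_cost_ms <= window:
            return restarts  # clone() finished before the next signal
        restarts += 1  # interrupted: ERESTARTNOINTR, the syscall is retried
        window = interval_ms  # later attempts start right after a signal
    return None


# Hypothetical 15 ms fork cost: a 10 ms interval never lets it finish,
# while a 20 ms interval lets it finish after a single restart.
print(simulate_clone(15, 10, first_signal_in_ms=3))
print(simulate_clone(15, 20, first_signal_in_ms=3))
```

In this model the outcome is all-or-nothing: once the cost of a single clone() attempt exceeds the sampling interval, no number of retries helps, which is consistent with a larger interval working around the hang.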

@apangin apangin changed the title from Provide a gotcha/troubleshooting experience to clone() syscall infinitely restarts because of SIGPROF signals Mar 20, 2018

apangin commented Mar 20, 2018

Thank you for a great analysis! Feel free to add a README paragraph or leave it to me if you prefer.

advancedxy commented Mar 21, 2018

I will send a PR for the README then.

apangin commented Mar 27, 2018

Let me close this one. Thanks again for pointing out this issue.

@apangin apangin closed this Mar 27, 2018

ktoso added a commit to ktoso/sbt-jmh that referenced this issue May 9, 2018

Add optional sampling interval parameter for Async profiler + switch to latest sbt version (#148)

* Add optional sampling interval parameter for Async profiler to avoid issues like: jvm-profiling-tools/async-profiler#97

* Switch to latest sbt version