New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A child process may hang when a parent process calls fork(). #178

Closed
alk opened this Issue Aug 22, 2015 · 5 comments

Comments

Projects
None yet
1 participant
@alk
Contributor

alk commented Aug 22, 2015

Originally reported on Google Code with ID 175

What steps will reproduce the problem?
1.A parent process is multi-threaded.
2. One of parent thread calls java_lang_UNIXProcess_forkAndExec, while
other thread is calling malloc().
3. The child process hangs since the lock is copied over.

What is the expected output? What do you see instead?
The child process hangs.

What version of the product are you using? On what operating system?
1.2 on Linux RHEL4

Please provide any additional information below.
This happens because java_lang_UNIXProcess_forkAndExec calls opendir()
between fork() and exec(). The lock should be acquired by the calling 
thread before for(), and be released after fork() by both parent and child.

Reported by hankyupark on 2009-10-05 18:04:29

@alk

This comment has been minimized.

Contributor

alk commented Aug 22, 2015

} 3. The child process hangs since the lock is copied over.

Can you clarify what lock you're talking about?  I take it it's the lock called by
malloc()?

I don't know much about the semantics of lock inheritence around forking.  My
understanding of how to make fork safe to use is that between fork and exec you can't
a) call any non-reentrant library functions, b) call a function that may acquire a
lock.  That's pretty restrictive.  I'm not surprised doing something like opendir()
there may cause problems.  In general, I do know that forking is super-hard to get
right, so it's not surprising to me that there should be problems like this.

I'm not sure I understand your proposed solution though. It's the client app that's
calling fork().  How is it supposed to acquire the proper locks, and then release
them?  Are you suggesting we provide an API to do that?  It seems pretty complicated,
if so.  Or maybe you had another proposal?

Reported by csilvers on 2009-10-13 21:20:15

  • Labels added: Type-Defect, Priority-Medium
@alk

This comment has been minimized.

Contributor

alk commented Aug 22, 2015

When tcmalloc tries to get a memory chunk from system memory, it acquires a (global)
lock so that it can exclusively access system memory. 
Let's assume there are thread A and B in the parent process, and thread A calls
java_lang_UNIXProcess_forkAndExec while thread B was holding this lock. Inside of
java_lang_UNIXProcess_forkAndExec, the child process will get a clone of the parent
thread (A). Since thread B was holding the lock, this state is also inherited to the
child process. When opendir() is called inside of
java_lang_UNIXProcess_forkAndExec(), the child process gets stuck since there is
nobody that will release the lock in the child process. This is not a problem for the
parent process since thread B will release it eventually for the parent process.

In the pthread, there is an api called pthread_atfork which provides handles
before/after fork(). My solution was to call this function with proper handlers when
tcmalloc is initialized. It seems to be working for linux system but I am not sure
if
it can be generalized for other systems.

Reported by hankyupark on 2009-10-14 06:16:30

@alk

This comment has been minimized.

Contributor

alk commented Aug 22, 2015

Ah, I see.  I'm worried, though, that this lock isn't the only one we need to keep
track of.  tcmalloc has a lot of locks, including a pageheap lock, a new-handler
lock, heap-checker locks, and at least one profiler lock.  The debug allocator adds
a
few more, and more may be added in the future.  It seems like each one of these could
cause a problem in the situation you describe: thread A holds the lock while thread
B
forks.  I guess we'll need to keep track of all of them?

I haven't thought a lot about the fork-safety of tcmalloc.  I wonder if there is
anything other than spinlocks that we need to worry about.

It sounds like you have a patch that solves the problem for you?  Do you mind
attaching it to this bug report?

Reported by csilvers on 2009-10-14 14:41:56

@alk

This comment has been minimized.

Contributor

alk commented Aug 22, 2015

I talked to one of the experts here on forking, and he pointed out that it's actually
correct for both parent and child to have the fork: the system memory is a shared
resource, and they can't both be accessing it at the same time.  The solution you use
with pthread_atfork doesn't seem safe.

He had this to say in general:
---
Traditionally, POSIX only ever allowed exec() to be called after fork().
Calling any other function after fork() but before exec() was undefined.

Over time, users discovered that on most systems there were a small number
of functions that they could still call in between fork() and exec() despite
POSIX warning them not to do so. I don't think this list of functions has
ever been formally specified, but it is generally believed that any
async-signal-safe function can be called at this time. And I expect most
runtime environments (including glibc) to try hard to actually guarantee
this behavior.

Please note that opendir() and malloc() are most definitely not
async-signal-safe and still cannot be called safely at this time.

For single-threaded applications, you can sometimes get away with calling
more complex functions after fork(), but this has always been error prone
and would often lead to subtle bugs.

A long time later, POSIX grew a threading API. And it is very clearly an
afterthought. All sorts of things don't quite work correctly once more than
one thread exists in the program. Most notably, calling fork() in a
multi-threaded application is poorly defined and surprisingly difficult to
get right.

Glibc tries hard to make this work. And POSIX made an attempt to fix it by
defining pthread_at_fork() handlers. But all of these are bandaid solutions
that don't work well practice. Apart from the problem with locks, there are
a whole slew of other subtle issues with shared resources that a very
difficult to fix and cause all sorts of very subtle bugs. A particularly
nasty problem are all the resources that are shared between threads and
inherited by child processes (e.g. guaranteeing that file descriptors get
passed correctly in all cases is quite tricky). The upshot is:

 - don't fork() after creating threads. It doesn't work well. If you know
that your application needs to fork(), then create a helper process as the
very first thing in main(). This helper process should stay single-threaded
and can thus fork() more easily.

 - if you cannot use a helper process, then stick to letter of the POSIX
specification. Do not make any function calls other than exec() after
calling fork(). In a multi-threaded application, even the async-signal-safe
functions cannot reliably be called after fork()'ing.

 - alternatively, have a look at posix_spawn(). It's API is somewhat
restricted, but it is often a thread-safe alternative to fork() and exec().
I have no idea how good the implementation in glibc is, though.
---

I don't know if you can do any of these for this java_lang_* call, but I think your
code will be safer in general if you can rewrite it so it doesn't do any work between
the fork and exec.

I'm going to close this WillNotFix.  I think it would be dangerous to do any
unlocking across forks.

Reported by csilvers on 2009-10-14 18:44:23

  • Status changed: WontFix
@alk

This comment has been minimized.

Contributor

alk commented Aug 22, 2015

What I was dealing with was a (global) lock from tcmalloc, not a system-provided lock
to get a memory from system. I think that the latter is already handled by OS.

The problem is that I don't have a control over java_lang_* methods which is a part
of JDK. And this method is called indirectly from my application via an underneath
library. So I don't have any control on timing, either.



I am attaching my patch, anyway.

diff -crBN ../google-perftools-1.2/Makefile.am google-perftools-1.2/Makefile.am
*** ../google-perftools-1.2/Makefile.am Fri Apr 17 19:37:09 2009
--- google-perftools-1.2/Makefile.am    Fri Oct  2 04:45:18 2009
***************
*** 309,314 ****
--- 309,315 ----
                                src/page_heap_allocator.h \
                                src/span.h \
                                src/static_vars.h \
+                               src/fork_handler.h \
                                src/thread_cache.h \
                                src/stack_trace_table.h \
                                src/base/thread_annotations.h \
***************
*** 338,343 ****
--- 339,345 ----
                                            src/span.cc \
                                            src/stack_trace_table.cc \
                                            src/static_vars.cc \
+                                           src/fork_handler.cc \
                                            src/thread_cache.cc \
                                            src/malloc_hook.cc \
                                            src/malloc_extension.cc \
diff -crBN ../google-perftools-1.2/src/fork_handler.cc
google-perftools-1.2/src/fork_handler.cc
*** ../google-perftools-1.2/src/fork_handler.cc Thu Jan  1 00:00:00 1970
--- google-perftools-1.2/src/fork_handler.cc    Fri Oct  2 18:51:04 2009
***************
*** 0 ****
--- 1,24 ----
+ #include "fork_handler.h"
+ #include <pthread.h>
+ 
+ namespace tcmalloc {
+ 
+ void prepare()
+ {
+   Static::pageheap_lock()->Lock();
+ }
+ void fork_parent()
+ {
+   Static::pageheap_lock()->Unlock();
+ }
+ void fork_child()
+ {
+   Static::pageheap_lock()->Unlock();
+ }
+ 
+ void Init_forkHandler()
+ {
+   pthread_atfork(prepare, fork_parent, fork_child);
+ }
+ 
+ }
diff -crBN ../google-perftools-1.2/src/fork_handler.h
google-perftools-1.2/src/fork_handler.h
*** ../google-perftools-1.2/src/fork_handler.h  Thu Jan  1 00:00:00 1970
--- google-perftools-1.2/src/fork_handler.h Fri Oct  2 04:45:12 2009
***************
*** 0 ****
--- 1,12 ----
+ #ifndef TCMALLOC_FORK_HANDLERS_H_
+ #define TCMALLOC_FORK_HANDLERS_H_
+ 
+ #include "static_vars.h"
+ 
+ namespace tcmalloc {
+ 
+ void Init_forkHandler();
+ 
+ }
+ 
+ #endif
diff -crBN ../google-perftools-1.2/src/static_vars.cc
google-perftools-1.2/src/static_vars.cc
*** ../google-perftools-1.2/src/static_vars.cc  Thu Feb 26 19:16:18 2009
--- google-perftools-1.2/src/static_vars.cc Fri Oct  2 04:46:11 2009
***************
*** 32,37 ****
--- 32,38 ----

  #include "static_vars.h"
  #include "sampler.h"  // for the init function
+ #include "fork_handler.h"

  namespace tcmalloc {

***************
*** 60,65 ****
--- 61,67 ----
    new ((void*)pageheap_memory_) PageHeap;
    DLL_Init(&sampled_objects_);
    Sampler::InitStatics();
+   Init_forkHandler();
  }

  }  // namespace tcmalloc

Reported by hankyupark on 2009-10-14 19:29:47

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment