
[C++] Intermittent Segmentation fault Mutex MacOS #2943

Closed
gatkinso opened this issue Apr 6, 2017 · 14 comments

@gatkinso

gatkinso commented Apr 6, 2017

Version 3.2.0
C++
MacOS Sierra

Intermittent segmentation fault when statically linking against libprotobufd. The same code works fine on Linux and Windows.

Debugger output:

hostname:testfolder gat$ lldb ./foo
(lldb) target create "foo"
Current executable set to 'foo' (x86_64).
(lldb) run
Process 18781 launched: '/Users/gat/work/testfolder/foo' (x86_64)
Process 18781 stopped

  * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000102560d33 libbar.dylib`google::protobuf::internal::Mutex::Lock(this=0x0000000000000000) at common.cc:375
    372 }
    373
    374 void Mutex::Lock() {
    -> 375 int result = pthread_mutex_lock(&mInternal->mutex);
    376 if (result != 0) {
    377 GOOGLE_LOG(FATAL) << "pthread_mutex_lock: " << strerror(result);
    378 }
    (lldb)
@gatkinso
Author

gatkinso commented Apr 6, 2017

PS: Simply compiling the generated C++ files into the shared library libbar.dylib (without instantiating anything) and dlopen()ing the library is enough to cause the segfault. The same happens when a message is actually instantiated.
I am not calling ShutdownProtobufLibrary().

@gatkinso gatkinso changed the title Intermittent Segmentation fault Mutex MacOS [C++] Intermittent Segmentation fault Mutex MacOS Apr 6, 2017
@xfxyjwf
Contributor

xfxyjwf commented Apr 6, 2017

Does "make check" work for you when you install protobuf? Can you provide the .proto file in question and an example .cc program that can reproduce the issue?

@gatkinso
Author

gatkinso commented Apr 9, 2017

OK, this is part of a commercial application and I have stripped out a lot of code, so it may seem a bit random. Creating a dylib from these files, however, causes the issue every time that dylib is dlopen()ed on macOS. The Linux shared objects and Windows DLLs have no problem at all.

telemetry - Copy.zip

@xfxyjwf xfxyjwf added the c++ label Apr 10, 2017
@xfxyjwf xfxyjwf self-assigned this Apr 10, 2017
@JamesRootNineAgile

JamesRootNineAgile commented May 22, 2017

I can reproduce this with Xcode 8.3 and later, but not with Xcode 8.2 or earlier.

For us the failure is not intermittent: it happens every time. We are using protobuf 2.6.0.

make check passes for protobuf 2.7.0 and 2.6.0 for me; that said, it doesn't cover the case where libprotobuf.a is statically linked into another dynamic library.

@pretty-wise

I see the same problem on macOS 10.12.5 with Xcode 8.3.3 (both AppleClang 8.1.0 and Apple LLVM version 8.1.0 (clang-802.0.42)).

Both the program and a .so link libprotobufd statically. The program executes InitShutdownFunctions() at startup. When the .so is loaded, it doesn't execute InitShutdownFunctions(), which causes a crash when OnShutdown() is called from InitGeneratedPool() (see the stack trace below).

Exception Caught at 0x0. Signal 11.
Stack Trace:
0 link_server 0x00000001087efd6d _ZN4Base13fault_handlerEiP9__siginfoPv + 93
1 libsystem_platform.dylib 0x00007fffbb0b6b3a _sigtramp + 26
2 ??? 0x00007f8f97501568 0x0 + 140254695658856
3 libgated.dylib 0x0000000108ba9783 _ZN6google8protobuf8internal9MutexLockC2EPNS1_5MutexE + 35
4 libgated.dylib 0x0000000108ba94ed _ZN6google8protobuf8internal9MutexLockC1EPNS1_5MutexE + 29
5 libgated.dylib 0x0000000108d0130a _ZN6google8protobuf8internal10OnShutdownEPFvvE + 42
6 libgated.dylib 0x0000000108ba6933 _ZN6google8protobuf12_GLOBAL__N_1L17InitGeneratedPoolEv + 147
7 link_server 0x000000010878ea7f _ZN6google8protobuf8internal16FunctionClosure03RunEv + 31
8 link_server 0x0000000108794d2d _ZN6google8protobuf18GoogleOnceInitImplEPlPNS0_7ClosureE + 93
9 link_server 0x000000010876d8a6 _ZN6google8protobuf14GoogleOnceInitEPlPFvvE + 70
10 libgated.dylib 0x0000000108b4ec47 _ZN6google8protobuf12_GLOBAL__N_121InitGeneratedPoolOnceEv + 23
11 libgated.dylib 0x0000000108b4ec84 _ZN6google8protobuf14DescriptorPool24InternalAddGeneratedFileEPKvi + 20
12 libgated.dylib 0x0000000108b199ea _ZN4gate21protobuf_gate_2eproto18AddDescriptorsImplEv + 26
13 link_server 0x000000010878ea7f _ZN6google8protobuf8internal16FunctionClosure03RunEv + 31
14 link_server 0x0000000108794d2d _ZN6google8protobuf18GoogleOnceInitImplEPlPNS0_7ClosureE + 93
15 link_server 0x000000010876d8a6 _ZN6google8protobuf14GoogleOnceInitEPlPFvvE + 70
16 libgated.dylib 0x0000000108b19a57 _ZN4gate21protobuf_gate_2eproto14AddDescriptorsEv + 23
17 libgated.dylib 0x0000000108b27091 _ZN4gate21protobuf_gate_2eproto27StaticDescriptorInitializerC2Ev + 17
18 libgated.dylib 0x0000000108b19a75 _ZN4gate21protobuf_gate_2eproto27StaticDescriptorInitializerC1Ev + 21
19 libgated.dylib 0x0000000108b27fb0 __cxx_global_var_init + 16
20 libgated.dylib 0x0000000108b27fc9 _GLOBAL__sub_I_gate.pb.cc + 9
21 ??? 0x000000010919da1b 0x0 + 4447656475
22 ??? 0x000000010919dc1e 0x0 + 4447656990
23 ??? 0x00000001091994aa 0x0 + 4447638698
24 ??? 0x0000000109198524 0x0 + 4447634724
25 ??? 0x00000001091985b9 0x0 + 4447634873
26 ??? 0x000000010918d7cd 0x0 + 4447590349
27 ??? 0x00000001091953ec 0x0 + 4447622124
28 libdyld.dylib 0x00007fffbaea4832 dlopen + 59
29 link_server 0x00000001087ef656 _ZN4Base12SharedObject4OpenEPKc + 118
30 link_server 0x00000001085a75e7 _ZN4Link6Plugin4LoadEPKcS2_i + 71
31 link_server 0x00000001085b03b7 _ZN4Link13PluginManager4LoadEPKcS2_i + 167
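A hypothetical model of the failure described above may help make the symptom concrete. It assumes that the main binary and the plugin each carry a private copy of the library's lazily initialized state, but that both end up consulting a single "once" flag. That shared flag is an assumption chosen to reproduce the symptom, not a confirmed account of how dyld binds the symbols (the trace does show the GoogleOnceInit frames resolving into link_server). All names are illustrative stand-ins, not protobuf's real types.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for protobuf's internal ShutdownData (not its real layout).
struct ShutdownData {
  std::mutex mutex;
};

// One image (the main binary or the plugin .dylib) with its own private copy
// of the library's lazily initialized global state.
struct ImageState {
  std::unique_ptr<ShutdownData> shutdown_data;  // null until initialized
};

// A "once" flag. The model ASSUMES both images consult the same flag; this
// reproduces the symptom and is not a claim about how dyld binds symbols.
struct OnceFlag {
  bool done = false;
};

// Initializes `image` only if `flag` has not fired yet. Single-threaded model;
// the real code goes through GoogleOnceInit instead of a plain bool.
void InitShutdownFunctionsOnce(OnceFlag& flag, ImageState& image) {
  if (!flag.done) {
    image.shutdown_data = std::make_unique<ShutdownData>();
    flag.done = true;
  }
}

// The step where the real code locks shutdown_data->mutex. With a null
// shutdown_data the real code faults near address 0; here the assert fires.
void LockShutdownMutex(ImageState& image) {
  assert(image.shutdown_data != nullptr);
  std::lock_guard<std::mutex> lock(image.shutdown_data->mutex);
}
```

With one shared flag, the main binary initializes first, the plugin's call sees the flag already set, and the plugin's shutdown_data stays null. That matches `print shutdown_data` returning 0x0 in the lldb session later in this thread; calling LockShutdownMutex on the plugin's state is the model's analogue of the EXC_BAD_ACCESS.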

@gatkinso
Author

gatkinso commented Nov 23, 2017

Has there been any resolution or workaround for this defect? I just reproduced it with version 3.5.0 on High Sierra. It happens immediately when loading a library that is statically linked against libprotobuf.

Here is where it crashes. I have no idea what the root cause is.

Process 11289 launched: '/Users/gatkinso/work/test/mytest' (x86_64)
Process 11289 stopped

  * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x000000010256b36f libmylibrary.dylib`::_cxx_global_var_init() at mytest.cpp:23
    20
    21 context ctx;
    22
    -> 23 static myname::proto::MyProto myproto;
    24
    25 int initialize()
    26 {
    Target 0: (mytest) stopped.
    (lldb) b common.cc:464
    Breakpoint 3: 2 locations.
    (lldb) cont
    Process 11289 resuming
    Process 11289 stopped
  * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.2
    frame #0: 0x00000001025b37fe libmylibrary.dylib`google::protobuf::internal::OnShutdownDestroyMessage(ptr=0x0000000102849760) at common.cc:464
    461
    462 void OnShutdownDestroyMessage(const void* ptr) {
    463 InitShutdownFunctionsOnce();
    -> 464 MutexLock lock(&shutdown_data->mutex);
    465 shutdown_data->messages.push_back(static_cast<const MessageLite*>(ptr));
    466 }
    467
    Target 0: (mytest) stopped.
    (lldb) print shutdown_data
    (google::protobuf::internal::ShutdownData *) $5 = 0x0000000000000000
    (lldb) n
    Process 11289 stopped
  * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x48)
    frame #0: 0x00000001025b3363 libmylibrary.dylib`google::protobuf::internal::Mutex::Lock(this=0x0000000000000048) at common.cc:376
    373 }
    374
    375 void Mutex::Lock() {
    -> 376 int result = pthread_mutex_lock(&mInternal->mutex);
    377 if (result != 0) {
    378 GOOGLE_LOG(FATAL) << "pthread_mutex_lock: " << strerror(result);
    379 }
    Target 0: (mytest) stopped

@acozzette acozzette added this to Backlog in Fixit Q4`17 (P2 Bugs) via automation Dec 11, 2017
@gerben-s
Contributor

Are you sure the protobuf library is only linked in once? It looks like it is included twice in the final executable, which results in weird behavior like this.

@xfxyjwf
Contributor

xfxyjwf commented Dec 13, 2017

After a few tries I was able to reproduce the segfault on my macOS laptop.

These don't segfault and will work:

  1. The main binary doesn't link with protobuf; the .dylib is statically linked with libprotobuf.a.
  2. The main binary doesn't link with protobuf; the .dylib is dynamically linked with protobuf.
  3. The main binary is dynamically linked with protobuf; the .dylib is also dynamically linked with protobuf.

These will segfault:

  1. The main binary is statically linked with protobuf; the .dylib is also statically linked with libprotobuf.a.
  2. The main binary is statically linked with protobuf; the .dylib is dynamically linked with protobuf.
  3. The main binary is dynamically linked with protobuf; the .dylib is statically linked with libprotobuf.a.

The stack trace I got for the crashing cases looks exactly like the one reported here. So if you get the same stack trace, you are likely linking in more than one copy of protobuf, and the only working solution is to link protobuf dynamically everywhere.
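The six configurations above reduce to a simple counting rule, sketched below under two assumptions: each static link embeds a private copy of protobuf's global state, and every dynamic link shares a single shared libprotobuf. The enum and function names are hypothetical; the sketch only encodes the reported results and checks that "crash when more than one copy is loaded" reproduces all six rows.

```cpp
#include <cassert>

// How one image (the main binary or the plugin .dylib) gets protobuf.
enum class Link { kNone, kStatic, kDynamic };

// Copies of protobuf's global state in the process, assuming each static link
// embeds a private copy and all dynamic links share one shared library.
int ProtobufCopies(Link main_binary, Link plugin) {
  int copies = 0;
  if (main_binary == Link::kStatic) ++copies;
  if (plugin == Link::kStatic) ++copies;
  if (main_binary == Link::kDynamic || plugin == Link::kDynamic) ++copies;
  return copies;
}

// Rule suggested by the reported results: a crash needs more than one copy.
bool ExpectCrash(Link main_binary, Link plugin) {
  return ProtobufCopies(main_binary, plugin) > 1;
}

// Encodes the six reported configurations and checks the rule against them.
void CheckReportedMatrix() {
  // Reported as working.
  assert(!ExpectCrash(Link::kNone, Link::kStatic));
  assert(!ExpectCrash(Link::kNone, Link::kDynamic));
  assert(!ExpectCrash(Link::kDynamic, Link::kDynamic));
  // Reported as segfaulting.
  assert(ExpectCrash(Link::kStatic, Link::kStatic));
  assert(ExpectCrash(Link::kStatic, Link::kDynamic));
  assert(ExpectCrash(Link::kDynamic, Link::kStatic));
}
```

Seen this way, "dynamically link everywhere" works because it is the only arrangement in which every image that uses protobuf shares one copy, no matter how many plugins are loaded.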

@xfxyjwf xfxyjwf closed this as completed Dec 13, 2017
Fixit Q4`17 (P2 Bugs) automation moved this from Backlog to Done Dec 13, 2017
@gatkinso
Author

The initialization code is sloppy beyond belief, yet this is what passes as a fix? Fine. I'll fix it myself.

@xfxyjwf
Contributor

xfxyjwf commented Dec 13, 2017

There is no fix for this problem as we have no plan to support double linking protobuf.

@pretty-wise

To work around this problem, I made void InitShutdownFunctionsOnce() (in src/google/stubs/common.cc) a non-inline function.

I hope it helps!
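The workaround above only changes how InitShutdownFunctionsOnce() is inlined. A related pattern, sketched here for anyone patching a private fork, keeps the pointer and its initialization guard together in one out-of-line accessor built on a C++11 function-local static, so the two cannot get out of sync. This is a hypothetical sketch with made-up names (GetShutdownData, RegisterOnShutdown), not protobuf's code, and it does not make loading two copies of protobuf a supported setup. The noinline attribute is GCC/Clang-specific and mirrors the workaround.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for protobuf's internal shutdown registry.
struct ShutdownData {
  std::mutex mutex;
  std::vector<void (*)()> functions;
};

// Out-of-line accessor: the function-local static is initialized exactly once,
// thread-safely (C++11), on first call. The object is intentionally leaked so
// it stays valid during static destruction.
__attribute__((noinline)) ShutdownData* GetShutdownData() {
  static ShutdownData* data = new ShutdownData;
  return data;
}

// Hypothetical registration call; always goes through the accessor, so it can
// never observe a null pointer.
void RegisterOnShutdown(void (*func)()) {
  assert(func != nullptr);
  ShutdownData* data = GetShutdownData();
  std::lock_guard<std::mutex> lock(data->mutex);
  data->functions.push_back(func);
}
```

The design choice is to remove the separate global pointer entirely: callers cannot reach the data except through the accessor, and the accessor's guard and the object it guards are the same static.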

@gatkinso
Author

gatkinso commented Dec 13, 2017

pretty-wise, that is basically the approach I am looking at as well.

However, for me it is not a complete solution. I am writing an application that can load plugins developed by third parties. Since I statically link libprotobuf, third parties cannot do the same (which is how this defect was discovered in the first place), and shipping a shared library alongside a plugin is a can of worms nobody wants to open.

Do third parties have to fork protobuf as well to get this working?

Well, I guess the caveat "Do not use Google Protocol Buffers in your plugin" has to go in the Doxygen docs. But at least I can provide this link to explain why, which basically boils down to "they don't feel like fixing it."

@pretty-wise

pretty-wise commented Dec 13, 2017

That's my feeling too. In my experience, protobuf supports being statically linked into multiple libraries. It looks like a macOS-specific issue to me. I am building a similar application, except that I develop all the plugins myself. I am testing it on macOS and CentOS, and everything works well on CentOS; only macOS has the problem.

@gatkinso
Author

Yup, Windows and various flavors of Linux seem to be fine, and the 2.x line didn't seem to have this issue. Oh well, what's done is done.


5 participants