Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mac shlib dependencies fixes #1891

Closed
wants to merge 3 commits into from
Closed

Conversation

ays7
Copy link
Contributor

@ays7 ays7 commented Dec 29, 2022

cmake tweaks to get shlibs to link on Mac with OTLP ON

@ays7 ays7 requested a review from a team as a code owner December 29, 2022 14:18
@codecov
Copy link

codecov bot commented Dec 29, 2022

Codecov Report

Merging #1891 (9665120) into main (32af352) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1891   +/-   ##
=======================================
  Coverage   87.12%   87.12%           
=======================================
  Files         165      165           
  Lines        4596     4596           
=======================================
  Hits         4004     4004           
  Misses        592      592           

Copy link
Member

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one target (opentelemetry_exporter_otlp_grpc_client) can link gRPC::grpc++, or there may be symbols mixing/overlapping problem.

exporters/otlp/CMakeLists.txt Show resolved Hide resolved
exporters/otlp/CMakeLists.txt Show resolved Hide resolved

target_link_libraries(opentelemetry_exporter_otlp_grpc_client
PRIVATE gRPC::grpc++)
PUBLIC gRPC::grpc++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this to PUBLIC link here?This should be PRIVATE link or all targets depend opentelemetry_exporter_otlp_grpc_client will link gRPC::grpc++, which will cause the problem in #1603 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, struggling to reproduce #1603...
Before spending much time on it, let me guess: Is this about linking against static gRPC libs by a chance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, struggling to reproduce #1603... Before spending much time on it, let me guess: Is this about linking against static gRPC libs by a chance?

This BUG can be repreduce when compiling gRPC as static library and compiling otel-cpp as dynamic library on Linux(With ELF ABI).
In this situation, the global variable will shared the same address between dynamic libraries but the initialization guards is private, which will construct these global variables more than onece.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ays7 - Do you have further comments on above discussion ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will try to look into static gRPC linking soon -- marked PR as WIP for now

@ays7 ays7 changed the title mac shlib dependencies fixes WIP: mac shlib dependencies fixes Jan 30, 2023
@ays7
Copy link
Contributor Author

ays7 commented Feb 4, 2023

@owent I was able to reproduce #1603 with static gRPC libraries, however the fact that multiple opentelemetry libraries were linked against gRPC turns out to be a red-herring, the real culprit is OTEL's grpc tracer harness was defined in both libopentelemetry_exporter_otlp_grpc & libopentelemetry_exporter_otlp_grpc_log:

[root@b8ef01aee8b5 otlp]# nm libopentelemetry_exporter_otlp_grpc_log.so | c++filt | grep grpc_tracer_init
0000000000269570 T grpc_tracer_init(char const*)
00000000002693c0 T grpc_tracer_init()
0000000000188b03 t grpc_tracer_init() [clone .cold.4]
[root@b8ef01aee8b5 otlp]# nm libopentelemetry_exporter_otlp_grpc.so | c++filt | grep grpc_tracer_init
00000000002696f0 T grpc_tracer_init(char const*)
0000000000269540 T grpc_tracer_init()
0000000000188c83 t grpc_tracer_init() [clone .cold.4]

That was later resolved with move of common code into libopentelemetry_exporter_otlp_grpc_client

[root@b8ef01aee8b5 otlp]# nm libopentelemetry_exporter_otlp_grpc.so | c++filt | grep grpc_tracer_init
[root@b8ef01aee8b5 otlp]# nm libopentelemetry_exporter_otlp_grpc_log.so | c++filt | grep grpc_tracer_init
[root@b8ef01aee8b5 otlp]# nm libopentelemetry_exporter_otlp_grpc_client.so | c++filt | grep grpc_tracer_init
0000000000245b70 T grpc_tracer_init(char const*)
00000000002459c0 T grpc_tracer_init()
000000000017626f t grpc_tracer_init() [clone .cold.4]

That said, you right, gRPC library link[s] better be kept private

@ays7 ays7 requested a review from owent February 4, 2023 19:01
@ays7 ays7 changed the title WIP: mac shlib dependencies fixes mac shlib dependencies fixes Feb 4, 2023
PUBLIC opentelemetry_otlp_recordable
opentelemetry_exporter_otlp_grpc_client)
PUBLIC opentelemetry_otlp_recordable opentelemetry_exporter_otlp_grpc_client
PRIVATE gRPC::grpc++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we link gRPC::grpc++ into opentelemetry_exporter_otlp_grpc, opentelemetry_exporter_otlp_grpc_log and opentelemetry_exporter_otlp_grpc_metrics? If we build otel-cpp as static libraries, they will link it just by link opentelemetry_exporter_otlp_grpc_client, and if we build otel-cpp as dynamic libraries, I think the global variables of gRPC are both in opentelemetry_exporter_otlp_grpc_client, opentelemetry_exporter_otlp_grpc, opentelemetry_exporter_otlp_grpc_log and opentelemetry_exporter_otlp_grpc_metrics, the symbol overlap problem still exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ays7 - Can you reply to @owent concerns here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owent gRPC::grpc++ must be linked either directly or via shared dependency into all libs that make direct gRPC API calls to satisfy Mac shared library linker -- Mac shared libraries are linked like executables on linux -- all referenced symbols must be resolved to somewhere.
One way to achieve this is by exposing gPRC as a public dependency of opentelemetry_exporter_otlp_grpc_client, which is a common dependency of the 3 libs you mentioned -- that was my original suggestion.

For the static linking: here is my understanding: when linking shared libs on linux, linker first attempts to resolve referenced symbols to one of linked in shared libraries & if that fails to one of linked in static libraries -- in a later case code from the static library is copied into the output.

With that in mind, in the weird scenario of static gRPC/dynamic OTEL libraries mix, what happens is - bulk of gPRC library code is embedded into opentelemetry_exporter_otlp_grpc_client and all other otlp libs [which are dependent on _client] reference gRPC code from the _client .so and not explicitly linked static gRPC library.

There is a potential issue if, say, both opentelemetry_exporter_otlp_grpc_log & opentelemetry_exporter_otlp_grpc_metrics call certain function that is not called inside opentelemetry_exporter_otlp_grpc_client -- in that case both _log & _metrics may get a copy of some gRPC lib code. Fortunately there are no gRPC symbols like that at this point, but maybe it is indeed safer to declare gRPC lib as common PUBLIC dependency as I originally suggested?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not link static gRPC into more than one dynamic libraries and executable. There will be more than one copy of some global variables in gRPC and some of them will be initialized more than once and some will not be initialized for ever.So the solution is just link gRPC into one dynamic library, that is opentelemetry_exporter_otlp_grpc_client, other libraries should only use the API from opentelemetry_exporter_otlp_grpc_client and can not use gRPC APIs directly.
Here is a simple sample to reproduce the problem in gRPC.

a.h

struct foo {
  int bar;
  foo();
  ~foo();

  void print(const char*);

  static foo _;
};

void print_static_global(const char*);

a.cpp

#include "a.h"

#include <iostream>
#include <thread>
#include <chrono>
#include <memory>

struct bar {
  foo* ptr = nullptr;
};

foo foo::_;
static bar s_;

foo::foo(): bar(127) {
    s_.ptr = this;
    std::cout<< "construct "<< this<< std::endl;
}

foo::~foo() {
    std::cout<< "destroy "<< this<< std::endl;
}

void foo::print(const char* prefix) {
    std::cout<< prefix<< "-foo: "<< this<< ": "<< bar<< std::endl;
}

void print_static_global(const char* prefix) {
    foo::_.print(prefix);
    std::cout<< prefix<< "-piblic API bar: "<< s_.ptr<< std::endl;
}

std::shared_ptr<std::thread> g_t(new std::thread([]() {
  std::this_thread::sleep_for(std::chrono::seconds{1});
  std::cout<< "internal API bar: "<< s_.ptr<< std::endl;
}), [](std::thread* thd) {
  thd->join();
  delete thd;
});

b.cpp

#include "a.h"

void dll_func_b() {
    print_static_global("b");
}

c.cpp

#include "a.h"

void dll_func_b();

int main() {
  print_static_global("c");
  dll_func_b();
  return 0;
}

Compile and run

[owent@VM-144-59-centos test]$ clang++ a.cpp -o libtest_a.a -c -fPIC -pthread 
[owent@VM-144-59-centos test]$ clang++ b.cpp -o libtest_b.so -shared -fPIC -L$PWD -ltest_a -pthread
[owent@VM-144-59-centos test]$ clang++ c.cpp -fPIC -L$PWD -ltest_b -ltest_a '-Wl,-rpath=$ORIGIN' -pthread
[owent@VM-144-59-centos test]$ ./a.out 
[owent@VM-144-59-centos test]$ ./a.out 
construct 0x55af97755338
construct 0x55af97755338
c-foo: 0x55af97755338: 127
c-piblic API bar: 0x55af97755338
b-foo: 0x55af97755338: 127
b-piblic API bar: 0x55af97755338
internal API bar: (nil)
internal API bar: 0x55af97755338
destroy 0x55af97755338
destroy 0x55af97755338
[owent@VM-144-59-centos test]$

$ORIGIN may be replaced by @loader_path on macOS.

As you see, the global variable 0x55af97755338 is constructed twice and destroyed twice,And the internal API use different static bar s_;. I didn't test it on macOS but i guess it has the same problem.It's more complicated in gRPC and it will make gRPC do not work in some dynamic libraries.

Actually, if we set gRPC as private dependency of opentelemetry_exporter_otlp_grpc_client, and when we build opentelemetry_exporter_otlp_grpc_client into a static library. All the targets will links gRPC automatically by just link opentelemetry_exporter_otlp_grpc_client.

Copy link
Contributor Author

@ays7 ays7 Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owent Thanks for the test - I will play with it soon.
In a mean time here is another idea: anything particularly wrong with combining all _otlp_grpc_ libraries into a single opentelemetry_exporter_otlp_grpc library?

And one more thing: if static gRPC library can only be linked into one DLL then any application that use gRPC [either directly or indirectly via, say, third party dependency] will be in conflict with otlp_grpc libraries. Which logically means that use of static gRPC libraries is a bad idea. Right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@owent Thanks for the test - I will play with it soon. In a mean time here is another idea: anything particularly wrong with combining all _otlp_grpc_ libraries into a single opentelemetry_exporter_otlp_grpc library?

I prefer spilt modules one by one personally, the build system may built all libraries but most users will use only part of them.

It will cost more memory and CPU if we pack all codes into one library, we have a project which will cost about 8+GB memory to link every executable in this project and it's easy OOM when we link them concurrency.We have done many jobs to split dependencies into small libraries to solve this problem.

And one more thing: if static gRPC library can only be linked into one DLL then any application that use gRPC [either directly or indirectly via, say, third party dependency] will be in conflict with otlp_grpc libraries. Which logically means that use of static gRPC libraries is a bad idea. Right?

Yes, in my understanding, it's safer to use static or dynamic libraries for all dependencies, not mixed. There were issues report this problem when only use otlp-cpp, so I think the only thing we can do is keep it right when users only use and link otlp-cpp’s SDK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ays7 - Do you have any questions on above comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lalitb Sorry, I don't think I see a way out: static linking gRPC is a bad idea one way or another. IMHO.
Looks like @owent is exploring some alternatives with PR #2005 -- perhaps this PR can be abandoned [as long as #2005 fixes mac build too]

@esigo
Copy link
Member

esigo commented Mar 27, 2023

closing as discussed above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants