Fix symbol resolution across mount namespaces (#2029) #2030

Merged: 1 commit, Nov 8, 2018

@vijunag vijunag commented Nov 3, 2018

  • Symbol lookup fails when the target process runs in a different mount namespace.
  • ProcSyms can already switch mount namespaces to resolve symbols.
  • The bug: bcc does not switch to the process's mount namespace before opening the symbol files, so it fails to locate them.
  • The patch creates 'ProcMountNSGuard g(ps->mount_ns_instance_.get())' every time a text section for the process is loaded.
  • The patch was tested by 1) creating a Docker container and running the process in it, and 2) running the "bcc/tools/profile" script against the process's PID in the global PID namespace.
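
To illustrate the idea behind the guard described above, here is a minimal sketch of an RAII mount-namespace guard. It is not bcc's `ProcMountNSGuard`; the names `mnt_ns_path`, `same_mount_ns`, and `MountNsGuard` are hypothetical and exist only for this example. It assumes Linux with `/proc` mounted, and the `setns` call only succeeds with sufficient privileges (typically root).

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for setns() and CLONE_NEWNS
#endif
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: path of a process's mount-namespace handle.
std::string mnt_ns_path(pid_t pid) {
  return "/proc/" + std::to_string(pid) + "/ns/mnt";
}

// Hypothetical helper: true when `pid` shares the caller's mount namespace.
// Two namespace handles refer to the same namespace when their device and
// inode numbers match.
bool same_mount_ns(pid_t pid) {
  struct stat self_st, target_st;
  if (stat("/proc/self/ns/mnt", &self_st) != 0) return false;
  if (stat(mnt_ns_path(pid).c_str(), &target_st) != 0) return false;
  return self_st.st_dev == target_st.st_dev &&
         self_st.st_ino == target_st.st_ino;
}

// Hypothetical RAII guard: enters the target's mount namespace on
// construction and returns to the original one on destruction.
class MountNsGuard {
 public:
  explicit MountNsGuard(pid_t pid) {
    if (same_mount_ns(pid)) return;  // already in the right namespace
    self_fd_ = open("/proc/self/ns/mnt", O_RDONLY | O_CLOEXEC);
    int target_fd = open(mnt_ns_path(pid).c_str(), O_RDONLY | O_CLOEXEC);
    if (self_fd_ >= 0 && target_fd >= 0 &&
        setns(target_fd, CLONE_NEWNS) == 0) {
      entered_ = true;
    }
    if (target_fd >= 0) close(target_fd);
  }

  ~MountNsGuard() {
    if (entered_) setns(self_fd_, CLONE_NEWNS);  // switch back
    if (self_fd_ >= 0) close(self_fd_);
  }

  MountNsGuard(const MountNsGuard&) = delete;
  MountNsGuard& operator=(const MountNsGuard&) = delete;

  bool entered() const { return entered_; }

 private:
  int self_fd_ = -1;      // handle to the original namespace
  bool entered_ = false;  // whether the switch actually happened
};
```

The point of the RAII shape is that any file opened while the guard is alive is resolved in the target's mount namespace, and the original namespace is restored when the guard goes out of scope, even on an early return.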

Test: ran bcc/tools/profile against a PID running in a container; symbol resolution works. The test program and profiler output follow.
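```c
#include <stdio.h>
#include <string.h>

int len;

/* Spins forever in memcpy so the profiler always has samples to collect. */
void baz(void)
{
    int i;
    char a[8192], b[8192];

    memset(b, 0, sizeof(b)); /* avoid copying uninitialized data */

    for (;;) {
        for (i = 0; i < 10000; ++i) {
            memcpy(a, b, len);
        }
    }
}

/* foo and bar only add frames to the call stack: main -> bar -> foo -> baz. */
void foo(void)
{
    baz();
}

void bar(void)
{
    foo();
}

int main(int argc, char **argv)
{
    (void)argc;
    (void)argv;

    printf("Start profiling\n");
    len = 8192;
    bar();
    return 0; /* not reached: baz never returns */
}
```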

python profile -U -p 26966 1
Sampling at 49 Hertz of PID 26966 by user stack for 1 secs.
warning: JITed object file architecture unknown is not compatible with target architecture i386:x86-64.

[unknown]
foo
bar
main
__libc_start_main
-                sampleme (26966)
    1

baz
foo
bar
main
__libc_start_main
-                sampleme (26966)

  std::string exe = ebpf::get_pid_exe(pid_);
  Module module(exe.c_str(), mount_ns_instance_.get(), &symbol_option_);

  if (module.type_ != ModuleType::EXEC)
    return;

  ProcMountNSGuard g(mount_ns_instance_.get());

  bcc_elf_foreach_load_section(exe.c_str(), &_add_load_sections, &module);
Contributor Author


We have to switch to the process's mount namespace before accessing the /proc/pid/exe file.
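
As a rough illustration of why the namespace matters here: the target of `/proc/<pid>/exe` is a path as seen by the target process, so opening that path from another mount namespace may fail or hit a different file. The sketch below uses hypothetical helpers (`exe_path`, `exe_path_via_root`) that are not part of bcc. The second helper shows a different technique from the one in this patch: going through `/proc/<pid>/root`, which exposes the process's own view of the filesystem, instead of calling `setns`.

```cpp
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: the executable path reported by /proc/<pid>/exe.
// The returned path is meaningful inside the target's mount namespace.
std::string exe_path(pid_t pid) {
  const std::string link = "/proc/" + std::to_string(pid) + "/exe";
  char buf[PATH_MAX];
  ssize_t n = readlink(link.c_str(), buf, sizeof(buf) - 1);
  if (n < 0) return "";  // no such process, or no permission
  return std::string(buf, static_cast<size_t>(n));
}

// Hypothetical helper: the same file, addressed from the caller's namespace
// through /proc/<pid>/root (an alternative to switching namespaces).
std::string exe_path_via_root(pid_t pid) {
  const std::string exe = exe_path(pid);
  if (exe.empty()) return "";
  return "/proc/" + std::to_string(pid) + "/root" + exe;
}
```

For a containerized process, `exe_path` would typically return a path that exists only inside the container's filesystem, which is why the lookup has to happen either inside that mount namespace or through a path such as the one `exe_path_via_root` builds.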

@@ -163,6 +163,7 @@ int ProcSyms::_add_module(const char *modname, uint64_t start, uint64_t end,
// It only gives the mmap offset. We need the real offset for symbol
// lookup.
if (module.type_ == ModuleType::SO) {
ProcMountNSGuard g(ps->mount_ns_instance_.get());
Contributor Author


Ditto. Switch to the task's mount namespace before accessing the symbol file.

@yonghong-song
Collaborator

@tekumara could you check whether this fixed your issue (#1990)?
@palmtenor could you comment on the change as well?

@tekumara
Contributor

tekumara commented Nov 5, 2018

Using this PR, I ran bcc inside a container to debug another container, and I don't get the warnings! Thank you! 🙇
However, I still get [unknown]s in the stack trace (see below). Should these be resolved?

host $ sudo docker run -it --privileged   -v /sys/kernel/debug:/sys/kernel/debug:rw -v /lib/modules:/lib/modules:ro   -v /usr/src:/usr/src:ro   -v /etc/localtime:/etc/localtime:ro   --workdir /usr/share/bcc/tools   --pid=host   bcc
root@22de24248ccc:/usr/share/bcc/tools# /usr/share/bcc/tools/trace 'SyS_write (arg1==1) "%s", arg2' -U -p `pidof java`
PID     TID     COMM            FUNC             -
5573    27047   java            SyS_write        2018-11-05 12:33:05,094 [info] application-akka.actor.default-dispatche
        __GI___libc_write+0x2d [libpthread-2.24.so]
        [unknown] [libjava.so]
        Java_java_io_FileOutputStream_writeBytes+0x1a [libjava.so]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown]
        [unknown] [libjvm.so]
        [unknown] [libjvm.so]
        [unknown] [libjvm.so]
        [unknown] [libjvm.so]
        [unknown] [libjvm.so]
        [unknown] [libjvm.so]
        [unknown] [libjvm.so]
        start_thread+0xc4 [libpthread-2.24.so]

@vijunag
Contributor Author

vijunag commented Nov 5, 2018

@tekumara - You don't have to run bcc inside a container. Can you try profiling the process using its PID in the default (host) namespace and check whether it works?

@tekumara
Contributor

tekumara commented Nov 6, 2018

@vijunag do you mean running bcc from the host? I've tried running bcc (compiled with this PR) on the host (i.e. not in a container) and it produces the same output as above.

@vijunag
Contributor Author

vijunag commented Nov 6, 2018

@tekumara - Are your Java shared objects compiled with frame pointers? If not, then the problem is not related to this issue.

@tekumara
Contributor

tekumara commented Nov 7, 2018

@vijunag I've installed the jdk8 debug symbols into the container and now I'm not getting [unknown] [libjvm.so] or [unknown] [libjava.so] lines.

I've also created a Java perf map and am now getting Interpreter+0x24f0 [perf-8.map] instead of [unknown]. After letting the JVM warm up and regenerating the perf map, these become correct Java stack lines.
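
For readers unfamiliar with perf maps: a JIT runtime can write a text file (by perf's convention `/tmp/perf-<pid>.map`) with one `START SIZE NAME` entry per line, where START and SIZE are hexadecimal, so that tools can name JIT-compiled frames. The sketch below shows how such a file could be used to turn an address into a `name+0xoffset` string. It is an illustration only, not bcc's implementation, and `PerfMapEntry`, `parse_perf_map`, and `resolve` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

struct PerfMapEntry {
  uint64_t size;
  std::string name;
};

// Parses perf-map text: one "START SIZE NAME" entry per line, hex numbers.
// Entries are keyed by start address; malformed lines are skipped.
std::map<uint64_t, PerfMapEntry> parse_perf_map(const std::string& text) {
  std::map<uint64_t, PerfMapEntry> entries;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    uint64_t start = 0, size = 0;
    if (!(fields >> std::hex >> start >> size)) continue;
    std::string name;
    std::getline(fields >> std::ws, name);  // the name may contain spaces
    if (!name.empty()) entries[start] = PerfMapEntry{size, name};
  }
  return entries;
}

// Returns "name+0xoffset" for the entry covering `addr`, else "[unknown]".
std::string resolve(const std::map<uint64_t, PerfMapEntry>& entries,
                    uint64_t addr) {
  auto it = entries.upper_bound(addr);  // first entry starting after addr
  if (it == entries.begin()) return "[unknown]";
  --it;  // last entry starting at or before addr
  const uint64_t offset = addr - it->first;
  if (offset >= it->second.size) return "[unknown]";
  std::ostringstream out;
  out << it->second.name << "+0x" << std::hex << offset;
  return out.str();
}
```

With a made-up entry `7f0000001000 3000 Interpreter`, `resolve` maps the address `0x7f00000034f0` to `Interpreter+0x24f0`. Because the JVM keeps compiling methods as it warms up, a map generated early can be stale, which fits the observation that regenerating it after warm-up produced correct Java frames.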

TL;DR - all good, thank you!

@yonghong-song
Collaborator

[buildbot, test this please]

@yonghong-song
Collaborator

@palmtenor any opinion? Will this work when tracing a process running inside a container while the user provides a host file path?

@palmtenor
Member

Yeah this LGTM. Thanks for the fix!
