Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to fix #2648 #2665

Closed
wants to merge 15 commits into from
Closed

Changes to fix #2648 #2665

wants to merge 15 commits into from

Conversation

jason-chenyixi-chengdu
Copy link

@jason-chenyixi-chengdu jason-chenyixi-chengdu commented Dec 23, 2019

Hi,
This changes is for fix issue #2648 . Please help to review if it OK.
Main changes are b93e4a3 and e6b9572. Others are some testing in my forked repo.
Thanks!

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song
Copy link
Collaborator

[buildbot, ok to test]

src/cc/bcc_elf.c Outdated

gelf_getphdr(e, 0, &phdr);

base_address = phdr.p_vaddr-phdr.p_offset;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base_address is computed based on index 0 which is typically PHDR. Does this calculated base_address is always the same for the loaded segment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we need to calculate the base address with Load segement rather than any type of segment in PHDR?
I have checked all the segements in PHDR of some elf files, no matter what type of the segment is, base address is always p_vaddr-p_offset

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I mean:

-bash-4.4$ cat main.c
int main() { return 0; }
-bash-4.4$ gcc main.c
-bash-4.4$ readelf -l a.out

Elf file type is EXEC (Executable file)
Entry point 0x4003e0
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000069c 0x000000000000069c  R E    200000
  LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
                 0x0000000000000214 0x0000000000000218  RW     200000

We probably care about the first LOAD segment, right? In this case, the virtual_address - offset is the same. I am just wondering whether this is true in general.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I am not sure if this is the same in all cases. So how about update the code with below part?

for (i = 0; i < phdr_num; i++) {
    if (!gelf_getphdr(e, (int)i, &phdr))
      continue;
    if ((phdr.p_type == PT_LOAD) && (phdr.p_flags & PF_X)) {
      base_address = phdr.p_vaddr-phdr.p_offset;
      break;
    }
  }

@@ -683,7 +683,7 @@ static int _find_sym(const char *symname, uint64_t addr, uint64_t,
void *payload) {
struct bcc_symbol *sym = (struct bcc_symbol *)payload;
if (!strcmp(sym->name, symname)) {
sym->offset = addr;
sym->offset = addr - sym->base_address;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@palmtenor could you double check? Does this always for non-so binaries?

Copy link
Member

@palmtenor palmtenor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have bcc_elf_foreach_load_section. Could you use that to find the load section? I'm mainly concerned about binaries with more than one load sections, which we've seen before

@jason-chenyixi-chengdu
Copy link
Author

We already have bcc_elf_foreach_load_section. Could you use that to find the load section? I'm mainly concerned about binaries with more than one load sections, which we've seen before

@palmtenor Glad to know that, thanks! I will update the code and test it.

@jason-chenyixi-chengdu
Copy link
Author

@yonghong-song @palmtenor the code has been updated, bcc_elf_foreach_load_section is used as an interface to get the load segment and update base address in callback. Please help to review it.

@yonghong-song
Copy link
Collaborator

@jason-chenyixi-chengdu there are a few test failures, could you take a look? The change mostly look okay. @palmtenor please take a look as you are more familiar for bcc symbolization than me.

In the past, we have the following bug fix for shared library symbol lookup,

commit 5a1106c6c62c735ebbc7396e5a2fc9843592dc79
Author: Joel Fernandes <joelaf@google.com>
Date:   Sat Apr 7 10:43:15 2018 +0000

    bcc/syms: Fix shared library symbol lookup
    
    Shared library addresses needed to be mapped what the address is expected in the
    symbol table. The address offset of the running shared library may be different from
    the one in the SO binary file. So we have to map it correctly in order for symbol
    look up to work.
    
    Often the address and file offset are the same so it works, however in Android this
    is not the case a lot of times. Fix the issue by adjusting the offset with the
    file offset from the ELF.

This also tries to deal with the case where file_offset is different from virtual address.
But this is done during add_module phase and check sections instead of program headers. Not sure whether this applies to your case or not with a slight extension. It would be good to keep changes with such uncommon configuration minimum in bcc code base.

@jason-chenyixi-chengdu
Copy link
Author

@yonghong-song For the failed tests, i have no permission to access builbot, could you share with me here?
For the previous commit, yes, we can get base_address from module.elf_so_offset_ and module.elf_so_addr_. But a Module instance should be fetched or created first in function bcc_resolve_symname, I think logic may more complicated.

@yonghong-song
Copy link
Collaborator

In the pull request page (#2665), there is a
Some checks were not successful in red color, below it has a X default - Build finished ... Details. Click Details to check the detailed results. Some tests may be flaky, but uprobe test failure probably not flaky.

@jason-chenyixi-chengdu
Copy link
Author

But as i said, i have no permission to get the details, it was said " jason-chenyixi-chengdu is missing the Overall/Read permission"

@pchaigno
Copy link
Contributor

pchaigno commented Jan 3, 2020

@yonghong-song We're all blocked from seeing the build logs since at least August (cf. #2501).

@yonghong-song
Copy link
Collaborator

@jason-chenyixi-chengdu @pchaigno I am not aware of the issue. Let me see how we can resolve this. cc @drzaeus77

@yonghong-song
Copy link
Collaborator

The following tests failed for the current pull request, which may be worth to investigate:

ubuntu 16.04:
21: Test command: /home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/tests/wrapper.sh "py_uprobes" "sudo" "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/tests/python/test_uprobes.py"
21: Test timeout computed to be: 9.99988e+06
21: .EArena 0:
21: system bytes     =   19595264
21: in use bytes     =    2943536
21: Total (incl. mmap):
21: system bytes     =   20385792
21: in use bytes     =    3734064
21: max mmap regions =          7
21: max mmap bytes   =    2781184
21: .
21: ======================================================================
21: ERROR: test_simple_binary (__main__.TestUprobes)
21: ----------------------------------------------------------------------
21: Traceback (most recent call last):
21:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/tests/python/test_uprobes.py", line 59, in test_simple_binary
21:     b.attach_uprobe(name="/usr/bin/python", sym="main", fn_name="count")
21:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/src/python/bcc-python/bcc/__init__.py", line 1025, in attach_uprobe
21:     (path, addr) = BPF._check_path_symbol(name, sym, addr, pid, sym_off)
21:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/src/python/bcc-python/bcc/__init__.py", line 758, in _check_path_symbol
21:     raise Exception("could not determine address of symbol %s" % symname)
21: Exception: could not determine address of symbol main
21: 
21: ----------------------------------------------------------------------
21: Ran 3 tests in 6.295s
21: 
21: FAILED (errors=1)
21: Failed
21/40 Test #21: py_uprobes .......................***Failed    6.34 sec
ubuntu 16.04:
29: Test command: /home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/tests/wrapper.sh "py_test_tools_smoke" "sudo" "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/tests/python/test_tools_smoke.py"
29: Test timeout computed to be: 9.99988e+06
29: Traceback (most recent call last):
29:   File "../../tools/argdist.py", line 712, in run
29:     self._main_loop()
29:   File "../../tools/argdist.py", line 702, in _main_loop
29:     exit()
29:   File "/usr/lib/python2.7/site.py", line 375, in __call__
29:     raise SystemExit(code)
29: SystemExit: None
29: .Traceback (most recent call last):
29:   File "../../tools/bashreadline.py", line 68, in <module>
29:     b.attach_uretprobe(name=name, sym="readline", fn_name="printret")
29:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/src/python/bcc-python/bcc/__init__.py", line 1057, in attach_uretprobe
29:     (path, addr) = BPF._check_path_symbol(name, sym, addr, pid)
29:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/src/python/bcc-python/bcc/__init__.py", line 758, in _check_path_symbol
29:     raise Exception("could not determine address of symbol %s" % symname)
29: Exception: could not determine address of symbol readline
29: F..'unknown': I need something more specific.
29: .'unknown': I need something more specific.
29: 'unknown': I need something more specific.
29: 'unknown': I need something more specific.
29: 'unknown': I need something more specific.
29: .......s.s....sss....'unknown': I need something more specific.
29: ......Killed
29: .ss.ss...sss..s.ss'unknown': I need something more specific.
29: .ss..s..s...s..'unknown': I need something more specific.
29: ...'unknown': I need something more specific.
29: .s...'unknown': I need something more specific.
29: Error in atexit._run_exitfuncs:
29: Traceback (most recent call last):
29:   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
29:     func(*targs, **kargs)
29:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/src/python/bcc-python/bcc/__init__.py", line 1368, in cleanup
29:     os.close(fn.fd)
29: OSError: [Errno 9] Bad file descriptor
29: Error in sys.exitfunc:
29: Traceback (most recent call last):
29:   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
29:     func(*targs, **kargs)
29:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/build/src/python/bcc-python/bcc/__init__.py", line 1368, in cleanup
29:     os.close(fn.fd)
29: OSError: [Errno 9] Bad file descriptor
29: ....s....
29: ======================================================================
29: FAIL: test_bashreadline (__main__.SmokeTests)
29: ----------------------------------------------------------------------
29: Traceback (most recent call last):
29:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/tests/python/test_tools_smoke.py", line 73, in test_bashreadline
29:     self.run_with_int("bashreadline.py")
29:   File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/tests/python/test_tools_smoke.py", line 52, in run_with_int
29:     or (rc == 137 and kill), "rc was %d" % rc)
29: AssertionError: rc was 1
29: 
29: ----------------------------------------------------------------------
29: Ran 82 tests in 193.092s
29: 
29: FAILED (failures=1, skipped=22)
29: Failed
29/40 Test #29: py_test_tools_smoke ..............***Failed  193.13 sec

bashreadline.py also failed on ubuntu 17.04.

@jason-chenyixi-chengdu
Copy link
Author

@yonghong-song , the two cases failed are because offset address is used as target physical address for executable binaries(eg. python, bash). I have updated the code to get the correct target address.

@yonghong-song
Copy link
Collaborator

The following tests failed on all fedora hosts.

         3 - test_libbcc (Failed)
	 31 - py_test_usdt (Failed)
	 32 - py_test_usdt2 (Failed)
	 33 - py_test_usdt3 (Failed)

For example, py_test_usdt,

31: ERROR: test_attach1 (__main__.TestUDST)
31: ----------------------------------------------------------------------
31: Traceback (most recent call last):
31:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc28/tests/python/test_usdt.py", line 150, in test_attach1
31:     b = BPF(text=self.bpf_text, usdt_contexts=[u], debug=4)
31:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc28/build/src/python/bcc-python/bcc/__init__.py", line 352, in __init__
31:     usdt_context.attach_uprobes(self)
31:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc28/build/src/python/bcc-python/bcc/usdt.py", line 194, in attach_uprobes
31:     addr=addr, pid=pid)
31:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc28/build/src/python/bcc-python/bcc/__init__.py", line 1025, in attach_uprobe
31:     (path, addr) = BPF._check_path_symbol(name, sym, addr, pid, sym_off)
31:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc28/build/src/python/bcc-python/bcc/__init__.py", line 758, in _check_path_symbol
31:     raise Exception("could not determine address of symbol %s" % symname)
31: Exception: could not determine address of symbol b''

Some address adjustment might be needed for usdt as well?

@yonghong-song
Copy link
Collaborator

[buildbot, ok to test]

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song
Copy link
Collaborator

py_test_usdt on fc28 failed as

31: Test command: /home/fedora/jenkins/workspace/bcc-pr/label/fc28/build/tests/wrapper.sh "py_test_usdt" "sudo" "/home/fedora/jenkins/workspace/bcc-pr/label/fc28/tests/python/test_usdt.py"
31: Test timeout computed to be: 10000000
......
31: /home/fedora/jenkins/workspace/bcc-pr/label/fc28/build/tests/wrapper.sh: line 38: 11799 Segmentation fault      sudo bash -c "PYTHONPATH=$PYTHONPATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH $cmd $1 $2"
31: Failed

@jason-chenyixi-chengdu
Copy link
Author

It is strange... Could you help to get the dump file? There is no problem in my host

@yonghong-song
Copy link
Collaborator

I will take a look today or tomorrow.

@yonghong-song
Copy link
Collaborator

@palmtenor could you help check this patch as well?

@yonghong-song
Copy link
Collaborator

I did not have fc28 VM any more. I can reproduce the issue with fc31.
The core stack does not help as the application is python:

#0  0x00007fd5bd198e84 in PyObject_Malloc () from /lib64/libpython3.7m.so.1.0
Missing separate debuginfos, use: dnf debuginfo-install python-unversioned-command-3.7.6-1.fc31.noarch
(gdb) where
#0  0x00007fd5bd198e84 in PyObject_Malloc () from /lib64/libpython3.7m.so.1.0
#1  0x00007fd5bd1bcd54 in _PyLong_New () from /lib64/libpython3.7m.so.1.0
#2  0x00007fd5bd1bd4ca in PyLong_FromLong () from /lib64/libpython3.7m.so.1.0
#3  0x00007fd5afcd5e25 in _ctypes_callproc () from /usr/lib64/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so
#4  0x00007fd5afcd63fc in PyCFuncPtr_call () from /usr/lib64/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so
#5  0x00007fd5bd1eb57c in _PyObject_FastCallKeywords () from /lib64/libpython3.7m.so.1.0
#6  0x00007fd5bd1ecce9 in call_function () from /lib64/libpython3.7m.so.1.0
#7  0x00007fd5bd228182 in _PyEval_EvalFrameDefault () from /lib64/libpython3.7m.so.1.0
#8  0x00007fd5bd1d9b40 in _PyEval_EvalCodeWithName () from /lib64/libpython3.7m.so.1.0
#9  0x00007fd5bd1dab82 in _PyFunction_FastCallKeywords () from /lib64/libpython3.7m.so.1.0

please try fc31.

@jason-chenyixi-chengdu
Copy link
Author

jason-chenyixi-chengdu commented Jan 15, 2020

I have tried. This fault has been raised too. But this fault is still exsisting even i revert my change. And I found that the fault is raised from bcc_func_load in bpf_module. I have print some messages around.

#include <bcc/footer.h>
In attach_uprobe

Module name: /proc/47548/root/tmp/tmpWsTWrH
In check_path_symbol, after bcc_resolve_symname

Before free /proc/47548/root/proc/47548/root/tmp/tmpWsTWrH
After free
After BPF._check_path_symbol

Before lib.bcc_func_load

Segmentation fault (core dumped)

@jason-chenyixi-chengdu
Copy link
Author

After investgating on my host, the fault is caused by python initial file is not the latest version, so the interface bcc_func_load is not compatiable. After i update the python initial file, the test case passed. Could you help to check if this is the same reason on the testing host?

before lib.bpf_attach_uprobe

after lib.bpf_attach_uprobe

str81
str80
str83
str82
str85
str84
.
----------------------------------------------------------------------
Ran 1 test in 42.102s

OK

@yonghong-song
Copy link
Collaborator

@jason-chenyixi-chengdu The following is my setup.

In fc31, x64 server iso image and installed as VM, I am using vmware fusion, but virtualbox should be the same, I guess.

following the INSTALL.md instruction to install all packages, clone latest bcc and build it.
the test_usdt.py (just run sudo ./test_usdt.py after installation) is okay.

with this patch, the sudo ./test_usdt.py segfaults. I briefly tried but did not go through where exactly the segfault location. You may be going through this direction to double check.

@yonghong-song
Copy link
Collaborator

The test appears passing now. There are conflicts with current master branch. Could you rebase, squash and resubmit? Thanks!

@jason-chenyixi-chengdu
Copy link
Author

updated.pls help to check, thanks!

@yonghong-song
Copy link
Collaborator

@jason-chenyixi-chengdu you stilll have 15 commits and it still have conflicts with trunk. Maybe you can close this one and submit a new pull request with a single commit with making a reference to this pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants