Thread debugging is not available for relocatable scylla package #4673

Closed
tgrabiec opened this issue Jul 8, 2019 · 37 comments · Fixed by #4863

@tgrabiec
Contributor

tgrabiec commented Jul 8, 2019

Scylla version: 3.1

Without thread debugging most of the scylla-gdb.py commands won't work. Thread-locals can't be read.

I guess this is because /usr/lib/debug/opt/scylladb/libreloc/libthread_db.so.1, the library that matches /opt/scylladb/bin/../libreloc/libpthread.so.0, is missing.

GDB log with set debug libthread-db 1:


Trying host libthread_db library: libthread_db.so.1.
Host libthread_db.so.1 resolved to: /lib64/libthread_db.so.1.
td_ta_new failed: versions of libpthread and libthread_db do not match
Trying host libthread_db library: /usr/lib/debug/opt/scylladb/libreloc/libthread_db.so.1.
dlopen failed: /usr/lib/debug/opt/scylladb/libreloc/libthread_db.so.1: cannot open shared object file: No such file or directory.
Trying host libthread_db library: /opt/scylladb/bin/../libreloc/libthread_db.so.1.
dlopen failed: /opt/scylladb/bin/../libreloc/libthread_db.so.1: cannot open shared object file: No such file or directory.
thread_db_load_search returning 0
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
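A log like the one above can be summarized mechanically. The following is a minimal sketch, not part of scylla-gdb.py, that pairs each "Trying host libthread_db library" line with the failure reported after it. The function name summarize_libthread_db_log is hypothetical, and the sketch assumes every attempt is followed by at most one "td_ta_new failed" or "dlopen failed" line, as in this log.

```python
import re

# Lines GDB prints for each candidate when `set debug libthread-db 1` is on.
TRYING = re.compile(r"^Trying host libthread_db library: (?P<path>.+?)\.$")
# The two failure forms that appear in the log above.
FAILED = re.compile(r"^(?P<kind>td_ta_new|dlopen) failed: (?P<reason>.+)$")


def summarize_libthread_db_log(log: str) -> list[tuple[str, str, str]]:
    """Return (path_tried, failing_call, reason) for each failed attempt."""
    attempts = []
    current = None  # path from the most recent "Trying ..." line
    for line in log.splitlines():
        line = line.strip()
        m = TRYING.match(line)
        if m:
            current = m.group("path")
            continue
        m = FAILED.match(line)
        if m and current is not None:
            attempts.append((current, m.group("kind"), m.group("reason")))
            current = None
    return attempts
```

Applied to the log above, this would report one version mismatch (the host library) and two dlopen failures (the two scylladb paths), which is the pattern that points at a missing libthread_db.so.1.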
tgrabiec added this to the 3.1 milestone Jul 8, 2019
avikivity pushed a commit that referenced this issue Jul 11, 2019
The scylla-debuginfo package contains /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug,
but /opt/scylladb/libreloc does not actually contain libthread_db.so.1,
since the library does not appear in the ldd output for the scylla binary.

To debug threads, we need to add the library to the relocatable package manually.

Fixes #4673

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190711111058.7454-1-syuu@scylladb.com>
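The reasoning in this commit message (a library list derived from ldd misses libthread_db, because the debugger loads that library, not the debugged program) can be sketched as follows. This is an illustration only and not the actual packaging script; the function names and the sample paths in the usage note are assumptions.

```python
def libs_from_ldd(ldd_output: str) -> dict[str, str]:
    """Map soname -> resolved path from `ldd`-style output lines such as
    'libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f...)'."""
    libs = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vdso or the dynamic loader line
        soname, _, rest = line.strip().partition(" => ")
        fields = rest.split()
        # Skip unresolved entries such as "libfoo.so => not found".
        if fields and fields[0].startswith("/"):
            libs[soname] = fields[0]
    return libs


def with_debugger_only_libs(libs: dict[str, str], extra: dict[str, str]) -> dict[str, str]:
    """Return a copy of `libs` plus libraries that ldd cannot discover
    because the binary never links against them."""
    merged = dict(libs)
    for soname, path in extra.items():
        merged.setdefault(soname, path)
    return merged
```

A packager could call with_debugger_only_libs(libs_from_ldd(output), {"libthread_db.so.1": "/lib64/libthread_db.so.1"}), where the path is a placeholder for wherever the build machine keeps the libthread_db that matches the bundled libpthread.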
avikivity pushed a commit that referenced this issue Jul 15, 2019
(cherry picked from commit 842f75d)
@tgrabiec
Contributor Author

Even with libthread_db in place and loaded, there's a problem:

(gdb) p local_engine
Cannot find thread-local storage for Thread 0x7fdc01dff700 (LWP 23731), executable file /usr/lib/debug/opt/scylladb/libexec/scylla.bin-3.1.0.rc2-0.20190618.fa53994fb0.el7.x86_64.debug:
generic error

tgrabiec reopened this Jul 16, 2019
@tgrabiec
Contributor Author

I don't see anything interesting in strace. It looks like gdb reads the debug info files and then fails. When I execute the command a second time, no shared libraries are read:

...
fcntl(105, F_GETFD)                     = 0
fcntl(105, F_SETFD, FD_CLOEXEC)         = 0
fstat(105, {st_mode=S_IFREG|0644, st_size=646640, ...}) = 0
lseek(105, 24576, SEEK_SET)             = 24576
read(105, "\0\0\0\0\260\210\0\0\0\0\0\0g\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 484) = 484
read(105, "W\0\0\0\4\0dM\0\0\10P\0\0\0\0\255\n\0\0\22y\3\0\0\22\301\3\0\0v\241"..., 4096) = 4096
mmap(NULL, 155648, PROT_READ, MAP_PRIVATE, 105, 0x6000) = 0x7f0363515000
madvise(0x7f0363515000, 155648, MADV_WILLNEED) = 0
lseek(105, 176128, SEEK_SET)            = 176128
read(105, "\0\0\221\21\301\20\17\1U\t\3y/\2\0\0\0\0\0\0g\314\377\1\0\0\0\0\0\221\21\334"..., 4096) = 4096
mmap(NULL, 24576, PROT_READ, MAP_PRIVATE, 105, 0x2b000) = 0x7f036350f000
madvise(0x7f036350f000, 24576, MADV_WILLNEED) = 0
openat(AT_FDCWD, "/usr/lib/debug/usr/lib64/../../.dwz/libselinux-2.8-6.fc29.x86_64", O_RDONLY) = 106
fcntl(106, F_GETFD)                     = 0
fcntl(106, F_SETFD, FD_CLOEXEC)         = 0
fstat(106, {st_mode=S_IFREG|0644, st_size=79551, ...}) = 0
lseek(106, 40960, SEEK_SET)             = 40960
read(106, "t.h\0\n\0\0objimpl.h\0\n\0\0pyhash.h\0\n\0\0"..., 696) = 696
read(106, "__pad0\0obj0\0_PyGC_generation0\0te"..., 4096) = 4096
mmap(NULL, 36864, PROT_READ, MAP_PRIVATE, 106, 0xa000) = 0x7f0363506000
madvise(0x7f0363506000, 36864, MADV_WILLNEED) = 0
lseek(105, 290816, SEEK_SET)            = 290816
read(105, "t\5\25\1\5\27\1\6\10<\254\6\362\5'\1\5)\1\5;\6\10J\5)J\220<\5;\202"..., 4096) = 4096
lseek(105, 294912, SEEK_SET)            = 294912
read(105, "64_OFF64\0_SC_V7_LP64_OFF64\0lstat"..., 12288) = 12288
read(105, "n\0rbuflen\0fts_pathlen\0rootpathle"..., 4096) = 4096
lseek(105, 196608, SEEK_SET)            = 196608
read(105, "?\31\3\16:\v;\59\v'\31\21\1\22\17@\30\227B\31\1\25\0\0\35.\0011\25\21\1"..., 4096) = 4096
mmap(NULL, 98304, PROT_READ, MAP_PRIVATE, 105, 0x30000) = 0x7f03634ee000
madvise(0x7f03634ee000, 98304, MADV_WILLNEED) = 0
lseek(106, 0, SEEK_SET)                 = 0
read(106, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
mmap(NULL, 40960, PROT_READ, MAP_PRIVATE, 106, 0) = 0x7f03634e4000
madvise(0x7f03634e4000, 40960, MADV_WILLNEED) = 0
lseek(106, 36864, SEEK_SET)             = 36864
read(106, "\32\0\0\0\0\255\n\0\0 D^\0\0\3\223\33)\0\0\0\0\35\0\0\0\4\0\0\0\0\0"..., 4096) = 4096
lseek(106, 40960, SEEK_SET)             = 40960
openat(AT_FDCWD, "/usr/lib/debug/usr/lib64/libgpg-error.so.0.25.0-1.33-1.fc29.x86_64.debug", O_RDONLY) = 107
fcntl(107, F_GETFD)                     = 0
fcntl(107, F_SETFD, FD_CLOEXEC)         = 0
fstat(107, {st_mode=S_IFREG|0644, st_size=540568, ...}) = 0
lseek(107, 8192, SEEK_SET)              = 8192
read(107, ",\0\0\0\2\0\25'\1\0\10\0\0\0\0\0000\365\0\0\0\0\0\0'\0\0\0\0\0\0\0"..., 464) = 464
read(107, ";\0\0\0\4\0\0\0\0\0\10\5\0\0\0\0z.\0\0;\v\0\0\0<\2573\0\0\n\36"..., 4096) = 4096
mmap(NULL, 118784, PROT_READ, MAP_PRIVATE, 107, 0x2000) = 0x7f03634c7000
madvise(0x7f03634c7000, 118784, MADV_WILLNEED) = 0
lseek(107, 122880, SEEK_SET)            = 122880
read(107, "\243>\1\1U\2\108\0\6PL\1\0\0\0\0\0\340D\1\1U\t\3x\261\1\0\0\0\0"..., 4096) = 4096
lseek(107, 126976, SEEK_SET)            = 126976
read(107, "B\27\0\0\20\5\0001\25\2\27\267B\27\0\0\21\211\202\1\1\21\0011\25\0\0\22.\1\3\16"..., 12288) = 12288
read(107, "B\27\0\0R\5\0\3\241>:\v;\v9\vI\240>\2\27\267B\27\0\0S\n\0001\25\21"..., 4096) = 4096
openat(AT_FDCWD, "/usr/lib/debug/usr/lib64/../../.dwz/libgpg-error-1.33-1.fc29.x86_64", O_RDONLY) = 108
fcntl(108, F_GETFD)                     = 0
fcntl(108, F_SETFD, FD_CLOEXEC)         = 0
fstat(108, {st_mode=S_IFREG|0644, st_size=21951, ...}) = 0
lseek(108, 4096, SEEK_SET)              = 4096
read(108, " \241\1\0\0O\200 \350)\0\0P\200 \0'\0\0Q\200 \350\2\0\0R\200 \273\24\0"..., 2870) = 2870
read(108, "GPG_ERR_USER_10\0GPG_ERR_USER_11\0"..., 4096) = 4096
lseek(108, 11062, SEEK_SET)             = 11062
read(108, "G_ERR_UNEXPECTED_TAG\0GPG_ERR_INV"..., 8192) = 8192
read(108, "ase\0_IO_write_base\0_IO_save_base"..., 4096) = 2697
lseek(107, 200704, SEEK_SET)            = 200704
read(107, "\5\3f\254\236t\5\20\203\5\5\6t\5\10\6\1\5\37X\5\33\236\5\21\201\5\3\220\5\7\3"..., 4096) = 4096
mmap(NULL, 28672, PROT_READ, MAP_PRIVATE, 107, 0x31000) = 0x7f03634c0000
madvise(0x7f03634c0000, 28672, MADV_WILLNEED) = 0
lseek(107, 139264, SEEK_SET)            = 139264
read(107, "B\27\0\0R\5\0\3\241>:\v;\v9\vI\240>\2\27\267B\27\0\0S\n\0001\25\21"..., 4096) = 4096
mmap(NULL, 65536, PROT_READ, MAP_PRIVATE, 107, 0x22000) = 0x7f03634b0000
madvise(0x7f03634b0000, 65536, MADV_WILLNEED) = 0
lseek(108, 0, SEEK_SET)                 = 0
read(108, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\1\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(108, 4096, SEEK_SET)              = 4096
read(108, " \241\1\0\0O\200 \350)\0\0P\200 \0'\0\0Q\200 \350\2\0\0R\200 \273\24\0"..., 4096) = 4096
lseek(108, 8192, SEEK_SET)              = 8192
rt_sigaction(SIGSEGV, {sa_handler=0x55d1f2c63d60, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK, sa_restorer=0x7f045fea9030}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=0x55d1f2c63d60, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK, sa_restorer=0x7f045fea9030}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=0x55d1f2c63d60, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK, sa_restorer=0x7f045fea9030}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=0x55d1f2c63d60, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK, sa_restorer=0x7f045fea9030}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f045fea9030}, NULL, 8) = 0
futex(0x7f045f4001a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(1, TCSBRK, 1)                     = 0
write(2, "Cannot find thread-local storage"..., 171) = 171
write(2, "generic error", 13)           = 13
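To scan a trace like this for missing files, it can help to pull out only the openat calls and their return values. The sketch below is a hypothetical helper (the name opened_files is not from the thread) and assumes the default strace line format shown above. In the excerpt above, all three openat calls succeed (descriptors 106, 107 and 108), so no debug file is reported missing there.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "/path", O_RDONLY) = 106
#   openat(AT_FDCWD, "/path", O_RDONLY) = -1 ENOENT (No such file or directory)
OPENAT = re.compile(
    r'^openat\(AT_FDCWD, "(?P<path>[^"]+)", [^)]*\)\s*=\s*(?P<ret>-?\d+)'
)


def opened_files(strace_text: str) -> list[tuple[str, int]]:
    """Return (path, return_value) for every openat line; -1 means failure."""
    result = []
    for line in strace_text.splitlines():
        m = OPENAT.match(line.strip())
        if m:
            result.append((m.group("path"), int(m.group("ret"))))
    return result
```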

Also, everything in the backtrace is "optimized out", which is unusual. It may be a problem with matching debug info.

When debugging the same crash on a non-relocatable binary, plenty of things in the backtrace are not optimized out.

@bhalevy
Member

bhalevy commented Jul 24, 2019

Cc @espindola
@tgrabiec, does set sysroot help gdb locate the library as a workaround?

@tgrabiec
Contributor Author

@bhalevy I think you could use set sysroot, but I just copied the correct libthread_db from the build machine into /usr/lib/debug/opt/scylladb/libreloc/libthread_db.so.1 where it's expected to be (and after Takuya's fix, should always be) and GDB found it. The issue is no longer about that.

@slivne
Contributor

slivne commented Aug 14, 2019

@tgrabiec / @avi do we have a workaround for this, e.g. will we be able to analyze coredumps?

@tgrabiec
Contributor Author

@slivne I'm not aware of any workaround. @espindola, did you have a chance to look at this?

slivne removed the high label Aug 14, 2019
@slivne
Contributor

slivne commented Aug 14, 2019 via email

@tgrabiec
Contributor Author

tgrabiec commented Aug 14, 2019 via email

@avikivity
Member

What about using the frozen toolchain? It even has gdb installed.

@tgrabiec
Contributor Author

tgrabiec commented Aug 14, 2019 via email

@avikivity
Member

I guess it happens because the core contains links to paths which no longer exist. Or perhaps rpm's post-processing to generate split debuginfo got confused.

@avikivity
Member

I'll try to reproduce to see which is which.

@espindola
Contributor

> @slivne I'm not aware of any workaround. @espindola, did you have a chance to look at this?

No. I came back from vacation on Monday and have been going over review requests on the types.hh refactoring.

@avikivity
Member

Well, one problem is that gdb thinks the binary is ld.so. It is correct in thinking so, but we need to trick it into thinking the binary is scylla.bin.

@avikivity
Member

(that problem happens with gdb -p, maybe not with core dumps)
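One way to check what an attaching debugger is likely to see is to read the /proc/<pid>/exe link, which reflects the executable the kernel recorded for the process. The sketch below assumes Linux; both helper names are hypothetical, and the file-name pattern for the dynamic loader is only a heuristic based on common glibc naming.

```python
import os
import re

# Heuristic: glibc's loader is usually named ld-linux-<arch>.so.N or ld-N.NN.so.
LOADER_NAME = re.compile(r"^ld-(linux.*\.so\.\d+|\d+\.\d+\.so)$")


def looks_like_dynamic_loader(exe_path: str) -> bool:
    """True if the file name resembles the dynamic loader, not an application."""
    return LOADER_NAME.match(os.path.basename(exe_path)) is not None


def recorded_executable(pid: int) -> str:
    """Linux only: the path the kernel recorded as the process executable."""
    return os.readlink(f"/proc/{pid}/exe")
```

If looks_like_dynamic_loader(recorded_executable(pid)) is true for a scylla process, the process was started through the loader, and gdb needs to be told explicitly which file is the real executable.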

@avikivity
Member

With gdb libexec/scylla.bin $(pgrep scylla) I get nice backtraces:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000000002d0a72a in seastar::internal::io_pgetevents (io_context=140286094229504, min_nr=min_nr@entry=1, nr=nr@entry=128, events=events@entry=0x7ffda2325c00, timeout=<optimized out>, sigmask=sigmask@entry=0x600000020020, force_syscall=<optimized out>)
    at /home/avi/scylla/seastar/src/core/linux-aio.cc:147
#2  0x0000000002a96253 in seastar::reactor_backend_aio::await_events (this=this@entry=0x600000183500, timeout=timeout@entry=-1, active_sigmask=active_sigmask@entry=0x600000020020) at /home/avi/scylla/seastar/src/core/reactor.cc:879
#3  0x0000000002a96ace in seastar::reactor_backend_aio::wait_and_process (this=0x600000183500, timeout=-1, active_sigmask=0x600000020020) at /home/avi/scylla/seastar/src/core/reactor.cc:944
#4  0x000000000299a385 in seastar::reactor::wait_and_process (active_sigmask=0x600000020020, timeout=-1, this=0x600000020000) at /usr/include/c++/9/bits/unique_ptr.h:357
#5  seastar::reactor::sleep (this=0x600000020000) at /home/avi/scylla/seastar/src/core/reactor.cc:4475
#6  seastar::reactor::sleep (this=0x600000020000) at /home/avi/scylla/seastar/src/core/reactor.cc:4465
#7  0x0000000002a5c1bc in seastar::reactor::run (this=0x600000020000) at /home/avi/scylla/seastar/src/core/reactor.cc:4442
#8  0x0000000002942d6e in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) (this=<optimized out>, ac=<optimized out>, av=<optimized out>, func=...) at /home/avi/scylla/seastar/include/seastar/core/reactor.hh:918
#9  0x0000000002943c7f in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) (this=this@entry=0x7ffda2327630, ac=ac@entry=3, av=av@entry=0x7ffda2327960, func=...) at /usr/include/c++/9/bits/std_function.h:87
#10 0x00000000006cc1d5 in main (ac=3, av=0x7ffda2327960) at /usr/include/c++/9/bits/std_function.h:87
(gdb) 

In fact, they're even nicer because gdb colorizes them.

@gleb-cloudius
Contributor

gleb-cloudius commented Aug 15, 2019 via email

@avikivity
Member

We don't know the interpreter path during link time.

@gleb-cloudius
Contributor

gleb-cloudius commented Aug 15, 2019 via email

@avikivity
Member

Yes, but patchelf itself is dynamically linked. What we can do is apply the ld.so trick to patchelf, then use patchelf during installation to adjust the binary.

@glommer
Contributor

glommer commented Aug 15, 2019

Will there ever be a binary that we don't relocate?

Why can't we recompile patchelf statically? Does it really need to be dynamically linked?

@glommer
Contributor

glommer commented Aug 15, 2019

Also, if we use relative paths, can't we know the path to the interpreter at build time?
Even with patchelf, I am using relative paths for python, and as long as we keep the directory structure intact (which I think is a reasonable requirement) everything works.

So we could compile scylla with RPATH set to $ORIGIN/../lib/

@avikivity
Member

The interpreter has to be an absolute path.

avikivity added a commit to avikivity/scylladb that referenced this issue Aug 15, 2019
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

Fixes scylladb#4673.
@espindola
Contributor

> The interpreter has to be an absolute path.

I was just checking that. The kernel passes the PT_INTERP string to open_exec(elf_interpreter), and that becomes do_open_execat(AT_FDCWD, filename, 0), so it should work with relative paths, no?

@espindola
Contributor

Relative paths do work. I just created a file with
[Requesting program interpreter: ../..//lib64/ld-linux-x86-64.so.2]
and it runs just fine.

@tgrabiec
Contributor Author

> With gdb libexec/scylla.bin $(pgrep scylla) I get nice backtraces:

But do thread-locals work? The issue was with thread locals, not lack of backtraces.

@avikivity
Member

@espindola the paths are relative to $CWD, while we want them to be relative to $ORIGIN.
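The difference between the two bases can be illustrated with a small path computation. This is a sketch of ordinary path resolution only, not of kernel code: the function name is hypothetical, the two working directories in the usage note are made up, and normpath collapses ".." lexically without following symlinks. The point is that a relative interpreter string is looked up from the working directory of the process that calls execve(), while $ORIGIN is something the dynamic loader expands in library search paths, so it is not available for the interpreter itself.

```python
import posixpath


def resolve_interpreter(interp: str, cwd: str) -> str:
    """Resolve an interpreter string like a plain path lookup would:
    absolute paths are used as-is, relative ones are joined to the
    current working directory of the calling process."""
    return posixpath.normpath(posixpath.join(cwd, interp))
```

With the interpreter string from the experiment above, "../..//lib64/ld-linux-x86-64.so.2", starting the binary from /opt/scylladb resolves to /lib64/ld-linux-x86-64.so.2, while starting it from /var/lib/scylla resolves to /var/lib64/ld-linux-x86-64.so.2. The same binary finds a different loader, or none, depending on where it is launched.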

@avikivity
Member

I wrote https://github.com/avikivity/scylla/commits/patchelf, which should fix the problem, except that it triggers a bug (in patchelf or debugedit) so we can't create rpms any more.

@avikivity
Member

@tgrabiec I think they don't, because the executable (as far as gdb is concerned) is ld.so instead of scylla.

@avikivity
Member

patchelf --set-interpreter does not trigger the bug. patchelf --set-rpath does, with either $ORIGIN or a full path. We could set rpath in the linker command line, but then ./build/release/scylla wouldn't work any more.

@avikivity
Member

I'll just drop rpath modifications and rely on LD_LIBRARY_PATH. That means the binary has to be called through the thunk, but we have to have that for GNUTLS_SYSTEM_PRIORITY_FILE.
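The approach described here (set only the interpreter with patchelf at install time, and let a thunk script supply LD_LIBRARY_PATH and GNUTLS_SYSTEM_PRIORITY_FILE) could look roughly like the sketch below. It only builds the command line and the script text and does not run anything. The function names and every path in the usage note are illustrative assumptions, not the contents of the actual install.sh.

```python
import shlex


def patchelf_set_interpreter(patchelf: str, loader: str, binary: str) -> list[str]:
    """argv for rewriting the interpreter path of `binary` at install time."""
    return [patchelf, "--set-interpreter", loader, binary]


def thunk_script(binary: str, libdir: str, gnutls_priority_file: str) -> str:
    """Text of a bash wrapper that sets the environment and execs the binary."""
    return "\n".join([
        "#!/bin/bash",
        f"export GNUTLS_SYSTEM_PRIORITY_FILE={shlex.quote(gnutls_priority_file)}",
        f"export LD_LIBRARY_PATH={shlex.quote(libdir)}",
        # exec -a keeps the wrapper's name as argv[0] of the real binary.
        f'exec -a "$0" {shlex.quote(binary)} "$@"',
        "",
    ])
```

For example, patchelf_set_interpreter("patchelf", "/opt/scylladb/libreloc/ld.so", "/opt/scylladb/libexec/scylla") yields the install-time command, and thunk_script("/opt/scylladb/libexec/scylla", "/opt/scylladb/libreloc", "/opt/scylladb/libreloc/gnutls.config") yields the wrapper; the ld.so and gnutls.config file names are placeholders.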

@gleb-cloudius
Contributor

gleb-cloudius commented Aug 18, 2019 via email

@avikivity
Member

We already did that (with exec -a), but I think the kernel records the true binary.

@avikivity
Member

Strangely, my hack failed testing, but it passed testing with -ex set solib-absolute-path ... passed to gdb. I will check whether both the hack and the command are needed, or only solib-absolute-path.

@avikivity
Member

Looks like both are needed.
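A gdb invocation that combines the explicit executable from the earlier comment with the solib-absolute-path setting might be assembled as in the sketch below. The helper name is hypothetical, and because the value passed to solib-absolute-path is elided in the thread, it is left as a parameter here.

```python
def gdb_attach_argv(executable: str, pid: int, solib_root: str) -> list[str]:
    """Build argv for attaching gdb to `pid` with an explicit executable
    and an explicit prefix for locating shared libraries."""
    return [
        "gdb",
        "-ex", f"set solib-absolute-path {solib_root}",
        executable,
        str(pid),
    ]
```

The resulting list can be handed to subprocess.run() or printed for copy and paste. Naming the executable explicitly matters because gdb would otherwise fall back to whatever the kernel recorded for the process.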

avikivity added a commit to avikivity/scylladb that referenced this issue Aug 18, 2019
@avikivity
Member

With a bit of extra help, it works on .deb too.

avikivity added a commit to avikivity/scylladb that referenced this issue Aug 18, 2019
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes scylladb#4673.
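The dh_strip exclusion mentioned in this commit message can be expressed with the tool's --exclude option, which skips files whose name contains the given string. The sketch below only builds the argument list. The helper name is hypothetical, and since the thread does not list which binaries are excluded, the names in the usage note are placeholders.

```python
def dh_strip_argv(excluded: list[str]) -> list[str]:
    """Build a dh_strip command line that skips the given file-name fragments."""
    argv = ["dh_strip"]
    for item in excluded:
        # dh_strip matches `item` anywhere in the file name.
        argv.append(f"--exclude={item}")
    return argv
```

For instance, dh_strip_argv(["patchelf", "ld.so"]) produces a command that leaves files matching those two fragments unstripped.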
tgrabiec pushed a commit that referenced this issue Aug 19, 2019
tgrabiec pushed a commit that referenced this issue Aug 19, 2019
avikivity added a commit to avikivity/scylladb that referenced this issue Aug 20, 2019
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes scylladb#4673.

(cherry-picked from commit 698b72b)

Backport notes:
 - 3.1 doesn't call install.sh from the debian packager, so add an adjust_bin
   and call it from the debian rules file directly
 - adjusted install.sh for 3.1 prefix (/usr) compared to master prefix (/opt/scylladb)
avikivity added a commit that referenced this issue Aug 20, 2019