[qnx] Cross-compiling to QNX #1429

Closed
ibogosavljevic opened this issue Sep 8, 2023 · 38 comments

Comments

@ibogosavljevic

I want to use the CPU profiler on QNX, but I cannot cross-compile it. I did the following:

./configure --host=aarch64-unknown-nto-qnx7.1.0

After make, I get the following error:

/bin/bash ./libtool  --tag=CXX   --mode=compile aarch64-unknown-nto-qnx7.1.0-g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I./src  -I./src   -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -Wno-unused-result -fsized-deallocation -faligned-new  -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNO_HEAP_CHECK -DENABLE_EMERGENCY_MALLOC -DTCMALLOC_FOR_DEBUGALLOCATION -g -O2 -MT src/libtcmalloc_debug_la-debugallocation.lo -MD -MP -MF src/.deps/libtcmalloc_debug_la-debugallocation.Tpo -c -o src/libtcmalloc_debug_la-debugallocation.lo `test -f 'src/debugallocation.cc' || echo './'`src/debugallocation.cc
libtool: compile:  aarch64-unknown-nto-qnx7.1.0-g++ -std=gnu++11 -DHAVE_CONFIG_H -I. -I./src -I./src -pthread -DNDEBUG -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -Wno-unused-result -fsized-deallocation -faligned-new -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNO_HEAP_CHECK -DENABLE_EMERGENCY_MALLOC -DTCMALLOC_FOR_DEBUGALLOCATION -g -O2 -MT src/libtcmalloc_debug_la-debugallocation.lo -MD -MP -MF src/.deps/libtcmalloc_debug_la-debugallocation.Tpo -c src/debugallocation.cc  -fPIC -shared -DPIC -o src/.libs/libtcmalloc_debug_la-debugallocation.o
In file included from src/libc_override.h:92,
                 from src/tcmalloc.cc:144,
                 from src/debugallocation.cc:85:
src/libc_override_gcc_and_weak.h:215:8: error: conflicting declaration of C function 'void cfree(void*)'
   void cfree(void* ptr) __THROW                   ALIAS(tc_cfree);
        ^~~~~
In file included from src/debugallocation.cc:44:
/home/ivica/qnx710/target/qnx7/usr/include/malloc.h:64:12: note: previous declaration 'int cfree(void*)'
 extern int cfree(void *__ptr);
            ^~~~~
In file included from src/libc_override.h:92,
                 from src/tcmalloc.cc:144,
                 from src/debugallocation.cc:85:
src/libc_override_gcc_and_weak.h:225:7: error: conflicting declaration of C function 'int mallopt(int, int)'
   int mallopt(int cmd, int value) __THROW         ALIAS(tc_mallopt);
       ^~~~~~~
In file included from src/debugallocation.cc:44:
/home/ivica/qnx710/target/qnx7/usr/include/malloc.h:140:12: note: previous declaration 'int mallopt(int, intptr_t)'
 extern int mallopt(int __cmd, intptr_t __value);
            ^~~~~~~
src/debugallocation.cc: In function 'void* DebugAllocate(size_t, int)':
src/debugallocation.cc:1011:30: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'pthread_t' {aka 'int'} [-Wformat=]
       TracePrintf(TraceFd(), "%s\t%zu\t%p\t%" GPRIuPTHREAD,      \
                   name, size, addr, PRINTABLE_PTHREAD(pthread_self())); \
                                                                   ~
src/debugallocation.cc:1048:3: note: in expansion of macro 'MALLOC_TRACE'
   MALLOC_TRACE("malloc", size, ptr->data_addr());
   ^~~~~~~~~~~~
src/debugallocation.cc: In function 'void DebugDeallocate(void*, int, size_t)':
src/debugallocation.cc:1011:30: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'pthread_t' {aka 'int'} [-Wformat=]
       TracePrintf(TraceFd(), "%s\t%zu\t%p\t%" GPRIuPTHREAD,      \
                   name, size, addr, PRINTABLE_PTHREAD(pthread_self())); \
                                                                   ~
src/debugallocation.cc:1053:3: note: in expansion of macro 'MALLOC_TRACE'
   MALLOC_TRACE("free",
   ^~~~~~~~~~~~
src/debugallocation.cc: In function 'void* tc_realloc(void*, size_t)':
src/debugallocation.cc:1011:30: warning: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'pthread_t' {aka 'int'} [-Wformat=]
       TracePrintf(TraceFd(), "%s\t%zu\t%p\t%" GPRIuPTHREAD,      \
                   name, size, addr, PRINTABLE_PTHREAD(pthread_self())); \
                                                                   ~
src/debugallocation.cc:1320:3: note: in expansion of macro 'MALLOC_TRACE'
   MALLOC_TRACE("realloc", p->actual_data_size(p->data_addr()), p->data_addr());
   ^~~~~~~~~~~~
Makefile:4902: recipe for target 'src/libtcmalloc_debug_la-debugallocation.lo' failed

Is there a way to compile only CPU profiler for QNX instead of everything?

@tangzhiqiang3

Same error here! As a temporary workaround I commented out these lines:
// void cfree(void* ptr) __THROW ALIAS(tc_cfree);
// int mallopt(int cmd, int value) __THROW ALIAS(tc_mallopt);
// cfree(p1); // synonym for free
With that, the build succeeds and I get the library.

@tangzhiqiang3

But then I get this error:
apache/brpc#1979

@ibogosavljevic
Author

Ok, I did as you suggested and the thing compiles, but at the end of the compilation I don't get libprofiler.so. I even tried with:
./configure --host=aarch64-unknown-nto-qnx7.1.0 --enable-cpu-profiler
but it didn't help.

@sebaleme

sebaleme commented Sep 18, 2023

Hi, I am also trying to build this repo for QNX/Arm. Where are you getting the POSIX headers from? I have a Linux workstation, and all the low-level headers come from:
/usr/include/x86_64-linux-gnu/
When I build for QNX/Arm, I am not sure whether these headers are provided by the cross-compiler, or whether I need to download another repository (and update my CMake configuration). Here is the list of headers that I need:

#include <bits/types.h>
#include <bits/types/sigset_t.h>
#include <bits/types/stack_t.h>
#include <sys/syscall.h>
#include <sys/ucontext.h>

It would be great if you could explain how you got them on your system!

@ibogosavljevic
Author

ibogosavljevic commented Sep 18, 2023 via email

@sebaleme

sebaleme commented Sep 18, 2023

OK, thanks, I found some. The ucontext.h header is in the cross-compiler folder, but the others are not available:
#include <bits/types.h> ==> not found
#include <bits/types/sigset_t.h> ==> not found
#include <bits/types/stack_t.h> ==> not found
#include <sys/syscall.h> ==> replaced by unistd.h, found (not all content available though)
#include <sys/ucontext.h> ==> found

Now, the main configuration issue is access to the PC register. On x86, it is defined as follows:

/* How to access the PC from a struct ucontext */
#define PC_FROM_UCONTEXT uc_mcontext.gregs[REG_RIP]

In my case, for Arm, REG_RIP is not defined, so I have to find a workaround. I tried this and it did not work:

/* How to access the PC from a struct ucontext */
#define PC_FROM_UCONTEXT ARM_GET_REGIP(uc_mcontext.cpu)

@alk
Contributor

alk commented Sep 18, 2023

Ok. I see some interest in QNX, and I have a few hours I can spare to help.

But I need someone to help me get the QNX cross-compilation toolchain, or at least give me some pointers. I just got myself a QNX account (they insist that I register...), but I am frankly struggling to locate anything I can download. Instead of wasting more of my time, I think I'll just ask.

So tell me how I can get the QNX cross toolchain for GNU/Linux, and I'll help you get gperftools building, and maybe even running, on this OS.

@sebaleme

Hi, thanks for the help. I am also using QNX SDP 7.1. You can check at any time whether your configuration is correct by calling the QNX compiler (qcc). Note that there is a license check when using it, so you need an active QNX license available on your host. However, I do not know exactly how to install the SDP manually, since it was preinstalled in the (private) Docker container I am using.

@alk
Contributor

alk commented Sep 18, 2023

Okay. I think I'll eventually get it. This is what I'm getting on their website :)



Account Still Being Processed: Your myQNX account is still being processed. We thank you for your patience. Please check back in 47.133333333333 minutes

Well, I'll wait then.

@sebaleme

sebaleme commented Sep 19, 2023

Hi,
Here are the steps I followed to compile on Linux x86. I am using CMake; I briefly tried the autoconf build, but could not really make it work.
cd /home/lsm1so/workspace/
git clone https://github.com/gperftools/gperftools.git
cd gperftools
cmake . -DCMAKE_BUILD_TYPE=Release -G "Unix Makefiles" -B./build_x86

Then I had to modify the CMake files, because some libraries failed to link with:
/usr/bin/ld: libprofiler.so.5.9.5: undefined reference to pthread_key_create
/usr/bin/ld: libprofiler.so.5.9.5: undefined reference to pthread_key_delete
/usr/bin/ld: libprofiler.so.5.9.5: undefined reference to pthread_setspecific

First, I changed CMakeLists.txt to build only what I need:
set(DEFAULT_BUILD_HEAP_PROFILER OFF)
set(DEFAULT_BUILD_DEBUGALLOC OFF)

Two tests were still complaining:

  • profiledata_unittest
  • profiler2_unittest

I solved the first issue by adding the "pthread" dependency at line 1298 (defined under the LIBPROFILER label).
For profiler2_unittest, since the compile definitions said NO_THREAD, I did not understand why it would try to link against pthread, so I commented out this test in CMakeLists.txt.
And that's it: I have a Linux/x86 shared library, and I can generate my profiler files.

Now, for QNX/aarch64, it has not worked that easily. The main issue is how to access the PC, and here I can only guess how to do it. This code compiles, but is it the correct register?

/* How to access the PC from a struct ucontext */
#define PC_FROM_UCONTEXT uc_mcontext.cpu.gpr[15]

Here are the steps I followed:
export CC=aarch64-unknown-nto-qnx7.1.0-gcc
export CXX=aarch64-unknown-nto-qnx7.1.0-g++
cmake . -DCMAKE_BUILD_TYPE=Release -G "Unix Makefiles" -B./build_aarch64
cmake --build ./build_aarch64

And then, finally, the same error as everyone else:

[ 18%] Built target tcmalloc_minimal_internal_object
[ 19%] Building CXX object CMakeFiles/tcmalloc_minimal.dir/src/tcmalloc.cc.o
In file included from /workspace/bagfile/gperftools/src/libc_override.h:92,
from /workspace/bagfile/gperftools/src/tcmalloc.cc:144:
/workspace/bagfile/gperftools/src/libc_override_gcc_and_weak.h:215:8: error: conflicting declaration of C function 'void cfree(void*)'
void cfree(void* ptr) __THROW ALIAS(tc_cfree);

This error is not very serious; it just means that QNX declares cfree with a different signature (int cfree(void*)) than other systems do.
After that, many tests did not build, so I deactivated them (not really sure why they were built, since I set most of them to OFF in the CMake configuration). Then the pthread dependency I had added before could not be found in the QNX environment. Then the SA_RESTART flag, which is also not defined in QNX. The header that would contain it says explicitly:

x86: sigaction.h  ==> arm: signal.h
/* #define SA_RESTART      0x0040 (not supported yet) */    /* Restart the kernel call on signal return */

After a few other deactivations, I am able to build the library and link it to my application. However, testing on the target did not work:

  • my program is executed, without crashing
  • it computes its output as usual
  • the profiler file is almost empty, so I guess the call stack acquisition is not working (same test on x86 yields 20 000 samples and a file of 1.5MB)


@sebaleme

sebaleme commented Sep 19, 2023

So, to sum up, the questions would be:

  • How can we access the PC register? Is there a unit test which checks that the access is successful, i.e. that we are getting the call stack and not random values?
  • Is it a problem not to set the SA_RESTART flag? Is it mandatory? I just removed the flag in the ProfileHandler() constructor, assuming it is not mandatory for profiling.

My use case is only runtime profiling. If you could explain the code to me, maybe I could also try other things by myself?

@iDings
Contributor

iDings commented Sep 20, 2023

@sebaleme I ported some gperftools features to QNX: heap profiling and CPU profiling. The self-tests just work on QNX aarch64.
Maybe you can give it a try: iDings@d24ed12.

Porting already done:

  • ignore SA_RESTART
  • get map info from /proc/self/pmap
  • QNX deprecates ITIMER_PROF, so use ITIMER_REAL directly
  • add a check for the PC register field in the uc_mcontext struct

For stack traces with libunwind, there are still some problems.

For stack traces with generic_fp, it just works, but maybe not every backtrace is correct. On QNX aarch64 the compiler's default flags include -fomit-frame-pointer -momit-leaf-frame-pointer; for an application it's fine to override this, but QNX's handcrafted assembly functions, such as QNX kernel calls, also seem to have no frame pointer.
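Why missing frame pointers matter: a frame-pointer unwinder such as generic_fp follows the chain of saved frame pointers. On AArch64 (AAPCS64) the record that x29 points to holds the caller's x29 followed by the saved x30 (return address), so a function built without a frame record leaves a gap or garbage in the chain. The sketch below walks such a chain with simple sanity checks; it is not gperftools' implementation, and it runs on a synthetic chain (demo_walk is a hypothetical helper) so the logic can be tested on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* AArch64 frame record: x29 points at {caller's x29, saved x30 (return address)}.
   x86-64 frames built with rbp have the same two-word layout. */
struct frame_record {
  const struct frame_record *prev;
  uintptr_t return_addr;
};

/* Follows the frame-pointer chain from `fp`, storing up to `max_depth` return
   addresses in `out`. Stops at a NULL link, a misaligned record, or a link that
   does not point upward (stacks grow down) by less than 1 MiB. */
int walk_frame_chain(const struct frame_record *fp, uintptr_t *out, int max_depth) {
  int n = 0;
  while (fp != NULL && n < max_depth) {
    if ((uintptr_t)fp % sizeof(void *) != 0) break;
    out[n++] = fp->return_addr;
    uintptr_t cur = (uintptr_t)fp;
    uintptr_t next = (uintptr_t)fp->prev;
    if (next != 0 && (next <= cur || next - cur > ((uintptr_t)1 << 20))) break;
    fp = fp->prev;
  }
  return n;
}

/* Hypothetical demo: a synthetic 3-frame chain laid out like a downward-growing
   stack. With break_chain set, the middle link points downward, as garbage might
   if a function never saved a frame pointer, so the walk stops early. */
int demo_walk(int break_chain) {
  struct frame_record frames[3];
  frames[0].prev = &frames[1];
  frames[0].return_addr = 0x1000;
  frames[1].prev = break_chain ? &frames[0] : &frames[2];
  frames[1].return_addr = 0x2000;
  frames[2].prev = NULL;
  frames[2].return_addr = 0x3000;
  uintptr_t pcs[8];
  return walk_frame_chain(&frames[0], pcs, 8);
}
```

The sanity checks only stop the walk safely; they cannot recover the frames hidden behind a function that has no frame record, which is why such backtraces come out short or wrong.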

Tested with this autoconf command, taken from #1147 (comment):

./configure --host=aarch64-unknown-nto-qnx7.1.0 --enable-frame-pointers CPPFLAGS="-D__NO_EXT_QNX -D_SC_NPROCESSORS_ONLN=91"

@sebaleme

sebaleme commented Sep 20, 2023

Hi, thanks for the reply!!
With your commit, I was able to successfully run the unit test getpc_test.cc, so I can read the PC register in my configuration (QNX AArch64):
#define PC_FROM_UCONTEXT uc_mcontext.cpu.elr
It is also important to note that the SIGPROF signal used in the code does not work on QNX, so it has to be replaced.
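A small self-check, weaker than getpc_test.cc (it only verifies that a non-zero PC can be read from the signal context), sketched under assumptions: the QNX field name uc_mcontext.cpu.elr is the one reported in this thread, the Linux field names are the glibc ones (REG_RIP needs _GNU_SOURCE), and pc_from_ucontext / sample_pc_once are hypothetical helpers. It installs an SA_SIGINFO handler, raises a signal, and reads the interrupted PC from the ucontext:

```c
#define _GNU_SOURCE /* glibc only exposes REG_RIP with this */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

static volatile uintptr_t g_last_pc;

/* Reads the interrupted program counter from the signal context. */
static uintptr_t pc_from_ucontext(const ucontext_t *uc) {
#if defined(__QNXNTO__) && defined(__aarch64__)
  return (uintptr_t)uc->uc_mcontext.cpu.elr; /* field reported in this thread */
#elif defined(__linux__) && defined(__x86_64__)
  return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__linux__) && defined(__aarch64__)
  return (uintptr_t)uc->uc_mcontext.pc;
#else
  (void)uc;
  return 0; /* unknown platform: no PC available */
#endif
}

static void on_signal(int sig, siginfo_t *info, void *ctx) {
  (void)sig;
  (void)info;
  g_last_pc = pc_from_ucontext((const ucontext_t *)ctx);
}

/* Raises SIGUSR1 once and returns the PC seen by the handler (0 on failure). */
uintptr_t sample_pc_once(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = on_signal;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, NULL) != 0) return 0;
  g_last_pc = 0;
  raise(SIGUSR1); /* POSIX: the handler runs before raise() returns */
  return g_last_pc;
}
```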

Now, when I run the profiler, it still generates an empty file, so some problems remain. Can you share your config.h? I have the impression that CMake does not generate it properly:


/* Sometimes we accidentally #include this config.h instead of the one
   in .. -- this is particularly true for msys/mingw, which uses the
   unix config.h but also runs code in the windows directory.
   */
#ifdef __MINGW32__
#include "../config.h"
#define GOOGLE_PERFTOOLS_WINDOWS_CONFIG_H_
#endif

#ifndef GOOGLE_PERFTOOLS_WINDOWS_CONFIG_H_
#define GOOGLE_PERFTOOLS_WINDOWS_CONFIG_H_
/* used by tcmalloc.h */
#define GPERFTOOLS_CONFIG_H_

/* Enable aggressive decommit by default */
/* #undef ENABLE_AGGRESSIVE_DECOMMIT_BY_DEFAULT */

/* Build new/delete operators for overaligned types */
/* #undef ENABLE_ALIGNED_NEW_DELETE */

/* Build runtime detection for sized delete */
/* #undef ENABLE_DYNAMIC_SIZED_DELETE */

/* Report large allocation */
/* #undef ENABLE_LARGE_ALLOC_REPORT */

/* Build sized deletion operators */
/* #undef ENABLE_SIZED_DELETE */

/* Define to 1 if you have the <asm/ptrace.h> header file. */
/* #undef HAVE_ASM_PTRACE_H */

/* Define to 1 if you have the <cygwin/signal.h> header file. */
/* #undef HAVE_CYGWIN_SIGNAL_H */

/* Define to 1 if you have the declaration of `backtrace', and to 0 if you
   don't. */
#define HAVE_DECL_BACKTRACE 0

/* Define to 1 if you have the declaration of `cfree', and to 0 if you don't.
   */
#define HAVE_DECL_CFREE 0

/* Define to 1 if you have the declaration of `memalign', and to 0 if you
   don't. */
#define HAVE_DECL_MEMALIGN 0

/* Define to 1 if you have the declaration of `nanosleep', and to 0 if you
   don't. */
#define HAVE_DECL_NANOSLEEP 0

/* Define to 1 if you have the declaration of `posix_memalign', and to 0 if
   you don't. */
#define HAVE_DECL_POSIX_MEMALIGN 1

/* Define to 1 if you have the declaration of `pvalloc', and to 0 if you
   don't. */
#define HAVE_DECL_PVALLOC 0

/* Define to 1 if you have the declaration of `sleep', and to 0 if you don't.
   */
#define HAVE_DECL_SLEEP 0

/* Define to 1 if you have the declaration of `valloc', and to 0 if you don't.
   */
#define HAVE_DECL_VALLOC 1

/* Define to 1 if you have the <execinfo.h> header file. */
/* #undef HAVE_EXECINFO_H */

/* Define to 1 if you have the <fcntl.h> header file. */
#define HAVE_FCNTL_H

/* Define to 1 if you have the <features.h> header file. */
/* #undef HAVE_FEATURES_H */

/* Define to 1 if you have the `fork' function. */
#define HAVE_FORK

/* Define to 1 if you have the `geteuid' function. */
#define HAVE_GETEUID

/* Define to 1 if you have the <glob.h> header file. */
#define HAVE_GLOB_H

/* Define to 1 if you have the <grp.h> header file. */
#define HAVE_GRP_H

/* Define to 1 if you have the <libunwind.h> header file. */
#define HAVE_LIBUNWIND_H 0

/* #undef USE_LIBUNWIND */

/* Define if this is Linux that has SIGEV_THREAD_ID */
#define HAVE_LINUX_SIGEV_THREAD_ID 0

/* Define to 1 if you have the <malloc.h> header file. */
#define HAVE_MALLOC_H

/* Define to 1 if you have the <malloc/malloc.h> header file. */
/* #undef HAVE_MALLOC_MALLOC_H */

/* Define to 1 if you have a working `mmap' system call. */
#define HAVE_MMAP

/* Define to 1 if you have the <poll.h> header file. */
#define HAVE_POLL_H

/* define if libc has program_invocation_name */
/* #undef HAVE_PROGRAM_INVOCATION_NAME */

/* Define if you have POSIX threads libraries and header files. */
#define HAVE_PTHREAD

/* defined to 1 if pthread symbols are exposed even without include pthread.h
   */
/* #undef HAVE_PTHREAD_DESPITE_ASKING_FOR */

/* Define to 1 if you have the <pwd.h> header file. */
#define HAVE_PWD_H

/* Define to 1 if you have the `sbrk' function. */
#define HAVE_SBRK

/* Define to 1 if you have the <sched.h> header file. */
#define HAVE_SCHED_H

/* Define to 1 if the system has the type `struct mallinfo'. */
#define HAVE_STRUCT_MALLINFO

/* Define to 1 if you have the <sys/cdefs.h> header file. */
#define HAVE_SYS_CDEFS_H

/* Define to 1 if you have the <sys/malloc.h> header file. */
/* #undef HAVE_SYS_MALLOC_H */

/* Define to 1 if you have the <sys/resource.h> header file. */
#define HAVE_SYS_RESOURCE_H

/* Define to 1 if you have the <sys/socket.h> header file. */
#define HAVE_SYS_SOCKET_H

/* Define to 1 if you have the <sys/syscall.h> header file. */
#define HAVE_SYS_SYSCALL_H 0

/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H

/* Define to 1 if you have the <sys/ucontext.h> header file. */
#define HAVE_SYS_UCONTEXT_H 0

/* Define to 1 if you have the <sys/wait.h> header file. */
#define HAVE_SYS_WAIT_H

/* Define to 1 if compiler supports __thread */
#define HAVE_TLS

/* Define to 1 if you have the <ucontext.h> header file. */
#define HAVE_UCONTEXT_H 1

/* Define to 1 if you have the <unistd.h> header file. */
#define HAVE_UNISTD_H

/* Whether <unwind.h> contains _Unwind_Backtrace */
#define HAVE_UNWIND_BACKTRACE

/* Define to 1 if you have the <unwind.h> header file. */
#define HAVE_UNWIND_H

/* define if your compiler has __attribute__ */
#define HAVE___ATTRIBUTE__

/* define if your compiler supports alignment of functions */
/* #undef HAVE___ATTRIBUTE__ALIGNED_FN */

/* Define to 1 if compiler supports __environ */
/* #undef HAVE___ENVIRON */

/* Define to 1 if you have the `__sbrk' function. */
#define HAVE___SBRK 0

/* prefix where we look for installed files */
/* #undef INSTALL_PREFIX */

/* Define to the sub-directory where libtool stores uninstalled libraries. */
/* #undef LT_OBJDIR */

/* Name of package */
#define PACKAGE "gperftools"

/* Define to the address where bug reports for this package should be sent. */
#define PACKAGE_BUGREPORT "gperftools@googlegroups.com"

/* Define to the full name of this package. */
#define PACKAGE_NAME "gperftools"

/* Define to the full name and version of this package. */
#define PACKAGE_STRING "gperftools 2.13"

/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME "gperftools"

/* Define to the home page for this package. */
/* #undef PACKAGE_URL */

/* Define to the version of this package. */
#define PACKAGE_VERSION "2.13"

/* How to access the PC from a struct ucontext */
#define PC_FROM_UCONTEXT uc_mcontext.cpu.elr

/* Always the empty-string on non-windows systems. On windows, should be
   "__declspec(dllexport)". This way, when we compile the dll, we export our
   functions/classes. It's safe to define this here because config.h is only
   used internally, to compile the DLL, and every DLL source file #includes
   "config.h" before anything else. */
#ifndef WIN32
/* #undef WIN32 */
#endif
#if defined(WIN32)
#ifndef PERFTOOLS_DLL_DECL
# define PERFTOOLS_IS_A_DLL  1
# define PERFTOOLS_DLL_DECL  __declspec(dllexport)
# define PERFTOOLS_DLL_DECL_FOR_UNITTESTS  __declspec(dllimport)
#endif
#else
#ifndef PERFTOOLS_DLL_DECL
# define PERFTOOLS_DLL_DECL
# define PERFTOOLS_DLL_DECL_FOR_UNITTESTS
#endif
#endif

/* Mark the systems where we know it's bad if pthreads runs too
   early before main (before threads are initialized, presumably).  */
#ifdef __FreeBSD__
#define PTHREADS_CRASHES_IF_RUN_TOO_EARLY 1
#endif

/* Define 8 bytes of allocation alignment for tcmalloc */
/* #undef TCMALLOC_ALIGN_8BYTES */

/* Define internal page size for tcmalloc as number of left bitshift */
/* #undef TCMALLOC_PAGE_SIZE_SHIFT */

/* Version number of package */
#define VERSION 2.13

/* C99 says: define this to get the PRI... macros from stdint.h */
#ifndef __STDC_FORMAT_MACROS
# define __STDC_FORMAT_MACROS 1
#endif

// ---------------------------------------------------------------------
// Extra stuff not found in config.h.in
#if defined(WIN32)

// This must be defined before the windows.h is included.  We need at
// least 0x0400 for mutex.h to have access to TryLock, and at least
// 0x0501 for patch_functions.cc to have access to GetModuleHandleEx.
// (This latter is an optimization we could take out if need be.)
#ifndef _WIN32_WINNT
# define _WIN32_WINNT 0x0501
#endif

// We want to make sure not to ever try to #include heap-checker.h
#define NO_HEAP_CHECK 1

// TODO(csilvers): include windows/port.h in every relevant source file instead?
#include "windows/port.h"

#endif
#endif  /* GOOGLE_PERFTOOLS_WINDOWS_CONFIG_H_ */

@iDings
Contributor

iDings commented Sep 20, 2023

@sebaleme Here is the config.h generated by an autoconf build without libunwind: config.h.txt

@alk
Contributor

alk commented Sep 20, 2023

Thanks for the updates.

So IMHO (emphasis on IMHO), there isn't much value, if any, in us trying to offer the CPU profiler for QNX if it doesn't support CPU-time timers (i.e. the SIGPROF mentioned above).

We do offer 'wall-clock' time profiling too, and it is occasionally useful. But in my experience only very few engineers know how to use it. I don't think people wanting to profile on QNX are likely to be among those few.

Am I misunderstanding anything? Let me know what you think.

@sebaleme

sebaleme commented Sep 21, 2023

Hi, here's the latest update.
Previously the profiler file was empty; now I get something. However, while the profiler files on Linux/x86 are 50-100KB, the ones on QNX/AArch64 are only 16KB, so I am not entirely sure I will get proper data.

Even so, when I try to process the profiler files on my x86 workstation, pprof complains because it tries to find the libraries via their paths on the QNX board (despite the fact that I give it the path to the executable).

The obvious solution would be to run pprof on the target, but I don't know how to do that. Of course, by default there is no Perl or Go environment on my QNX board. There is no Go installation for QNX/AArch64, so I tried Linux ARM64, but the tar command fails to unpack the archive on the target:

/tmp# tar -C /usr/local -xzf go1.21.1.linux-amd64.tar.gz 
tar: Error opening archive: Failed to open 'go1.21.1.linux-amd64.tar.gz'

@sebaleme

sebaleme commented Sep 21, 2023

> Thanks for the updates.
>
> So IMHO (emphasis on IMHO), there isn't much value, if any, in us trying to offer the CPU profiler for QNX if it doesn't support CPU-time timers (i.e. the SIGPROF mentioned above).
>
> We do offer 'wall-clock' time profiling too, and it is occasionally useful. But in my experience only very few engineers know how to use it. I don't think people wanting to profile on QNX are likely to be among those few.
>
> Am I misunderstanding anything? Let me know what you think.

I understand your position; properly supporting QNX would require some hardware and QNX licenses (which might not be possible for an open-source project). I would be fine if we just had the branch from @iDings classified as "experimental" for people like the ones in this discussion.

@sebaleme

OK, now I get something; the output looks like the excerpt below. I would say gperftools works; it is just that I get too many samples from idle cores, which prevents me from focusing on my software. I copied the libraries to my Linux PC and ran pprof locally, so now it finds the most important stuff. Although I gave it libc.so, it still can't tell me the function names, but it is possible that the libraries running on the board are release builds, so no debug info is available. It would be useful to filter for the core running my process, in order to get a more usable profiling result.

[libm.so.3] 1
[libc.so.5] 1
customerfunction::labelSection;customerfunction::pointsAreCompatible (inline) 1
[libc.so.5] 169460
[libc.so.5] 7
[libc.so.5] 73
[libm.so.3] 1
[libc.so.5] 3
[libc.so.5] 4
[librecompute_activation.so] 11
[libc.so.5] 122
[libc.so.5] 14
customerfunction::labelSection 1
[librecompute_activation.so] 1
[libc.so.5] 6
customerfunction::labelSection;UnionFindStructure::unite (inline) 1
customerfunction::labelSection 1
[libc.so.5] 3
customerfunction::labelSection;operator() (inline) 1
[libm.so.3] 1
[libc.so.5] 5
[librecompute_activation.so] 5
[libc.so.5] 2
[libc.so.5] 19
[libc.so.5] 5

@iDings
Contributor

iDings commented Sep 21, 2023

> Even so, when I try to process the profiler files on my x86 workstation, pprof complains because it tries to find the libraries via their paths on the QNX board (despite the fact that I give it the path to the executable).

PPROF_BINARY_PATH can combine multiple search paths, e.g. PPROF_BINARY_PATH=${QNX_TARGET}/aarch64le/lib/:${QNX_TARGET}/lib. Also, use the latest version of pprof from GitHub: I tried go tool pprof on Ubuntu, which seems not to be the latest version, and it can't map all addresses to symbols.

@sebaleme
Copy link

sebaleme commented Sep 21, 2023

Ok. One question: when we use Linux perf, we start it like this:
sudo timeout $1 perf record --freq 1000 -g --pid $2 --call-graph=dwarf
The second argument is the PID, so we are sure that we only profile one process.

I am not sure how this works for gperftools, since the only parameter we have to give is $CPUPROFILE_FREQUENCY. How does gperftools know which process ID it has to monitor? My guess would be that during init we read which core each thread runs on, and then only profile those cores. It might also be that when the interrupt occurs for reading the hardware register, we only record the values if we interrupted one of the process's threads.
I looked for this in the code and could not find anything yet. The file linuxthreads.h just checks the threads belonging to the selected process.

I ask this because I have the impression that profiling is not working on QNX. When I use gperftools on Linux, all the samples come from my process, but the fact that I get only 10 samples out of 170,000 for my process when profiling on QNX tells me that we might be profiling more than we should. When we checked the system load with Momentics, we saw around 1 core of utilisation (out of 8), so I cannot believe that the 170,000 samples from libc come from idle cores.

@iDings
Contributor

iDings commented Sep 22, 2023

> I am not sure how this works for gperftools, since the only parameter we have to give is $CPUPROFILE_FREQUENCY. How does gperftools know which process ID it has to monitor? My guess would be that during init we read which core each thread runs on, and then only profile those cores. It might also be that when the interrupt occurs for reading the hardware register, we only record the values if we interrupted one of the process's threads.

From my understanding, the gperftools CPU profiler uses a timer: when you LD_PRELOAD libprofiler.so or link libprofiler into your application and start the application with the CPUPROFILE environment variable set, the profiling timer starts and the timer signal is sent to your process. The problem, I think, is with multi-threaded processes. From the POSIX documentation on asynchronous signals:

signal(7) states that the kernel chooses an arbitrary thread to deliver process-directed signals when multiple threads are eligible.

So it is possible that QNX delivers the signal to some threads of the process unevenly. This blog has some more information, and it provides a proftest that uses SIGPROF, which could maybe be modified to use SIGALRM for testing on QNX.

@iDings
Contributor

iDings commented Sep 22, 2023

uftrace is a nice function tracer for profiling; it has somewhat more overhead, but it is nice. However, it uses some Linux features, so I am unsure whether it can be ported to QNX. The mcount part is, I think, the same, since it comes from compiler support, and QNX also has a kernel trace API, but it is totally different from Linux's.

@sebaleme

sebaleme commented Sep 22, 2023

> So it is possible that QNX delivers the signal to some threads of the process unevenly.

Ah, you mean that the profiler does not ensure that each thread is equally represented in the trace. So if I have 3 threads, each fully loading a core, I will not necessarily get the same number of samples for each of them. I thought that when the interrupt occurs, we read the call stack of every thread. But maybe not: the interrupt is only handled on one core, so only one thread is profiled.

Another thing I noticed: in my process we use some external software which creates many threads. So my program has 2 threads that I want to monitor, but due to the external libraries I end up with around 15 threads within my process. That also explains why I get so many samples from unknown code.

Is there a way to filter which threads we want to monitor?

@iDings
Contributor

iDings commented Sep 22, 2023

> I thought that when the interrupt occurs, we read the call stack of every thread. But maybe not: the interrupt is only handled on one core, so only one thread is profiled.

By interrupt, do you mean the timer IRQ? Every core has a local timer. For an application, I think, when the timer is enabled it is triggered on one core, and when it expires the kernel delivers a signal to one thread of the process. No matter which CPU core the chosen thread is running on, the kernel sets a pending signal for it, and the chosen thread handles it at some point; on Linux, for example, this happens when the thread makes a syscall and checks for pending signals on returning to userspace.

@iDings
Contributor

iDings commented Sep 22, 2023

> Is there a way to filter which threads we want to monitor?

Maybe this API, but I am unsure:
https://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.prog/topic/timing_execution_times.html

@alk
Contributor

alk commented Sep 22, 2023

Some discussion. Let me answer a few things:

a) Yes, wall-clock time profiles are expected to get ticks in idle threads. My use of this mode usually involves heavy use of pprof's --focus and --hide options. I also highly recommend the interactive UI, which has all those features, since you will usually need quite a lot of filtering to see anything. I usually start it with pprof --http=:0 . We also have an internal filtering API. It is quite low-level; see the ProfilerOptions struct in the header. And yes, this need for extensive filtering is why wall-clock time profiles are rarely a good idea.

b) As for the balancing issue, i.e. which thread gets the signals: that usually works fine, but your experience may vary, especially on an uncommon OS. So I advise you to test (rather than speculate).

c) pprof indeed assumes some paths. But it has options. Check pprof --help. I think you'll need PPROF_BINARY_PATH to have it find various .so-s present in your profiles (the binary you already feed it via command line)

d) for the symbolization, the troubles are expected. Because pprof uses addr2line (and possibly other tools) to convert offsets into binaries to functions+lines. So you'll need to point it to your platform's tools. Check PPROF_TOOLS environment variable for that.

Have a nice day.

@alk
Contributor

alk commented Oct 31, 2023

Quick update.

a) I took Xiang's first commit and replaced the second with my own.

We now build cleanly for QNX with a simple ./configure --host=

b) I got an academic QNX license, so I am able to compile. But I haven't yet figured out how to run this QNX code (there is some tooling that supposedly builds a VM image, but I am unable to make it work).

So we should now have basic malloc (though not as fast as on other OSes, due to the lack of decent TLS). This should probably give us heap profiling and debug malloc too. As noted above, sadly I am unable to test.

No CPU profiling patches have landed yet. But as noted above, QNX can only support wall-time profiles, which is not nothing, but far from what people normally expect.

So I'm not sure whether anyone wants more added here or whether we can close the ticket.

@iDings
Contributor

iDings commented Nov 5, 2023

@alk Thanks for reviewing and accepting my patch.

I found that my patch still has an issue when using ./configure --host= directly, as noted in #1443 (comment).
The issue is that with ./configure --host= alone, the compiler doesn't add the libc++.so.1 dependency to the libtcmalloc .so; the previous workaround was to also force CC and CXX to qcc and q++ when running configure.

Recently I did some more analysis and found some clues. @alk, if you have any spare time, I'd appreciate your help figuring out the real cause of this issue.
When using ./configure --host= directly, the link message shown in the pictures below shows that libtool appends an empty -L followed by -libc++; this seems to be the secret of why libc++ is missing from libmalloc.so.
[screenshots: libtool link command with the empty -L]

Then I ran the link command above manually, but removed the magic empty -L. Bingo, libc++ shows up.
[screenshot: link command rerun without the empty -L]

I checked the generated libtool file: compiler_lib_search_path contains some empty -L entries, but I don't know why they are generated.
[screenshot: compiler_lib_search_path in the generated libtool]
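A quick way to check whether a given build tree is affected is to look for bare -L or -R tokens (with no directory attached) in the generated script. This is a minimal sketch, assuming the compiler_lib_search_path="..." assignment format that generated libtool scripts use; the function name find_bare_search_flags is hypothetical and not part of libtool.

```sh
#!/bin/sh
# Hypothetical helper: print every bare "-L"/"-R" token (no directory
# attached) found in the compiler_lib_search_path assignments of a
# generated libtool script. Exits non-zero when none are found.
find_bare_search_flags() {
  # Split the assignment(s) on spaces and double quotes, one token per
  # line, then keep only tokens that are exactly "-L" or "-R".
  grep '^compiler_lib_search_path=' "${1:-./libtool}" |
    tr ' "' '\n\n' |
    grep -E '^-[LR]$'
}
```

Running find_bare_search_flags ./libtool in the build directory prints one -L per bogus entry; no output (and a non-zero exit status) means the search path looks clean.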

@iDings
Contributor

iDings commented Nov 5, 2023

And if I use ./configure CC=qcc CXX=q++ --host=x86_64-pc-nto-qnx7.1.0, the generated libtool has the following settings, with no magic -L:

[screenshot: settings in the generated libtool]

@alk
Contributor

alk commented Nov 5, 2023

So I was finally able to build a VM image. It didn't like some QEMU bits on my machine, so I needed some help.

As for that -L thingy, I have it too. It is some sort of libtool bug.

Basically, libtool.m4 writes part of the configure script, which inspects the system (and the --host argument) and writes a ./libtool shell script specific to this particular setup. The makefiles then use this shell script as a wrapper around compiling and linking, to automagically handle shared libraries.

I did a super-brief inspection, and it appears to inspect compiler output of some sort and extract those -L args from it. Somehow it then ends up with those bogus args. It might be worth filing a ticket with libtool.

Also, notably, CXX=q++ doesn't work for me; it barks on -pthread. But if I manually unbreak ./libtool, I am able to produce a seemingly working binary. And malloc_bench runs a lot faster than whatever their libc does natively:

# export LD_LIBRARY_PATH=/data/home/root/local/lib 
# ldd ./malloc_bench                               
./malloc_bench:
	libtcmalloc_minimal.so.9 => /data/home/root/local/lib/libtcmalloc_minimal.so.9 (0x58e34cd000)
	libsocket.so.3 => /proc/boot/libsocket.so.3 (0x58e38a2000)
	libc++.so.1 => /system/lib/libc++.so.1 (0x58e3ae3000)
	libm.so.3 => /proc/boot/libm.so.3 (0x58e3dc9000)
	libgcc_s.so.1 => /proc/boot/libgcc_s.so.1 (0x58e4000000)
	libc.so.5 => /proc/boot/libc.so.5 (0x58e4217000)
	libcatalog.so.1 => /system/lib/libcatalog.so.1 (0x58e44d2000)
# ./malloc_bench 
Trying to randomize freelists...done.
Benchmark: bench_fastpath_throughput                        6.439252 nsec
Benchmark: bench_fastpath_throughput                        6.048287 nsec
Benchmark: bench_fastpath_throughput                        6.046205 nsec
Benchmark: bench_fastpath_dependent                         6.040457 nsec
Benchmark: bench_fastpath_dependent                         6.070286 nsec
Benchmark: bench_fastpath_dependent                         6.068448 nsec
Benchmark: bench_fastpath_simple(64)                        5.878824 nsec
Benchmark: bench_fastpath_simple(64)                        5.877041 nsec
Benchmark: bench_fastpath_simple(64)                        5.872837 nsec
Benchmark: bench_fastpath_simple(2048)                      6.237911 nsec
Benchmark: bench_fastpath_simple(2048)                      6.239803 nsec
Benchmark: bench_fastpath_simple(2048)                      6.239803 nsec
Benchmark: bench_fastpath_simple(16384)                     6.240010 nsec
Benchmark: bench_fastpath_simple(16384)                     6.239803 nsec
Benchmark: bench_fastpath_simple(16384)                     6.239803 nsec
Benchmark: bench_fastpath_simple_sized(64)                  6.046205 nsec

# /data/home/root/malloc_bench-sys 
Trying to randomize freelists...done.
Benchmark: bench_fastpath_throughput                        75.044923 nsec
Benchmark: bench_fastpath_throughput                        75.167743 nsec

tcmalloc_unittests are passing too.

\o/

@iDings
Contributor

iDings commented Nov 5, 2023

Also, notably CXX=q++ doesn't work for me. Barks on -pthread.

CC and CXX need to be set at the same time; otherwise configure uses x86_64-pc-nto-qnx7.1.0-gcc to check whether -pthread works, and -pthread is not supported by qcc or q++.

checking whether x86_64-pc-nto-qnx7.1.0-gcc is Clang... no
checking whether pthreads work with "-pthread" and "-lpthread"... no
checking whether pthreads work with -pthread... yes

@iDings
Contributor

iDings commented Nov 5, 2023

And malloc_bench runs a lot faster than whatever their libc does natively.

tcmalloc is obviously and absolutely better than their built-in malloc. 💯

@iDings
Contributor

iDings commented Nov 5, 2023

Inspecting the QNX gcc verbose output, some -L options have a space between the flag and the path; this seems to be the reason. But I still need to check m4/libtool.m4 to see why, and how to resolve this problem.
[screenshot: QNX gcc verbose output with "-L <path>"]

@iDings
Contributor

iDings commented Nov 5, 2023

The patch below solves the problem; it seems to be an unintentional mistake in libtool.m4.

--- libtool.m4  2023-11-06 00:57:57.536162152 +0800
+++ libtool.m4.new      2023-11-06 00:57:32.210979125 +0800
@@ -7560,10 +7560,10 @@
     -L* | -R* | -l*)
        # Some compilers place space between "-{L,R}" and the path.
        # Remove the space.
-       if test x-L = "$p" ||
-          test x-R = "$p"; then
-        prev=$p
-        continue
+       if test x-L = x"$p" ||
+          test x-R = x"$p"; then
+        prev=$p
+        continue
        fi
 
        # Expand the sysroot to ease extracting the directories later.
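To make the failure mode concrete, here is a simplified, standalone imitation of the loop patched above. It is not libtool's real code, and the names merge_search_flags and is_bare_flag are hypothetical. With the original comparison, "x-L" is compared against a bare "-L" and can never match, so the bare flag is kept on its own while the directory that follows it is dropped; the next -l flag then lands right after an empty -L. A linker that sees "-L -lc++" will most likely take -lc++ as the directory argument of -L, which would explain why libc++ never makes it into the library's dependencies.

```sh
#!/bin/sh
# Simplified imitation (NOT libtool's actual code) of the libtool.m4 loop
# that collects -L/-R/-l flags from the compiler's verbose link output.

# Succeeds when $2 is a bare "-L" or "-R" whose path is the next token.
# $1 selects the comparison: "buggy" (original) or anything else (patched).
is_bare_flag() {
  if test "$1" = buggy; then
    # Original check: "x-L" is compared with "-L", so it never matches.
    test x-L = "$2" || test x-R = "$2"
  else
    # Patched check: both sides carry the x prefix, so "-L" matches.
    test x-L = x"$2" || test x-R = x"$2"
  fi
}

# Usage: merge_search_flags MODE TOKEN...
# Prints the -L/-R/-l flags it keeps, space-separated.
merge_search_flags() {
  mode=$1; shift
  prev= out=
  for p in "$@"; do
    case $prev$p in
      -L* | -R* | -l*)
        if is_bare_flag "$mode" "$p"; then
          prev=$p      # remember the bare flag; glue it to the next token
          continue
        fi
        out="$out $prev$p"
        ;;
      *)
        ;;             # plain paths, objects, etc. are ignored here
    esac
    prev=
  done
  printf '%s\n' "${out# }"
}
```

With the space-separated form, merge_search_flags fixed -L /opt/lib -lc++ prints -L/opt/lib -lc++, while merge_search_flags buggy -L /opt/lib -lc++ prints -L -lc++. With the attached form -L/opt/lib, both modes agree, which is consistent with the bug only appearing when the compiler's verbose output puts a space after -L.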

@iDings
Contributor

iDings commented Nov 6, 2023

I have submitted a patch to libtool: https://savannah.gnu.org/patch/?10411.

@alk
Contributor

alk commented Nov 6, 2023

Wow. Great. BTW, it took me a bit to find what exactly you changed, so I'm posting it for others.

An x was added on the right side of the 'test =' thingy. Indeed, it looks like a simple and obvious bug, even to my shell-ignorant eye.

I'll see if I can temporarily "vendor" this fixed libtool.

alk added a commit that referenced this issue Nov 6, 2023
@iDings
Contributor

iDings commented Nov 7, 2023

This blog post https://www.vidarholen.net/contents/blog/?p=1035 has a really nice explanation of the shell x-hack.

@alk
Contributor

alk commented Jan 1, 2024

I wonder if we should perhaps officially close this?

@alk alk closed this as completed Jun 12, 2024