libbpf-tools: add profile #3782

ekyooo · 2022-01-02T07:49:56Z

It is based on Brendan Gregg's BCC profile and much of the code for it is borrowed from offcputime.c and ruqlen.c.
I'm not sure if libbpf-tools requires a profile. So I'm sharing this just to get some feedback before I finish developing of this patch.
Could you give me your opinion? (if necessary or not)
Thank you.

This is an example of usage.

# ./profile -iK
Sampling at 49 Hertz of all threads by kernel stack... Hit Ctrl-C to end.
^C    
    _raw_spin_unlock_irq
    finish_task_switch
    __schedule
    schedule
    schedule_timeout
    msleep
    HDMI21_Tx_EARC_MainThread
    kthread
    ret_from_fork
    -                hdmi21_earc_eng (1427)
        1

    __pi___clean_dcache_area_poc
    dma_direct_map_page
    dma_direct_map_sg
    dma_buf_unified_map
    dma_buf_map_attachment
    kbase_mem_umm_map_attachment
    kbase_mem_import
    kbase_ioctl
    __arm64_compat_sys_ioctl
    el0_svc_common.constprop.0
    el0_svc_compat_handler
    el0_svc_compat
    -                surface-manager (1911)
        4

    __pi___clean_dcache_area_poc
    dma_direct_map_page
    dma_direct_map_sg
    dma_buf_unified_map
    dma_buf_map_attachment
    kbase_mem_umm_map_attachment
    kbase_mem_import
    kbase_ioctl
    __arm64_compat_sys_ioctl
    el0_svc_common.constprop.0
    el0_svc_compat_handler
    el0_svc_compat
    -                tAiToneService (2141)
        12

    [Missed Kernel Stack]
    -                WebAppMgr (3624)
        1

...

This is the result of using the -f (folded) option.

# ./profile -fdK
...
tCMState;low_mem_notify_threshold;__alloc_pages_nodemask;__get_free_pages;__pollwait;unix_poll;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 2
surface-manager;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 1
qml-runner;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 96
mali-cmar-backe;_raw_spin_unlock_irqrestore;kbase_pm_update_cores_state;kbase_pm_do_poweron;kbase_pm_update_active;kbase_hwaccess_pm_gpu_active;kbase_pm_context_active_handle_suspend;kbase_js_sched;kbase_jd_submit;kbase_ioctl;__arm64_compat_sys_ioctl;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 2
surface-manager;__fget;__fget_light;__fdget;sockfd_lookup_light;__sys_sendmsg;__arm64_compat_sys_sendmsg;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 1
...
WARNING: 4 stack traces could not be displayed. Consider increasing --stack-storage-size.

Signed-off-by: Eunseon Lee es.lee@lge.com

davemarchevsky · 2022-01-03T04:38:01Z

I'm not sure if libbpf-tools requires a profile. So I'm sharing this just to get some feedback before I finish developing of this patch.
Could you give me your opinion? (if necessary or not)

In general a conversion of a bcc python API tool to a libbpf tool is desired and very appreciated.

ekyooo · 2022-01-03T05:08:11Z

In general a conversion of a bcc python API tool to a libbpf tool is desired and very appreciated.

Thank you for your reply.
I'll check the failed case and fix it.
Thank you.

ekyooo · 2022-01-11T07:23:08Z

This patch has basic functionality with some limitations of the profile.
This is the test result.

# ./profile -iK
Sampling at 49 Hertz of all threads by kernel stack... Hit Ctrl-C to end.
^C    
    _raw_spin_unlock_irq
    finish_task_switch
    __schedule
    schedule
    schedule_timeout
    msleep
    HDMI21_Tx_EARC_MainThread
    kthread
    ret_from_fork
    -                hdmi21_earc_eng (1427)
        1

    __pi___clean_dcache_area_poc
    dma_direct_map_page
    dma_direct_map_sg
    dma_buf_unified_map
    dma_buf_map_attachment
    kbase_mem_umm_map_attachment
    kbase_mem_import
    kbase_ioctl
    __arm64_compat_sys_ioctl
    el0_svc_common.constprop.0
    el0_svc_compat_handler
    el0_svc_compat
    -                surface-manager (1911)
        4

    __pi___clean_dcache_area_poc
    dma_direct_map_page
    dma_direct_map_sg
    dma_buf_unified_map
    dma_buf_map_attachment
    kbase_mem_umm_map_attachment
    kbase_mem_import
    kbase_ioctl
    __arm64_compat_sys_ioctl
    el0_svc_common.constprop.0
    el0_svc_compat_handler
    el0_svc_compat
    -                tAiToneService (2141)
        12

    [Missed Kernel Stack]
    -                WebAppMgr (3624)
        1

...

This is the result of using the -f (folded) option.

# ./profile -fdK
...
tCMState;low_mem_notify_threshold;__alloc_pages_nodemask;__get_free_pages;__pollwait;unix_poll;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 2
surface-manager;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 1
qml-runner;sock_poll;do_sys_poll;__arm64_sys_poll;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 96
mali-cmar-backe;_raw_spin_unlock_irqrestore;kbase_pm_update_cores_state;kbase_pm_do_poweron;kbase_pm_update_active;kbase_hwaccess_pm_gpu_active;kbase_pm_context_active_handle_suspend;kbase_js_sched;kbase_jd_submit;kbase_ioctl;__arm64_compat_sys_ioctl;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 2
surface-manager;__fget;__fget_light;__fdget;sockfd_lookup_light;__sys_sendmsg;__arm64_compat_sys_sendmsg;el0_svc_common.constprop.0;el0_svc_compat_handler;el0_svc_compat 1
...
WARNING: 4 stack traces could not be displayed. Consider increasing --stack-storage-size.

Current limitations are as follows.

kernel_ip
The bpf code in the python profile uses some kernel macros like PAGE_OFFSET to get the kernel_ip, but as I checked it is not included in vmlinux.h or bpf_helpers.h.
I saw the page below about it, could you please let me know if there is another way to use macros if you have?

Unfortunately, BTF (as well as DWARF) doesn’t record #define macros, so some common macros might be missing with vmlinux.h. Most commonly missing ones might be provided as part of libbpf’s bpf_helpers.h (kernel-side "library", provided by libbpf). (https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html)

Alternatively, the code to get kernel_ip is excluded in this patch. How about having this for now and including the kernel_ip code after libbpf-tools supports the macro?

sorting
The printed output is not yet sorted similar to offcputime.
options
some options such as --cgroupmap, --mntnsmap, -a are not yet included.

Thank you for reading this.
If you agree to the restrictions above, I'll ask detailed opinions about the code.
All comments are welcome.
Thank you.

chenhengqi · 2022-01-11T12:31:38Z

libbpf-tools/Makefile

@@ -54,6 +54,7 @@ APPS = \
 	tcpconnlat \
 	tcprtt \
 	vfsstat \
+	profile \


Please keep the list sorted and add an entry to .gitignore.

It's fixed. Thank you.

chenhengqi · 2022-01-11T12:33:09Z

libbpf-tools/profile.bpf.c

@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2021 Wenbo Zhang


Wrong copyright info. Please use C++ style comments.

I edited comment style.
For copyright information, I've now followed opensnoop's license-identifier.
I hope to have one chance to fix it after more code review.
Thank you.

chenhengqi · 2022-01-11T12:35:21Z

libbpf-tools/profile.bpf.c

+
+struct {
+	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
+	__uint(key_size, sizeof(u32));


You can use __type(key, u32); here.

I'm sorry, but the code is throwing a runtime error like below. So I keep the original code.
Thank you.

libbpf: Error in bpf_create_map_xattr(stackmap):Invalid argument(-22). Retrying without BTF.

Do you update your git submodule ?

chenhengqi · 2022-01-11T12:39:41Z

libbpf-tools/profile.bpf.c

+	u64 *valp;
+	static const u64 zero;
+
+	if (include_idle == false && pid == 0)


!include_idle

It's fixed. Thank you.

chenhengqi · 2022-01-11T12:41:49Z

libbpf-tools/profile.bpf.c

+		return 0;
+
+	// create map key
+	struct key_t key = {.pid = tgid};


Declare variables first.

This .pid = tgid is very counterintuitive. I would suggest you use pid and tid instead.

Ok. It's fixed. Thank you.

chenhengqi · 2022-01-11T12:47:08Z

For the PAGE_OFFSET things, please refer to https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc-to-libbpf-howto-guide.html#dealing-with-compile-time-ifs-in-bcc .

chenhengqi · 2022-01-18T11:13:45Z

libbpf-tools/profile.bpf.c

+		return 0;
+
+	// create map key
+	struct key_t key = {.pid = pid};


Declare variable first.

Oh, I missed it and fixed it now.
Thank you.

chenhengqi · 2022-01-18T11:19:06Z

libbpf-tools/profile.c

+
+	libbpf_set_print(libbpf_print_fn);
+
+	err = bump_memlock_rlimit();


Please refer to other tools, we don't need this anymore.

I fixed it. Thank you.

ekyooo · 2022-01-18T11:54:00Z

For the PAGE_OFFSET things, please refer to https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc-to-libbpf-howto-guide.html#dealing-with-compile-time-ifs-in-bcc .

Thank you for the information and I checked the page.
I think you recommend me to use "libbpf-provided extern Kconfig variables" for PAGE_OFFSET, right? (because struct flavors solution and other helper APIs of "BPF CO-RE reference guide" is not proper for this case, as I understand.)
I'm checking the macros on different architecture,

arm64 : It seems it can use CONFIG_ARM64 and CONFIG_ARM64_VA_BITS from kconfig
arm: It seems it can use CONFIG_PAGE_OFFSET from kconfig
x86: It seems it can use some kconfig, but has dependency with other macros, so could be vulnerable to change...
so I'm not sure if it's the right direction for now.

Alternatively, I'm trying to check if I can refer or use machine__get_kernel_start of perf(linux/tools/perf/util/machine.h) to get kernel start address.

I've shared my status for feedback to you, any opinions are welcome if you have for this.
Thank you.

ekyooo · 2022-02-02T14:53:42Z

libbpf-tools/profile.bpf.c

+
+		// arm64 only supported
+		if (CONFIG_ARM64) {
+			extern __u32 CONFIG_ARM64_VA_BITS __kconfig __weak;


I remember the 'declare variables first.' as you commented.
However, referencing other tools in libbpf-tools, I found some exceptions in cases like this (for variables used only in if conditions).
Could you please share your opinion?

ekyooo · 2022-02-02T15:10:31Z

libbpf-tools/profile.bpf.c

+		// populate extras to fix the kernel stack
+		extern bool CONFIG_ARM64 __kconfig __weak;
+
+		// arm64 only supported


For now, I've added logic to set kernel_ip only for arm64 due to some limitations and difficulty accessing kernel PAGE_OFFSET values at runtime with hope of more support for other architectures with next patches by me or more technical engineer.
Could you please share your opinion?
Thank you.

I wrote code to apply the same logic for x86 but to be honest: I do not understand what does this logic.
I saw it was present in bcc version of this gadget:

bcc/tools/profile.py

Line 178 in 55dcd2e

// populate extras to fix the kernel stack

So, does someone know what do we need this code and what is the meaning of "fixing kernel stack"?

[Here is my code for information:

diff --git a/libbpf-tools/profile.bpf.c b/libbpf-tools/profile.bpf.c index 08ac5687..9e4f5d0f 100644 --- a/libbpf-tools/profile.bpf.c +++ b/libbpf-tools/profile.bpf.c @@ -59,13 +59,13 @@ int do_perf_event(struct bpf_perf_event_data *ctx) { if (key.kern_stack_id >= 0) { // populate extras to fix the kernel stack + u64 ip = PT_REGS_IP(&ctx->regs); + u64 page_offset; // arm64 only supported -#ifdef __TARGET_ARCH_arm64 +#if defined(__TARGET_ARCH_arm64) extern __u32 CONFIG_ARM64_VA_BITS __kconfig __weak; extern bool CONFIG_ARM64_64K_PAGES __kconfig __weak; - u64 ip = PT_REGS_IP(&ctx->regs); - u64 page_offset; u32 va; if (CONFIG_ARM64_VA_BITS) @@ -79,11 +79,19 @@ int do_perf_event(struct bpf_perf_event_data *ctx) { page_offset = (-(1UL << (va))); else page_offset = (0xffffffffffffffffUL - (1UL << (va - 1)) + 1); - - if (ip > page_offset) { +#elif defined(__TARGET_ARCH_x86) +#define __PAGE_OFFSET_BASE_L4 0xffff888000000000UL +#define __PAGE_OFFSET_BASE_L5 0xff11000000000000UL + extern bool CONFIG_DYNAMIC_MEMORY_LAYOUT __kconfig __weak; + extern bool CONFIG_X86_5LEVEL __kconfig __weak; + + if (CONFIG_DYNAMIC_MEMORY_LAYOUT && CONFIG_X86_5LEVEL) + page_offset = __PAGE_OFFSET_BASE_L5; + else + page_offset = __PAGE_OFFSET_BASE_L4; +#endif /* __TARGET_ARCH_x86 */ + if (ip > page_offset) key.kernel_ip = ip; - } -#endif } valp = bpf_map_lookup_or_try_init(&counts, &key, &zero);

]

FYI, as I understand, It's to append ip address of the register to the top of the kernel stack trace to provide more detailed information to the user. page_offset is used to determine whether ip is a kernel address.

Thank you very much for the code for x86. But unfortunately I changed the code from yesterday...
because

If the target doesn't have Kconfig, the profile won't work at all.

On some versions of Linux, page_offset is not an effective value for this purpose.
Could you please check the updated code?
I'm really sorry and thank you.

I'm really sorry and thank you.

No problem, your actual version is simpler (and so better) than the previous and my proposal for x86.

Nonetheless, I do not really understand the goal of this code (or how it is printed).
Indeed, from my tests, I got the following output:

$ sudo ./profile -K 5 Sampling at 49 Hertz of all threads by kernel stack for 5 secs. #0 0xffffffff98ed69cb arch_local_irq_enable+0xb #1 0xffffffff99a00ec3 __schedule+0x773 #2 0xffffffff99a0109f schedule+0x4f #3 0xffffffff98edf5f7 do_sched_yield+0x87 #4 0xffffffff98edf62e __x64_sys_sched_yield+0xe #5 0xffffffff999f3b21 do_syscall_64+0x61 #6 0xffffffff99c0007c entry_SYSCALL_64_after_hwframe+0x44 kernel_ip#7 0xffffffff98ed69cb arch_local_irq_enable+0xb ...

You can see the entry where k->kernel_ip is not NULL begining with kernel_ip.
But this entry is the same as entry #0, so this is why I do not understand this code (maybe I am not using the tool like I should do).

First of all, I fixed kernel_ip to be at the top of the call stack. Thank you so much for finding it.
And yes, as you said, I found a duplicate with the first entry in the call stack. I will check and update my opinion on it. I have ever seen this before and it's was produced with some conditions in my faint memory. please let me check it. Thank you.

Where I see it before was a result of bcc docker. (https://github.com/iovisor/bcc/blob/master/QUICKSTART.md)
These are the test results for profile.py. And it has the same kernel_ip and call stack entry #0..

... do_syscall_64 // kernel_ip do_syscall_64 // call stack entry #0 entry_SYSCALL_64_after_hwframe [unknown] [unknown] - qemu-system-x86 (36165) 7 native_write_msr_safe // kernel_ip native_write_msr_safe // call stack entry #0 kvm_set_shared_msr vmx_save_host_state ...

@yonghong-song
If you have the same experience, could you please share your opinion on the equality of call stack entry #0 and kernel ip in profile.py?
Thank you.

Hi.

This code was indeed already present in the bcc version of this tool, I just do not understand why we need this (maybe it was mandatory with some older kernels because we cannot get the full kernel stack from the BPF_MAP_TYPE_STACK_TRACE?).

Thank you for your investigation though :)

Best regards.

ekyooo · 2022-02-02T15:13:36Z

libbpf-tools/profile.bpf.c

+			if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5, 4, 0))
+				page_offset = (-(1UL << (va)));
+			else
+				page_offset = (0xffffffffffffffffUL - (1UL << (va - 1)) + 1);


Since v5.4, the definition of PAGE_OFFSET has changed and it has been divided into two cases.
If you don't need it, I'd appreciate your feedback.
Thank you.

ekyooo · 2022-02-15T09:55:32Z

I found and fixed a bug in the first patch. I will update soon. Sorry and thank you.

ekyooo · 2022-02-20T11:24:08Z

This is the test result of c88462c
It's sorted by count.

# profile 
Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C    _raw_write_unlock_irq
    ep_poll
    do_epoll_wait
    __arm64_sys_epoll_pwait
    el0_svc_common.constprop.0
    el0_svc_handler
    el0_svc
    _raw_write_unlock_irq
    epoll_pwait
    sd_event_wait
    sd_event_run
    sd_event_loop
    [unknown]
    __libc_start_main
    [unknown]
    -                systemd-oomd (825)
        2

    _raw_spin_unlock_irq
    do_timerfd_settime
    __arm64_sys_timerfd_settime
    el0_svc_common.constprop.0
    el0_svc_handler
    el0_svc
    _raw_spin_unlock_irq
    timerfd_settime
    sd_event_prepare
    sd_event_run
    sd_event_loop
    [unknown]
    __libc_start_main
    [unknown]
    -                systemd-oomd (825)
        2

    _raw_spin_unlock_irqrestore
    finish_wait
    __skb_wait_for_more_packets
    __skb_recv_datagram
    skb_recv_datagram
    raw_recvmsg
    inet_recvmsg
    sock_recvmsg
    __sys_recvfrom
    __arm64_sys_recvfrom
    el0_svc_common.constprop.0
    el0_svc_handler
    el0_svc
    _raw_spin_unlock_irqrestore
    recvfrom
    [unknown]
    __libc_start_main
    [unknown]
    -                rdisc (346)
        1

ekyooo · 2022-02-20T11:28:50Z

This is the test result of 5a07dfe

Before:

  # profile -d
      psiginfo
      vscanf
      __snprintf_chk
      [unknown]
      [unknown]
      [unknown]
      [unknown]
      [unknown]
      sd_event_exit
      sd_event_dispatch
      sd_event_run
      [unknown]
      __libc_start_main
      [unknown]
      -                systemd-journal (204)
          1

      xas_load
      xas_find
      filemap_map_pages
      __handle_mm_fault
      handle_mm_fault
      do_page_fault
      do_translation_fault
      do_mem_abort
      do_el0_ia_bp_hardening
      el0_ia
      xas_load
      --
  failed to get syms
      -                PmLogCtl (138757)
        1

After:

  # profile -d
      #0  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
      #1  0xffffffc01009a93c el0_svc_handler+0x34
      #2  0xffffffc010084a08 el0_svc+0x8
      #3  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
      --
      #4  0x0000007fa0bffd14 clock_nanosleep+0x94 (/usr/lib/libc-2.31.so+0x9ed14)
      #5  0x0000007fa0c0530c nanosleep+0x1c (/usr/lib/libc-2.31.so+0xa430c)
      #6  0x0000007fa0c051e4 sleep+0x34 (/usr/lib/libc-2.31.so+0xa41e4)
      #7  0x000000558a5a9608 flb_loop+0x28 (/usr/bin/fluent-bit+0x52608)
      #8  0x000000558a59f1c4 flb_main+0xa84 (/usr/bin/fluent-bit+0x481c4)
      #9  0x0000007fa0b85124 __libc_start_main+0xe4 (/usr/lib/libc-2.31.so+0x24124)
      #10 0x000000558a59d828 _start+0x34 (/usr/bin/fluent-bit+0x46828)
      -                fluent-bit (1238)
          1

      #0  0xffffffc01027daa4 generic_copy_file_checks+0x334
      #1  0xffffffc0102ba634 __handle_mm_fault+0x8dc
      #2  0xffffffc0102baa20 handle_mm_fault+0x168
      #3  0xffffffc010ad23c0 do_page_fault+0x148
      #4  0xffffffc010ad27c0 do_translation_fault+0xb0
      #5  0xffffffc0100816b0 do_mem_abort+0x50
      #6  0xffffffc0100843b0 el0_da+0x1c
      #7  0xffffffc01027daa4 generic_copy_file_checks+0x334
      --
      #8  0x0000007f8dc12648 [unknown]
      #9  0x0000007f8dc0aef8 [unknown]
      #10 0x0000007f8dc1c990 [unknown]
      #11 0x0000007f8dc08b0c [unknown]
      #12 0x0000007f8dc08e48 [unknown]
      #13 0x0000007f8dc081c8 [unknown]
      -                PmLogCtl (2412)
          1

eiffel-fl

Hi.

Thank you for translating this tool!
I tested it and it works, I have just some comments (regarding things I do not understand).
Note that, I am simple contributor here, my review will not have consequence regarding the merging.

Best regards.

eiffel-fl · 2022-03-16T14:18:19Z

libbpf-tools/profile.bpf.c

+		if (CONFIG_ARM64_VA_BITS)
+			va = CONFIG_ARM64_VA_BITS;
+		else if (CONFIG_ARM64_64K_PAGES)
+			va = 42;


Can you please indicate what this value is.
I guess it is the number of bit used for a page virtual address, but a comment (or a define) would be helpful.

same as above

eiffel-fl · 2022-03-16T16:20:14Z

libbpf-tools/profile.bpf.c

+		// populate extras to fix the kernel stack
+		extern bool CONFIG_ARM64 __kconfig __weak;
+
+		// arm64 only supported


I wrote code to apply the same logic for x86 but to be honest: I do not understand what does this logic.
I saw it was present in bcc version of this gadget:

bcc/tools/profile.py

Line 178 in 55dcd2e

// populate extras to fix the kernel stack

So, does someone know what do we need this code and what is the meaning of "fixing kernel stack"?

[Here is my code for information:

diff --git a/libbpf-tools/profile.bpf.c b/libbpf-tools/profile.bpf.c index 08ac5687..9e4f5d0f 100644 --- a/libbpf-tools/profile.bpf.c +++ b/libbpf-tools/profile.bpf.c @@ -59,13 +59,13 @@ int do_perf_event(struct bpf_perf_event_data *ctx) { if (key.kern_stack_id >= 0) { // populate extras to fix the kernel stack + u64 ip = PT_REGS_IP(&ctx->regs); + u64 page_offset; // arm64 only supported -#ifdef __TARGET_ARCH_arm64 +#if defined(__TARGET_ARCH_arm64) extern __u32 CONFIG_ARM64_VA_BITS __kconfig __weak; extern bool CONFIG_ARM64_64K_PAGES __kconfig __weak; - u64 ip = PT_REGS_IP(&ctx->regs); - u64 page_offset; u32 va; if (CONFIG_ARM64_VA_BITS) @@ -79,11 +79,19 @@ int do_perf_event(struct bpf_perf_event_data *ctx) { page_offset = (-(1UL << (va))); else page_offset = (0xffffffffffffffffUL - (1UL << (va - 1)) + 1); - - if (ip > page_offset) { +#elif defined(__TARGET_ARCH_x86) +#define __PAGE_OFFSET_BASE_L4 0xffff888000000000UL +#define __PAGE_OFFSET_BASE_L5 0xff11000000000000UL + extern bool CONFIG_DYNAMIC_MEMORY_LAYOUT __kconfig __weak; + extern bool CONFIG_X86_5LEVEL __kconfig __weak; + + if (CONFIG_DYNAMIC_MEMORY_LAYOUT && CONFIG_X86_5LEVEL) + page_offset = __PAGE_OFFSET_BASE_L5; + else + page_offset = __PAGE_OFFSET_BASE_L4; +#endif /* __TARGET_ARCH_x86 */ + if (ip > page_offset) key.kernel_ip = ip; - } -#endif } valp = bpf_map_lookup_or_try_init(&counts, &key, &zero);

]

eiffel-fl · 2022-03-16T16:33:02Z

libbpf-tools/profile.c

@@ -250,13 +252,65 @@ static int cmp_counts(const void *dx, const void *dy)
 	return x > y ? -1 : !(x == y);
 }

+static bool batch_map_ops = true; /* hope for the best */


I do not understand this code.
You want to detect if bpf_map_lookup_batch() is available, right?

Yes batch_map_ops is initially set to true and then changed to false if bpf_map_lookup_batch() is not supported.
In fact, it was referenced in syscount.c. :). I couldn't figure out the meaning of the comments alone. So if you like, I'll edit the comment to be more detailed.

Oh sorry I did not see you set this to false later, this totally makes sense then!
Regarding the comment, if you think it is better to edit it, then do, otherwise I am fine with this, this is a cool feature.

ekyooo

Thank you very much for your test, review and opinion.

ekyooo · 2022-03-17T08:53:44Z

libbpf-tools/profile.c

@@ -250,13 +252,65 @@ static int cmp_counts(const void *dx, const void *dy)
 	return x > y ? -1 : !(x == y);
 }

+static bool batch_map_ops = true; /* hope for the best */


Yes batch_map_ops is initially set to true and then changed to false if bpf_map_lookup_batch() is not supported.
In fact, it was referenced in syscount.c. :). I couldn't figure out the meaning of the comments alone. So if you like, I'll edit the comment to be more detailed.

ekyooo · 2022-03-17T09:08:48Z

libbpf-tools/profile.bpf.c

+		// populate extras to fix the kernel stack
+		extern bool CONFIG_ARM64 __kconfig __weak;
+
+		// arm64 only supported


FYI, as I understand, It's to append ip address of the register to the top of the kernel stack trace to provide more detailed information to the user. page_offset is used to determine whether ip is a kernel address.

Thank you very much for the code for x86. But unfortunately I changed the code from yesterday...
because

If the target doesn't have Kconfig, the profile won't work at all.

On some versions of Linux, page_offset is not an effective value for this purpose.
Could you please check the updated code?
I'm really sorry and thank you.

ekyooo · 2022-03-17T09:08:54Z

libbpf-tools/profile.bpf.c

+		if (CONFIG_ARM64_VA_BITS)
+			va = CONFIG_ARM64_VA_BITS;
+		else if (CONFIG_ARM64_64K_PAGES)
+			va = 42;


same as above

ekyooo · 2022-07-20T06:00:52Z

Hi @ekyooo is it possible to sort the output like perf top like below? the top function cpu cycles percentage for processes/kernel, it is easier to understand :-)

@vincentmli, @eiffel-fl
It looks useful, I'll add it.
Thank you.

vincentmli · 2022-07-20T13:37:28Z

Hi @ekyooo @eiffel-fl I tested the profile, I noticed if I want to profile a process in container, profile is unable to get the process symbols in container, if I run the process in host, not in container, profile is able to get symbol for the process, the process has gcc flag -fno-omit-frame-pointer compiled that contains symbols, it appears all existing libbpf-tools tool are not name space aware, right. it would be nice to have profile support profiling process running in container with symbols resolved, maybe this is can be another follow-up PR once this PR is merged? by the way, I tested perf tool, perf is able to profile process in container and get symbol resolved.

eiffel-fl · 2022-07-20T14:16:41Z

it appears all existing libbpf-tools tool are not name space aware, right.

It is indeed the case but I am not sure if the goals of this tools are to be namespace aware.
In Inspektor Gadget, we modified some of the libbpf tools to be namespace aware (e.g. profile).

vincentmli · 2022-07-20T15:33:12Z

It is indeed the case but I am not sure if the goals of this tools are to be namespace aware. In Inspektor Gadget, we modified some of the libbpf tools to be namespace aware (e.g. profile).

right, I understand these tools in the beginning are not aimed for container environment, just thought it might be good to add it now since container environment like k8s is getting popular :-). your profile example is very good, I guess the changes you made can be directly copied/pasted/modified to existing libbpf-tools and it should work, right?

eiffel-fl · 2022-07-21T08:36:06Z

I guess the changes you made can be directly copied/pasted/modified to existing libbpf-tools and it should work, right?

I think you will need to add the mntns map handling in profile.c and also to add an option to this program, like --mnt-ns-id, to then load the mount namespace ID in the corresponding map so filtering is done in eBPF.
I will take a deeper look to this to see how it translates in terms of implementation.

ekyooo · 2022-08-04T11:19:38Z

Could you review it? Or are you waiting for me to do something?

ekyooo · 2023-12-20T09:39:04Z

It's just rebased on the master branch.
I am not requesting a review now.

ekyooo · 2024-01-05T08:36:37Z

This patch includes the following changes:

Reduced the nesting of the "print_counts" functions by separating sub-functions.
Removed the logic related to "adding kernel_ip." It can be added again if we find it necessary.
Show symbol names only in stack trace for code simplicity. An expanded form of the stack trace (including module information and symbol offsets) will be reapplied after the current review is complete.
Fixed other issues. (sorry for missing your comment)

This version of the code looks cleaner to read. Could you please review this?
Thank you.

@yezhem
Multi-process support has been implemented in Profile.
However, in order to simplify the current code for review, I plan to apply it as an additional patch after the current review is completed.
Thank you.

eiffel-fl

Hi!

Thank you for rebasing and still working on this PR despite it was first opened two years ago!
I will take a deeper look and test it later, as I want to fully understand the modifications you brought to the code. For now, I have some comments.

Best regards.

eiffel-fl · 2024-01-05T14:54:36Z

libbpf-tools/profile.c

+	case 'U':
+		env.user_stacks_only = true;
+		break;
+	case 'K':
+		env.kernel_stacks_only = true;
+		break;


In my usage of your tool, I am currently modifying this to have only one flag:
inspektor-gadget/inspektor-gadget#1417

Thank you for your good opinion.
But applying it in this patch make difference with bcc python Profile.
How about applying it to both the Python and C versions through a separate patch?

@chenhengqi
Could you please share your thoughts on this?

eiffel-fl · 2024-01-05T14:59:11Z

libbpf-tools/profile.c

+		items[i].k.pid = keys[i].pid;
+		items[i].k.user_stack_id = keys[i].user_stack_id;
+		items[i].k.kern_stack_id = keys[i].kern_stack_id;
+		strncpy(items[i].k.name, keys[i].name, TASK_COMM_LEN);


As you are using the key_t both in userspace and eBPF, you can use memcpy here.

I changed it. Thank you.

eiffel-fl · 2024-01-05T15:04:01Z

libbpf-tools/profile.c

+static void print_count(struct key_t *event, __u64 count, int sfd,
+			struct ksyms *ksyms, struct syms_cache *syms_cache)
+{
+	unsigned long *ip;
+
+	ip = calloc(env.perf_max_stack_depth, sizeof(*ip));
+	if (!ip) {
+		fprintf(stderr, "failed to alloc ip\n");
+		return;
+	}
+
+	/* kernel stack */
+	if (!env.user_stacks_only && !STACK_ID_EFAULT(event->kern_stack_id)) {
+		if (bpf_map_lookup_elem(sfd, &event->kern_stack_id, ip) != 0)
+			printf("    [Missed Kernel Stack]\n");
+		else
+			print_kernel_stacktrace(ip, ksyms);
+	}
+
+	/* user stack */
+	if (!env.kernel_stacks_only && !STACK_ID_EFAULT(event->user_stack_id)) {
+		if (NEED_DELIMITER(env.delimiter, event->user_stack_id, event->kern_stack_id))
+			printf("    --\n");
+
+		if (bpf_map_lookup_elem(sfd, &event->user_stack_id, ip) != 0)
+			printf("    [Missed User Stack]\n");
+		else
+			print_user_stacktrace(ip, syms_cache, event->pid);
+	}
+
+	printf("    %-16s %s (%d)\n", "-", event->name, event->pid);
+	printf("        %lld\n\n", count);
+
+	free(ip);
+}
+
+static void print_count_folded(struct key_t *event, __u64 count, int sfd,
+			       struct ksyms *ksyms, struct syms_cache *syms_cache)
+{
+	unsigned long *ip;
+
+	ip = calloc(env.perf_max_stack_depth, sizeof(*ip));
+	if (!ip) {
+		fprintf(stderr, "failed to alloc ip\n");
+		return;
+	}
+
+	printf("%s", event->name);
+
+	/* user stack */
+	if (!env.kernel_stacks_only && !STACK_ID_EFAULT(event->user_stack_id)) {
+		if (bpf_map_lookup_elem(sfd, &event->user_stack_id, ip) != 0)
+			printf(";[Missed User Stack]");
+		else
+			print_user_stacktrace_folded(ip, syms_cache, event->pid);
+	}
+
+	/* kernel stack */
+	if (!env.user_stacks_only && !STACK_ID_EFAULT(event->kern_stack_id)) {
+		if (NEED_DELIMITER(env.delimiter, event->user_stack_id, event->kern_stack_id))
+			printf(";-");
+
+		if (bpf_map_lookup_elem(sfd, &event->kern_stack_id, ip) != 0)
+			printf(";[Missed Kernel Stack]");
+		else
+			print_kernel_stacktrace_folded(ip, ksyms);
+	}
+
+	printf(" %lld\n", count);
+
+	free(ip);
+}


I think you can factorize this code to reduce the duplicated one.
Maybe, we can check the folded argument in the print_*_stacktrace() functions?

Maybe, we can check the folded argument in the print_*_stacktrace() functions?

It can reduce duplication slightly, but it cannot reduce meaningfully because iterating direction and format are different.

In principle, it's good to separate functions to handle data and representation. For separating this, I can try using iterator that takes addresses and syms and then return resolved symbols, or I can try returning resolved symbols through another array.
But It can make complex code in C language or need more memory usage.

Any thoughts on using iterators for this? Shall I try it?

ekyooo · 2024-01-11T11:27:33Z

I discovered and fixed an issue where some stack traces had zero counts.
Sorry for the issue.

And I'm working on removing duplication.
Thank you.

ekyooo · 2024-01-18T06:57:50Z

I think you can factorize this code to reduce the duplicated one. Maybe, we can check the folded argument in the print_*_stacktrace() functions?

I reduced duplicate code and excluded the folded option for the sake of simplicity. The folded option can be added in the next step if needed.
I further applied additional modifications to enhance code simplicity and clarity.
Could you please review this?
Thank you.

eiffel-fl · 2024-01-22T03:46:53Z

Hi!

I think you can factorize this code to reduce the duplicated one. Maybe, we can check the folded argument in the print_*_stacktrace() functions?

I reduced duplicate code and excluded the folded option for the sake of simplicity. The folded option can be added in the next step if needed. I further applied additional modifications to enhance code simplicity and clarity. Could you please review this? Thank you.

Thank you! I will take a look at it during this week!

eiffel-fl

Hi!

I tested it and it works perfectly:

$ yes > /dev/null &
$ sudo ./profile -p $!
...
        1

    vfs_write
    ksys_write
    __x64_sys_write
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    write
    [unknown]
    -                yes (36571)
        1
...

I just found one small nit and I have a question with regard to resource usage but in my opinion, this is ready to go.

Best regards.

eiffel-fl · 2024-01-25T07:14:58Z

libbpf-tools/profile.c

+	if (x > y)
+		return -1;
+
+	if (x < y)
+		return 1;
+
+	return 0;


Suggested change

if (x > y)

return -1;

if (x < y)

return 1;

return 0;

return y - x;

qsort function requires than something greater, equal or less than 0 is returned, so using this should do the trick.

Thank you for the feedback.
I have applied it.

eiffel-fl · 2024-01-25T07:32:08Z

libbpf-tools/profile.c

+			fprintf(stderr, "failed to attach perf event on cpu: "
+				"%d\n", i);
+			links[i] = NULL;
+			close(fd);


You do not call close() in cleanup, I suppose bpf_link__destroy() also closes the underlying perf event?

Yes, I checked it again and confirmed that the link->detach called in bpf_link__destroy() closes the file descriptor. Thank you.

Signed-off-by: Eunseon Lee <es.lee@lge.com>

ekyooo

It's been updated with your feedback along with some other minor changes.
Thank you.

ekyooo · 2024-01-25T09:53:04Z

libbpf-tools/profile.c

+			fprintf(stderr, "failed to attach perf event on cpu: "
+				"%d\n", i);
+			links[i] = NULL;
+			close(fd);


Yes, I checked it again and confirmed that the link->detach called in bpf_link__destroy() closes the file descriptor. Thank you.

ekyooo · 2024-01-25T09:54:42Z

libbpf-tools/profile.c

+	if (x > y)
+		return -1;
+
+	if (x < y)
+		return 1;
+
+	return 0;


Thank you for the feedback.
I have applied it.

Bojun-Seo

I have tested and approve this pull request.
Followings are the steps I did on Ubuntu 22.04.3 LTS, x86_64 machine.

/* Build profile */
$ git clone https://github.com/ekyooo/bcc.git -b profile & cd bcc
$ mkdir build & cd build
$ cmake ..
$ cd ../libbpf-tools
$ make profile

/* Build test program */
$ cat test.c
#include <stdlib.h>

unsigned int foo(unsigned int i) {
  unsigned int tmp = i + 50;
  return tmp * 3;
}

int main(int argc, char* argv[]) {
  unsigned int i = 0;
  while (i >= 0) {
    i = foo(i);
    ++i;
  }
  return 0;
}
$ gcc test.c

/* Run test program and profile */
$ ./a.out &
$ sudo ./profile -p 134827 -U
Sampling at 49 Hertz of PID 134827 by user... Hit Ctrl-C to end.
^C    main
    [unknown]
    -                a.out (134827)
        611

    main
    [unknown]
    -                a.out (134827)
        278

    foo
    main
    [unknown]
    -                a.out (134827)
        254

    foo
    main
    [unknown]
    -                a.out (134827)
        232

    main
    [unknown]
    -                a.out (134827)
        221

    foo
    main
    [unknown]
    -                a.out (134827)
        199

    foo
    main
    [unknown]
    -                a.out (134827)
        123

    foo
    main
    [unknown]
    -                a.out (134827)
        117

    foo
    main
    [unknown]
    -                a.out (134827)
        114

    foo
    main
    [unknown]
    -                a.out (134827)
        113

    foo
    [unknown]
    -                a.out (134827)
        105

/* Run profile.py and compare the result */
$ sudo python3 ../tools/profile.py -p 134827 -U
Sampling at 49 Hertz of PID [134827] by user stack... Hit Ctrl-C to end.

^C
    foo
    main
    __libc_start_call_main
    -                a.out (134827)
        3

    foo
    main
    __libc_start_call_main
    -                a.out (134827)
        3

    foo
    main
    __libc_start_call_main
    -                a.out (134827)
        4

    foo
    main
    __libc_start_call_main
    -                a.out (134827)
        5

    foo
    __libc_start_call_main
    -                a.out (134827)
        5

    foo
    main
    __libc_start_call_main
    -                a.out (134827)
        6

    main
    __libc_start_call_main
    -                a.out (134827)
        7

    foo
    main
    __libc_start_call_main
    -                a.out (134827)
        8

    main
    __libc_start_call_main
    -                a.out (134827)
        9

    foo
    main
    __libc_start_call_main
    -                a.out (134827)
        11

    main
    __libc_start_call_main
    -                a.out (134827)
        27

It looks same except unknown printing. And unknown printing issue is related with #4849. Which means it is not directly related with this PR.

ekyooo · 2024-02-02T07:29:09Z

@eiffel-fl @Bojun-Seo
Thank you for the review, testing and approving!

Can anybody take a look and possibly merge this?

yonghong-song · 2024-02-07T03:37:40Z

Thanks for detailed reviews by @chenhengqi @eiffel-fl @Bojun-Seo. The patch looks good to me. I see in the current patch bpf_map_get_next_key() is used. The kernel has implemented batch lookup which should be more reliable. But this can be done as a followup.

ekyooo · 2024-02-07T03:55:51Z

The kernel has implemented batch lookup which should be more reliable. But this can be done as a followup.

I have code that uses batch lookup. I will apply it with the next patch.

@yonghong-song @chenhengqi @eiffel-fl @Bojun-Seo
Thank you for all your efforts.
It's all thanks to you.

eiffel-fl · 2024-02-07T03:57:11Z

You are more than welcome! It was a pleasure to work with you!

ekyooo · 2024-04-16T19:33:58Z

@eiffel-fl
Are you in OSS NA? If so, I'd like to meet you for a moment.

eiffel-fl · 2024-04-18T11:46:48Z

Hi!

@eiffel-fl Are you in OSS NA? If so, I'd like to meet you for a moment.

Sadly no, I will mail you on your Signed-off-by mail the next time I will attend an event (I do know when it will be nonetheless).

Best regards and enjoy OSS :D!

ekyooo requested review from 4ast, brendangregg, drzaeus77, goldshtn and yonghong-song as code owners January 2, 2022 07:49

ekyooo force-pushed the profile branch from dc805ce to 57ada1d Compare January 2, 2022 08:52

ekyooo force-pushed the profile branch from 57ada1d to 04b3fbe Compare January 11, 2022 05:08

ekyooo requested a review from davemarchevsky as a code owner January 11, 2022 05:08

chenhengqi reviewed Jan 11, 2022

View reviewed changes

ekyooo force-pushed the profile branch from 04b3fbe to c7d7599 Compare January 18, 2022 05:50

chenhengqi reviewed Jan 18, 2022

View reviewed changes

ekyooo force-pushed the profile branch 2 times, most recently from 68df9ba to 3c5a967 Compare January 18, 2022 12:20

ekyooo force-pushed the profile branch from 3c5a967 to 8ce52c6 Compare February 2, 2022 14:38

ekyooo commented Feb 2, 2022

View reviewed changes

ekyooo force-pushed the profile branch 2 times, most recently from 8033322 to 5a07dfe Compare February 20, 2022 11:17

mauriciovasquezbernal mentioned this pull request Feb 22, 2022

Implement CRD support and CORE version for profile gadget inspektor-gadget/inspektor-gadget#533

Closed

ekyooo force-pushed the profile branch from 5a07dfe to 3e496a0 Compare March 12, 2022 09:54

eiffel-fl reviewed Mar 16, 2022

View reviewed changes

ekyooo commented Mar 17, 2022

View reviewed changes

ekyooo force-pushed the profile branch from 3e496a0 to 4517397 Compare March 17, 2022 09:10

ekyooo force-pushed the profile branch from 81a5e70 to 22daf15 Compare December 20, 2023 09:26

ekyooo force-pushed the profile branch from 22daf15 to ca44f69 Compare December 20, 2023 10:36

ekyooo force-pushed the profile branch from ca44f69 to 30592ea Compare January 5, 2024 08:09

eiffel-fl reviewed Jan 5, 2024

View reviewed changes

ekyooo force-pushed the profile branch 2 times, most recently from 992702f to 9098638 Compare January 11, 2024 11:22

ekyooo force-pushed the profile branch from 9098638 to d762776 Compare January 18, 2024 06:21

eiffel-fl approved these changes Jan 25, 2024

View reviewed changes

libbpf-tools: add CO-RE profile

be5faca

Signed-off-by: Eunseon Lee <es.lee@lge.com>

ekyooo force-pushed the profile branch from d762776 to be5faca Compare January 25, 2024 09:56

ekyooo commented Jan 25, 2024

View reviewed changes

Bojun-Seo approved these changes Feb 1, 2024

View reviewed changes

yonghong-song merged commit b14c463 into iovisor:master Feb 7, 2024
1 of 12 checks passed

		@@ -0,0 +1,67 @@
		// SPDX-License-Identifier: GPL-2.0
		// Copyright (c) 2021 Wenbo Zhang


		libbpf_set_print(libbpf_print_fn);

		err = bump_memlock_rlimit();

libbpf-tools: add profile #3782

libbpf-tools: add profile #3782

Conversation

ekyooo commented Jan 2, 2022 • edited Loading

davemarchevsky commented Jan 3, 2022

ekyooo commented Jan 3, 2022 • edited Loading

ekyooo commented Jan 11, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chenhengqi commented Jan 11, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekyooo commented Jan 18, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekyooo commented Feb 15, 2022

ekyooo commented Feb 20, 2022

ekyooo commented Feb 20, 2022 • edited Loading

eiffel-fl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eiffel-fl Mar 17, 2022 • edited Loading

Choose a reason for hiding this comment

ekyooo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekyooo commented Jul 20, 2022

vincentmli commented Jul 20, 2022

eiffel-fl commented Jul 20, 2022 • edited Loading

vincentmli commented Jul 20, 2022

eiffel-fl commented Jul 21, 2022

ekyooo commented Aug 4, 2022

ekyooo commented Dec 20, 2023

ekyooo commented Jan 5, 2024

eiffel-fl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekyooo Jan 8, 2024 • edited Loading

Choose a reason for hiding this comment

ekyooo commented Jan 11, 2024 • edited Loading

ekyooo commented Jan 18, 2024

eiffel-fl commented Jan 22, 2024

eiffel-fl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekyooo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekyooo commented Jan 2, 2022 •

edited

Loading

ekyooo commented Jan 3, 2022 •

edited

Loading

ekyooo commented Feb 20, 2022 •

edited

Loading

eiffel-fl Mar 17, 2022 •

edited

Loading

eiffel-fl commented Jul 20, 2022 •

edited

Loading

ekyooo Jan 8, 2024 •

edited

Loading

ekyooo commented Jan 11, 2024 •

edited

Loading

ekyooo commented Feb 7, 2024 •

edited

Loading