flowtop segfault #183

Open
Safari77 opened this issue Nov 28, 2017 · 22 comments

Comments

@Safari77

Fedora, x86_64, gcc 7.2.1-2, userspace-rcu 0.10.0, netsniff-ng 0.6.3.

(gdb) bt
#0  0x000055555555b3a9 in collector_refresh_procs () at flowtop.c:1666
#1  collector (null=<optimized out>) at flowtop.c:1858
#2  0x00007ffff734336d in start_thread (arg=0x7ffff34ca700) at pthread_create.c:456
#3  0x00007ffff6c2ee1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
(gdb) frame
#0  0x000055555555b3a9 in collector_refresh_procs () at flowtop.c:1666
1666                            p->rate_bytes_src += n->rate_bytes_src;
(gdb) p *p
$2 = {entry = {next = 0x7fffec006b90, prev = 0x555555768130 <proc_list>}, flows = {next = 0x7fffec014e80, prev = 0x7fffec0010a0}, rcu = {next = {
      next = 0x0}, func = 0x0}, last_update = {tv_sec = 1511884338, tv_usec = 867334}, pid = 3959, name = "ssh", '\000' <repeats 252 times>, 
  pkts_src = 9194, bytes_src = 658954, pkts_dst = 16054, bytes_dst = 4086234, rate_bytes_src = 7.2928881719315876e-304, 
  rate_bytes_dst = 1.6304166312761136e-322, rate_pkts_src = 6.9533392299152175e-310, rate_pkts_dst = 6.9533392297488162e-310, flows_count = 2}
(gdb) p n
$3 = (struct flow_entry *) 0x0
@Safari77
Author

==2857==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a000013888 at pc 0x000000406045 bp 0x7fee84ea00d0 sp 0x7fee84ea00c0
WRITE of size 8 at 0x61a000013888 thread T1
    #0 0x406044 in cds_list_add /usr/include/urcu/list.h:53
    #1 0x407d49 in flow_entry_find_process /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:555
    #2 0x409e60 in flow_entry_get_extended /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:834
    #3 0x407271 in flow_list_new_entry /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:442
    #4 0x40f65a in flow_dump_cb /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:1786
    #5 0x7fee8c9e031a in __callback (/lib64/libnetfilter_conntrack.so.3+0x631a)
    #6 0x7fee8aa7ef5d  (/lib64/libnfnetlink.so.0+0x2f5d)
    #7 0x7fee8aa7f702 in nfnl_process (/lib64/libnfnetlink.so.0+0x3702)
    #8 0x7fee8aa7fa6b in nfnl_catch (/lib64/libnfnetlink.so.0+0x3a6b)
    #9 0x7fee8c9e116b in nfct_query (/lib64/libnetfilter_conntrack.so.3+0x716b)
    #10 0x40f883 in collector_dump_flows /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:1804
    #11 0x40fc3d in collector /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:1855
    #12 0x7fee8c36e36c in start_thread (/lib64/libpthread.so.0+0x736c)
    #13 0x7fee8bc59e1e in __GI___clone (/lib64/libc.so.6+0x110e1e)

0x61a000013888 is located 8 bytes inside of 1264-byte region [0x61a000013880,0x61a000013d70)
freed by thread T2 here:
    #0 0x7fee8cedf4b8 in __interceptor_free (/lib64/libasan.so.4+0xde4b8)
    #1 0x406a2a in __xfree /usr/src/redhat/BUILD/netsniff-ng-0.6.3/xmalloc.h:25
    #2 0x4070ca in flow_entry_xfree /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:403
    #3 0x407101 in flow_entry_xfree_rcu /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:410
    #4 0x7fee8cbfc45e in call_rcu_thread /home/rpmbuild/rpmbuild/BUILD/userspace-rcu-0.10.0/src/urcu-call-rcu-impl.h:371

previously allocated by thread T1 here:
    #0 0x7fee8cedf850 in malloc (/lib64/libasan.so.4+0xde850)
    #1 0x41c5ad in xmalloc /usr/src/redhat/BUILD/netsniff-ng-0.6.3/xmalloc.c:31
    #2 0x41c690 in xzmalloc /usr/src/redhat/BUILD/netsniff-ng-0.6.3/xmalloc.c:56
    #3 0x407066 in flow_entry_xalloc /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:395
    #4 0x40720c in flow_list_new_entry /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:436
    #5 0x40f65a in flow_dump_cb /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:1786
    #6 0x7fee8c9e031a in __callback (/lib64/libnetfilter_conntrack.so.3+0x631a)
Thread T1 created by T0 here:
    #0 0x7fee8ce38a2f in pthread_create (/lib64/libasan.so.4+0x37a2f)
    #1 0x4100dd in main /usr/src/redhat/BUILD/netsniff-ng-0.6.3/flowtop.c:1967
    #2 0x7fee8bb69889 in __libc_start_main (/lib64/libc.so.6+0x20889)

Thread T2 created by T1 here:
    #0 0x7fee8ce38a2f in pthread_create (/lib64/libasan.so.4+0x37a2f)
    #1 0x7fee8cbfb0ff in call_rcu_data_init /home/rpmbuild/rpmbuild/BUILD/userspace-rcu-0.10.0/src/urcu-call-rcu-impl.h:436

SUMMARY: AddressSanitizer: heap-use-after-free /usr/include/urcu/list.h:53 in cds_list_add
Shadow bytes around the buggy address:
  0x0c347fffa6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c347fffa6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c347fffa6e0: 00 00 00 00 07 fa fa fa fa fa fa fa fa fa fa fa
  0x0c347fffa6f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c347fffa700: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c347fffa710: fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa720: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa730: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa740: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa750: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa760: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==2857==ABORTING

@tklauser
Member

tklauser commented Dec 1, 2017

Thanks for the report. I'll have a quick look, but maybe @vkochan has a better idea since he was touching flowtop last.

tklauser pushed a commit that referenced this issue Dec 18, 2017
Use cds_list_del_rcu for safer deletion flow from the process flow
list to prevent possible use-after-free by UI thread when it is
refreshing the processes.

It may fix the #183 issue.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
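
The commit message above relies on the RCU ordering rule: an entry is unlinked first and freed only once no reader can still hold a pointer to it. The following is a crude single-threaded model of that ordering, not liburcu itself. All names (`reader_enter`, `reader_exit`, `defer_free`, `freed_count`) are hypothetical, and the "grace period" is simplified to "no reader is currently active".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: objects that were unlinked but not yet freed. */
struct deferred {
    struct deferred *next;
    void *ptr;
};

static struct deferred *pending;
static int active_readers;   /* readers currently inside a read section */
static int freed_count;      /* number of objects actually freed so far */

/* Free everything queued; only called when no reader is active. */
static void flush_pending(void)
{
    while (pending) {
        struct deferred *d = pending;
        pending = d->next;
        free(d->ptr);
        free(d);
        freed_count++;
    }
}

/* Rough stand-in for entering a read-side critical section. */
static void reader_enter(void)
{
    active_readers++;
}

/* Rough stand-in for leaving it; the last reader out triggers the frees. */
static void reader_exit(void)
{
    assert(active_readers > 0);
    if (--active_readers == 0)
        flush_pending();
}

/* Rough stand-in for a deferred free: queue now, free after readers leave. */
static int defer_free(void *ptr)
{
    struct deferred *d = malloc(sizeof(*d));
    if (!d)
        return -1;
    d->ptr = ptr;
    d->next = pending;
    pending = d;
    if (active_readers == 0)
        flush_pending();
    return 0;
}
```

In this model, a reader that obtained a pointer before `defer_free()` was called can keep using it until it calls `reader_exit()`. The use-after-free in the ASan report corresponds to the opposite situation: memory released while something still references it.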
@tklauser
Member

@Safari77 commit 85f3536 by @vkochan might fix this issue. Care to try with the latest master branch again?

@Safari77
Author

Can't reproduce the bug anymore! Thanks 👍

@tklauser
Member

@Safari77 many thanks for testing!

@vkochan
Contributor

vkochan commented Dec 18, 2017

Maybe it would be better to test a few more times under some load, if possible? Thank you very much for testing and for the good report!

@tklauser
Member

I have been running flowtop for several hours now on my workstation with decent load (git, ssh to several machines, $HOME on NFS, web browsing) and didn't observe any segfault.

@Safari77
Author

Now it crashes quite fast when I just hold down U to toggle UDP.
I got the original crash when toggling the process view 🤔

=================================================================
==31454==ERROR: AddressSanitizer: heap-use-after-free on address 0x61a000010888 at pc 0x000000514199 bp 0x7f8887f9fbc0 sp 0x7f8887f9fbb8
WRITE of size 8 at 0x61a000010888 thread T1
    #0 0x514198 in cds_list_add /usr/include/urcu/list.h:53:19
    #1 0x513313 in flow_entry_find_process /wrk/safari/cvs/netsniff-ng/flowtop.c:565:2
    #2 0x51207c in flow_entry_get_extended /wrk/safari/cvs/netsniff-ng/flowtop.c:844:3
    #3 0x510a3f in flow_list_new_entry /wrk/safari/cvs/netsniff-ng/flowtop.c:452:2
    #4 0x515a37 in flow_dump_cb /wrk/safari/cvs/netsniff-ng/flowtop.c:1772:9
    #5 0x7f888f61931a in __callback (/usr/lib64/libnetfilter_conntrack.so.3+0x631a)
    #6 0x7f888f40ff5d  (/usr/lib64/libnfnetlink.so.0+0x2f5d)
    #7 0x7f888f410702 in nfnl_process (/usr/lib64/libnfnetlink.so.0+0x3702)
    #8 0x7f888f410a6b in nfnl_catch (/usr/lib64/libnfnetlink.so.0+0x3a6b)
    #9 0x7f888f61a16b in nfct_query (/usr/lib64/libnetfilter_conntrack.so.3+0x716b)
    #10 0x50ff2b in collector_dump_flows /wrk/safari/cvs/netsniff-ng/flowtop.c:1790:3
    #11 0x50ec43 in collector /wrk/safari/cvs/netsniff-ng/flowtop.c:1841:4
    #12 0x4deaa2 in __asan::AsanThread::ThreadStart(unsigned long, __sanitizer::atomic_uintptr_t*) (/usr/sbin/flowtop-debug+0x4deaa2)
    #13 0x7f888eda136c in start_thread (/usr/lib64/libpthread.so.0+0x736c)
    #14 0x7f888dd53e1e in __GI___clone (/usr/lib64/libc.so.6+0x110e1e)

0x61a000010888 is located 8 bytes inside of 1264-byte region [0x61a000010880,0x61a000010d70)
freed by thread T2 here:
    #0 0x4d0e38 in __interceptor_free.localalias.0 (/usr/sbin/flowtop-debug+0x4d0e38)
    #1 0x515195 in __xfree /wrk/safari/cvs/netsniff-ng/./xmalloc.h:25:9
    #2 0x51513d in flow_entry_xfree /wrk/safari/cvs/netsniff-ng/flowtop.c:413:2
    #3 0x5150a0 in flow_entry_xfree_rcu /wrk/safari/cvs/netsniff-ng/flowtop.c:420:2
    #4 0x7f888f83545e in call_rcu_thread /home/rpmbuild/rpmbuild/BUILD/userspace-rcu-0.10.0/src/urcu-call-rcu-impl.h:371

previously allocated by thread T1 here:
    #0 0x4d0ff0 in malloc (/usr/sbin/flowtop-debug+0x4d0ff0)
    #1 0x52b245 in xmalloc /wrk/safari/cvs/netsniff-ng/xmalloc.c:31:8
    #2 0x52b5b4 in xzmalloc /wrk/safari/cvs/netsniff-ng/xmalloc.c:56:14
    #3 0x510d1f in flow_entry_xalloc /wrk/safari/cvs/netsniff-ng/flowtop.c:405:9
    #4 0x5109dd in flow_list_new_entry /wrk/safari/cvs/netsniff-ng/flowtop.c:446:6
    #5 0x515a37 in flow_dump_cb /wrk/safari/cvs/netsniff-ng/flowtop.c:1772:9
    #6 0x7f888f61931a in __callback (/usr/lib64/libnetfilter_conntrack.so.3+0x631a)

Thread T1 created by T0 here:
    #0 0x4344e0 in __interceptor_pthread_create (/usr/sbin/flowtop-debug+0x4344e0)
    #1 0x50e38d in main /wrk/safari/cvs/netsniff-ng/flowtop.c:1953:8
    #2 0x7f888dc63889 in __libc_start_main (/usr/lib64/libc.so.6+0x20889)

Thread T2 created by T1 here:
    #0 0x4344e0 in __interceptor_pthread_create (/usr/sbin/flowtop-debug+0x4344e0)
    #1 0x7f888f8340ff in call_rcu_data_init /home/rpmbuild/rpmbuild/BUILD/userspace-rcu-0.10.0/src/urcu-call-rcu-impl.h:436

SUMMARY: AddressSanitizer: heap-use-after-free /usr/include/urcu/list.h:53:19 in cds_list_add
Shadow bytes around the buggy address:
  0x0c347fffa0c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa0d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa0e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
  0x0c347fffa0f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c347fffa100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c347fffa110: fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa120: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa130: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa140: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa150: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c347fffa160: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==31454==ABORTING

@vkochan
Contributor

vkochan commented Dec 18, 2017

Sorry, I'd just like to clarify: did you get the crash only after pressing "U" on the "Process List" tab?

@vkochan
Contributor

vkochan commented Dec 18, 2017

Anyway, I will look into it later today.

@Safari77
Author

Now it crashes in both Flows and Process tabs when I press U.
When I reported 20 days ago, it crashed (IIRC) when I pressed tab to view Process.

@tklauser
Member

Thanks @Safari77 for verifying. Indeed I can also reproduce if I press the 'U' key repeatedly.

@tklauser tklauser reopened this Dec 18, 2017
vkochan added a commit to vkochan/netsniff-ng that referenced this issue Dec 19, 2017
There is missing logic which removes flow entry from
related proc's entry while destroying global flows list on
filter reloading, hence add common __flow_list_del_entry which
handles this logic for both cases - when ct destroyed or filter
changed.

This is a 2nd fix for issue netsniff-ng#183.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
@vkochan
Contributor

vkochan commented Dec 19, 2017

Hi @Safari77!

Would you please try this fix from my branch?
https://github.com/vkochan/netsniff-ng/tree/fix_flowtop_on_reload

@tklauser says he still sees the issue, but I don't (although I did see it before the patch), so it would be great
if you could test it too.

tklauser pushed a commit that referenced this issue Dec 19, 2017
There is missing logic which removes flow entry from
related proc's entry while destroying global flows list on
filter reloading, hence add common __flow_list_del_entry which
handles this logic for both cases - when ct destroyed or filter
changed.

This is a 2nd fix for issue #183.

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
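
The commit message above describes a flow entry that is linked into two lists at once (the global flow list and the owning process's flow list) and a common helper that removes it from both. Below is a minimal single-threaded sketch of that idea, under stated assumptions: the `node`, `proc` and `flow` types and the `flow_new`/`flow_del_entry` helpers are hypothetical, and the list is a hand-rolled circular list rather than liburcu's `cds_list` API or flowtop's actual `__flow_list_del_entry`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal circular doubly linked list (not liburcu's cds_list). */
struct node {
    struct node *next, *prev;
};

static void list_init(struct node *h)
{
    h->next = h->prev = h;
}

static void list_add(struct node *n, struct node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

static size_t list_len(const struct node *h)
{
    size_t len = 0;
    for (const struct node *p = h->next; p != h; p = p->next)
        len++;
    return len;
}

/* Hypothetical process entry that owns a list of its flows. */
struct proc {
    struct node flows;
    int flows_count;
};

/* Hypothetical flow entry, linked into two lists at the same time. */
struct flow {
    struct node entry;      /* link in the global flow list */
    struct node proc_link;  /* link in the owning process's flow list */
    struct proc *proc;      /* owning process, or NULL if unknown */
};

static struct flow *flow_new(struct node *flow_list, struct proc *p)
{
    struct flow *f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;
    list_add(&f->entry, flow_list);
    list_init(&f->proc_link);
    f->proc = p;
    if (p) {
        list_add(&f->proc_link, &p->flows);
        p->flows_count++;
    }
    return f;
}

/* Single removal path: unlink from BOTH lists before freeing, so the
 * process list can never keep a pointer to freed memory. */
static void flow_del_entry(struct flow *f)
{
    assert(f != NULL);
    if (f->proc) {
        list_del(&f->proc_link);
        f->proc->flows_count--;
    }
    list_del(&f->entry);
    free(f);
}
```

If every code path that destroys a flow goes through one helper like `flow_del_entry()`, no path (such as a filter reload) can forget the per-process unlink. In the real multi-threaded program the unlink and free would additionally need RCU-safe variants.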
@tklauser
Member

@vkochan's patch is now applied to master as 6e850d4, so you may also just update to test.

@Safari77
Author

It doesn't want to crash anymore. Has been running for five hours.
😉

@vkochan
Contributor

vkochan commented Dec 22, 2017

Hi Tobias!

Is this still a blocker for the release?

@tklauser
Member

tklauser commented Jan 2, 2018

I can still reproduce the issue, though not when running under gdb 😞

@YJesus

YJesus commented Dec 2, 2021

I have the same issue, first on CentOS 7 and now on CentOS 8. flowtop can run for hours, perhaps days, or just minutes before it segfaults inexplicably. I'm using this command line: ./flowtop -G -I -U -T -4 -s

@YJesus

YJesus commented Dec 6, 2021

With GDB attached to the process I got this error:

flowtop: api.c:965: nfct_query: Assertion `data != NULL' failed

@YJesus

YJesus commented Apr 13, 2022

I finally followed the clues and found that flowtop.c calls nfct_query() from api.c of libnetfilter_conntrack and triggers this assert:

assert(data != NULL);

    int nfct_query(struct nfct_handle *h,
                   const enum nf_conntrack_query qt,
                   const void *data)
    {
            const size_t size = 4096;       /* enough for now */
            union {
                    char buffer[size];
                    struct nfnlhdr req;
            } u;

            assert(h != NULL);
            assert(data != NULL);           /* <-- this is the assertion that fails */

            if (__build_query_ct(h->nfnlssh_ct, qt, data, &u.req, size) == -1)
                    return -1;

            return nfnl_query(h->nfnlh, &u.req.nlh);
    }

So I modified collector_refresh_flows(), adding a check so that nfct_query() is not called if n->ct is NULL.

    static void collector_refresh_flows(struct nfct_handle *handle)
    {
            struct flow_entry *n;

            cds_list_for_each_entry_rcu(n, &flow_list.head, entry) {
                    /* Skip entries without a conntrack object; nfct_query() asserts data != NULL. */
                    if (n->ct != NULL) {
                            nfct_query(handle, NFCT_Q_GET, n->ct);
                    }
            }
    }

I'm pretty sure the best option is to find out why n->ct is NULL when it reaches collector_refresh_flows(), but my fix apparently resolves the coredump and flowtop remains stable.

@tklauser
Member

@YJesus thanks a lot for the analysis! I think the patch you proposed with checking n->ct != NULL would be a viable fix to have until we figure out the underlying issue. Want to open a PR adding that check?

@YJesus

YJesus commented May 11, 2022

Bad news: the root problem is that n itself points to an invalid memory address, so if you try to test for NULL, you get another coredump :( I have tested with many RCU versions, and cds_list_for_each_entry_rcu randomly leaves n pointing to an invalid memory address :(
