New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flowtop segfault #183
Comments
|
Thanks for the report. I'll have a quick look, but maybe @vkochan has a better idea since he was touching flowtop last. |
Use cds_list_del_rcu for safer deletion flow from the process flow list to prevent possible use-after-free by UI thread when it is refreshing the processes. It may fix the #183 issue. Signed-off-by: Vadim Kochan <vadim4j@gmail.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Can't reproduce the bug anymore! Thanks |
@Safari77 many thanks for testing! |
May be it is better to test few times with some loads ? If it is possible. Thank you very much for testing and good report ! |
I have been running flowtop for several hours now on my workstation with decent load (git, ssh to several machines, $HOME on NFS, web browsing) and didn't observe any segfault. |
Now it crashes quite fast when I just hold down U to toggle UDP,
|
Sorry, I just 'd like to clarify - you got crash only after you did press "U" on "Process List" tab ? |
Anyway, will look on it, today later. |
Now it crashes in both |
Thanks @Safari77 for verifying. Indeed I can also reproduce if I press the 'U' key repeatedly. |
There is missing logic which removes flown entry from related proc's entry while destroying global flows list on filter reloading, hence add common __flow_list_del_entry which handles this logic for both cases - when ct destroyed or filter changed. This is a 2nd fix for issue netsniff-ng#183. Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Hi @Safari77 ! Would you plz try this fix from my branch ? @tklauser Says he still see the issue, but I don't (but I saw it before the patch), so it would be great |
There is missing logic which removes flow entry from related proc's entry while destroying global flows list on filter reloading, hence add common __flow_list_del_entry which handles this logic for both cases - when ct destroyed or filter changed. This is a 2nd fix for issue #183. Signed-off-by: Vadim Kochan <vadim4j@gmail.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
It doesn't want to crash anymore. Has been running for five hours. |
Hi Tobias! Is this still a blocker for the release ? |
I can still reproduce the issue, though not when running under gdb |
I have the same issue. First in CentOS 7 and now in CentOS 8. Flowtop could work for hours, perhaps days or just minutes and segfaults inexplicably. I'm using this CLI ./flowtop -G -I -U -T -4 -s |
With GDB attached to the process I got this error: flowtop: api.c:965: nfct_query: Assertion `data != NULL' failed |
Finally I follow the clues and find that flowtop.c calls nfct_query() from api.c of libnetfilter_conntrack and trigger this assert: assert(data != NULL); int nfct_query(struct nfct_handle *h,
} So I have modified void collector_refresh_flows() adding a check to not use nfct_query() if n->ct is NULL. static void collector_refresh_flows(struct nfct_handle *handle)
} I'm pretty sure that best option is to find out why n->ct reach collector_refresh_flows() NULL but apparently my fix resolve the coredump and flowtop remains stable. |
@YJesus thanks a lot for the analysis! I think the patch you proposed with checking |
Bad news, the root problem is n are pointed to invalid memory address so ... if you try to test if NULL, you get another coredump :( I have tested with many RCU versions and cds_list_for_each_entry_rcu randomly put n in a invalid memory address :( |
Fedora, x86_64, gcc 7.2.1-2, userspace-rcu 0.10.0, netsniff-ng 0.6.3.
The text was updated successfully, but these errors were encountered: