Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ndpiReader crash while processiong the live traffic #423

Closed
lynn1050 opened this issue Jul 19, 2017 · 18 comments
Closed

ndpiReader crash while processiong the live traffic #423

lynn1050 opened this issue Jul 19, 2017 · 18 comments
Labels

Comments

@lynn1050
Copy link

ndpiReader crash on one server, but not on another server, when it is processing the live traffic.
The backtrace is as follows:

#0 0x000000308287b81c in free () from /lib64/libc.so.6
#1 0x000000000040a32a in ndpi_free_flow_info_half (flow=0x7ffff220d8b0, func=0x438480 "node_idle_scan_walker") at ndpi_util.c:79
#2 0x00000000004071fc in node_idle_scan_walker (node=, which=, depth=, user_data=) at ndpiReader.c:1099
#3 0x000000000040b028 in ndpi_trecurse (root=0x7fffeb5889e0, action=0x407140 <node_idle_scan_walker>, level=9, user_data=0x7ffff7d64b3e) at ndpi_main.c:143
#4 0x000000000040b06a in ndpi_trecurse (root=0x7fffe855fcd0, action=0x407140 <node_idle_scan_walker>, level=8, user_data=0x7ffff7d64b3e) at ndpi_main.c:148
#5 0x000000000040b041 in ndpi_trecurse (root=0x7ffff1e6e290, action=0x407140 <node_idle_scan_walker>, level=7, user_data=0x7ffff7d64b3e) at ndpi_main.c:145
#6 0x000000000040b06a in ndpi_trecurse (root=0x7ffff0752120, action=0x407140 <node_idle_scan_walker>, level=6, user_data=0x7ffff7d64b3e) at ndpi_main.c:148
#7 0x000000000040b041 in ndpi_trecurse (root=0x7fffea32fc70, action=0x407140 <node_idle_scan_walker>, level=5, user_data=0x7ffff7d64b3e) at ndpi_main.c:145
#8 0x000000000040b041 in ndpi_trecurse (root=0x7ffff088fcd0, action=0x407140 <node_idle_scan_walker>, level=4, user_data=0x7ffff7d64b3e) at ndpi_main.c:145
#9 0x000000000040b06a in ndpi_trecurse (root=0x7ffff2197a90, action=0x407140 <node_idle_scan_walker>, level=3, user_data=0x7ffff7d64b3e) at ndpi_main.c:148
#10 0x000000000040b06a in ndpi_trecurse (root=0x7ffff16cb800, action=0x407140 <node_idle_scan_walker>, level=2, user_data=0x7ffff7d64b3e) at ndpi_main.c:148
#11 0x000000000040b06a in ndpi_trecurse (root=0x7ffff0cbad30, action=0x407140 <node_idle_scan_walker>, level=1, user_data=0x7ffff7d64b3e) at ndpi_main.c:148
#12 0x000000000040b041 in ndpi_trecurse (root=0x7ffff1ff9390, action=0x407140 <node_idle_scan_walker>, level=0, user_data=0x7ffff7d64b3e) at ndpi_main.c:145
#13 0x0000000000408910 in pcap_process_packet (args=, header=0x7ffff7d64bd0, packet=0x7ffff7edd046 "\350\261\374\311'\360\220\261\034\006\254\300\201") at ndpiReader.c:2020
#14 0x00000033b7808ace in ?? () from /usr/lib64/libpcap.so.1
#15 0x00000033b781137d in pcap_loop () from /usr/lib64/libpcap.so.1
#16 0x0000000000408d5f in runPcapLoop (_thread_id=0x0) at ndpiReader.c:2117
#17 processing_thread (_thread_id=0x0) at ndpiReader.c:2144
#18 0x0000003082c079d1 in start_thread () from /lib64/libpthread.so.0
#19 0x00000030828e8b6d in clone () from /lib64/libc.so.6

I found some clues.
In the idle flow cleanup, some nodes cannot be deleted from the "ndpi_flow_root" when calling function ndpi_tdelete, and then the flow, which is the key of the node, is free.
In the next loop, ndpi_twalk finds the node, and calls node_idle_scan_walker, and then calls ndpi_free_flow_info_half. The memory is double free here and finally crash.

And then I found there was some porblems in the sorting function of the binary tree.
For example, there are three nodes, node A, node B and node C.
A: 0x7fffe9079520(hashval: 3484336491, vlan_id:401, protocol:6, lower_ip: 3534356490, lower_port: 52459, upper_ip: 4244875456, upper_port: 18975)
B: 0x7ffff04b3c60(hashval: 3484336491, vlan_id:401, protocol:6, lower_ip: 3534356490, lower_port: 52971, upper_ip: 4244875456, upper_port: 18463)
C: 0x7fffe93c2f50(hashval: 3484336491, vlan_id:401, protocol:6, lower_ip: 4244875456, lower_port: 18975, upper_ip: 3534356490, upper_port: 52459)

We call funcion ndpi_workflow_node_cmp, and get this result: A<B, C>B, A==C.
This results the failure of deleting node from "ndpi_flow_root".

But it doesn't happen frequently, so crash happened only on some server.

@kYroL01
Copy link
Contributor

kYroL01 commented Jul 23, 2017

Thanks for reporting. I'll check asap

@qianguozheng
Copy link
Contributor

@lynn1050 have you found that crash reason ? I encounter this problem too, but I don't understand the program deeply, so it's hard for me to do a deep diagnostic.
If you have any other clue or anything help to fix such bug, please let me know, thanks.

@qianguozheng
Copy link
Contributor

@kYroL01 any update on this questions ? I can do some test

@kYroL01
Copy link
Contributor

kYroL01 commented Dec 7, 2017

Maybe related to this #499

@kYroL01
Copy link
Contributor

kYroL01 commented Dec 7, 2017

@qianguozheng @lynn1050 Could you please do more test after latest commits ? For what I understand, the problem occurs after many time and many traffic, right ? Any help is appreciated to better understand, thanks.

@lynn1050
Copy link
Author

lynn1050 commented Dec 8, 2017

@kYroL01 & @qianguozheng, Reasons were described in the above comment. Have you got it?

@qianguozheng
Copy link
Contributor

@kYroL01 Update to latest commit, still crash.
@lynn1050 You mean when hashval is the same, there would be double free ?

@qianguozheng
Copy link
Contributor

@lynn1050 have you got any method to fix such things ?

@lynn1050
Copy link
Author

Yes. I modified the function get_ndpi_flow_info(...) in ndpi_util.c.

  When a new packet comes in, it's requested to get the flow from ndpi_flows_root. If the flow cannot be got,  swap the source and the desination, and then try to get the flow. If neither of them cannot work, then create a new flow(a new node in ndpi_flows_root). Such modification can lead to a unique node in ndpi_flows_root.

......
flow.protocol = iph->protocol, flow.vlan_id = vlan_id;
flow.src_ip = iph->saddr, flow.dst_ip = iph->daddr;
flow.src_port = htons(*sport), flow.dst_port = htons(*dport);
flow.hashval = hashval = flow.protocol + flow.vlan_id + flow.src_ip + flow.dst_ip + flow.src_port + flow.dst_port;
idx = hashval % workflow->prefs.num_roots;
ret = ndpi_tfind(&flow, &workflow->ndpi_flows_root[idx], ndpi_workflow_node_cmp);

/************************************** newly added code, start ****************************************/
/
to avoid two nodes in one binary tree for a flow */
bool is_changed = false;
if(ret == NULL)
{
u_int32_t orig_src_ip = flow.src_ip;
u_int16_t orig_src_port = flow.src_port;
u_int32_t orig_dst_ip = flow.dst_ip;
u_int16_t orig_dst_port = flow.dst_port;

flow.src_ip = orig_dst_ip;
flow.src_port = orig_dst_port;
flow.dst_ip = orig_src_ip;
flow.dst_port = orig_src_port;

is_changed = true;

ret = ndpi_tfind(&flow, &workflow->ndpi_flows_root[idx], ndpi_workflow_node_cmp);    

}
/newly added code, end***/
if(ret == NULL) {
if(workflow->stats.ndpi_flow_count == workflow->prefs.max_ndpi_flows) {
NDPI_LOG(0, workflow->ndpi_struct, NDPI_LOG_ERROR,
"maximum flow count (%u) has been exceeded\n",
workflow->prefs.max_ndpi_flows);
exit(-1);
}
......

I hope that would be useful for you. Good luck!

@qianguozheng
Copy link
Contributor

@lynn1050 That's great, thanks a lot.
BTW, I saw you define is_changed variable, but not use for other usage, did you lose the usage of this variable later , or it just a unused variable.

@lynn1050
Copy link
Author

The variable is_changed is related to the last output parameter of get_ndpi_flow_info. And my code is written as below,
......
ndpi_tsearch(newflow, &workflow->ndpi_flows_root[idx], ndpi_workflow_node_cmp); /* Add */
workflow->stats.ndpi_flow_count++;

  *src = newflow->src_id, *dst = newflow->dst_id;

  return newflow;
}

} else {
struct ndpi_flow_info flow = (struct ndpi_flow_info)ret;

if(is_changed)
{
    if(flow->src_ip == iph->saddr
     && flow->dst_ip == iph->daddr
     && flow->src_port == htons(*sport)
     && flow->dst_port == htons(*dport)
     )
    *src = flow->dst_id, *dst = flow->src_id, *src_to_dst_direction = 0, flow->bidirectional = 1;
  else
    *src = flow->src_id, *dst = flow->dst_id, *src_to_dst_direction = 1;

}
else 
{
if(flow->src_ip == iph->saddr
     && flow->dst_ip == iph->daddr
     && flow->src_port == htons(*sport)
     && flow->dst_port == htons(*dport)
     )
    *src = flow->src_id, *dst = flow->dst_id, *src_to_dst_direction = 1;
  else
    *src = flow->dst_id, *dst = flow->src_id, *src_to_dst_direction = 0, flow->bidirectional = 1;
}

return flow;

} //the end of get_ndpi_flow_info function.

Anyway, I found the output parameter had been used nowhere.

@qianguozheng
Copy link
Contributor

Thanks a lot. @lynn1050

@kYroL01
Copy link
Contributor

kYroL01 commented Dec 14, 2017

Guys, if you find the problem and you wanna contribute to fix this issue definetley, send us a pull request, so we can test and integrate inside the code. Is the best way for all the community :)

Thank you

@lynn1050
Copy link
Author

@kYroL01 Ok, I will have a try.

@kYroL01
Copy link
Contributor

kYroL01 commented Dec 14, 2017

It's the easiest things to discuss togheter and test better. Thanks ;)

@lynn1050
Copy link
Author

@qianguozheng I need help to send a pull request. My qq number is: 928335321.

@qianguozheng
Copy link
Contributor

@kYroL01 pull request #501

@kYroL01
Copy link
Contributor

kYroL01 commented Dec 19, 2017

Solved by last commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants