New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2016.2.4: Possible crash caused by backported 1003-batman-adv-Fix-double-free-during-fragment-merge-err.patch #1083
Comments
this may be the reason for the constant reboots and two complete crashes we experienced since updating to 2016.2.4. |
@ecsv Thanks for the detailed report. AFAICT, the following patch should fix it: --- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -328,8 +328,6 @@ bool batadv_frag_skb_buffer(struct sk_buff **skb,
goto out;
skb_out = batadv_frag_merge_packets(&head);
- if (!skb_out)
- goto out_err;
out:
ret = true; |
This is not enough. The batadv_frag_insert_packet call (earlier in batadv_frag_skb_buffer) can also fail (and free the skb). As result, batadv_frag_skb_buffer will then return false and thus batadv_recv_frag_packet will return NET_RX_DROP |
Okay, let's try this again... Instead of fixing up the faulty backport, this replaces it: --- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -326,8 +326,6 @@ bool batadv_frag_skb_buffer(struct sk_buff **skb,
goto out;
skb_out = batadv_frag_merge_packets(&head);
- if (!skb_out)
- goto out_err;
out:
*skb = skb_out; (funny, this is the exact same patch as above, just with slightly changed line numbers and context) This leaves batadv_frag_insert_packet untouched (so it won't free the SKB on errors), only batadv_frag_skb_buffer is changed to return true when batadv_frag_merge_packets fails to avoid the original double free the backport tried to fix. |
This should work |
Thanks, commited in d452a7c. |
yesterday night we reverted both patches and rolled out an experimental fw. this one is stable until now. |
Am 28.03.2017 um 14:44 schrieb Matthias Schiffer:
Bitte nochmal prüfen.
Ich habe 2016.2.x gebaut. Seither laufen einige Nodes instabil. Sie sind
nach ein paar Stunden/Tagen platt bis zum powercycle. Genaueres kann ich
nicht sagen, weil keine eigene Node betroffen war. Aber es sind
signifikant viele um hier ein entsprechendes Feedback zu geben.
- Vorher war 2016.2.4 drauf. Diese Version lief stabil
- Ich hatte make dirclean gemacht. Wir haben also sauber gebaut
Hat noch wer getestet? Könnt ihr ähnliches berichten?
|
it's the opposite for us. so far no problem with the newest commit. |
Am 31.03.2017 um 01:31 schrieb Andreas Ziegler:
it's the opposite for us. so far no problem with the newest commit.
with 2016.2.4 we have many nodes rebooting over and over again. not all
of the nodes, but many. most of the affected ones have a direct fastd
vpn connection and many clients.
disclaimer: the newest commit based firmware isn't out there for a long
time yet, which makes it difficult to directly compare it to 2016.2.4
I would be happy if they would just reboot. We need a powercycle...
Configuration: CPE with MoL and MoW seems to be affected...
|
It seems to me that a regression was introduced in the batman-adv package of 2016.2.4. The backport of the patch 1003-batman-adv-Fix-double-free-during-fragment-merge-err.patch is incomplete and isn't correctly integrated for the pre-2016.5 rxhandler behavior. I don't know for sure if this crash actually happens (I don't use the batman-adv version from gluon 2016.2.4). But at least in theory, this is a possible problem.
The explanation was already posted at b7eeef9#commitcomment-21514812. Added here again for completeness:
@NeoRaider This patch seems to be "incorrectly" backported. The version which are you using is for v2016.3-104-g83fef82 or later. But you still seem to use v2016.2
The change 83fef820ecdb ("batman-adv: Consume skb in receive handlers") moved the consume_skb/kfree_skb handling towards the receive handlers.
Let's think a little bit about how it worked in batman-adv 2016.2:
If my explanation is correct, then you would have to make sure that batadv_recv_frag_packet returns NET_RX_SUCCESS when batadv_frag_skb_buffer fails. At least for your batman-adv versions older than v2016.5.
The text was updated successfully, but these errors were encountered: