Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2016.2.4: Possible crash caused by backported 1003-batman-adv-Fix-double-free-during-fragment-merge-err.patch #1083

Closed
ecsv opened this issue Mar 28, 2017 · 10 comments
Labels
0. type: bug This is a bug

Comments

@ecsv
Copy link
Contributor

ecsv commented Mar 28, 2017

It seems to me that a regression was introduced in the batman-adv package of 2016.2.4. The backport of the patch 1003-batman-adv-Fix-double-free-during-fragment-merge-err.patch is incomplete and isn't correctly integrated for the pre-2016.5 rxhandler behavior. I don't know for sure if this crash actually happens (I don't use the batman-adv version from gluon 2016.2.4). But at least in theory, this is a possible problem.

The explanation was already posted at b7eeef9#commitcomment-21514812. Added here again for completeness:


@NeoRaider This patch seems to be "incorrectly" backported. The version which are you using is for v2016.3-104-g83fef82 or later. But you still seem to use v2016.2

The change 83fef820ecdb ("batman-adv: Consume skb in receive handlers") moved the consume_skb/kfree_skb handling towards the receive handlers.

Let's think a little bit about how it worked in batman-adv 2016.2:

  • batadv_recv_frag_packet is called to handle the packet
  • batadv_frag_skb_buffer fails to assemble the packet
    • the change you've introduced always frees the skb in this case
  • batadv_recv_frag_packet detects the failure and returns NET_RX_DROP
  • batadv_batman_skb_recv detects that the handler failed
    • kfree_skb is called again for this skb (!!!! double free crash michael bay explosion !!!!)

If my explanation is correct, then you would have to make sure that batadv_recv_frag_packet returns NET_RX_SUCCESS when batadv_frag_skb_buffer fails. At least for your batman-adv versions older than v2016.5.

@rotanid rotanid added the 0. type: bug This is a bug label Mar 28, 2017
@rotanid
Copy link
Member

rotanid commented Mar 28, 2017

this may be the reason for the constant reboots and two complete crashes we experienced since updating to 2016.2.4.
at the moment we're testing 2016.2.4 minus the new batman-adv patches.

@neocturne
Copy link
Member

@ecsv Thanks for the detailed report. AFAICT, the following patch should fix it:

--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -328,8 +328,6 @@ bool batadv_frag_skb_buffer(struct sk_buff **skb,
                goto out;

        skb_out = batadv_frag_merge_packets(&head);
-       if (!skb_out)
-               goto out_err;

 out:
        ret = true;

@ecsv
Copy link
Contributor Author

ecsv commented Mar 28, 2017

This is not enough. The batadv_frag_insert_packet call (earlier in batadv_frag_skb_buffer) can also fail (and free the skb). As result, batadv_frag_skb_buffer will then return false and thus batadv_recv_frag_packet will return NET_RX_DROP

@neocturne
Copy link
Member

Okay, let's try this again... Instead of fixing up the faulty backport, this replaces it:

--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -326,8 +326,6 @@ bool batadv_frag_skb_buffer(struct sk_buff **skb,
 		goto out;
 
 	skb_out = batadv_frag_merge_packets(&head);
-	if (!skb_out)
-		goto out_err;
 
 out:
 	*skb = skb_out;

(funny, this is the exact same patch as above, just with slightly changed line numbers and context)

This leaves batadv_frag_insert_packet untouched (so it won't free the SKB on errors), only batadv_frag_skb_buffer is changed to return true when batadv_frag_merge_packets fails to avoid the original double free the backport tried to fix.

@ecsv
Copy link
Contributor Author

ecsv commented Mar 28, 2017

This should work

@neocturne
Copy link
Member

Thanks, commited in d452a7c.

@rotanid
Copy link
Member

rotanid commented Mar 28, 2017

yesterday night we reverted both patches and rolled out an experimental fw. this one is stable until now.
we will use your updated patch and test it starting tonight.

@A-Kasper
Copy link
Contributor

A-Kasper commented Mar 30, 2017 via email

@rotanid
Copy link
Member

rotanid commented Mar 30, 2017

it's the opposite for us. so far no problem with the newest commit.
with 2016.2.4 we have many nodes rebooting over and over again. not all of the nodes, but many. most of the affected ones have a direct fastd vpn connection and many clients.
disclaimer: the newest commit based firmware isn't out there for a long time yet, which makes it difficult to directly compare it to 2016.2.4

@A-Kasper
Copy link
Contributor

A-Kasper commented Mar 31, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. type: bug This is a bug
Projects
None yet
Development

No branches or pull requests

4 participants