New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MP-BGP SRv6 VPNv4 (dplane) #2
Conversation
* Support sr tunsrc set/show * Add seg6 client side * Add seg6 server side
Netlink周りのメモseg6 経路のnetlink msg
FRRのnetlink関連のヘルパー関数
|
Zebraでの経路設定時の権限昇格ZEBRAではOSの設定書き換えを行うときだけ, privs系のマクロを使って権限昇格する.
|
CLIの用意
これらのコードはstaticdに書くのだが, d-planeに関する設定は, zebraしか許されていないため, まずは, zebra側でprintfをするだけのzapiをはやして, それをstaticdからやる.
zapiを使って, staticd -> zebra -> netlinkで設定する部分はとりあえずできた.
|
Our Address Sanitizer CI is finding this issue: error 09-Oct-2019 19:28:33 r4: bgpd triggered an exception by AddressSanitizer error 09-Oct-2019 19:28:33 ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdd425b060 at pc 0x00000068575f bp 0x7ffdd4258550 sp 0x7ffdd4258540 error 09-Oct-2019 19:28:33 READ of size 1 at 0x7ffdd425b060 thread T0 error 09-Oct-2019 19:28:33 #0 0x68575e in prefix_cmp lib/prefix.c:776 error 09-Oct-2019 19:28:33 #1 0x5889f5 in rfapiItBiIndexSearch bgpd/rfapi/rfapi_import.c:2230 error 09-Oct-2019 19:28:33 #2 0x5889f5 in rfapiBgpInfoFilteredImportVPN bgpd/rfapi/rfapi_import.c:3520 error 09-Oct-2019 19:28:33 #3 0x58b909 in rfapiProcessWithdraw bgpd/rfapi/rfapi_import.c:4071 error 09-Oct-2019 19:28:33 #4 0x4c459b in bgp_withdraw bgpd/bgp_route.c:3736 error 09-Oct-2019 19:28:33 #5 0x484122 in bgp_nlri_parse_vpn bgpd/bgp_mplsvpn.c:237 error 09-Oct-2019 19:28:33 #6 0x497f52 in bgp_nlri_parse bgpd/bgp_packet.c:315 error 09-Oct-2019 19:28:33 #7 0x49d06d in bgp_update_receive bgpd/bgp_packet.c:1598 error 09-Oct-2019 19:28:33 #8 0x49d06d in bgp_process_packet bgpd/bgp_packet.c:2274 error 09-Oct-2019 19:28:33 #9 0x6b9f54 in thread_call lib/thread.c:1531 error 09-Oct-2019 19:28:33 #10 0x657037 in frr_run lib/libfrr.c:1052 error 09-Oct-2019 19:28:33 #11 0x42d268 in main bgpd/bgp_main.c:486 error 09-Oct-2019 19:28:33 #12 0x7f806032482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) error 09-Oct-2019 19:28:33 #13 0x42bcc8 in _start (/usr/lib/frr/bgpd+0x42bcc8) error 09-Oct-2019 19:28:33 error 09-Oct-2019 19:28:33 Address 0x7ffdd425b060 is located in stack of thread T0 at offset 240 in frame error 09-Oct-2019 19:28:33 #0 0x483945 in bgp_nlri_parse_vpn bgpd/bgp_mplsvpn.c:103 error 09-Oct-2019 19:28:33 error 09-Oct-2019 19:28:33 This frame has 5 object(s): error 09-Oct-2019 19:28:33 [32, 36) 'label' error 09-Oct-2019 19:28:33 [96, 108) 'rd_as' error 09-Oct-2019 19:28:33 [160, 172) 'rd_ip' error 09-Oct-2019 19:28:33 [224, 240) 'prd' <== Memory access at offset 240 overflows this variable error 09-Oct-2019 19:28:33 [288, 336) 'p' error 09-Oct-2019 19:28:33 HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext error 09-Oct-2019 19:28:33 (longjmp and C++ exceptions *are* supported) error 09-Oct-2019 19:28:33 SUMMARY: AddressSanitizer: stack-buffer-overflow lib/prefix.c:776 prefix_cmp error 09-Oct-2019 19:28:33 Shadow bytes around the buggy address: error 09-Oct-2019 19:28:33 0x10003a8435b0: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 error 09-Oct-2019 19:28:33 0x10003a8435c0: 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 error 09-Oct-2019 19:28:33 0x10003a8435d0: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 error 09-Oct-2019 19:28:33 0x10003a8435e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 error 09-Oct-2019 19:28:33 0x10003a8435f0: f1 f1 04 f4 f4 f4 f2 f2 f2 f2 00 04 f4 f4 f2 f2 error 09-Oct-2019 19:28:33 =>0x10003a843600: f2 f2 00 04 f4 f4 f2 f2 f2 f2 00 00[f4]f4 f2 f2 error 09-Oct-2019 19:28:33 0x10003a843610: f2 f2 00 00 00 00 00 00 f4 f4 f3 f3 f3 f3 00 00 error 09-Oct-2019 19:28:33 0x10003a843620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 error 09-Oct-2019 19:28:33 0x10003a843630: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 02 f4 error 09-Oct-2019 19:28:33 0x10003a843640: f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 f2 00 00 error 09-Oct-2019 19:28:33 0x10003a843650: f4 f4 f2 f2 f2 f2 00 00 00 00 f2 f2 f2 f2 00 00 error 09-Oct-2019 19:28:33 Shadow byte legend (one shadow byte represents 8 application bytes): error 09-Oct-2019 19:28:33 Addressable: 00 error 09-Oct-2019 19:28:33 Partially addressable: 01 02 03 04 05 06 07 error 09-Oct-2019 19:28:33 Heap left redzone: fa error 09-Oct-2019 19:28:33 Heap right redzone: fb error 09-Oct-2019 19:28:33 Freed heap region: fd error 09-Oct-2019 19:28:33 Stack left redzone: f1 error 09-Oct-2019 19:28:33 Stack mid redzone: f2 error 09-Oct-2019 19:28:33 Stack right redzone: f3 error 09-Oct-2019 19:28:33 Stack partial redzone: f4 error 09-Oct-2019 19:28:33 Stack after return: f5 error 09-Oct-2019 19:28:33 Stack use after scope: f8 error 09-Oct-2019 19:28:33 Global redzone: f9 error 09-Oct-2019 19:28:33 Global init order: f6 error 09-Oct-2019 19:28:33 Poisoned by user: f7 error 09-Oct-2019 19:28:33 Container overflow: fc error 09-Oct-2019 19:28:33 Array cookie: ac error 09-Oct-2019 19:28:33 Intra object redzone: bb error 09-Oct-2019 19:28:33 ASan internal: fe error 09-Oct-2019 19:28:36 r3: Daemon bgpd not running This is the result of this code pattern in rfapi/rfapi_import.c: prefix_cmp((struct prefix *)&bpi_result->extra->vnc.import.rd, (struct prefix *)prd)) Effectively prd or vnc.import.rd are `struct prefix_rd` which are being typecast to a `struct prefix`. Not a big deal except commit 1315d74 modified the prefix_cmp function to allow for a sorted prefix_cmp. In prefix_cmp we were looking at the offset and shift. In the case of vnc we were passing a prefix length of 64 which is the exact length of the remaining data structure for struct prefix_rd. So we calculated a offset of 8 and a shift of 0. The data structures for the prefix portion happened to be equal to 64 bits of data. So we checked that with the memcmp got a 0 and promptly read off the end of the data structure for the numcmp. The fix is if shift is 0 that means thei the memcmp has checked everything and there is nothing to do. Please note: We will still crash if we set the prefixlen > then ~312 bits currently( ie if the prefixlen specifies a bit length longer than the prefix length ). I do not think there is anything to do here( nor am I sure how to correct this either ) as that we are going to have some severe problems when we muck up the prefixlen. Fixes: FRRouting#5025 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This code is called from the zebra main pthread during shutdown but the thread event is scheduled via the zebra dplane pthread. Hence, we should be using the `thread_cancel_async()` API to cancel the thread event on a different pthread. This is only ever hit in the rare case that we still have work left to do on the update queue during shutdown. Found via zebra crash: ``` (gdb) bt \#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 \#1 0x00007f4e4d3f7535 in __GI_abort () at abort.c:79 \#2 0x00007f4e4d3f740f in __assert_fail_base (fmt=0x7f4e4d559ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f4e4d9071d0 "master->owner == pthread_self()", file=0x7f4e4d906cf8 "lib/thread.c", line=1185, function=<optimized out>) at assert.c:92 \#3 0x00007f4e4d405102 in __GI___assert_fail (assertion=assertion@entry=0x7f4e4d9071d0 "master->owner == pthread_self()", file=file@entry=0x7f4e4d906cf8 "lib/thread.c", line=line@entry=1185, function=function@entry=0x7f4e4d906b68 <__PRETTY_FUNCTION__.15817> "thread_cancel") at assert.c:101 \#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185 \#5 0x000055b40c291845 in zebra_dplane_shutdown () at zebra/zebra_dplane.c:3274 \#6 0x000055b40c27ee13 in zebra_finalize (dummy=<optimized out>) at zebra/main.c:202 \#7 0x00007f4e4d8d1416 in thread_call (thread=thread@entry=0x7ffcbbc08870) at lib/thread.c:1599 \#8 0x00007f4e4d8a1ef8 in frr_run (master=0x55b40ce35510) at lib/libfrr.c:1024 \#9 0x000055b40c270916 in main (argc=8, argv=0x7ffcbbc08c78) at zebra/main.c:483 (gdb) down \#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185 1185 assert(master->owner == pthread_self()); (gdb) ``` Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
別ブランチで作業しているので閉じます. |
Address Sanitizer is reporting this issue: ==26177==ERROR: AddressSanitizer: heap-use-after-free on address 0x6120000238d8 at pc 0x7f88f7c4fa93 bp 0x7fff9a641830 sp 0x7fff9a641820 READ of size 8 at 0x6120000238d8 thread T0 #0 0x7f88f7c4fa92 in if_delete lib/if.c:290 #1 0x42192e in ospf_vl_if_delete ospfd/ospf_interface.c:912 #2 0x42192e in ospf_vl_delete ospfd/ospf_interface.c:990 #3 0x4a6208 in no_ospf_area_vlink ospfd/ospf_vty.c:1227 #4 0x7f88f7c1553d in cmd_execute_command_real lib/command.c:1073 #5 0x7f88f7c19b1e in cmd_execute_command lib/command.c:1132 #6 0x7f88f7c19e8e in cmd_execute lib/command.c:1288 #7 0x7f88f7cd7523 in vty_command lib/vty.c:516 #8 0x7f88f7cd79ff in vty_execute lib/vty.c:1285 #9 0x7f88f7cde4f9 in vtysh_read lib/vty.c:2119 #10 0x7f88f7ccb845 in thread_call lib/thread.c:1549 #11 0x7f88f7c5d6a7 in frr_run lib/libfrr.c:1093 #12 0x412976 in main ospfd/ospf_main.c:221 #13 0x7f88f73b082f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) #14 0x413c78 in _start (/usr/local/master/sbin/ospfd+0x413c78) Effectively we are in a shutdown phase and as part of shutdown we delete the ospf interface pointer ( ifp->info ). The interface deletion code was modified in the past year to pass in the address of operator to allow us to NULL out the holding pointer. The catch here is that we free the oi and then delete the interface passing in the address of the oi->ifp pointer, causing a use after free. Fixes: FRRouting#5555 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were not connecting the default zebra_ns to the default ns->info at namespace initialization in zebra. Thus, when we tried to use the `ns_walk_func()` it would ignore the default zebra_ns since there is no pointer to it from the ns struct. Fix this by connecting them in `zebra_ns_init()` and, if the default ns is not found, exit with failure since this is not recoverable. This was found during a crash where we fail to cancel the kernel_read thread at termination (via the `ns_walk_func()`) and then we get a netlink notification trying to use the zns struct that has already been freed. ``` (gdb) bt \#0 0x00007fc1134dc7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6 \#1 0x00007fc1134c7535 in abort () from /lib/x86_64-linux-gnu/libc.so.6 \#2 0x00007fc113996f8f in core_handler (signo=11, siginfo=0x7ffe5429d070, context=<optimized out>) at lib/sigevent.c:254 \#3 <signal handler called> \#4 0x0000561880e15449 in if_lookup_by_index_per_ns (ns=0x0, ifindex=174) at zebra/interface.c:269 \#5 0x0000561880e1642c in if_up (ifp=ifp@entry=0x561883076c50) at zebra/interface.c:1043 \#6 0x0000561880e10723 in netlink_link_change (h=0x7ffe5429d8f0, ns_id=<optimized out>, startup=<optimized out>) at zebra/if_netlink.c:1384 \#7 0x0000561880e17e68 in netlink_parse_info (filter=filter@entry=0x561880e17680 <netlink_information_fetch>, nl=nl@entry=0x561882497238, zns=zns@entry=0x7ffe542a5940, count=count@entry=5, startup=startup@entry=0) at zebra/kernel_netlink.c:932 \#8 0x0000561880e186a5 in kernel_read (thread=<optimized out>) at zebra/kernel_netlink.c:406 \#9 0x00007fc1139a4416 in thread_call (thread=thread@entry=0x7ffe542a5b70) at lib/thread.c:1599 \#10 0x00007fc113974ef8 in frr_run (master=0x5618823c9510) at lib/libfrr.c:1024 \#11 0x0000561880e0b916 in main (argc=8, argv=0x7ffe542a5f78) at zebra/main.c:483 ``` Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Our Address Sanitizer CI is finding this issue: error 09-Oct-2019 19:28:33 r4: bgpd triggered an exception by AddressSanitizer error 09-Oct-2019 19:28:33 ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdd425b060 at pc 0x00000068575f bp 0x7ffdd4258550 sp 0x7ffdd4258540 error 09-Oct-2019 19:28:33 READ of size 1 at 0x7ffdd425b060 thread T0 error 09-Oct-2019 19:28:33 #0 0x68575e in prefix_cmp lib/prefix.c:776 error 09-Oct-2019 19:28:33 #1 0x5889f5 in rfapiItBiIndexSearch bgpd/rfapi/rfapi_import.c:2230 error 09-Oct-2019 19:28:33 #2 0x5889f5 in rfapiBgpInfoFilteredImportVPN bgpd/rfapi/rfapi_import.c:3520 error 09-Oct-2019 19:28:33 #3 0x58b909 in rfapiProcessWithdraw bgpd/rfapi/rfapi_import.c:4071 error 09-Oct-2019 19:28:33 #4 0x4c459b in bgp_withdraw bgpd/bgp_route.c:3736 error 09-Oct-2019 19:28:33 #5 0x484122 in bgp_nlri_parse_vpn bgpd/bgp_mplsvpn.c:237 error 09-Oct-2019 19:28:33 #6 0x497f52 in bgp_nlri_parse bgpd/bgp_packet.c:315 error 09-Oct-2019 19:28:33 #7 0x49d06d in bgp_update_receive bgpd/bgp_packet.c:1598 error 09-Oct-2019 19:28:33 #8 0x49d06d in bgp_process_packet bgpd/bgp_packet.c:2274 error 09-Oct-2019 19:28:33 #9 0x6b9f54 in thread_call lib/thread.c:1531 error 09-Oct-2019 19:28:33 #10 0x657037 in frr_run lib/libfrr.c:1052 error 09-Oct-2019 19:28:33 #11 0x42d268 in main bgpd/bgp_main.c:486 error 09-Oct-2019 19:28:33 #12 0x7f806032482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) error 09-Oct-2019 19:28:33 #13 0x42bcc8 in _start (/usr/lib/frr/bgpd+0x42bcc8) error 09-Oct-2019 19:28:33 error 09-Oct-2019 19:28:33 Address 0x7ffdd425b060 is located in stack of thread T0 at offset 240 in frame error 09-Oct-2019 19:28:33 #0 0x483945 in bgp_nlri_parse_vpn bgpd/bgp_mplsvpn.c:103 error 09-Oct-2019 19:28:33 error 09-Oct-2019 19:28:33 This frame has 5 object(s): error 09-Oct-2019 19:28:33 [32, 36) 'label' error 09-Oct-2019 19:28:33 [96, 108) 'rd_as' error 09-Oct-2019 19:28:33 [160, 172) 'rd_ip' error 09-Oct-2019 19:28:33 [224, 240) 'prd' <== Memory access at offset 240 overflows this variable error 09-Oct-2019 19:28:33 [288, 336) 'p' error 09-Oct-2019 19:28:33 HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext error 09-Oct-2019 19:28:33 (longjmp and C++ exceptions *are* supported) error 09-Oct-2019 19:28:33 SUMMARY: AddressSanitizer: stack-buffer-overflow lib/prefix.c:776 prefix_cmp error 09-Oct-2019 19:28:33 Shadow bytes around the buggy address: error 09-Oct-2019 19:28:33 0x10003a8435b0: 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 00 error 09-Oct-2019 19:28:33 0x10003a8435c0: 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 error 09-Oct-2019 19:28:33 0x10003a8435d0: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 error 09-Oct-2019 19:28:33 0x10003a8435e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 error 09-Oct-2019 19:28:33 0x10003a8435f0: f1 f1 04 f4 f4 f4 f2 f2 f2 f2 00 04 f4 f4 f2 f2 error 09-Oct-2019 19:28:33 =>0x10003a843600: f2 f2 00 04 f4 f4 f2 f2 f2 f2 00 00[f4]f4 f2 f2 error 09-Oct-2019 19:28:33 0x10003a843610: f2 f2 00 00 00 00 00 00 f4 f4 f3 f3 f3 f3 00 00 error 09-Oct-2019 19:28:33 0x10003a843620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 error 09-Oct-2019 19:28:33 0x10003a843630: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 02 f4 error 09-Oct-2019 19:28:33 0x10003a843640: f4 f4 f2 f2 f2 f2 04 f4 f4 f4 f2 f2 f2 f2 00 00 error 09-Oct-2019 19:28:33 0x10003a843650: f4 f4 f2 f2 f2 f2 00 00 00 00 f2 f2 f2 f2 00 00 error 09-Oct-2019 19:28:33 Shadow byte legend (one shadow byte represents 8 application bytes): error 09-Oct-2019 19:28:33 Addressable: 00 error 09-Oct-2019 19:28:33 Partially addressable: 01 02 03 04 05 06 07 error 09-Oct-2019 19:28:33 Heap left redzone: fa error 09-Oct-2019 19:28:33 Heap right redzone: fb error 09-Oct-2019 19:28:33 Freed heap region: fd error 09-Oct-2019 19:28:33 Stack left redzone: f1 error 09-Oct-2019 19:28:33 Stack mid redzone: f2 error 09-Oct-2019 19:28:33 Stack right redzone: f3 error 09-Oct-2019 19:28:33 Stack partial redzone: f4 error 09-Oct-2019 19:28:33 Stack after return: f5 error 09-Oct-2019 19:28:33 Stack use after scope: f8 error 09-Oct-2019 19:28:33 Global redzone: f9 error 09-Oct-2019 19:28:33 Global init order: f6 error 09-Oct-2019 19:28:33 Poisoned by user: f7 error 09-Oct-2019 19:28:33 Container overflow: fc error 09-Oct-2019 19:28:33 Array cookie: ac error 09-Oct-2019 19:28:33 Intra object redzone: bb error 09-Oct-2019 19:28:33 ASan internal: fe error 09-Oct-2019 19:28:36 r3: Daemon bgpd not running This is the result of this code pattern in rfapi/rfapi_import.c: prefix_cmp((struct prefix *)&bpi_result->extra->vnc.import.rd, (struct prefix *)prd)) Effectively prd or vnc.import.rd are `struct prefix_rd` which are being typecast to a `struct prefix`. Not a big deal except commit 1315d74 modified the prefix_cmp function to allow for a sorted prefix_cmp. In prefix_cmp we were looking at the offset and shift. In the case of vnc we were passing a prefix length of 64 which is the exact length of the remaining data structure for struct prefix_rd. So we calculated a offset of 8 and a shift of 0. The data structures for the prefix portion happened to be equal to 64 bits of data. So we checked that with the memcmp got a 0 and promptly read off the end of the data structure for the numcmp. The fix is if shift is 0 that means thei the memcmp has checked everything and there is nothing to do. Please note: We will still crash if we set the prefixlen > then ~312 bits currently( ie if the prefixlen specifies a bit length longer than the prefix length ). I do not think there is anything to do here( nor am I sure how to correct this either ) as that we are going to have some severe problems when we muck up the prefixlen. Fixes: FRRouting#5025 Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Running with --enable-address-sanitizer I am seeing this: ================================================================= ==19520==ERROR: AddressSanitizer: heap-use-after-free on address 0x6020003ef850 at pc 0x7fe9b8f7b57b bp 0x7fffbac6f9c0 sp 0x7fffbac6f170 READ of size 6 at 0x6020003ef850 thread T0 #0 0x7fe9b8f7b57a (/lib/x86_64-linux-gnu/libasan.so.5+0xb857a) #1 0x55e33d1071e5 in bgp_process_mac_rescan_table bgpd/bgp_mac.c:159 #2 0x55e33d107c09 in bgp_mac_rescan_evpn_table bgpd/bgp_mac.c:252 #3 0x55e33d107e39 in bgp_mac_rescan_all_evpn_tables bgpd/bgp_mac.c:266 #4 0x55e33d108270 in bgp_mac_remove_ifp_internal bgpd/bgp_mac.c:291 #5 0x55e33d108893 in bgp_mac_del_mac_entry bgpd/bgp_mac.c:351 #6 0x55e33d21412d in bgp_ifp_down bgpd/bgp_zebra.c:257 #7 0x7fe9b8cbf3be in if_down_via_zapi lib/if.c:198 #8 0x7fe9b8db303a in zclient_interface_down lib/zclient.c:1549 #9 0x7fe9b8db8a06 in zclient_read lib/zclient.c:2693 #10 0x7fe9b8d7b95a in thread_call lib/thread.c:1599 #11 0x7fe9b8cd824e in frr_run lib/libfrr.c:1024 #12 0x55e33d09d463 in main bgpd/bgp_main.c:477 #13 0x7fe9b879409a in __libc_start_main ../csu/libc-start.c:308 #14 0x55e33d09c189 in _start (/usr/lib/frr/bgpd+0x168189) 0x6020003ef850 is located 0 bytes inside of 16-byte region [0x6020003ef850,0x6020003ef860) freed by thread T0 here: #0 0x7fe9b8fabfb0 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.5+0xe8fb0) #1 0x7fe9b8ce4ea9 in qfree lib/memory.c:129 #2 0x55e33d10825c in bgp_mac_remove_ifp_internal bgpd/bgp_mac.c:289 #3 0x55e33d108893 in bgp_mac_del_mac_entry bgpd/bgp_mac.c:351 #4 0x55e33d21412d in bgp_ifp_down bgpd/bgp_zebra.c:257 #5 0x7fe9b8cbf3be in if_down_via_zapi lib/if.c:198 #6 0x7fe9b8db303a in zclient_interface_down lib/zclient.c:1549 #7 0x7fe9b8db8a06 in zclient_read lib/zclient.c:2693 #8 0x7fe9b8d7b95a in thread_call lib/thread.c:1599 #9 0x7fe9b8cd824e in frr_run lib/libfrr.c:1024 #10 0x55e33d09d463 in main bgpd/bgp_main.c:477 #11 0x7fe9b879409a in __libc_start_main ../csu/libc-start.c:308 previously allocated by thread T0 here: #0 0x7fe9b8fac518 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0xe9518) #1 0x7fe9b8ce4d93 in qcalloc lib/memory.c:110 #2 0x55e33d106b29 in bgp_mac_hash_alloc bgpd/bgp_mac.c:96 #3 0x7fe9b8cb8350 in hash_get lib/hash.c:149 #4 0x55e33d10845b in bgp_mac_add_mac_entry bgpd/bgp_mac.c:303 #5 0x55e33d226757 in bgp_ifp_create bgpd/bgp_zebra.c:2644 #6 0x7fe9b8cbf1e6 in if_new_via_zapi lib/if.c:176 #7 0x7fe9b8db2d3b in zclient_interface_add lib/zclient.c:1481 #8 0x7fe9b8db87f8 in zclient_read lib/zclient.c:2659 #9 0x7fe9b8d7b95a in thread_call lib/thread.c:1599 #10 0x7fe9b8cd824e in frr_run lib/libfrr.c:1024 #11 0x55e33d09d463 in main bgpd/bgp_main.c:477 #12 0x7fe9b879409a in __libc_start_main ../csu/libc-start.c:308 Effectively we are passing to bgp_mac_remove_ifp_internal the macaddr that is associated with the bsm data structure. There exists a path where the bsm is freed and then we immediately pass the macaddr into bgp_mac_rescan_all_evpn_tables. So just make a copy of the macaddr data structure before we free the bsm Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
================================================================= ==13611==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe9e5c8694 at pc 0x0000004d18ac bp 0x7ffe9e5c8330 sp 0x7ffe9e5c7ae0 WRITE of size 17 at 0x7ffe9e5c8694 thread T0 #0 0x4d18ab in __asan_memcpy (/usr/lib/frr/zebra+0x4d18ab) #1 0x7f16f04bd97f in stream_get2 /home/qlyoung/frr/lib/stream.c:277:2 #2 0x6410ec in zebra_vxlan_remote_macip_del /home/qlyoung/frr/zebra/zebra_vxlan.c:7718:4 #3 0x68fa98 in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2611:3 #4 0x556add in main /home/qlyoung/frr/zebra/main.c:309:2 #5 0x7f16eea3bb96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 #6 0x431249 in _start (/usr/lib/frr/zebra+0x431249) This decode is the result of a buffer overflow because we are not checking ipa_len. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
================================================================= ==3058==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7f5bf3ef7477 bp 0x7ffdfaa20d40 sp 0x7ffdfaa204c8 T0) ==3058==The signal is caused by a READ memory access. ==3058==Hint: address points to the zero page. #0 0x7f5bf3ef7476 in memcpy /build/glibc-OTsEL5/glibc-2.27/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:134 #1 0x4d158a in __asan_memcpy (/usr/lib/frr/zebra+0x4d158a) #2 0x7f5bf58da8ad in stream_put /home/qlyoung/frr/lib/stream.c:605:3 #3 0x67d428 in zsend_ipset_entry_notify_owner /home/qlyoung/frr/zebra/zapi_msg.c:851:2 #4 0x5c70b3 in zebra_pbr_add_ipset_entry /home/qlyoung/frr/zebra/zebra_pbr.c #5 0x68e1bb in zread_ipset_entry /home/qlyoung/frr/zebra/zapi_msg.c:2465:4 #6 0x68f958 in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2611:3 #7 0x55666d in main /home/qlyoung/frr/zebra/main.c:309:2 #8 0x7f5bf3e5db96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 #9 0x4311d9 in _start (/usr/lib/frr/zebra+0x4311d9) the ipset->backpointer was NULL as that the hash lookup failed to find anything. Prevent this crash from happening. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
==25402==ERROR: LeakSanitizer: detected memory leaks Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x533302 in calloc (/usr/lib/frr/zebra+0x533302) #1 0x7fee84cdc80b in qcalloc /home/qlyoung/frr/lib/memory.c:110:27 #2 0x5a3032 in create_label_chunk /home/qlyoung/frr/zebra/label_manager.c:188:3 #3 0x5a3c2b in assign_label_chunk /home/qlyoung/frr/zebra/label_manager.c:354:8 #4 0x5a2a38 in label_manager_get_chunk /home/qlyoung/frr/zebra/label_manager.c:424:9 #5 0x5a1412 in hook_call_lm_get_chunk /home/qlyoung/frr/zebra/label_manager.c:60:1 #6 0x5a1412 in lm_get_chunk_call /home/qlyoung/frr/zebra/label_manager.c:81:2 #7 0x72a234 in zread_get_label_chunk /home/qlyoung/frr/zebra/zapi_msg.c:2026:2 #8 0x72a234 in zread_label_manager_request /home/qlyoung/frr/zebra/zapi_msg.c:2073:4 #9 0x73150c in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2688:2 When creating label chunk that has a specified base, we eventually are calling assign_specific_label_chunk. This function finds the appropriate list node and deletes it from the lbl_mgr.lc_list but since the function uses list_delete_node() the deletion function that is specified for lbl_mgr.lc_list is not called thus dropping the memory. Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The only two safi's that are usable for zebra for installation of routes into the rib are SAFI_UNICAST and SAFI_MULTICAST. The acceptance of other safi's is causing a memory leak: Direct leak of 56 byte(s) in 1 object(s) allocated from: #0 0x5332f2 in calloc (/usr/lib/frr/zebra+0x5332f2) #1 0x7f594adc29db in qcalloc /opt/build/frr/lib/memory.c:110:27 #2 0x686849 in zebra_vrf_get_table_with_table_id /opt/build/frr/zebra/zebra_vrf.c:390:11 #3 0x65a245 in rib_add_multipath /opt/build/frr/zebra/zebra_rib.c:2591:10 #4 0x7211bc in zread_route_add /opt/build/frr/zebra/zapi_msg.c:1616:8 #5 0x73063c in zserv_handle_commands /opt/build/frr/zebra/zapi_msg.c:2682:2 Collapse Sequence of events: Upon vrf creation there is a zvrf->table[afi][safi] data structure that tables are auto created for. These tables only create SAFI_UNICAST and SAFI_MULTICAST tables. Since these are the only safi types that are zebra can actually work on. zvrf data structures also have a zvrf->otable data structure that tracks in a RB tree other tables that are created ( say you have routes stuck in any random table in the 32bit route table space in linux ). This data structure is only used if the lookup in zvrf->table[afi][safi] fails. After creation if we pass a route down from an upper level protocol that has non unicast or multicast safi *but* has the actual tableid of the vrf we are in, the initial lookup will always return NULL leaving us to look in the otable. This will create a data structure to track this data. If after this event you pass in a second route with the same afi/safi/table_id, the otable will be created and attempted to be stored, but the RB_TREE_UNIQ data structure when it sees this will return the original otable returned and the lookup function zebra_vrf_get_table_with_table_id will just drop the second otable. Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Upper level clients ask for default routes of a particular family This change ensures that they only receive the family that they have asked for. Discovered when testing in ospf `default-information originate` ================================================================= ==246306==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffa2e8 at pc 0x7ffff73c44e2 bp 0x7fffffffa090 sp 0x7fffffffa088 READ of size 16 at 0x7fffffffa2e8 thread T0 #0 0x7ffff73c44e1 in prefix_copy lib/prefix.c:310 #1 0x7ffff741c0aa in route_node_lookup lib/table.c:255 #2 0x5555556cd263 in ospf_external_info_delete ospfd/ospf_asbr.c:178 #3 0x5555556a47cc in ospf_zebra_read_route ospfd/ospf_zebra.c:852 #4 0x7ffff746f5d8 in zclient_read lib/zclient.c:3028 #5 0x7ffff742fc91 in thread_call lib/thread.c:1549 #6 0x7ffff7374642 in frr_run lib/libfrr.c:1093 #7 0x5555555bfaef in main ospfd/ospf_main.c:235 #8 0x7ffff70a2bba in __libc_start_main ../csu/libc-start.c:308 #9 0x5555555bf499 in _start (/usr/lib/frr/ospfd+0x6b499) Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement tests to verify BGP link-bandwidth and weighted ECMP functionality. These tests validate one of the primary use cases for weighted ECMP (a.k.a. Unequal cost multipath) using BGP link-bandwidth: https://tools.ietf.org/html/draft-mohanty-bess-ebgp-dmz The included tests are: Test #1: Test BGP link-bandwidth advertisement based on number of multipaths Test #2: Test cumulative link-bandwidth propagation Test #3: Test weighted ECMP - multipath with next hop weights Test #4: Test weighted ECMP rebalancing upon change (link flap) Test #5: Test weighted ECMP for a second anycast IP Test #6: Test paths with and without link-bandwidth - receiver should resort to regular ECMP Test #7: Test different options for processing link-bandwidth on the receiver Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
This problem was reported by the sanitizer - ================================================================= ==24764==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000115c8 at pc 0x55cb9cfad312 bp 0x7fffa0552140 sp 0x7fffa0552138 READ of size 8 at 0x60d0000115c8 thread T0 #0 0x55cb9cfad311 in zebra_evpn_remote_es_flush zebra/zebra_evpn_mh.c:2041 #1 0x55cb9cfad311 in zebra_evpn_es_cleanup zebra/zebra_evpn_mh.c:2234 #2 0x55cb9cf6ae78 in zebra_vrf_disable zebra/zebra_vrf.c:205 #3 0x7fc8d478f114 in vrf_delete lib/vrf.c:229 #4 0x7fc8d478f99a in vrf_terminate lib/vrf.c:541 #5 0x55cb9ceba0af in sigint zebra/main.c:176 #6 0x55cb9ceba0af in sigint zebra/main.c:130 #7 0x7fc8d4765d20 in quagga_sigevent_process lib/sigevent.c:103 #8 0x7fc8d4787e8c in thread_fetch lib/thread.c:1396 #9 0x7fc8d4708782 in frr_run lib/libfrr.c:1092 #10 0x55cb9ce931d8 in main zebra/main.c:488 #11 0x7fc8d43ee09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) #12 0x55cb9ce94c09 in _start (/usr/lib/frr/zebra+0x8ac09) ================================================================= Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Current code in bgp bestpath selection would accept the newest locally originated path as the best path. Making the selection non-deterministic. Modify the code to always come to the same bestpath conclusion when you have multiple locally originated paths in bestpath selection. Before: eva# conf eva(config)# router bgp 323 eva(config-router)# address-family ipv4 uni eva(config-router-af)# redistribute connected eva(config-router-af)# network 192.168.161.0/24 eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:02:52 2020 eva(config-router-af)# no redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (1 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #2, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:03:32 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# Notice the route choosen depends on order received Fixed behavior: eva# conf eva(config)# router bgp 323 eva(config-router)# address-family ipv4 uni eva(config-router-af)# redistribute connected eva(config-router-af)# network 192.168.161.0/24 eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:02:52 2020 eva(config-router-af)# no redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (1 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #2, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:03:32 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# Ticket: CM-31490 Found-by: Trey Aspelund <taspelund@nvidia.com> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The bgp_l3vpn_to_bgp_vrf test is looking for a prefix on multiple routers that the ordered received is non-deterministic. As such the regex's are failing occassionaly when the route is received in an unexpected order. One possible order: (FRRouting#89) scripts/check_routes.py:120 COMMAND:ce3:vtysh -c "show bgp ipv4 uni 6.0.1.0":2 available, best .*192.168.1.1.* Local.* 99.0.0.3 from 0.0.0.0 .99.0.0.3.* Origin IGP, metric 200, localpref 50, weight 32768, valid, sourced, local, best .Weight.* Community: 0:67.* Extended Community: RT:89:123.* Large Community: 12:34:56.* Local.* 192.168.1.1 from 192.168.1.1 .192.168.1.1.* Origin IGP, metric 98, localpref 123, valid, internal.* Community: 0:67.* Extended Community: RT:52:100 RT:89:123.* Large Community: 12:34:56:pass:Redundant route 1 details c: COMMAND OUTPUT:BGP routing table entry for 6.0.1.0/24^M Paths: (2 available, best #1, table default)^M Advertised to non peer-group peers:^M 192.168.1.1^M Local^M 99.0.0.3 from 0.0.0.0 (99.0.0.3)^M Origin IGP, metric 200, localpref 50, weight 32768, valid, sourced, local, best (Weight)^M Community: 0:67^M Extended Community: RT:89:123^M Large Community: 12:34:56^M Last update: Wed Oct 7 11:12:22 2020^M Local^M 192.168.1.1 from 192.168.1.1 (192.168.1.1)^M Origin IGP, metric 98, localpref 123, valid, internal^M Community: 0:67^M Extended Community: RT:52:100 RT:89:123^M Large Community: 12:34:56^M Last update: Wed Oct 7 11:12:41 2020: R:89 ce3 Redundant route 1 details c 1 0 Second possible order: (FRRouting#89) scripts/check_routes.py:120 COMMAND:ce3:vtysh -c "show bgp ipv4 uni 6.0.1.0":2 available, best .*192.168.1.1.* Local.* 99.0.0.3 from 0.0.0.0 .99.0.0.3.* Origin IGP, metric 200, localpref 50, weight 32768, valid, sourced, local, best .Weight.* Community: 0:67.* Extended Community: RT:89:123.* Large Community: 12:34:56.* Local.* 192.168.1.1 from 192.168.1.1 .192.168.1.1.* Origin IGP, metric 98, localpref 123, valid, internal.* Community: 0:67.* Extended Community: RT:52:100 RT:89:123.* Large Community: 12:34:56:pass:Redundant route 1 details c: COMMAND OUTPUT:BGP routing table entry for 6.0.1.0/24^M Paths: (2 available, best #2, table default)^M Advertised to non peer-group peers:^M 192.168.1.1^M Local^M 192.168.1.1 from 192.168.1.1 (192.168.1.1)^M Origin IGP, metric 98, localpref 123, valid, internal^M Community: 0:67^M Extended Community: RT:52:100 RT:89:123^M Large Community: 12:34:56^M Last update: Wed Oct 7 11:14:45 2020^M Local^M 99.0.0.3 from 0.0.0.0 (99.0.0.3)^M Origin IGP, metric 200, localpref 50, weight 32768, valid, sourced, local, best (Weight)^M Community: 0:67^M Extended Community: RT:89:123^M Large Community: 12:34:56^M Last update: Wed Oct 7 11:14:27 2020: R:89 ce3 Redundant route 1 details c 0 1 BGP displays the paths in the order received since it's just a linked list. For this test modify/add the luCommands to track that we may receive the paths in a non-deterministic order. Signed-off-by: Donald Sharp <sharpd@nvidia.com> Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Current code in bgp bestpath selection would accept the newest locally originated path as the best path. Making the selection non-deterministic. Modify the code to always come to the same bestpath conclusion when you have multiple locally originated paths in bestpath selection. Before: eva# conf eva(config)# router bgp 323 eva(config-router)# address-family ipv4 uni eva(config-router-af)# redistribute connected eva(config-router-af)# network 192.168.161.0/24 eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:02:52 2020 eva(config-router-af)# no redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (1 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #2, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:03:32 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# Notice the route choosen depends on order received Fixed behavior: eva# conf eva(config)# router bgp 323 eva(config-router)# address-family ipv4 uni eva(config-router-af)# redistribute connected eva(config-router-af)# network 192.168.161.0/24 eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:02:52 2020 eva(config-router-af)# no redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (1 available, best #1, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# redistribute connected eva(config-router-af)# do show bgp ipv4 uni 192.168.161.0 BGP routing table entry for 192.168.161.0/24 Paths: (2 available, best #2, table default) Not advertised to any peer Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin incomplete, metric 0, weight 32768, valid, sourced Last update: Wed Sep 16 15:03:32 2020 Local 0.0.0.0(eva) from 0.0.0.0 (192.168.161.245) Origin IGP, metric 0, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (Origin) Last update: Wed Sep 16 15:03:03 2020 eva(config-router-af)# Ticket: CM-31490 Found-by: Trey Aspelund <taspelund@nvidia.com> Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When zebra is running with debugs turned on there is a use after free reported by the address sanitizer: 2020/10/16 12:58:02 ZEBRA: rib_delnode: (0:254):4.5.6.16/32: rn 0x60b000026f20, re 0x6080000131a0, removing 2020/10/16 12:58:02 ZEBRA: rib_meta_queue_add: (0:254):4.5.6.16/32: queued rn 0x60b000026f20 into sub-queue 3 ================================================================= ==3101430==ERROR: AddressSanitizer: heap-use-after-free on address 0x608000011d28 at pc 0x555555705ab6 bp 0x7fffffffdab0 sp 0x7fffffffdaa8 READ of size 8 at 0x608000011d28 thread T0 #0 0x555555705ab5 in re_list_const_first zebra/rib.h:222 #1 0x555555705b54 in re_list_first zebra/rib.h:222 #2 0x555555711a4f in process_subq_route zebra/zebra_rib.c:2248 #3 0x555555711d2e in process_subq zebra/zebra_rib.c:2286 #4 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320 #5 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291 #6 0x7ffff7450e9c in thread_call lib/thread.c:1581 #7 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099 #8 0x55555561a578 in main zebra/main.c:455 #9 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308 #10 0x5555555e3429 in _start (/usr/lib/frr/zebra+0x8f429) 0x608000011d28 is located 8 bytes inside of 88-byte region [0x608000011d20,0x608000011d78) freed by thread T0 here: #0 0x7ffff768bb6f in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.6+0xa9b6f) #1 0x7ffff739ccad in qfree lib/memory.c:129 #2 0x555555709ee4 in rib_gc_dest zebra/zebra_rib.c:746 #3 0x55555570ca76 in rib_process zebra/zebra_rib.c:1240 #4 0x555555711a05 in process_subq_route zebra/zebra_rib.c:2245 #5 0x555555711d2e in process_subq zebra/zebra_rib.c:2286 #6 0x555555711ec7 in meta_queue_process zebra/zebra_rib.c:2320 #7 0x7ffff74701f7 in work_queue_run lib/workqueue.c:291 #8 0x7ffff7450e9c in thread_call lib/thread.c:1581 #9 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099 #10 0x55555561a578 in main zebra/main.c:455 #11 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308 previously allocated by thread T0 here: #0 0x7ffff768c037 in calloc (/lib/x86_64-linux-gnu/libasan.so.6+0xaa037) #1 0x7ffff739cb98 in qcalloc lib/memory.c:110 #2 0x555555712ace in zebra_rib_create_dest zebra/zebra_rib.c:2515 #3 0x555555712c6c in rib_link zebra/zebra_rib.c:2576 #4 0x555555712faa in rib_addnode zebra/zebra_rib.c:2607 #5 0x555555715bf0 in rib_add_multipath_nhe zebra/zebra_rib.c:3012 #6 0x555555715f56 in rib_add_multipath zebra/zebra_rib.c:3049 #7 0x55555571788b in rib_add zebra/zebra_rib.c:3327 #8 0x5555555e584a in connected_up zebra/connected.c:254 #9 0x5555555e42ff in connected_announce zebra/connected.c:94 #10 0x5555555e4fd3 in connected_update zebra/connected.c:195 #11 0x5555555e61ad in connected_add_ipv4 zebra/connected.c:340 #12 0x5555555f26f5 in netlink_interface_addr zebra/if_netlink.c:1213 #13 0x55555560f756 in netlink_information_fetch zebra/kernel_netlink.c:350 #14 0x555555612e49 in netlink_parse_info zebra/kernel_netlink.c:941 #15 0x55555560f9f1 in kernel_read zebra/kernel_netlink.c:402 FRRouting#16 0x7ffff7450e9c in thread_call lib/thread.c:1581 FRRouting#17 0x7ffff738eaf7 in frr_run lib/libfrr.c:1099 FRRouting#18 0x55555561a578 in main zebra/main.c:455 FRRouting#19 0x7ffff7079cc9 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: heap-use-after-free zebra/rib.h:222 in re_list_const_first This is happening because we are using the dest pointer after a call into rib_gc_dest. In process_subq_route, we call rib_process() and if the dest is deleted dest pointer is now garbage. We must reload the dest pointer in this case. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fixes the valgrind error we were seeing on startup due to initializing the msg header struct: ``` ==2534283== Thread 3 zebra_dplane: ==2534283== Syscall param recvmsg(msg) points to uninitialised byte(s) ==2534283== at 0x4D616DD: recvmsg (in /usr/lib64/libpthread-2.31.so) ==2534283== by 0x43107C: netlink_recv_msg (kernel_netlink.c:744) ==2534283== by 0x4330E4: nl_batch_read_resp (kernel_netlink.c:1070) ==2534283== by 0x431D12: nl_batch_send (kernel_netlink.c:1201) ==2534283== by 0x431E8B: kernel_update_multi (kernel_netlink.c:1369) ==2534283== by 0x46019B: kernel_dplane_process_func (zebra_dplane.c:3979) ==2534283== by 0x45EB7F: dplane_thread_loop (zebra_dplane.c:4368) ==2534283== by 0x493F5CC: thread_call (thread.c:1585) ==2534283== by 0x48D3450: fpt_run (frr_pthread.c:303) ==2534283== by 0x48D3D41: frr_pthread_inner (frr_pthread.c:156) ==2534283== by 0x4D56431: start_thread (in /usr/lib64/libpthread-2.31.so) ==2534283== by 0x4E709D2: clone (in /usr/lib64/libc-2.31.so) ==2534283== Address 0x85cd850 is on thread 3's stack ==2534283== in frame #2, created by nl_batch_read_resp (kernel_netlink.c:1051) ==2534283== ==2534283== Syscall param recvmsg(msg.msg_control) points to unaddressable byte(s) ==2534283== at 0x4D616DD: recvmsg (in /usr/lib64/libpthread-2.31.so) ==2534283== by 0x43107C: netlink_recv_msg (kernel_netlink.c:744) ==2534283== by 0x4330E4: nl_batch_read_resp (kernel_netlink.c:1070) ==2534283== by 0x431D12: nl_batch_send (kernel_netlink.c:1201) ==2534283== by 0x431E8B: kernel_update_multi (kernel_netlink.c:1369) ==2534283== by 0x46019B: kernel_dplane_process_func (zebra_dplane.c:3979) ==2534283== by 0x45EB7F: dplane_thread_loop (zebra_dplane.c:4368) ==2534283== by 0x493F5CC: thread_call (thread.c:1585) ==2534283== by 0x48D3450: fpt_run (frr_pthread.c:303) ==2534283== by 0x48D3D41: frr_pthread_inner (frr_pthread.c:156) ==2534283== by 0x4D56431: start_thread (in /usr/lib64/libpthread-2.31.so) ==2534283== by 0x4E709D2: clone (in /usr/lib64/libc-2.31.so) ==2534283== Address 0xa0 is not stack'd, malloc'd or (recently) free'd ==2534283== ``` Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The fields in the broadcast/p2p union struct in an isis circuit are initialized when the circuit goes up, but currently this step is skipped if the interface is passive. This can create problems if the circuit type (referred to as network type in the config) changes from broadcast to point-to-point. We can end up with the p2p neighbor pointer pointing at some garbage left by the broadcast struct in the union, which would then cause a segfault the first time we would dereference it - for example when building the lsp, or computing the SPF tree. compressed backtrace of a possible crash: #0 0x0000555555579a9c in lsp_build at frr/isisd/isis_lsp.c:1114 #1 0x000055555557a516 in lsp_regenerate at frr/isisd/isis_lsp.c:1301 #2 0x000055555557aa25 in lsp_refresh at frr/isisd/isis_lsp.c:1381 #3 0x00007ffff7b2622c in thread_call at frr/lib/thread.c:1549 #4 0x00007ffff7ad6df4 in frr_run at frr/lib/libfrr.c:1098 #5 0x000055555556b67f in main at frr/isisd/isis_main.c:272 isis_lsp.c: 1112 case CIRCUIT_T_P2P: { 1113 struct isis_adjacency *nei = circuit->u.p2p.neighbor; 1114 if (nei && nei->adj_state == ISIS_ADJ_UP Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
We are using data after it has been freed and handed back to the OS. Address Sanitizer output: error 23-Nov-2020 18:53:57 ERROR: AddressSanitizer: heap-use-after-free on address 0x631000024838 at pc 0x55f825998f58 bp 0x7fffa5b0f5b0 sp 0x7fffa5b0f5a0 error 23-Nov-2020 18:53:57 READ of size 4 at 0x631000024838 thread T0 error 23-Nov-2020 18:53:57 #0 0x55f825998f57 in lde_imsg_compose_parent_sync ldpd/lde.c:226 error 23-Nov-2020 18:53:57 #1 0x55f8259ca9ed in vlog ldpd/log.c:48 error 23-Nov-2020 18:53:57 #2 0x55f8259cb1c8 in log_info ldpd/log.c:102 error 23-Nov-2020 18:53:57 #3 0x55f82599e841 in lde_shutdown ldpd/lde.c:208 error 23-Nov-2020 18:53:57 #4 0x55f8259a2703 in lde_dispatch_parent ldpd/lde.c:666 error 23-Nov-2020 18:53:57 #5 0x55f825ac3815 in thread_call lib/thread.c:1681 error 23-Nov-2020 18:53:57 #6 0x55f825998d5e in lde ldpd/lde.c:160 error 23-Nov-2020 18:53:57 #7 0x55f82598a289 in main ldpd/ldpd.c:320 error 23-Nov-2020 18:53:57 #8 0x7fe3f749db96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) error 23-Nov-2020 18:53:57 #9 0x55f825982579 in _start (/usr/lib/frr/ldpd+0xb3579) error 23-Nov-2020 18:53:57 error 23-Nov-2020 18:53:57 0x631000024838 is located 65592 bytes inside of 65632-byte region [0x631000014800,0x631000024860) error 23-Nov-2020 18:53:57 freed by thread T0 here: error 23-Nov-2020 18:53:57 #0 0x7fe3f8a4d7a8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7a8) error 23-Nov-2020 18:53:57 #1 0x55f82599e830 in lde_shutdown ldpd/lde.c:206 error 23-Nov-2020 18:53:57 #2 0x55f8259a2703 in lde_dispatch_parent ldpd/lde.c:666 error 23-Nov-2020 18:53:57 #3 0x55f825ac3815 in thread_call lib/thread.c:1681 error 23-Nov-2020 18:53:57 #4 0x55f825998d5e in lde ldpd/lde.c:160 error 23-Nov-2020 18:53:57 #5 0x55f82598a289 in main ldpd/ldpd.c:320 error 23-Nov-2020 18:53:57 #6 0x7fe3f749db96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) error 23-Nov-2020 18:53:57 error 23-Nov-2020 18:53:57 previously allocated by thread T0 here: error 23-Nov-2020 18:53:57 #0 0x7fe3f8a4dd28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28) error 23-Nov-2020 18:53:57 #1 0x55f825998cb7 in lde ldpd/lde.c:151 error 23-Nov-2020 18:53:57 #2 0x55f82598a289 in main ldpd/ldpd.c:320 error 23-Nov-2020 18:53:57 #3 0x7fe3f749db96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) error 23-Nov-2020 18:53:57 The fix is to put this in global space. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
error 26-Nov-2020 14:35:02 ERROR: AddressSanitizer: heap-use-after-free on address 0x631000024838 at pc 0x55cefae977e9 bp 0x7ffdd3546860 sp 0x7ffdd3546850 error 26-Nov-2020 14:35:02 READ of size 4 at 0x631000024838 thread T0 error 26-Nov-2020 14:35:02 #0 0x55cefae977e8 in ldpe_imsg_compose_parent_sync ldpd/ldpe.c:256 error 26-Nov-2020 14:35:02 #1 0x55cefae9ab13 in vlog ldpd/log.c:53 error 26-Nov-2020 14:35:02 #2 0x55cefae9b21f in log_info ldpd/log.c:102 error 26-Nov-2020 14:35:02 #3 0x55cefae96eae in ldpe_shutdown ldpd/ldpe.c:237 error 26-Nov-2020 14:35:02 #4 0x55cefae99254 in ldpe_dispatch_main ldpd/ldpe.c:585 error 26-Nov-2020 14:35:02 #5 0x55cefaf93875 in thread_call lib/thread.c:1681 error 26-Nov-2020 14:35:02 #6 0x55cefae97304 in ldpe ldpd/ldpe.c:136 error 26-Nov-2020 14:35:02 #7 0x55cefae5a2e2 in main ldpd/ldpd.c:322 error 26-Nov-2020 14:35:02 #8 0x7f4ef0c33b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) error 26-Nov-2020 14:35:02 #9 0x55cefae525e9 in _start (/usr/lib/frr/ldpd+0xb35e9) error 26-Nov-2020 14:35:02 error 26-Nov-2020 14:35:02 0x631000024838 is located 65592 bytes inside of 65632-byte region [0x631000014800,0x631000024860) error 26-Nov-2020 14:35:02 freed by thread T0 here: error 26-Nov-2020 14:35:02 #0 0x7f4ef21e37a8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7a8) error 26-Nov-2020 14:35:02 #1 0x55cefae96e91 in ldpe_shutdown ldpd/ldpe.c:234 error 26-Nov-2020 14:35:02 #2 0x55cefae99254 in ldpe_dispatch_main ldpd/ldpe.c:585 error 26-Nov-2020 14:35:02 #3 0x55cefaf93875 in thread_call lib/thread.c:1681 error 26-Nov-2020 14:35:02 #4 0x55cefae97304 in ldpe ldpd/ldpe.c:136 error 26-Nov-2020 14:35:02 #5 0x55cefae5a2e2 in main ldpd/ldpd.c:322 error 26-Nov-2020 14:35:02 #6 0x7f4ef0c33b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) error 26-Nov-2020 14:35:02 error 26-Nov-2020 14:35:02 previously allocated by thread T0 here: error 26-Nov-2020 14:35:02 #0 0x7f4ef21e3d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28) error 26-Nov-2020 14:35:02 #1 0x55cefae9725d in ldpe ldpd/ldpe.c:127 error 26-Nov-2020 14:35:02 #2 0x55cefae5a2e2 in main ldpd/ldpd.c:322 error 26-Nov-2020 14:35:02 #3 0x7f4ef0c33b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) Clean this problem up in the same way as the previous commit Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Sample output - >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> root@torm-11:mgmt:~# net show bgp l2vpn evpn route rd 27.0.0.16:3 EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP] EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC] EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP] EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP] EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP] BGP routing table entry for 27.0.0.16:3:[1]:[0]:[03:44:38:39:ff:ff:01:00:00:01]:[32]:[0.0.0.0] Paths: (4 available, best #2) Advertised to non peer-group peers: spine-1(swp1) spine-1(swp2) spine-2(swp3) spine-2(swp4) 4435 5551 27.0.0.16 from spine-1(swp2) (27.0.0.13) Origin IGP, valid, external Extended Community: RT:5551:1009 ET:8 Last update: Thu Sep 3 21:01:53 2020 4435 5551 27.0.0.16 from spine-1(swp1) (27.0.0.13) Origin IGP, valid, external, bestpath-from-AS 4435, best (Router ID) Extended Community: RT:5551:1009 ET:8 Last update: Thu Sep 3 21:01:53 2020 4435 5551 27.0.0.16 from spine-2(swp3) (27.0.0.14) Origin IGP, valid, external Extended Community: RT:5551:1009 ET:8 Last update: Thu Sep 3 21:01:53 2020 4435 5551 27.0.0.16 from spine-2(swp4) (27.0.0.14) Origin IGP, valid, external Extended Community: RT:5551:1009 ET:8 Last update: Thu Sep 3 21:01:53 2020 Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
This is useful to go back in the past and check when was that prefix appeared, changed, etc. ``` exit1-debian-9# show ip bgp 172.16.16.1/32 BGP routing table entry for 172.16.16.1/32, version 6 Paths: (2 available, best #2, table default) Advertised to non peer-group peers: home-spine1.donatas.net(192.168.0.2) home-spine1.donatas.net(2a02:bbd::2) 65030 192.168.0.2 from home-spine1.donatas.net(2a02:bbd::2) (172.16.16.1) Origin incomplete, metric 0, valid, external Last update: Thu Apr 8 20:15:25 2021 65030 192.168.0.2 from home-spine1.donatas.net(192.168.0.2) (172.16.16.1) Origin incomplete, metric 0, valid, external, best (Neighbor IP) Last update: Thu Apr 8 20:15:25 2021 exit1-debian-9# ``` Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Show alias name instead of numerical value in `show bgp <prefix>. E.g.: ``` root@exit1-debian-9:~/frr# vtysh -c 'sh run' | grep 'bgp community alias' bgp community alias 65001:123 community-1 bgp community alias 65001:123:1 lcommunity-1 root@exit1-debian-9:~/frr# ``` ``` exit1-debian-9# sh ip bgp 172.16.16.1/32 BGP routing table entry for 172.16.16.1/32, version 21 Paths: (2 available, best #2, table default) Advertised to non peer-group peers: 65030 192.168.0.2 from home-spine1.donatas.net(192.168.0.2) (172.16.16.1) Origin incomplete, metric 0, valid, external, best (Neighbor IP) Community: 65001:12 65001:13 community-1 65001:65534 Large Community: lcommunity-1 65001:123:2 Last update: Fri Apr 16 12:51:27 2021 exit1-debian-9# ``` Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
``` exit1-debian-11# sh ip bgp 100.100.100.100/32 BGP routing table entry for 100.100.100.100/32, version 7 Paths: (2 available, best #2, table default) Advertised to non peer-group peers: home-spine1.donatas.net(192.168.0.2) 65002, (stale) 192.168.10.17 from donatas-pc(192.168.10.17) (0.0.0.0) Origin incomplete, valid, external Community: llgr-stale Last update: Thu Jan 13 08:58:08 2022 Time until Long-lived stale route deleted: 18 65001 192.168.0.2 from home-spine1.donatas.net(192.168.0.2) (2.2.2.2) Origin incomplete, metric 0, valid, external, best (First path received) Last update: Thu Jan 13 08:57:56 2022 ``` ``` ~# vtysh -c 'show ip bgp 100.100.100.100/32 json' | jq '."paths"[] | ."llgrSecondsRemaining"' 17 ``` Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
本PRは廃止された #1 を分割したものです. BGP SRv6 VPNv4のdplane側のみを抽出しています.
概要
FRR側でバックエンドとして利用しないと行けない部分.
zebraとしてsrv6をサポートする.
bgpdとしてのsrv6のサポートする.
show segment-routing-ipv6 tunsrc
locator
Test Env