OVS with a full-routing table utilizing 100% cpu #185
Comments
I am also seeing this issue. Can anyone point to a solution?
It looks like there are lots of changes in the routing table, which isn't too surprising given that the system is running BGP. OVS tries to keep up with these. I'm surprised it's this expensive. If you stop the BGP daemon temporarily, does OVS CPU usage drop to near-zero? If so, then that would confirm the root of the issue and we can look into how to avoid the high CPU for this case.
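One quick way to run that experiment, sketched below under the assumption that the BGP daemon is bird and that sysstat's `pidstat` is installed (substitute your own daemon and service names, e.g. `frr`):

```shell
# Baseline CPU usage of ovs-vswitchd while BGP is churning routes.
pidstat -u -p "$(pidof ovs-vswitchd)" 1 5

# Temporarily stop the routing daemon (assumed to be bird here).
sudo systemctl stop bird

# If route churn is the cause, this should now show near-zero CPU.
pidstat -u -p "$(pidof ovs-vswitchd)" 1 5

sudo systemctl start bird
```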
I have the same issue: bird running BGP, and the server having a full table in the kernel is indeed the culprit.
@zejar have you found a solution to this problem? |
@ddominet I have not. Instead I switched to "regular" Linux bridges. |
If it is not necessary for OVS to be aware of these routes, the solution is to move the BGP daemon into a separate network namespace.
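A minimal sketch of that separation, assuming iproute2 and a bird daemon (the interface name, address, and config path are illustrative):

```shell
# Create a dedicated namespace for BGP and move the peering NIC into it.
sudo ip netns add bgp
sudo ip link set eth1 netns bgp

# Bring up loopback and the NIC inside the namespace.
sudo ip netns exec bgp ip link set lo up
sudo ip netns exec bgp ip link set eth1 up
sudo ip netns exec bgp ip addr add 192.0.2.2/24 dev eth1

# Run the BGP daemon inside the namespace; the full table now lives there,
# so ovs-vswitchd in the default namespace never sees the route churn.
sudo ip netns exec bgp bird -c /etc/bird/bird.conf
```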
I'm using it with kolla-ansible OpenStack; BGP is there to provide connectivity to a prefix that needs to be available within OpenStack. Is that something that will work with separate namespaces? It might be too separated. If it were a separate routing table, I could just policy-route it, but in this case I don't think so.
Unfortunately, I'm not enough of a BGP or OpenStack expert to tell whether it will work. FWIW, here is a link to a discussion of why OVS consumes a lot of CPU in this scenario: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052092.html
After reading the discussion that @igsilya linked in his last comment,
^this loop will continue for some time; you don't need to wait for it to complete to see the symptoms in the logs or in htop/top.
sudo grep blocked /var/log/openvswitch/ovs-vswitchd.log | sort -n -k2 | tail
In this example, all routing table entries sum up to 14k+:
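To get a comparable count on your own system (assuming iproute2 is installed), summing route entries over every kernel table:

```shell
# Count IPv4 and IPv6 route entries across all kernel routing tables.
ip route show table all | wc -l
ip -6 route show table all | wc -l
```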
bump? Any news? |
Unfortunately, the OVS team doesn't seem to take these issues seriously. And this one is a big issue to me, and to anyone working in an ISP environment.
Kind Regards,
Dominik
As I said, the issue doesn't have a good solution, only compromises. The reason is that Linux kernel routing change notifications are not a reliable source of information. Any application that tries to incrementally maintain the state of the routing table based on these notifications ends up implementing a huge amount of heuristics to guess which notification is good and which is not. And they are never fully correct, meaning old, already-removed entries may linger in the system, or some updates may not be delivered. The only approach that gives a correct view of the current state of a routing table is to dump it as a whole after each change, and that causes high CPU usage for obvious reasons if you have huge routing tables with a high rate of updates.

Most OVS users are not running BGP in the same network namespace as ovs-vswitchd, so it is not a problem for them. If you can change the architecture of your deployment to not run BGP in the same network namespace as OVS, that is still the preferred way to fix the problem.

One approach we can take is to delay the dumping of the routing table by a certain amount of time. This should alleviate some of the load, at the cost of not having the most up-to-date routing table for that amount of time (a few seconds?). Is that a reasonable compromise for your setup?
What exactly are these routes used for? Do you have an OVS tunnel that relies on them? An alternative, not recommended, way to resolve this issue is to just disable synchronization with the kernel routing table using a command-line argument we have for testing purposes:
I have a solution which worked for my environment.
@AswiniViswanathan I don't think the
@igsilya Yes, it's related to utilization. Give it a try.
@AswiniViswanathan it's great that it somehow worked for you, but that makes no sense to me. This option has nothing to do with routing or CPU usage, and you're suggesting adding it to a process that is not the problem in this issue. And I'm pretty sure other people in this thread already use this option, as it is the default way to run OVS services.
@igsilya Okay. Anyway, let's see if it helps anyone else.
@igsilya I just finished moving the BGP routing table into its own VRF, but the issue persists. The default routing table is now effectively empty, and the interfaces inside the OVS bridges are all part of their own dedicated VRF that only contains the on-link subnets and a default route. BGP with full tables is in its own routing table with ~2.4M entries. It seems that OVS doesn't care at all which routing table is used, or am I missing something obvious?
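For reference, a VRF setup along those lines might look roughly like this (device names and the table number are illustrative; per the maintainer's reply, this alone does not help, since OVS monitors every table):

```shell
# Create a VRF bound to routing table 100 and enslave the BGP-facing NIC.
sudo ip link add vrf-bgp type vrf table 100
sudo ip link set vrf-bgp up
sudo ip link set eth1 master vrf-bgp

# Full-table BGP routes now land in table 100 instead of the main table.
ip route show vrf vrf-bgp | head
```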
@f0o yes, OVS monitors all tables. You need a separate network namespace. |
@igsilya Gotcha, I foolishly assumed vrf would follow the same logic as netns. Let's see if I can replicate this whole thing with netns instead! |
@igsilya Couldn't get it all to work with netns, so I took the "easy" way and tried the testing option instead. I verified ovs-vswitchd was actually started with it:
@f0o , yeah, my bad. Looking through the code, OVS will not update its internal routing table if this option is enabled, but it will still dump the whole routing table from the kernel. So, yeah, it removes some of the work, but clearly not enough. This is clearly not an intended use case for this option. I'm working on a fix to only track changes from the main routing table and will post it for review, hopefully this week. It should help if you configure BGP with its own VRF. But for now, it seems, only a network namespace will work.
@igsilya Fair enough, I'm really excited for your VRF fix 😅 - Maybe, to make it more modular, add a parameter that lets you select a table instead of just defaulting to the default/local one. Could be useful sometimes. I'll continue to work towards a netns solution, just to get something into this issue that might work for others without waiting for new packages to be released.
@igsilya Although I can confirm that using netns solves the issue, there is no good way to "leak" routes across namespaces like there is with VRFs - meaning that moving traffic from OVS into the BGP netns is tedious at best and requires linknets, adding a lot more overhead than what should be a simple left-pocket -> right-pocket routing action. So I can't accept netns as a viable solution, unfortunately.
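For completeness, the linknet overhead mentioned above looks roughly like this, assuming a `bgp` namespace and RFC 5737 example subnets (all names and addresses illustrative):

```shell
# A veth "linknet" is needed to route between the default and bgp namespaces.
sudo ip link add veth-host type veth peer name veth-bgp
sudo ip link set veth-bgp netns bgp

sudo ip addr add 192.0.2.1/30 dev veth-host
sudo ip link set veth-host up
sudo ip netns exec bgp ip addr add 192.0.2.2/30 dev veth-bgp
sudo ip netns exec bgp ip link set veth-bgp up

# Every prefix that must cross the boundary needs an explicit static route,
# unlike VRF route leaking which handles this in one table lookup.
sudo ip route add 198.51.100.0/24 via 192.0.2.2
sudo ip netns exec bgp ip route add 203.0.113.0/24 via 192.0.2.1
```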
@f0o there will be a cost of running in the same network namespace. Linux kernel doesn't provide a mechanism to receive routes for specific tables only. That means that for every BGP update ovs-vswitchd will still receive a notification. It will be parsed and discarded. This should not be very expensive to do, but it may take a bit of time. Edit: There might be a way to filter out the full dump. |
Currently, ovs-vswitchd is subscribed to all the routing changes in the kernel. On each change, it marks the internal routing table cache as invalid, then resets it and dumps all the routes from the kernel from scratch. The reason for that is kernel routing updates not being reliable in a sense that it's hard to tell which route is getting removed or modified. Userspace application has to track the order in which route entries are dumped from the kernel. Updates can get lost or even duplicated and the kernel doesn't provide a good mechanism to distinguish one route from another. To my knowledge, dumping all the routes from a kernel after each change is the only way to keep the cache consistent. Some more info can be found in the following never addressed issues:

https://bugzilla.redhat.com/1337860
https://bugzilla.redhat.com/1337855

It seems to be believed that NetworkManager "mostly" does incremental updates right. But it is still not completely correct, will re-dump the whole table in certain cases, and it takes a huge amount of very complicated code to do the accounting and route comparisons.

Going back to ovs-vswitchd, it currently dumps routes from all the routing tables. If it will get conflicting routes from multiple tables, the cache will not be useful. The routing cache in userspace is primarily used for checking the egress port for tunneled traffic and this way also detecting link state changes for a tunnel port. For userspace datapath it is used for actual routing of the packet after sending to a native tunnel. With kernel datapath we don't really have a mechanism to know which routing table will actually be used by the kernel after encapsulation, so our lookups on a cache may be incorrect because of this as well. So, unless all the relevant routes are in the standard tables, the lookup in userspace route cache is unreliable. Luckily, most setups are not using any complicated routing in non-standard tables that OVS has to be aware of.

It is possible, but unlikely, that standard routing tables are completely empty while some other custom table is not, and all the OVS tunnel traffic is directed to that table. That would be the only scenario where dumping non-standard tables would make sense. But it seems like this kind of setup will likely need a way to tell OVS from which table the routes should be taken, or we'll need to dump routing rules and keep a separate cache for each table, so we can first match on rules and then lookup correct routes in a specific table. I'm not sure if trying to implement all that is justified.

For now, stop considering routes from non-standard tables to avoid mixing different tables together and also wasting CPU resources. This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is running on a same host and in a same network namespace with OVS using its own custom routing table.

Unfortunately, there seems to be no way to tell the kernel to send updates only for particular tables. So, we'll still receive and parse all of them. But they will not result in a full cache invalidation in most cases.

Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps. So, we can make use of it and dump only standard tables when we get a relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on the socket for filtering to work. There is no reason to not enable it by default, if supported. It is not used outside of NETLINK_ROUTE.

Fixes: f0e167f ("route-table: Handle route updates more robustly.")
Fixes: ea83a2f ("lib: Show tunnel egress interface in ovsdb")
Reported-at: openvswitch/ovs-issues#185
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: 0-day Robot <robot@bytheb.org>
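The "custom table only" scenario described above corresponds to policy routing along these lines (addresses and the table number are illustrative), where tunnel traffic is steered into a non-standard table by an `ip rule` that OVS's userspace cache knows nothing about:

```shell
# A default route living only in a custom table, selected by a policy rule:
sudo ip route add default via 192.0.2.1 table 100
sudo ip rule add from 203.0.113.0/24 lookup 100

# The main table can stay empty while table 100 carries all tunnel traffic.
ip rule show
ip route show table 100
```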
@igsilya I'm running into an odd issue and I wonder if it's connected to this one. I can't seem to manage any bridges with ovs-vsctl anymore. I keep getting the error: Worth noting, all ovs-vsctl commands are run through ansible with a default timeout of 5s. //EDIT: Increasing the timeout worked, but I also notice a very large number of these errors in OVN: And I believe that's also caused by the full tables here... The moment I dropped this router out of the IBGP mesh, the errors stopped and ports were being added etc... So this 100% CPU utilization is not just an annoyance because it's pegging a core - it actually immobilizes OVS entirely!
@f0o I assume ovs-vswitchd reports long poll intervals, what are the largest numbers you see there? If they are 2.5+ seconds (2.5 before the connection is accepted + potentially 2.5 on the next iteration before the reply is sent), then yes, processes that do not wait longer than 5 seconds may disconnect. Some database operations may potentially wait for more than 2 poll intervals. For OVN, I assume you're using OVN 23.09 older than v23.09.1. The statctrl thread had an inactivity probe set to 5 seconds before commit ovn-org/ovn@bbd07439b . Update to latest v23.09.3 should fix the problem.
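One way to check for those long poll intervals, assuming the stock log location (adjust the path for your distro; the message pattern below is what recent OVS versions log from its poll-loop timing code):

```shell
# Show the most recent long-poll-interval warnings from ovs-vswitchd.
sudo grep -E 'Unreasonably long [0-9]+ms poll interval' \
    /var/log/openvswitch/ovs-vswitchd.log | tail
```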
@igsilya Thanks for the insight! I went ahead and built openvswitch-3.2.1 with the patch from 01f7582584354cf087924170723dd0838d8b34f3 and I can confidently say that it is no longer burning CPU nor being unresponsive. All traffic is flowing fine and ports/flows are being added/removed swiftly! I guess I'm your lab-bunny now! 🙈 |
@f0o Thanks a lot for testing! Now we just need to wait for some code review. Hopefully, it won't take too long. If you want to reply to the original patch email with a
I just ran a few tests by adding/removing flows, ports and bridges like a maniac and verifying that traffic flows while both routers remain in IBGP and EBGP with multiple full tables in different VRFs. Everything works like a breeze! ovs-vswitchd chills at 2% cpu and from the logs of OVS/OVN everything is operational and normal. I tried adding the mbox or using the mailto but all my mail clients strip away the Reply-To header "because of safety". So I'm unable to stamp it there. So please take my informal stamp of testing and approval here :) |
Great job, team! Sorry for my harsh words previously, but I'm happy that it's fixed.
Kind Regards,
Dominik
Currently, ovs-vswitchd is subscribed to all the routing changes in the kernel. On each change, it marks the internal routing table cache as invalid, then resets it and dumps all the routes from the kernel from scratch. The reason for that is kernel routing updates not being reliable in a sense that it's hard to tell which route is getting removed or modified. Userspace application has to track the order in which route entries are dumped from the kernel. Updates can get lost or even duplicated and the kernel doesn't provide a good mechanism to distinguish one route from another. To my knowledge, dumping all the routes from a kernel after each change is the only way to keep the cache consistent. Some more info can be found in the following never addressed issues: https://bugzilla.redhat.com/1337860 https://bugzilla.redhat.com/1337855 It seems to be believed that NetworkManager "mostly" does incremental updates right. But it is still not completely correct, will re-dump the whole table in certain cases, and it takes a huge amount of very complicated code to do the accounting and route comparisons. Going back to ovs-vswitchd, it currently dumps routes from all the routing tables. If it will get conflicting routes from multiple tables, the cache will not be useful. The routing cache in userspace is primarily used for checking the egress port for tunneled traffic and this way also detecting link state changes for a tunnel port. For userspace datapath it is used for actual routing of the packet after sending to a native tunnel. With kernel datapath we don't really have a mechanism to know which routing table will actually be used by the kernel after encapsulation, so our lookups on a cache may be incorrect because of this as well. So, unless all the relevant routes are in the standard tables, the lookup in userspace route cache is unreliable. Luckily, most setups are not using any complicated routing in non-standard tables that OVS has to be aware of. 
It is possible, but unlikely, that standard routing tables are completely empty while some other custom table is not, and all the OVS tunnel traffic is directed to that table. That would be the only scenario where dumping non-standard tables would make sense. But it seems like this kind of setup will likely need a way to tell OVS from which table the routes should be taken, or we'll need to dump routing rules and keep a separate cache for each table, so we can first match on rules and then lookup correct routes in a specific table. I'm not sure if trying to implement all that is justified. For now, stop considering routes from non-standard tables to avoid mixing different tables together and also wasting CPU resources. This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is running on a same host and in a same network namespace with OVS using its own custom routing table. Unfortunately, there seems to be no way to tell the kernel to send updates only for particular tables. So, we'll still receive and parse all of them. But they will not result in a full cache invalidation in most cases. Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps. So, we can make use of it and dump only standard tables when we get a relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on the socket for filtering to work. There is no reason to not enable it by default, if supported. It is not used outside of NETLINK_ROUTE. Fixes: f0e167f ("route-table: Handle route updates more robustly.") Fixes: ea83a2f ("lib: Show tunnel egress interface in ovsdb") Reported-at: openvswitch/ovs-issues#185 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, ovs-vswitchd is subscribed to all the routing changes in the kernel. On each change, it marks the internal routing table cache as invalid, then resets it and dumps all the routes from the kernel from scratch. The reason for that is kernel routing updates not being reliable in a sense that it's hard to tell which route is getting removed or modified. Userspace application has to track the order in which route entries are dumped from the kernel. Updates can get lost or even duplicated and the kernel doesn't provide a good mechanism to distinguish one route from another. To my knowledge, dumping all the routes from a kernel after each change is the only way to keep the cache consistent. Some more info can be found in the following never addressed issues: https://bugzilla.redhat.com/1337860 https://bugzilla.redhat.com/1337855 It seems to be believed that NetworkManager "mostly" does incremental updates right. But it is still not completely correct, will re-dump the whole table in certain cases, and it takes a huge amount of very complicated code to do the accounting and route comparisons. Going back to ovs-vswitchd, it currently dumps routes from all the routing tables. If it will get conflicting routes from multiple tables, the cache will not be useful. The routing cache in userspace is primarily used for checking the egress port for tunneled traffic and this way also detecting link state changes for a tunnel port. For userspace datapath it is used for actual routing of the packet after sending to a native tunnel. With kernel datapath we don't really have a mechanism to know which routing table will actually be used by the kernel after encapsulation, so our lookups on a cache may be incorrect because of this as well. So, unless all the relevant routes are in the standard tables, the lookup in userspace route cache is unreliable. Luckily, most setups are not using any complicated routing in non-standard tables that OVS has to be aware of. 
It is possible, but unlikely, that standard routing tables are completely empty while some other custom table is not, and all the OVS tunnel traffic is directed to that table. That would be the only scenario where dumping non-standard tables would make sense. But it seems like this kind of setup will likely need a way to tell OVS from which table the routes should be taken, or we'll need to dump routing rules and keep a separate cache for each table, so we can first match on rules and then lookup correct routes in a specific table. I'm not sure if trying to implement all that is justified. For now, stop considering routes from non-standard tables to avoid mixing different tables together and also wasting CPU resources. This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is running on a same host and in a same network namespace with OVS using its own custom routing table. Unfortunately, there seems to be no way to tell the kernel to send updates only for particular tables. So, we'll still receive and parse all of them. But they will not result in a full cache invalidation in most cases. Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps. So, we can make use of it and dump only standard tables when we get a relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on the socket for filtering to work. There is no reason to not enable it by default, if supported. It is not used outside of NETLINK_ROUTE. Fixes: f0e167f ("route-table: Handle route updates more robustly.") Fixes: ea83a2f ("lib: Show tunnel egress interface in ovsdb") Reported-at: openvswitch/ovs-issues#185 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: 0-day Robot <robot@bytheb.org>
Currently, ovs-vswitchd is subscribed to all the routing changes in the kernel. On each change, it marks the internal routing table cache as invalid, then resets it and dumps all the routes from the kernel from scratch. The reason for that is kernel routing updates not being reliable in a sense that it's hard to tell which route is getting removed or modified. Userspace application has to track the order in which route entries are dumped from the kernel. Updates can get lost or even duplicated and the kernel doesn't provide a good mechanism to distinguish one route from another. To my knowledge, dumping all the routes from a kernel after each change is the only way to keep the cache consistent. Some more info can be found in the following never addressed issues: https://bugzilla.redhat.com/1337860 https://bugzilla.redhat.com/1337855 It seems to be believed that NetworkManager "mostly" does incremental updates right. But it is still not completely correct, will re-dump the whole table in certain cases, and it takes a huge amount of very complicated code to do the accounting and route comparisons. Going back to ovs-vswitchd, it currently dumps routes from all the routing tables. If it will get conflicting routes from multiple tables, the cache will not be useful. The routing cache in userspace is primarily used for checking the egress port for tunneled traffic and this way also detecting link state changes for a tunnel port. For userspace datapath it is used for actual routing of the packet after sending to a native tunnel. With kernel datapath we don't really have a mechanism to know which routing table will actually be used by the kernel after encapsulation, so our lookups on a cache may be incorrect because of this as well. So, unless all the relevant routes are in the standard tables, the lookup in userspace route cache is unreliable. Luckily, most setups are not using any complicated routing in non-standard tables that OVS has to be aware of. 
It is possible, but unlikely, that standard routing tables are completely empty while some other custom table is not, and all the OVS tunnel traffic is directed to that table. That would be the only scenario where dumping non-standard tables would make sense. But it seems like this kind of setup will likely need a way to tell OVS from which table the routes should be taken, or we'll need to dump routing rules and keep a separate cache for each table, so we can first match on rules and then lookup correct routes in a specific table. I'm not sure if trying to implement all that is justified. For now, stop considering routes from non-standard tables to avoid mixing different tables together and also wasting CPU resources. This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is running on a same host and in a same network namespace with OVS using its own custom routing table. Unfortunately, there seems to be no way to tell the kernel to send updates only for particular tables. So, we'll still receive and parse all of them. But they will not result in a full cache invalidation in most cases. Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps. So, we can make use of it and dump only standard tables when we get a relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on the socket for filtering to work. There is no reason to not enable it by default, if supported. It is not used outside of NETLINK_ROUTE. Fixes: f0e167f ("route-table: Handle route updates more robustly.") Fixes: ea83a2f ("lib: Show tunnel egress interface in ovsdb") Reported-at: openvswitch/ovs-issues#185 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, ovs-vswitchd is subscribed to all the routing changes in the kernel. On each change, it marks the internal routing table cache as invalid, then resets it and dumps all the routes from the kernel from scratch. The reason for that is kernel routing updates not being reliable in a sense that it's hard to tell which route is getting removed or modified. Userspace application has to track the order in which route entries are dumped from the kernel. Updates can get lost or even duplicated and the kernel doesn't provide a good mechanism to distinguish one route from another. To my knowledge, dumping all the routes from a kernel after each change is the only way to keep the cache consistent. Some more info can be found in the following never addressed issues: https://bugzilla.redhat.com/1337860 https://bugzilla.redhat.com/1337855 It seems to be believed that NetworkManager "mostly" does incremental updates right. But it is still not completely correct, will re-dump the whole table in certain cases, and it takes a huge amount of very complicated code to do the accounting and route comparisons. Going back to ovs-vswitchd, it currently dumps routes from all the routing tables. If it will get conflicting routes from multiple tables, the cache will not be useful. The routing cache in userspace is primarily used for checking the egress port for tunneled traffic and this way also detecting link state changes for a tunnel port. For userspace datapath it is used for actual routing of the packet after sending to a native tunnel. With kernel datapath we don't really have a mechanism to know which routing table will actually be used by the kernel after encapsulation, so our lookups on a cache may be incorrect because of this as well. So, unless all the relevant routes are in the standard tables, the lookup in userspace route cache is unreliable. Luckily, most setups are not using any complicated routing in non-standard tables that OVS has to be aware of. 
It is possible, but unlikely, that standard routing tables are completely empty while some other custom table is not, and all the OVS tunnel traffic is directed to that table. That would be the only scenario where dumping non-standard tables would make sense. But it seems like this kind of setup will likely need a way to tell OVS from which table the routes should be taken, or we'll need to dump routing rules and keep a separate cache for each table, so we can first match on rules and then lookup correct routes in a specific table. I'm not sure if trying to implement all that is justified. For now, stop considering routes from non-standard tables to avoid mixing different tables together and also wasting CPU resources. This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is running on a same host and in a same network namespace with OVS using its own custom routing table. Unfortunately, there seems to be no way to tell the kernel to send updates only for particular tables. So, we'll still receive and parse all of them. But they will not result in a full cache invalidation in most cases. Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps. So, we can make use of it and dump only standard tables when we get a relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on the socket for filtering to work. There is no reason to not enable it by default, if supported. It is not used outside of NETLINK_ROUTE. Fixes: f0e167f ("route-table: Handle route updates more robustly.") Fixes: ea83a2f ("lib: Show tunnel egress interface in ovsdb") Reported-at: openvswitch/ovs-issues#185 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Currently, ovs-vswitchd is subscribed to all the routing changes in the kernel. On each change, it marks the internal routing table cache as invalid, then resets it and dumps all the routes from the kernel from scratch. The reason for that is kernel routing updates not being reliable in a sense that it's hard to tell which route is getting removed or modified. Userspace application has to track the order in which route entries are dumped from the kernel. Updates can get lost or even duplicated and the kernel doesn't provide a good mechanism to distinguish one route from another. To my knowledge, dumping all the routes from a kernel after each change is the only way to keep the cache consistent. Some more info can be found in the following never addressed issues: https://bugzilla.redhat.com/1337860 https://bugzilla.redhat.com/1337855 It seems to be believed that NetworkManager "mostly" does incremental updates right. But it is still not completely correct, will re-dump the whole table in certain cases, and it takes a huge amount of very complicated code to do the accounting and route comparisons. Going back to ovs-vswitchd, it currently dumps routes from all the routing tables. If it will get conflicting routes from multiple tables, the cache will not be useful. The routing cache in userspace is primarily used for checking the egress port for tunneled traffic and this way also detecting link state changes for a tunnel port. For userspace datapath it is used for actual routing of the packet after sending to a native tunnel. With kernel datapath we don't really have a mechanism to know which routing table will actually be used by the kernel after encapsulation, so our lookups on a cache may be incorrect because of this as well. So, unless all the relevant routes are in the standard tables, the lookup in userspace route cache is unreliable. Luckily, most setups are not using any complicated routing in non-standard tables that OVS has to be aware of. 
It is possible, but unlikely, that standard routing tables are completely empty while some other custom table is not, and all the OVS tunnel traffic is directed to that table. That would be the only scenario where dumping non-standard tables would make sense. But it seems like this kind of setup will likely need a way to tell OVS from which table the routes should be taken, or we'll need to dump routing rules and keep a separate cache for each table, so we can first match on rules and then lookup correct routes in a specific table. I'm not sure if trying to implement all that is justified. For now, stop considering routes from non-standard tables to avoid mixing different tables together and also wasting CPU resources. This fixes a high CPU usage in ovs-vswitchd in case a BGP daemon is running on a same host and in a same network namespace with OVS using its own custom routing table. Unfortunately, there seems to be no way to tell the kernel to send updates only for particular tables. So, we'll still receive and parse all of them. But they will not result in a full cache invalidation in most cases. Linux kernel v4.20 introduced filtering support for RTM_GETROUTE dumps. So, we can make use of it and dump only standard tables when we get a relevant route update. NETLINK_GET_STRICT_CHK has to be enabled on the socket for filtering to work. There is no reason to not enable it by default, if supported. It is not used outside of NETLINK_ROUTE. Fixes: f0e167f ("route-table: Handle route updates more robustly.") Fixes: ea83a2f ("lib: Show tunnel egress interface in ovsdb") Reported-at: openvswitch/ovs-issues#185 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052091.html Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
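For reference, the filtered RTM_GETROUTE dump that the commit message mentions can be sketched with a raw netlink socket. This is a minimal illustration, assuming a Linux kernel (filtering requires >= 4.20); the constants come from `linux/netlink.h` and `linux/rtnetlink.h`, and `dump_table` is a name invented for this sketch:

```python
import socket
import struct

SOL_NETLINK = 270
NETLINK_GET_STRICT_CHK = 12            # linux/netlink.h, kernel >= 4.20
RTM_GETROUTE, RTM_NEWROUTE, NLMSG_DONE = 26, 24, 3
NLM_F_REQUEST, NLM_F_DUMP = 0x01, 0x300
RT_TABLE_MAIN = 254

def dump_table(table=RT_TABLE_MAIN, family=socket.AF_INET):
    """Count routes in one table via a filtered RTM_GETROUTE dump."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         socket.NETLINK_ROUTE)
    try:
        # Enable strict checking so the kernel honors the rtm_table filter
        # below; without it, the dump returns routes from every table.
        try:
            sock.setsockopt(SOL_NETLINK, NETLINK_GET_STRICT_CHK, 1)
        except OSError:
            pass  # pre-4.20 kernel: the dump will simply be unfiltered
        # struct rtmsg with only rtm_family and rtm_table set.
        rtmsg = struct.pack("=BBBBBBBBI", family, 0, 0, 0, table, 0, 0, 0, 0)
        nlh = struct.pack("=IHHII", 16 + len(rtmsg), RTM_GETROUTE,
                          NLM_F_REQUEST | NLM_F_DUMP, 1, 0)
        sock.send(nlh + rtmsg)
        routes = 0
        while True:
            data = sock.recv(65536)
            off = 0
            while off + 16 <= len(data):
                ln, ty, _, _, _ = struct.unpack_from("=IHHII", data, off)
                if ty == NLMSG_DONE:
                    return routes
                if ty == RTM_NEWROUTE:
                    routes += 1
                off += (ln + 3) & ~3   # netlink messages are 4-byte aligned
    finally:
        sock.close()

if __name__ == "__main__":
    print("IPv4 routes in the main table:", dump_table())
```

The point of the filter is that even with a full BGP table in some custom table, this dump only walks the standard table's handful of entries instead of the whole feed.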
@f0o, @ddominet, the patch is now applied to all branches down to 2.17 and will be part of the next set of stable releases. Not sure when distributions will pick those up. With the change you should be able to run BGP in a separate VRF without significant impact on OVS CPU usage, but running with the main routing table will still be problematic. Closing this issue for now. |
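As a rough illustration of the separate-VRF suggestion above — device and table names here (`vrf-bgp`, `eth1`, table `1000`) are hypothetical, and the bird2 fragment is only a sketch to adapt to your own configuration:

```shell
# Put the BGP daemon's routes into a dedicated VRF table so they never
# land in the main table that OVS watches:
ip link add dev vrf-bgp type vrf table 1000
ip link set dev vrf-bgp up
ip link set dev eth1 master vrf-bgp   # uplink carrying the BGP sessions

# Then point bird2's kernel protocol at that table (sketch only):
#   protocol kernel {
#       vrf "vrf-bgp";
#       kernel table 1000;
#       ipv6 { export all; };
#   }
```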
Amazing job @igsilya! Thank you so much! |
As the title says, I'm running Open vSwitch on a Debian box that is configured to be my router.
The box receives a full routing table (IPv6) via Bird2, and as soon as the full table starts coming in, OVS starts utilizing 100% of my CPU.
I let it run overnight, but 12 hours later it was still at 100%.
The program utilizing the CPU is "ovs-vswitchd".
This issue can be reproduced by setting up a BGP session with a full routing table and simply installing the `openvswitch-switch` package. No configuration is needed for OVS; it will consume the entire CPU whenever the full routing table is present on the machine and OVS is running.

Debian version:
Linux rtr1 5.6.0-1-cloud-amd64 #1 SMP Debian 5.6.7-1 (2020-04-29) x86_64 GNU/Linux
Bird version:
BIRD 2.0.7
OVS version:
ovs-vswitchd (Open vSwitch) 2.13.0
The `/var/log/openvswitch/ovs-vswitchd.log` file shows this: