ofctrl: Fix the assert seen when flood removing flows.
In one of the scaled deployments, ovn-controller asserts with the stack trace below:

***
(gdb) bt
0 raise () from /lib64/libc.so.6
1 abort () from /lib64/libc.so.6
2 ovs_abort_valist ("%s: assertion %s failed in %s()") at lib/util.c:419
3 vlog_abort_valist ("%s: assertion %s failed in %s()") at lib/vlog.c:1249
4 vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263
5 ovs_assert_failure (where="controller/ofctrl.c:1198", function="flood_remove_flows_for_sb_uuid", condition="ovs_list_is_empty(&f->list_node)") at lib/util.c:86
6 flood_remove_flows_for_sb_uuid (sb_uuid=...538, flood_remove_nodes=...ed0) at controller/ofctrl.c:1205
7 flood_remove_flows_for_sb_uuid (sb_uuid=...898, flood_remove_nodes=...ed0) at controller/ofctrl.c:1230
8 flood_remove_flows_for_sb_uuid (sb_uuid=...bf0, flood_remove_nodes=...ed0) at controller/ofctrl.c:1230
9 ofctrl_flood_remove_flows (flood_remove_nodes=...ed0) at controller/ofctrl.c:1250
10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP, ref_name= "5564_pg_64...bac") at controller/lflow.c:612
11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP) at controller/ovn-controller.c:2181
12 engine_compute () at lib/inc-proc-eng.c:306
13 engine_run_node (recompute_allowed=true) at lib/inc-proc-eng.c:352
14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377
15 main () at controller/ovn-controller.c:2794
***

This assertion is seen when a port group that is referenced by many logical flows (with conj actions) gets updated.

The function ofctrl_flood_remove_flows() calls flood_remove_flows_for_sb_uuid() for each sb uuid in the hmap 'flood_remove_nodes', iterating with HMAP_FOR_EACH (flood_remove_nodes). flood_remove_flows_for_sb_uuid() also takes the hmap 'flood_remove_nodes' as an argument and inserts a few items into it when it has to call itself recursively. When an item is inserted, it is possible that the hmap gets expanded. If this happens, HMAP_FOR_EACH () skips a few entries, so some of the desired flows are not cleared. Later, when ofctrl_add_or_append_flow() is called, there are multiple 'struct sb_flow_ref' references for the same desired flow. This causes the above assertion later, when the same port group gets updated.

This patch fixes the issue by cloning the hmap 'flood_remove_nodes' and using the clone to iterate over the flood remove nodes. A test case is also added to cover this scenario.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1928012
Fixes: 580aea7 ("ovn-controller: Fix conjunction handling with incremental processing.")
Suggested-by: Ilya Maximetes <i.maximets@ovn.org>
Acked-by: Ilya Maximetes <i.maximets@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>
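
The hazard described in this message (iterating a hash map while the loop body inserts into it and triggers an expansion) can be illustrated with a minimal standalone sketch. It does not use the OVS hmap API or the actual patch code: 'struct set', set_insert(), set_snapshot(), process() and process_all() are hypothetical names invented for illustration, and the snapshot is a flat array of keys rather than a cloned hmap. The idea is the same as in the fix: take a stable copy of the entries first, iterate the copy, and let the per-entry work insert into the live map freely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an OVS hmap: a chained hash set of ints whose
 * bucket array doubles (and rehashes every node) when it fills up. */
struct node { int key; struct node *next; };
struct set { struct node **buckets; size_t n_buckets; size_t n; };

void set_init(struct set *s)
{
    s->n_buckets = 2;
    s->n = 0;
    s->buckets = calloc(s->n_buckets, sizeof *s->buckets);
    assert(s->buckets);
}

int set_contains(const struct set *s, int key)
{
    struct node *p = s->buckets[(unsigned) key % s->n_buckets];
    for (; p; p = p->next) {
        if (p->key == key) {
            return 1;
        }
    }
    return 0;
}

/* Doubles the bucket array.  Every node moves to a new bucket, which is why
 * a bucket-by-bucket iteration in progress can no longer be trusted. */
void set_expand(struct set *s)
{
    size_t new_n = s->n_buckets * 2;
    struct node **nb = calloc(new_n, sizeof *nb);
    assert(nb);
    for (size_t i = 0; i < s->n_buckets; i++) {
        struct node *p = s->buckets[i];
        while (p) {
            struct node *next = p->next;
            size_t b = (unsigned) p->key % new_n;
            p->next = nb[b];
            nb[b] = p;
            p = next;
        }
    }
    free(s->buckets);
    s->buckets = nb;
    s->n_buckets = new_n;
}

void set_insert(struct set *s, int key)
{
    if (set_contains(s, key)) {
        return;
    }
    if (s->n >= s->n_buckets) {
        set_expand(s);
    }
    struct node *n = malloc(sizeof *n);
    assert(n);
    n->key = key;
    size_t b = (unsigned) key % s->n_buckets;
    n->next = s->buckets[b];
    s->buckets[b] = n;
    s->n++;
}

/* Copies all current keys into a flat array that later inserts cannot
 * disturb.  The caller frees the result. */
int *set_snapshot(const struct set *s, size_t *n_out)
{
    int *keys = malloc((s->n ? s->n : 1) * sizeof *keys);
    assert(keys);
    size_t k = 0;
    for (size_t i = 0; i < s->n_buckets; i++) {
        for (struct node *p = s->buckets[i]; p; p = p->next) {
            keys[k++] = p->key;
        }
    }
    *n_out = k;
    return keys;
}

/* Hypothetical per-entry work: like the recursive flood removal, it inserts
 * a related entry into the live set while the caller is still iterating. */
void process(struct set *live, int key, int *n_processed)
{
    (*n_processed)++;
    set_insert(live, key + 1000);
}

/* Iterates the snapshot, not the live set, so no original entry is skipped
 * even if process() causes the live set to expand. */
int process_all(struct set *live)
{
    size_t n;
    int *keys = set_snapshot(live, &n);
    int processed = 0;
    for (size_t i = 0; i < n; i++) {
        process(live, keys[i], &processed);
    }
    free(keys);
    return processed;
}
```

In this sketch, entries inserted during the loop are not visited by the loop itself. That matches the case described above only under the assumption that the per-entry work handles its own newly inserted entries, as the recursive flood_remove_flows_for_sb_uuid() calls do.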