Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBO Unit test flaky #3506

Closed
tssurya opened this issue Mar 28, 2023 · 6 comments · Fixed by #3651
Closed

HBO Unit test flaky #3506

tssurya opened this issue Mar 28, 2023 · 6 comments · Fixed by #3651
Assignees

Comments

@tssurya
Copy link
Member

tssurya commented Mar 28, 2023

2023-03-28T08:17:35.8585452Z �[91m�[1m• Failure [2.045 seconds]�[0m
2023-03-28T08:17:35.8586006Z Hybrid Overlay Node Linux Operations
2023-03-28T08:17:35.8586995Z �[90m/home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/node_linux_test.go:264�[0m
2023-03-28T08:17:35.8587825Z   �[91m�[1msets up tunnels for Windows nodes [It]�[0m
2023-03-28T08:17:35.8589731Z   �[90m/home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/testing/testing.go:18�[0m
2023-03-28T08:17:35.8590614Z 
2023-03-28T08:17:35.8592552Z   �[91mTimed out after 2.001s.
2023-03-28T08:17:35.8593016Z   Executed commands do not match expected commands!
2023-03-28T08:17:35.8593725Z   Executed unexpected command /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-03-28T08:17:35.8594838Z   
2023-03-28T08:17:35.8595722Z   [00]   /fake-bin/ovn-nbctl --timeout=15 get logical_switch mynode other-config:subnet
2023-03-28T08:17:35.8596788Z   [01]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-br br-ext -- set Bridge br-ext fail_mode=secure -- set Interface br-ext mtu_request=1400
2023-03-28T08:17:35.8597873Z   [02]   /fake-bin/ovs-vsctl --timeout=15 --if-exists get interface br-ext mac_in_use
2023-03-28T08:17:35.8598769Z   [03]   /fake-bin/ovs-vsctl --timeout=15 set bridge br-ext other-config:hwaddr=10:11:12:13:14:15
2023-03-28T08:17:35.8600279Z   [04]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-port br-int int -- --may-exist add-port br-ext ext -- set Interface int type=patch options:peer=ext external-ids:iface-id=int-mynode -- set Interface ext type=patch options:peer=int
2023-03-28T08:17:35.8601488Z   [05]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-port br-ext ext-vxlan -- set interface ext-vxlan type=vxlan options:remote_ip="flow" options:key="flow" options:dst_port=4789
2023-03-28T08:17:35.8602174Z   [06]   /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-03-28T08:17:35.8602733Z   [07]   /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-03-28T08:17:35.8603264Z   [08]   /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-03-28T08:17:35.8603809Z   [09]   /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-03-28T08:17:35.8604335Z   [10] ! /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-03-28T08:17:35.8605368Z   [11] ! /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-03-28T08:17:35.8605934Z   
2023-03-28T08:17:35.8606146Z   Expected
2023-03-28T08:17:35.8606642Z       <bool>: false
2023-03-28T08:17:35.8606913Z   to be true�[0m
2023-03-28T08:17:35.8607064Z 
2023-03-28T08:17:35.8607572Z   /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/node_linux_test.go:591

See https://github.com/ovn-org/ovn-kubernetes/actions/runs/4540645068/jobs/8001816447?pr=3505

@zeeke
Copy link
Contributor

zeeke commented Apr 12, 2023

Another instance of the flake here [1]

2023-04-11T20:06:20.8505970Z �[91m�[1m• Failure [2.032 seconds]�[0m
2023-04-11T20:06:20.8506496Z Hybrid Overlay Node Linux Operations
2023-04-11T20:06:20.8507226Z �[90m/home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/node_linux_test.go:264�[0m
2023-04-11T20:06:20.8507859Z   �[91m�[1msets up local linux pod [It]�[0m
2023-04-11T20:06:20.8508552Z   �[90m/home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/testing/testing.go:18�[0m
2023-04-11T20:06:20.8508910Z 
2023-04-11T20:06:20.8509198Z   �[91mTimed out after 2.001s.
2023-04-11T20:06:20.8509643Z   Executed commands do not match expected commands!
2023-04-11T20:06:20.8510316Z   Executed unexpected command /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-04-11T20:06:20.8510720Z   
2023-04-11T20:06:20.8511365Z   [00]   /fake-bin/ovn-nbctl --timeout=15 get logical_switch mynode other-config:subnet
2023-04-11T20:06:20.8512141Z   [01]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-br br-ext -- set Bridge br-ext fail_mode=secure -- set Interface br-ext mtu_request=1400
2023-04-11T20:06:20.8512829Z   [02]   /fake-bin/ovs-vsctl --timeout=15 --if-exists get interface br-ext mac_in_use
2023-04-11T20:06:20.8513810Z   [03]   /fake-bin/ovs-vsctl --timeout=15 set bridge br-ext other-config:hwaddr=10:11:12:13:14:15
2023-04-11T20:06:20.8515596Z   [04]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-port br-int int -- --may-exist add-port br-ext ext -- set Interface int type=patch options:peer=ext external-ids:iface-id=int-mynode -- set Interface ext type=patch options:peer=int
2023-04-11T20:06:20.8516925Z   [05]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-port br-ext ext-vxlan -- set interface ext-vxlan type=vxlan options:remote_ip="flow" options:key="flow" options:dst_port=4789
2023-04-11T20:06:20.8517694Z   [06]   /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-04-11T20:06:20.8518768Z   [07]   /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-04-11T20:06:20.8519293Z   [08]   /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-04-11T20:06:20.8519796Z   [09]   /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-04-11T20:06:20.8520263Z   [10] ! /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-04-11T20:06:20.8520725Z   [11] ! /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-04-11T20:06:20.8520994Z   
2023-04-11T20:06:20.8521182Z   Expected
2023-04-11T20:06:20.8521397Z       <bool>: false
2023-04-11T20:06:20.8521648Z   to be true�[0m
2023-04-11T20:06:20.8521777Z 
2023-04-11T20:06:20.8522262Z   /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/node_linux_test.go:514

[1] https://github.com/ovn-org/ovn-kubernetes/actions/runs/4670010801/jobs/8269215240?pr=3328

@tssurya
Copy link
Member Author

tssurya commented Apr 14, 2023

@tssurya
Copy link
Member Author

tssurya commented May 8, 2023

https://github.com/ovn-org/ovn-kubernetes/actions/runs/4913009061/jobs/8772692637?pr=3575 another flake

2023-05-08T08:40:48.3520181Z   [00]   /fake-bin/ovn-nbctl --timeout=15 get logical_switch mynode other-config:subnet
2023-05-08T08:40:48.3521125Z   [01]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-br br-ext -- set Bridge br-ext fail_mode=secure -- set Interface br-ext mtu_request=1400
2023-05-08T08:40:48.3522311Z   [02]   /fake-bin/ovs-vsctl --timeout=15 --if-exists get interface br-ext mac_in_use
2023-05-08T08:40:48.3523172Z   [03]   /fake-bin/ovs-vsctl --timeout=15 set bridge br-ext other-config:hwaddr=10:11:12:13:14:15
2023-05-08T08:40:48.3524581Z   [04]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-port br-int int -- --may-exist add-port br-ext ext -- set Interface int type=patch options:peer=ext external-ids:iface-id=int-mynode -- set Interface ext type=patch options:peer=int
2023-05-08T08:40:48.3525830Z   [05]   /fake-bin/ovs-vsctl --timeout=15 --may-exist add-port br-ext ext-vxlan -- set interface ext-vxlan type=vxlan options:remote_ip="flow" options:key="flow" options:dst_port=4789
2023-05-08T08:40:48.3526863Z   [06]   /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-05-08T08:40:48.3527692Z   [07]   /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-05-08T08:40:48.3528646Z   [08]   /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-05-08T08:40:48.3529381Z   [09]   /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-05-08T08:40:48.3530278Z   [10] ! /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
2023-05-08T08:40:48.3530970Z   [11] ! /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -
2023-05-08T08:40:48.3531657Z   
2023-05-08T08:40:48.3532160Z   Expected
2023-05-08T08:40:48.3532771Z       <bool>: false
2023-05-08T08:40:48.3533231Z   to be true�[0m
2023-05-08T08:40:48.3533747Z 

@tssurya
Copy link
Member Author

tssurya commented May 23, 2023

/assign @flavio-fernandes

@flavio-fernandes flavio-fernandes self-assigned this May 23, 2023
@flavio-fernandes
Copy link
Contributor

flavio-fernandes commented May 23, 2023

Found an easy way of reproducing this issue!
Simply put a break-point here:

annotations := createNodeAnnotationsForSubnet(thisNodeSubnet)

Comment out OR duplicate call to addSyncFlows:
And start debugging Unit Test using this launch.json file below.
Wait a few seconds; it will be enough time to make the extra commands happen and cause the test to fail.

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug OVN Cluster Tests test function",
            "type": "go",
            "request": "launch",
            "mode": "test",
            "program": "./hybrid-overlay/pkg/controller",
            "buildFlags": "-mod=vendor",
            "args": [
                "-test.run",
                "TestHybridOverlayControllerSuite",
                "-ginkgo.focus",
                "node updates itself, windows tunnel and pod flows when distributed router IP is updated",
            ],

        },
    ]
}

flavio-fernandes added a commit to flavio-fernandes/ovn-kubernetes that referenced this issue May 24, 2023
Add support in FakeExec to handle cases where the expected command
may happen more than once. This was causing CI flakes for a
Hybrid Overlay tests where periodically generated events would render
duplicates that were not accounted for, but were benign to the test
itself.

Reported-at: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
flavio-fernandes added a commit to flavio-fernandes/ovn-kubernetes that referenced this issue May 24, 2023
Add support in FakeExec to handle cases where the expected command
may happen more than once. This was causing CI flakes for
Hybrid Overlay tests where periodically generated events would render
duplicates that were not accounted for, but were benign to the test
itself.

Reported-at: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
flavio-fernandes added a commit to flavio-fernandes/ovn-kubernetes that referenced this issue May 24, 2023
The test had to be changed to account for additional fake commands that
may happen during its execution. More specifically, xxx would invoke
yyy, which invokes the following commands:

    ZZZ

Reported-at: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
flavio-fernandes added a commit to flavio-fernandes/ovn-kubernetes that referenced this issue May 24, 2023
The test had to be changed to account for additional fake commands that
may happen during its execution. More specifically, an async callback
to ReplaceOFFlows() would invoke these 2 fake commands during the
test:

    /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
    /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -

The origin of this callback is located in:

    func (n *NodeController) RunFlowSync(stopCh <-chan struct{})

More details on the stack trace is available in:

    ovn-org#3506

Reported-at: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
@flavio-fernandes
Copy link
Contributor

Details on the issue that caused this flake is located here: #3610 (comment)

flavio-fernandes added a commit to flavio-fernandes/ovn-kubernetes that referenced this issue May 24, 2023
The test had to be changed to account for additional fake commands that
may happen during its execution. More specifically, an async callback
to ReplaceOFFlows() would invoke these 2 fake commands during the
test:

    /fake-bin/ovs-ofctl dump-flows --no-stats br-ext table=20
    /fake-bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows br-ext -

The origin of this callback is located in:

    func (n *NodeController) RunFlowSync(stopCh <-chan struct{})

More details on the stack trace is available in:

    ovn-org#3506

Reported-at: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506
Signed-off-by: Flavio Fernandes <flaviof@redhat.com>
JacobTanenbaum added a commit to JacobTanenbaum/ovn-kubernetes that referenced this issue Jun 6, 2023
testing

make the timeout for the periodic syncing of the ovs flows so high
during testing that there will be no impact and this should eliminate
testing flakes

Reported as: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
JacobTanenbaum added a commit to JacobTanenbaum/ovn-kubernetes that referenced this issue Jun 6, 2023
testing

make the timeout for the periodic syncing of the ovs flows so high
during testing that there will be no impact and this should eliminate
testing flakes

Reported as: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
JacobTanenbaum added a commit to JacobTanenbaum/ovn-kubernetes that referenced this issue Jun 8, 2023
testing

make the timeout for the periodic syncing of the ovs flows so high
during testing that there will be no impact and this should eliminate
testing flakes

Reported as: https://issues.redhat.com/browse/OCPBUGS-13979
Fixes: ovn-org#3506

Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment