Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSSM-3276: Add metallb installer for clusters without external IPs #580

Merged
merged 9 commits into from
Jun 1, 2023

Conversation

jewertow
Copy link
Member

No description provided.

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
@openshift-ci
Copy link

openshift-ci bot commented May 29, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the size/L label May 29, 2023
pkg/metallb/install.go Outdated Show resolved Hide resolved
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
@jewertow jewertow marked this pull request as ready for review May 30, 2023 09:34
@@ -34,7 +34,7 @@ func ExecuteWithEnvAndInput(t test.TestHelper, env []string, cmd string, input s
t.T().Helper()
output, err := execShellCommand(cmd, env, input)
if err != nil {
t.Fatalf("Command failed: %s\n%serror: %s", cmd, appendNewLine(output), err)
t.Logf("Command failed: %s\n%serror: %s", cmd, appendNewLine(output), err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now prevents tests from failing when the command fails. If you need to check the command's output even if the command fails, we need to find a better way to achieve this. We do have some other places where we expect the command to fail, so this is clearly a valid use-case. We just need to find a good way to tell this function whether to fail on command failures or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command is useless for now. I don't understand it's purpose. It's not the first time when I struggle with fatal errors that cannot be retried.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be useless? It's being used in basically all tests. In 99% of cases, if the command fails, the test should fail immediately and not on the next assertion. Not failing fast is one of the worst things in software, as it almost always leads to a lot of wasted time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because for some errors kubectl returns error and for others it does not. For example, when I execute kubectl -n metallb-system... and the namespace does not exist, I get fatal error. If the namespace exist, but a deployment does not exist, I don't get error. 🤯

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not able to verify all the tests for now, so I will revert this change and will use exec.Command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=== Failed
=== FAIL: pkg/tests/ossm-federation  (0.00s)
    test_helper_setup.go:60: Check if MetalLB operator already exists
    test_helper_setup.go:64: FATAL: Command failed: oc -n metallb-system get deployments/metallb-operator-controller-manager
        Error from server (NotFound): namespaces "metallb-system" not found
        error: exit status 1
panic: FailNow

Copy link
Contributor

@luksa luksa May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the shell.Execute("command || true") pattern for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@luksa luksa self-requested a review May 30, 2023 09:49
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
@jewertow jewertow requested a review from fjglira May 30, 2023 15:54
Copy link
Collaborator

@fjglira fjglira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm my only question is: This oeprator version will work with OCP version from 4.9 to 4.13? Or we need to add something to validate that this works first against every version supported that we have?

func checkIfMetalLbOperatorExists(t test.TestHelper) bool {
t.Log("Check if MetalLB operator already exists")
// pattern "cmd || true" is used to avoid getting fatal error
output := shell.Execute(t, fmt.Sprintf("oc get deployments -n %s metallb-operator-controller-manager || true", ns.MetalLB))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output := shell.Execute(t, fmt.Sprintf("oc get deployments -n %s metallb-operator-controller-manager || true", ns.MetalLB))
output := shell.Executef(t, "oc get deployments -n %s metallb-operator-controller-manager || true", ns.MetalLB)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with oc.ResourceExists().

func checkIfMetalLbControllerExists(t test.TestHelper) bool {
t.Log("Check if MetalLB controller already exists")
// pattern "cmd || true" is used to avoid getting fatal error
output := shell.Execute(t, fmt.Sprintf("oc get deployments -n %s controller || true", ns.MetalLB))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output := shell.Execute(t, fmt.Sprintf("oc get deployments -n %s controller || true", ns.MetalLB))
output := shell.Executef(t, "oc get deployments -n %s controller || true", ns.MetalLB)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with oc.ResourceExists().

func checkIfIPAddressPoolExists(t test.TestHelper) bool {
t.Log("Check if MetalLB controller already exists")
// pattern "cmd || true" is used to avoid getting fatal error
output := shell.Execute(t, fmt.Sprintf("oc get ipaddresspools -n %s worker-internal-ips || true", ns.MetalLB))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output := shell.Execute(t, fmt.Sprintf("oc get ipaddresspools -n %s worker-internal-ips || true", ns.MetalLB))
output := shell.Executef(t, "oc get ipaddresspools -n %s worker-internal-ips || true", ns.MetalLB)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with oc.ResourceExists().

oc.ApplyString(t, ns.MetalLB, metallbOperator)
retry.UntilSuccess(t, func(t test.TestHelper) {
// pattern "cmd || true" is used to avoid getting fatal error
shell.Execute(t, fmt.Sprintf("oc get deployments -n %s metallb-operator-controller-manager || true", ns.MetalLB),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work properly when the oc passed to this function is not the default oc.

And this is exactly why I said in the other PR that using shell.Execute to run a custom oc command is not okay.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I didn't find this, because I was testing missing operator, MetalLB and IPAddressPool on the cluster for which default kubeconfig is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to pass kubeconfigs to each of these commands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about creating oc.ResourceExists(t, ns, kind, name)? Seems like we need this in several places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jewertow
Copy link
Member Author

This oeprator version will work with OCP version from 4.9 to 4.13?

I guess it will work only on 4.12... I was only testing it on 4.12 for now, but I am aware of this issue. I'm going to check versions for other OCP versions tomorrow, because I have to download crc 4.9-4.13 and check what metallb versions are available. Then I will add startingCSV: {{ .MetalLbCsv }} in the metallb-operator.yaml

@luksa
Copy link
Contributor

luksa commented May 30, 2023

BTW why do we even need a LoadBalancer for gateway api tests? Can't we use the cluster IP of the service created by the Gateway?

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
@jewertow
Copy link
Member Author

BTW why do we even need a LoadBalancer for gateway api tests?

Because the deployment controller creates LoadBalancer services for Gateway objects by default.

Can't we use the cluster IP of the service created by the Gateway?

I think we can, but we need load balancer anyway. I also believe that LoadBalancer services are more future proof, as router will be deprecated at some point in the future, so it's worth to test it.

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
@jewertow jewertow requested review from luksa and fjglira May 31, 2023 20:48
oc.ApplyTemplateString(t, ns.MetalLB, metallbOperator, map[string]string{"Version": metallbVersion})
retry.UntilSuccess(t, func(t test.TestHelper) {
if !oc.ResourceExists(t, ns.MetalLB, "deployments", "metallb-operator-controller-manager") {
t.Log("metallb-operator-controller-manager not found - waiting until exists")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this retry work is need to use instead of t.Log a t.Fatal because the retry needs a failure to make the retry, right now will log this only and continue with the rest of the steps

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

oc.ApplyString(t, ns.MetalLB, metallb)
retry.UntilSuccess(t, func(t test.TestHelper) {
if !oc.ResourceExists(t, ns.MetalLB, "deployments", "controller") {
t.Log("MetalLB controller not found - waiting until exists")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, to make this retry work is need to use t.Fatal to force the failure and make the retry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

func createAddressPool(t test.TestHelper, oc oc.OC) {
t.Log("Check if MetalLB controller already exists")
if oc.ResourceExists(t, ns.MetalLB, "ipaddresspools", "worker-internal-ips") {
t.Log("IPAddressPool already exists - skip applying IPAddressPool")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, to make this retry work is need to use t.Fatal to force the failure and make the retry

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not retried :)

Signed-off-by: Jacek Ewertowski <jewertow@redhat.com>
@jewertow jewertow requested a review from fjglira June 1, 2023 10:19
@openshift-merge-robot openshift-merge-robot merged commit 8cbbfe0 into maistra:main Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants