[release-4.13] OCPBUGS-28253: Use DSR load balancing in kube-proxy #2042

Merged
5 changes: 3 additions & 2 deletions docs/byoh-instance-pre-requisites.md

```diff
 The following pre-requisites must be fulfilled in order to add a Windows BYOH node.
 * The instance must be on the same network as the Linux worker nodes in the cluster.
-* Port 22 must be open and running [an SSH server](https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_install_firstuse).
-* Port 10250 must be open in order for log collection to function.
+* Port 22 must allow inbound TCP traffic and be running [an SSH server](https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_install_firstuse).
+* Port 9182 must allow inbound TCP traffic in order for node and pod metrics collection to function.
+* Port 10250 must allow inbound TCP traffic in order for log collection to function.
 * An administrator user is present with the [private key used in the secret](/README.md#create-a-private-key-secret) set as an authorized SSH key.
 * The hostname of the instance must follow the [RFC 1123](https://datatracker.ietf.org/doc/html/rfc1123) DNS label standard:
   * Contain only lowercase alphanumeric characters or '-'.
```
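The hostname requirement above can be checked mechanically. The following is an illustrative sketch, not part of WMCO: the regular expression mirrors the RFC 1123 DNS label grammar (lowercase alphanumerics or '-', starting and ending with an alphanumeric, at most 63 characters), and `isDNS1123Label` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label matches an RFC 1123 DNS label: lowercase alphanumerics or '-',
// beginning and ending with an alphanumeric character.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isDNS1123Label reports whether hostname satisfies the RFC 1123 label rules,
// including the 63-character length limit.
func isDNS1123Label(hostname string) bool {
	return len(hostname) <= 63 && dns1123Label.MatchString(hostname)
}

func main() {
	for _, h := range []string{"byoh-node-1", "Invalid_Host", "-leading-dash"} {
		fmt.Printf("%q valid: %v\n", h, isDNS1123Label(h))
	}
}
```

Kubernetes applies the same validation when the node object is created, so checking the hostname before running WMCO avoids a configuration failure later.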
6 changes: 3 additions & 3 deletions docs/vsphere-golden-image.md
In case no firewall rule exists, you must create one by running the following PowerShell command:

```powershell
New-NetFirewallRule -DisplayName 'OpenSSH Server (sshd)' -LocalPort 22 -Enabled True -Direction Inbound -Protocol TCP -Action Allow
```

```diff
-## 4. Set up incoming connection for container logs
+## 4. Set up incoming connection for container logs and metrics
 
-Create a new firewall rule in the Windows VM to allow incoming connections for container logs, usually
-on TCP port `10250` by running the following PowerShell command:
+Create new firewall rules in the Windows VM to allow incoming connections for container logs and metrics:
```
> **Review comment (Contributor):** nit: would it be good to still call out the "usually on TCP port x" here?
```powershell
New-NetFirewallRule -DisplayName "ContainerLogsPort" -LocalPort 10250 -Enabled True -Direction Inbound -Protocol TCP -Action Allow -EdgeTraversalPolicy Allow
New-NetFirewallRule -DisplayName "WindowsExporter" -LocalPort 9182 -Enabled True -Direction Inbound -Protocol TCP -Action Allow -EdgeTraversalPolicy Allow
```

## 5. Install Windows OS updates
3 changes: 2 additions & 1 deletion docs/vsphere_ci/scripts/install-firewall-rules.ps1
# USAGE
# ./install-firewall-rules.ps1

```diff
-# create firewall rule to allow Container Logs on port 10250
+# Allow incoming connections for container logs and metrics
 New-NetFirewallRule -DisplayName "ContainerLogsPort" -LocalPort 10250 -Enabled True -Direction Inbound -Protocol TCP -Action Allow -EdgeTraversalPolicy Allow
+New-NetFirewallRule -DisplayName "WindowsExporter" -LocalPort 9182 -Enabled True -Direction Inbound -Protocol TCP -Action Allow -EdgeTraversalPolicy Allow
```

# success
exit 0
2 changes: 2 additions & 0 deletions pkg/secrets/secrets.go
```diff
@@ -61,6 +61,7 @@ func GenerateUserData(platformType oconfig.PlatformType, publicKey ssh.PublicKey
 
 // generateUserDataWithPubKey returns the Windows user data for the given pubKey
 func generateUserDataWithPubKey(pubKey string) string {
+	windowsExporterPort := "9182"
 	return `function Get-RandomPassword {
 	Add-Type -AssemblyName 'System.Web'
 	return [System.Web.Security.Membership]::GeneratePassword(16, 2)
@@ -82,6 +83,7 @@ func generateUserDataWithPubKey(pubKey string) string {
 	$firewallRuleName = "ContainerLogsPort"
 	$containerLogsPort = "10250"
 	New-NetFirewallRule -DisplayName $firewallRuleName -Direction Inbound -Action Allow -Protocol TCP -LocalPort $containerLogsPort -EdgeTraversalPolicy Allow
+	New-NetFirewallRule -DisplayName "WindowsExporter" -Direction Inbound -Action Allow -Protocol TCP -LocalPort "` + windowsExporterPort + `" -EdgeTraversalPolicy Allow
 	Set-Service -Name sshd -StartupType 'Automatic'
 	Start-Service sshd
 	(Get-Content -path C:\ProgramData\ssh\sshd_config) | ForEach-Object {
```
4 changes: 2 additions & 2 deletions pkg/services/services.go
```diff
@@ -129,9 +129,9 @@ func hybridOverlayConfiguration(vxlanPort string, debug bool) servicescm.Service
 // kubeProxyConfiguration returns the Service definition for kube-proxy
 func kubeProxyConfiguration(debug bool) servicescm.Service {
 	sanitizedSubnetAnnotation := strings.ReplaceAll(nodeconfig.HybridOverlaySubnet, ".", "\\.")
-	cmd := fmt.Sprintf("%s -log-file=%s %s --windows-service --proxy-mode=kernelspace --feature-gates=WinOverlay=true "+
+	cmd := fmt.Sprintf("%s -log-file=%s %s --windows-service --proxy-mode=kernelspace --feature-gates=WinOverlay=true,WinDSR=true "+
 		"--hostname-override=NODE_NAME --kubeconfig=%s --cluster-cidr=NODE_SUBNET "+
-		"--network-name=%s --source-vip=ENDPOINT_IP --enable-dsr=false", windows.KubeLogRunnerPath, windows.KubeProxyLog,
+		"--network-name=%s --source-vip=ENDPOINT_IP --enable-dsr=true", windows.KubeLogRunnerPath, windows.KubeProxyLog,
 		windows.KubeProxyPath, windows.KubeconfigPath, windows.OVNKubeOverlayNetwork)
 	// Set log level
 	cmd = fmt.Sprintf("%s %s", cmd, klogVerbosityArg(debug))
```
15 changes: 8 additions & 7 deletions test/e2e/create_test.go
```diff
@@ -33,10 +33,6 @@ const (
 	// vmConfigurationTime is the maximum amount of time expected for a Windows VM to be fully configured and ready for WMCO
 	// after the hardware is provisioned.
 	vmConfigurationTime = 10 * time.Minute
-
-	machineApproverNamespace  = "openshift-cluster-machine-approver"
-	machineApproverDeployment = "machine-approver"
-	machineApproverPodSelector = "app=machine-approver"
 )
 
 func creationTestSuite(t *testing.T) {
@@ -339,9 +335,7 @@ func (tc *testContext) disableClusterMachineApprover() error {
 	// Scale the Cluster Machine Approver Deployment to 0
 	// This is required for testing BYOH CSR approval feature so that BYOH instances
 	// CSR's are not approved by Cluster Machine Approver
-	expectedPodCount := int32(0)
-	return tc.scaleDeployment(machineApproverNamespace, machineApproverDeployment, machineApproverPodSelector,
-		&expectedPodCount)
+	return tc.scaleMachineApprover(0)
 }
 
 // setPowerShellDefaultShell changes the instance backed by the given Machine to have a default SSH shell of PowerShell
@@ -591,3 +585,10 @@ func (tc *testContext) scaleDeployment(namespace, name, selector string, expecte
 	}
 	return nil
 }
+
+// scaleMachineApprover scales the machine-approver deployment to the given replica count
+func (tc *testContext) scaleMachineApprover(replicas int) error {
+	replicaCount := int32(replicas)
+	return tc.scaleDeployment("openshift-cluster-machine-approver", "machine-approver", "app=machine-approver",
+		&replicaCount)
+}
```
2 changes: 1 addition & 1 deletion test/e2e/network_test.go
```diff
@@ -500,7 +500,7 @@ func (tc *testContext) createWindowsServerDeployment(name string, command []stri
 	deploymentsClient := tc.client.K8s.AppsV1().Deployments(tc.workloadNamespace)
 	replicaCount := int32(1)
 	// affinity being nil is a hint that the caller does not care which nodes the pods are deployed to
-	if affinity == nil {
+	if affinity == nil && volumes == nil {
 		replicaCount = int32(3)
 	}
 	windowsServerImage := tc.getWindowsServerContainerImage()
```
46 changes: 34 additions & 12 deletions test/e2e/storage_test.go
```diff
@@ -2,6 +2,7 @@ package e2e
 
 import (
 	"context"
+	"fmt"
 	"log"
 	"testing"
 
@@ -10,6 +11,7 @@
 	"github.com/stretchr/testify/require"
 	core "k8s.io/api/core/v1"
 	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/labels"
 	"k8s.io/apimachinery/pkg/types"
 
 	"github.com/openshift/windows-machine-config-operator/pkg/metadata"
@@ -59,20 +61,9 @@ func testStorage(t *testing.T) {
 		}()
 	}
 	pvcVolumeSource := &core.PersistentVolumeClaimVolumeSource{ClaimName: pvc.GetName()}
-	selectedNode := &gc.allNodes()[0]
-	affinity, err := getAffinityForNode(selectedNode)
-	require.NoError(t, err)
-	if inTreeUpgrade {
-		patch, err := metadata.GenerateAddPatch(map[string]string{storageTestLabel: "true"}, nil)
-		require.NoError(t, err)
-		_, err = tc.client.K8s.CoreV1().Nodes().Patch(context.TODO(), selectedNode.GetName(), types.JSONPatchType, patch,
-			meta.PatchOptions{})
-		require.NoError(t, err, "error labeling node for upgrade test")
-	}
-
 	// The deployment will not come to ready if the volume is not able to be attached to the pod. If the deployment is
 	// successful, storage is working as expected.
-	winServerDeployment, err := tc.deployWindowsWebServer("win-webserver-storage-test", affinity, pvcVolumeSource)
+	winServerDeployment, err := tc.deployWindowsWebServer("win-webserver-storage-test", nil, pvcVolumeSource)
 	assert.NoError(t, err)
 	if err == nil && !skipWorkloadDeletion {
 		defer func() {
@@ -82,4 +73,35 @@
 			}
 		}()
 	}
+	if inTreeUpgrade {
+		err = tc.labelPodsNode(winServerDeployment.Spec.Selector.MatchLabels, map[string]string{storageTestLabel: "true"})
+		require.NoError(t, err)
+	}
 }
 
+// labelPodsNode labels the Node which has the pod with matchLabels scheduled to it. Throws an error if more than one pod
+// matches the labels.
+func (tc *testContext) labelPodsNode(matchLabels map[string]string, labelsToApply map[string]string) error {
+	if matchLabels == nil {
+		return fmt.Errorf("nil matchLabels")
+	}
+	podList, err := tc.client.K8s.CoreV1().Pods(tc.workloadNamespace).List(context.TODO(), meta.ListOptions{
+		LabelSelector: labels.Set(matchLabels).String()})
+	if err != nil {
+		return fmt.Errorf("error listing pods: %w", err)
+	}
+	if len(podList.Items) != 1 {
+		return fmt.Errorf("expected 1 matching pod, instead found %d: %v", len(podList.Items), podList.Items)
+	}
+	nodeName := podList.Items[0].Spec.NodeName
+	if nodeName == "" {
+		return fmt.Errorf("pod not scheduled to a Node")
+	}
+	patch, err := metadata.GenerateAddPatch(labelsToApply, nil)
+	if err != nil {
+		return err
+	}
+	_, err = tc.client.K8s.CoreV1().Nodes().Patch(context.TODO(), nodeName, types.JSONPatchType, patch,
+		meta.PatchOptions{})
+	return err
+}
```
6 changes: 6 additions & 0 deletions test/e2e/upgrade_test.go
```diff
@@ -209,6 +209,12 @@ func (tc *testContext) deployWindowsWorkloadAndTester() (func(), error) {
 func TestUpgrade(t *testing.T) {
 	tc, err := NewTestContext()
 	require.NoError(t, err)
+
+	// In the case that upgrading a Machine node requires the deletion of the VM, bootstrap CSRs will need to be approved.
+	// Ensure the machine approver is scaled up, as it may not be, depending on the order of tests run.
+	err = tc.scaleMachineApprover(1)
+	require.NoError(t, err)
+
 	err = tc.waitForConfiguredWindowsNodes(int32(numberOfMachineNodes), false, false)
 	require.NoError(t, err, "timed out waiting for Windows Machine nodes")
 	err = tc.waitForConfiguredWindowsNodes(int32(numberOfBYOHNodes), false, true)
```
6 changes: 2 additions & 4 deletions test/e2e/validation_test.go
```diff
@@ -766,10 +766,8 @@ func (tc *testContext) testCSRApproval(t *testing.T) {
 		}
 	}
 
-	// Scale the Cluster Machine Approver deployment back to 1.
-	expectedPodCount := int32(1)
-	err := tc.scaleDeployment(machineApproverNamespace, machineApproverDeployment, machineApproverPodSelector,
-		&expectedPodCount)
+	// Revert changes to the cluster machine approver
+	err := tc.scaleMachineApprover(1)
 	require.NoError(t, err, "failed to scale up Cluster Machine Approver pods")
```