@@ -0,0 +1,15 @@
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: "cluster"
  connection:
    proxyProtocol: "HTTPConnect"
Review comment (Member):

Just curious: any reason to choose HTTPConnect over gRPC, which should (theoretically) be faster? From the docs 👇

    # This controls the protocol between the API Server and the Konnectivity
    # server. Supported values are "GRPC" and "HTTPConnect". There is no
    # end user visible difference between the two modes. You need to set the
    # Konnectivity server to work in the same mode.
    proxyProtocol: GRPC

My guess is that it's fine to use gRPC? If so, we need to adjust --mode=grpc in konnectivity-server-pod.yaml.
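For reference, a minimal sketch of what the reviewer's suggestion would look like if both sides were switched together (hedged: the egress-selector value is "GRPC" per the quoted docs, and the proxy-server mode flag is assumed to spell it `grpc`):

```yaml
# egress-selector-config.yaml: both sides must agree on the mode.
egressSelections:
- name: "cluster"
  connection:
    proxyProtocol: "GRPC"          # was "HTTPConnect"
    transport:
      uds:
        udsName: "/etc/kubernetes/config/konnectivity-server.socket"
```

with the matching change in konnectivity-server-pod.yaml: `- --mode=grpc` instead of `- --mode=http-connect`.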

    transport:
      uds:
        udsName: "/etc/kubernetes/config/konnectivity-server.socket"
- name: "controlplane"
  connection:
    proxyProtocol: "Direct"
- name: "etcd"
  connection:
    proxyProtocol: "Direct"
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Secret
metadata:
  name: konnectivity-agent-certs
  namespace: openshift-bootstrap-konnectivity
  labels:
    app: konnectivity-agent
    openshift.io/bootstrap-only: "true"
type: Opaque
data:
  tls.crt: ${KONNECTIVITY_AGENT_CERT_BASE64}
  tls.key: ${KONNECTIVITY_AGENT_KEY_BASE64}
  ca.crt: ${KONNECTIVITY_CA_CERT_BASE64}
@@ -0,0 +1,58 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: konnectivity-agent
  namespace: openshift-bootstrap-konnectivity
  labels:
    app: konnectivity-agent
    openshift.io/bootstrap-only: "true"
spec:
  selector:
    matchLabels:
      app: konnectivity-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%
Comment on lines +13 to +16 — Review comment (Member):

nit: these pods only run during bootstrap and will "never?" get updated, so we can just ignore this setting, right 🤔?

Besides, 10% of 3 control-plane nodes is ~1 node, so this is equivalent to maxUnavailable: 1, which is already the default that k8s sets (according to the docs).

  template:
    metadata:
      labels:
        app: konnectivity-agent
    spec:
      hostNetwork: true
      dnsPolicy: Default
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
      containers:
      - name: konnectivity-agent
        image: ${KONNECTIVITY_IMAGE}
        command:
        - /usr/bin/proxy-agent
Comment on lines +27 to +31 — Review comment (Member):

nit: we should give this agent container a resource request so that it won't be the first to get evicted if the node is under pressure (theoretically).

As a reference, Hypershift sets the following values 👀

        args:
        - --logtostderr=true
        - --ca-cert=/etc/konnectivity/ca.crt
        - --agent-cert=/etc/konnectivity/tls.crt
        - --agent-key=/etc/konnectivity/tls.key
        - --proxy-server-host=${BOOTSTRAP_NODE_IP}
        - --proxy-server-port=8091
        - --health-server-port=2041
        - --agent-identifiers=default-route=true
        - --keepalive-time=30s
        - --probe-interval=5s
        - --sync-interval=5s
        - --sync-interval-cap=30s
        livenessProbe:
          httpGet:
            path: /healthz
            port: 2041
          initialDelaySeconds: 10
          periodSeconds: 10
        volumeMounts:
        - name: konnectivity-certs
          mountPath: /etc/konnectivity
          readOnly: true
      volumes:
      - name: konnectivity-certs
        secret:
          secretName: konnectivity-agent-certs
@@ -0,0 +1,5 @@
apiVersion: kubecontrolplane.config.openshift.io/v1
kind: KubeAPIServerConfig
apiServerArguments:
  egress-selector-config-file:
  - "/etc/kubernetes/config/egress-selector-config.yaml"
@@ -0,0 +1,6 @@
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-bootstrap-konnectivity
  labels:
    openshift.io/bootstrap-only: "true"
@@ -0,0 +1,49 @@
apiVersion: v1
kind: Pod
metadata:
  name: konnectivity-server
  namespace: kube-system
  labels:
    app: konnectivity-server
spec:
  hostNetwork: true
  priorityClassName: system-node-critical
  containers:
  - name: konnectivity-server
    image: ${KONNECTIVITY_IMAGE}
    command:
    - /usr/bin/proxy-server
Comment on lines +12 to +15 — Review comment (Member):

nit: we should give this server container a resource request so that it won't be the first to get evicted if the node is under pressure (theoretically).

As a reference, Hypershift sets the following values 👀

    args:
    - --logtostderr=true
    - --cluster-cert=/etc/konnectivity/server.crt
    - --cluster-key=/etc/konnectivity/server.key
    - --cluster-ca-cert=/etc/konnectivity/ca.crt
    - --uds-name=/etc/kubernetes/bootstrap-configs/konnectivity-server.socket
    - --server-port=0
    - --agent-port=8091
    - --health-port=2041
    - --mode=http-connect
    - --proxy-strategies=destHost,defaultRoute
    - --keepalive-time=30s
    - --frontend-keepalive-time=30s
    livenessProbe:
      httpGet:
        path: /healthz
        port: 2041
      initialDelaySeconds: 10
      periodSeconds: 10
    volumeMounts:
    - name: config-dir
      mountPath: /etc/kubernetes/bootstrap-configs
    - name: konnectivity-certs
      mountPath: /etc/konnectivity
      readOnly: true
  volumes:
  - name: config-dir
    hostPath:
      path: /etc/kubernetes/bootstrap-configs
      type: DirectoryOrCreate
  - name: konnectivity-certs
    hostPath:
      path: /opt/openshift/tls/konnectivity
      type: Directory
13 changes: 12 additions & 1 deletion data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
@@ -10,6 +10,8 @@ set -euoE pipefail ## -E option will cause functions to inherit trap
. /usr/local/bin/bootstrap-cluster-gather.sh
# shellcheck source=bootstrap-verify-api-server-urls.sh
. /usr/local/bin/bootstrap-verify-api-server-urls.sh
# shellcheck source=konnectivity.sh.template
. /usr/local/bin/konnectivity.sh

mkdir --parents /etc/kubernetes/{manifests,bootstrap-configs,bootstrap-manifests}

@@ -245,6 +247,8 @@ then
record_service_stage_success
fi

konnectivity_setup

if [ ! -f kube-apiserver-bootstrap.done ]
then
record_service_stage_start "kube-apiserver-bootstrap"
@@ -269,9 +273,12 @@ then
    --infra-config-file=/assets/manifests/cluster-infrastructure-02-config.yml \
    --rendered-manifest-files=/assets/manifests \
    --payload-version=$VERSION \
-   --operand-kubernetes-version="${KUBERNETES_VERSION}"
+   --operand-kubernetes-version="${KUBERNETES_VERSION}" \
+   --config-override-files=/assets/konnectivity-config-override.yaml

cp kube-apiserver-bootstrap/config /etc/kubernetes/bootstrap-configs/kube-apiserver-config.yaml
# Copy egress selector config to bootstrap-configs where KAS can read it
cp /opt/openshift/egress-selector-config.yaml /etc/kubernetes/bootstrap-configs/egress-selector-config.yaml
cp kube-apiserver-bootstrap/bootstrap-manifests/* bootstrap-manifests/
cp kube-apiserver-bootstrap/manifests/* manifests/

@@ -566,6 +573,8 @@ then
record_service_stage_success
fi

konnectivity_manifests

REQUIRED_PODS="openshift-kube-apiserver/kube-apiserver,openshift-kube-scheduler/openshift-kube-scheduler,openshift-kube-controller-manager/kube-controller-manager,openshift-cluster-version/cluster-version-operator"
if [ "$BOOTSTRAP_INPLACE" = true ]
then
@@ -651,6 +660,8 @@ if [ ! -f api-int-dns-check.done ]; then
fi
fi

konnectivity_cleanup

# Workaround for https://github.com/opencontainers/runc/pull/1807
touch /opt/openshift/.bootkube.done
echo "bootkube.service complete"
55 changes: 55 additions & 0 deletions data/data/bootstrap/files/usr/local/bin/konnectivity-certs.sh
@@ -0,0 +1,55 @@
#!/usr/bin/env bash
set -euo pipefail

# Generate Konnectivity certificates with a self-signed CA (1-day validity).
# These are needed for mTLS between the Konnectivity server and agents
# during the bootstrap phase.
#
# Usage: konnectivity-certs.sh <bootstrap-node-ip>

BOOTSTRAP_NODE_IP="${1:?Usage: konnectivity-certs.sh <bootstrap-node-ip>}"

KONNECTIVITY_CERT_DIR=/opt/openshift/tls/konnectivity
mkdir -p "${KONNECTIVITY_CERT_DIR}"

echo "Generating Konnectivity certificates in ${KONNECTIVITY_CERT_DIR}..."

# Generate self-signed Konnectivity CA
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "${KONNECTIVITY_CERT_DIR}/ca.key" \
  -out "${KONNECTIVITY_CERT_DIR}/ca.crt" \
  -days 1 \
  -subj "/CN=konnectivity-signer/O=openshift"

# Server certificate for agent endpoint (needs bootstrap IP as SAN)
openssl req -new -newkey rsa:2048 -nodes \
  -keyout "${KONNECTIVITY_CERT_DIR}/server.key" \
  -out "${KONNECTIVITY_CERT_DIR}/server.csr" \
  -subj "/CN=konnectivity-server/O=openshift"

openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/server.csr" \
  -CA "${KONNECTIVITY_CERT_DIR}/ca.crt" \
  -CAkey "${KONNECTIVITY_CERT_DIR}/ca.key" \
  -CAcreateserial \
  -out "${KONNECTIVITY_CERT_DIR}/server.crt" \
  -days 1 \
  -extfile <(printf "extendedKeyUsage=serverAuth\nsubjectAltName=IP:%s" "${BOOTSTRAP_NODE_IP}")

# Agent client certificate (shared by all agents)
openssl req -new -newkey rsa:2048 -nodes \
  -keyout "${KONNECTIVITY_CERT_DIR}/agent.key" \
  -out "${KONNECTIVITY_CERT_DIR}/agent.csr" \
  -subj "/CN=konnectivity-agent/O=openshift"

openssl x509 -req -in "${KONNECTIVITY_CERT_DIR}/agent.csr" \
  -CA "${KONNECTIVITY_CERT_DIR}/ca.crt" \
  -CAkey "${KONNECTIVITY_CERT_DIR}/ca.key" \
  -CAcreateserial \
  -out "${KONNECTIVITY_CERT_DIR}/agent.crt" \
  -days 1 \
  -extfile <(printf "extendedKeyUsage=clientAuth")

# Clean up CSR files
rm -f "${KONNECTIVITY_CERT_DIR}"/*.csr

echo "Konnectivity certificates generated successfully."
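The resulting chain can be sanity-checked with openssl. A sketch of the checks one might run — here the same CA/server shape is recreated in a scratch directory with a made-up bootstrap IP (192.0.2.10), rather than touching the real `${KONNECTIVITY_CERT_DIR}`:

```shell
#!/usr/bin/env bash
# bash-only: uses process substitution, like the original script.
dir=$(mktemp -d)
# Throwaway CA and server cert, mirroring konnectivity-certs.sh.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$dir/ca.key" -out "$dir/ca.crt" \
  -subj "/CN=konnectivity-signer/O=openshift" 2>/dev/null
openssl req -new -newkey rsa:2048 -nodes \
  -keyout "$dir/server.key" -out "$dir/server.csr" \
  -subj "/CN=konnectivity-server/O=openshift" 2>/dev/null
openssl x509 -req -in "$dir/server.csr" -CA "$dir/ca.crt" -CAkey "$dir/ca.key" \
  -CAcreateserial -out "$dir/server.crt" -days 1 \
  -extfile <(printf "extendedKeyUsage=serverAuth\nsubjectAltName=IP:192.0.2.10") 2>/dev/null
# The server cert must chain to the CA the agents trust...
verify_out=$(openssl verify -CAfile "$dir/ca.crt" "$dir/server.crt")
# ...and must carry the bootstrap IP as a SAN, or agents will reject it.
san_out=$(openssl x509 -in "$dir/server.crt" -noout -ext subjectAltName)
echo "$verify_out"
echo "$san_out"
rm -rf "$dir"
```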
71 changes: 71 additions & 0 deletions data/data/bootstrap/files/usr/local/bin/konnectivity.sh.template
@@ -0,0 +1,71 @@
#!/usr/bin/env bash
# Konnectivity bootstrap functions.
# Sourced by bootkube.sh — do not execute directly.

# konnectivity_setup detects the bootstrap node IP, generates certificates,
# and creates the konnectivity server static pod manifest.
konnectivity_setup() {
  # Detect bootstrap node IP at runtime using the default route source address.
  # Konnectivity agents use this to connect back to the bootstrap server.
{{- if .UseIPv6ForNodeIP }}
  BOOTSTRAP_NODE_IP=$(ip -6 -j route get 2001:4860:4860::8888 | jq -r '.[0].prefsrc')
{{- else }}
  BOOTSTRAP_NODE_IP=$(ip -j route get 1.1.1.1 | jq -r '.[0].prefsrc')
{{- end }}
Comment on lines +10 to +14 — Review comment (@tthvo, Member, Mar 14, 2026):

We should also honour the field .BootstrapNodeIP if set via the environment variable OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP, right?

Tracing back to the commit, it may be necessary for the assisted installer 🤔?

    bootstrapNodeIP := os.Getenv("OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP")
    if bootstrapNodeIP != "" && net.ParseIP(bootstrapNodeIP) == nil {
        logrus.Warnf("OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP must have valid ip address, given %s. Skipping it", bootstrapNodeIP)
        bootstrapNodeIP = ""
    }

    BootstrapNodeIP string
Review comment (Member):

Maybe we can do something like 👇 WDYT?

    {{- if .BootstrapNodeIP }}
      # Use explicitly configured bootstrap node IP
      BOOTSTRAP_NODE_IP="{{.BootstrapNodeIP}}"
      echo "Using configured bootstrap node IP: ${BOOTSTRAP_NODE_IP}"
    {{- else }}
      # Detect bootstrap node IP at runtime using the default route source address.
      # Konnectivity agents use this to connect back to the bootstrap server.
      {{- if .UseIPv6ForNodeIP }}
      BOOTSTRAP_NODE_IP=$(ip -6 -j route get 2001:4860:4860::8888 | jq -r '.[0].prefsrc')
      {{- else }}
      BOOTSTRAP_NODE_IP=$(ip -j route get 1.1.1.1 | jq -r '.[0].prefsrc')
      {{- end }}
      echo "Detected bootstrap node IP: ${BOOTSTRAP_NODE_IP}"
    {{- end }}
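For illustration, the detection one-liner just pulls `prefsrc` out of the JSON that `ip -j route get` emits. A quick sketch against canned output (fields trimmed, addresses made up — no packets are sent by a real `route get` either; it is a kernel route lookup):

```shell
# Canned output resembling `ip -j route get 1.1.1.1`; jq extracts the
# preferred source address the kernel would use for that destination.
json='[{"dst":"1.1.1.1","gateway":"10.0.0.1","dev":"eth0","prefsrc":"10.0.0.5"}]'
prefsrc=$(echo "$json" | jq -r '.[0].prefsrc')
echo "Detected bootstrap node IP: ${prefsrc}"   # → 10.0.0.5
```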

  echo "Detected bootstrap node IP: ${BOOTSTRAP_NODE_IP}"

  if [ ! -f konnectivity-certs.done ]; then
    record_service_stage_start "konnectivity-certs"
    /usr/local/bin/konnectivity-certs.sh "${BOOTSTRAP_NODE_IP}"
    touch konnectivity-certs.done
    record_service_stage_success
  fi

  if [ ! -f konnectivity-server-bootstrap.done ]; then
    record_service_stage_start "konnectivity-server-bootstrap"
    echo "Creating Konnectivity server static pod manifest..."
    export KONNECTIVITY_IMAGE=$(image_for apiserver-network-proxy)
    envsubst < /opt/openshift/konnectivity-server-pod.yaml > /etc/kubernetes/manifests/konnectivity-server-pod.yaml
    touch konnectivity-server-bootstrap.done
    record_service_stage_success
  fi
}

# konnectivity_manifests creates the agent namespace, secret, and daemonset
# manifests for cluster deployment.
konnectivity_manifests() {
  if [ ! -f konnectivity-agent-manifest.done ]; then
    record_service_stage_start "konnectivity-agent-manifest"
    echo "Creating Konnectivity agent manifests..."

    KONNECTIVITY_CERT_DIR=/opt/openshift/tls/konnectivity

    cp /opt/openshift/konnectivity-namespace.yaml manifests/konnectivity-namespace.yaml

    export KONNECTIVITY_AGENT_CERT_BASE64=$(base64 -w0 "${KONNECTIVITY_CERT_DIR}/agent.crt")
    export KONNECTIVITY_AGENT_KEY_BASE64=$(base64 -w0 "${KONNECTIVITY_CERT_DIR}/agent.key")
    export KONNECTIVITY_CA_CERT_BASE64=$(base64 -w0 "${KONNECTIVITY_CERT_DIR}/ca.crt")
    envsubst < /opt/openshift/konnectivity-agent-certs-secret.yaml > manifests/konnectivity-agent-certs.yaml

    export BOOTSTRAP_NODE_IP
    envsubst < /opt/openshift/konnectivity-agent-daemonset.yaml > manifests/konnectivity-agent-daemonset.yaml

    touch konnectivity-agent-manifest.done
    record_service_stage_success
  fi
}
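The Secret's `data:` values are simply the PEM files base64-encoded without line wrapping. A small illustration of that encode/decode round trip (dummy bytes, not a real certificate):

```shell
# `-w0` (GNU coreutils base64) disables line wrapping, which is the form
# Kubernetes Secret `data:` fields expect.
encoded=$(printf 'dummy-cert' | base64 -w0)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$encoded"   # → ZHVtbXktY2VydA==
echo "$decoded"   # → dummy-cert
```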

# konnectivity_cleanup removes bootstrap konnectivity resources by deleting
# the namespace (cascading to DaemonSet and Secret) and the server static pod.
konnectivity_cleanup() {
  if [ ! -f konnectivity-cleanup.done ]; then
    record_service_stage_start "konnectivity-cleanup"
    echo "Cleaning up bootstrap konnectivity resources..."
    oc delete namespace openshift-bootstrap-konnectivity \
      --kubeconfig=/opt/openshift/auth/kubeconfig \
      --ignore-not-found=true || true
Comment on lines +64 to +66 — Review comment (Member):

Suggested change:

    -    oc delete namespace openshift-bootstrap-konnectivity \
    -      --kubeconfig=/opt/openshift/auth/kubeconfig \
    -      --ignore-not-found=true || true
    +    oc delete namespace openshift-bootstrap-konnectivity \
    +      --kubeconfig=/opt/openshift/auth/kubeconfig \
    +      --ignore-not-found=true

I guess we should fail if the cleanup somehow fails (except for not-found), right? Otherwise, resources will be left behind and can potentially "break" the final OpenShift cluster?

    rm -f /etc/kubernetes/manifests/konnectivity-server-pod.yaml
    touch konnectivity-cleanup.done
    record_service_stage_success
  fi
}
7 changes: 7 additions & 0 deletions pkg/asset/manifests/aws/cluster.go
@@ -141,6 +141,13 @@ func GenerateClusterAssets(ic *installconfig.InstallConfig, clusterID *installco
			ToPort:                   10259,
			SourceSecurityGroupRoles: []capa.SecurityGroupRole{"controlplane", "node"},
		},
		{
			Description:              "Konnectivity agent traffic from cluster nodes",
			Protocol:                 capa.SecurityGroupProtocolTCP,
			FromPort:                 8091,
			ToPort:                   8091,
			SourceSecurityGroupRoles: []capa.SecurityGroupRole{"controlplane", "node"},
		},
		{
Review comment (Member):

We need to remove this rule when destroying the bootstrap node, right? This probably means patching the AWSCluster CR and waiting for the rule to disappear...

💡 Another idea: since this is scoped to only the bootstrap node, the installer could pre-create a security group specifically for bootstrap with this rule. That SG could be attached via AdditionalSecurityGroups.

Review comment (Member):

Whoops, I just saw #10344 (comment), so we do need to clean up the rule :D

			Description: BootstrapSSHDescription,
			Protocol:    capa.SecurityGroupProtocolTCP,
11 changes: 11 additions & 0 deletions pkg/asset/manifests/azure/cluster.go
@@ -93,6 +93,17 @@ func GenerateClusterAssets(installConfig *installconfig.InstallConfig, clusterID
			Destination:      ptr.To("*"),
			Action:           capz.SecurityRuleActionAllow,
		},
		{
			Name:             "konnectivity_in",
			Protocol:         capz.SecurityGroupProtocolTCP,
			Direction:        capz.SecurityRuleDirectionInbound,
			Priority:         103,
			SourcePorts:      ptr.To("*"),
			DestinationPorts: ptr.To("8091"),
			Source:           ptr.To(source),
			Destination:      ptr.To("*"),
			Action:           capz.SecurityRuleActionAllow,
		},
		{
			Name:     fmt.Sprintf("%s_ssh_in", clusterID.InfraID),
			Protocol: capz.SecurityGroupProtocolTCP,
18 changes: 18 additions & 0 deletions pkg/asset/manifests/ibmcloud/securitygroups.go
@@ -421,6 +421,24 @@ func buildControlPlaneSecurityGroup(infraID string) capibmcloud.VPCSecurityGroup
					},
				},
			},
			{
				// Konnectivity
				Action:    capibmcloud.VPCSecurityGroupRuleActionAllow,
				Direction: capibmcloud.VPCSecurityGroupRuleDirectionInbound,
				Source: &capibmcloud.VPCSecurityGroupRulePrototype{
					PortRange: &capibmcloud.VPCSecurityGroupPortRange{
						MaximumPort: 8091,
						MinimumPort: 8091,
					},
					Protocol: capibmcloud.VPCSecurityGroupRuleProtocolTCP,
					Remotes: []capibmcloud.VPCSecurityGroupRuleRemote{
						{
							RemoteType:        capibmcloud.VPCSecurityGroupRuleRemoteTypeSG,
							SecurityGroupName: clusterWideSGNamePtr,
						},
					},
				},
			},
		},
	}
}