diff --git a/modules/telco-core-about-the-telco-core-cluster-use-model.adoc b/modules/telco-core-about-the-telco-core-cluster-use-model.adoc index 0164f253dd98..43311b35539a 100644 --- a/modules/telco-core-about-the-telco-core-cluster-use-model.adoc +++ b/modules/telco-core-about-the-telco-core-cluster-use-model.adoc @@ -6,20 +6,19 @@ [id="telco-core-about-the-telco-core-cluster-use-model_{context}"] = About the telco core cluster use model -The telco core cluster use model is designed for clusters that run on commodity hardware. +The telco core cluster use model is designed for clusters running on commodity hardware. Telco core clusters support large scale telco applications including control plane functions like signaling, aggregation, session border controller (SBC), and centralized data plane functions such as 5G user plane functions (UPF). Telco core cluster functions require scalability, complex networking support, resilient software-defined storage, and support performance requirements that are less stringent and constrained than far-edge RAN deployments. Networking requirements for telco core functions vary widely across a range of networking features and performance points. IPv6 is a requirement and dual-stack is common. Some functions need maximum throughput and transaction rate and require support for user-plane DPDK networking. -Other functions use more typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing. +Other functions use typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing. Telco core clusters are configured as standard with three control plane and one or more worker nodes configured with the stock (non-RT) kernel. In support of workloads with varying networking and performance requirements, you can segment worker nodes by using `MachineConfigPool` custom resources (CR), for example, for non-user data plane or high-throughput use cases. 
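The worker-node segmentation described above can be illustrated with a `MachineConfigPool` CR. This is a sketch only; the pool name and role label (`worker-dpdk`) are hypothetical examples, not part of the reference configuration:

[source,yaml]
----
# Example only: a hypothetical MachineConfigPool that groups high-throughput
# user data plane workers so they can be tuned separately from other workers.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-dpdk
spec:
  machineConfigSelector:
    matchExpressions:
      # Nodes in this pool receive MachineConfigs for both roles
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-dpdk]
  nodeSelector:
    matchLabels:
      # Illustrative custom role label applied to the segmented nodes
      node-role.kubernetes.io/worker-dpdk: ""
----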
In support of required telco operational features, core clusters have a standard set of Day 2 OLM-managed Operators installed. - .Telco core RDS cluster service-based architecture and networking topology image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology] diff --git a/modules/telco-core-additional-storage-solutions.adoc b/modules/telco-core-additional-storage-solutions.adoc index 2cd3aa02a04a..f2fb2dbc2798 100644 --- a/modules/telco-core-additional-storage-solutions.adoc +++ b/modules/telco-core-additional-storage-solutions.adoc @@ -4,6 +4,7 @@ :_mod-docs-content-type: REFERENCE [id="telco-core-additional-storage-solutions_{context}"] + = Additional storage solutions You can use other storage solutions to provide persistent storage for telco core clusters. The configuration and integration of these solutions is outside the scope of the reference design specifications (RDS). diff --git a/modules/telco-core-agent-based-installer.adoc b/modules/telco-core-agent-based-installer.adoc index 633a954f361f..aaf34067c376 100644 --- a/modules/telco-core-agent-based-installer.adoc +++ b/modules/telco-core-agent-based-installer.adoc @@ -3,6 +3,7 @@ // * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc :_mod-docs-content-type: REFERENCE + [id="telco-core-agent-based-installer_{context}"] = Agent-based Installer @@ -10,24 +11,21 @@ New in this release:: * No reference design updates in this release. Description:: +The recommended method for telco core cluster installation is to use Red Hat Advanced Cluster Management. +The Agent-based Installer (ABI) is a separate installation flow for {product-title} in environments without existing infrastructure for running cluster deployments.
+Use the ABI to install {product-title} on bare-metal servers without requiring additional servers or VMs for managing the installation; however, the ABI does not provide ongoing lifecycle management, monitoring, or automation. +The ABI can be run on any system, for example a laptop, to generate an ISO installation image. +The ISO is used as the installation media for the cluster control plane nodes. +You can monitor installation progress by using the ABI from any system with network connectivity to the control plane nodes' API interfaces. + --- -Telco core clusters can be installed using the Agent-based Installer. -This method allows you to install {product-title} on bare-metal servers without requiring additional servers or VMs for managing the installation. -The Agent-based Installer can be run on any system (for example, from a laptop) to generate an ISO installation image. -The ISO is used as the installation media for the cluster supervisor nodes. -Progress can be monitored using the Agent-based Installer from any system with network connectivity to the supervisor node's API interfaces. - -Agent-based Installer supports the following: +ABI supports the following: -* Installation from declarative CRs. -* Installation in disconnected environments. -* Installation without the use of additional servers to support installation, for example, the bastion node. --- +* Installation from declarative CRs +* Installation in disconnected environments +* No additional servers required to support installation, for example, the bastion node is no longer needed Limits and requirements:: * Disconnected installation requires a registry with all required content mirrored and reachable from the installed host. Engineering considerations:: -* Networking configuration should be applied as NMState configuration during installation. Day 2 networking configuration using the NMState Operator is not supported.
- +* Networking configuration should be applied as NMState configuration during installation as opposed to Day 2 configuration using the NMState Operator. \ No newline at end of file diff --git a/modules/telco-core-application-workloads.adoc b/modules/telco-core-application-workloads.adoc index a2bbfd301a17..af2fe90d0c68 100644 --- a/modules/telco-core-application-workloads.adoc +++ b/modules/telco-core-application-workloads.adoc @@ -13,14 +13,16 @@ Typically, pods that run high performance or latency sensitive CNFs by using use When creating pod configurations that require exclusive CPUs, be aware of the potential implications of hyper-threaded systems. Pods should request multiples of 2 CPUs when the entire core (2 hyper-threads) must be allocated to the pod. + Pods running network functions that do not require high throughput or low latency networking should be scheduled with best-effort or burstable QoS pods and do not require dedicated or isolated CPU cores. Engineering considerations:: -+ --- -Use the following information to plan telco core workloads and cluster resources: -* As of {product-title} 4.19, cgroup v1 is no longer supported and has been removed. All workloads must now be compatible with cgroup v2. For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads]. +Plan telco core workloads and cluster resources by using the following information: + +* As of {product-title} 4.19, `cgroup v1` is no longer supported and has been removed. +All workloads must now be compatible with `cgroup v2`. +For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads]. 
* CNF applications should conform to the latest version of https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes]. * Use a mix of best-effort and burstable QoS pods as required by your applications. ** Use guaranteed QoS pods with proper configuration of reserved or isolated CPUs in the `PerformanceProfile` CR that configures the node. @@ -28,11 +30,8 @@ Use the following information to plan telco core workloads and cluster resources ** Best effort and burstable pods are not guaranteed exclusive CPU use. Workloads can be preempted by other workloads, operating system daemons, or kernel tasks. * Use exec probes sparingly and only when no other suitable option is available. -** Do not use exec probes if a CNF uses CPU pinning. -Use other probe implementations, for example, `httpGet` or `tcpSocket`. -** When you need to use exec probes, limit the exec probe frequency and quantity. -The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds. -** You can use startup probes, because they do not use significant resources at steady-state operation. -The limitation on exec probes applies primarily to liveness and readiness probes. -Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking. --- \ No newline at end of file +** Do not use exec probes if a CNF uses CPU pinning. Use other probe implementations, for example, `httpGet` or `tcpSocket`. +** When you need to use exec probes, limit the exec probe frequency and quantity. The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds. +** You can use startup probes, because they do not use significant resources at steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes. 
Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking. +* Use pre-stop hooks to allow the application workload to perform required actions before pod disruption, such as during an upgrade or node maintenance. The hooks enable a pod to save state to persistent storage, offload traffic from a service, or signal other pods. + diff --git a/modules/telco-core-cluster-common-use-model-engineering-considerations.adoc b/modules/telco-core-cluster-common-use-model-engineering-considerations.adoc index 02b8c1e53350..fcf831587d29 100644 --- a/modules/telco-core-cluster-common-use-model-engineering-considerations.adoc +++ b/modules/telco-core-cluster-common-use-model-engineering-considerations.adoc @@ -8,9 +8,10 @@ * Cluster workloads are detailed in "Application workloads". * Worker nodes should run on either of the following CPUs: -** Intel 3rd Generation Xeon (IceLake) CPUs or newer when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off. -Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled. When Skylake and older CPUs change power states, this can cause latency. -** AMD EPYC Zen 4 CPUs (Genoa, Bergamo). +** Intel 3rd Generation Xeon (IceLake) CPUs or newer when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off. +Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled. +** AMD EPYC Zen 4 CPUs (Genoa, Bergamo) or AMD EPYC Zen 5 CPUs (Turin) when supported by {product-title}. +** Intel Sierra Forest CPUs when supported by {product-title}. ** IRQ balancing is enabled on worker nodes. The `PerformanceProfile` CR sets the `globallyDisableIrqLoadBalancing` parameter to a value of `false`.
Guaranteed QoS pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning". @@ -23,6 +24,7 @@ Guaranteed QoS pods are annotated to ensure isolation as described in "CPU parti * The balance between power management and maximum performance varies between machine config pools in the cluster. The following configurations should be consistent for all nodes in a machine config pools group. + ** Cluster scaling. See "Scalability" for more information. ** Clusters should be able to scale to at least 120 nodes. @@ -35,7 +37,6 @@ For a cluster configured according to the reference configuration running a simu ** The NICs used for non-DPDK network traffic should be configured to use at most 32 RX/TX queues. ** Nodes with large numbers of pods or other resources might require additional reserved CPUs. The remaining CPUs are available for user workloads. - + [NOTE] ==== diff --git a/modules/telco-core-cluster-network-operator.adoc b/modules/telco-core-cluster-network-operator.adoc index 9db0746f909c..ffb5f4a65903 100644 --- a/modules/telco-core-cluster-network-operator.adoc +++ b/modules/telco-core-cluster-network-operator.adoc @@ -7,18 +7,20 @@ = Cluster Network Operator New in this release:: -* No reference design updates in this release + +* No reference design updates in this release Description:: -+ --- -The Cluster Network Operator (CNO) deploys and manages the cluster network components including the default OVN-Kubernetes network plugin during cluster installation. The CNO allows for configuring primary interface MTU settings, OVN gateway configurations to use node routing tables for pod egress, and additional secondary networks such as MACVLAN. +The Cluster Network Operator (CNO) deploys and manages the cluster network components including the default OVN-Kubernetes network plugin during cluster installation. 
+The CNO allows configuration of primary interface MTU settings, OVN gateway modes to use node routing tables for pod egress, and additional secondary networks such as MACVLAN. ++ In support of network traffic separation, multiple network interfaces are configured through the CNO. -Traffic steering to these interfaces is configured through static routes applied by using the NMState Operator. To ensure that pod traffic is properly routed, OVN-K is configured with the `routingViaHost` option enabled. This setting uses the kernel routing table and the applied static routes rather than OVN for pod egress traffic. - +Traffic steering to these interfaces is configured through static routes applied by using the NMState Operator. +To ensure that pod traffic is properly routed, OVN-K is configured with the `routingViaHost` option enabled. +This setting uses the kernel routing table and the applied static routes rather than OVN for pod egress traffic. ++ The Whereabouts CNI plugin is used to provide dynamic IPv4 and IPv6 addressing for additional pod network interfaces without the use of a DHCP server. --- Limits and requirements:: * OVN-Kubernetes is required for IPv6 support. @@ -29,15 +31,23 @@ MTU size up to 8900 is supported. This handler allows a third-party module to process incoming packets before the host processes them, and only one such handler can be registered per network interface. Since both MACVLAN and IPVLAN need to register their own `rx_handler` to function, they conflict and cannot coexist on the same interface. 
Review the source code for more details: + ** https://elixir.bootlin.com/linux/v6.10.2/source/drivers/net/ipvlan/ipvlan_main.c#L82[linux/v6.10.2/source/drivers/net/ipvlan/ipvlan_main.c#L82] ** https://elixir.bootlin.com/linux/v6.10.2/source/drivers/net/macvlan.c#L1260[linux/v6.10.2/source/drivers/net/macvlan.c#L1260] + * Alternative NIC configurations include splitting the shared NIC into multiple NICs or using a single dual-port NIC, though they have not been tested and validated. * Clusters with single-stack IP configuration are not validated. -* The `reachabilityTotalTimeoutSeconds` parameter in the `Network` CR configures the `EgressIP` node reachability check total timeout in seconds. -The recommended value is `1` second. +* EgressIP +** EgressIP failover time depends on the `reachabilityTotalTimeoutSeconds` parameter in the `Network` CR. +This parameter determines the frequency of probes used to detect when the selected egress node is unreachable. +The recommended value of this parameter is `1` second. +** When EgressIP is configured with multiple egress nodes, the failover time is expected to be on the order of seconds or longer. +** On nodes with additional network interfaces, EgressIP traffic egresses through the interface on which the EgressIP address is assigned. +See "Configuring an egress IP address" for more information. * Pod-level SR-IOV bonding mode must be set to `active-backup` and a value in `miimon` must be set (`100` is recommended). Engineering considerations:: + -* Pod egress traffic is handled by kernel routing table using the `routingViaHost` option. +* Pod egress traffic is managed by the kernel routing table by using the `routingViaHost` option. Appropriate static routes must be configured in the host.
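The `routingViaHost` behavior described above is set on the cluster `Network` operator CR. The following is a minimal sketch of only the relevant fields, shown for illustration rather than as a complete reference configuration:

[source,yaml]
----
# Sketch: enable routingViaHost so pod egress traffic uses the kernel
# routing table and applied static routes instead of OVN.
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      gatewayConfig:
        routingViaHost: true
----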
diff --git a/modules/telco-core-common-baseline-model.adoc b/modules/telco-core-common-baseline-model.adoc index 21480d1d6960..72ce348ddf6e 100644 --- a/modules/telco-core-common-baseline-model.adoc +++ b/modules/telco-core-common-baseline-model.adoc @@ -10,20 +10,28 @@ The following configurations and use models are applicable to all telco core use The telco core use cases build on this common baseline of features. Cluster topology:: -Telco core clusters conform to the following requirements: -* High availability control plane (three or more control plane nodes) -* Non-schedulable control plane nodes -* Multiple machine config pools +The telco core reference design supports two distinct cluster configuration variants: + +* A non-schedulable control plane variant, where user workloads are strictly prohibited from running on control plane nodes. + +* A schedulable control plane variant, which allows user workloads to run on control plane nodes to optimize resource utilization. This variant applies only to bare-metal control plane nodes and must be configured at installation time. ++ +All clusters, regardless of the variant, must conform to the following requirements: + +* A highly available control plane consisting of three or more nodes. + +* The use of multiple machine config pools. Storage:: -Telco core use cases require persistent storage as provided by {rh-storage}. +Telco core use cases require highly available persistent storage as provided by an external storage solution. +{rh-storage} might be used to manage access to the external storage. Networking:: Telco core cluster networking conforms to the following requirements: * Dual stack IPv4/IPv6 (IPv4 primary). -* Fully disconnected – clusters do not have access to public networking at any point in their lifecycle. +* Fully disconnected - clusters do not have access to public networking at any point in their lifecycle. * Supports multiple networks.
Segmented networking provides isolation between operations, administration and maintenance (OAM), signaling, and storage traffic. * Cluster network type is OVN-Kubernetes as required for IPv6 support. @@ -43,6 +51,5 @@ User plane networking runs in cloud-native network functions (CNFs). Service Mesh:: Telco CNFs can use Service Mesh. -All telco core clusters require a Service Mesh implementation. +Telco core clusters typically include a Service Mesh implementation. The choice of implementation and configuration is outside the scope of this specification. - diff --git a/modules/telco-core-cpu-partitioning-and-performance-tuning.adoc b/modules/telco-core-cpu-partitioning-and-performance-tuning.adoc index b9eda4b6a7df..6334284fa3c7 100644 --- a/modules/telco-core-cpu-partitioning-and-performance-tuning.adoc +++ b/modules/telco-core-cpu-partitioning-and-performance-tuning.adoc @@ -7,7 +7,10 @@ = CPU partitioning and performance tuning New in this release:: -* No reference design updates in this release. +* Disable RPS - resource use for pod networking should be accounted for on application CPUs +* Better isolation of control plane on schedulable control-plane nodes +* Support for schedulable control-plane in the NUMA Resources Operator +* Additional guidance on upgrade for Telco Core clusters Description:: CPU partitioning improves performance and reduces latency by separating sensitive workloads from general-purpose tasks, interrupts, and driver work queues. @@ -21,17 +24,21 @@ Limits and requirements:: * The set of reserved and isolated cores must include all CPU cores. * Core 0 of each NUMA node must be included in the reserved CPU set. * Low latency workloads require special configuration to avoid being affected by interrupts, kernel scheduler, or other parts of the platform. + For more information, see "Creating a performance profile". Engineering considerations:: -* As of {product-title} 4.19, `cgroup v1` is no longer supported and has been removed. 
All workloads must now be compatible with `cgroup v2`. For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads](Red Hat Knowledgebase). -* The minimum reserved capacity (`systemReserved`) required can be found by following the guidance in the Red Hat Knowledgebase solution link:https://access.redhat.com/solutions/5843241[Which amount of CPU and memory are recommended to reserve for the system in OCP 4 nodes?] +* As of {product-title} 4.19, `cgroup v1` is no longer supported and has been removed. +All workloads must now be compatible with `cgroup v2`. +For more information, see link:https://www.redhat.com/en/blog/rhel-9-changes-context-red-hat-openshift-workloads[Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads]. +* The minimum reserved capacity (`systemReserved`) required can be found by following the guidance in link:https://access.redhat.com/solutions/5843241[Which amount of CPU and memory are recommended to reserve for the system in OCP 4 nodes?]. +* For schedulable control planes, the minimum recommended reserved capacity is at least 16 CPUs. * The actual required reserved CPU capacity depends on the cluster configuration and workload attributes. * The reserved CPU value must be rounded up to a full core (2 hyper-threads) alignment. * Changes to CPU partitioning cause the nodes contained in the relevant machine config pool to be drained and rebooted. * The reserved CPUs reduce the pod density, because the reserved CPUs are removed from the allocatable capacity of the {product-title} node. * The real-time workload hint should be enabled for real-time capable workloads. -** Applying the real time `workloadHint` setting results in the `nohz_full` kernel command line parameter being applied to improve performance of high performance applications.
+** Applying the real-time `workloadHint` setting results in the `nohz_full` kernel command line parameter being applied to improve performance of high performance applications. When you apply the `workloadHint` setting, any isolated or burstable pods that do not have the `cpu-quota.crio.io: "disable"` annotation and a proper `runtimeClassName` value, are subject to CRI-O rate limiting. When you set the `workloadHint` parameter, be aware of the tradeoff between increased performance and the potential impact of CRI-O rate limiting. Ensure that required pods are correctly annotated. @@ -42,7 +49,9 @@ You do not need to reserve an additional CPU for handling high network throughpu * If workloads running on the cluster use kernel level networking, the RX/TX queue count for the participating NICs should be set to 16 or 32 queues if the hardware permits it. Be aware of the default queue count. With no configuration, the default queue count is one RX/TX queue per online CPU; which can result in too many interrupts being allocated. -* The irdma kernel module may result in the allocation of too many interrupt vectors on systems with high core counts. To prevent this condition the reference configuration excludes this kernel module from loading through a kernel commandline argument in the `PerformanceProfile`. Typically core workloads do not require this kernel module. +* The `irdma` kernel module might result in the allocation of too many interrupt vectors on systems with high core counts. +To prevent this condition, the reference configuration excludes this kernel module from loading through a kernel command-line argument in the `PerformanceProfile` resource. +Typically, core workloads do not require this kernel module.
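Excluding the `irdma` module as described above can be expressed with the `additionalKernelArgs` field of a `PerformanceProfile` CR. The following fragment is illustrative only; the profile name and CPU sets are hypothetical values that must be sized for each cluster:

[source,yaml]
----
# Illustrative fragment: block the irdma kernel module from loading via a
# kernel command-line argument applied by the Performance Profile.
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile  # example name
spec:
  additionalKernelArgs:
    - "module_blacklist=irdma"
  cpu:
    reserved: "0-1,32-33"   # example values; size per cluster requirements
    isolated: "2-31,34-63"  # example values
----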
+ [NOTE] ==== diff --git a/modules/telco-core-crs-cluster-infrastructure.adoc b/modules/telco-core-crs-cluster-infrastructure.adoc index 28033079916e..454632446124 100644 --- a/modules/telco-core-crs-cluster-infrastructure.adoc +++ b/modules/telco-core-crs-cluster-infrastructure.adoc @@ -1,6 +1,6 @@ // Module included in the following assemblies: // -// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc +// * :_mod-docs-content-type: REFERENCE [id="cluster-infrastructure-crs_{context}"] diff --git a/modules/telco-core-crs-networking.adoc b/modules/telco-core-crs-networking.adoc index e461aa722f15..e84eb8b1dedb 100644 --- a/modules/telco-core-crs-networking.adoc +++ b/modules/telco-core-crs-networking.adoc @@ -1,7 +1,6 @@ // Module included in the following assemblies: // -// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc - +// * :_mod-docs-content-type: REFERENCE [id="networking-crs_{context}"] @@ -17,7 +16,7 @@ Load Balancer,`addr-pool.yaml`,Configures MetalLB to manage a pool of IP address Load Balancer,`bfd-profile.yaml`,"Configures bidirectional forwarding detection (BFD) with customized intervals, detection multiplier, and modes for quicker network fault detection and load balancing failover.",No Load Balancer,`bgp-advr.yaml`,"Defines a BGP advertisement resource for MetalLB, specifying how an IP address pool is advertised to BGP peers. This enables fine-grained control over traffic routing and announcements.",No Load Balancer,`bgp-peer.yaml`,"Defines a BGP peer in MetalLB, representing a BGP neighbor for dynamic routing.",No -Load Balancer,`community.yaml`,"Defines a MetalLB community, which groups one or more BGP communities under a named resource. Communities can be applied to BGP advertisements to control routing policies and change traffic routing.",Yes +Load Balancer,`community.yaml`,"Defines a MetalLB community, which groups one or more BGP communities under a named resource. 
Communities can be applied to BGP advertisements to control routing policies and change traffic routing.",No Load Balancer,`metallb.yaml`,Defines the MetalLB resource in the cluster.,No Load Balancer,`metallbNS.yaml`,Defines the metallb-system namespace in the cluster.,No Load Balancer,`metallbOperGroup.yaml`,Defines the Operator group for the MetalLB Operator.,No diff --git a/modules/telco-core-crs-node-configuration.adoc b/modules/telco-core-crs-node-configuration.adoc index c3a0fee4fad6..5136fe6d2bdb 100644 --- a/modules/telco-core-crs-node-configuration.adoc +++ b/modules/telco-core-crs-node-configuration.adoc @@ -1,7 +1,3 @@ -// Module included in the following assemblies: -// -// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc - // Module included in the following assemblies: // // * diff --git a/modules/telco-core-crs-resource-tuning.adoc b/modules/telco-core-crs-resource-tuning.adoc index 7e3b030e7fb0..c9840d8256dc 100644 --- a/modules/telco-core-crs-resource-tuning.adoc +++ b/modules/telco-core-crs-resource-tuning.adoc @@ -1,6 +1,6 @@ // Module included in the following assemblies: // -// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc +// * :_mod-docs-content-type: REFERENCE [id="resource-tuning-crs_{context}"] diff --git a/modules/telco-core-crs-scheduling.adoc b/modules/telco-core-crs-scheduling.adoc index eb089bb28ac1..b9bd7cde61e6 100644 --- a/modules/telco-core-crs-scheduling.adoc +++ b/modules/telco-core-crs-scheduling.adoc @@ -1,7 +1,3 @@ -// Module included in the following assemblies: -// -// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc - // Module included in the following assemblies: // // * diff --git a/modules/telco-core-crs-storage.adoc b/modules/telco-core-crs-storage.adoc index 41cb4f046d31..7945bba7caaf 100644 --- a/modules/telco-core-crs-storage.adoc +++ b/modules/telco-core-crs-storage.adoc @@ -1,6 +1,6 @@ // Module included in the following 
assemblies: // -// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc +// * :_mod-docs-content-type: REFERENCE [id="storage-crs_{context}"] @@ -10,8 +10,8 @@ [cols="4*", options="header", format=csv] |==== Component,Reference CR,Description,Optional -External ODF configuration,`01-rook-ceph-external-cluster-details.secret.yaml`,Defines a Secret resource containing base64-encoded configuration data for an external Ceph cluster in the openshift-storage namespace.,No +External ODF configuration,`01-rook-ceph-external-cluster-details.secret.yaml`,Defines a Secret resource containing base64-encoded configuration data for an external Ceph cluster in the `openshift-storage` namespace.,No External ODF configuration,`02-ocs-external-storagecluster.yaml`,Defines an OpenShift Container Storage (OCS) storage resource which configures the cluster to use an external storage back end.,No -External ODF configuration,`odfNS.yaml`,Creates the monitored openshift-storage namespace for the OpenShift Data Foundation Operator.,No -External ODF configuration,`odfOperGroup.yaml`,"Creates the Operator group in the openshift-storage namespace, allowing the {rh-storage} Operator to watch and manage resources.",No +External ODF configuration,`odfNS.yaml`,Creates the monitored `openshift-storage` namespace for the OpenShift Data Foundation Operator.,No +External ODF configuration,`odfOperGroup.yaml`,"Creates the Operator group in the `openshift-storage` namespace, allowing the OpenShift Data Foundation Operator to watch and manage resources.",No |==== diff --git a/modules/telco-core-deployment-planning.adoc b/modules/telco-core-deployment-planning.adoc index e7c196f4caeb..843fb2adf494 100644 --- a/modules/telco-core-deployment-planning.adoc +++ b/modules/telco-core-deployment-planning.adoc @@ -6,14 +6,15 @@ [id="telco-core-deployment-planning_{context}"] = Deployment planning -*Worker nodes and machine config pools* - `MachineConfigPools` (MCPs) custom resource (CR) 
enable the subdivision of worker nodes in telco core clusters into different node groups based on customer planning parameters. Careful deployment planning using MCPs is crucial to minimize deployment and upgrade time and, more importantly, to minimize interruption of telco-grade services during cluster upgrades. *Description* -Telco core clusters can use MCPs to split worker nodes into additional separate roles, for example, due to different hardware profiles. This allows custom tuning for each role and also plays a critical function in speeding up a telco core cluster deployment or upgrade. More importantly, multiple MCPs allow you to properly plan cluster upgrades across one or many maintenance windows. This is crucial because telco-grade services may otherwise be affected if careful planning is not considered. +Telco core clusters can use `MachineConfigPool` (MCP) CRs to split worker nodes into additional separate roles, for example, due to different hardware profiles. +This allows custom tuning for each role and also plays a critical function in speeding up a telco core cluster deployment or upgrade. +Multiple MCPs can be used to properly plan cluster upgrades across one or multiple maintenance windows. +This is crucial because telco-grade services might otherwise be affected without careful planning. During cluster upgrades, you can pause MCPs while you upgrade the control plane. See "Performing a canary rollout update" for more information. This ensures that worker nodes are not rebooted and running workloads remain unaffected until the MCP is unpaused. @@ -21,23 +22,28 @@ Using careful MCP planning, you can control the timing and order of which set of Before beginning the initial deployment, keep the following engineering considerations in mind regarding MCPs: -When using `PerformanceProfile` definitions, remember that each MCP must be linked to exactly one `PerformanceProfile` definition or tuned profile definition.
-Consequently, even if the desired configuration is identical for multiple MCPs, each MCP still requires its own dedicated `PerformanceProfile` definition.
+**PerformanceProfile and Tuned profile association:**
+
+When using PerformanceProfiles, remember that each Machine Config Pool (MCP) must be linked to exactly one PerformanceProfile or Tuned profile definition.
+Consequently, even if the desired configuration is identical for multiple MCPs, each MCP still requires its own dedicated PerformanceProfile definition.
+
+**Planning your MCP labeling strategy:**
 
-Plan your MCP labeling with an appropriate strategy to split your worker nodes depending on considerations such as:
+Plan your MCP labeling with an appropriate strategy to split your worker nodes depending on parameters
+such as:
 
-* The worker node type: identifying a group of nodes with equivalent hardware profile, for example, workers for control plane Network Functions (NFs) and workers for user data plane NFs.
+* The worker node type: identifying a group of nodes with equivalent hardware profile, for example, workers for control plane Network Functions (NFs) and workers for user data plane NFs.
 * The number of worker nodes per worker node type.
 * The minimum number of MCPs required for an equivalent hardware profile is 1, but could be larger for larger clusters.
-  For example, you may design for more MCPs per hardware profile to support a more granular upgrade where a smaller percentage of the cluster capacity is affected with each step.
+For example, you might design for more MCPs per hardware profile to support a more granular upgrade where a smaller percentage of the cluster capacity is affected with each step.
-* The strategy for performing updates on nodes within an MCP is shaped by upgrade requirements and the chosen `maxUnavailable` value:
+* The update strategy for nodes within an MCP is shaped by upgrade requirements and the chosen `maxUnavailable` value:
 ** Number of maintenance windows allowed.
** Duration of a maintenance window.
 ** Total number of worker nodes.
 ** Desired `maxUnavailable` (number of nodes updated concurrently) for the MCP.
-* CNF requirements for worker nodes, in terms of: 
+* CNF requirements for worker nodes, in terms of:
 ** Minimum availability per Pod required during an upgrade, configured with a pod disruption budget (PDB). PDBs are crucial to maintain telco service-level agreements (SLAs) during upgrades. For more information about PDB, see "Understanding how to use pod disruption budgets to specify the number of pods that must be up".
 ** Minimum true high availability required per Pod, such that each replica runs on separate hardware.
 ** Pod affinity and anti-affinity: For more information about how to use pod affinity and anti-affinity, see "Placing pods relative to other pods using affinity and anti-affinity rules".
-* Duration and frequency of upgrade maintenance windows during which telco-grade services may be affected.
+* Duration and number of upgrade maintenance windows during which telco-grade services might be affected.
diff --git a/modules/telco-core-disconnected-environment.adoc b/modules/telco-core-disconnected-environment.adoc
index b362bee84919..3a98c4a27e59 100644
--- a/modules/telco-core-disconnected-environment.adoc
+++ b/modules/telco-core-disconnected-environment.adoc
@@ -9,19 +9,20 @@
 New in this release::
 
 * No reference design updates in this release.
 
-Descrption::
+Description::
 Telco core clusters are expected to be installed in networks without direct access to the internet.
 All container images needed to install, configure, and operate the cluster must be available in a disconnected registry.
 This includes {product-title} images, Day 2 OLM Operator images, and application workload images.
The use of a disconnected environment provides multiple benefits, including:
 
 * Security - limiting access to the cluster
-* Curated content – the registry is populated based on curated and approved updates for clusters
+* Curated content - the registry is populated based on curated and approved updates for clusters
 
 Limits and requirements::
 * A unique name is required for all custom `CatalogSource` resources. Do not reuse the default catalog names.
 
 Engineering considerations::
+
-* A valid time source must be configured as part of cluster installation
+* A valid time source must be configured as part of cluster installation.
diff --git a/modules/telco-core-kubelet-settings.adoc b/modules/telco-core-kubelet-settings.adoc
new file mode 100644
index 000000000000..a18085a44f7b
--- /dev/null
+++ b/modules/telco-core-kubelet-settings.adoc
@@ -0,0 +1,29 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc
+
+:_mod-docs-content-type: REFERENCE
+
+[id="telco-core-kubelet-settings_{context}"]
+= Kubelet Settings
+
+Some CNF workloads use sysctls that are not in the list of system-wide safe sysctls.
+Network sysctls are generally namespaced and can be allowed by setting the `kubeletconfig.experimental` annotation in the `PerformanceProfile` CR to a JSON string that contains an `allowedUnsafeSysctls` list.
+
+.Example snippet showing allowedUnsafeSysctls
+
+[source,yaml]
+----
+apiVersion: performance.openshift.io/v2
+kind: PerformanceProfile
+metadata:
+  name: {{ .metadata.name }}
+  annotations:
+    kubeletconfig.experimental: |
+      {"allowedUnsafeSysctls":["net.ipv6.conf.all.accept_ra"]}
+# ...
+----
+
+[NOTE]
+====
+Although these sysctls are namespaced, they might allow a pod to consume memory or other resources beyond any limits specified in the pod description. You must ensure that these sysctls do not exhaust platform resources.
+==== + diff --git a/modules/telco-core-load-balancer.adoc b/modules/telco-core-load-balancer.adoc index 26ded6783742..01689511e832 100644 --- a/modules/telco-core-load-balancer.adoc +++ b/modules/telco-core-load-balancer.adoc @@ -7,7 +7,8 @@ = Load balancer New in this release:: -* No reference design updates in this release. + +* No reference design updates in this release. [IMPORTANT] ==== @@ -24,13 +25,13 @@ Selection and configuration of an external load balancer is outside the scope of When an external third-party load balancer is used, the integration effort must include enough analysis to ensure all performance and resource utilization requirements are met. Limits and requirements:: -* Stateful load balancing is not supported by MetalLB. -An alternate load balancer implementation must be used if this is a requirement for workload CNFs. +* Stateful load balancing is not supported by MetalLB. An alternate load balancer implementation must be used if this is a requirement for workload CNFs. * You must ensure that the external IP address is routable from clients to the host network for the cluster. Engineering considerations:: + * MetalLB is used in BGP mode only for telco core use models. -* For telco core use models, MetalLB is supported only when you set `routingViaHost=true` in the `ovnKubernetesConfig.gatewayConfig` specification of the OVN-Kubernetes network plugin. +* For telco core use models, MetalLB is supported only with the OVN-Kubernetes network provider used in local gateway mode. See `routingViaHost` in "Cluster Network Operator". * BGP configuration in MetalLB is expected to vary depending on the requirements of the network and peers. ** You can configure address pools with variations in addresses, aggregation length, auto assignment, and so on. 
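+The following snippet sketches how an address pool, a BGP peer, and an advertisement fit together. It is illustrative only; the pool and peer names, addresses, and AS numbers are placeholders, not reference values:
+
+[source,yaml]
+----
+apiVersion: metallb.io/v1beta1
+kind: IPAddressPool
+metadata:
+  name: pool1 # placeholder name
+  namespace: metallb-system
+spec:
+  addresses:
+  - 192.0.2.0/28 # example documentation range
+  autoAssign: false
+---
+apiVersion: metallb.io/v1beta2
+kind: BGPPeer
+metadata:
+  name: peer1 # placeholder name
+  namespace: metallb-system
+spec:
+  myASN: 64500 # example private ASN
+  peerASN: 64501 # example private ASN
+  peerAddress: 192.0.2.254 # placeholder router address
+---
+apiVersion: metallb.io/v1beta1
+kind: BGPAdvertisement
+metadata:
+  name: adv1 # placeholder name
+  namespace: metallb-system
+spec:
+  ipAddressPools:
+  - pool1
+  peers:
+  - peer1
+----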
diff --git a/modules/telco-core-logging.adoc b/modules/telco-core-logging.adoc
index df043be1f98b..572188647bc6 100644
--- a/modules/telco-core-logging.adoc
+++ b/modules/telco-core-logging.adoc
@@ -7,6 +7,7 @@
 = Logging
 
 New in this release::
+
 * No reference design updates in this release
 
 Description::
diff --git a/modules/telco-core-monitoring.adoc b/modules/telco-core-monitoring.adoc
index ebe320a558a1..0792971fa70d 100644
--- a/modules/telco-core-monitoring.adoc
+++ b/modules/telco-core-monitoring.adoc
@@ -10,25 +10,20 @@
 New in this release::
 * No reference design updates in this release.
 
 Description::
-+
---
 The Cluster Monitoring Operator (CMO) is included by default in {product-title} and provides monitoring (metrics, dashboards, and alerting) for the platform components and optionally user projects.
-
 You can customize the default log retention period, custom alert rules, and so on.
++
+Configuration of the monitoring stack is done through a single string value in the `cluster-monitoring-config` ConfigMap. The reference tuning merges content from two requirements:
 
-The default handling of pod CPU and memory metrics, based on upstream Kubernetes and cAdvisor, makes a tradeoff favoring stale data over metric accuracy.
-This leads to spikes in reporting, which can create false alerts, depending on the user-specified thresholds.
-
-{product-title} supports an opt-in Dedicated Service Monitor feature that creates an additional set of pod CPU and memory metrics which do not suffer from this behavior.
-For more information, see the Red Hat Knowledgebase solution link:https://access.redhat.com/solutions/7012719[Dedicated Service Monitors - Questions and Answers].
-
+* Prometheus configuration is extended to forward alerts to the ACM hub cluster for alert aggregation.
+If desired, this configuration can be extended to forward to additional locations.
+* Prometheus retention period is reduced from the default.
+The primary metrics storage is expected to be external to the cluster. +Metrics storage on the Core cluster is expected to be a backup to that central store and available for local troubleshooting purposes. ++ In addition to the default configuration, the following metrics are expected to be configured for telco core clusters: * Pod CPU and memory metrics and alerts for user workloads --- - -Limits and requirements:: -* You must enable the Dedicated Service Monitor feature for accurate representation of pod metrics. Engineering considerations:: * The Prometheus retention period is specified by the user. diff --git a/modules/telco-core-networking.adoc b/modules/telco-core-networking.adoc index acb603f8eb98..de18ad0cde7a 100644 --- a/modules/telco-core-networking.adoc +++ b/modules/telco-core-networking.adoc @@ -13,8 +13,7 @@ image::openshift-telco-core-rds-networking.png[Overview of the telco core refere New in this release:: -* Extend telco core validation with pod-level bonding. -* Support moving failed policy in resource injector to failed for SR-IOV operator. +* No reference design updates in this release [NOTE] ==== @@ -22,8 +21,7 @@ If you have custom `FRRConfiguration` CRs in the `metallb-system` namespace, you ==== Description:: -+ --- + * The cluster is configured for dual-stack IP (IPv4 and IPv6). * The validated physical network configuration consists of two dual-port NICs. One NIC is shared among the primary CNI (OVN-Kubernetes) and IPVLAN and MACVLAN traffic, while the second one is dedicated to SR-IOV VF-based pod traffic. @@ -37,14 +35,14 @@ They do not share the same base interface. For more information, see "Cluster Network Operator". * SR-IOV VFs are managed by the SR-IOV Network Operator. * To ensure consistent source IP addresses for pods behind a LoadBalancer Service, configure an `EgressIP` CR and specify the `podSelector` parameter. +EgressIP is further discussed in the "Cluster Network Operator" section. 
* You can implement service traffic separation by doing the following:
 .. Configure VLAN interfaces and specific kernel IP routes on the nodes using `NodeNetworkConfigurationPolicy` CRs.
 .. Create a MetalLB `BGPPeer` CR for each VLAN to establish peering with the remote BGP router.
-.. Define a MetalLB `BGPAdvertisement` CR to specify which IP address pools should be advertised to a selected list of `BGPPeer` resources.
-The following diagram illustrates how specific service IP addresses are advertised to the outside via specific VLAN interfaces.
-Services routes are defined in `BGPAdvertisement` CRs and configured with values for `IPAddressPool1` and `BGPPeer1` fields.
---
+.. Define a MetalLB `BGPAdvertisement` CR to specify which IP address pools should be advertised to a selected list of `BGPPeer` resources. The following diagram illustrates how specific service IP addresses are advertised externally through specific VLAN interfaces. Service routes are defined in `BGPAdvertisement` CRs and configured with values for `IPAddressPool1` and `BGPPeer1` fields.
+
 .Telco core reference design MetalLB service separation
 image::openshift-telco-core-rds-metallb-service-separation.png[Telco core reference design MetalLB service separation]
+
diff --git a/modules/telco-core-node-configuration.adoc b/modules/telco-core-node-configuration.adoc
index 3e4e66fe66da..a517b3cc7f1b 100644
--- a/modules/telco-core-node-configuration.adoc
+++ b/modules/telco-core-node-configuration.adoc
@@ -12,6 +12,7 @@ New in this release::
 
 Limits and requirements::
 * Analyze additional kernel modules to determine impact on CPU load, system performance, and ability to meet KPIs.
+ +-- .Additional kernel modules |==== |Feature|Description @@ -21,20 +22,20 @@ a|Install the following kernel modules by using `MachineConfig` CRs to provide e * sctp * ip_gre -* ip6_tables -* ip6t_REJECT -* ip6table_filter -* ip6table_mangle -* iptable_filter -* iptable_mangle -* iptable_nat -* xt_multiport -* xt_owner -* xt_REDIRECT -* xt_statistic -* xt_TCPMSS +* nf_tables +* nf_conntrack +* nft_ct +* nft_limit +* nft_log +* nft_nat +* nft_chain_nat +* nf_reject_ipv4 +* nf_reject_ipv6 +* nfnetlink_log |Container mount namespace hiding|Reduce the frequency of kubelet housekeeping and eviction monitoring to reduce CPU usage. Creates a container mount namespace, visible to kubelet/CRI-O, to reduce system mount scanning overhead. |Kdump enable|Optional configuration (enabled by default) |==== +-- + diff --git a/modules/telco-core-openshift-data-foundation.adoc b/modules/telco-core-openshift-data-foundation.adoc index 121aae4c9684..e71e66bec093 100644 --- a/modules/telco-core-openshift-data-foundation.adoc +++ b/modules/telco-core-openshift-data-foundation.adoc @@ -7,11 +7,10 @@ = {rh-storage} New in this release:: -* Clarification on internal compared to external mode and RDS recommendations. + +* No reference design updates in this release. Description:: -+ --- {rh-storage} is a software-defined storage service for containers. {rh-storage} can be deployed in one of two modes: * Internal mode, where {rh-storage} software components are deployed as software containers directly on the {product-title} cluster nodes, together with other containerized applications. @@ -25,7 +24,6 @@ For telco core clusters, storage support is provided by {rh-storage} storage ser * External Red Hat Ceph Storage clusters can be re-used by multiple {product-title} clusters deployed in the same region. {rh-storage} supports separation of storage traffic using secondary CNI networks. 
--- Limits and requirements:: * In an IPv4/IPv6 dual-stack networking environment, {rh-storage} uses IPv4 addressing. @@ -33,5 +31,5 @@ For more information, see link:https://docs.redhat.com/en/documentation/red_hat_ Engineering considerations:: * {rh-storage} network traffic should be isolated from other traffic on a dedicated network, for example, by using VLAN isolation. -* Workload requirements must be scoped before attaching multiple {product-title} clusters to an external {rh-storage} cluster to ensure sufficient throughput, bandwidth, and performance KPIs. +* Workload requirements must be scoped before attaching multiple {product-title} clusters to an external {rh-storage} cluster to ensure enough throughput, bandwidth, and performance KPIs. diff --git a/modules/telco-core-power-management.adoc b/modules/telco-core-power-management.adoc index 6a0db90f5243..28f86f611d20 100644 --- a/modules/telco-core-power-management.adoc +++ b/modules/telco-core-power-management.adoc @@ -7,6 +7,7 @@ = Power Management New in this release:: + * No reference design updates in this release Description:: @@ -15,10 +16,12 @@ The choice of power mode depends on the characteristics of the workloads running Configure the maximum latency for a low-latency pod by using the per-pod power management C-states feature. Limits and requirements:: + * Power configuration relies on appropriate BIOS configuration, for example, enabling C-states and P-states. Configuration varies between hardware vendors. Engineering considerations:: + * Latency: To ensure that latency-sensitive workloads meet requirements, you require a high-power or a per-pod power management configuration. Per-pod power management is only available for Guaranteed QoS pods with dedicated pinned CPUs. 
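+The per-pod model can be sketched as follows. This is illustrative only; the profile name is a placeholder, and the exact workload hint and annotation names should be verified against the product documentation for your release:
+
+[source,yaml]
+----
+apiVersion: performance.openshift.io/v2
+kind: PerformanceProfile
+metadata:
+  name: example-profile # placeholder name
+spec:
+  workloadHints:
+    realTime: true
+    highPowerConsumption: false
+    perPodPowerManagement: true # opt in to per-pod power management
+# ...
+----
+
+With this hint enabled, a latency-sensitive Guaranteed QoS pod with pinned CPUs can then request its power configuration through pod annotations, for example `cpu-c-states.crio.io` and `cpu-freq-governor.crio.io`.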
diff --git a/modules/telco-core-rds-product-version-use-model-overview.adoc b/modules/telco-core-rds-product-version-use-model-overview.adoc index cb4ae500832f..1ef72a2b3da3 100644 --- a/modules/telco-core-rds-product-version-use-model-overview.adoc +++ b/modules/telco-core-rds-product-version-use-model-overview.adoc @@ -8,4 +8,6 @@ The Telco core reference design specification (RDS) describes a platform that supports large-scale telco applications including control plane functions such as signaling and aggregation. It also includes some centralized data plane functions, for example, user plane functions (UPF). -These functions generally require scalability, complex networking support, resilient software-defined storage, and support performance requirements that are less stringent and constrained than far-edge deployments such as RAN. \ No newline at end of file +These functions generally require scalability, complex networking support, resilient software-defined storage, and support performance requirements that are less stringent and constrained than far-edge deployments such as RAN. + + diff --git a/modules/telco-core-scheduling.adoc b/modules/telco-core-scheduling.adoc index 3795707c7039..9f98a5fe3ea3 100644 --- a/modules/telco-core-scheduling.adoc +++ b/modules/telco-core-scheduling.adoc @@ -10,26 +10,23 @@ New in this release:: * No reference design updates in this release. Description:: -+ --- The scheduler is a cluster-wide component responsible for selecting the right node for a given workload. It is a core part of the platform and does not require any specific configuration in the common deployment scenarios. However, there are few specific use cases described in the following section. - ++ NUMA-aware scheduling can be enabled through the NUMA Resources Operator. For more information, see "Scheduling NUMA-aware workloads". --- Limits and requirements:: * The default scheduler does not understand the NUMA locality of workloads. 
It only knows about the sum of all free resources on a worker node.
-This might cause workloads to be rejected when scheduled to a node with the topology manager policy set to single-numa-node or restricted. For more information, see "Topology Manager policies".
-+
-For example, consider a pod requesting 6 CPUs and being scheduled to an empty node that has 4 CPUs per NUMA node.
+This might cause workloads to be rejected when scheduled to a node with the topology manager policy set to single-numa-node or restricted.
+For more information, see "Topology Manager policies".
+** For example, consider a pod requesting 6 CPUs and being scheduled to an empty node that has 4 CPUs per NUMA node.
 The total allocatable capacity of the node is 8 CPUs.
 The scheduler places the pod on the empty node.
 The node local admission fails, as there are only 4 CPUs available in each of the NUMA nodes.
-
-* All clusters with multi-NUMA nodes are required to use the NUMA Resources Operator. See "Installing the NUMA Resources Operator" for more information.
+* All clusters with multi-NUMA nodes are required to use the NUMA Resources Operator.
+See "Installing the NUMA Resources Operator" for more information.
 Use the `machineConfigPoolSelector` field in the `KubeletConfig` CR to select all nodes where NUMA-aligned scheduling is required.
 * All machine config pools must have consistent hardware configuration. For example, all nodes are expected to have the same NUMA zone count.
diff --git a/modules/telco-core-security.adoc b/modules/telco-core-security.adoc
index 0afbcb873bb5..2a04c075ccb9 100644
--- a/modules/telco-core-security.adoc
+++ b/modules/telco-core-security.adoc
@@ -16,25 +16,26 @@ Telco customers are security conscious and require clusters to be hardened again
 In {product-title}, there is no single component or feature responsible for securing a cluster.
 Described below are various security-oriented features and configurations for the use models covered in the telco core RDS.
-* **SecurityContextConstraints (SCC)**: All workload pods should be run with `restricted-v2` or `restricted` SCC. -* **Seccomp**: All pods should run with the `RuntimeDefault` (or stronger) seccomp profile. -* **Rootless DPDK pods**: Many user-plane networking (DPDK) CNFs require pods to run with root privileges. +* SecurityContextConstraints (SCC): All workload pods should be run with `restricted-v2` or `restricted` SCC. +* Seccomp: All pods should run with the `RuntimeDefault` (or stronger) seccomp profile. +* Rootless DPDK pods: Many user-plane networking (DPDK) CNFs require pods to run with root privileges. With this feature, a conformant DPDK pod can be run without requiring root privileges. Rootless DPDK pods create a tap device in a rootless pod that injects traffic from a DPDK application to the kernel. -* **Storage**: The storage network should be isolated and non-routable to other cluster networks. +* Storage: The storage network should be isolated and non-routable to other cluster networks. See the "Storage" section for additional details. -See the Red Hat Knowledgebase solution article link:https://access.redhat.com/articles/7090422[Custom nftable firewall rules in OpenShift] for a supported method for implementing custom nftables firewall rules in {product-title} cluster nodes. This article is intended for cluster administrators who are responsible for managing network security policies in {product-title} environments. +See the Red Hat Knowledgebase solution article link:https://access.redhat.com/articles/7090422[Custom nftable firewall rules in {product-title}] for a supported method for implementing custom nftables firewall rules in {product-title} cluster nodes. This article is intended for cluster administrators who are responsible for managing network security policies in {product-title} environments. 
It is crucial to carefully consider the operational implications before deploying this method, including:
 
-* **Early application**: The rules are applied at boot time, before the network is fully operational.
+* Early application: The rules are applied at boot time, before the network is fully operational.
 Ensure the rules don't inadvertently block essential services required during the boot process.
-* **Risk of misconfiguration**: Errors in your custom rules can lead to unintended consequences, potentially leading to performance impact or blocking legitimate traffic or isolating nodes.
+* Risk of misconfiguration: Errors in your custom rules can lead to unintended consequences, potentially impacting performance, blocking legitimate traffic, or isolating nodes.
 Thoroughly test your rules in a non-production environment before deploying them to your main cluster.
-* **External endpoints**: {product-title} requires access to external endpoints to function.
-For more information about the firewall allowlist, see "Configuring your firewall for {product-title}". Ensure that cluster nodes are permitted access to those endpoints.
-* **Node reboot**: Unless node disruption policies are configured, applying the `MachineConfig` CR with the required firewall settings causes a node reboot.
+* External endpoints: {product-title} requires access to external endpoints to function.
+For more information about the firewall allowlist, see "Configuring your firewall for {product-title}". Ensure that cluster nodes are permitted access to those endpoints.
+
+* Node reboot: Unless node disruption policies are configured, applying the `MachineConfig` CR with the required firewall settings causes a node reboot.
 Be aware of this impact and schedule a maintenance window accordingly. For more information, see "Using node disruption policies to minimize disruption from machine config changes".
+
 [NOTE]
@@ -42,11 +43,11 @@
 Node disruption policies are available in {product-title} 4.17 and later.
 ====
 
-* **Network flow matrix**: For more information about managing ingress traffic, see {product-title} network flow matrix.
+* Network flow matrix: For more information about managing ingress traffic, see {product-title} network flow matrix.
 You can restrict ingress traffic to essential flows to improve network security.
 The matrix provides insights into base cluster services but excludes traffic generated by Day-2 Operators.
 
-* **Cluster version updates and upgrades**: Exercise caution when updating or upgrading {product-title} clusters.
+* Cluster version updates and upgrades: Exercise caution when updating or upgrading {product-title} clusters.
 Recent changes to the platform's firewall requirements might require adjustments to network port permissions.
 While the documentation provides guidelines, note that these requirements can evolve over time.
 To minimize disruptions, you should test any updates or upgrades in a staging environment before applying them in production.
@@ -56,7 +57,7 @@ This helps you to identify and address potential compatibility issues related to
 Limits and requirements::
 * Rootless DPDK pods require the following additional configuration:
 ** Configure the `container_t` SELinux context for the tap plugin.
-** Enable the `container_use_devices` SELinux boolean for the cluster host
+** Enable the `container_use_devices` SELinux boolean for the cluster host.
 
 Engineering considerations::
 * For rootless DPDK pod support, enable the SELinux `container_use_devices` boolean on the host to allow the tap device to be created.
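+One common pattern for enabling the boolean is a `MachineConfig` CR that runs `setsebool` through a systemd unit at boot. The following snippet is a sketch; the object name and role label are placeholders:
+
+[source,yaml]
+----
+apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfig
+metadata:
+  name: 99-worker-setsebool # placeholder name
+  labels:
+    machineconfiguration.openshift.io/role: worker # placeholder role
+spec:
+  config:
+    ignition:
+      version: 3.2.0
+    systemd:
+      units:
+      - name: setsebool.service
+        enabled: true
+        contents: |
+          [Unit]
+          Description=Set SELinux boolean for the DPDK tap CNI plugin
+          Before=kubelet.service
+
+          [Service]
+          Type=oneshot
+          ExecStart=/usr/sbin/setsebool container_use_devices=on
+          RemainAfterExit=true
+
+          [Install]
+          WantedBy=multi-user.target
+----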
diff --git a/modules/telco-core-service-mesh.adoc b/modules/telco-core-service-mesh.adoc index 8a8cc2e12c15..70003df51861 100644 --- a/modules/telco-core-service-mesh.adoc +++ b/modules/telco-core-service-mesh.adoc @@ -4,11 +4,11 @@ :_mod-docs-content-type: REFERENCE [id="telco-core-service-mesh_{context}"] -= Service mesh += Service Mesh Description:: -Telco core cloud-native functions (CNFs) typically require a service mesh implementation. -Specific service mesh features and performance requirements are dependent on the application. -The selection of service mesh implementation and configuration is outside the scope of this documentation. -You must account for the impact of service mesh on cluster resource usage and performance, including additional latency introduced in pod networking, in your implementation. +Telco core cloud-native functions (CNFs) typically require a Service Mesh implementation. +Specific Service Mesh features and performance requirements are dependent on the application. +The selection of Service Mesh implementation and configuration is outside the scope of this documentation. +The implementation must account for the impact of Service Mesh on cluster resource usage and performance, including additional latency introduced in pod networking. diff --git a/modules/telco-core-signaling-workloads.adoc b/modules/telco-core-signaling-workloads.adoc index 13f286b2277b..0f123c5140f5 100644 --- a/modules/telco-core-signaling-workloads.adoc +++ b/modules/telco-core-signaling-workloads.adoc @@ -5,7 +5,9 @@ :_mod-docs-content-type: REFERENCE [id="telco-core-signaling-workloads_{context}"] = Signaling workloads - -Signaling workloads typically use SCTP, REST, gRPC or similar TCP or UDP protocols. +Signaling workloads typically use SCTP, REST, gRPC, or similar TCP or UDP protocols. Signaling workloads support hundreds of thousands of transactions per second (TPS) by using a secondary multus CNI configured as MACVLAN or SR-IOV interface. 
-These workloads can run in pods with either guaranteed or burstable QoS. \ No newline at end of file +These workloads can run in pods with either guaranteed or burstable QoS. + + + diff --git a/modules/telco-core-software-stack.adoc b/modules/telco-core-software-stack.adoc index e2432e2121b5..d607e2a53755 100644 --- a/modules/telco-core-software-stack.adoc +++ b/modules/telco-core-software-stack.adoc @@ -19,10 +19,10 @@ The Red{nbsp}Hat telco core {product-version} solution has been validated using |Component |Software version |{rh-rhacm-first} -|2.13 +|2.14 |{gitops-title} -|1.16 +|1.18 |Cluster Logging Operator |6.2 @@ -31,14 +31,17 @@ The Red{nbsp}Hat telco core {product-version} solution has been validated using |4.19 |SR-IOV Network Operator -|4.19 +|4.20 |MetalLB -|4.19 +|4.20 |NMState Operator -|4.19 +|4.20 |NUMA-aware scheduler -|4.19 -|==== \ No newline at end of file +|4.20 +|==== + +* {rh-rhacm-first} will be updated to 2.15 when the aligned {rh-rhacm-first} version is released. +* {rh-storage} will be updated to 4.20 when the aligned {rh-storage} version (4.20) is released. diff --git a/modules/telco-core-sr-iov.adoc b/modules/telco-core-sr-iov.adoc index f66e7c2db337..df8801d23082 100644 --- a/modules/telco-core-sr-iov.adoc +++ b/modules/telco-core-sr-iov.adoc @@ -7,7 +7,7 @@ = SR-IOV New in this release:: -* Support moving failed policy in resource injector to failed for SR-IOV operator +* No reference design updates in this release. Description:: SR-IOV enables physical functions (PFs) to be divided into multiple virtual functions (VFs). @@ -30,7 +30,7 @@ Engineering considerations:: * The `SriovOperatorConfig` CR must be explicitly created. This CR is included in the reference configuration policies, which causes it to be created during initial deployment. 
* NICs that do not support firmware updates with UEFI secure boot or kernel lockdown must be preconfigured with sufficient virtual functions (VFs) enabled to support the number of VFs required by the application workload.
-For Mellanox NICs, you must disable the Mellanox vendor plugin in the SR-IOV Network Operator. For more information see, "Configuring an SR-IOV network device".
+For Mellanox NICs, you must disable the Mellanox vendor plugin in the SR-IOV Network Operator. For more information, see "Configuring an SR-IOV network device".
 * To change the MTU value of a VF after the pod has started, do not configure the `SriovNetworkNodePolicy` MTU field.
 Instead, use the Kubernetes NMState Operator to set the MTU of the related PF.
diff --git a/modules/telco-core-storage.adoc b/modules/telco-core-storage.adoc
index af85f6d2a4f6..263e76be6800 100644
--- a/modules/telco-core-storage.adoc
+++ b/modules/telco-core-storage.adoc
@@ -7,6 +7,7 @@
 = Storage
 
 New in this release::
+
 * No reference design updates in this release
 
 Description::
@@ -26,3 +27,4 @@
 The storage network must not be reachable, or routable, from other cluster networks.
 Only nodes directly attached to the storage network should be allowed to gain access to it.
 ====
 --
+
diff --git a/modules/telco-core-topology-aware-lifecycle-manager.adoc b/modules/telco-core-topology-aware-lifecycle-manager.adoc
index 0cc62899a8c7..d002ac239a5c 100644
--- a/modules/telco-core-topology-aware-lifecycle-manager.adoc
+++ b/modules/telco-core-topology-aware-lifecycle-manager.adoc
@@ -10,7 +10,7 @@ New in this release::
 * No reference design updates in this release.
 
 Description::
-{cgu-operator} is an Operator which runs only on the hub cluster.
+{cgu-operator} is an Operator that runs only on the hub cluster.
 {cgu-operator} manages how changes, including cluster and Operator upgrades, configurations, and so on, are rolled out to managed clusters in the network.
{cgu-operator} has the following core features:
 * Provides sequenced updates of cluster configurations and upgrades ({product-title} and Operators) as defined by cluster policies.
@@ -24,4 +24,15 @@ Limits and requirements::
 Engineering considerations::
 * Only policies with the `ran.openshift.io/ztp-deploy-wave` annotation are applied by {cgu-operator} during initial cluster installation.
 * Any policy can be remediated by {cgu-operator} under control of a user-created `ClusterGroupUpgrade` CR.
+* Set the `MachineConfigPool` (`mcp`) CR `paused` field to `true` during a cluster upgrade maintenance window and set the `maxUnavailable` field to the maximum tolerable value.
+This prevents multiple cluster node reboots during upgrade, which results in a shorter overall upgrade time.
+When you unpause the `mcp` CR, all the configuration changes are applied with a single reboot.
++
+[NOTE]
+====
+During installation, custom `mcp` CRs can be paused along with setting `maxUnavailable` to 100% to improve installation times.
+====
+* Orchestration of an upgrade, including {product-title}, Day 2 OLM Operators, and custom configuration, can be done by using a `ClusterGroupUpgrade` (CGU) CR containing policies describing these updates.
+** An EUS-to-EUS upgrade can be orchestrated by using chained CGU CRs.
+** Control of MCP pause can be managed through policy in the CGU CRs for a full control plane and worker node rollout of upgrades.
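+As an illustration, a minimal `ClusterGroupUpgrade` CR has the following shape. The cluster, policy, and namespace names are placeholders:
+
+[source,yaml]
+----
+apiVersion: ran.openshift.io/v1alpha1
+kind: ClusterGroupUpgrade
+metadata:
+  name: core-upgrade # placeholder name
+  namespace: default # placeholder namespace
+spec:
+  clusters:
+  - core-cluster-1 # placeholder managed cluster name
+  enable: true
+  managedPolicies:
+  - ocp-upgrade-policy # placeholder policy name
+  remediationStrategy:
+    maxConcurrency: 1 # remediate one cluster at a time
+    timeout: 240 # minutes
+----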
diff --git a/modules/telco-core-workloads-on-schedulable-control-planes.adoc b/modules/telco-core-workloads-on-schedulable-control-planes.adoc
new file mode 100644
index 000000000000..bded99594988
--- /dev/null
+++ b/modules/telco-core-workloads-on-schedulable-control-planes.adoc
@@ -0,0 +1,36 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc
+
+:_mod-docs-content-type: REFERENCE
+
+[id="telco-core-workloads-on-schedulable-control-planes_{context}"]
+= Workloads on schedulable control planes
+
+Enabling workloads on control plane nodes::
+
+You can enable schedulable control planes to run workloads on control plane nodes, using idle CPU capacity on bare-metal machines for potential cost savings. This feature applies only to clusters with bare-metal control plane nodes.
++
+There are two distinct parts to this functionality:
+
+. Allowing workloads on control plane nodes: You can configure this feature after initial cluster installation, enabling it when you need to run workloads on those nodes.
+. Enabling workload partitioning: This is a critical isolation measure that protects the control plane from interference by regular workloads, ensuring cluster stability and reliability. Workload partitioning must be configured during the initial "day zero" cluster installation and cannot be enabled later.
+
+If you plan to run workloads on your control plane nodes, you must first enable workload partitioning during the initial setup. You can then enable the schedulable control plane feature at a later time.
+
+Workload characterization and limitations::
+
+You must test and verify workloads to ensure that applications do not interfere with core cluster functions. It is recommended that you start with lightweight containers that do not heavily load the CPU or networking.
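The schedulable control plane feature described above is controlled through the cluster-scoped `Scheduler` configuration resource. As a sketch of the post-installation step:

```yaml
# Cluster-scoped Scheduler config; setting mastersSchedulable to true
# allows regular workloads to be scheduled onto control plane nodes.
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: true
```

Remember that this only allows scheduling; the workload partitioning protection must already have been configured at installation time.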
++
+Certain workloads are not permitted on control plane nodes due to the risk to cluster stability. This includes any workload that reconfigures kernel arguments or system-global sysctls, because this can lead to unpredictable outcomes for the cluster.
++
+To ensure stability, you must adhere to the following:
+
+* Make sure all non-trivial workloads have memory limits defined. This protects the control plane in case of a memory leak.
+* Avoid excessively loading reserved CPUs, for example, by heavy use of exec probes.
+* Avoid heavy kernel-based networking usage, as it can increase reserved CPU load through software networking components such as OVS.
+
+NUMA Resources Operator support::
+
+The NUMA Resources Operator is supported for use on control plane nodes. Functional behavior of the Operator remains unchanged.
+
diff --git a/modules/telco-core-zones.adoc b/modules/telco-core-zones.adoc
new file mode 100644
index 000000000000..b1ae961f9b2d
--- /dev/null
+++ b/modules/telco-core-zones.adoc
@@ -0,0 +1,39 @@
+// Module included in the following assemblies:
+//
+// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc
+
+:_mod-docs-content-type: REFERENCE
+
+[id="telco-core-zones_{context}"]
+= Zones
+
+Designing the cluster to support disruption of multiple nodes simultaneously is critical for high availability (HA) and reduced upgrade times.
+{product-title} and Kubernetes use the well-known label `topology.kubernetes.io/zone` to create pools of nodes that are subject to a common failure domain.
+Labeling nodes with topology (availability) zones allows high-availability workloads to spread such that each zone holds only one replica from a set of HA replicated pods.
+With this spread, the loss of a single zone does not violate HA constraints, and minimum service availability is maintained.
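As an illustration of the zone labeling described above, every node in a given pool carries the same zone label (the node and zone names here are hypothetical):

```yaml
# Hypothetical example: a node in pool "worker-pool-a", labeled as zone-a.
# All nodes in the same pool receive the same topology.kubernetes.io/zone value.
apiVersion: v1
kind: Node
metadata:
  name: worker-pool-a-0
  labels:
    topology.kubernetes.io/zone: zone-a
```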
+{product-title} and Kubernetes apply a default `TopologySpreadConstraint` to all replica constructs (`Service`, `ReplicaSet`, `StatefulSet`, or `ReplicationController`) that spreads the replicas based on the `topology.kubernetes.io/zone` label.
+This default allows zone-based spread to apply without any change to your workload pod specs.
+
+Cluster upgrades typically result in node disruption as the underlying OS is updated.
+In large clusters, it is necessary to update multiple nodes concurrently to complete upgrades quickly and in as few maintenance windows as possible.
+By using zones to ensure pod spread, an upgrade can be applied to all nodes in a zone simultaneously (assuming sufficient spare capacity) while maintaining high availability and service availability.
+The recommended cluster design is to partition nodes into multiple MCPs based on the considerations described earlier, and to label all nodes in a single MCP as a single zone that is distinct from the zones attached to other MCPs.
+Using this strategy, all nodes in an MCP can be updated simultaneously.
+
+Probes and lifecycle hooks (readiness, liveness, and startup probes, and the pre-stop hook) play an important role in ensuring application availability. For upgrades in particular, the pre-stop hook allows applications to take necessary steps to prepare for disruption before being evicted from the node.
+
+Limits and requirements::
+* The default TopologySpreadConstraints (TSC) apply only when an explicit TSC is not given. If your pods have an explicit TSC, ensure that zone-based spread is included.
+* The cluster must have sufficient spare capacity to tolerate a simultaneous update of an MCP. Otherwise, the `maxUnavailable` field of the MCP must be set to less than 100%.
+* The ability to update all nodes in an MCP simultaneously further depends on the workload design and the ability to maintain required service levels with that level of disruption.
+
+Engineering considerations::
+* Pod drain times can significantly impact node update times.
Ensure the workload design allows pods to be drained quickly.
+* PodDisruptionBudgets (PDB) are used to enforce high availability requirements.
+** To guarantee continuous application availability, a cluster design must use enough separate zones to spread the workload's pods.
+*** If pods are spread across sufficient zones, the loss of one zone does not take down more pods than the PDB permits.
+*** If pods are not adequately distributed, either because there are too few zones or because of restrictive scheduling constraints, a zone failure will violate the PDB, causing an outage.
+*** Furthermore, this poor distribution can force upgrades that typically run in parallel to execute sequentially (partial serialization) to avoid violating the PDB, significantly extending maintenance time.
+** A PDB that allows zero disruptable pods blocks node drain and requires administrator intervention. Avoid this pattern to keep upgrades fast and automated.
+
+
diff --git a/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc b/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
index 25dc1f4163d5..68f33e710dfe 100644
--- a/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
+++ b/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
@@ -13,7 +13,7 @@ As a cluster administrator, you can configure the OVN-Kubernetes Container Netwo
include::modules/nw-egress-ips-about.adoc[leveloffset=+1]

ifndef::openshift-rosa[]
-// Considerations for using an egress IP address on additional network interfaces
+// Considerations for using an egress IP address on additional network interfaces
include::modules/nw-egress-ips-multi-nic-considerations.adoc[leveloffset=+2]
endif::openshift-rosa[]
diff --git a/scalability_and_performance/telco-core-rds.adoc b/scalability_and_performance/telco-core-rds.adoc
index 7fa43ed7e0d3..14012b9906f2 100644
--- a/scalability_and_performance/telco-core-rds.adoc
+++ 
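The interplay of zone-based spread and PDBs described above can be sketched with an explicit `TopologySpreadConstraint` and a matching `PodDisruptionBudget`. All names, replica counts, and thresholds here are illustrative:

```yaml
# Illustrative Deployment fragment: explicit zone-based spread for an HA app.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ha-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-ha-app
  template:
    metadata:
      labels:
        app: my-ha-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                                  # at most one replica difference between zones
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: my-ha-app
      containers:
      - name: app
        image: registry.example.com/my-ha-app:latest  # hypothetical image
---
# Matching PDB: tolerates one disrupted replica at a time.
# Never configure a PDB that allows zero disruptions; it blocks node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-ha-app
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-ha-app
```

With this pairing, draining every node in one zone evicts at most one replica, so the PDB is never violated and the upgrade of that zone's MCP can proceed in parallel.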
b/scalability_and_performance/telco-core-rds.adoc @@ -21,6 +21,8 @@ include::modules/telco-core-common-baseline-model.adoc[leveloffset=+1] include::modules/telco-core-deployment-planning.adoc[leveloffset=+1] +include::modules/telco-core-zones.adoc[leveloffset=+1] + [role="_additional-resources"] .Additional resources @@ -45,6 +47,7 @@ The following sections describe the various {product-title} components and confi include::modules/telco-core-cpu-partitioning-and-performance-tuning.adoc[leveloffset=+2] +include::modules/telco-core-workloads-on-schedulable-control-planes.adoc[leveloffset=+2] [role="_additional-resources"] .Additional resources @@ -73,6 +76,7 @@ include::modules/telco-core-cluster-network-operator.adoc[leveloffset=+3] .Additional resources * xref:../networking/networking_operators/cluster-network-operator.adoc#nw-cluster-network-operator_cluster-network-operator[Cluster Network Operator] +* xref:../networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc#configuring-egress-ips-ovn[Configuring an egress IP address] include::modules/telco-core-load-balancer.adoc[leveloffset=+3] @@ -187,6 +191,8 @@ include::modules/telco-core-node-configuration.adoc[leveloffset=+2] include::modules/telco-core-host-firmware-and-boot-loader-configuration.adoc[leveloffset=+2] +include::modules/telco-core-kubelet-settings.adoc[leveloffset=+2] + include::modules/telco-core-disconnected-environment.adoc[leveloffset=+2] [role="_additional-resources"] @@ -194,6 +200,7 @@ include::modules/telco-core-disconnected-environment.adoc[leveloffset=+2] * xref:../disconnected/updating/index.adoc#about-disconnected-updates[About cluster updates in a disconnected environment] +* xref:../nodes/containers/nodes-containers-sysctls.adoc#nodes-containers-sysctls[Using sysctl in containers] include::modules/telco-core-agent-based-installer.adoc[leveloffset=+2]