introduce profiles adding SingleReplica (SNO) topology on AWS #24

mtulio · 2023-02-06T04:18:26Z

Introduce a SingleReplica (SNO) deployment on AWS using a cheaper instance type (m6id.xlarge: 4 vCPU, 16 GiB RAM) with much higher disk performance. The faster disks improve API stability and reduce the resource usage of disk-intensive components, such as the monitoring stack.

Containers without persistent storage (ephemeral) may lose their data when the EC2 instance is stopped; this layout is optimized for development environments.

The other improvement to the disk layout is a dedicated volume for etcd, which increases its stability because it no longer shares IOPS with system and cluster components.

Next step to reduce costs:

Support Spot instances to cut compute costs to 1/3: Support Spot Instances on AWS provider #25

mtulio · 2023-02-07T03:48:21Z

A couple of metrics indicate the low disk performance and its impact on the API.

API target group health metric:

Single Node EBS when running single/shared disk:

mtulio · 2023-02-07T03:56:35Z

Performance after splitting the disks into different mount points:

$ tail -n 3 sno-run-etcdfio-*.txt
==> sno-run-etcdfio-root.txt <==
--------------------------------------------------------------------------------------------------------------------------------------------------------
INFO: 99th percentile of fsync is 3948544 ns
INFO: 99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd

==> sno-run-etcdfio-varlibcontainers.txt <==
--------------------------------------------------------------------------------------------------------------------------------------------------------
INFO: 99th percentile of fsync is 114176 ns
INFO: 99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd

==> sno-run-etcdfio-varlibetcd.txt <==
--------------------------------------------------------------------------------------------------------------------------------------------------------
INFO: 99th percentile of fsync is 5996544 ns
INFO: 99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd
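The reports above give the 99th percentile fsync latency in nanoseconds, while the threshold is quoted in milliseconds. A minimal sketch of the conversion and comparison, assuming the line format shown in the output above (the function name `fsync_p99_ms` and the constant are illustrative, not part of the benchmark tool):

```python
import re

# Threshold quoted in the fio output above: 20 ms for the 99th percentile fsync.
FSYNC_P99_THRESHOLD_MS = 20.0

# Matches lines such as "INFO: 99th percentile of fsync is 3948544 ns".
FSYNC_LINE = re.compile(r"99th percentile of fsync is (\d+) ns")


def fsync_p99_ms(report: str) -> float:
    """Return the 99th percentile fsync latency in milliseconds."""
    match = FSYNC_LINE.search(report)
    if match is None:
        raise ValueError("no fsync percentile line found")
    return int(match.group(1)) / 1_000_000  # ns -> ms


# Values copied from the three reports above.
reports = {
    "root": "INFO: 99th percentile of fsync is 3948544 ns",
    "varlibcontainers": "INFO: 99th percentile of fsync is 114176 ns",
    "varlibetcd": "INFO: 99th percentile of fsync is 5996544 ns",
}

for name, text in reports.items():
    latency = fsync_p99_ms(text)
    verdict = "ok" if latency < FSYNC_P99_THRESHOLD_MS else "too slow"
    print(f"{name}: {latency:.3f} ms ({verdict})")
```

All three mount points come out at roughly 3.9 ms, 0.11 ms, and 6.0 ms respectively, well under the 20 ms threshold.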

Top pods before

$ head -n5 ~/opct/results/opct-sno-aws/info-top-pods-before.txt 
NAMESPACE                                          NAME                                                                    CPU(cores)   MEMORY(bytes)   
openshift-kube-apiserver                           kube-apiserver-ip-10-0-51-31                                            199m         2094Mi          
openshift-monitoring                               prometheus-k8s-0                                                        106m         1900Mi          
openshift-etcd                                     etcd-ip-10-0-51-31                                                      99m          505Mi           
openshift-kube-controller-manager                  kube-controller-manager-ip-10-0-51-31                                   17m          490Mi

Top pods after (Prometheus utilization drops)

$ head -n5 ~/opct/results/opct-sno-aws/sno2-info-top-pods-after.txt
NAMESPACE                                          NAME                                                         CPU(cores)   MEMORY(bytes)   
openshift-kube-apiserver                           kube-apiserver-ip-10-0-49-97                                 222m         1802Mi          
openshift-monitoring                               prometheus-k8s-0                                             39m          865Mi           
openshift-etcd                                     etcd-ip-10-0-49-97                                           70m          400Mi           
openshift-kube-controller-manager                  kube-controller-manager-ip-10-0-49-97                        12m          314Mi
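To put numbers on the difference between the two listings, the relative change per pod can be computed as below. This is a rough sketch: the values are copied from the output above, the two listings come from different nodes (ip-10-0-51-31 and ip-10-0-49-97), and each is a single point-in-time sample, so the percentages are indicative only. The helper `percent_change` is illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = reduction)."""
    return (after - before) / before * 100


# (CPU millicores, memory MiB) copied from the two listings above.
before = {
    "kube-apiserver": (199, 2094),
    "prometheus-k8s-0": (106, 1900),
    "etcd": (99, 505),
    "kube-controller-manager": (17, 490),
}
after = {
    "kube-apiserver": (222, 1802),
    "prometheus-k8s-0": (39, 865),
    "etcd": (70, 400),
    "kube-controller-manager": (12, 314),
}

for pod, (cpu_before, mem_before) in before.items():
    cpu_after, mem_after = after[pod]
    print(f"{pod}: CPU {percent_change(cpu_before, cpu_after):+.1f}%, "
          f"memory {percent_change(mem_before, mem_after):+.1f}%")
```

For prometheus-k8s-0 this gives about 63% less CPU and about 54% less memory, while kube-apiserver CPU is slightly higher in the second sample.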

roles/config/defaults/main.yaml

roles/bootstrap/defaults/main.yaml

playbooks/create_node.yaml

mtulio added 4 commits February 4, 2023 02:55

supporting stacks to create SNO node and resources

606ddaf

feat: intro profiles with sno/SingleReplica

f43e9e5

creating a working SNO with profiles

2c1ae43

doc: add sno install steps

61466b8

mtulio changed the title from "introduce profiles adding SNO" to "introduce profiles adding SingleReplica (SNO) topology on AWS" Feb 6, 2023

doc: deployment guide

d0bd957

mtulio mentioned this pull request Feb 7, 2023

Support Spot Instances on AWS provider #25

Open

mtulio marked this pull request as ready for review February 7, 2023 03:48

mtulio added 2 commits February 7, 2023 01:06

fix: rename from topology vars to cluster_profile

7d88849

doc: add disk layout

4b41767

mtulio commented Feb 7, 2023

Review comments (outdated, resolved):

roles/config/defaults/main.yaml

roles/bootstrap/defaults/main.yaml

playbooks/create_node.yaml

chore: remove unused comments

8219345

mtulio added profile/sno profile/ha provider/aws labels Feb 8, 2023

mtulio merged commit 55271eb into main Feb 8, 2023

mtulio deleted the sno-aws branch February 8, 2023 02:55

mtulio mentioned this pull request Apr 4, 2023

Create Documentation for Single Node Installation (non-assisted installer) okd-project/planning#28

Closed
