introduce profiles adding SingleReplica (SNO) topology on AWS #24

Merged: 8 commits, Feb 8, 2023

Conversation

mtulio (Owner) commented Feb 6, 2023

Introduce a SingleReplica (SNO) deployment profile on AWS that uses a cheaper instance type (m6id.xlarge: 4 vCPU, 16 GiB RAM) with much higher disk performance. The faster disk increases API stability and decreases the resource usage of disk-intensive components, such as the monitoring stack.

Containers without persistent storage (ephemeral) may lose their data when the EC2 instance is stopped; the profile is optimized for dev environments.

The other disk-layout improvement is a dedicated volume for etcd, which increases its stability because it no longer shares IOPS with system and cluster components.

Next step to decrease the costs:

@mtulio mtulio changed the title introduce profiles adding SNO introduce profiles adding SingleReplica (SNO) topology on AWS Feb 6, 2023
mtulio (Owner, Author) commented Feb 7, 2023

A couple of metrics indicate the low disk performance and its impact on the API.

  • API target group health metric:

Screenshot from 2023-02-05 23-16-29

  • Single Node EBS when running single/shared disk:

Screenshot from 2023-02-05 23-19-58

Screenshot from 2023-02-05 23-19-31

Screenshot from 2023-02-05 23-19-12

@mtulio mtulio marked this pull request as ready for review February 7, 2023 03:48
mtulio (Owner, Author) commented Feb 7, 2023

Performance after splitting the disks into different mount points:

$ tail -n 3 sno-run-etcdfio-*.txt
==> sno-run-etcdfio-root.txt <==
--------------------------------------------------------------------------------------------------------------------------------------------------------
INFO: 99th percentile of fsync is 3948544 ns
INFO: 99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd

==> sno-run-etcdfio-varlibcontainers.txt <==
--------------------------------------------------------------------------------------------------------------------------------------------------------
INFO: 99th percentile of fsync is 114176 ns
INFO: 99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd

==> sno-run-etcdfio-varlibetcd.txt <==
--------------------------------------------------------------------------------------------------------------------------------------------------------
INFO: 99th percentile of fsync is 5996544 ns
INFO: 99th percentile of the fsync is within the recommended threshold: - 20 ms, the disk can be used to host etcd
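
To make the margin against the threshold easier to read, the three fsync results above can be converted from nanoseconds to milliseconds. This is only a sketch: the values and the 20 ms threshold are copied from the output above, while the dictionary keys and function names are hypothetical and chosen for illustration.

```python
# p99 fsync latencies in nanoseconds, copied from the output above.
# The keys are illustrative labels derived from the result file names.
FSYNC_P99_NS = {
    "root": 3_948_544,
    "varlibcontainers": 114_176,
    "varlibetcd": 5_996_544,
}

# Threshold quoted in the tool output ("20 ms").
THRESHOLD_MS = 20.0


def ns_to_ms(ns):
    """Convert nanoseconds to milliseconds."""
    return ns / 1_000_000


def summarize(samples, threshold_ms=THRESHOLD_MS):
    """Map each label to (p99 in ms rounded to 3 decimals, within threshold?)."""
    return {
        name: (round(ns_to_ms(ns), 3), ns_to_ms(ns) < threshold_ms)
        for name, ns in samples.items()
    }


if __name__ == "__main__":
    for name, (ms, ok) in summarize(FSYNC_P99_NS).items():
        status = "within" if ok else "above"
        print(f"{name}: p99 fsync {ms} ms ({status} {THRESHOLD_MS} ms threshold)")
```

All three mount points land well under the threshold, with the container storage mount showing the lowest latency by a wide margin.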
  • Top pods before
$ head -n5 ~/opct/results/opct-sno-aws/info-top-pods-before.txt 
NAMESPACE                                          NAME                                                                    CPU(cores)   MEMORY(bytes)   
openshift-kube-apiserver                           kube-apiserver-ip-10-0-51-31                                            199m         2094Mi          
openshift-monitoring                               prometheus-k8s-0                                                        106m         1900Mi          
openshift-etcd                                     etcd-ip-10-0-51-31                                                      99m          505Mi           
openshift-kube-controller-manager                  kube-controller-manager-ip-10-0-51-31                                   17m          490Mi
  • Top pods after (Prometheus utilization drops)
$ head -n5 ~/opct/results/opct-sno-aws/sno2-info-top-pods-after.txt
NAMESPACE                                          NAME                                                         CPU(cores)   MEMORY(bytes)   
openshift-kube-apiserver                           kube-apiserver-ip-10-0-49-97                                 222m         1802Mi          
openshift-monitoring                               prometheus-k8s-0                                             39m          865Mi           
openshift-etcd                                     etcd-ip-10-0-49-97                                           70m          400Mi           
openshift-kube-controller-manager                  kube-controller-manager-ip-10-0-49-97                        12m          314Mi    
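
The relative change between the two listings above can be computed with a short script. This is a rough sketch: the rows are copied from the two outputs, the helper names are hypothetical, and pods are matched by namespace because the node name embedded in the pod names differs between the two runs. The two runs are single snapshots from different nodes, so the percentages are indicative only.

```python
# Rows copied from the "before" and "after" listings above
# (namespace, pod name, CPU in millicores, memory in Mi).
BEFORE = """\
openshift-kube-apiserver kube-apiserver-ip-10-0-51-31 199m 2094Mi
openshift-monitoring prometheus-k8s-0 106m 1900Mi
openshift-etcd etcd-ip-10-0-51-31 99m 505Mi
openshift-kube-controller-manager kube-controller-manager-ip-10-0-51-31 17m 490Mi
"""

AFTER = """\
openshift-kube-apiserver kube-apiserver-ip-10-0-49-97 222m 1802Mi
openshift-monitoring prometheus-k8s-0 39m 865Mi
openshift-etcd etcd-ip-10-0-49-97 70m 400Mi
openshift-kube-controller-manager kube-controller-manager-ip-10-0-49-97 12m 314Mi
"""


def parse_top(text):
    """Parse rows into {namespace: (cpu_millicores, memory_mi)}."""
    usage = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        namespace, _name, cpu, mem = line.split()
        # Strip the "m" and "Mi" unit suffixes before converting to int.
        usage[namespace] = (int(cpu[:-1]), int(mem[:-2]))
    return usage


def pct_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100


def deltas(before, after):
    """Return {namespace: (cpu % change, memory % change)}, rounded to 1 decimal."""
    return {
        ns: (
            round(pct_change(before[ns][0], after[ns][0]), 1),
            round(pct_change(before[ns][1], after[ns][1]), 1),
        )
        for ns in before
        if ns in after
    }


if __name__ == "__main__":
    for ns, (cpu, mem) in deltas(parse_top(BEFORE), parse_top(AFTER)).items():
        print(f"{ns}: CPU {cpu:+.1f}%, memory {mem:+.1f}%")
```

Matching by namespace works here only because each namespace contributes a single pod to the listing; a listing with several pods per namespace would need a different key.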
