Server Troubleshooting and Resolution

Troubleshooting High CPU Usage in Linux

If Prometheus and Grafana indicate high CPU usage on your Linux system, follow these steps to investigate and resolve the issue:

Identify CPU-intensive processes

Use the top command to view real-time system statistics and identify processes consuming high CPU:

top

Or use htop for a more user-friendly interface:

htop

Analyze specific processes

For detailed information about a process, use:

ps aux | grep <process_name_or_PID>

Check system load average

View the system load average:

uptime
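A load average is easier to judge relative to the number of CPU cores. The following is a minimal sketch, not part of the original guide: the helper name load_per_core is hypothetical, and it simply divides a load value by a core count.

```shell
# load_per_core LOAD CORES
# Prints LOAD divided by CORES with two decimals (hypothetical helper).
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Example on Linux: 1-minute load from /proc/loadavg, core count from nproc.
# load_per_core "$(cut -d " " -f 1 /proc/loadavg)" "$(nproc)"
```

A value that stays well above 1.00 suggests more runnable work than the cores can handle.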

Monitor CPU usage over time

Use the sar command to collect, report, and save CPU usage data:

sudo sar -u 1 10

This command reports CPU usage every 1 second for 10 iterations.

Examine CPU core usage

To see CPU usage per core:

mpstat -P ALL 1 5

Investigate high I/O wait times

If I/O wait is high, use iostat to monitor disk I/O:

iostat -xz 1 10

Resolution steps:

Terminate unnecessary processes:

kill <PID>

or, if the process does not exit after the default signal, force kill:

kill -9 <PID>
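The two kill commands above can be combined so that a process first gets a chance to exit cleanly. This is a sketch under assumptions: the function name stop_gracefully and the 10-second default timeout are illustrative choices, not something the guide prescribes.

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]
# Sends SIGTERM, waits up to TIMEOUT_SECONDS, then falls back to SIGKILL.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}                        # assumed default, adjust as needed
    kill "$pid" 2>/dev/null || return 0     # process is already gone
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -9 "$pid" 2>/dev/null              # still alive: force kill
    return 0
}

# Example: stop_gracefully <PID> 5
```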

Adjust process priority:

renice +10 <PID>

Limit CPU usage for a process

Use cgroups or the cpulimit tool:

sudo cpulimit -p <PID> -l 50

Update or optimize software

Keep your system and applications up to date:

sudo apt update && sudo apt upgrade

Check for malware

Use tools like rkhunter or chkrootkit:

sudo rkhunter --check

Optimize system services

Disable unnecessary services. The --now flag also stops the service if it is currently running:

sudo systemctl disable --now <service_name>

Back up your system before making significant changes, and always test in a non-production environment first.

Troubleshooting and Resolving Low Memory Space in Linux

When your Linux system is running low on memory, follow these steps to diagnose and address the issue:

Check Current Memory Usage

Use the free command to view memory statistics:

free -h

For a more detailed view, use:

cat /proc/meminfo
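The MemTotal and MemAvailable fields of /proc/meminfo can be turned into a single percentage that is easy to check in a script. A minimal sketch follows; the function name mem_available_pct is hypothetical, and the optional file argument exists only so the parsing can be tried against a saved copy.

```shell
# mem_available_pct [MEMINFO_FILE]
# Prints MemAvailable as a whole-number percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Example: warn when less than 10% of memory is available (threshold is an assumption).
# [ "$(mem_available_pct)" -lt 10 ] && echo "low memory"
```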

Identify Memory-Intensive Processes

Use top or htop to see which processes are consuming the most memory:

# Use top
top

# Use htop
htop

Sort processes by memory usage in top by pressing Shift+M.

Analyze Specific Processes

For detailed information about a process's memory usage:

ps aux | grep <process_name_or_PID>

To see the memory map of a process:

pmap -x <PID>

Check for Memory Leaks

Use Valgrind to check for memory leaks in a specific application:

valgrind --leak-check=full /path/to/your/program

Monitor Swap Usage

Check swap space usage:

swapon --show

Examine System Logs

Look for any memory-related errors in system logs:

sudo journalctl -p err..emerg

Resolution steps

Terminate unnecessary processes:

kill <PID>

or force kill:

kill -9 <PID>

Clear Page Cache

To free up cached memory:

sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

Increase Swap Space

Create a new swap file:

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Add to /etc/fstab for persistence:

/swapfile none swap sw 0 0
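When adding the entry from a script, it helps to append it only if it is not already there, so repeated runs do not duplicate the line. The sketch below assumes a hypothetical helper named add_swap_entry that takes the target file as an argument; editing the real /etc/fstab requires root and is best preceded by a backup copy.

```shell
# add_swap_entry FSTAB_FILE
# Appends the swap entry only when an identical line is not already present.
add_swap_entry() {
    entry='/swapfile none swap sw 0 0'
    if grep -qxF "$entry" "$1"; then
        echo "entry already present"
    else
        printf '%s\n' "$entry" >> "$1"
        echo "entry added"
    fi
}

# Example: try it on a copy first.
# cp /etc/fstab /tmp/fstab.test && add_swap_entry /tmp/fstab.test
```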

Optimize Applications:

Update software to latest versions
Configure applications to use less memory
Use lightweight alternatives for resource-heavy applications

Implement Memory Limits

Use cgroups to set memory limits for services. On systems using cgroup v2 the property is MemoryMax; MemoryLimit is the older cgroup v1 name:

sudo systemctl set-property <service_name> MemoryMax=1G

Clean Up Disk Space

Remove unnecessary files and uninstall unused applications:

sudo apt autoremove
sudo apt clean

Consider Hardware Upgrades: If issues persist, consider adding more RAM to your system.

Troubleshooting and Resolving Low Disk Space on a Linux Server

Low disk space on a Linux server can cause various issues, including application crashes and system instability. This guide provides steps and commands to troubleshoot and resolve low disk space issues.

Check Disk Usage

Use the df command to check disk usage of all mounted filesystems.

df -h
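For scripted checks, the POSIX output format of df (df -P) is easier to parse than the human-readable one. The following sketch lists mount points at or above a usage threshold; the function name mounts_over and the 80 percent example threshold are assumptions for illustration.

```shell
# mounts_over THRESHOLD
# Reads `df -P` output on stdin and prints "mountpoint use%" for every
# filesystem whose usage is at or above THRESHOLD percent.
mounts_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                  # strip the percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example: df -P | mounts_over 80
```

Mount points containing spaces would need extra handling, which this sketch leaves out.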

Identify Large Files and Directories

Use the du command to identify large files and directories:

du -sh /path/to/directory/*

Find the 10 Largest Files and Directories on the Root Filesystem

du -ahx / | sort -rh | head -10

Clean Up Unnecessary Files

Remove Unnecessary Packages

sudo apt-get autoremove

Clear Systemd Journal Logs

sudo journalctl --vacuum-size=100M

Clear APT Cache (Debian/Ubuntu)

sudo apt-get clean

Delete Old Logs

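Remove only log files that have not been modified recently; deleting a log that a running service still holds open does not free its space until the service closes the file. The 30-day threshold is an example:

sudo find /var/log -type f -name "*.log" -mtime +30 -delete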

Investigate and Clear Docker Disk Usage

If you are using Docker, it can consume a significant amount of disk space.

Check Docker Disk Usage

sudo docker system df

Remove unused Docker data

sudo docker system prune -a

# or skip the confirmation prompt
sudo docker system prune -af

Implement log rotation using tools like logrotate to prevent log files from consuming too much disk space.
Consider adding more disk space or storage to the server if disk space issues persist.
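For the log rotation mentioned above, a small logrotate rule might look like the sketch below. The /var/log/myapp path, the weekly schedule, and the retention of four rotated files are all assumptions for illustration; the function name write_logrotate_conf is hypothetical and only writes the rule to the file you give it.

```shell
# write_logrotate_conf FILE
# Writes an example logrotate rule (hypothetical application path) to FILE.
write_logrotate_conf() {
    cat > "$1" <<'EOF'
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF
}

# Example: write the rule, then dry-run it with logrotate's debug mode.
# write_logrotate_conf /tmp/myapp.logrotate && logrotate -d /tmp/myapp.logrotate
```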

Tasks

1. Create K8s cluster with EKS
2. Deploy Microservices App
3. Deploy Prometheus Monitoring Stack
4. Monitor cluster Nodes
5. Monitor K8s components
6. Monitor 3rd Party Application (Redis): Deploy Redis Exporter
7. Monitor Own Application (using custom libraries for different programming languages)
8. Monitoring levels: Infrastructure level (CPU, RAM, Network), Platform level, Application level (3rd party application, own application)
9. Data visualizations: Prometheus UI, Grafana
10. Notifications: Alert rules / Alertmanager

1. Create a Kubernetes cluster on AWS EKS

Prerequisites

First, we will install and configure the AWS CLI as a prerequisite; see the AWS CLI documentation for installation steps.

Note: to create additional profiles, specify the profile name as shown:

aws configure --profile <named-profile>

To view the credentials file:

cat ~/.aws/credentials

To set the new profile as default run:

export AWS_PROFILE=<named-profile>

Confirm settings by running:

aws configure list

Ensure the IAM user (principal) used to create the cluster has the following permissions:
- CloudFormation: Full access
- EC2: Full: Tagging; Limited: List, Read, Write
- EC2 Auto Scaling: Limited: List, Write
- EKS: Full access
- IAM: Limited: List, Read, Write, Permissions Management
- Systems Manager: Limited: List, Read

We will be running our cluster with the admin user.

Install eksctl, the command-line tool for AWS EKS. The installation instructions can be found in the AWS EKS documentation.

# for ARM systems, set ARCH to: `arm64`, `armv6` or `armv7`
ARCH=amd64
PLATFORM=$(uname -s)_$ARCH

curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$PLATFORM.tar.gz"

# (Optional) Verify checksum
curl -sL "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_checksums.txt" | grep $PLATFORM | sha256sum --check

tar -xzf eksctl_$PLATFORM.tar.gz -C /tmp && rm eksctl_$PLATFORM.tar.gz

sudo mv /tmp/eksctl /usr/local/bin

Verify eksctl installation

eksctl version

Create the EKS cluster. To list the flags you can use to override the defaults, run:

eksctl create cluster --help

eksctl create cluster \
  --name shop-cluster \
  --region us-east-1 \
  --nodegroup-name shop-nodes \
  --node-type t2.micro \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 3

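Note that t2.micro nodes can schedule only a few pods each, so the later steps may need a larger node type or more nodes. Once the cluster is ready, we can run kubectl commands: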

kubectl config view

kubectl get nodes
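The same cluster settings can also be kept in an eksctl config file, which is easier to review and version than a long command line. The sketch below is an assumption about how the flags used earlier map onto eksctl's ClusterConfig format; in particular it assumes a managed node group, and the function name write_cluster_config is hypothetical.

```shell
# write_cluster_config FILE
# Writes an eksctl ClusterConfig mirroring the create-cluster flags above.
write_cluster_config() {
    cat > "$1" <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: shop-cluster
  region: us-east-1
managedNodeGroups:          # assumption: managed node group
  - name: shop-nodes
    instanceType: t2.micro
    desiredCapacity: 2
    minSize: 1
    maxSize: 3
EOF
}

# Example: write_cluster_config cluster.yaml && eksctl create cluster -f cluster.yaml
```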

2. Deploy Microservices App

We will deploy our microservices app with the following command:

kubectl apply -f config-microservices.yaml

3. Deploy Prometheus Monitoring Stack

While the microservices are starting up, we will add the prometheus-community Helm repository, update the local chart index, and then deploy the Prometheus monitoring stack.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo update

We will create a namespace called monitoring so that we can install Prometheus in its own namespace.

kubectl create namespace monitoring

helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring

Check the status by running the displayed command:

kubectl --namespace monitoring get pods -l "release=monitoring"

or run the following to view all the components of prometheus that were deployed:

kubectl get all -n monitoring

kubectl get configmap -n monitoring

kubectl get secret -n monitoring

# CRDs are cluster-scoped, so no namespace flag is needed
kubectl get crd
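Rather than re-running the status commands by hand, a script can block until the stack's pods report Ready. This is a sketch using kubectl wait; the function name wait_for_monitoring and the 300-second default timeout are assumptions, and pods that run to completion (such as one-off jobs) would need to be excluded if any remain in the namespace.

```shell
# wait_for_monitoring [TIMEOUT]
# Blocks until all pods in the monitoring namespace are Ready, or TIMEOUT passes.
wait_for_monitoring() {
    kubectl wait --namespace monitoring --for=condition=Ready pods --all \
        --timeout="${1:-300s}"
}

# Example: wait_for_monitoring 120s && echo "monitoring stack is ready"
```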

We can also save the StatefulSet manifest to a file for inspection:

kubectl get statefulset -n monitoring

kubectl get statefulset <name-of-statefulset> -n monitoring -o yaml > prom-state.yaml

The operator deployment can be saved the same way:

kubectl get deployment -n monitoring

kubectl get deployment <name-of-operator> -n monitoring -o yaml > operator.yaml

4. Monitor cluster Nodes

First we must answer the question: "What do we want to monitor?"
- We want to know when something unexpected happens
- We want to observe anomalies, e.g. CPU spikes, insufficient storage, high throughput/load, unauthorized/unauthenticated requests

With this information, we can act appropriately.

To access the Prometheus UI on localhost, we can do a port-forward:

kubectl port-forward service/monitoring-kube-prometheus-prometheus -n monitoring 9090:9090 &

Open the forwarded address in your browser:

http://127.0.0.1:9090

To access the Grafana visualization UI on localhost, we will port-forward as well:

kubectl port-forward service/monitoring-grafana -n monitoring 8080:80 &

Open the forwarded address in your browser:

http://127.0.0.1:8080

Alternatively, to access the Prometheus server from outside the cluster, create a new service of type NodePort:

kubectl expose service monitoring-kube-prometheus-prometheus -n monitoring --type=NodePort --target-port=9090 --name=prometheus-server-ext

Confirm the service and note the node port assigned to it:

kubectl get svc -n monitoring

Visit the node's public IP and that port in the browser:

http://[PUBLIC_IP]:[NODE_PORT]

The default Grafana username and password, which are stored base64-encoded in a secret, are:

user: admin
password: prom-operator
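If the password has been changed from the default, it can be read back from the secret and decoded. The sketch below assumes the secret is named monitoring-grafana (the Helm release name followed by -grafana) and that the password is stored under the admin-password key; the function name grafana_admin_password is hypothetical.

```shell
# grafana_admin_password [SECRET_NAME]
# Prints the decoded Grafana admin password from the given secret.
grafana_admin_password() {
    kubectl get secret "${1:-monitoring-grafana}" --namespace monitoring \
        -o jsonpath='{.data.admin-password}' | base64 --decode
}

# Example: grafana_admin_password
```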

We can import custom dashboards or create our own.

We will run a script that simulates sending many requests to our application so that we can test the monitoring and visualization. First, start a temporary pod that has curl available:

kubectl run curl-test --image=radial/busyboxplus:curl -i --tty --rm

Get the endpoint of the online shop by running the following command. Copy the endpoint of the frontend and paste it in your browser:

kubectl get svc

Within the busybox container, we will probe the endpoint of our application by running the following script. Save it in a file named test.sh:

for i in $(seq 1 10000)
do
  curl http://[frontend-endpoint] > test.text
done
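A variant of the loop above can count how many requests succeed instead of saving the response body, which makes failures under load visible in the terminal. This is a sketch, not the authors' script: the function name probe is hypothetical, and treating only HTTP 200 as success is an assumption.

```shell
# probe URL COUNT
# Sends COUNT requests to URL and prints how many returned HTTP 200.
probe() {
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$2" ]; do
        # -w prints only the status code; the body is discarded
        code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
        if [ "$code" = "200" ]; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok failed=$failed"
}

# Example: probe http://[frontend-endpoint] 1000
```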

Server Troubleshooting And Resolution Guide

Network Traffic Issues

Symptoms

Slow response times
High latency
Unexpected bandwidth usage

Troubleshooting Steps

Check network utilization: iftop -i <interface>
Analyze network connections: netstat -tuln
Monitor incoming/outgoing traffic: tcpdump -i <interface> -n
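When many connections are open, a per-state summary is often quicker to read than the raw list. The sketch below uses ss rather than the netstat command named above, which is a substitution on my part; the function name tcp_state_counts is hypothetical, and it only counts the first column of ss -tan output.

```shell
# tcp_state_counts
# Reads `ss -tan` output on stdin and prints "STATE COUNT" lines, sorted by state.
tcp_state_counts() {
    awk 'NR > 1 { count[$1]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Example: ss -tan | tcp_state_counts
```

A large number of connections stuck in one state, such as TIME-WAIT or SYN-RECV, can point to where to look next.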

Resolution

Optimize application code for network efficiency
Implement caching mechanisms
Consider load balancing or CDN solutions

Network Errors

Symptoms

Connection timeouts
DNS resolution failures
SSL/TLS errors

Troubleshooting Steps

Check DNS resolution: nslookup <domain>
Test network connectivity: ping <host> and traceroute <host>
Verify SSL/TLS configuration: openssl s_client -connect <host>:<port>

Resolution

Update DNS settings
Check firewall rules
Renew or reconfigure SSL/TLS certificates

Disk I/O Issues

Symptoms

High disk usage
Slow read/write operations
I/O wait time spikes

Troubleshooting Steps

Monitor disk I/O: iostat -x 1
Check disk usage: df -h and du -sh /*
Identify processes causing high I/O: iotop

Resolution

Optimize database queries
Implement proper indexing
Consider upgrading to SSDs or faster storage
Adjust file system parameters (e.g., noatime mount option)

General Tips

Always back up data before making significant changes
Keep system and application logs for reference
Regularly update and patch your systems
Monitor server performance consistently to catch issues early

Prepared By Devops Python Team

Nwanochie Emmanuel
Omolara Adeboye
Sarah Aligbe
Divine Onyekwuluje
Aisha Muhammad
