Server Troubleshooting and Resolution
If Prometheus and Grafana indicate high CPU usage on your Linux system, follow these steps to investigate and resolve the issue:
- Identify CPU-intensive processes
Use the top command to view real-time system statistics and identify processes consuming high CPU:
top
Or use htop for a more user-friendly interface:
htop
- Analyze specific processes
For detailed information about a process, use:
ps aux | grep <process_name_or_PID>
- Check system load average
View the system load average:
uptime
- Monitor CPU usage over time
Use the sar command (from the sysstat package) to collect, report, and save CPU usage data:
sudo sar -u 1 10
This command reports CPU usage every 1 second for 10 iterations.
- Examine per-core CPU usage
To see CPU usage for each core:
mpstat -P ALL 1 5
- Investigate high I/O wait times
If I/O wait is high, use iostat to monitor disk I/O:
iostat -xz 1 10
- Terminate unnecessary processes
kill <PID>
or force-kill it (SIGKILL gives the process no chance to clean up):
kill -9 <PID>
- Adjust process priority (a higher nice value means a lower priority):
renice +10 <PID>
- Limit CPU usage for a process
Use cgroups or the cpulimit tool (here, at most 50% of one CPU core):
sudo cpulimit -p <PID> -l 50
- Update or optimize software
Keep your system and applications up to date:
sudo apt update && sudo apt upgrade
- Check for malware
Use tools like rkhunter or chkrootkit:
sudo rkhunter --check
- Optimize system services
Disable unnecessary services (add --now to also stop them immediately):
sudo systemctl disable <service_name>
Back up your system before making significant changes, and always test in a non-production environment first.
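The load-average and per-process checks above can be combined into a quick triage script. This is a minimal sketch. It assumes a Linux host with procps `ps` (for `--sort`) and `/proc/loadavg`. The function names `load_ratio` and `cpu_triage` are hypothetical, and treating a load-per-core ratio above 1.0 as saturation is only a rule of thumb.

```shell
#!/usr/bin/env bash
# Hypothetical CPU triage sketch for Linux (assumes procps `ps` and /proc/loadavg).

# load_ratio LOAD CORES: print LOAD divided by CORES with two decimals.
load_ratio() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# cpu_triage: show the load per core, then the five busiest processes by %CPU.
cpu_triage() {
  local load cores ratio
  read -r load _ < /proc/loadavg   # first field is the 1-minute load average
  cores=$(nproc)
  ratio=$(load_ratio "$load" "$cores")
  echo "1-min load: $load on $cores core(s) -> $ratio per core"
  # Rule of thumb (assumption): above 1.00 means more runnable tasks than cores.
  if awk -v r="$ratio" 'BEGIN { exit !(r > 1.0) }'; then
    echo "CPU looks saturated"
  fi
  ps -eo pid,user,%cpu,comm --sort=-%cpu | head -n 6
}

# Run the triage only when the file is executed, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  cpu_triage
fi
```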
When your Linux system is running low on memory, follow these steps to diagnose and address the issue:
- Check Current Memory Usage
Use the free command to view memory statistics:
free -h
or, for a more detailed view:
cat /proc/meminfo
- Identify Memory-Intensive Processes
Use top or htop to see which processes are consuming the most memory:
# Use top
top
# Use htop
htop
In top, press Shift+M to sort processes by memory usage.
- Analyze Specific Processes
For detailed information about a process's memory usage:
ps aux | grep <process_name_or_PID>
To see the memory map of a process:
pmap -x <PID>
- Check for Memory Leaks
Use Valgrind to check for memory leaks in a specific application:
valgrind --leak-check=full /path/to/your/program
- Monitor Swap Usage
Check swap space usage:
swapon --show
- Examine System Logs
Look for memory-related errors (for example, out-of-memory killer messages) in the system logs:
sudo journalctl -p err..emerg
- Terminate unnecessary processes
kill <PID>
or force-kill it:
kill -9 <PID>
- Clear Page Cache
To free up cached memory (the kernel reclaims page cache on its own under memory pressure, so this rarely fixes a real shortage):
sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
- Increase Swap Space
Create a new swap file:
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Add it to /etc/fstab for persistence:
/swapfile none swap sw 0 0
- Optimize Applications:
- Update software to latest versions
- Configure applications to use less memory
- Use lightweight alternatives for resource-heavy applications
- Implement Memory Limits
Use cgroups to set memory limits for services (MemoryMax= is the cgroup v2 setting; older cgroup v1 systems use the deprecated MemoryLimit=):
sudo systemctl set-property <service_name> MemoryMax=1G
- Clean Up Disk Space
Remove unnecessary files and uninstall unused applications:
sudo apt autoremove
sudo apt clean
- Consider Hardware Upgrades
If issues persist, consider adding more RAM to your system.
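The memory checks above can also be scripted. This sketch assumes a Linux kernel that exposes `MemAvailable` in `/proc/meminfo` (3.14 or newer). `mem_available_pct` and `mem_triage` are hypothetical names. The optional file argument exists only so that the parser can be tested against sample input.

```shell
#!/usr/bin/env bash
# Hypothetical memory triage sketch for Linux (assumes /proc/meminfo and procps `ps`).

# mem_available_pct [FILE]: print MemAvailable as a percentage of MemTotal.
# FILE defaults to /proc/meminfo; passing another path makes the parser testable.
mem_available_pct() {
  awk '/^MemTotal:/ { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# mem_triage: report available memory and the five largest processes by RSS.
mem_triage() {
  echo "Available memory: $(mem_available_pct)%"
  # RSS is reported in KiB; sorting on it shows the biggest resident users first.
  ps -eo pid,user,%mem,rss,comm --sort=-rss | head -n 6
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  mem_triage
fi
```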
Low disk space on a Linux server can cause various issues, including application crashes and system instability. This guide provides steps and commands to troubleshoot and resolve low disk space issues.
- Check Disk Usage
Use the df command to check disk usage of all mounted filesystems.
df -h
- Identify Large Files and Directories
Use the du command to identify large files and directories:
du -sh /path/to/directory/*
Find the 10 largest files and directories on the root filesystem (-x stays on that one filesystem):
sudo du -ahx / | sort -rh | head -10
- Clean Up Unnecessary Files
- Remove Unnecessary Packages
sudo apt-get autoremove
- Clear Systemd Journal Logs (shrink the journal to 100 MB)
sudo journalctl --vacuum-size=100M
- Clear APT Cache (Debian/Ubuntu)
sudo apt-get clean
- Delete Old Logs
Deleting active .log files does not free space while a service still holds them open, and it loses recent logs. Remove old rotated archives instead:
sudo find /var/log -type f -name "*.gz" -mtime +30 -delete
- Investigate and Clear Docker Disk Usage
If you are using Docker, it can consume a significant amount of disk space.
- Check Docker Disk Usage
sudo docker system df
- Remove unused Docker data
sudo docker system prune -a
# -a also removes every image not used by a container; add -f to skip the confirmation prompt
sudo docker system prune -af
- Implement log rotation using tools like logrotate to prevent log files from consuming too much disk space.
- Consider adding more disk space or storage to the server if disk space issues persist.
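To catch low disk space before it causes failures, you can filter the `df` output against a threshold. This is a minimal sketch. It assumes POSIX-format `df -P` output and mount points without spaces. `over_threshold` is a hypothetical name, and the 80% default is an arbitrary example.

```shell
#!/usr/bin/env bash
# Hypothetical disk-usage check: list filesystems at or above a usage threshold.

# over_threshold LIMIT: read `df -P` output on stdin and print "<mount> <use%>"
# for every filesystem whose capacity is >= LIMIT percent.
# Assumes mount points contain no spaces (df -P puts them in the last column).
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  # The 80% default threshold is an arbitrary example value.
  df -P | over_threshold "${1:-80}"
fi
```

For example, running the script with the argument `90` would list only the filesystems that are at least 90% full.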
1. Create a K8s cluster with EKS
2. Deploy the microservices app
3. Deploy the Prometheus monitoring stack
4. Monitor cluster nodes
5. Monitor K8s components
6. Monitor a third-party application (Redis): deploy the Redis Exporter
7. Monitor our own application (using custom libraries for different programming languages)
8. Monitoring levels: infrastructure level (CPU, RAM, network), platform level, and application level (third-party application, own application)
9. Data visualizations: Prometheus UI, Grafana
10. Notifications: alert rules / Alertmanager
- Prerequisites
- First, install the AWS CLI and configure it with aws configure (see the AWS CLI documentation).
Note: to create additional profiles, specify the profile name as shown:
aws configure --profile <named-profile>
To view the credentials file:
cat ~/.aws/credentials
To set the new profile as default run:
export AWS_PROFILE=<named-profile>
Confirm settings by running:
aws configure list
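Before you create the cluster, it helps to confirm which IAM identity the CLI will actually use. This is a minimal sketch. It assumes the AWS CLI is installed. `aws sts get-caller-identity` is a real AWS CLI command, while the `aws_whoami` wrapper is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: show which IAM identity a given AWS CLI profile resolves to.

# aws_whoami [PROFILE]: print the caller ARN for PROFILE
# (defaults to $AWS_PROFILE, then to "default").
aws_whoami() {
  local profile="${1:-${AWS_PROFILE:-default}}"
  aws sts get-caller-identity --profile "$profile" --query Arn --output text
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  if command -v aws >/dev/null 2>&1; then
    aws_whoami "$@"
  else
    echo "aws CLI not found on PATH" >&2
  fi
fi
```

Running `aws_whoami <named-profile>` should print the ARN of the user whose permissions you are about to check.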
- Ensure the principal IAM user used to create the cluster has the following permissions:
- CloudFormation: Full access
- EC2: Full: Tagging; Limited: List, Read, Write
- EC2 Auto Scaling: Limited: List, Write
- EKS: Full access
- IAM: Limited: List, Read, Write, Permissions management
- Systems Manager: Limited: List, Read
We will be running our cluster with the admin user.
- Install eksctl (the command-line tool for Amazon EKS). The installation instructions can be found in the AWS EKS documentation:
# for ARM systems, set ARCH to: `arm64`, `armv6` or `armv7`
ARCH=amd64
PLATFORM=$(uname -s)_$ARCH
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$PLATFORM.tar.gz"
# (Optional) Verify checksum
curl -sL "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_checksums.txt" | grep $PLATFORM | sha256sum --check
tar -xzf eksctl_$PLATFORM.tar.gz -C /tmp && rm eksctl_$PLATFORM.tar.gz
sudo mv /tmp/eksctl /usr/local/bin
- Verify eksctl installation
eksctl version
- Create the EKS cluster. To see the flags you can use to override the defaults, run:
eksctl create cluster --help
Then create the cluster:
eksctl create cluster \
--name shop-cluster \
--region us-east-1 \
--nodegroup-name shop-nodes \
--node-type t2.micro \
--nodes 2 \
--nodes-min 1 \
--nodes-max 3
- We can now run kubectl commands. Run:
kubectl config view
kubectl get nodes
We will deploy our microservices app with the following command:
kubectl apply -f config-microservices.yaml
- While the microservices are starting up, we will add the prometheus-community Helm repository and update it so we can deploy the Prometheus monitoring stack:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
- We will create a namespace called monitoring so that we can install Prometheus in its own namespace.
kubectl create namespace monitoring
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring
Check the status by running the displayed command:
kubectl --namespace monitoring get pods -l "release=monitoring"
or run the following to view all the components of prometheus that were deployed:
kubectl get all -n monitoring
kubectl get configmap -n monitoring
kubectl get secret -n monitoring
kubectl get crd -n monitoring
We can also save the StatefulSet's manifest to a file:
kubectl get statefulset -n monitoring
kubectl get statefulset <name-of-statefulset> -n monitoring -o yaml > prom-state.yaml
We can also save the operator's Deployment manifest to a file:
kubectl get deployment -n monitoring
kubectl get deployment <name-of-operator> -n monitoring -o yaml > operator.yaml
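Instead of scanning the `kubectl get all` output by eye, you can use a small filter to flag pods that are not ready yet. This is a minimal sketch. It assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE), and `not_ready_pods` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods that are not fully up yet.

# not_ready_pods: read `kubectl get pods --no-headers` output on stdin and print
# "<name> <ready> <status>" for pods that are not Completed and are either not
# Running or have fewer ready containers than expected (e.g. 1/2).
not_ready_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  # The namespace defaults to "monitoring", matching the Helm install above.
  kubectl get pods -n "${1:-monitoring}" --no-headers | not_ready_pods
fi
```

An empty result means every pod in the namespace is Running with all containers ready, or has Completed.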
- First, we must answer the question: "What do we want to monitor?"
- We want to know when something unexpected happens
- We want to observe anomalies, e.g. CPU spikes, insufficient storage, high throughput/load, and unauthorized or unauthenticated requests
With this information, we can act appropriately.
- To access the Prometheus UI on localhost, port-forward the Prometheus service:
kubectl port-forward service/monitoring-kube-prometheus-prometheus -n monitoring 9090:9090 &
Then open this address in your browser:
http://127.0.0.1:9090
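With the port-forward running, you can query the same data from a shell through the Prometheus HTTP API. `/-/ready` and `/api/v1/query` are standard Prometheus endpoints. This sketch assumes the port-forward on 127.0.0.1:9090 and that node-exporter metrics (`node_cpu_seconds_total`) are being scraped, which kube-prometheus-stack does by default. `PROM_URL`, `prom_ready`, `prom_query`, and `CPU_BUSY_QUERY` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for querying Prometheus through the port-forward.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# CPU busy percentage per node: 100 minus the idle share over the last 5 minutes.
CPU_BUSY_QUERY='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# prom_ready: succeed if Prometheus answers on its readiness endpoint.
prom_ready() {
  curl -fsS "$PROM_URL/-/ready" >/dev/null
}

# prom_query EXPR: run an instant query and print the raw JSON response.
# -G sends the --data-urlencode value as a URL query parameter on a GET request.
prom_query() {
  curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  if prom_ready; then
    prom_query "$CPU_BUSY_QUERY"
    echo
  else
    echo "Prometheus is not reachable at $PROM_URL" >&2
  fi
fi
```

The same PromQL expression can be pasted into the Prometheus UI's query box to graph CPU usage per node.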
- To access the Grafana visualization UI on localhost, port-forward the Grafana service:
kubectl port-forward service/monitoring-grafana -n monitoring 8080:80 &
Then open this address in your browser:
http://127.0.0.1:8080
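Alternatively, to access the Prometheus server from outside the cluster, expose it through a new NodePort service:
kubectl expose service monitoring-kube-prometheus-prometheus -n monitoring --type=NodePort --target-port=9090 --name=prometheus-server-ext
Confirm that the service was created and note its node port:
kubectl get svc -n monitoring
Then visit a node's public IP and the node port in your browser (the node security group must allow inbound traffic on that port):
http://[PUBLIC_IP]:[NODE_PORT]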
The default Grafana username and password, which are stored base64-encoded in a Kubernetes secret, are:
- user: admin
- password: prom-operator
We can import custom dashboards or create our own.
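If the defaults have been changed, you can read the credentials back from the secret. This is a minimal sketch. It assumes the Helm release is named `monitoring`, so the Grafana chart's secret is `monitoring-grafana` with the keys `admin-user` and `admin-password`. `grafana_secret_value` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Grafana's admin credentials from its Kubernetes secret.
# Assumes the Helm release "monitoring", giving a secret named "monitoring-grafana".

# grafana_secret_value KEY [NAMESPACE]: print the base64-decoded value of KEY.
grafana_secret_value() {
  local key="$1" ns="${2:-monitoring}"
  kubectl get secret monitoring-grafana -n "$ns" \
    -o jsonpath="{.data.$key}" | base64 --decode
  echo
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  echo "user: $(grafana_secret_value admin-user)"
  echo "password: $(grafana_secret_value admin-password)"
fi
```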
- We will run a script that sends many requests to our application so that we can test the monitoring and visualization setup. First, start a temporary busybox pod with curl:
kubectl run curl-test --image=radial/busyboxplus:curl -i --tty --rm
In a separate terminal, get the endpoint of the online shop, then copy the frontend endpoint and open it in your browser:
kubectl get svc
Inside the busybox container, save the following script in a file named test.sh and run it with sh test.sh. It probes the application's endpoint 10,000 times:
for i in $(seq 1 10000)
do
  curl -s -o /dev/null http://[frontend-endpoint]
done
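A variant of test.sh can count the HTTP status codes instead of discarding every response, so failed requests become visible. This is a minimal sketch. It is written in POSIX sh on the assumption that it should also run inside the busybox pod. `send_requests` is a hypothetical name, and `[frontend-endpoint]` stays a placeholder.

```shell
#!/bin/sh
# Hypothetical load-test variant of test.sh: send COUNT requests to URL and
# tally the HTTP status codes. POSIX sh only, so it can run in the busybox pod.

# send_requests URL [COUNT]: print "<times> <status>" lines, e.g. "100 200".
send_requests() {
  url="$1"
  count="${2:-100}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s silences progress, -o discards the body, -w prints only the status code
    # ("000" means the connection itself failed).
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done | sort | uniq -c | awk '{ print $1, $2 }'
}

# Only send traffic when a URL is passed on the command line.
if [ "$#" -gt 0 ]; then
  send_requests "$@"
fi
```

For example, `sh load.sh http://[frontend-endpoint] 1000` (the file name is hypothetical) prints one line per status code. A run in which every request succeeded would print only a line for `200`.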
Network Performance Issues
Symptoms:
- Slow response times
- High latency
- Unexpected bandwidth usage
Diagnosis:
- Check network utilization:
sudo iftop -i <interface>
- Analyze network connections (-l limits the output to listening sockets; ss -tuln is the modern equivalent):
netstat -tuln
- Monitor incoming/outgoing traffic:
sudo tcpdump -i <interface> -n
Resolution:
- Optimize application code for network efficiency
- Implement caching mechanisms
- Consider load balancing or CDN solutions
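When iftop is not installed, you can estimate per-interface throughput from the kernel's byte counters. This is a minimal sketch. It assumes Linux's `/proc/net/dev` layout (two header lines, receive bytes in the first counter column, transmit bytes in the ninth) and an integer sampling interval. `iface_bytes` and `iface_rate` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical throughput estimate from /proc/net/dev counters (Linux only).

# iface_bytes IFACE [FILE]: print "<rx_bytes> <tx_bytes>" for IFACE.
iface_bytes() {
  awk -v ifc="$1" 'NR > 2 {
    sub(/:/, " ")            # "eth0:123" -> "eth0 123"; awk re-splits the fields
    if ($1 == ifc) print $2, $10
  }' "${2:-/proc/net/dev}"
}

# iface_rate IFACE [SECONDS]: sample twice and print average bytes per second.
iface_rate() {
  local iface="$1" secs="${2:-5}" rx1 tx1 rx2 tx2
  read -r rx1 tx1 < <(iface_bytes "$iface")
  sleep "$secs"
  read -r rx2 tx2 < <(iface_bytes "$iface")
  echo "$iface rx: $(( (rx2 - rx1) / secs )) B/s, tx: $(( (tx2 - tx1) / secs )) B/s"
}

if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  iface_rate "$@"
fi
```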
Network Connectivity Issues
Symptoms:
- Connection timeouts
- DNS resolution failures
- SSL/TLS errors
Diagnosis:
- Check DNS resolution:
nslookup <domain>
- Test network connectivity:
ping <host>
traceroute <host>
- Verify SSL/TLS configuration:
openssl s_client -connect <host>:<port>
Resolution:
- Update DNS settings
- Check firewall rules
- Renew or reconfigure SSL/TLS certificates
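The DNS and certificate checks can be combined to show how many days remain before a certificate expires. This is a minimal sketch. It assumes GNU `date` (for `date -d`), `getent`, and `openssl`. `days_until` and `cert_end_date` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical connectivity check: DNS lookup plus days until the TLS certificate
# expires. Assumes GNU date (for `date -d`), getent, and openssl.

# days_until DATE [NOW_EPOCH]: whole days from NOW_EPOCH (default: now) to DATE.
days_until() {
  local end now
  end=$(date -d "$1" +%s) || return 1
  now="${2:-$(date +%s)}"
  echo $(( (end - now) / 86400 ))
}

# cert_end_date HOST [PORT]: print the certificate's notAfter date.
cert_end_date() {
  local host="$1" port="${2:-443}"
  # -servername sends SNI so that virtual-hosted sites return the right certificate.
  openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2
}

if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  host="$1"
  getent hosts "$host" || echo "DNS lookup failed for $host" >&2
  end=$(cert_end_date "$host" "${2:-443}")
  [[ -n "$end" ]] && echo "Certificate expires $end ($(days_until "$end") days)"
fi
```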
Disk I/O Issues
Symptoms:
- High disk usage
- Slow read/write operations
- I/O wait time spikes
Diagnosis:
- Monitor disk I/O:
iostat -x 1
- Check disk usage:
df -h
du -sh /*
- Identify processes causing high I/O:
sudo iotop
Resolution:
- Optimize database queries
- Implement proper indexing
- Consider upgrading to SSDs or faster storage
- Adjust file system parameters (e.g., the noatime mount option)
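You can confirm an I/O wait spike without sysstat by sampling the kernel's CPU time counters. This is a minimal sketch. It assumes the Linux `/proc/stat` layout, where the aggregate `cpu` line starts with user, nice, system, idle, iowait, irq, softirq, and steal. `iowait_pct` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical I/O-wait check: sample the aggregate "cpu" line of /proc/stat twice
# and report the share of CPU time spent waiting on I/O (Linux only).

# iowait_pct LINE1 LINE2: each LINE is "cpu user nice system idle iowait irq
# softirq steal ..."; print iowait as a percentage of the elapsed CPU time.
# Only the first eight counters are summed; guest time is already in user/nice.
iowait_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 2; i <= 9 && i <= n; i++) total += y[i] - x[i]
    if (total > 0) printf "%.1f\n", (y[6] - x[6]) * 100 / total
  }'
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  first=$(head -n 1 /proc/stat)
  sleep "${1:-1}"
  second=$(head -n 1 /proc/stat)
  echo "iowait: $(iowait_pct "$first" "$second")%"
fi
```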
Best Practices
- Always back up data before making significant changes
- Keep system and application logs for reference
- Regularly update and patch your systems
- Monitor server performance consistently to catch issues early