Alert for high server network/ram/disk/cpu usage "heartbeat" monitoring #819

Open
1 task done
OryonMax opened this issue Oct 28, 2021 · 46 comments
Labels
area:monitor Everything related to monitors feature-request Request for new features to be added type:new proposing to add a new monitor

Comments

@OryonMax

🏷️ Feature Request Type

New Monitor

🔖 Feature description

Please add Heartbeat Monitoring just like in HetrixTools.

✔️ Solution

Add a new monitor type that shows a server's network, RAM, disk and CPU usage and sends an alert when usage approaches 90%, so people know it's time to upgrade or add a new node.

❓ Alternatives

HetrixTools

📝 Additional Context

No response

👀 Have you spent some time to check if this feature request has been raised before?

  • I checked and didn't find a similar feature request
@OryonMax OryonMax added the feature-request Request for new features to be added label Oct 28, 2021
@deefdragon
Contributor

I feel this is out of scope for UK (for now at least). For remote servers, if the usage stats are that important, something more tuned to metrics tracking (Prometheus/Grafana etc.) should likely be employed.

(You could look into making a push monitor and writing a script yourself like this if you really need this feature for something).

@OryonMax
Author

Everybody needs that nowadays, and most status pages are paid. I hope to see this feature in Uptime Kuma.

@markdesilva

As @deefdragon said, this might be out of scope for what UK does. However, if you use push monitors with command-line utilities like "mpstat" and "free", and if the UK developers allow users to change the units and description of what is monitored (as I asked for in #749), then UK could give you individual graphs of CPU and memory utilization. The only catch is that you won't have one page with all the metrics together.

@OryonMax
Author

Not allowed in UK?

@markdesilva

I didn't say it's not allowed, but as mentioned, it's not in the scope of what UK does. So even if the developer decides to put it in, it probably won't be a priority, unless you want to code that portion yourself and make a pull request for your work to be included in a future release.

@ImmaZoni

@OrefaSol to provide some insight on why this is not really in scope.

A typical "status" service does nothing other than ping a server: basically "Hey, you there?", and it responds or doesn't.
Yes or no. This ping does not provide data on CPU, RAM, storage, or anything else. All you get is "yes, I'm alive", and how long it took.

Because of this, Uptime Kuma is a permissionless service, meaning I don't need to approve Uptime Kuma talking to my website, or any website really.

If Louis were to try and implement something like this it would require a separate program/script that would go on the website you want to test and send extra info over to Uptime Kuma. So it requires direct access to every service you want to test and get this data on.

HetrixTools offers various services, one being a status service, and another being a server monitoring service.

Uptime Kuma (In its current form at least) is strictly a status service.

@OryonMax
Author

Nope, HetrixTools has Heartbeat Monitoring under its Uptime Monitoring product.

@markdesilva

Sounds like maybe you should be using HetrixTools then as you’re obviously a fan.

@rihards-simanovics

Nope, HetrixTools has Heartbeat Monitoring under its Uptime Monitoring product.

What you are asking here is out of scope (perhaps for now), full stop.

Unless you are willing to code it yourself, wait until the UK developer does it.

Surely you understand that everyone who develops and contributes to a free and open-source product dedicates their free time to do so. The feature that you are asking for is from a paid product, and there is a reason why it's paid: the money goes to a developer for their hard work.

@louislam
Owner

Everyone relax🐻.

Just follow one rule. If you love the suggestion, give a 👍.

Ignore it if you don't like it.

@zimbres

zimbres commented Dec 31, 2021

I use https://www.netdata.cloud/

@rihards-simanovics

@zimbres, this is amazing and open source. Hmm, I think I might run this for my client reports... I will keep using UK for internal services, as those don't require generated reports. Thanks for sharing the tool name!

PS: Perhaps UK might use some of the source code or take inspiration from that tool, as it looks quite nice.

@ririko5834

ririko5834 commented Apr 22, 2022

This should work like HetrixTools does, so that you can also get stats about RAM, CPU, disk, network, etc. displayed in a graph on the status page.

@markdesilva

Here we go again with Hetrix tools.

I wonder if all these folks suggesting UK work like Hetrixtools just want the HT functions cos they want the unlimited HT functionality without paying for it.

Sounds like it, doesn’t it? 🤷🏻‍♂️

@rihards-simanovics

rihards-simanovics commented Apr 22, 2022

@ririko5834 the basic answer to your request is "maybe in the future".

UptimeKuma is a relatively simple uptime monitoring application running on Node.js. I'm not saying that Node.js is a bad choice; I am merely pointing out that a different language would be more suitable given the performance requirements of what you are asking for.

As @markdesilva pointed out, and I agree with them, if you favour Hetrix Tools, you need to support the developer by getting a paid plan. UptimeKuma may be an open-source project for now, but I'm sure that when the time comes, the author will also want to have their own paid plans alongside open source for those people who don't want the hassle of setting one up themselves.

That being said, keep in mind that, as pointed out by @ImmaZoni, to know the server's hardware status, the author would have to develop a separate "companion" app and ask users to install it on the server, where it would push the CPU, RAM, etc. information to UptimeKuma. If the last stable release (v1.14.0) is anything to go by, the author wants this application to just run without any additional hoops to jump through (ref. the Cloudflare proxy functionality).

EDIT:
Almost forgot, @zimbres also noted that there is another open-source tool called netdata that you can use to monitor server hardware status.

@InSelfControll

InSelfControll commented Dec 19, 2022

Hey, you don't have to install anything on the server. You just need to give uptime-kuma the option to connect via SSH to every server, get this info, and then parse it for the status page.

I have made a bash script that sends details like this to my email once disk usage passes 75%; the same goes for RAM and CPU.

I just need to find the right way to send the data to uptime-kuma now, so that it can send it to me via Telegram.

@rihards-simanovics

rihards-simanovics commented Dec 20, 2022

@InSelfControll

Hey, you don't have to install anything on the server. You just need to give uptime-kuma the option to connect via SSH to every server, get this info, and then parse it for the status page.

Do you even realise how dangerous this is? Openly allowing an application (of all things) to have access to a server via SSH? It's almost as if security holes don't exist. So now the hacker, instead of hacking six of my servers, only needs to hack one of my servers to get SSH access to all the others.

The best and most secure way is to have a dedicated client application that would receive a request (be it via a web URL or otherwise), process it and send a JSON response to an API on the UK side, or alternatively just send the JSON data at an interval, so there is only one-way communication from the server to UK.
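
The one-way, interval-based reporting described above can be approximated today with a push monitor instead of a JSON API. The following is only a sketch: `push_url`, the host `uk.example.com`, the token `TOKEN` and the default limit of 90 are hypothetical, and the `status=` and `msg=` query parameters are the ones used with the push monitor elsewhere in this thread.

```bash
#!/bin/bash
# Sketch: build a push URL whose status depends on a usage threshold.
# push_url is a hypothetical helper, not part of Uptime Kuma.
push_url() {
  local base="$1" token="$2" pct="$3" limit="${4:-90}"
  local status="up"
  if [ "$pct" -gt "$limit" ]; then
    status="down"
  fi
  # %%20 and %%25 print the literal URL escapes for a space and a percent sign.
  printf '%s/api/push/%s?status=%s&msg=usage%%20%s%%25\n' \
    "$base" "$token" "$status" "$pct"
}

# Example cron usage (placeholder host and token, not executed here):
#   curl -fsS "$(push_url https://uk.example.com TOKEN 93)" >/dev/null
```

Because the monitored server only makes outbound requests, nothing on the UK side needs credentials for that server.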

What you've proposed breaks security best practices on so many levels.

@InSelfControll

InSelfControll commented Dec 20, 2022

This user doesn't need any permissions except the df -h and free -m commands. You can always restrict a user to only one or two commands, or give the user limited SSH access that only allows sending these commands via SSH, @rihards-simanovics. You don't have to give full SSH login access, so there are no security issues.
Today I have it automatically sent to my email / Telegram from each server.
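
One common way to get the restriction described here is an OpenSSH forced command: the `command="..."` option in `authorized_keys` makes sshd run a fixed wrapper, and the command the client asked for is passed in the `SSH_ORIGINAL_COMMAND` environment variable. A minimal sketch follows; the wrapper path, key and function name are hypothetical, and it does not settle the security disagreement in this thread.

```bash
#!/bin/bash
# Hypothetical wrapper, e.g. saved as /usr/local/bin/metrics-only.sh and
# referenced from the monitoring user's authorized_keys like this:
#   command="/usr/local/bin/metrics-only.sh",no-port-forwarding,no-pty ssh-ed25519 <public key> monitor

# Run only the two whitelisted commands; refuse everything else.
run_allowed() {
  case "$1" in
    "df -h")   df -h ;;
    "free -m") free -m ;;
    *)         echo "command not permitted" >&2; return 1 ;;
  esac
}

# sshd exports the requested command as SSH_ORIGINAL_COMMAND.
# An interactive login (no command) falls through and gets nothing.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
  run_allowed "$SSH_ORIGINAL_COMMAND"
fi
```

The match is on the exact command string, so variations such as `df -h /tmp` are rejected as well.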

@rihards-simanovics

rihards-simanovics commented Dec 20, 2022

This user doesn't need any permissions except the df -h and free -m commands. You can always restrict a user to only one or two commands, or give the user limited SSH access that only allows sending these commands

I did think of that, that being said it is still a very junky solution (hence why I didn't mention it). Besides, it's already been mentioned in this discussion that much better paid applications are available. If you have enough servers to warrant an advanced system like that, perhaps it's time to get the wallet out?

This user doesn't need any permissions except the df -h and free -m commands. You can always restrict a user to only one or two commands, or give the user limited SSH access that only allows sending these commands via SSH, @rihards-simanovics. You don't have to give full SSH login access, so there are no security issues.
Today I have it automatically sent to my email / Telegram from each server.

Again, it's almost as if security holes don't exist. You are playing an extremely dangerous game by even allowing a potential hacker to log in. Look, I'm no security expert, but I can guarantee you that gaining access as a "limited user" is the first step to a full-blown hack, so let's not.

@InSelfControll

InSelfControll commented Dec 21, 2022 via email

@markdesilva

markdesilva commented Dec 22, 2022

@InSelfControll

Maybe I don't quite understand your description, but the status msg can take any message. It does not accept messages in quotes like "Service is up", but it will take URL-encoded spaces (%20), as in Server%20is%20up. E.g.:

attl=`/usr/bin/ping -c 1 <UK server IP> | tail -1 | /usr/bin/cut -d"/" -f 5`

/usr/bin/curl -k "https://<UK server IP>:3001/api/push/XXXXXXXXX?msg=Service%20is%20up,%20ping%20time%20is%20$attl&ping=$attl"

As you can see, you can even pass variables (in this case the ping time to the UK server).

Then your UK will show this:

[Screenshot: uk-push-service-msg]

If you're using a Linux machine, normal users (non-root, no sudo) can run cat /proc/cpuinfo and cat /proc/meminfo, as well as df, and can use cut, sed, awk or grep to extract whatever info they need and pass it into the status message. There's no need to give UK SSH access or sudo or whatever, so there are no security concerns. For Windows, I think there are equivalent PowerShell commands that normal users can use to get the values for CPU usage, memory usage and disk usage (Get-Volume).
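
As a rough illustration of the unprivileged approach described above, the sketch below reads memory usage from /proc/meminfo and percent-encodes the message before it goes into the push URL. The helper names `mem_used_pct` and `urlencode` and the example host and token are hypothetical. It assumes a kernel that exposes MemAvailable and ASCII-only messages.

```bash
#!/bin/bash
# Sketch only: helper names and the example URL below are hypothetical.

# Percentage of memory in use, read from a meminfo-style file
# (defaults to /proc/meminfo, which needs no special privileges).
mem_used_pct() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}

# Percent-encode an ASCII string so it is safe as the msg= value.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                     # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;       # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Example (placeholder host and token, not executed here):
#   msg="$(urlencode "Memory used: $(mem_used_pct)%")"
#   curl -fsS "https://uk.example.com/api/push/TOKEN?status=up&msg=${msg}"
```

Encoding the whole message in one place avoids hand-writing %20 and %25 in every URL.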

Hope it works for you.

Cheers!

The other option is to fix the push passive monitor and let users send custom messages through it. Right now the only message it sends is "ok", which is a fairly useless message. I want to send the script output via curl to the push passive monitor instead of just receiving an "ok" message. Currently each of my VMs runs the script all the time, and if the disk usage is more than 85% I receive an email with the status and details about the disk usage.

@rihards-simanovics

@markdesilva this seems like a better solution, but would this generate a notification? @InSelfControll needs this to see what the status of the hardware is on their telegram.

@InSelfControll

InSelfControll commented Dec 22, 2022

@markdesilva this seems like a better solution, but would this generate a notification? @InSelfControll needs this to see what the status of the hardware is on their telegram.

With push notifications I get it directly in my Teams / Telegram, as it should be.
I'll keep testing it and update.

The issue now is that the monitor gets the heartbeat but never sends the message more than once.

Example (for the test, I made it send a critical alert if disk_usage is higher than 1%):

#!/bin/bash

# Get disk usage
disk_usage=`/usr/bin/df -h | /usr/bin/grep "fedora" | /usr/bin/awk 'END {print $5}' | /usr/bin/tr -d "%"`
disk_usage1=`/usr/bin/df -h | /usr/bin/grep "fedora" | /usr/bin/awk 'END {print $5}'`

# Check if disk usage is higher than the threshold (1% for this test; 85% in real use)
if [ "$disk_usage" -gt 1 ]; then
  # Send push notification
  /usr/bin/curl -k "http://1.1.1.1:3001/api/push/********?msg=Disk%20Usage%20is%20high:${disk_usage}%25"
fi

[Screenshot]
Look at the script and the picture.

@markdesilva

@InSelfControll

Hi so sorry for the late response, I was out. Let me take a look at the script and what you want to do and get back to you.

@markdesilva

markdesilva commented Dec 22, 2022

@InSelfControll,

With push notifications I get it directly in my Teams / Telegram, as it should be. I'll keep testing it and update.

The issue now is that the monitor gets the heartbeat but never sends the message more than once.

Example (for the test, I made it send a critical alert if disk_usage is higher than 1%):

Hi, so from what I understand, UK only reports on status up or down. Once a status is reported, it will not report again unless the status changes (from up to down or down to up). UK works that way for all reports. The idea behind this is that UK won't spam you multiple times via email, Telegram, etc. while you are away from your system and can't rectify the error.

The only way to keep reporting the critical error is to keep flipping the status between up and down.

When you first report it down, store that somewhere. When it next reports, check the previous status, and if it's "down", change your URL status to "up" and replace the stored status. The next time it checks the stored status, it will be "up", so it will change the status to "down", and so on. For your code:

#!/bin/bash

# Get disk usage
disk_usage=`/usr/bin/df -h | /usr/bin/grep "extmedia2" | /usr/bin/awk 'END {print $5}' | /usr/bin/tr -d "%"`

# Check status file and flip status for continuous notices
if [ -f /tmp/du.status ]; then
   if [ `cat /tmp/du.status` == "up" ]; then
        echo "down" > /tmp/du.status
   else
        echo "up" > /tmp/du.status
   fi
else
   echo "up" > /tmp/du.status
fi

udstatus=`cat /tmp/du.status`

# Check if disk usage is higher than the threshold (1% for this test; 85% in real use)
if [ "$disk_usage" -gt 1 ]; then
  # Send push notification
  /usr/bin/curl -k "https://1.1.1.1:3001/api/push/**********?status=$udstatus&msg=Disk%20Usage%20is%20high:${disk_usage}%25"
fi

Your UK will look like this:

[Screenshot: uk-push_flipstatus]

Take note, this will keep spamming you until you pause the monitor or disable the cron for the script.

Honestly I think the default of only sending the message once is the right way to go. Hope this helps.

Cheers!

@InSelfControll

InSelfControll commented Dec 22, 2022

Hi,
Thanks for your reply.
I think UK should have an option to repeat the notification every X times if the issue hasn't been fixed.

Let's say it repeats 4 times, one minute apart: after 3 times the report changes to critical, and the fourth time it is sent via email by the script.
UK would keep reporting each time until the issue is fixed, and only while there is an issue.

@markdesilva

@InSelfControll

Silly me, there is already an option in the monitor for sending messages on consecutive heartbeats missed.

[Screenshot: uk_retries]

This will send the msg to your telegram every minute, but it will NOT reflect in the status on UK multiple times, only once. The only way I can find to have it send to telegram and to show on the UK status multiple times is what I said in my previous post.

@markdesilva

markdesilva commented Dec 22, 2022

Hi, thanks for your reply. I think UK should have an option to repeat the notification every X times if the issue hasn't been fixed.

Let's say it repeats 4 times, one minute apart: after 3 times the report changes to critical, and the fourth time it is sent via email by the script. UK would keep reporting each time until the issue is fixed, and only while there is an issue.

Right, so you can sort of do this by setting the "resend notification if down X times consecutively" option to "4". But like I said, it will not update the UK status; it will only keep sending to your alert channel (Telegram, etc.).

@InSelfControll

Hi, thanks for your reply. I think UK should have an option to repeat the notification every X times if the issue hasn't been fixed.

Let's say it repeats 4 times, one minute apart: after 3 times the report changes to critical, and the fourth time it is sent via email by the script. UK would keep reporting each time until the issue is fixed, and only while there is an issue.

Right, so you can sort of do this by setting the "resend notification if down X times consecutively" option to "4". But like I said, it will not update the UK status; it will only keep sending to your alert channel (Telegram, etc.).

The issue is that if you mark it as down in the URL, it doesn't send the correct message; it just sends "no heartbeat" instead of my message.

@markdesilva

Yes, you are right. In the alert (eg: telegram) message, it will only say "No heartbeat in the time window". For your own message to appear in the alert message, you will need to use my modifications to your script, just that the status will keep flipping between up and down.

The issue is that if you mark it as down in the URL, it doesn't send the correct message; it just sends "no heartbeat" instead of my message.

@InSelfControll

InSelfControll commented Dec 22, 2022

Yes, you are right. In the alert (eg: telegram) message, it will only say "No heartbeat in the time window". For your own message to appear in the alert message, you will need to use my modifications to your script, just that the status will keep flipping between up and down.

Hey look, I have another script that works really nicely with a custom status=down message, but the only issue now is that, for some reason, it doesn't get the %25 (%) symbol at the end.

#!/bin/bash

# This script monitors RAM, CPU, and disk usage and sends an alert if disk usage is higher than 85%.

# Get current disk usage
DISKUSAGE=`/usr/bin/df -h | /usr/bin/awk '$NF=="/"{printf "%s\t\t", $5}' | /usr/bin/tr -d "%"`
# Get current RAM and CPU usage
RAM=`free -m | awk 'NR==2{printf "Memory Usage: %s/%sMB (%.2f%%)\n", $3,$2,$3*100/$2 }'`
CPU=`top -bn1 | grep load | awk '{printf "CPU Load: %.2f\n", $(NF-2)}'`
# Check if disk usage is higher than 85%
if [[ ${DISKUSAGE%?} -gt 21 ]]; then
  echo "High Disk Usage: $DISKUSAGE"
  echo "$RAM"
  echo "$CPU"
  # Send alert
  curl -s "https://uk.***com/api/push/******?status=down&msg=Disk%20usage%20is%20high:${DISKUSAGE}%25"
  else
          curl -s "https://uk.****.com/api/push/******?status=up&msg=Disk%20usage%20is%20Fixed:${DISKUSAGE}%25"
fi

I would be very happy if we can fix it together :)

@markdesilva

markdesilva commented Dec 22, 2022

Hey look, I have another script that works really nicely with a custom status=down message, but the only issue now is that, for some reason, it doesn't get the %25 (%) symbol at the end.
I would be very happy if we can fix it together :)
@InSelfControll

The problem is that there are trailing tabs (\t) in your DISKUSAGE variable.

If you put [ ] around the DISKUSAGE variable when you echo it (line 12), you will see them:

echo "High Disk Usage: [$DISKUSAGE]"

If you echo the curl url in your code you will also see:

echo "https://uk.***com/api/push/******?status=down&msg=Disk%20usage%20is%20high:${DISKUSAGE}%25"

Also, the first line should be #!/bin/bash, not !/bin/bash.
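
One way to make the value robust against stray tabs (or a leftover % sign) is to keep only the digits before using it in the comparison or the URL. This is a sketch with a hypothetical helper name, `digits_only`, and it assumes the percentage is an integer, as df reports it.

```bash
#!/bin/bash
# Sketch: keep only the digits of a value such as "21%<TAB><TAB>".
# digits_only is a hypothetical helper name.
digits_only() {
  printf '%s' "$1" | tr -cd '0-9'
}

raw=$'21%\t\t'                      # what the awk printf "%s\t\t" produced
clean="$(digits_only "$raw")"
printf 'before: [%s]\nafter:  [%s]\n' "$raw" "$clean"
```

With the cleaned value, both the -gt test and the ${DISKUSAGE}%25 suffix in the URL see a plain number.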

Cheers!

@InSelfControll

This comment was marked as resolved.

@markdesilva

It didn't copy the first line 😆 my script already has #!/bin/bash,

Oh OK. Also, it's trailing tabs, not a newline; my mistake.

@InSelfControll

InSelfControll commented Dec 22, 2022

Oh OK. Also, it's trailing tabs, not a newline; my mistake.

Yeah, printf was adding tabs. I removed them, but curl still doesn't get the %25.

@markdesilva

markdesilva commented Dec 22, 2022

@InSelfControll

Yeah, printf was adding tabs. I removed them, but curl still doesn't get the %25.

Funny, it works for me.

This is my modified version of your script:

#!/bin/bash

# This script monitors RAM, CPU, and disk usage and sends an alert if disk usage is higher than 85%.

# Get current disk usage
DISKUSAGE=`/usr/bin/df -h | /usr/bin/awk '$NF=="/" {printf "%s", $5}' | /usr/bin/tr -d "%"`
# Get current RAM and CPU usage
RAM=`free -m | awk 'NR==2{printf "Memory Usage: %s/%sMB (%.2f%%)\n", $3,$2,$3*100/$2 }'`
CPU=`top -bn1 | grep load | awk '{printf "CPU Load: %.2f\n", $(NF-2)}'`
# Check if disk usage is higher than 85%

if [ ${DISKUSAGE} -gt 1 ]; then
  echo "High Disk Usage: [$DISKUSAGE]"
  echo "$RAM"
  echo "$CPU"
  # Send alert
  curl -s "https://uk.***com/api/push/******?status=up&msg=Disk%20usage%20is%20high:${DISKUSAGE}%25"
else
  curl -s "https://uk.***com/api/push/******?status=up&msg=Disk%20usage%20is%20Fixed:${DISKUSAGE}%25"
fi

Did you forget your 3001 or are you running your UK server on 443?

@InSelfControll

InSelfControll commented Dec 22, 2022

Did you forget your 3001 or are you running your UK server on 443?

This is my private server; the other one is stg at work with port 3001.
This specific URL runs behind a Traefik proxy on Docker :) so no port.
I fixed the disk usage; now I'm working on CPU and memory, which I have a little issue with.

Trying to figure out how to get only the percentage of CPU and memory for doing the test for it to

@markdesilva

markdesilva commented Dec 22, 2022

Ah OK. In any case, the %25 works for me.

For CPU % used, have you tried using iostat (apt install sysstat) and piping that into bc?

CPUP_IDLE=`iostat  | grep -A1 "avg-cpu" | awk {'print $6'} | tail -1`
CPUP_USED=`echo "100 - $CPUP_IDLE" | bc`

For memory, run free -g, take available/total, and pipe that into bc as well.
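
A sketch of the memory calculation, under some assumptions: it uses awk for the arithmetic instead of bc, it expects the procps layout where the Mem: line ends with an "available" column (field 7), and the helper name `mem_used_pct_from_free` is hypothetical. Feeding it free -m rather than free -g keeps more precision, since -g rounds to whole gibibytes.

```bash
#!/bin/bash
# Sketch: percentage of memory in use, computed from `free` output on stdin.
# Assumes the Mem: line is "Mem: total used free shared buff/cache available".
mem_used_pct_from_free() {
  awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", ($2 - $7) * 100 / $2 }'
}

# Example (not executed here):
#   MEMP_USED="$(free -m | mem_used_pct_from_free)"
```

The result is an integer, so it can go straight into a [ ... -gt ... ] test like the disk check.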

This is my private server; the other one is stg at work with port 3001. This specific URL runs behind a Traefik proxy on Docker :) so no port.

@InSelfControll

InSelfControll commented Dec 22, 2022

Ah OK. In any case, the %25 works for me.

For CPU % used, have you tried using iostat (apt install sysstat) and piping that into bc?

CPUP_IDLE=`iostat  | grep -A1 "avg-cpu" | awk {'print $6'} | tail -1`
CPUP_USED=`echo "100 - $CPUP_IDLE" | bc`

For memory, run free -g, take available/total, and pipe that into bc as well.

This is my private server; the other one is stg at work with port 3001. This specific URL runs behind a Traefik proxy on Docker :) so no port.

Done!
Thanks for your help.
I think UK needs a feature that sends critical messages for hosts when there are issues other than the server being down.
It would be like the HTTP test that can check for a string: it still gets code 200 but marks the check as missing the string.

I would love to see UK get this feature in the next release 😎

@InSelfControll

This comment was marked as off-topic.

@BasToTheMax

This comment was marked as spam.

@CommanderStorm
Collaborator

CommanderStorm commented Aug 25, 2023

Quick reminder for everybody:
Issues are for discussing what needs to be done, how, and by whom.
We use 👍🏻 on issues to prioritise work. As always: pull requests welcome.

You can currently implement such a monitor without any modification via the push monitor, as stated above.
If you want to simplify this process, adding a monitor via a PR is a possibility (see our contribution guide for additional details).

@CommanderStorm CommanderStorm changed the title: [Feature]: Heartbeat Monitoring → Alert for high server network/ram/disk/cpu usage "heartbeat" monitoring (Dec 4, 2023)
@CommanderStorm CommanderStorm added area:monitor Everything related to monitors type:new proposing to add a new monitor labels Dec 4, 2023
@milzamsz

Found an existing solution: https://github.com/msgbyte/tianji

@MichaelBelgium

MichaelBelgium commented Apr 27, 2024

@milzamsz If I'm correct, Tianji just gives you a list of servers. I don't see the possibility to, for example, set alerts when CPU usage > x%.

Nor can you add a server to a status page? And it doesn't support MySQL/MariaDB like v2 of uptime-kuma?

@CommanderStorm CommanderStorm mentioned this issue Apr 29, 2024
1 task
@milzamsz

milzamsz commented May 5, 2024

@MichaelBelgium Yes, but it's better for me, as I don't need advanced monitoring like Netdata/Prometheus. I'm installing it with Easypanel, and it comes with Postgres.

@AquaMCU

AquaMCU commented Jun 12, 2024

Hi. I'd like to join the conversation ;)

Another idea: how about KUMA just adds a new monitor that requests a page from the server to be monitored? Here you guys can knock yourselves out and implement your GO_JS_PHP_BASH_FORTRAN or whatever module that just responds with a percentage and does not respond when it's critical.

As for KUMA, for this monitor, just display the data and mark it NOK when it is not responding.

... easy to do for the KUMA team and nice and hackable for the rest of us ;)

Oliver
