global netdata health monitoring service #2466

Open
ktsaou opened this Issue Jul 16, 2017 · 4 comments

ktsaou commented Jul 16, 2017

Release v1.8 will focus on providing a global health monitoring service, available for free to all netdata users.

netdata has received 3 key donations to support this service:

  1. 1,000 USD/year in VMs, donated by digitalocean.com
  2. A VM with 8GB RAM, 4 CPU cores and 50GB HD, donated by ventureer.com
  3. A VM with 16GB RAM, 4 CPU cores and 100GB HD, donated by stackscale.com

Using these VMs, I plan to provide a highly available health monitoring service for all netdata users, with these features:

  1. a dashboard providing a high-level overview of the performance and health of all the servers each user has.

  2. email alarm notifications for all servers (the emails will be dispatched from these servers).

  3. an improved netdata registry featuring:

    • dashboard settings per user, saved in the registry
    • authentication with email, like slack.com: you enter your email address and receive a login link by email, so there is no more copy-pasting of the registry GUID and no passwords to remember (a sketch of this flow follows the list)
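
A minimal sketch of what such a passwordless login flow could look like (hypothetical, not existing netdata code; the token format, the signing-key handling and the URL shown are assumptions):

```python
# Hypothetical sketch of a passwordless email login flow (not netdata code).
# A signed, time-limited token is embedded in a link emailed to the user;
# following the link authenticates them without a password.
import hmac, hashlib, time, secrets

SECRET = secrets.token_bytes(32)   # in practice: a persistent server-side secret
TOKEN_TTL = 15 * 60                # login links expire after 15 minutes

def make_login_token(email: str) -> str:
    expires = str(int(time.time()) + TOKEN_TTL)
    msg = f"{email}|{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{email}|{expires}|{sig}"

def verify_login_token(token: str) -> str | None:
    try:
        email, expires, sig = token.split("|")
    except ValueError:
        return None
    expected = hmac.new(SECRET, f"{email}|{expires}".encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig, expected) and time.time() < int(expires):
        return email            # authenticated: map this email to the registry person GUID
    return None

# The emailed link would look something like (illustrative URL only):
# https://registry.my-netdata.io/login?token=<make_login_token(email)>
```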

If you can contribute resources to this movement, just post here. I need all the help I can get...

Oipo commented Aug 2, 2017

What kind of resources do you need? More VMs? Help setting up/maintaining said VMs?

ktsaou commented Aug 2, 2017

More VMs are good, although I already have quite a few.

Sysadmin work is what is most needed currently.

I need to set up an email server for my-netdata.io to send notifications. I also plan to add some kind of registration and passwordless authentication that will use email to authenticate users (like Slack), so a robust and secure email service is a prerequisite. Probably postfix, with letsencrypt certificates, etc.
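
As an illustration only, here is a minimal sketch of how such a server might dispatch a notification or login email through a local postfix instance, using Python's standard library (the sender address, recipient and message are assumptions):

```python
# Hypothetical sketch: hand a notification email to the local MTA (postfix).
# Postfix itself would carry the TLS/letsencrypt configuration.
import smtplib
from email.message import EmailMessage

def send_mail(recipient: str, subject: str, body: str) -> None:
    msg = EmailMessage()
    msg["From"] = "noreply@my-netdata.io"   # assumed sender address
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    # Deliver via the local postfix listening on localhost:25.
    with smtplib.SMTP("localhost", 25) as smtp:
        smtp.send_message(msg)

send_mail("user@example.com",
          "netdata alarm: disk space low on web01",
          "The alarm 'disk_space_usage' on web01 entered WARNING state.")
```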

Ideally, I try to write setup scripts that get everything done, at https://github.com/firehol/netdata-demo-site, so most of the work to set up such VMs can be done with the press of a button. Another way would be to use ansible, or something similar.
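
For the press-of-a-button idea, the driver could be as simple as the following sketch (purely illustrative; the script names are assumptions, not the actual contents of netdata-demo-site):

```python
# Hypothetical one-button provisioning driver: stream a fixed sequence of
# local setup scripts to bash on a fresh VM over ssh. Script names are made up.
import subprocess, sys

SETUP_STEPS = [
    "install-netdata.sh",
    "setup-letsencrypt.sh",
    "setup-postfix.sh",
]

def provision(host: str) -> None:
    for script in SETUP_STEPS:
        print(f"[{host}] running {script} ...")
        with open(script, "rb") as f:
            subprocess.run(["ssh", f"root@{host}", "bash -s"],
                           stdin=f, check=True)

if __name__ == "__main__":
    provision(sys.argv[1])
```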

The whole thing should also serve as a bit of a demonstration for users who want their own setup, and it should promote the use of netdata for monitoring it.

Then there is the monitoring service itself. Dozens or hundreds of thousands of netdata installations out there should be able to connect to this service and push status information, notifications and alarms. I want to eliminate most of the components usually involved in maintaining such a service, so I plan to make it work like this (it is oversimplified, but you will get the idea; a sketch of both sides follows the list):

  1. Each netdata out there will push a JSON file every 30 seconds, with all the information the central dashboard should know about it.
  2. The central netdata will receive this JSON file and just save it to disk (each netdata has a unique GUID, so a squid-like directory structure will be needed). No parsing at all: receive and save.
  3. The index of these files will be the registry netdata already maintains.
  4. The central dashboard will query the registry as it does today for the my-netdata menu, but it will then fetch the JSON file of each netdata to get the latest information about it, and render it in a very nice dashboard. If we do this right, it should be super fast and scalable.
  5. For alarm notifications, each netdata will send a special message. These messages will just trigger the proper notification. Since the registry will maintain an email address for each user, each netdata will just push the notification and all users subscribed to that netdata will be notified.
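
A minimal sketch of what the two sides could look like, assuming plain HTTP(S) POSTs and a two-level, squid-like directory layout keyed on the agent GUID (the endpoint URL, storage path and JSON fields are assumptions):

```python
# Hypothetical sketch of the status push (not netdata code).
# Agent side: POST a small JSON status blob every 30 seconds.
# Central side: save the request body to disk, keyed by GUID, with no parsing.
import json, os, time, urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CENTRAL_URL = "https://central.my-netdata.io/api/v1/status"   # assumed endpoint
STORE_DIR = "/var/lib/netdata-central/status"                 # assumed storage root

def agent_push_loop(guid: str) -> None:
    while True:
        status = {"guid": guid, "hostname": os.uname().nodename,
                  "alarms": {"critical": 0, "warning": 1},
                  "timestamp": int(time.time())}
        req = urllib.request.Request(f"{CENTRAL_URL}/{guid}",
                                     data=json.dumps(status).encode(),
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=10)
        time.sleep(30)

def guid_path(guid: str) -> str:
    # squid-like fan-out (/aa/bb/aabbcc....json) keeps directories small
    return os.path.join(STORE_DIR, guid[:2], guid[2:4], guid + ".json")

class ReceiveAndSave(BaseHTTPRequestHandler):
    def do_POST(self):
        guid = self.path.rstrip("/").split("/")[-1]
        body = self.rfile.read(int(self.headers["Content-Length"]))
        path = guid_path(guid)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:       # no parsing at all: receive and save
            f.write(body)
        self.send_response(204)
        self.end_headers()

# Central side: HTTPServer(("", 8080), ReceiveAndSave).serve_forever()
```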

With the above setup, I plan to offer a very basic central dashboard with simple notifications. Then we will need to work on adding more notification methods, customizing notifications per user, etc.
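
And a matching sketch of the dashboard side of step 4: walk the registry's index of GUIDs and load each saved status file (again assumption-level, reusing the hypothetical layout from above):

```python
# Hypothetical sketch of the dashboard side: read each agent's saved status
# file from the GUID directory tree and build one overview structure.
import json, os

STORE_DIR = "/var/lib/netdata-central/status"     # assumed, same layout as above

def load_status(guid: str) -> dict | None:
    path = os.path.join(STORE_DIR, guid[:2], guid[2:4], guid + ".json")
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None                 # agent never pushed, or the last push was corrupt

def build_overview(registry_guids: list[str]) -> list[dict]:
    overview = []
    for guid in registry_guids:
        status = load_status(guid)
        if status is not None:
            overview.append({"hostname": status.get("hostname", guid),
                             "alarms": status.get("alarms", {})})
    return overview
```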

As you can understand, I need help setting up and maintaining all of this (backup/restore, fault tolerance, replication, etc.).

LKDevelopment commented Oct 25, 2017

@ktsaou are these plans still current? Are you going ahead with it, or was this just an idea?

ktsaou commented Oct 25, 2017

The plan is real; the time-frame is the problem. I have decided how to do it and what most of the required changes are.

At the same time, I have come up with a new database format design that I believe will allow netdata to maintain a very large database without too much stress on the system (RAM size or disk I/O).

So, I am trying to decide which one to finish first.
