Home

Costa Tsaousis edited this page Jul 12, 2018 · 223 revisions


Welcome to netdata!

Badges: User Base, Monitored Servers, Sessions Served
(the figures come from netdata registry data, showing only installations that use the public global registry, counting since May 16th 2016)

Badges: New Users Today, New Machines Today, Sessions Today
Check Generating Badges for more information.

latest news

🥇 Check here the changelog of the current netdata vs the latest release (v1.10.0)


Mar 27th, 2018 - netdata v1.10.0 released!

  • new web server, a lot faster and more secure
  • updated all javascript libraries to their latest versions (fixed compatibility issues - netdata charts can now be embedded on Atlassian Confluence pages and remain fully interactive!)
  • new plugins:
    • BTRFS (visualize BTRFS allocation with alarms)
    • bcache (monitor hybrid setups HDD + SSD)
    • ceph
    • nginx plus
    • libreswan (monitor the traffic of IPSEC tunnels)
    • traefik
    • icecast
    • ntpd
    • httpcheck (monitor any remote web server)
    • portcheck (monitor any remote TCP port)
    • spring-boot (monitor java spring-boot apps)
    • dnsdist
    • Linux hugepages
  • improved plugins:
    • statsd
    • web_log
    • cgroups for containers and VMs monitoring (netdata now supports systemd-nspawn and kubernetes - fixed security issue with cgroup-network)
    • Linux memory
    • diskspace
    • network interfaces
    • postgres
    • rabbitmq
    • apps.plugin (now it also tracks swap usage per process)
    • haproxy
    • uptime
    • ksm (kernel memory deduper)
    • mdstat (software raid)
    • elasticsearch
    • apcupsd
    • dhcpd
    • fronius
    • stiebeletron
  • new alarm notification methods
    • alerta
    • IRC (post on IRC channels)
  • and dozens more improvements, enhancements, features and compatibility fixes

demo sites

Live demo installations of netdata are available at https://my-netdata.io:

Location | netdata demo URL | 60 mins reqs | VM Donated by
London (UK) | london.my-netdata.io (this is the global netdata registry and has named and mysql charts) | Requests Per Second | DigitalOcean.com
Atlanta (USA) | cdn77.my-netdata.io (with named and mysql charts) | Requests Per Second | CDN77.com
Israel | octopuscs.my-netdata.io | Requests Per Second | OctopusCS.com
Roubaix (France) | ventureer.my-netdata.io | Requests Per Second | Ventureer.com
Madrid (Spain) | stackscale.my-netdata.io | Requests Per Second | StackScale Spain
Bangalore (India) | bangalore.my-netdata.io | Requests Per Second | DigitalOcean.com
Frankfurt (Germany) | frankfurt.my-netdata.io | Requests Per Second | DigitalOcean.com
New York (USA) | newyork.my-netdata.io | Requests Per Second | DigitalOcean.com
San Francisco (USA) | sanfrancisco.my-netdata.io | Requests Per Second | DigitalOcean.com
Singapore | singapore.my-netdata.io | Requests Per Second | DigitalOcean.com
Toronto (Canada) | toronto.my-netdata.io | Requests Per Second | DigitalOcean.com

Netdata dashboards are mobile and touch friendly.

netdata at a glance

Click this image to interact with it (most icons link to related documentation):

netdata-overview

Installation

Want to set it up on your systems now? Jump to Installation.


A welcome note

Welcome. I am @ktsaou, the founder of firehol.org and my-netdata.io.

netdata is a scalable, distributed, real-time, performance and health monitoring solution for Linux, FreeBSD and macOS. It is open-source too.

Out of the box, it collects 1k to 5k metrics per server per second. It is the equivalent of running top, vmstat, iostat, iotop, sar, systemd-cgtop and a dozen more console tools in parallel. netdata is very efficient at this: the daemon needs just 1% to 3% cpu of a single core, even when it runs on IoT devices.

Many people view netdata as a collectd + graphite + grafana alternative, or compare it with cacti or munin. All these are really great tools, but they are not netdata. Let's see why.

My primary goal when I was designing netdata was to help us find why our systems and applications are slow or misbehaving: to provide a system that could kill the console for performance monitoring.

To do this, I decided that:

  • high resolution metrics are more important than long history
  • the more metrics collected, the better - we should not be afraid to add 1k more metrics per server
  • effective monitoring starts with monitoring everything about each node

Enterprises usually have dedicated resources and departments for collecting and analyzing system and application metrics at similar resolution and scale. netdata attempts to offer this functionality to everyone, without the dedicated resources - of course within limits.

For big setups, netdata can archive its metrics to graphite, opentsdb, prometheus and all compatible ones (kairosdb, influxdb, etc). This allows even enterprises with dedicated departments and infrastructure to use netdata for data collection and real-time alarms.

Metrics in netdata are organized in collections called charts. Charts are meaningful entities, they have a purpose, a scope. This makes netdata extremely useful for learning the underlying technologies, for understanding how things work and what is available.

The dashboard is organized so that we can quickly and easily find the metrics affecting, or affected by, an event. Just center and zoom the event time frame on a chart, mark it (with ALT or CONTROL + area select), and scroll the dashboard top to bottom. You will be able to spot all the charts that have been influenced by, or have influenced, the event. Using the my-netdata menu to navigate between netdata servers maintains all these dashboard states, so you can quickly analyze even multi-server performance issues.

Netdata also supports real-time alarms. Netdata alarms can be set up on any metric or combination of metrics and can send notifications to:

  • email addresses
  • slack channels
  • discord channels
  • IRC channels
  • pushover
  • pushbullet
  • telegram.org
  • pagerduty
  • twilio
  • messagebird
  • alerta
  • flock
  • kavenegar
  • syslog

Alarms are role based (each alarm can go to one or more roles), roles are multi-recipient and multi-channel (e.g. sysadmin = several email recipients + pushover) and each recipient may filter by severity. You can also add more notification methods quite easily (it is a shell script).
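The role model described above can be pictured with a small sketch. Everything in it is hypothetical and for illustration only: the `ROLES` table layout, the `|critical` suffix used as a severity filter, the recipient addresses and the `recipients_for` helper are assumptions, not netdata's actual configuration format.

```shell
# Hypothetical role table, one "role channel recipient[|critical]" per line.
# A recipient suffixed with "|critical" only wants critical alarms.
ROLES='sysadmin email alice@example.com
sysadmin email bob@example.com|critical
sysadmin pushover hypothetical-user-key
dba email carol@example.com'

# Print the recipients of ROLE on CHANNEL that accept alarms of SEVERITY.
# The body runs in a subshell so its variables do not leak to the caller.
recipients_for() (
    role=$1 channel=$2 severity=$3
    printf '%s\n' "$ROLES" | while read -r r c who; do
        if [ "$r" != "$role" ] || [ "$c" != "$channel" ]; then
            continue
        fi
        case $who in
            *'|critical')
                # Filtered recipient: deliver only critical alarms.
                if [ "$severity" = critical ]; then
                    printf '%s\n' "${who%|critical}"
                fi
                ;;
            *)
                printf '%s\n' "$who"
                ;;
        esac
    done
)
```

For example, `recipients_for sysadmin email warning` prints only the unfiltered address, while `recipients_for sysadmin email critical` prints both email recipients of the role.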

The number of metrics collected by netdata provides very interesting alarms. Install netdata and run this:

while [ 1 ]; do telnet HOST 12345; done

where HOST is your default gateway (12345 is an arbitrary unused port). The connections will fail, of course, but leave the loop running for a few seconds. You will get an alert that your system is receiving an abnormally high number of TCP resets. If HOST is also running netdata, you will receive another alert there, that the system is sending an abnormally high number of TCP resets. This means that if you run a busy daemon and it crashes, you will get notified, although netdata knows nothing specific about it.
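To watch a counter of this kind by hand while the loop runs, a minimal sketch follows. It assumes a Linux host where /proc/net/snmp exposes TCP counters as a line of names followed by a line of values, both prefixed with `Tcp:`. The `tcp_counter` helper is a hypothetical name, and which counters netdata's own alarms are based on is not stated here.

```shell
# Print one TCP counter from /proc/net/snmp-style text read on stdin.
# The first "Tcp:" line holds the counter names, the second the values.
tcp_counter() {
    awk -v want="$1" '
        $1 == "Tcp:" && !have_names {
            # Remember the column of every counter name.
            for (i = 2; i <= NF; i++) col[$i] = i
            have_names = 1
            next
        }
        $1 == "Tcp:" && have_names {
            if (want in col) print $(col[want])
            exit
        }
    '
}
```

On such a host, `tcp_counter OutRsts < /proc/net/snmp` prints the number of TCP resets sent so far. Running it twice a few seconds apart shows how fast the counter grows.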

Of course netdata is young and still far from a complete monitoring solution that could replace everything. We work on it... patience...

What is it?

netdata is scalable, distributed real-time performance and health monitoring:

distributed

netdata should be installed on each of your servers. It is the equivalent of a monitoring agent, as provided by all other monitoring solutions. It runs everywhere a Linux kernel runs: PCs, servers, embedded devices, IoT, etc.

Netdata is very resource efficient and you can control its resource consumption. It will use:

  • some spare CPU cycles, usually just 1-3% of a single core (check Performance),
  • the RAM you want it to have (check Memory Requirements), and
  • no disk I/O at all, apart from its logging (check Log Files). Of course it saves its DB to disk when it exits and loads it back when it starts.

scalable

Unlike traditional monitoring solutions that move all the metrics collected on all servers to a central place, netdata by default keeps all the data on the server where they are collected.

This allows netdata to collect thousands of metrics per second on each server.

When you use netdata, adding 10 more servers or collecting 10000 more metrics does not have any measurable impact on the monitoring infrastructure or on the servers where they are collected. This provides virtually unlimited scalability.

Metrics collected by netdata can be pushed to central time-series databases (like graphite, opentsdb or prometheus) for archiving (check netdata backends), and netdata can push these data at a lower frequency/detail to allow these servers to scale. This is not required though. It exists only for long-term archiving and netdata never uses these databases as a data source.
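As an illustration of pushing data at a lower frequency, the sketch below averages per-second samples into one point per window and prints them in Graphite's plaintext format (`path value timestamp`). The `downsample_for_graphite` helper, the metric path and the choice of averaging are assumptions made for this example. They do not describe how netdata's backends are implemented.

```shell
# Read "timestamp value" samples on stdin and print one averaged point
# per window of EVERY seconds as Graphite plaintext: "path value timestamp".
# Each point is stamped with the first timestamp seen in its window.
downsample_for_graphite() {
    awk -v path="$1" -v every="$2" '
        function emit() {
            printf "%s %.2f %s\n", path, sum / n, first
            sum = 0; n = 0
        }
        {
            bucket = int($1 / every)
            # A sample from a new window closes the previous one.
            if (n && bucket != current) emit()
            if (!n) first = $1
            current = bucket
            sum += $2
            n++
        }
        END { if (n) emit() }
    '
}
```

Six per-second samples piped through `downsample_for_graphite netdata.example.cpu 3` (a hypothetical metric path) come out as two lines, so the archive receives a third of the points.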

real-time

Everything netdata does is per-second so that the dashboards presented are just a second behind reality, much like the console tools do. Of course, when netdata is installed on weak IoT devices, this frequency can be lowered, to control the CPU utilization of the device.

netdata is adaptive. It adapts its internal structures to the system it runs on, so that the repeating task of data collection is performed using the minimum of CPU resources.

The web dashboards are also real-time and interactive. netdata achieves this by splitting the workload between the server and the dashboard client (i.e. your web browser). Each server collects the metrics and maintains a very fast round-robin database in memory, while providing basic data manipulation tasks (like data reduction functions); each web client accessing these metrics takes care of everything needed for data visualization. The result is:

  • minimum CPU resources on the servers
  • fully interactive real-time web dashboards, with some CPU pressure on the web browser while the dashboard is shown.
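The idea behind a round-robin database, as mentioned above, is that storage has a fixed size and the newest value overwrites the oldest. A minimal sketch, assuming one value per line on stdin (`ring_keep_last` is a hypothetical helper and says nothing about netdata's internal data structures):

```shell
# Keep only the newest SIZE values from stdin in a fixed-size ring and
# print them oldest first. Older values are overwritten in place.
ring_keep_last() {
    awk -v size="$1" '
        { slot[NR % size] = $1 }
        END {
            # The oldest surviving record is NR - size + 1, or 1 if the
            # ring has not wrapped around yet.
            start = (NR > size) ? NR - size + 1 : 1
            for (i = start; i <= NR; i++) print slot[i % size]
        }
    '
}
```

Memory use depends only on the ring size, not on how long the collector has been running. That is why the amount of history kept is a matter of how much RAM is given to it.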

performance monitoring

netdata collects and visualizes metrics. If it is a number and it can be collected somehow, netdata can visualize it. Out of the box, it comes with plugins that collect hundreds of system metrics and metrics of popular applications.
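To give an idea of what a collector can look like, here is a minimal sketch of an external plugin that talks to netdata by printing text lines (CHART, DIMENSION, BEGIN, SET, END) on its standard output. The field order shown in the comments is written from memory of the plugin protocol and should be checked against the plugin documentation. The chart `example.random`, its dimension `value` and both function names are hypothetical.

```shell
# Describe one chart with a single dimension. A plugin prints this once.
plugin_define_chart() {
    # CHART type.id name title units family context charttype priority update_every
    echo "CHART example.random '' 'A random number' 'number' example example.random line 90000 1"
    # DIMENSION id name algorithm multiplier divisor
    echo "DIMENSION value '' absolute 1 1"
}

# Send one collected value for the chart defined above.
plugin_send_value() {
    echo "BEGIN example.random"
    echo "SET value = $1"
    echo "END"
}
```

A real plugin would call `plugin_define_chart` once at startup and then, in an endless loop, call `plugin_send_value` with a freshly collected number and sleep for one second.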

health monitoring

netdata provides powerful alarms and notifications. It comes preconfigured with dozens of alarms to detect common health and performance issues and it also accepts custom alarms defined by you.


Documentation

This wiki is the whole of it. Other than the wiki, currently there is the... source code.

You should at least walk through the pages of the wiki. They have a good overview of netdata, what it can do and how to use it.


Support

If you need help, please use the github issues section.


FAQ

Is it ready?

Software is never ready. There is always something to improve.

Netdata is stable. We use it on production systems without any issues.

Is it released?

Yeap! Check the releases page.

Why did you write your own data collection?

Well... there are plenty of data collectors already. But we have one or more of the following problems with them:

  • They are not capable of per second data collection
  • They can do per second data collection, but they are not optimized enough for always running on all systems
  • They need to be configured, while we need auto-detection

Of course, we could use them just to get data at a slower rate, and this can be done, but it was not our priority. netdata proves that real-time data collection and visualization can be done efficiently.

Is it practical to have such short historical data?

For a few purposes yes, for others no.

Our focus is real-time data collection and visualization. Our (let's say) "competitors" are the console tools. If you are looking for a tool to get "statistics about past performance", netdata is the wrong tool.

Of course, historical data is our next priority.

Why is there no "central" netdata?

There is. You can configure a netdata to act as a central netdata for your network, where all hosts stream metrics in real-time to it. netdata also supports headless collectors, headless proxies, store and forward proxies, in all possible combinations.

However, we strongly believe monitoring should be scaled out, not up. A "central" monitoring server is just another problem and should be avoided.

We all have a wonderful tool on our desktops, which connects us to the entire world: the web browser! This is the "central" netdata that connects all the netdata installations. We have done a lot of work towards this and we believe we are very close to showing you what we mean.

Keep in mind that netdata versions 1.6+ support data replication and mirroring by streaming collected metrics in real-time to other netdata installations, and versions 1.5+ support archiving data to time-series databases.

Can I help?

Of course! Please do!

As with all open source projects, the more people using it, the better the project is. So give it a github star, post about it on facebook, twitter, reddit, google+, hacker news, etc. Spreading the word costs you nothing and helps the project improve. It is the minimum you should give back for using a project that has thousands of hours of work in it and you get it for free.

Also important is to open github issues for the things that are not working well for you. This will help us make netdata better.

These are other areas where we need help:

  • Can you code?

    • you can write plugins for data collection. This is very easy and any language can be used.
    • you can write dashboards, specially optimised for monitoring the applications you use.
  • Can you write documentation?

    • We have left the wiki open for anyone to edit. If any area needs further details, you can edit that page. (Of course we monitor all edits - so please try to contribute and not destroy things.)
  • Do you have special skills?

    • are you a marketing guy? Help us set up a social media strategy to build and grow the netdata community.
    • are you a devops guy? Help us set up and maintain the public global servers.
    • are you a linux packaging guy? Help us distribute pre-compiled packages of netdata for all major distributions, or help netdata be included in official distributions.

Is there a roadmap?

This is what we are currently working on (in this order):

  1. Finish packaging for the various distros.

  2. Add health monitoring (alarms, notifications, etc).

  3. More plugins - a lot more plugins!

    • monitor more applications (hadoop and friends, postgres, etc)
    • rewrite the netfilter plugin to use libnlm.
    • allow internal plugins to be forked to external processes (this will protect the netdata daemon from plugin crashes, allow different security schemes for each plugin, etc).

  4. Improve the memory database (possibly using an internal deduper, compression, disk archiving, mirroring it to third party databases, etc).

  5. Invent a flexible UI to connect multiple netdata servers together. We have made a lot of progress with the registry and the my-netdata menu, but there is still a lot more to do.

  6. Document everything (this is a work in progress already).

There are many more enhancements requested by our users (just navigate through the issues to get an idea). Enhancements like authentication on the UI, alarms and alerts, etc will fit somehow into this list. Patience...