Feature request: sending to multiple backends #2352

Open
jpriebe opened this Issue Jun 16, 2017 · 9 comments

jpriebe (Contributor) commented Jun 16, 2017

I know you can specify two backend hosts in netdata.conf. Netdata will try the first, and use the second as a backup. It won't send to both.

It would be nice if it could just send the data to both hosts; this would allow for "poor man's replication" of the metrics. For example, you could run influxdb on two hosts and have netdata send its metrics to both.

If one goes down temporarily, you will have a gap in its data, but at least you'll have data from that time period on the other machine.

In theory, you could copy over the missing data from the other server when the failed machine comes back up.

It seems like it wouldn't be too hard to add the option to just send to all configured backend hosts.
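
For reference, the failover behaviour described above comes from listing more than one host on the backend destination line. A minimal sketch of what that looks like today (host names and ports are placeholders, and the exact option spelling should be checked against your own netdata.conf):

  [backend]
      enabled = yes
      type = graphite
      destination = influxdb1:2003 influxdb2:2003

netdata tries the hosts in the order given and sends only to the first one that accepts the connection.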

ktsaou (Member) commented Jun 16, 2017

Hm... it is a bit more complicated than that. Each backend runs as its own thread, and I am sure people will then ask for different settings on each backend (e.g. one graphite, one opentsdb), etc. So it would become a project.

There are workarounds, though, so you can achieve both of the solutions you propose with the current netdata.

To push the metrics to 2 backends concurrently, you can run 2 netdata instances, even on the same machine. The idea is that for the second netdata:

  • you will bind it to 127.0.0.1:19998
  • you will disable all data collection (everything under the [plugins] section and [statsd]) and set all its directories to /tmp
  • you will configure it to receive metrics from the other netdata - you can also set its memory mode = ram and history = 60, so it will maintain a very short database in memory.
  • you will configure it to push its metrics to influxdb 2

Most of the above settings (all except streaming) can be given on the netdata command line (i.e. netdata -W set ...), but you can also supply a different netdata.conf with the -c option.
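
To make the steps above concrete, a minimal netdata.conf sketch for that second instance could look like the following. The section and option names are written from memory, so treat the exact spelling as an assumption and compare against the netdata.conf your build generates:

  [global]
      memory mode = ram
      history = 60
      cache directory = /tmp
      lib directory = /tmp
      log directory = /tmp

  [web]
      bind to = 127.0.0.1:19998

  [plugins]
      # disable every collector on this instance; it only relays metrics
      proc = no
      diskspace = no
      cgroups = no
      tc = no
      idlejitter = no

  [statsd]
      enabled = no

  [backend]
      enabled = yes
      # influxdb can ingest the graphite protocol if its graphite listener is enabled
      type = graphite
      destination = influxdb2:2003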

So, netdata 1 will be sending metrics to netdata 2 and influxdb 1. Then, netdata 2 will be sending metrics to influxdb 2.

Of course, if you have multiple machines, you can send metrics from each machine to influxdb 1 and to a central netdata, and then have the central netdata push all metrics to influxdb 2.
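
The streaming leg itself is configured in stream.conf rather than netdata.conf. A rough sketch, using a placeholder API key (any UUID will do): on the sender (netdata 1, or each monitored machine):

  [stream]
      enabled = yes
      destination = 127.0.0.1:19998
      api key = 11111111-2222-3333-4444-555555555555

and on the receiver (netdata 2, or the central netdata), a section named after the same key:

  [11111111-2222-3333-4444-555555555555]
      enabled = yes

For the multi-machine case, point destination at the central netdata's address instead of 127.0.0.1.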


There is also a workaround for handling failures:

  • start nc in a screen session: nc -l 1234 >/tmp/backup-metrics.txt
  • configure netdata to use nchost:1234 as the second backend (it will not be used if the first is up)
  • when the first backend fails, netdata will push all metrics to nchost:1234 and nc will save them to /tmp/backup-metrics.txt.
  • once your database is back up, kill nc (netdata will resume sending metrics to the first backend) and run cat /tmp/backup-metrics.txt | nc --send-only influxdb:PORT to push the saved metrics to the database.
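
Put together as shell commands, the capture and replay steps look roughly like this. Note that --send-only is ncat syntax; a traditional netcat may need a different flag (for example -q 1) to close the connection when stdin ends, so treat the exact flags as an assumption:

  # capture: run this (e.g. inside screen) on the host configured as the second backend
  nc -l 1234 > /tmp/backup-metrics.txt

  # replay: once the primary backend is healthy again, stop the listener above,
  # then push the captured metrics to the database
  cat /tmp/backup-metrics.txt | nc --send-only influxdb:PORT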

Neither is automated, but both solutions take only about half an hour of work to set up...

ktsaou added the enhancement label Jun 16, 2017

jpriebe (Contributor) commented Jun 16, 2017

Great workarounds! I was just reading through the backend code, and I can appreciate the complexity of this change.

I like the idea of two netdatas on one machine. I imagine that the one with the small buffer would be pretty lightweight in terms of resource usage, since it's really just acting as a passthrough; maybe not quite as lightweight as a simple nc, but close.

I also like your idea of using a centralized netdata. Once #2304 is implemented, that would be a good option (otherwise, you'd lose all your host-specific tags when the data moves to the centralized server).

I'll close this issue, since there are practical workarounds. Thanks!

jpriebe closed this Jun 16, 2017

ktsaou (Member) commented Jun 16, 2017

The passthrough netdata will not be collecting any metrics by itself, so yes, it would be very light.

You can make it very lightweight in terms of CPU usage if you are sending metrics to the backend as collected. In this case, set memory mode = none. This will disable all netdata database maintenance functions, including interpolation of the metrics, so the only CPU this netdata will use goes to parsing the streamed data and pushing it to the backend. Again, not as fast as nc, but the fastest it can get.
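
In netdata.conf terms that is roughly these two settings (again a sketch; verify the option names against your generated netdata.conf):

  [global]
      memory mode = none

  [backend]
      data source = as collected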

ktsaou added a commit to ktsaou/netdata that referenced this issue Jun 17, 2017

ktsaou (Member) commented Jun 17, 2017

@jpriebe check this script.

I tried to automate the backend fallback with a script that handles all the details.
The backend replay is not tested yet. If you can test it, please do so.

ktsaou (Member) commented Jun 17, 2017

Get the latest script from PR #2354

zhangxin0112 commented Jun 29, 2017

I am the guy described here: "Each backend runs as its own thread, and I am sure people will then ask for different settings on each backend (e.g. one graphite, one opentsdb), etc. So it would become a project."

My requirement is for netdata to send data to kafka (right now I use nc to push to kafka) and to influxdb.
I will try PR #2354.

ktsaou (Member) commented Aug 6, 2017

Exactly... it is a project then...

ktsaou reopened this Aug 10, 2017

ktsaou (Member) commented Aug 10, 2017

reopened it.

paulfantom added the module/core label and removed the enhancement label Sep 22, 2018

stale bot commented Nov 23, 2018

Currently the netdata team doesn't have enough capacity to work on this issue. We would be more than glad to accept a pull request with a solution to the problem described here. This issue will be closed after another 60 days of inactivity.
