Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increasing TCP receive queue using riemann destination #2521

Closed
faxm0dem opened this issue Jan 31, 2019 · 4 comments
Closed

Increasing TCP receive queue using riemann destination #2521

faxm0dem opened this issue Jan 31, 2019 · 4 comments
Assignees

Comments

@faxm0dem
Copy link
Contributor

This was first reported to me by @smortex
Using syslog-ng-3.9.1 or master, we experience that the linux TCP receive-queue of the sender ever increases.

log {
	source {
	  file("/dev/stdin");
	};
	rewrite {
		set-tag("syslog");
	};
	destination {
		riemann(
		  server("127.0.0.1")
		  port(55555)
		  type(tcp)
		);
	};
};

so whenever we send one message to the log path, the following figure increases:

$netstat -tapn | grep 55555
tcp       36      0 127.0.0.1:38018         127.0.0.1:55555         ESTABLISHED 21904/syslog-ng
$netstat -tapn | grep 55555
tcp       42      0 127.0.0.1:38018         127.0.0.1:55555         ESTABLISHED 21904/syslog-ng
$netstat -tapn | grep 55555
tcp       46      0 127.0.0.1:38018         127.0.0.1:55555         ESTABLISHED 21904/syslog-ng

Example in production:

$netstat -tapn | grep 5555
tcp   2230284      0 192.168.239.22:32928    192.168.239.18:5555     ESTABLISHED 1847/syslog-ng
@furiel
Copy link
Collaborator

furiel commented Jan 31, 2019

@faxm0dem I quickly tried with master and the recv-q is increasing for me too. I used riemann v0.3.1, with librieman-client v1.8.1.

Maybe from some version riemann sends back extra data to syslog-ng (like status code or similar) that is not read by riemann-c-client, hence the increase.

There is a newer version of libriemann-c-client 1.10.4. Although I did not find anything relevant in the changelogs but worth a shot. I look into this.

@furiel furiel self-assigned this Jan 31, 2019
@faxm0dem
Copy link
Contributor Author

oh I see, that would make perfect sense indeed

@smortex
Copy link
Contributor

smortex commented Jan 31, 2019

Hi, some notes about my use-case:

Packages versions:

I did some quick tcpdump on the wire, an event produce 3 packets:

At the kernel level, everything is fine. The kernel seems to just wait for syslog-ng to read the payload of the riemann ACK (second packet).

It looks like riemann-c-client introduced an API that handles this. The commit message has some interesting bits regarding ACK for UDP.

Thanks!

@smortex
Copy link
Contributor

smortex commented Feb 1, 2019

Problem confirmed. I added this patch on top of master to see what happen. It only reads the data send by riemann, and ignore it (at least 3 things can go bad according to the API snippet of code):

diff --git a/modules/riemann/riemann-worker.c b/modules/riemann/riemann-worker.c
index bab0d2d31..a7ceace40 100644
--- a/modules/riemann/riemann-worker.c
+++ b/modules/riemann/riemann-worker.c
@@ -341,6 +341,12 @@ riemann_worker_flush(LogThreadedDestWorker *s)
   self->event.n = 0;
   self->event.list = (riemann_event_t **) malloc(sizeof (riemann_event_t *) *
                                                  MAX(1, owner->super.batch_lines));
+
+  if (r == 0) {
+       riemann_message_t *res = riemann_client_recv_message (self->client);
+       riemann_message_free(res);
+  }
+
   if (r != 0)
     return LTR_ERROR;
   else

I then made syslog-ng spam Riemann with the configuration of @faxm0dem and a minimal Riemann configuration that just ignores the messages from syslog-ng:

(logging/init {:console? false :file "/var/log/riemann/riemann.log"})

(let [host "127.0.0.1"]
  (tcp-server {:host host})
  (udp-server {:host host})
  (ws-server  {:host host}))

(periodically-expire 5 { :keep-keys [:host :service :tags] })

(let [
      index (index)
  ]
  (streams
    (tagged "riemann"
      (default :ttl 10
        index)

      (expired
        (fn [event] (info "expired" event))))))
romain@zappy ~/Projects/syslog-ng % yes Hello World | sudo ./syslog-ng/syslog-ng -F --module-path ./modules/affile/.libs/:./modules/riemann/.libs:./modules/syslogformat/.libs

Result in Riemann's dashboard:

In the blue box, the spamming with the patch applied: Riemann process events regularly. The queue size is almost empty (sometimes 1 message in the queue).

Then I reverted the patch and did the same test with the original code, the spamming is in the red box: throughput look inconsistent, but at least is not stable; the queue size is not empty; latency metrics only appeared once I killed syslog-ng (but since I killed it after ~120s, and it jumps at ~120000 ms, this seems correlated).

smortex added a commit to smortex/syslog-ng that referenced this issue Feb 2, 2019
When Riemann receive a message, it sends an acknowledgements to the
sending process when the connexion is over TCP (TCP or TLS mode of
operation).  Not reading this acknowledgement make these messages
fill-in the Recv-Q on the system running syslog-ng.

The riemann_communicate() function of riemann-c-client is a wrapper
around riemann_client_send_message_oneshot() that also reads this
acknowledgement if it exist, or mocks it when using UDP.

This patch change the riemann module to rely on riemann_communicate() to
send events and avoid filling-in the Recv-Q.  Because the function
returns a more granular riemann_message_t, update the trace message to
include these details, and adjust the error detection code accordingly.

Fixes syslog-ng#2521

Signed-off-by: Romain Tartière <romain@blogreen.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants