-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increasing TCP receive queue using riemann destination #2521
Comments
@faxm0dem I quickly tried with master and the recv-q is increasing for me too. I used riemann v0.3.1, with librieman-client v1.8.1. Maybe from some version riemann sends back extra data to syslog-ng (like status code or similar) that is not read by riemann-c-client, hence the increase. There is a newer version of libriemann-c-client 1.10.4. Although I did not find anything relevant in the changelogs but worth a shot. I look into this. |
oh I see, that would make perfect sense indeed |
Hi, some notes about my use-case: Packages versions:
I did some quick tcpdump on the wire, an event produce 3 packets: At the kernel level, everything is fine. The kernel seems to just wait for syslog-ng to read the payload of the riemann ACK (second packet). It looks like riemann-c-client introduced an API that handles this. The commit message has some interesting bits regarding ACK for UDP. Thanks! |
Problem confirmed. I added this patch on top of master to see what happen. It only reads the data send by riemann, and ignore it (at least 3 things can go bad according to the API snippet of code): diff --git a/modules/riemann/riemann-worker.c b/modules/riemann/riemann-worker.c
index bab0d2d31..a7ceace40 100644
--- a/modules/riemann/riemann-worker.c
+++ b/modules/riemann/riemann-worker.c
@@ -341,6 +341,12 @@ riemann_worker_flush(LogThreadedDestWorker *s)
self->event.n = 0;
self->event.list = (riemann_event_t **) malloc(sizeof (riemann_event_t *) *
MAX(1, owner->super.batch_lines));
+
+ if (r == 0) {
+ riemann_message_t *res = riemann_client_recv_message (self->client);
+ riemann_message_free(res);
+ }
+
if (r != 0)
return LTR_ERROR;
else
I then made syslog-ng spam Riemann with the configuration of @faxm0dem and a minimal Riemann configuration that just ignores the messages from syslog-ng: (logging/init {:console? false :file "/var/log/riemann/riemann.log"})
(let [host "127.0.0.1"]
(tcp-server {:host host})
(udp-server {:host host})
(ws-server {:host host}))
(periodically-expire 5 { :keep-keys [:host :service :tags] })
(let [
index (index)
]
(streams
(tagged "riemann"
(default :ttl 10
index)
(expired
(fn [event] (info "expired" event)))))) romain@zappy ~/Projects/syslog-ng % yes Hello World | sudo ./syslog-ng/syslog-ng -F --module-path ./modules/affile/.libs/:./modules/riemann/.libs:./modules/syslogformat/.libs Result in Riemann's dashboard: In the blue box, the spamming with the patch applied: Riemann process events regularly. The queue size is almost empty (sometimes 1 message in the queue). Then I reverted the patch and did the same test with the original code, the spamming is in the red box: throughput look inconsistent, but at least is not stable; the queue size is not empty; latency metrics only appeared once I killed syslog-ng (but since I killed it after ~120s, and it jumps at ~120000 ms, this seems correlated). |
When Riemann receive a message, it sends an acknowledgements to the sending process when the connexion is over TCP (TCP or TLS mode of operation). Not reading this acknowledgement make these messages fill-in the Recv-Q on the system running syslog-ng. The riemann_communicate() function of riemann-c-client is a wrapper around riemann_client_send_message_oneshot() that also reads this acknowledgement if it exist, or mocks it when using UDP. This patch change the riemann module to rely on riemann_communicate() to send events and avoid filling-in the Recv-Q. Because the function returns a more granular riemann_message_t, update the trace message to include these details, and adjust the error detection code accordingly. Fixes syslog-ng#2521 Signed-off-by: Romain Tartière <romain@blogreen.org>
This was first reported to me by @smortex
Using syslog-ng-3.9.1 or master, we experience that the linux TCP receive-queue of the sender ever increases.
so whenever we send one message to the log path, the following figure increases:
Example in production:
The text was updated successfully, but these errors were encountered: