Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read acknowledgements send by Riemann #2523

Merged
merged 1 commit into from Feb 4, 2019

Conversation

smortex
Copy link
Contributor

@smortex smortex commented Feb 1, 2019

When Riemann receive a message, it sends an acknowledgements to the sending process when the connexion is over TCP (TCP or TLS mode of operation). Not reading this acknowledgement make these messages fill-in the Recv-Q on the system running syslog-ng.

The riemann_communicate() function of riemann-c-client is a wrapper around riemann_client_send_message_oneshot() that also reads this acknowledgement if it exist, or mocks it when using UDP.

This patch change the riemann module to rely on riemann_communicate() to send events and avoid filling-in the Recv-Q. Because the function returns a more granular riemann_message_t, update the trace message to include these details, and adjust the error detection code accordingly.

Fixes #2521

@kira-syslogng
Copy link
Contributor

This user does not have permission to start the build. Can one of the admins verify this patch and start the build?
(admin: you have the next options (make sure you checked the code):
"ok to test" to accept this pull request (and further changes) for testing
"test this please" for a one time test run
do nothing -> CI won't start)

1 similar comment
@kira-syslogng
Copy link
Contributor

This user does not have permission to start the build. Can one of the admins verify this patch and start the build?
(admin: you have the next options (make sure you checked the code):
"ok to test" to accept this pull request (and further changes) for testing
"test this please" for a one time test run
do nothing -> CI won't start)

@furiel
Copy link
Collaborator

furiel commented Feb 1, 2019

@kira-syslogng ok to test

@kira-syslogng
Copy link
Contributor

Build FAILURE

@smortex
Copy link
Contributor Author

smortex commented Feb 1, 2019

Error 404 when I try to see the failure 😱
Same when I try to access the passed test… Maybe a permission problem?

@furiel
Copy link
Collaborator

furiel commented Feb 1, 2019

@smortex thanks for the investigation and for the pull request.

I just tested and unfortunately it crashes for me in two cases. I am not sure about the root cause but these may be connected. I got a clue with double free variant executed with valgrind:

==9034== Invalid free() / delete / delete[] / realloc()
==9034==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9034==    by 0x778FAE3: riemann_dw_free (riemann-worker.c:423)
[...]
==9034==  Address 0x73a0050 is 0 bytes inside a block of size 40 free'd
==9034==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9034==    by 0x7DB958A: protobuf_c_message_free_unpacked (in /usr/lib/x86_64-linux-gnu/libprotobuf-c.so.1.0.0)

so maybe there is some trouble about the ownership of self->event.list when the messages could not be sent.

  • crash in `riemann_communicate

With a simple configuration:

destination d_riemann {
    riemann(
      port(5555)
      type(tcp)
    );
};

Crash condition:

  1. start riemann
  2. start syslog-ng
  3. stop riemann
  4. send one message by syslog-ng (this will obviously fail)
  5. stop syslog-ng -> crash
#0  0x00007fb826f4343a in protobuf_c_message_free_unpacked ()
   from /usr/lib/x86_64-linux-gnu/libprotobuf-c.so.1
#1  0x00007fb826f43689 in protobuf_c_message_free_unpacked ()
   from /usr/lib/x86_64-linux-gnu/libprotobuf-c.so.1
#2  0x00007fb82735e18d in riemann_communicate ()
   from /usr/lib/x86_64-linux-gnu/libriemann-client.so.0
#3  0x00007fb827569682 in riemann_worker_flush (s=0x1b4eed0)
    at /home/furiel/workspace/test-syslogng/syslog-ng/modules/riemann/riemann-worker.c:326
[...]
  • double free

With a little more complex configuration: send messages in batches of 5, with 5s timeout, if the batch is not full:

destination d_riemann {
    riemann(
      port(5555)
      type(tcp)
      batch-lines(5)
      batch-timeout(5000)
    );
};
  1. start riemann
  2. start syslog-ng
  3. send one message (so that we have an incomplete batch)
  4. stop riemann before the timeout
  5. stop syslog-ng before the timeout -> double free

There is a variant when syslog-ng is stoppend after the timeout, I had a crash in that case too, but got the same backtrace as when the batch size was 1.

@Kokan
Copy link
Collaborator

Kokan commented Feb 1, 2019

@smortex yes, you cannot check the result because of permission settings

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@smortex
Copy link
Contributor Author

smortex commented Feb 1, 2019

Edit: Problem fixed

@furiel I added a commit that seems to improve the situation (after a rebase on top of master and --signoff of my previous commit).

Yet, I seem to have a crash that seems to be beyond the scope of this PR when I:

  1. Start riemann
  2. Start syslog-ng
  3. Stop riemann
  4. Start riemann

syslog-ng seems to crash after it reconnects.

My configuration:

@version: 3.19

log {
	source {
		file("/dev/stdin");
	};
	rewrite {
		set-tag("syslog");
	};
	destination {
		riemann(
				server("127.0.0.1")
				port(5555)
				type(tcp)
				timeout(5)   # <-- Added this
		       );
	};
};
while echo 'moin'; do sleep 0.2; done | sudo ./syslog-ng/syslog-ng -F --module-path ./modules/affile/.libs/:./modules/riemann/.libs:./modules/syslogformat/.libs
(gdb) bt
#0  0x0000000803c3ff54 in protobuf_c_message_get_packed_size () from /usr/local/lib/libprotobuf-c.so.1
#1  0x0000000803c40319 in protobuf_c_message_get_packed_size () from /usr/local/lib/libprotobuf-c.so.1
#2  0x0000000803a3b810 in msg__get_packed_size (message=0x805a1a0a0) at lib/riemann/proto/riemann.pb-c.c:155
#3  0x0000000803a3a51c in riemann_message_to_buffer (message=0x805a1a0a0, len=0x7fffdfffdb78) at message.c:324
#4  0x0000000803a37860 in _riemann_client_send_message_tcp (client=0x805a2e000, message=0x805a1a0a0) at lib/riemann/client/tcp.c:46
#5  0x0000000803a3771b in riemann_client_send_message (client=0x805a2e000, message=0x805a1a0a0) at client.c:308
#6  0x0000000803a37754 in riemann_client_send_message_oneshot (client=0x805a2e000, message=0x805a1a0a0) at client.c:317
#7  0x0000000803a3aeb7 in riemann_communicate (client=0x805a2e000, message=0x805a1a0a0) at simple.c:110
#8  0x000000080382af50 in riemann_worker_flush (s=0x803423800) at riemann-worker.c:329
#9  0x000000080382b8b6 in log_threaded_dest_worker_flush (self=0x803423800) at logthrdestdrv.h:186
#10 0x000000080382b30a in _insert_single (self=0x803423800, msg=0x805c48700) at riemann-worker.c:382
#11 0x000000080382ae25 in riemann_worker_insert (s=0x803423800, msg=0x805c48700) at riemann-worker.c:416
#12 0x0000000800876ec8 in log_threaded_dest_worker_insert (self=0x803423800, msg=0x805c48700) at logthrdestdrv.h:177
#13 0x0000000800876964 in _perform_inserts (self=0x803423800) at logthrdestdrv.c:350
#14 0x00000008008763ca in _perform_work (data=0x803423800) at logthrdestdrv.c:462
#15 0x00000008008ddfa7 in iv_run_tasks (st=0x805a23000) at iv_task.c:48
#16 0x00000008008e081c in iv_main () at iv_main_posix.c:99
#17 0x00000008008776b6 in _worker_thread (arg=0x803423800) at logthrdestdrv.c:652
#18 0x000000080087e80d in _worker_thread_func (st=0x8034bb5e0) at mainloop-worker.c:339
#19 0x0000000801df47da in g_thread_unref () from /usr/local/lib/libglib-2.0.so.0
#20 0x0000000802740c06 in pthread_create () from /lib/libthr.so.3
#21 0x0000000000000000 in ?? ()
(gdb) f 3
#3  0x0000000803a3a51c in riemann_message_to_buffer (message=0x805a1a0a0, len=0x7fffdfffdb78) at message.c:324
324	  l = msg__get_packed_size (message) + sizeof (buff->header);
(gdb) l
319	    {
320	      errno = EINVAL;
321	      return NULL;
322	    }
323	
324	  l = msg__get_packed_size (message) + sizeof (buff->header);
325	  buff = (struct buff *) malloc (l);
326	  msg__pack (message, buff->data);
327	
328	  buff->header = htonl (l - sizeof (buff->header));
(gdb) print *message
$2 = {base = {descriptor = 0x803c3d478, n_unknown_fields = 0, unknown_fields = 0x0}, has_ok = 0, ok = 0, error = 0x0, n_states = 0, states = 0x0, query = 0x0, n_events = 2, events = 0x80341cf50}
(gdb) print message->events[0]
$4 = (Event *) 0x726576697264
(gdb) print message->events[1]
$5 = (Event *) 0x805a36000
(gdb) print *( message->events[1])
$6 = {base = {descriptor = 0x803c3d318, n_unknown_fields = 0, unknown_fields = 0x0}, has_time = 1, time = 1549047788, state = 0x0, service = 0x805a0d060 "This", host = 0x805a0d038 "zappy", description = 0x0, n_tags = 2, 
  tags = 0x805a45020, has_ttl = 0, ttl = 0, n_attributes = 0, attributes = 0x0, has_time_micros = 0, time_micros = 0, has_metric_sint64 = 0, metric_sint64 = 0, has_metric_d = 0, metric_d = 0, has_metric_f = 0, metric_f = 0}
(gdb) print *( message->events[0])
Cannot access memory at address 0x726576697264

So the condition seems to be similar to the one the workaround fix, the message holds an event which address is not valid.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@smortex
Copy link
Contributor Author

smortex commented Feb 1, 2019

@furiel Nailed it! Can you please test this and confirm it's working as expected?

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@furiel
Copy link
Collaborator

furiel commented Feb 2, 2019

@smortex it works indeed. Please squash the second commit into the first so that the code remain bisect-able.

When Riemann receive a message, it sends an acknowledgements to the
sending process when the connexion is over TCP (TCP or TLS mode of
operation).  Not reading this acknowledgement make these messages
fill-in the Recv-Q on the system running syslog-ng.

The riemann_communicate() function of riemann-c-client is a wrapper
around riemann_client_send_message_oneshot() that also reads this
acknowledgement if it exist, or mocks it when using UDP.

This patch change the riemann module to rely on riemann_communicate() to
send events and avoid filling-in the Recv-Q.  Because the function
returns a more granular riemann_message_t, update the trace message to
include these details, and adjust the error detection code accordingly.

Fixes syslog-ng#2521

Signed-off-by: Romain Tartière <romain@blogreen.org>
@smortex
Copy link
Contributor Author

smortex commented Feb 2, 2019

🎉 Here we go!

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@Kokan Kokan merged commit fa6656e into syslog-ng:master Feb 4, 2019
@smortex smortex deleted the read-riemann-ack branch June 25, 2019 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants