disk-buffer: trivial performance-related fixes and refactor #3743

MrAnno · 2021-07-26T11:53:31Z

We are currently profiling and measuring the performance of the disk-buffer() module in different scenarios in order to identify its bottlenecks and to improve its speed.

This PR is the first step in this adventure.
As we got familiar with the code base, we found some trivial performance-related issues, which we've fixed:

reduced two consecutive pwrite() calls to a single one when writing the serialized message and its length
removed unnecessary read/deserialization logic when skipping messages that are already available in the qreliable memory segment
replaced heap allocations with scratch_buffers on hot paths

This resulted in a ~20% performance improvement in general.

The PR also contains refactor commits.

For reviewers:

Click to expand!

The reliable disk buffer uses 2 memory queues:

qreliable:
- it's just a technicality, it is only used to kick flow-control in when the free space is less than mem-buf-size() in the disk-buffer file
- messages stored in this queue are also written into the disk-buffer file, so these messages will never be lost
qbacklog: outgoing messages that are not acked yet

The non-reliable disk buffer uses 3 memory queue segments:

qout: a non-reliable segment that increases speed to the detriment of reliability, its size can be configured with qout-size()
- the first qout-size() number of messages (64 by default) are stored only in the qout memory queue for performance reasons
- we only write/read messages to/from disk after the qout queue is full
qoverflow:
- this is used for the same reason as qreliable, to kick flow-control in when the disk-buffer file is full
- it stores mem-buf-length() number of messages after the disk-buffer file is fully saturated
qbacklog: outgoing messages that are not acked yet

kira-syslogng · 2021-07-26T12:15:21Z

Build SUCCESS

MrAnno · 2021-07-26T13:14:10Z

@kira-syslogng do stresstest

kira-syslogng · 2021-07-26T13:39:53Z

Kira-stress-test: Build SUCCESS

modules/diskq/logqueue-disk-reliable.c

modules/diskq/logqueue-disk.c

modules/diskq/qdisk.c

modules/diskq/logqueue-disk-reliable.c

modules/diskq/qdisk.c

szemere · 2021-07-28T15:42:16Z

modules/diskq/qdisk.c

@@ -427,6 +427,16 @@ _is_trying_to_read_from_unwritten_space(QDisk *self)
  return self->hdr->read_head == self->hdr->write_head;
 }

+static gssize
+_read_record_length_from_disk(QDisk *self, guint32 *record_length)


optional: I was about to comment the lack of this (6aca8bf3961ae83a2d68ebf80bbcd0ab0b0ca397) refactor here, when I found it. Squash them together?

kira-syslogng · 2021-07-29T10:01:55Z

Build FAILURE

gaborznagy

I have only found a minor thing, you guys found everyhing.

modules/diskq/logqueue-disk.c

kira-syslogng · 2021-07-29T10:47:23Z

Build SUCCESS

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

…d move We can use that value, because they are the same. From glib/glib/gstring.c: ``` GString * g_string_set_size (GString *string, gsize len) { g_return_val_if_fail (string != NULL, NULL); if (len >= string->allocated_len) g_string_maybe_expand (string, len - string->len); string->len = len; string->str[len] = 0; return string; } ``` Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

This commit changes the atomicity of changing the value of read_head. It is important, because read_head is on an mmap-ed region, and if a crash occurs, we still want to have a valid disk-buffer. The old could change read_head twice, if the first change pointed to a position, which nedded wrapping, it was then changed again to the beginning of the file (QDISK_RESERVED_SPACE). In this case, if a crash occurred between the two, we did not wrap the read_head, so in the next pop_head after the restart, we might have read from an invalid space. The new code calculates the new read_head position with a local variable and sets read_head once to the valid position. So we can either crash before setting or after setting. If we crash after we already changed the read_head, so everything is fine, if we crashed before, we will try to read the same message again after the restart. Either case we read from a valid position. In conclusion the new logic is safer in case of crash. Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

This has no real performance gain, it just makes the responsibilities of LogQueueDisk a bit easier to understand. Signed-off-by: László Várady <laszlo.varady@protonmail.com>

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

* concatenate message length and message, then write This results in obvious performance improvement, makes message-writes a bit more atomic, and does not worsen disk-buffer guarantees. Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>

reason: didn't need to alloc && free GString every time when we want to write message to file Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>

Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

MrAnno · 2021-07-29T18:38:53Z

https://github.com/MrAnno/syslog-ng/compare/diskq-perf-low-hanging-fixes-orig..MrAnno:diskq-perf-low-hanging-fixes

kira-syslogng · 2021-07-29T18:56:24Z

Build SUCCESS

Kokan self-requested a review July 26, 2021 14:16

Kokan reviewed Jul 28, 2021

View reviewed changes

gaborznagy self-requested a review July 28, 2021 07:56

szemere self-requested a review July 28, 2021 13:55

szemere reviewed Jul 28, 2021

View reviewed changes

modules/diskq/qdisk.c Show resolved Hide resolved

szemere reviewed Jul 28, 2021

View reviewed changes

alltilla force-pushed the diskq-perf-low-hanging-fixes branch from 628e8aa to 8ea0672 Compare July 29, 2021 09:55

gaborznagy reviewed Jul 29, 2021

View reviewed changes

modules/diskq/logqueue-disk.c Outdated Show resolved Hide resolved

alltilla force-pushed the diskq-perf-low-hanging-fixes branch from 8ea0672 to 7967f90 Compare July 29, 2021 10:23

alltilla and others added 16 commits July 29, 2021 20:12

qdisk: early return in pop_head()

de491e7

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: pop_head(): convert record_length from big endian sooner

ee28c75

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: factor out _read_record_length_from_disk()

36cf2d2

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: refactor: change "else if" to "if" in pop_head()

0b0c692

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: factor out _is_record_length_valid()

4714894

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: factor out _read_record_from_disk()

8e8a88f

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: refactor: rename res to bytes_read in pop_head()

bf85ccf

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: factor out _try_reading_record_length()

ae30499

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: factor out _update_positions_after_read()

277c74d

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

qdisk: add qdisk_remove_head()

5a4d79a

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

reliable-diskq: use qdisk_remove_head() in skip_message()

a65a4ae

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>

disk-buffer: remove unused abstract method is_reliable()

5c9948d

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

disk-buffer: make read_message() and write_message() non-abstract

7ff54e7

This has no real performance gain, it just makes the responsibilities of LogQueueDisk a bit easier to understand. Signed-off-by: László Várady <laszlo.varady@protonmail.com>

disk-buffer: remove undefined function pread_all

714844d

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

MrAnno and others added 5 commits July 29, 2021 20:13

disk-buffer: make qdisk_read() private

726501d

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

qdisk: using 1 write instead of 2

2b38b41

* concatenate message length and message, then write This results in obvious performance improvement, makes message-writes a bit more atomic, and does not worsen disk-buffer guarantees. Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>

qdisk: using scratch_buffers to serialize the message

1458e13

reason: didn't need to alloc && free GString every time when we want to write message to file Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>

qdisk: using scratch_buffers to deserialize messages

54fa1f8

Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>

disk-buffer: make extracted methods inline

42b6ef3

Signed-off-by: László Várady <laszlo.varady@protonmail.com>

MrAnno force-pushed the diskq-perf-low-hanging-fixes branch from 7967f90 to 42b6ef3 Compare July 29, 2021 18:34

gaborznagy approved these changes Jul 30, 2021

View reviewed changes

Kokan approved these changes Jul 30, 2021

View reviewed changes

Kokan merged commit 14aaa6c into syslog-ng:master Jul 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

disk-buffer: trivial performance-related fixes and refactor #3743

disk-buffer: trivial performance-related fixes and refactor #3743

MrAnno commented Jul 26, 2021

kira-syslogng commented Jul 26, 2021

MrAnno commented Jul 26, 2021

kira-syslogng commented Jul 26, 2021

szemere Jul 28, 2021

kira-syslogng commented Jul 29, 2021

gaborznagy left a comment

kira-syslogng commented Jul 29, 2021

MrAnno commented Jul 29, 2021

kira-syslogng commented Jul 29, 2021

disk-buffer: trivial performance-related fixes and refactor #3743

disk-buffer: trivial performance-related fixes and refactor #3743

Conversation

MrAnno commented Jul 26, 2021

kira-syslogng commented Jul 26, 2021

MrAnno commented Jul 26, 2021

kira-syslogng commented Jul 26, 2021

szemere Jul 28, 2021

Choose a reason for hiding this comment

kira-syslogng commented Jul 29, 2021

gaborznagy left a comment

Choose a reason for hiding this comment

kira-syslogng commented Jul 29, 2021

MrAnno commented Jul 29, 2021

kira-syslogng commented Jul 29, 2021