Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disk-buffer: trivial performance-related fixes and refactor #3743

Merged
merged 21 commits into from Jul 30, 2021

Conversation

MrAnno
Copy link
Collaborator

@MrAnno MrAnno commented Jul 26, 2021

We are currently profiling and measuring the performance of the disk-buffer() module in different scenarios in order to identify its bottlenecks and to improve its speed.

This PR is the first step in this adventure.
As we got familiar with the code base, we found some trivial performance-related issues, which we've fixed:

  • reduced two consecutive pwrite() calls to a single one when writing the serialized message and its length
  • removed unnecessary read/deserialization logic when skipping messages that are already available in the qreliable memory segment
  • replaced heap allocations with scratch_buffers on hot paths

This resulted in a ~20% performance improvement in general.

The PR also contains refactor commits.

For reviewers:

Click to expand!

image

The reliable disk buffer uses 2 memory queues:

  • qreliable:
    - it's just a technicality, it is only used to kick flow-control in when the free space is less than mem-buf-size() in the disk-buffer file
    - messages stored in this queue are also written into the disk-buffer file, so these messages will never be lost
  • qbacklog: outgoing messages that are not acked yet

The non-reliable disk buffer uses 3 memory queue segments:

  • qout: a non-reliable segment that increases speed to the detriment of reliability, its size can be configured with qout-size()
    - the first qout-size() number of messages (64 by default) are stored only in the qout memory queue for performance reasons
    - we only write/read messages to/from disk after the qout queue is full
  • qoverflow:
    - this is used for the same reason as qreliable, to kick flow-control in when the disk-buffer file is full
    - it stores mem-buf-length() number of messages after the disk-buffer file is fully saturated
  • qbacklog: outgoing messages that are not acked yet

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@MrAnno
Copy link
Collaborator Author

MrAnno commented Jul 26, 2021

@kira-syslogng do stresstest

@kira-syslogng
Copy link
Contributor

Kira-stress-test: Build SUCCESS

@Kokan Kokan self-requested a review July 26, 2021 14:16
modules/diskq/logqueue-disk-reliable.c Outdated Show resolved Hide resolved
modules/diskq/logqueue-disk.c Outdated Show resolved Hide resolved
modules/diskq/qdisk.c Outdated Show resolved Hide resolved
modules/diskq/qdisk.c Outdated Show resolved Hide resolved
modules/diskq/qdisk.c Outdated Show resolved Hide resolved
modules/diskq/logqueue-disk-reliable.c Outdated Show resolved Hide resolved
@gaborznagy gaborznagy self-requested a review July 28, 2021 07:56
@szemere szemere self-requested a review July 28, 2021 13:55
@@ -427,6 +427,16 @@ _is_trying_to_read_from_unwritten_space(QDisk *self)
return self->hdr->read_head == self->hdr->write_head;
}

static gssize
_read_record_length_from_disk(QDisk *self, guint32 *record_length)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: I was about to comment the lack of this (6aca8bf3961ae83a2d68ebf80bbcd0ab0b0ca397) refactor here, when I found it. Squash them together?

@alltilla alltilla force-pushed the diskq-perf-low-hanging-fixes branch from 628e8aa to 8ea0672 Compare July 29, 2021 09:55
@kira-syslogng
Copy link
Contributor

Build FAILURE

Copy link
Collaborator

@gaborznagy gaborznagy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only found a minor thing, you guys found everyhing.

modules/diskq/logqueue-disk.c Outdated Show resolved Hide resolved
@alltilla alltilla force-pushed the diskq-perf-low-hanging-fixes branch from 8ea0672 to 7967f90 Compare July 29, 2021 10:23
@kira-syslogng
Copy link
Contributor

Build SUCCESS

alltilla and others added 16 commits July 29, 2021 20:12
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
…d move

We can use that value, because they are the same.

From glib/glib/gstring.c:
```
GString *
g_string_set_size (GString *string,
                   gsize    len)
{
  g_return_val_if_fail (string != NULL, NULL);

  if (len >= string->allocated_len)
    g_string_maybe_expand (string, len - string->len);

  string->len = len;
  string->str[len] = 0;

  return string;
}
```

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
This commit changes the atomicity of changing the value of read_head.

It is important, because read_head is on an mmap-ed region, and if a
crash occurs, we still want to have a valid disk-buffer.

The old could change read_head twice, if the first change pointed to
a position, which nedded wrapping, it was then changed again to the
beginning of the file (QDISK_RESERVED_SPACE).
In this case, if a crash occurred between the two, we did not wrap the
read_head, so in the next pop_head after the restart, we might have
read from an invalid space.

The new code calculates the new read_head position with a local variable
and sets read_head once to the valid position. So we can either crash
before setting or after setting. If we crash after we already changed
the read_head, so everything is fine, if we crashed before, we will try
to read the same message again after the restart. Either case we read
from a valid position.

In conclusion the new logic is safer in case of crash.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: László Várady <laszlo.varady@protonmail.com>
This has no real performance gain, it just makes the responsibilities of
LogQueueDisk a bit easier to understand.

Signed-off-by: László Várady <laszlo.varady@protonmail.com>
Signed-off-by: László Várady <laszlo.varady@protonmail.com>
MrAnno and others added 5 commits July 29, 2021 20:13
Signed-off-by: László Várady <laszlo.varady@protonmail.com>
* concatenate message length and message, then write

This results in obvious performance improvement, makes message-writes a bit more atomic, and does not worsen disk-buffer guarantees.

Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>
reason: didn't need to alloc && free GString every time when we want to write message to file

Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>
Signed-off-by: Balázs Barkó <Balazs.Barko@oneidentity.com>
Signed-off-by: László Várady <laszlo.varady@protonmail.com>
@MrAnno MrAnno force-pushed the diskq-perf-low-hanging-fixes branch from 7967f90 to 42b6ef3 Compare July 29, 2021 18:34
@MrAnno
Copy link
Collaborator Author

MrAnno commented Jul 29, 2021

https://github.com/MrAnno/syslog-ng/compare/diskq-perf-low-hanging-fixes-orig..MrAnno:diskq-perf-low-hanging-fixes

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@Kokan Kokan merged commit 14aaa6c into syslog-ng:master Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants