Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logqueue-disk: serialize outside of the lock in the source threads #3746

Merged
merged 18 commits into from Aug 9, 2021

Conversation

alltilla
Copy link
Collaborator

@alltilla alltilla commented Aug 2, 2021

It is common to have multiple source threads.
When we use disk-buffer, the message's serialization and writing it to the disk happen in the source thread.

Currently, the serialization happens while the source thread is holding the LogQueue's lock. Also it is inbetween two accesses of the QDisk.
If we just simply move the serialization outside the lock, we are not guaranteed to have the same QDisk state, as we were having beforehand, so it is better to move the whole serialization between QDisk is ever accessed and the lock is obtained.

This can yield significant performance improvement, because while the source threads are doing their slow serialization, the destination thread can process messages from the QDisk. In my local measurements, this change introduced a 40-50% performance gain compared to syslog-ng-3.32.1.

No news file needed, IMO.

Signed-off-by: Attila Szakacs attila.szakacs@oneidentity.com

@MrAnno
Copy link
Collaborator

MrAnno commented Aug 2, 2021

This can yield significant performance improvement

Could you share the measurement, please? As far as I remember, it was a 60% improvement in general when the reliable disk-buffer was used.

The non-reliable disk-buffer has some qout/qoverflow move logic that slows us down, we're investigating that one separately.

No news file needed, IMO.

I think we could add a NEWS entry about the overall performance improvement in the last pull request.


I'm sharing two flame graphs, which show the ratios between the actual disk writes and the serialization/deserialization logic:

image
image

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@MrAnno MrAnno self-requested a review August 2, 2021 14:55
@alltilla
Copy link
Collaborator Author

alltilla commented Aug 2, 2021

This can yield significant performance improvement

Could you share the measurement, please? As far as I remember, it was a 60% improvement in general when the reliable disk-buffer was used.

The non-reliable disk-buffer has some qout/qoverflow move logic that slows us down, we're investigating that one separately.

No news file needed, IMO.

I think we could add a NEWS entry about the overall performance improvement in the last pull request.

I'm sharing two flame graphs, which show the ratios between the actual disk writes and the serialization/deserialization logic:

image
image

Sure, let me measure everything again tomorrow. Thanks!

modules/diskq/qdisk.c Outdated Show resolved Hide resolved
@alltilla alltilla force-pushed the diskq-perf-serialize-outside-lock branch from 316c17d to 5b1294a Compare August 3, 2021 09:03
@kira-syslogng
Copy link
Contributor

Build FAILURE

@alltilla alltilla force-pushed the diskq-perf-serialize-outside-lock branch from 5b1294a to 350b91a Compare August 3, 2021 10:24
@kira-syslogng
Copy link
Contributor

Build SUCCESS

modules/diskq/logqueue-disk.c Outdated Show resolved Hide resolved
modules/diskq/qdisk.c Outdated Show resolved Hide resolved
modules/diskq/qdisk.c Outdated Show resolved Hide resolved
modules/diskq/qdisk.c Outdated Show resolved Hide resolved
@alltilla alltilla force-pushed the diskq-perf-serialize-outside-lock branch 5 times, most recently from acaef7f to a028825 Compare August 4, 2021 09:02
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@alltilla
Copy link
Collaborator Author

alltilla commented Aug 5, 2021

Added the hint for needing serialized message.

@MrAnno
Copy link
Collaborator

MrAnno commented Aug 5, 2021

@kira-syslogng do stresstest

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@kira-syslogng
Copy link
Contributor

Kira-stress-test: Build SUCCESS

modules/diskq/qdisk.h Show resolved Hide resolved
gboolean consumed = FALSE;

if (qdisk_started(self->qdisk) && qdisk_is_space_avail(self->qdisk, 64))
{
ScratchBuffersMarker marker;
GString *write_serialized = scratch_buffers_alloc_and_mark(&marker);

sa = serialize_string_archive_new(write_serialized);
log_msg_serialize(msg, sa, options->compaction ? LMSF_COMPACTION : 0);
if (!qdisk_serialize_msg(self->qdisk, msg, write_serialized))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qdisk: move serialization to qdisk is not only a move (refactor) commit, you also fixed the error handling of log_msg_serialize().

modules/diskq/qdisk.c Show resolved Hide resolved
modules/diskq/logqueue-disk.h Outdated Show resolved Hide resolved
modules/diskq/logqueue-disk.h Outdated Show resolved Hide resolved
modules/diskq/logqueue-disk.c Outdated Show resolved Hide resolved
modules/diskq/logqueue-disk-non-reliable.c Outdated Show resolved Hide resolved
@MrAnno
Copy link
Collaborator

MrAnno commented Aug 5, 2021

All of my review notes are based on the ways of how the serialization logic was moved out from the lock.

The previous interface looks like this:
image

I think it's really not okay to change the signature of push_tail() in LogQueueDisk, because the existence of those function pointers in the struct is just some inheritance trickery that makes it possible to lock the whole data structure in the top-level abstract class easily.
In reality, those function pointers should have the exact same signature as LogQueue has.

Why don't we move the locks where they actually belong?
If I'm not mistaken, that would resolve all of the above issues.
I think we could remove lock/unlock calls from LogQueueDisk, move them to where they are actually needed/used (LogQueueDiskReliable and LogQueueDiskNonReliable).

We may also get rid of the 2 protected methods of LogQueueDisk: write_message() and read_message() (they look weird anyway together with push_tail and pop_head).

This way, message serialization can be done as a separate unlocked step in the non-reliable and reliable disk buffer, and you could also implement the non-reliable serialization hint without polluting LogQueueDisk with something that only makes sense in the non-reliable disk-buffer implementation.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
@alltilla alltilla force-pushed the diskq-perf-serialize-outside-lock branch from 3718a0f to 1c9060c Compare August 6, 2021 09:46
@alltilla
Copy link
Collaborator Author

alltilla commented Aug 6, 2021

@MrAnno Thanks, I implemented your suggestion. The old implementation is available at: https://github.com/alltilla/syslog-ng/commits/diskq-perf-serialize-outside-lock-orig if for some reason it will be needed.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
…ll()

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
@alltilla alltilla force-pushed the diskq-perf-serialize-outside-lock branch from 1c9060c to 92110da Compare August 6, 2021 09:52
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@alltilla
Copy link
Collaborator Author

alltilla commented Aug 6, 2021

@kira-syslogng do stresstest

@kira-syslogng
Copy link
Contributor

Kira-stress-test: Build SUCCESS

Copy link
Collaborator

@MrAnno MrAnno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the refactor was a really good direction. Now that locks are closer to the actual implementation, we already see additional places where those locks are not necessary.

A few lines of code duplication was added as a result of the refactor/inheritance fix, but further refactor rounds will be much easier upon this work than on the original code IMHO.

I think only one of my notes should be fixed in this PR (the accidental change of the disk-buffer file's magic numbers).

modules/diskq/logqueue-disk-reliable.c Show resolved Hide resolved
modules/diskq/logqueue-disk.c Show resolved Hide resolved
modules/diskq/logqueue-disk-non-reliable.c Show resolved Hide resolved
modules/diskq/logqueue-disk-reliable.c Show resolved Hide resolved
modules/diskq/logqueue-disk.c Outdated Show resolved Hide resolved
modules/diskq/logqueue-disk-non-reliable.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@Kokan Kokan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this version has the same performance figures as the previous proposed change ?
Have you measured that only the refactor part had any performance difference ?

modules/diskq/logqueue-disk-non-reliable.c Outdated Show resolved Hide resolved
modules/diskq/qdisk.c Show resolved Hide resolved
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Just for the sake of symmetry.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
The serialized message now already has the guint32 record length in its
beginning, so we do not need to make an adjustment in is_space_avail().

There are 2 other places where qdisk_is_space_avail() is used:
 * LogQueueDiskNonReliable::_has_movable_message():
     It is a guess with a relatively high size, so we do not need to
     adjust it.
 * LogQueueDisk::log_queue_disk_write_message():
     It will be fixed in one of the following commits.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
This functions does 3 things:
 1. Does some preliminary check on QDisk.
 2. Serializes the message.
 3. Writes the message to the QDisk.

Serializing and writing to disk are expensive actions.
We are doing both, while holding the LogQueue::lock.
We do not need to hold the lock while serializing.

In order to move the serialization outside the lock, we need more
control, which cannot be done with this common function.

This commit only moves the function to the descendant classes, we will
implement our class specific behavior in the following commits.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
…il()

This commit also fixes the available space checking, mentioned in
a169e331.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
…message_to_disk()

The free space is checked in qdisk_push_tail(), it is unnecessary to do
it twice.

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
…h_tail()

Signed-off-by: Attila Szakacs <attila.szakacs@oneidentity.com>
@alltilla alltilla force-pushed the diskq-perf-serialize-outside-lock branch from 92110da to a80ef51 Compare August 9, 2021 09:23
@alltilla
Copy link
Collaborator Author

alltilla commented Aug 9, 2021

@Kokan

Does this version has the same performance figures as the previous proposed change ?

Yes.

Have you measured that only the refactor part had any performance difference ?

No, I can do so, if you would like me to.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@MrAnno
Copy link
Collaborator

MrAnno commented Aug 9, 2021

@kira-syslogng do stresstest

@kira-syslogng
Copy link
Contributor

Kira-stress-test: Build SUCCESS

Copy link
Collaborator

@MrAnno MrAnno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@Kokan
Copy link
Collaborator

Kokan commented Aug 9, 2021

No, I can do so, if you would like me to.

Not needed, I was just curios if we have the data.

@Kokan Kokan merged commit f1387d0 into syslog-ng:master Aug 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants