
fix issue 868 #878

Merged
merged 5 commits on Jul 6, 2020

Conversation

bollhals
Contributor

Proposed Changes

Fixes issue #868 by modifying how we send data internally.
Previously we went from Method -> Command -> OutboundFrames -> Channel -> Memory -> Socket.
Now we go Method -> Command -> Memory -> Channel -> Socket.

This change

  1. Fixes the issue by copying the passed payload before it is handed to the channel, so the payload can no longer be modified afterwards (see the sketch below this list).
  2. Gets rid of the outbound frame classes.
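
For illustration, a minimal sketch of the "copy before enqueue" idea. The helper and its names are hypothetical, not the client's actual API:

```csharp
using System;
using System.Buffers;

// Sketch only: the payload bytes are copied into a client-owned buffer before the
// write is enqueued, so later mutations of the caller's array cannot affect what
// is actually sent on the socket.
internal static class PayloadCopyExample
{
    public static ReadOnlyMemory<byte> CopyForSending(ReadOnlyMemory<byte> body, out byte[] rented)
    {
        // Rent a buffer from the shared pool and copy the caller's payload into it.
        rented = ArrayPool<byte>.Shared.Rent(body.Length);
        body.CopyTo(rented);

        // This copy is what gets enqueued for the socket write; mutating the
        // original array after this point no longer affects the outgoing frame.
        return new ReadOnlyMemory<byte>(rented, 0, body.Length);
    }

    public static void ReturnAfterWrite(byte[] rented)
    {
        // Return the rented buffer once the socket write has completed.
        ArrayPool<byte>.Shared.Return(rented);
    }
}
```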

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #868, "Payload of BasicPublish is modifiable after the method was called")
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@bollhals
Contributor Author

The relevant commit is febdafa; the others are from #857. I've continued from that branch as there were some relevant changes in there.

@lukebakken
Contributor

@danielmarbach and @stebet, if you have time to check this out, that would be great. @stebet, if you have time to run your memory allocation and other benchmarks, that would be interesting as well. Thanks!

@stebet
Collaborator

stebet commented Jun 24, 2020

I'll take a look at this tomorrow :)

Comment on lines +55 to +59
/* +------------+---------+----------------+---------+------------------+
* | Frame Type | Channel | Payload length | Payload | Frame End Marker |
* +------------+---------+----------------+---------+------------------+
* | 1 byte | 2 bytes | 4 bytes | x bytes | 1 byte |
* +------------+---------+----------------+---------+------------------+ */
Collaborator

<3

Contributor Author

Took me way too long to figure this out from the code itself; thought this might help others understand it better :)
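
For readers less familiar with the wire format, here is a minimal sketch of writing a frame with this layout. The 0xCE frame-end octet comes from the AMQP 0-9-1 spec; the helper itself is hypothetical, not the PR's actual code:

```csharp
using System;
using System.Buffers.Binary;

// Writes one frame per the layout in the comment above:
// frame type (1), channel (2), payload length (4), payload (x), frame end marker (1).
internal static class FrameLayoutExample
{
    public static int WriteFrame(Span<byte> destination, byte frameType, ushort channel, ReadOnlySpan<byte> payload)
    {
        destination[0] = frameType;                                                        // 1 byte: frame type
        BinaryPrimitives.WriteUInt16BigEndian(destination.Slice(1), channel);              // 2 bytes: channel number
        BinaryPrimitives.WriteUInt32BigEndian(destination.Slice(3), (uint)payload.Length); // 4 bytes: payload length
        payload.CopyTo(destination.Slice(7));                                              // x bytes: payload
        destination[7 + payload.Length] = 0xCE;                                            // 1 byte: frame end marker
        return 8 + payload.Length;                                                         // total bytes written
    }
}
```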

@stebet
Collaborator

stebet commented Jun 25, 2020

This all looks good to me, perf and memory allocations looking very good as well.

[benchmark results screenshot]

@stebet
Collaborator

stebet commented Jun 26, 2020

I might copy some of this code for the async branch, especially getting rid of OutboundFrame.

Collaborator

@danielmarbach left a comment

This low-level area is not exactly my comfort zone, and my skills are a bit weak when it comes to low-level optimizations like this. I hope my review comments don't look too embarrassing. Still learning, and probably will be for a while!

projects/RabbitMQ.Client/client/impl/Connection.cs (outdated, resolved)
@@ -45,26 +45,28 @@

namespace RabbitMQ.Client.Impl
{
internal struct ContentHeaderPropertyReader
internal ref struct ContentHeaderPropertyReader
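
For context on the struct-to-ref-struct change above: a struct can only hold a Span<byte> or ReadOnlySpan<byte> field if it is itself a ref struct (stack-only). A minimal illustration of the pattern, not the actual reader:

```csharp
using System;

// Span<byte> is a stack-only type, so any struct that stores one as a field
// must itself be declared as a ref struct.
internal ref struct SpanReaderSketch
{
    private readonly ReadOnlySpan<byte> _span;
    private int _offset;

    public SpanReaderSketch(ReadOnlySpan<byte> span)
    {
        _span = span;
        _offset = 0;
    }

    public byte ReadByte() => _span[_offset++];
}
```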
Collaborator

It seems like

  • ReadBit
  • ReadLong
  • ReadLonglong
  • ReadLongstr
  • ReadShort

are no longer used. Should we ditch them? Or is it worth keeping them around just in case?

Contributor Author

I'm fine with ditching them; I only left them because they were already there before.

Collaborator

@michaelklishin @lukebakken any thoughts? Do you prefer to have them around?

Contributor

If code is unused and removing it does not change the API, remove it!

return result;
}

public byte[] ReadLongstr()
{
byte[] result = WireFormatting.ReadLongstr(_memory.Slice(_memoryOffset));
_memoryOffset += 4 + result.Length;
byte[] result = WireFormatting.ReadLongstr(Span);
Collaborator

Seems to still allocate a byte[]. Does it make sense to remove the ToArray() underneath, or is it not worth it because the method is not used anyway? See my other comment.

Contributor Author

If it is not used, I'd rather delete it than change it. In general we could return a Memory, but here too, it's a different contract we implicitly get by doing so.

Collaborator

Yeah, but this is a purely internal thing. Maybe let's wait until Michael and Luke chime in.

Contributor

IMHO change internal contracts as you see fit 👍

Member

Sounds like a fairly minor implementation detail to me? I trust @bollhals' judgment on this then :)
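
For reference, a sketch of the allocation-free alternative discussed above: returning a slice of the underlying buffer instead of a copied byte[]. This is a hypothetical helper (not the actual WireFormatting API), and the caller must not keep the slice beyond the lifetime of the underlying buffer:

```csharp
using System;
using System.Buffers.Binary;

// An AMQP "long string" is a 4-byte big-endian length prefix followed by that
// many bytes of data. Slicing avoids the ToArray()/byte[] allocation.
internal static class LongStringExample
{
    public static ReadOnlyMemory<byte> ReadLongstr(ReadOnlyMemory<byte> memory, out int bytesRead)
    {
        int length = (int)BinaryPrimitives.ReadUInt32BigEndian(memory.Span);
        bytesRead = 4 + length;
        return memory.Slice(4, length);
    }
}
```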

return result;
}

/// <returns>A type of <seealso cref="System.Collections.Generic.IDictionary{TKey,TValue}"/>.</returns>
public Dictionary<string, object> ReadTable()
{
Dictionary<string, object> result = WireFormatting.ReadTable(_memory.Slice(_memoryOffset), out int bytesRead);
_memoryOffset += bytesRead;
Dictionary<string, object> result = WireFormatting.ReadTable(Span, out int bytesRead);
Collaborator

Would we want to go down the path of actually pooling header dictionaries and then returning them when the command is disposed, as we do for the body buffer?

Contributor Author

Possible, but risky and a "breaking change", as the consumer would be prohibited from keeping any reference to these passed dictionaries. (Maybe you can open an issue and we can discuss the pros / cons there?)

Collaborator

Yeah, true. Maybe it is not worth the risk though. I'm guessing a read-only dictionary would also not really help, because someone might already be abusing the writable nature of the returned type somewhere. So I'm on the fence about whether I should even raise an issue. @michaelklishin @lukebakken thoughts?

Collaborator

I've thought about this pooling. Also about caching the default BasicProperties and only allocating for the fields that differ from the defaults, to minimize the allocations taking place there. It requires some bookkeeping around when the default properties change, but it should be doable and yield a perf benefit.

Contributor

Pooling header dictionaries feels like a 7.0 thing. I think we'd want to prove that the benefits outweigh the (probably) more complicated code.

Member

Let's handle pooling in a separate PR (and yes, sounds like a 7.0 change which may or may not be worth the complexity).
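
For illustration only, a bare-bones sketch of what such pooling could look like. It is hypothetical, not part of this PR, and it deliberately glosses over the lifetime hazard discussed above (consumers must not hold on to a dictionary after the command is disposed):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Minimal header-dictionary pool: rent on read, clear and return on dispose.
internal static class HeaderDictionaryPool
{
    private static readonly ConcurrentBag<Dictionary<string, object>> s_pool =
        new ConcurrentBag<Dictionary<string, object>>();

    public static Dictionary<string, object> Rent()
    {
        return s_pool.TryTake(out Dictionary<string, object> dictionary)
            ? dictionary
            : new Dictionary<string, object>();
    }

    public static void Return(Dictionary<string, object> dictionary)
    {
        dictionary.Clear(); // never hand back stale headers
        s_pool.Add(dictionary);
    }
}
```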

@@ -44,44 +44,42 @@

namespace RabbitMQ.Client.Impl
{
struct MethodArgumentWriter
internal ref struct MethodArgumentWriter
Collaborator

Should we consider ditching WriteContent?

throw new NotSupportedException("WriteContent should not be called");

projects/RabbitMQ.Client/client/impl/SocketFrameHandler.cs (outdated, resolved)
projects/Unit/TestFrameFormatting.cs (resolved)
@bollhals
Contributor Author

Took care of the feedback, except for the items with open questions; let's wait for those to be answered and then clean it up.

@bollhals
Contributor Author

@lukebakken could you take a look at the questions in some of the review comments?

Sidenote:
I wanted to figure out what breaks multi-threaded usage of the same channel. From https://www.rabbitmq.com/dotnet-api-guide.html#concurrency-channel-sharing I saw that BasicPublish is problematic, so I wrote a test trying to break it, and then found that the change in here actually makes it safe to publish on the same channel from multiple threads.

Are there other operations that are unsafe to use in multithreaded environments without protection?

@bording
Collaborator

bording commented Jun 30, 2020

I wanted to figure out what breaks multi threaded usage of the same channel.

From my understanding, the main reason multi-threaded usage breaks is incorrect frame interleaving. This means that any AMQP command that spans multiple frames cannot tolerate having a different command's frame inserted into its frame sequence.

Any change that ensures multi-frame commands are treated as an atomic unit should go a long way toward enabling multi-threaded usage of a single channel.

@michaelklishin
Member

There is only one such command sent by publishers: actual basic.publish operation (which involves two or more frames). Putting a connection-wide lock on this path will have a substantial throughput effect.

@bording
Collaborator

bording commented Jul 1, 2020

There is only one such command sent by publishers: actual basic.publish operation (which involves two or more frames). Putting a connection-wide lock on this path will have a substantial throughput effect.

I would think there would be a way to treat those frames as a single atomic unit without a lock that impacts perf.

@stebet
Collaborator

stebet commented Jul 1, 2020

Yes, this is actually something that was quite easy to do with Pipelines, and it is part of the work I did in the async branch.

@danielmarbach
Collaborator

@stebet isn't this PR also making this problem go away as is, due to the nature of how the channel reader and writer interact with each other?

At least that is how I understood the code, as well as @bollhals' comment:

so I wrote a test trying to break it, then I figured out that this change in here does make it safe to use the same channel in multiple threads for publishing.

@stebet
Collaborator

stebet commented Jul 1, 2020

@stebet isn't this PR also making this problem go away as is, due to the nature of how the channel reader and writer interact with each other?

At least that is how I understood the code, as well as @bollhals' comment:

so I wrote a test trying to break it, then I figured out that this change in here does make it safe to use the same channel in multiple threads for publishing.

It should be. Would be interesting to try to create a massively parallel test to see if we can break it somehow.

@bollhals
Contributor Author

bollhals commented Jul 1, 2020

Yes, this PR should fix at least the frame interleaving. I wasn't able to make it break anymore with my local test that published 50k messages from each of 4 threads, whereas before this change it already broke down somewhere in the first few thousand.

@lukebakken
Contributor

I appreciate the discussion. I've been busy dealing with customer support escalations.

@michaelklishin
Member

@bollhals basic.publish is the only multi-frame method that is generally unsafe. Here is how it works. If you publish a message with no payload, two frames will be sent:

[basic.publish method][content header]

In the much more common case where there is some data to send, it will be three or more frames depending on payload size:

[basic.publish method][content header][message body frame]+

All problematic scenarios with publishing on a shared channel end up with frame interleaving the server parser does not expect, e.g. something like this:

[basic.publish method 1][basic.publish method 2][content header 2][message body frame 2][content header 1][message body frame 1]

or this:

[basic.publish method 1][queue.declare][content header 1][message body frame 1]

A group of tests that shares a channel for publishing combined with other workloads would be great to have. But even if we test manually as part of QA'ing this PR, it would still be perfectly fine for now. Thank you!
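
To make the "atomic unit" idea concrete: if all frames of one publish are serialized into a single contiguous buffer and that buffer is enqueued as one item, a single socket writer can never interleave frames from concurrent publishes. A minimal sketch, assuming a System.Threading.Channels-based writer; the names and types are illustrative, not the client's actual implementation:

```csharp
using System;
using System.Threading.Channels;

// Each publish serializes its method frame, content header frame, and body
// frame(s) back-to-back into one buffer, which is handed to the writer channel
// as a single item. A single reader drains the channel one item at a time, so
// frames belonging to different publishes cannot interleave on the socket.
internal sealed class AtomicPublishWriterSketch
{
    private readonly Channel<ReadOnlyMemory<byte>> _outgoing =
        Channel.CreateUnbounded<ReadOnlyMemory<byte>>(new UnboundedChannelOptions { SingleReader = true });

    public void EnqueuePublish(ReadOnlyMemory<byte> methodFrame, ReadOnlyMemory<byte> headerFrame, ReadOnlyMemory<byte> bodyFrames)
    {
        byte[] buffer = new byte[methodFrame.Length + headerFrame.Length + bodyFrames.Length];

        // Copy all frames of this publish contiguously into one buffer.
        methodFrame.CopyTo(buffer);
        headerFrame.CopyTo(buffer.AsMemory(methodFrame.Length));
        bodyFrames.CopyTo(buffer.AsMemory(methodFrame.Length + headerFrame.Length));

        // One write per publish: the reader sends each item in full before the next.
        _outgoing.Writer.TryWrite(buffer);
    }
}
```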

@michaelklishin
Member

This looks very promising to me. I'll try to spend some time on this in the next few days. It would be very interesting to add a few basic integration tests that share a channel in ways that were not previously possible. But I would also be happy to proceed with merging it without such tests since the original goal was not necessarily additional concurrency hazard safety for publishers.

@bollhals @danielmarbach @stebet thanks again for your substantial contributions to this client!

@bollhals
Contributor Author

bollhals commented Jul 3, 2020

This looks very promising to me. I'll try to spend some time on this in the next few days. It would be very interesting to add a few basic integration tests that share a channel in ways that were not previously possible. But I would also be happy to proceed with merging it without such tests since the original goal was not necessarily additional concurrency hazard safety for publishers.

I should be able to put some test(s) together, as I was doing some experimental tests anyway.

@danielmarbach
Collaborator

I think we should merge this. It is a different concern, and there's no need to hold this good change up.

@bollhals
Contributor Author

bollhals commented Jul 4, 2020

I should be able to put some test(s) together, as I was doing some experimental tests anyway.

Done, and I tested that it used to fail before this change and passes now.
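
A minimal sketch of what such a concurrent-publish test could look like; it is illustrative only and may differ from the test actually added in this PR:

```csharp
using System.Text;
using System.Threading.Tasks;
using RabbitMQ.Client;

// Several tasks publish on the same channel. Before this change this tended to
// corrupt the outgoing frame stream; after it the connection should stay healthy.
public static class ParallelPublishSketch
{
    public static async Task RunAsync()
    {
        var factory = new ConnectionFactory();
        using IConnection connection = factory.CreateConnection();
        using IModel channel = connection.CreateModel();
        channel.QueueDeclare("shared-channel-test", durable: false, exclusive: false, autoDelete: true, arguments: null);

        byte[] body = Encoding.UTF8.GetBytes("hello");
        var publishers = new Task[4];
        for (int i = 0; i < publishers.Length; i++)
        {
            publishers[i] = Task.Run(() =>
            {
                for (int n = 0; n < 50_000; n++)
                {
                    channel.BasicPublish(exchange: "", routingKey: "shared-channel-test", basicProperties: null, body: body);
                }
            });
        }

        await Task.WhenAll(publishers);
    }
}
```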

@stebet
Collaborator

stebet commented Jul 4, 2020

Well done :)

@lukebakken merged commit a654b1e into rabbitmq:master on Jul 6, 2020
@lukebakken
Contributor

Thanks everyone.
