
fix issue 868 #878

Merged: 5 commits merged into rabbitmq:master on Jul 6, 2020

Conversation

bollhals
Contributor

Proposed Changes

Fixes issue #868 by modifying how we send data internally.
Previously we went from Method -> Command -> OutboundFrames -> Channel -> Memory -> Socket.
Now we go Method -> Command -> Memory -> Channel -> Socket.

This change

  1. Fixes the issue by copying the passed payload before we pass it to the channel, so it is no longer possible to change the payload afterwards (see the sketch after this list).
  2. Gets rid of the outbound frame classes.
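
For illustration only, a minimal sketch of the payload-copy idea, assuming a System.Threading.Channels-based outbound queue; the type and member names here are placeholders, not the client's actual code:

using System;
using System.Buffers;
using System.Threading.Channels;
using System.Threading.Tasks;

internal sealed class OutboundWriter
{
    private readonly ChannelWriter<ReadOnlyMemory<byte>> _writer;

    public OutboundWriter(ChannelWriter<ReadOnlyMemory<byte>> writer) => _writer = writer;

    public ValueTask EnqueueAsync(ReadOnlyMemory<byte> payload)
    {
        // Copy the caller's payload into a client-owned buffer before it is queued,
        // so mutating the original array afterwards cannot change what gets written to the socket.
        byte[] copy = ArrayPool<byte>.Shared.Rent(payload.Length);
        payload.CopyTo(copy);

        // The socket-writing loop is expected to return the rented buffer to the pool
        // once the bytes have been written.
        return _writer.WriteAsync(new ReadOnlyMemory<byte>(copy, 0, payload.Length));
    }
}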

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #868, "Payload of BasicPublish is modifiable after the method was called")
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@bollhals
Contributor Author

The relevant commit is febdafa. The others are from #857; I've continued from that branch as there were some relevant changes in it.

@lukebakken
Contributor

@danielmarbach and @stebet, if you have time to check this out, that would be great. @stebet, if you have time to run your memory allocation and other benchmarks, that would be interesting as well. Thanks!

@stebet
Contributor

stebet commented Jun 24, 2020

I'll take a look at this tomorrow :)

Comment on lines +55 to +59
/* +------------+---------+----------------+---------+------------------+
* | Frame Type | Channel | Payload length | Payload | Frame End Marker |
* +------------+---------+----------------+---------+------------------+
* | 1 byte | 2 bytes | 4 bytes | x bytes | 1 byte |
* +------------+---------+----------------+---------+------------------+ */
Contributor

<3

Contributor Author

Took me way too long to figure this out from the code itself; I thought this might help others understand it better :)
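
For readers new to the framing, here is how that layout maps to bytes using standard BCL primitives. This is illustrative only, not the client's WireFormatting code; the 0xCE frame-end octet comes from the AMQP 0-9-1 spec.

using System;
using System.Buffers.Binary;

internal static class FrameSketch
{
    private const byte FrameEndMarker = 0xCE; // AMQP 0-9-1 frame-end octet

    public static int WriteFrame(Span<byte> destination, byte frameType, ushort channel, ReadOnlySpan<byte> payload)
    {
        destination[0] = frameType;                                                        // 1 byte: frame type
        BinaryPrimitives.WriteUInt16BigEndian(destination.Slice(1), channel);              // 2 bytes: channel number
        BinaryPrimitives.WriteUInt32BigEndian(destination.Slice(3), (uint)payload.Length); // 4 bytes: payload length
        payload.CopyTo(destination.Slice(7));                                              // x bytes: payload
        destination[7 + payload.Length] = FrameEndMarker;                                  // 1 byte: frame end marker
        return 8 + payload.Length;                                                         // total bytes written
    }
}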

@stebet
Contributor

stebet commented Jun 25, 2020

This all looks good to me, perf and memory allocations looking very good as well.

[screenshot: benchmark results showing improved perf and memory allocations]

@stebet
Contributor

stebet commented Jun 26, 2020

I might copy some of this code for the async branch, especially the OutboundFrame riddance.

Collaborator

@danielmarbach left a comment

This low-level area is not exactly my comfort zone, and I'm also a bit weak skill-wise when it comes to low-level optimizations like this. I hope my review comments don't look too embarrassing. Still learning, and probably will be for a while!

Resolved review thread (outdated): projects/RabbitMQ.Client/client/impl/Connection.cs
@@ -45,26 +45,28 @@

namespace RabbitMQ.Client.Impl
{
- internal struct ContentHeaderPropertyReader
+ internal ref struct ContentHeaderPropertyReader
Collaborator

It seems like

  • ReadBit
  • ReadLong
  • ReadLonglong
  • ReadLongstr
  • ReadShort

are no longer used. Should we ditch them? Or is it worth keeping them around just in case?

Contributor Author

I'm fine with ditching them; I only left them in because they were already there before.

Collaborator

@michaelklishin @lukebakken any thoughts? Do you prefer to have them around?

Contributor

If code is unused and removing it does not change the API, remove it!

return result;
}

public byte[] ReadLongstr()
{
- byte[] result = WireFormatting.ReadLongstr(_memory.Slice(_memoryOffset));
- _memoryOffset += 4 + result.Length;
+ byte[] result = WireFormatting.ReadLongstr(Span);
Collaborator

This still seems to allocate a byte[]. Does it make sense to remove the ToArray() underneath, or is it not worth it because the method is not used anyway? See my other comment.

Contributor Author

If it is not used, I'd rather delete it than change it. In general we could return a Memory, but here too it would be a different contract that we implicitly get by doing so.
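
To make the contract difference concrete, a hypothetical side-by-side (names invented for illustration): the copying variant hands the caller its own array, while the non-copying variant hands out a view that is only valid while the source buffer is.

using System;

internal readonly struct LongStringReaderSketch
{
    private readonly ReadOnlyMemory<byte> _source;

    public LongStringReaderSketch(ReadOnlyMemory<byte> source) => _source = source;

    // Copying contract: the result is safe to keep indefinitely, at the cost of one allocation per call.
    public byte[] ReadCopy(int offset, int length) => _source.Slice(offset, length).ToArray();

    // Non-copying contract: zero allocation, but the result must not outlive the underlying buffer.
    public ReadOnlyMemory<byte> ReadView(int offset, int length) => _source.Slice(offset, length);
}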

Collaborator

Yeah, but this is a purely internal thing. Maybe let's wait until Michael and Luke chime in.

Contributor

IMHO change internal contracts as you see fit 👍

Member

Sounds like a fairly minor implementation detail to me? I trust @bollhals' judgment on this then :)

return result;
}

/// <returns>A type of <seealso cref="System.Collections.Generic.IDictionary{TKey,TValue}"/>.</returns>
public Dictionary<string, object> ReadTable()
{
- Dictionary<string, object> result = WireFormatting.ReadTable(_memory.Slice(_memoryOffset), out int bytesRead);
- _memoryOffset += bytesRead;
+ Dictionary<string, object> result = WireFormatting.ReadTable(Span, out int bytesRead);
Collaborator

Would we want to go down the path of actually pooling header dictionaries and returning them when the command is disposed, as we do for the body buffer?

Contributor Author

Possible, but risky and a "breaking change", as consumers would be prohibited from keeping any reference to these passed dictionaries. (Maybe you can open an issue and we can discuss the pros and cons there?)

Collaborator

Yeah, true. Maybe it is not worth the risk, though. I'm guessing a read-only dictionary would also not really help, because someone might already be abusing the writable nature of the returned type somewhere. So I'm on the fence about whether I should even raise an issue. @michaelklishin @lukebakken thoughts?

Contributor

I've thought about this pooling. Also about caching the default BasicProperties and only allocating for the fields that change from the defaults, to minimize allocations taking place there. It requires some bookkeeping around when the default properties change, but it should be doable and yield a perf benefit.

Contributor

Pooling header dictionaries feels like a 7.0 thing. I think we'd want to prove that the benefits outweigh the (probably) more complicated code.

Member

Let's handle pooling in a separate PR (and yes, sounds like a 7.0 change which may or may not be worth the complexity).
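
For context, the parked idea could look roughly like this. This is purely a hypothetical sketch, not part of this PR: rent a header dictionary per delivery and return it when the command is disposed, with the caveat discussed above that consumers must not keep references after the return.

using System.Collections.Concurrent;
using System.Collections.Generic;

internal static class HeaderDictionaryPoolSketch
{
    private static readonly ConcurrentBag<Dictionary<string, object>> s_pool = new ConcurrentBag<Dictionary<string, object>>();

    public static Dictionary<string, object> Rent() =>
        s_pool.TryTake(out Dictionary<string, object> headers) ? headers : new Dictionary<string, object>();

    public static void Return(Dictionary<string, object> headers)
    {
        // Consumers must not hold on to the dictionary after this point.
        headers.Clear();
        s_pool.Add(headers);
    }
}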

@@ -44,44 +44,42 @@

namespace RabbitMQ.Client.Impl
{
- struct MethodArgumentWriter
+ internal ref struct MethodArgumentWriter
Collaborator

Should we consider ditching WriteContent?

throw new NotSupportedException("WriteContent should not be called");

Resolved review thread (outdated): projects/RabbitMQ.Client/client/impl/SocketFrameHandler.cs
Resolved review thread: projects/Unit/TestFrameFormatting.cs
@bollhals
Contributor Author

Took care of the feedback except for the items with open questions; let's wait for those to be answered and then clean it up.

@bollhals
Contributor Author

@lukebakken could you take a look at the questions in some of the review comments?

Side note:
I wanted to figure out what breaks multi-threaded usage of the same channel. From https://www.rabbitmq.com/dotnet-api-guide.html#concurrency-channel-sharing I saw that BasicPublish is problematic, so I wrote a test trying to break it, and then figured out that this change does make it safe to use the same channel from multiple threads for publishing.

Are there other operations that are unsafe to use in multi-threaded environments without protection?

@bording
Collaborator

bording commented Jun 30, 2020

I wanted to figure out what breaks multi threaded usage of the same channel.

From my understanding, the main reason multi-threaded usage breaks is incorrect frame interleaving: any AMQP command that spans multiple frames cannot tolerate having a different command's frame inserted into its frame sequence.

Any change that ensures multi-frame commands are treated as an atomic unit should go a long way toward enabling multi-threaded usage of a single channel.
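
A minimal sketch of the atomic-unit idea, assuming a System.Threading.Channels outbound queue (illustrative names, not the client's types): each publish is serialized into one contiguous buffer, and that buffer is enqueued as a single item, so a single socket-writing reader can never interleave frames from concurrent publishers.

using System;
using System.Threading.Channels;

internal sealed class InterleaveFreePublisherSketch
{
    private readonly Channel<ReadOnlyMemory<byte>> _outbound =
        Channel.CreateUnbounded<ReadOnlyMemory<byte>>(new UnboundedChannelOptions { SingleReader = true });

    public void Publish(ReadOnlyMemory<byte> methodFrame, ReadOnlyMemory<byte> headerFrame, ReadOnlyMemory<byte> bodyFrame)
    {
        // All frames of this publish are copied into one contiguous buffer...
        byte[] buffer = new byte[methodFrame.Length + headerFrame.Length + bodyFrame.Length];
        methodFrame.CopyTo(buffer);
        headerFrame.CopyTo(buffer.AsMemory(methodFrame.Length));
        bodyFrame.CopyTo(buffer.AsMemory(methodFrame.Length + headerFrame.Length));

        // ...and enqueued as one item; the single reader writes whole items to the socket,
        // so another publisher's frames cannot land in between.
        _outbound.Writer.TryWrite(buffer);
    }
}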

@michaelklishin
Member

There is only one such command sent by publishers: the actual basic.publish operation (which involves two or more frames). Putting a connection-wide lock on this path would have a substantial throughput effect.

@bording
Collaborator

bording commented Jul 1, 2020

There is only one such command sent by publishers: the actual basic.publish operation (which involves two or more frames). Putting a connection-wide lock on this path would have a substantial throughput effect.

I would think there is a way to treat those frames as a single atomic unit without a lock that impacts perf.

@stebet
Contributor

stebet commented Jul 1, 2020

Yes, this is actually something that was quite easy to do with Pipelines, and it is part of the work I did in the async branch.

@danielmarbach
Collaborator

@stebet isn't this PR also making this problem go away as-is, due to the nature of how the channel reader and writer interact with each other?

At least that is how I understood the code, as well as @bollhals' comment:

so I wrote a test trying to break it, and then figured out that this change does make it safe to use the same channel from multiple threads for publishing.

@stebet
Contributor

stebet commented Jul 1, 2020

@stebet isn't this PR also making this problem go away as-is, due to the nature of how the channel reader and writer interact with each other?

At least that is how I understood the code, as well as @bollhals' comment:

so I wrote a test trying to break it, and then figured out that this change does make it safe to use the same channel from multiple threads for publishing.

It should be. Would be interesting to try to create a massively parallel test to see if we can break it somehow.

@bollhals
Contributor Author

bollhals commented Jul 1, 2020

Yes, this PR should fix at least the frame interleaving. I wasn't able to make it break anymore with my local test that published 50k messages from each of 4 threads, whereas before this change it already broke down somewhere in the first few thousand.
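
The local test described here might look roughly like this. Illustrative only: the thread and message counts come from the comment above, while the queue name and the 6.x-era IModel.BasicPublish overload are assumptions, not the actual test code.

using System.Threading.Tasks;
using RabbitMQ.Client;

internal static class SharedChannelPublishSketch
{
    public static Task PublishInParallelAsync(IModel channel, byte[] body)
    {
        const int publisherCount = 4;
        const int messagesPerPublisher = 50_000;

        Task[] publishers = new Task[publisherCount];
        for (int i = 0; i < publisherCount; i++)
        {
            publishers[i] = Task.Run(() =>
            {
                for (int j = 0; j < messagesPerPublisher; j++)
                {
                    // All threads publish on the same channel (default exchange, fixed routing key);
                    // a consumer (not shown) then verifies that every message arrives intact.
                    channel.BasicPublish("", "shared-channel-test", null, body);
                }
            });
        }

        return Task.WhenAll(publishers);
    }
}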

@lukebakken
Contributor

I appreciate the discussion. I've been busy dealing with customer support escalations.

@michaelklishin
Member

@bollhals basic.publish is the only multi-frame method that is generally unsafe. Here is how it works. If you publish a message with no payload, two frames will be sent:

[basic.publish method][content header]

In the much more common case where there is some data to send, it will be three or more frames, depending on payload size:

[basic.publish method][content header][message body frame]+

All problematic scenarios with publishing on a shared channel end up with frame interleaving that the server parser does not expect, e.g. something like this

[basic.publish method 1][basic.publish method 2][content header 2][message body frame 2][content header 1][message body frame 1]

or this

[basic.publish method 1][queue.declare][content header 1][message body frame 1]

A group of tests that share a channel for publishing, combined with other workloads, would be great to have. But even if we test manually as part of QA'ing this PR, it would still be perfectly fine for now. Thank you!

@michaelklishin
Member

This looks very promising to me. I'll try to spend some time on this in the next few days. It would be very interesting to add a few basic integration tests that share a channel in ways that were not previously possible. But I would also be happy to proceed with merging it without such tests since the original goal was not necessarily additional concurrency hazard safety for publishers.

@bollhals @danielmarbach @stebet thanks again for your substantial contributions to this client!

@bollhals
Contributor Author

bollhals commented Jul 3, 2020

This looks very promising to me. I'll try to spend some time on this in the next few days. It would be very interesting to add a few basic integration tests that share a channel in ways that were not previously possible. But I would also be happy to proceed with merging it without such tests since the original goal was not necessarily additional concurrency hazard safety for publishers.

I should be able to put some test(s) together, as I was doing some experimental tests anyway.

@danielmarbach
Collaborator

I think we should merge this. That is a separate concern, and there is no need to hold up this good change for it.

@bollhals
Contributor Author

bollhals commented Jul 4, 2020

I should be able to put some test(s) together, as I was doing some experimental tests anyway.

Done, and I verified that the test fails without this change and passes with it.

@stebet
Contributor

stebet commented Jul 4, 2020

Well done :)

@lukebakken merged commit a654b1e into rabbitmq:master on Jul 6, 2020
@lukebakken
Contributor

Thanks everyone.
