Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #33368 [Messenger] variable contentType in (de)serialization process #33373

Closed
wants to merge 2 commits into from

Conversation

@pounard
Copy link
Contributor

pounard commented Aug 28, 2019

Q A
Branch? 4.4 for features / 3.4 or 4.3 for bug fixes
Bug fix? yes
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #33368
License MIT

Make the serializer more robust by adding a contextual and variable content type that falls back on default format. See #33368.

@Tobion

This comment has been minimized.

Copy link
Member

Tobion commented Aug 29, 2019

I don't see the purpose of this and how it fixes #33368. You are not gonna use the Symfony Serializer to deserialize arbitrary strings into a class as described in #33368

@pounard

This comment has been minimized.

Copy link
Contributor Author

pounard commented Aug 29, 2019

@Tobion it only deals with serialization format, it's a first step, only thing it allows is:

  • the code that sends a message can stamp with the "content type" (as in serialization format, not "message type"), for example if the serializer is configured to write JSON, I could with this arbitrarily send an XML message in the bus, if the consumer is an external application that consumes XML,

  • this code stores the content type information along the message in the headers in a generic way, transports can then get it and pass it along the message broker when sending if it's a native feature of the message broker (the AMQP transport could just put it into the "content-type" message property for example),

  • by storing the content type information in the headers, using an internal transport such as Doctrine, you can keep the content type information along the message, which allows arbitrary content types to be sent in the bus without having to care about the configuration,

  • which also allows the global serialization format to be changed at anytime yet still be able to process enqueued messages that have a different content type, since we stored it.

@Tobion

This comment has been minimized.

Copy link
Member

Tobion commented Aug 29, 2019

I think the better solution is to just configure different serializers for different transports. So you configure an xml serializer for transport A and a json serializer for transport B. This way you also do not need to add the content type stamp for every message.

@pounard

This comment has been minimized.

Copy link
Contributor Author

pounard commented Aug 29, 2019

I think the better solution is to just configure different serializers for different transports

It works if your transport always give you messages encoded with the same content type, but that's not real life. That's why the AMQP protocol adds the 'content-type' property on messages I guess, otherwise it would means you'd have to setup an exchange server for XML, one for JSON, ...

Even working with queues and filtering at the business purpose level, you could actually end up with messages encoded using different formats if you have multiple nodes working with different versions/revisions of the protocol altogether in the same network.

@@ -33,7 +33,7 @@ interface SerializerInterface
*
* @throws MessageDecodingFailedException
*/
public function decode(array $encodedEnvelope): Envelope;
public function decode(array $encodedEnvelope, ?string $contentType = null): Envelope;

This comment has been minimized.

Copy link
@Tobion

Tobion Aug 29, 2019

Member

The content type could just be an element of the $encodedEnvelope array. Then the interface does not need to change.
Also, none of the receivers actually pass the content type to the decode method yet. So this does not seem to do anything.
If the same transport is meant to handle messages of different format (which I think is not a good design), then the doctrine transport for example would need to persist the content type as well. So that when it reads the message, it can be decoded appropriately again.

This comment has been minimized.

Copy link
@pounard

pounard Aug 29, 2019

Author Contributor

The content type could just be an element of the $encodedEnvelope array. Then the interface does not need to change.

Indeed, it's true, may be just adding it into the content-type header I added ? I may have been a bit too far on this one, I started to think about the fact that the headers and message properties are different in AMQP and that other brokers could have such distinction too and propagate conflicting information, but that's stupid of me, I'll loose that complexity.

Also, none of the receivers actually pass the content type to the decode method yet. So this does not seem to do anything.

For now, I created the PR for discussion and feedback before going down the road and update transports. AMQP will be trivial to do, Doctrine could leverage it as well by adding a content_type column in the storage table.

which I think is not a good design

It's not a problem of design, as soon as you write an API or component whose goal is to speak with the outside world, you better be liberal in what you accept and conservative in what you send, and more than that, flexible in what you allow to the user of your API. Providing a dynamic content type handling is exactly the definition of being liberal, future proof, resilient to time and changes in protocol, and allowing the user of your API to manually specify the content type when (s)he sends the message is exactly being flexible and conservative: it allows a developer to target very specific nodes and comply to their capabilities.

@pounard

This comment has been minimized.

Copy link
Contributor Author

pounard commented Aug 29, 2019

@Tobion to be honest, propagating the content-type as well as the message type as first class citizens of the API is a requirement for such component as the messenger. Without it, you prevent anyone to use it in an heterogeneous environment.

I know it's opinionated, and such will have repercussions on the design and documentation, but as it is implemented today. Messenger is already very much opinionated on many many things and it's not very bendable, as it exists, either you use it the way it is, either you use something else. That's kind of sad because I see much potential in this component, it is the answer to many modern application design needs, and I'd very much love to continue to use it, because it is promising.

And to be fair, I started using it when it was very experimental, since october 2018, now it starts to evolve and be more and more stable, but sadly details such as the content type handling prevent us from working in the environment we target, I cannot decently go back 10 month in the past and rewrite all our code.

I know it also would be a non negligible change in the design and the mind construct around the messenger component as it exists, but it would make it globally much much more resilient and flexible, and open it to wider range of usages.

@pounard

This comment has been minimized.

Copy link
Contributor Author

pounard commented Aug 30, 2019

@Tobion I value very much your comments and questions, thanks a lot for your time. I originally made this PR as a PoC, in order to also be proactive in those discussion, to show I'm not just criticizing but also I'm willing to help if you wish so. I'd be pleased to let this PR as-is and not touching it, and continue to discuss the design issues in the #32049 if it is OK for you.

@nicolas-grekas nicolas-grekas added this to the next milestone Aug 30, 2019
@pounard pounard requested a review from sroze as a code owner Aug 31, 2019
@pounard

This comment has been minimized.

Copy link
Contributor Author

pounard commented Aug 31, 2019

@Tobion I pushed new set of changes, that does:

  • content type is always looked up from headers, which drops the BC break on SerializerInterface::decode(),

  • Content-Type header in encode() already existed, I re-used the existing one with caps instead of creating a lower cased new one, re-using more bits of existing code. The encode() method will unconditionally try to lookup for an incoming content type before using the configured one, which will make it much more resilient to different incoming data that it would await for otherwise,

  • $format parameter given to Serializer constructor can be a mimetype, it should work transparently as long as it is something that points to json, yaml or xml,

  • Serializer implementation is much smarter, and fallbacks on the PhpSerializer by itself if the default format is php, which means that there's no need for actually having two serializer services, a single one, this one will work in all cases,

  • previous statement doesn't prevent anymore from using their own implementation, or writing a decorator around it for edge cases or custom handling,

  • this also mean that the Symfony serializer from the serializer component is optional when a project only uses a bus internal to the application, and doesn't need to serialize messages with anything than PHP serialization mechanism. Nevertheless the idea behind keeping a smart seralizer that handles content-type even in those apps is that it will propagate the content type to the transports, even the Doctrine one: in the future if the app evolves and opens messaging to handle more content types, legacy yet still alive messages will be transparently handled until all consumed, without the need of writing a migration path for developers,

  • during decoding, PhpSerializer was requiring that the serialized string contains an envelope object, I don't see this as really useful, deserializing any type of message without stamps is probably also a valid use case,

  • headers in $encodedEnvelope is now optional, type is not mandatory when using the PhpSerializer, there was no use in enforcing it to be present, but it remains mandatory as soon as the serializer needs to be called, which means that the exceptions are still raised, but later in the code only when necessary,

  • the serializer do not require anymore that messages are objects, it didn't take much modifications to do this and I can't see why arbitrary strings could not be send or received as messages, even though this is very open to discussion since the rest of the messenger API enforces messages to be objects. From what I could read most of its code actually doesn't care about that, so I'm really wondering why this restriction should be kept ? That's an actual honest question, did the original messenger developers designed this for a specific reason other than to be able to target handlers ?

  • content type that will be sent in headers is always a mimetype, actually it was already before this PR, the getMimeTypeForFormat() function ensured (and still ensures) this,

  • content type that will be sent in headers in encode() will be application/x-php-serialized to be a mimetype for the transport, this is arbitrary, I had no idea how to call it, and I'm sure there's a better name for it,

  • a handful of new unit tests are added for all of this,

  • this is still and proof of concept, hence I left quite some @todo as notes in there for discussion.

Also, and because I had a few minutes to loose, I added another change that probably should be in another issue, but that's for discussion purpose. I saw at least one issue in the queue were someone was asking for this (another person than my own issues), and I think it is a good idea that the messenger component handles this:

  • a new class Symfony\Component\Messenger\Handler\RawMessage is introduced, I was unsure for the namespace so I put it in Handler, I don't see where it would fit best in the actual namespaces,

  • this class holds a message raw content, when the serializer was unable to decode it,

  • the feature is optional, it is disabled per default, and enabled by passing the $allowRawMessages boolean value to true in the Serializer class constructor.

And now, besides a much more dynamic incoming content type handling, and the fact it doesn't need to have an explicit PhpSerializer service, it also keeps backward compatibility (no BC break).

I'm open to all comments, this is just a poc meant to open discussion around the changes I suggest. I think having a smart serializer is good way to go, because if it's sufficiently smart, it will avoid people needing to write their own custom one each time they have content types exposed by their transports, which in my opinion is something that could happen a lot.

I didn't update any of the existing transports to make use the content type header, this is purely for discussion, I don't want to write that much code if it's not going to be useful.

Copy link
Member

Tobion left a comment

The PR is doing too much and in a non-intrusive way. So I'm closing the PR. If you like to keep working on these topics, please create separate PRs for each use-case.

@@ -31,9 +31,6 @@ final class Envelope
*/
public function __construct($message, array $stamps = [])
{
if (!\is_object($message)) {
throw new \TypeError(sprintf('Invalid argument provided to "%s()": expected object but got %s.', __METHOD__, \gettype($message)));

This comment has been minimized.

Copy link
@Tobion

Tobion Feb 9, 2020

Member

this is meant to stay

*
* @experimental in 4.3
*/
class RawMessage

This comment has been minimized.

Copy link
@Tobion

Tobion Feb 9, 2020

Member

could be useful as I already suggested something like this. but it's unrelated to the changes here and would need a separate PR.


// Allow serialized raw message which will permit arbitrary message
// unserialization, in case it wasn't stored with the full envelope.
if (!$envelope instanceof Envelope) {

This comment has been minimized.

Copy link
@Tobion

Tobion Feb 9, 2020

Member

seems like a very edge case. it would only happen if you serialize php objects from a different php app not using symfony messenger and want to consume it from an app using messenger. could be ok to accept but should also be a separate pr.

{
$this->serializer = $serializer ?? self::create()->serializer;
$this->format = $format;
$this->phpSerializer = new PhpSerializer();

This comment has been minimized.

Copy link
@Tobion

Tobion Feb 9, 2020

Member

that the symfony serializer is now both a serializer using symfony serializer component and the php serializer in one place is not a good solution. it defeats single responsibility principle. if you want to implement something like, then create a new serializer like ContentTypeAwareSerializer that decorates the other serializers.

if (!$this->allowRawMessages) {
throw new MessageDecodingFailedException(sprintf('Could not decode message: %s.', $e->getMessage()), $e->getCode(), $e);
}
$message = new RawMessage($encodedEnvelope['body'], $encodedEnvelope['headers']);

This comment has been minimized.

Copy link
@Tobion

Tobion Feb 9, 2020

Member

decoding the raw string as a fallback when the other serializers fail is not a good solution IMO. a handler for RawMessage would be called for a message you don't want to decode (handle raw) and for a message that fails to be deserialized (wrong content for example). those two things are separate use-cases.
so using the RawMessage should be an explicit decision and a separate serializer you need to enable on a transport.

@pounard

This comment has been minimized.

Copy link
Contributor Author

pounard commented Feb 10, 2020

@Tobion fair enough, it was only a prototype. Thanks for having answered my questions at the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.