Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Messenger] Store message body as a blob object, not text #33920

Closed
wants to merge 2 commits into from
Closed

[Messenger] Store message body as a blob object, not text #33920

wants to merge 2 commits into from

Conversation

rimas-kudelis
Copy link

Q A
Branch? 4.3
Bug fix? yes
New feature? no
Deprecations? no
Tickets Fix #33913
License MIT

This fixes #33913 for me. After changing the column type, I could send emails with attachments via messenger with no problems.

@rimas-kudelis
Copy link
Author

rimas-kudelis commented Nov 5, 2019

Is there anything that prevents you from accepting this patch? I really want to see this in 4.4 by the time it comes out.

@Tobion
Copy link
Member

Tobion commented Nov 5, 2019

While I think this could be right thing to do and would work with mysql, I'm pretty sure it breaks for other database systems. See for example the PdoSessionStorage that uses BLOB and has different queries for different vendors, e.g.

case 'oci':
$data = fopen('php://memory', 'r+');
fwrite($data, $sessionData);
rewind($data);
$sql = "UPDATE $this->table SET $this->dataCol = EMPTY_BLOB(), $this->lifetimeCol = :expiry, $this->timeCol = :time WHERE $this->idCol = :id RETURNING $this->dataCol into :data";

Copy link
Member

@Tobion Tobion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pass the blob type wherever the body is passed to a query, e.g.


You need to rebase to get the latest changes

This change also deserves a changelog and upgrade entry and description how to migrate the existing table.

@stof
Copy link
Member

stof commented Nov 5, 2019

As this is a Doctrine-based implementation, DBAL can abstract the difference for us. But it would require setting the parameter type when binding the body param (the default behavior would set it as a string parameter, and so won't handle the different behavior needed for blobs in some platforms)

@Tobion
Copy link
Member

Tobion commented Nov 5, 2019

I couldn't find where doctrine handles the pdo oci workaround I listed above. It seems that is exactly not implemented in dbal: https://github.com/doctrine/dbal/blob/80a45c6d7e7dba7d0c133bf06724dcf2be6ab5e9/lib/Doctrine/DBAL/Driver/PDOOracle/Driver.php#L17-L20

If doctrine can handle the other edge cases, I'm fine. But IMO we should do this change in 4.4, not in 4.3 as a bug fix. And document the table upgrade with mysql as an example.

@Tobion Tobion modified the milestones: 4.3, 4.4 Nov 5, 2019
@rimas-kudelis
Copy link
Author

rimas-kudelis commented Nov 5, 2019 via email

@weaverryan
Copy link
Member

weaverryan commented Nov 6, 2019

There is some history behind the decision to do text. Specifically, BLOB columns (iirc) in MySQL are not human-readable, which makes the system more to debug and understand. There was effort during 4.3 to try to keep messages readable.

In #30957, I first played with using a BLOB column (Drupal does this). But due to lack of readability and that some other transports might (?) have problems with binary data (I think SQS only allows a few binary characters).

I propose 2 alternate solutions:

A) We make it possible (via configuration) to use the PhpSerializer but base64_encode (and base64_decode) the data. Then it's binary safe.

B) This solution, but make users need to opt into it.

@rimas-kudelis
Copy link
Author

I've rebased and hinted blob type, as requested by @Tobion . I don't mind this going straight to 4.4 since I've already switched to that myself.

@weaverryan regarding BLOB types being not viewable: I can view these columns just fine in MySQL Workbench (although you have to click "Open Value in Editor" in the context menu for the field), so I don't think it's a problem. In general, any solution is fine by me, I just picked this one because it was the lowest hanging fruit and it fixed the bug which prevented the scenario documented in Symfony docs from working.

@stof
Copy link
Member

stof commented Nov 12, 2019

Looks like you messed a rebase. You branch seems to be based on master, not on 4.3

rimas-kudelis and others added 2 commits November 12, 2019 18:48
This fixes #33913 for me. After changing the column type, I could send emails with attachments via messenger with no problems.
@weaverryan
Copy link
Member

What impact would this have when users are upgrading? I suppose no change to their db table would be made (because setup() will not be called again)? That's probably fine... but what about the executeQuery() call that changes the type to Type::BLOB for the body? What happens if the column is not a blob?

@rimas-kudelis
Copy link
Author

@weaverryan I can't answer the question about what would happen, but I could add a README entry about the need for manual schema update, as suggested by @Tobion above.

@toooni
Copy link
Contributor

toooni commented Dec 29, 2019

As @weaverryan already suspected, the doctrine transport isn't the only one having issues with this (#35137).

IMO, this means that it would be best if this issue would be solved in serializing with:
A.) Adding a parameter to PhpSerializer to choose either readability (addslashes) or functionality with binary data (base64_encode).
B.) Creating a seperate SafeSerializer which uses base64_encode instead of addslashes.
C.) Revert the solution with addslashes from #30957 and lose readability.

@nicolas-grekas
Copy link
Member

What's the next step here?

@@ -127,7 +127,7 @@ public function send(string $body, array $headers, int $delay = 0): string
$now,
$availableAt,
], [
null,
Type::BLOB,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right thing to do. But someone needs to test if it keeps working for people that still have the text type. So can you insert non-binary data to the body column if it is of type text (before this change)? Otherwise this PR would be a bc break.

@ricohumme
Copy link

I came across this discussion and saw it hadn't been touched in a while. My current quick fix was altering the schema to support longblob, as blob wasn't cutting it.
In the meantime I have been thinking about this, and where things go wrong in this scenario is when binary data is being serialized. I therefor vote for the option to base64 en-/decode this data.
However I don't think the messenger component is responsible for this, but in this case the mailer component.
What is your opinion on this?

@toooni
Copy link
Contributor

toooni commented Jun 11, 2020

@ricohumme I agree. But I am not sure on what to do (as mentioned in my comment #33920 (comment)):

  • Having multiple serializers
  • Revert addslashes/stripslashes to base64_encode/base64_decode in PhpSerializer
  • Add a configurable parameter to PhpSerializer

But I am sure whatever solution will be chosen, base64 en-/decoding should be the default behaviour here.

@rimas-kudelis
Copy link
Author

I disagree. IMO, if Messenger accepts a message, it's Messenger's responsibility to handle it properly, and that includes transforming it if and when necessary. Mailer should not care about Messenger's internals, nor should Messenger care about Mailer's.

Let's say Mailer is not the only component that has objects with binary data and sends them in messages. Asking each such component to prepare data so that Messenger can safely transport it doesn't look like a good idea to me, assuming that Messenger can actually do all that itself when it's necessary.

@toooni
Copy link
Contributor

toooni commented Jun 11, 2020

@rimas-kudelis The PhpSerializer is part of the messenger component. So we basically mean the same ;)

Edit: Sorry, I misread the part about which component should be responsible for this. I think it should be the job of the messenger component.

@ricohumme
Copy link

ricohumme commented Jun 11, 2020

The class Symfony\Component\Mime\RawMessage implements \Serializable, so I believe it should be a responsibility of the message. Before serializing the binary part of the message, implemented in Symfony\Component\Mime\Email, it should be base64_encoded. There is already a __serialize method in place, but it does nothing with the contents of an attachment, except fetch it when it is a stream.

@rimas-kudelis
Copy link
Author

rimas-kudelis commented Jun 11, 2020

@ricohumme isn't it already serializable though? IMO the problem here is not about missing slashes, but about raw bytes which aren't defined in a particular character set.

@ricohumme
Copy link

What are you referring to with: isn't it already serializable though?

@toooni
Copy link
Contributor

toooni commented Jun 11, 2020

I am using a solution where the messenger component uses a SafeSerializer (copy of PhpSerializer with base64 en-/decoding) in the messenger component for several months in production now. I am sending emails with attachments going through messenger (Redis Transport) and everything works fine. I do not think there is anything wrong with the current mailer component implementation.

@rimas-kudelis
Copy link
Author

@ricohumme

What are you referring to with: isn't it already serializable though?

As you said, the class already implements \Serializable, which means, it already fulfills the contract. It could escape attachments, but it doesn't have to.

@ricohumme
Copy link

ricohumme commented Jun 11, 2020

@rimas-kudelis

As you said, the class already implements \Serializable, which means, it already fulfills the contract. It could escape attachments, but it doesn't have to.

Correct, but the attachment containing raw data can't be serialized to the database

@toooni I think the use of a SafeSerializer is beneficial here, but the message in the database is no longer readable what so ever. That is why my suggestion is only to apply the base64 en-/decoding to the attachment section containing raw data.

Otherwise you would need to perform decoding on the column first in order to perform any search on it, which could be heavy for the database when having a lot of messages, specifically nasty if they have failed. Or should perform everything on console using the messenger:failed command.

@toooni
Copy link
Contributor

toooni commented Jun 11, 2020

@ricohumme The issue is that we do not know how each transport behaves. AWS SQS for example has issues with serialized data from PHP because of some characters. I also had issues with the redis transport.
I do understand that readability is a desirable feature. But IMO working out of the box in (almost) every case has a higher priority. A human-readable serializer could still be a configurable alternative.

@ricohumme
Copy link

@toooni agreed. Then SafeSerializer is the way to go :)

@rimas-kudelis
Copy link
Author

Let's also not forget that a message can be stored in a BLOB-type field in its pristine form. It remains readable in that case, assuming your tool of choice allows viewing BLOB field content without too much hassle.

@ricohumme
Copy link

Like a said in a previous post, when pushing the data I had in a blob, which was one attachment, it failed because the data being posted was about 20 mb. So I used a longblob instead, as medium blob was selling short in that department as well.
For reference:
Blob = max 64 kb
Mediumblob = max 16 mb
Longblob = max 4 gb

@rimas-kudelis
Copy link
Author

Sure, sure. I meant the whole category of BLOB types, not the BLOB type specifically.

@nicolas-grekas nicolas-grekas changed the title Store message body as a blob object, not text [Messenger] Store message body as a blob object, not text Sep 3, 2020
@nicolas-grekas
Copy link
Member

I propose another approach in #39970, please have a look.

We've just been hit by this issue when sending PDFs, and having to write a custom serializer for Messenger is just bad DX. We need to solve this!

fabpot added a commit that referenced this pull request Jan 25, 2021
… them using base 64 (nicolas-grekas)

This PR was merged into the 4.4 branch.

Discussion
----------

[Messenger] Fix transporting non-UTF8 payloads by encoding them using base 64

| Q             | A
| ------------- | ---
| Branch?       | 4.4
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Tickets       | Fix #33913
| License       | MIT
| Doc PR        | -

Replaces #33920

When using the Doctrine transport, sending emails with binary attachments currently requires a custom Messenger serializer because the "body" column is created for UTF-8 only.

In #33920, it is proposed to change the TEXT type to a BLOB. It leaves at least one problem unhandled: the conversion of existing messenger tables.

This PR takes a more conservative approach, by encoding messages to base 64, only if they are non-UTF8.

Compatibility with the existing format is preserved.

The drawback of this approach is that the size of eg email attachments is going to increase by 33% because of the extra encoding. I think this drawback is acceptable for 4.4, and that this PR is the most pragmatic way to make attachments just work.

Commits
-------

6fc9e51 [Messenger] Fix transporting non-UTF8 payloads by encoding them using base 64
@nicolas-grekas
Copy link
Member

Thank you @rimas-kudelis

@rimas-kudelis
Copy link
Author

Thank you for doing this properly! :)

@rimas-kudelis rimas-kudelis deleted the 4.3-patch-1 branch January 27, 2021 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants