New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with utf-8 characters in emails #537

Open
lcarlos-fokus opened this Issue Sep 23, 2015 · 33 comments

Comments

Projects
None yet
6 participants
@lcarlos-fokus

lcarlos-fokus commented Sep 23, 2015

See example below:

Hey lcarlos!

rafaelazem mentioned you in a post in Transportadora no FKcarrier - dobrando o valor do frete.

http://www.modulosfk.com.br/modulosfk/flarum/d/19/6

---

@lcarlos resolvido! Muito obrigado!

Apenas uma última dúvida, o FKcorreios G2 substitui o FKcarrier em suas funcionalidades, isto?

Obrigado!
@luceos

This comment has been minimized.

Member

luceos commented Sep 23, 2015

E-mails should be rewritten to be html and use a proper encoding, Mailer class automatically generates a text only variant if needed.

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Sep 26, 2015

Related to #241

@Averell7

This comment has been minimized.

Averell7 commented Jan 27, 2016

The reason is that the message is incoherent. Header indicates : Plain text / Utf8 and the content is converted in Html entities. In French, the result is unreadable. Here is a part of the source code of a message which informs of a new post :

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
(...)
---
Le problème de la ligne téléphonique a ét&eacut=
e; réparé,
--- 

The original text is : Le problème de la ligne téléphonique a été réparé
Le problème de la ligne téléphonique est réparé.

For a quick solution, the best option would be to suppress the conversion of accentuated characters to html entities. From my tests, I conclude that this happens before the use of the blade file :
\flarum\extensions\flarum-subscriptions\views\emails\newPost.blade.php
I have translated this file to French, and I have to use Utf-8. It works fine.

The problem is that the result of :
strip_tags($blueprint->post->contentHtml)
is already translated in Html. I tried :
strip_tags($blueprint->post->content)
The result is the same.

@Averell7

This comment has been minimized.

Averell7 commented Apr 4, 2016

I made some additional tests :

I modified the blade file to add :

TEST

content {{ $blueprint->post->content }}
contentPlain {{ $blueprint->post->contentPlain }}
contentHtml {{ $blueprint->post->contentHtml }}

The result in the email is :

TEST
content Le problème de la ligne téléphonique est réparé
contentPlain 
contentHtml <p>Le problème de la ligne téléphonique est réparé</p>

Conclusion : "content" is already converted to html entities, and contentPlain is empty.

We should convert those html entities to utf8, but I have no idea how to do that in a blade file !

@luceos

This comment has been minimized.

Member

luceos commented Apr 4, 2016

Haven't read through all of this, but can imagine the blade template uses double accolades instead of accolade and double exclamation marks. The difference is between allowing raw html (latter)...

@Averell7

This comment has been minimized.

Averell7 commented Apr 4, 2016

Here is the fix, not perfect, but indicating the direction.

To turn the mime type in html instead of plain :

in flarum\vendor\flarum\flarum-ext-subscriptions\src\Notification\NewPostBlueprint.php, line 63, change 'text' in 'html'. Or better, create a multipart email, with both versions :

 public function getEmailView()
    {
        return ['html' => 'flarum-subscriptions::emails.newPostHtml',
                'text' => 'flarum-subscriptions::emails.newPost'];
    }

Then create a new blade file, newPostHtml.blade.php and suite it to your needs.

I attach my blade files (from flarum\vendor\flarum\flarum-ext-subscriptions\views\emails)
NewPost-blade_files.zip

It is not perfect because the end of paragraph marks are not converted to html code, and the text of the post is a solid block. Nevertheless, it is MUCH MORE readable. Here is the message received :


Hello pierre

Averell a publié une réponse dans une discussion que vous suivez: test

Pour voir les nouvelles interventions, visitez le lien suivant:
test


Le problème de la ligne téléphonique a été réparé


Vous ne recevrez pas d'autres avertissements concernant cette discussion tant que vous n'aurez pas été lire les derniers messages.


WARNING : if you edit the blade file, write it in utf8, and be careful that the BOM must NOT be added in the beginning. I use Scite, and the encoding must be "UTF-8 cookie" and not "UTF-8"

@Averell7

This comment has been minimized.

Averell7 commented Apr 4, 2016

@luceos thanks for your interest. Could it be possible to set a milestone for this issue ?

@franzliedke

This comment has been minimized.

Member

franzliedke commented Apr 5, 2016

This will cause characters to be HTML escaped:

contentHtml {{ $blueprint->post->contentHtml }}

Try this instead:

contentHtml {!! $blueprint->post->contentHtml !!}
@Averell7

This comment has been minimized.

Averell7 commented Apr 5, 2016

Hello, @luceos I didn't understand what you meant, but it was the good idea, as @franzliedke explained, and it is the solution. Thanks @franzliedke , the bug is fixed.

Hey {!! $user->username !!}!

{!! $blueprint->post->user->username !!} made a post in a discussion you're following: {!! $blueprint->post->discussion->title !!}

To view the new activity, check out the following link:
{!! app()->url() !!}/d/{!! $blueprint->post->discussion_id !!}/{!! $blueprint->post->number !!}

---

{!! strip_tags($blueprint->post->contentHtml) !!}

---

You won't receive any more notifications about this discussion until you're up-to-date.

Now I don't know if developers will want only to fix it, or take in consideration now what you wrote above : "E-mails should be rewritten to be html and use a proper encoding". The code in my previous post gives a part of the answer, but there is still the problem of end of paragraph marks which are not converted to html codes.

At this point, should I create a PR for the fix ?
Same question about #893 which is fixed as well.

@franzliedke

This comment has been minimized.

Member

franzliedke commented Apr 5, 2016

Until we have time to implement #241, a fix for this would be welcome, yes. :)

@tobscure

This comment has been minimized.

Member

tobscure commented Apr 5, 2016

Probably not necessary/a good idea to use the unescaped output {!! !!} tags for everything, as it's a security issue (HTML injection). I think it should only be necessary around the contentHtml.

@Averell7

This comment has been minimized.

Averell7 commented Apr 5, 2016

@franzliedke I have now a code which sends multipart messages, html and plain text, with clickable link in html. This solves #241. I am using the code explained above :

public function getEmailView()
    {
        return ['html' => 'flarum-subscriptions::emails.newPostHtml',
                'text' => 'flarum-subscriptions::emails.newPost'];
    }

and then two blade files.

Here is the model I am working on, I show only the part which displays the message, there is no problem for the rest of the file.

<br>
---<br>
<br>
{!! str_replace(chr(10), "<br>", htmlentities(strip_tags($blueprint->post->contentHtml))); !!} <br>
<br>
---<br>
<br>

I forked flarum-ext-subscriptions, because it is easier for you to see what I have done.
A translatable version will be ready today

@Averell7

This comment has been minimized.

Averell7 commented Apr 6, 2016

Hello, I think my job is finished. I implemented :

  • multipart email message html/plain text
  • correct formatting, either in html or in plain text.
  • translation of the message

What could be done :
Adapt the html model to create something more beautiful. But I think this is not my job.

I consider this bug (#537) an bug #241 are fixed.

Could you verify and push this to the beta-6, I would appreciate because it caused some hours of night work (you know what I mean 😄 )

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Apr 6, 2016

I can't check the code for you, but I can offer some suggestions regarding the language resources.

First, please take some time to read the Internationalization docs, which explain the standard format to be used when adding translatable strings to Flarum. If you don't follow this format, you'll only be creating a new issue that someone else has to deal with before your code can be added to the project.

There is a specific format for email strings. Here's an example from core.yml in the English extension:

    # These translations are used in emails sent when users change their email address.
    confirm_email:
      body: |
        Hey {username}!
        Someone (hopefully you!) has changed their email address on {forum} to this one.
        If this was you, simply click the following link and your email will be confirmed:
        {url}
        If this was not you, please ignore this email.
      subject: Confirm Your New Email Address

Each email should consist of two parts:

  • The body, represented as a literal block, and
  • The subject line

The key names for these two parts should be body and subject; they should be namespaced under a key name that distinguishes the email from other emails. So the namespacing for the resources that you add should ultimately look something like this:

  • flarum-subscriptions.forum.new_post_notification.body

    flarum-subscriptions.email.new_post_notification.body

  • flarum-subscriptions.forum.new_post_notification.subject

    flarum-subscriptions.email.new_post_notification.subject

EDIT: Sorry ... I used the wrong namespacing as a result of copypasta. These translations should be in the flarum-subscriptions.email namespace, not flarum-subscriptions.forum.


EDIT: Writing the body text as a single literal block will also allow you to avoid splitting sentences up, which should be avoided at all costs. Fragmentary strings impose a hardcoded syntax on the text that will cause problems for many translators. For example:

+{{ $blueprint->post->user->username }} <?php echo $new_post ?> {{ $blueprint->post->discussion->title }}<br>

... Your $new_post string would be impossible to translate into any language that puts the verb at the beginning or end of the sentence. The proper way to handle this would be to pass the username and title to the translator as additional variables following the translation key.


Once you've got your namespacing and translator calls straightened out, please consider forking the English language pack repo and adding the necessary phrases there. Then you can remove some of your commenting and get your code closer to a finished state. That extra effort will make it easier for the devs to review your code and add your changes in a timely fashion.

@franzliedke

This comment has been minimized.

Member

franzliedke commented Apr 7, 2016

@Averell7 Where is your code? Did you send a PR?

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Apr 7, 2016

No PR, you can see his changes here:

flarum/subscriptions@master...Averell7:master

@franzliedke franzliedke modified the milestone: 0.1.0 Apr 7, 2016

@Averell7

This comment has been minimized.

Averell7 commented Apr 8, 2016

@dcsjapan thanks for the comment. I thought quite a lot about it. The reason is : This file uses a very special format : .blade.php. It is for that reason that it was never translated.
I improved my code, because what I am proposing is now full blade format. Here it is :

{!! app()->translator->trans('flarum-subscriptions.forum.email.hello') !!} {{ $user->username }}!

{{ $blueprint->post->user->username }} {!! app()->translator->trans('flarum-subscriptions.forum.email.new_post') !!} {!! $blueprint->post->discussion->title !!}

{!! app()->translator->trans('flarum-subscriptions.forum.email.check_link') !!}
{{ app()->url() }}/d/{{ $blueprint->post->discussion_id }}/{{ $blueprint->post->number }}

---

{!! strip_tags($blueprint->post->contentHtml) !!}

---

{!! app()->translator->trans('flarum-subscriptions.forum.email.no_more_notifications') !!}

It is just impossible to use the specific format for email strings because there are no php variables here. So we have two options, and I cannot decide between them :

  1. Full blade format (above) and separate strings for each small block of text. The same strings will be used for the html and plain text formats
  2. Rewrite the file in standard php format (no problem afaik), and then the specific format for email strings can be used. There will be specific strings for the plain text and html formats.

The final message will be exactly the same in both cases.
At this point I cannot go further. The developers must first choose and tell me if they want to keep the blade.php format (above) or allow me to switch to standard php format. Then I can finish the job, following your indications.

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Apr 8, 2016

It is for that reason that it was never translated.

Also, I overlooked the views directories while extracting the translations. I was concentrating on the JS files and didn't realize there was language to be extracted in other places.

It is just impossible to use the specific format for email strings because there are no php variables here.

I spent some time last night thinking about how to approach these translations, and I encountered the same difficulty with the blade format: it tends to produce a sentence structure that is too inflexible.

Unfortunately your improved code won't avoid that issue, either. To accommodate localization into as many languages as possible, I suspect it will be necessary to move away from the full blade format. The best approach would probably end up looking something like what Toby has done for the confirmation emails in SendConfirmationEmailController.php. But I'm just guessing about that.

At this point I cannot go further. The developers must first choose and tell me if they want to keep the blade.php format (above) or allow me to switch to standard php format.

I agree, one of the devs would be better able than me to suggest what your next step should be.

Thanks for your efforts on this, and for taking my comments into consideration.

@Averell7

This comment has been minimized.

Averell7 commented Apr 8, 2016

I spent some time last night thinking about how to approach these translations, and I encountered the same difficulty with the blade format: it tends to produce a sentence structure that is too inflexible.

So we arrive to the same conclusion : The best option is switching to php format. In this case, it is no problem. For plain text format, the file is :

<?php 
$app_url = app()->url();
$data = [
            '{dest_username}' => $user->username,
            '{post_username' => $blueprint->post->user->username,
            '{post_title}' => $blueprint->post->discussion->title,
            '{url}' =>  $app_url . "/d/" . $blueprint->post->discussion_id . "/" . $blueprint->post->number,
            '{content}' => $blueprint->post->content            
        ];
echo app()->translator->trans('flarum-subscriptions.forum.email.body', $data);
?>

And the translation string is :

      body: |
        Hey {dest_username}!

        {post_username} made a post in a discussion you're following: {post_title}

        To view the new activity, check out the following link:
        {url}

        ---

        {content}

        ---

        You won't receive any more notifications about this discussion until you're up-to-date.

Discuss the matter with developers, and let me know.
Code above has been tested and works well.

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Apr 9, 2016

One small change is needed: the key should be flarum-subscriptions.email.new_post_notification.body. Both forum and email are defined as second-level namespacing keys; only email is needed here.

The third-level key new_post_notification is also required, to distinguish this email from any other emails that might be added to the Subscriptions extension in the future.

I apologize for the misleading copypasta in my initial comments; I've edited that post to show the correct key names to use in this situation.


@tobscure Eventually it will also be necessary to pass two gender variables to the translator: one each for dest_username and post_username. I thought I'd mention that here just in case it could affect how or where this code gets added.

The same is true of the emails sent by Mentions.

@Averell7

This comment has been minimized.

Averell7 commented Apr 9, 2016

I also apologize because I had not yet studied the details about keys for translations.
In case the standard php solution is accepted, there will be a question : I will need two bodies, one for plain text et the other for html. If we stick to body: , then there will be :

subscriptions.email.new_post_notification_text.subject
subscriptions.email.new_post_notification_text.body

subscriptions.email.new_post_notification_html.subject
subscriptions.email.new_post_notification_html.body

The other option would be to add a suffix to body :

subscriptions.email.new_post_notification.subject
subscriptions.email.new_post_notification.body_txt
subscriptions.email.new_post_notification.body_html

which is much better IMHO.
Multipart emails are a must, if we want to profit from the possibilities of html. A user which refuses html will not like to see a message full of html codes.

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Apr 9, 2016

I would have to defer to Toby's judgment, but I think we probably don't want more than one body text per email in the YAML files. We've been working hard to avoid such duplication of language resources.

It may be possible to get by with only the HTML version, but before we discuss that possibility any further, please allow me to ask a question and make sure I'm not misunderstanding:

Will the PHP code you posted above for the plain text format mail solve the UTF8 characters problem? (That is, will users be able to see the quoted forum post without garbling by HTML entities?)

If it will, then perhaps it would be better to focus on closing this issue, and leave #241 for later. Trying to solve too many issues at once complicates the problem and makes it more difficult to implement quickly. (As you have seen, just adding the translations involves a few more complexities than one might expect.)

If the HTML entitles problem can't be solved without sending mail as HTML, then I may have an idea that could work.

@Averell7

This comment has been minimized.

Averell7 commented Apr 9, 2016

The htmlentities problem is solved in both codes posted above, pure PHP or blade.php. The final result is the same.
So, you can assume that presently any type of mail can be generated :

  • plain text only
  • html only
  • html/plain text

All types will have a clean text (with additional formatting possible in html).

It is just your choice, tell me and I'll do it, I have all the clues now. But there is still the question : blade or not. And here also, there is no difference in the result, the difference, as you know, is that in blade format the text must be broken in several pieces.

@Averell7

This comment has been minimized.

Averell7 commented Apr 9, 2016

I attach two screen capture :

  1. Plain text mail built with the above php file
  2. Html mail built with the blade version (I have not yet written the php version for html, but the result will be exactly the same)
    The text is the same in both cases. The client is Thunderbird.

pure php-plain_text
blade - html

@Averell7

This comment has been minimized.

Averell7 commented Apr 9, 2016

Just for the fun : the same email in the beta5 version.
beta5

@Averell7

This comment has been minimized.

Averell7 commented Apr 10, 2016

@dcsjapan I found the trick, it was so simple ! The body: string will be the html version, and the program will strip the html tags to have the plain/text version.

As soon as you have decided about PHP or blade, I can finish the code.

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Apr 10, 2016

The body: string will be the html version, and the program will strip the html tags to have the plain/text version.

@Averell7 That's more or less what I had in mind. My only concern is to prevent the program from stripping HTML tags that should be left intact, e.g. any HTML tags inside a [code] block in the quoted post content.

The HTML version of the email can be added to the YAML file using abbreviated HTML tags as described in the Adding HTML Tags section of the i18n docs. Please read that section carefully; simply adding standard HTML tags won't work, as they will be stripped from the text by the translator.

As we both concluded, the blade approach doesn't adapt well to translation; so I think it would probably be best to go with the PHP approach. Do you agree with this @tobscure ?

@Averell7

This comment has been minimized.

Averell7 commented Apr 10, 2016

My only concern is to prevent the program from stripping HTML tags that should be left intact, e.g. any HTML tags inside a [code] block in the quoted post content.

  1. protect < and > characters
  2. create the translation, and strip tags
  3. restore protected < and >

It is implemented in the version referenced below

@Averell7

This comment has been minimized.

Averell7 commented Apr 11, 2016

My only concern is to prevent the program from stripping HTML tags that should be left intact, e.g. any HTML tags inside a [code] block in the quoted post content.

The version I updated on Github works fine
flarum/subscriptions@master...Averell7:master

If @tobscure accepts it, we are done.

@luceos

This comment has been minimized.

Member

luceos commented Apr 11, 2016

After having a lengthy discussion on Gitter with @dcsjapan , who patiently took the time to explain the translation issues involved here, I think it's best to do the following:

  • do not drop blade formatting in the blade files; the proposed pr uses only raw php in a blade file. Everything you are doing there can be done within blade. Dropping blade markup now might create future issues and can drop extensibility too.
  • the reason for this issue is utf-8 support, best practice is to solve one issue per PR. Let's do that by using the unescaped blade formatters Franz explained. I recommend to work on full html support from issue #241 and PR against that.
  • the issue of utf-8 support has to be fixed for all emails, this also includes the mentions extension.
@Averell7

This comment has been minimized.

Averell7 commented Apr 12, 2016

Everything you are doing there can be done within blade.

May be, but I don't know how and I cannot study blade for that, it must be done by someone who is familiar with this format.

I gave above simple versions in blade format which cannot address the problem of translations, but fix the bug of this thread. Sorry if cannot help more, I have also my own programs to maintain.

Remember to fix #920 It was fixed by my code, but if another takes the question afresh, the bug may arise again, so I opened again the issue.

Remember also that perhaps something has to be fixed in flarum\vendor\flarum\core\js\lib\models\Post.js because contentPlain returns nothing. The function is in strings.js :

export function getPlainContent(string) {
  const dom = $('<div/>').html(string.replace(/(<\/p>|<br>)/g, '$1 &nbsp;'));

  dom.find(getPlainContent.removeSelectors.join(',')).remove();

  return dom.text();
}

/**
 * An array of DOM selectors to remove when getting plain content.
 *
 * @type {Array}
 */
getPlainContent.removeSelectors = ['blockquote', 'script'];

That's all, guys. I hope you will find the right solution in blade.

@Averell7

This comment has been minimized.

Averell7 commented Apr 12, 2016

@dcsjapan , Thanks for the collaboration, I really appreciated working with you.

@dcsjapan

This comment has been minimized.

Member

dcsjapan commented Apr 12, 2016

@Averell7 Thank you ... and I'm sorry if I got things muddled. I thought trying to match the the code to that used for the confirmation emails was the right direction ... but @luceos explained to me last night that the blade templating will be needed in the long run, to allow styling of the emails. That's an aspect that I hadn't considered.

@luceos luceos referenced this issue Feb 2, 2017

Open

HTML emails #241

@tobscure tobscure removed this from the 0.1.0 milestone Jul 22, 2017

@franzliedke franzliedke removed the Backend label Jul 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment