in html files sent as attachments, non-breaking-spaces (typed in with alt_space) are transformed #600

Open
jens-maus opened this Issue Apr 26, 2016 · 13 comments

Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-12 22:09:09 +0200


Summary

Whereas SimpleMail sends the same file containing non-breaking spaces undistorted, YAM 2.10 Dev does not.

Steps to reproduce

1. Insert the line
"NBSstart             NBSstop" in a html file

2. Attach the file to a mail addressed to yourself.
3. Inspect the file attached to the received mail and look at the inserted line.

Expected results

Inserted line should still be the same:
"NBSstart             NBSstop"

Actual results

"NBSstart��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� ��� NBSstop" when looked at with CED (in CED the blanks in the preceding line appear as squares)
In IBrowse the line is presented as:
"NBSstartÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, ÃfÂ,Ã, NBSstop"

Regression

Notes
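
The IBrowse rendering reported under "Actual results" is consistent with the NBSP byte (0xA0) having been converted from ISO-8859-1 to UTF-8 more than once somewhere between the original file and the viewer. That is only an assumption; the report does not establish where, or whether, such a conversion happens. A minimal sketch in Python (the helper name `reencode_once` is made up for illustration) shows how one byte grows with each round:

```python
def reencode_once(data: bytes) -> bytes:
    """Misread bytes as ISO-8859-1 text, then write that text out as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


nbsp = b"\xa0"                 # one non-breaking space in ISO-8859-1
once = reencode_once(nbsp)     # correct UTF-8 form of NBSP
twice = reencode_once(once)    # first layer of mojibake
thrice = reencode_once(twice)  # second layer of mojibake

for label, value in [("original", nbsp), ("once", once),
                     ("twice", twice), ("thrice", thrice)]:
    # Show how the byte count doubles with every extra conversion.
    print(f"{label:8s} {len(value)} byte(s): {value.hex()}")
```

After three rounds a single NBSP has become eight bytes, which a Latin-1 viewer would show as a run of accented letters and punctuation followed by a blank, similar in shape to the repeated groups above.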

Collaborator

tboeckel commented Apr 26, 2016

Originally on 2015-05-13 13:15:20 +0200


Are you sure you really attached a file containing NBSP characters (0xa0)?

I just created an example text file and verified that it really contains NBSP characters and for me YAM correctly encodes it as quoted-printable and perfectly restores the file byte by byte after having received it again.
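
The round trip described here can be checked outside YAM as well. The following is a minimal sketch using Python's standard `quopri` module; the sample bytes are a hypothetical stand-in for the test file, and nothing here reflects YAM's own encoder:

```python
import quopri

# Hypothetical sample: "start", eight NBSP bytes (0xA0), "stop" = 17 bytes.
original = b"start" + b"\xa0" * 8 + b"stop"

# Quoted-printable writes every byte outside printable ASCII as "=XX".
encoded = quopri.encodestring(original)

# Decoding must give back exactly the bytes that went in.
decoded = quopri.decodestring(encoded)

print(encoded.decode("ascii"))
print("round trip intact:", decoded == original)
```

If the decoded bytes equal the original, the transfer encoding itself cannot be the source of any distortion; whatever differs must come from how the bytes are interpreted afterwards.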

Collaborator

tboeckel commented Apr 26, 2016

Originally on 2015-05-13 13:16:45 +0200


Attachment added: nbsp_test.lha (0.1 KiB)
text file with NBSP characters

Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-14 18:55:48 +0200


I used your file "as is", inserted its content into an HTML file, nbsp_test.html, attached both to a YAM mail addressed to myself, and saved the results as
nbsp_test_returned.txt and nbsp_test_returned.html.

Those files are uploaded in nbsp_.lha

Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-14 18:56:56 +0200


Attachment added: nbsp_.lha (0.3 KiB)

Collaborator

tboeckel commented Apr 26, 2016

Originally on 2015-05-18 08:54:26 +0200


No problem here. Your HTML document is encoded as quoted-printable by YAM and as base64 by Thunderbird. YAM then saves both its own attachment and the one from Thunderbird byte for byte identical to the original file.

Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-18 15:41:30 +0200


Maybe because I am using YAM in a French locale with charset ISO-8859-15?

I noted that the raw sent message refers to ISO-8859-1, not ISO-8859-15:

----=_BOUNDARY.5efd37f068e04a7c.ed
Content-Type: text/plain; charset=ISO-8859-1;
name="nbsp_test.txt"
Content-Disposition: attachment;
filename="nbsp_test.txt";
size=17
Content-Transfer-Encoding: quoted-printable

start=A0=A0=A0=A0=A0=A0=A0=A0stop

Collaborator

tboeckel commented Apr 26, 2016

Originally on 2015-05-19 08:32:54 +0200


Replying to JosDuchIt:

> I noted that the raw sent message refers to ISO-8859-1, not ISO-8859-15

You can define different charsets for the GUI and for writing mails. Usually the GUI charset matches your system charset, and YAM will warn you otherwise. The charset for writing mails also defaults to the system charset, but it can be changed without any restrictions.

All text attachments (e.g. *.txt, *.html, mails) get the "write mail charset" included in the Content-Type header. Unfortunately it is impossible to reliably detect the encoding of a text file, because nobody can tell whether a byte of 0x80 or above is there because it is a German umlaut or because it introduces a UTF-8 sequence. That's why YAM must rely on the user settings. No matter which charset you are using, YAM will never re-encode the attached file. It will just declare the file to be encoded in the selected write charset.

But as you can see, YAM definitely attached the file correctly without changing any of the NBSP characters:

start=A0=A0=A0=A0=A0=A0=A0=A0stop

So the final question is whether we really have a bug in YAM here, or whether it is just a misunderstanding and possibly wrong handling of the original file by several text editors.
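
To illustrate why a charset label is only a hint, here is a small sketch in Python that decodes the same raw bytes under three labels. The helper name `interpret` and the chosen sample bytes are illustrative assumptions, not anything taken from YAM:

```python
def interpret(raw: bytes) -> dict:
    """Decode the same bytes under several charset labels.

    A value of None means the bytes are not valid in that charset.
    """
    result = {}
    for charset in ("iso-8859-1", "iso-8859-15", "utf-8"):
        try:
            result[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            result[charset] = None
    return result


# 0xA0 is NBSP in both ISO-8859-1 and ISO-8859-15, but not valid UTF-8 alone.
print(interpret(b"\xa0"))
# 0xA4 is the currency sign in ISO-8859-1 and the euro sign in ISO-8859-15.
print(interpret(b"\xa4"))
# C3 A4 is one character in UTF-8 but two characters in the ISO charsets.
print(interpret(b"\xc3\xa4"))
```

The bytes never change; only the reading of them does. For the NBSP byte in particular, ISO-8859-1 and ISO-8859-15 agree, so switching between those two labels should not alter how 0xA0 is displayed.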

Collaborator

tboeckel commented Apr 26, 2016

Originally on 2015-05-19 16:07:12 +0200


In (053ca93):

  • YAM_UT.c, WriteWindow.c: implemented a function to check whether a string is correctly UTF8 encoded and contains at least one UTF8 character. Based on this check the charset of text attachments is forced to either UTF8 or ISO-8859-1 instead of the configured write mail charset. This refs #600. Please note that YAM does NO reencoding, it just gives the receiver a hint how to handle the attached file.
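
The check described in this commit message could look roughly like the following. This is a sketch in Python written only from the description above, not a transcription of the C code in YAM_UT.c; the function name `guess_attachment_charset` is made up:

```python
def guess_attachment_charset(data: bytes) -> str:
    """Pick the charset label to declare for a text attachment.

    Returns "UTF-8" only if the data is valid UTF-8 and contains at
    least one non-ASCII character; otherwise returns "ISO-8859-1".
    The data itself is never modified.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not well-formed UTF-8, so fall back to the single-byte charset.
        return "ISO-8859-1"
    has_multibyte = any(ord(ch) > 0x7F for ch in text)
    return "UTF-8" if has_multibyte else "ISO-8859-1"


# A file with raw 0xA0 bytes is not valid UTF-8.
print(guess_attachment_charset(b"start" + b"\xa0" * 8 + b"stop"))
# The same text saved as UTF-8 contains the sequence C2 A0.
print(guess_attachment_charset("start\u00a0stop".encode("utf-8")))
```

Under this sketch, a file holding raw 0xA0 bytes would be labelled ISO-8859-1 regardless of the configured write charset, while a UTF-8 copy of the same text would be labelled UTF-8.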
Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-19 18:17:57 +0200


Replying to tboeckel:

In (053ca93):

> * YAM_UT.c, WriteWindow.c: implemented a function to check whether a string is correctly UTF8 encoded and contains at least one UTF8 character. Based on this check the charset of text attachments is forced to either UTF8 or ISO-8859-1 instead of the configured write mail charset. This refs #600. Please note that YAM does NO reencoding, it just gives the receiver a hint how to handle the attached file.
Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-19 18:29:26 +0200


I think I understand now.

> You can define different charsets for GUI and for writing mails. Usually the GUI charset matches your system charset and YAM will warn you otherwise. The charset for writing mails also defaults to the system charset, but this one can be adjusted without any restrictions.

  • My system is set to ISO-8859-15
  • YAM's GUI is set to ISO-8859-15
  • I was not aware of this, but YAM's write charset was ISO-8859-1

I changed it to ISO-8859-15 too.
I'll report on the results.
Thanks for the help.

Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-19 20:01:52 +0200


All I can tell is that the attached files now show the new YAM write charset, ISO-8859-15.
When saved and viewed in a viewer, editor, or browser, the result is unchanged.
I guess ticket #600 will take care of it.
----=_BOUNDARY.5e3e94806051e251.e2
Content-Type: text/plain; charset=ISO-8859-15;
name="nbsp_test.txt"
Content-Disposition: attachment;
filename="nbsp_test.txt";
size=17
Content-Transfer-Encoding: quoted-printable

start=A0=A0=A0=A0=A0=A0=A0=A0stop

Collaborator

tboeckel commented Apr 26, 2016

Originally on 2015-05-21 07:28:17 +0200


I really have to ask again: do we still have a problem here, or is this issue invalid? You never provided your original files, just the received versions of my files. Reading your latest answer makes me think that the issue is solved. It would be nice if you would either confirm this or provide all the information I need to reproduce the problems you are facing.

Owner

jens-maus commented Apr 26, 2016

Originally by JDuch@fulladsl.be on 2015-05-21 15:13:47 +0200


We really still have the same problem. Even after synchronising all charset settings to ISO-8859-15, I reported "same result".
I tested using your original files and assumed that only the way YAM returned and saved the files was of interest to you.

In fact, the test can be made even simpler:

    1. Add the original file(s) to a YAM message as attachments.
    2. Hit the "send later" button.
    3. Reopen the mail and save the attachments: they are visually different in a text editor, in a viewer, and, for the .html file, in a browser.

In my last reaction, I interpreted this text

> YAM_UT.c, WriteWindow.c: implemented a function to check whether a string is correctly UTF8 encoded and contains at least one UTF8 character. Based on this check the charset of text attachments is forced to either UTF8 or ISO-8859-1 instead of the configured write mail charset. This refs #600. Please note that YAM does NO reencoding, it just gives the receiver a hint how to handle the attached file.

in the following way:
I concluded that you had identified the origin of the problem, namely that at present "the configured write mail charset" is not respected (ISO-8859-1 instead of ISO-8859-15),

and I expressed the hope that you would fix it (ticket #600).

Of course I cannot be sure I have given you everything you need to reproduce the problem. As far as I am concerned, I don't know how to avoid it under these circumstances:

  • Sam 460ex
  • OS4.1 Update 6
  • latest YAM2.10 dev -debug
  • charset system, YAM-GUI, YAM-mail all ISO-8859-15

In fact, I just changed all of these to ISO-8859-1 and redid the test described in steps 1 to 3 of this post: same result.

Visually the files are changed and contain the line "start        stop"

I would be very puzzled if I were the only one experiencing this.

Action?
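
Whether a saved attachment really differs from the original, or only looks different in a particular viewer, could be settled by comparing the two files byte by byte. A minimal sketch in Python follows; the helper name `first_difference` is hypothetical, and the sample data stands in for the real files:

```python
from typing import Optional


def first_difference(original: bytes, returned: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, returned)):
        if a != b:
            return offset
    if len(original) != len(returned):
        # One input is a prefix of the other; they differ where it ends.
        return min(len(original), len(returned))
    return None


sample = b"start" + b"\xa0" * 8 + b"stop"
print(first_difference(sample, sample))                         # identical
print(first_difference(sample, sample.replace(b"\xa0", b" ")))  # NBSP lost
```

With real files, the two arguments would be read in binary mode, for example `open("nbsp_test.txt", "rb").read()` and the same for `nbsp_test_returned.txt`. A result of `None` would mean the bytes survived and the visual difference comes from the viewer's charset handling.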
