Quoted-printable values ignore charset (always UTF-8) #10

Closed
GoogleCodeExporter opened this issue Mar 21, 2015 · 9 comments

Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Set a NOTE value to text containing newlines and special characters (e.g. umlauts)
2. Write vCard with VCardVersion.V2_1


What is the expected output?
The complete result is in ISO 8859-1, including the quoted-printable parts after
they are decoded.

What is the actual output?
The result is in ISO 8859-1 except for the quoted-printable parts, which after
decoding turn out to be in UTF-8.

What version of ez-vcard are you using?
0.9.0

What version of Java are you using?
1.6

Please provide any additional information below.
This also happens if I explicitly set the charset of the properties to ISO 
8859-1
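To make the expected and actual outputs concrete, here is a minimal Java sketch (not ez-vcard's code) of a quoted-printable encoder that escapes the bytes of whichever charset it is given. `QuotedPrintableDemo` and `encodeQuotedPrintable` are hypothetical names, and soft line breaks (the 76-character limit) are left out for brevity.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuotedPrintableDemo {

    // Hypothetical helper: quoted-printable-encodes the bytes of `value`
    // in the given charset. Soft line breaks are omitted in this sketch.
    static String encodeQuotedPrintable(String value, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(charset)) {
            int unsigned = b & 0xFF;
            // Printable ASCII other than '=' is written literally; everything
            // else (spaces, CR/LF, non-ASCII bytes) becomes =XX.
            if (unsigned >= 33 && unsigned <= 126 && unsigned != '=') {
                out.append((char) unsigned);
            } else {
                out.append('=').append(String.format("%02X", unsigned));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "test", newline, the umlauts and sharp s, newline, "test"
        String note = "test\n\u00e4\u00f6\u00fc\u00df\ntest";
        System.out.println(encodeQuotedPrintable(note, StandardCharsets.ISO_8859_1));
        System.out.println(encodeQuotedPrintable(note, StandardCharsets.UTF_8));
    }
}
```

Under these assumptions the ISO-8859-1 call prints `test=0A=E4=F6=FC=DF=0Atest` (one escape per character), while the UTF-8 call prints `test=0A=C3=A4=C3=B6=C3=BC=C3=9F=0Atest`, the same byte sequence that shows up in the vCard output later in this thread.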

Original issue reported on code.google.com by tom_vo...@gmx.de on 27 Nov 2013 at 3:03

@GoogleCodeExporter

Hello,

Thanks for your input.  I'm not sure if this is a bug though.  It makes sense 
that you could get a UTF-8 string after decoding a quoted-printable string.  
The purpose of quoted-printable is to encode characters which cannot be encoded 
in the current character set.

Original comment by mike.angstadt on 4 Dec 2013 at 3:20

@GoogleCodeExporter

Hi, 

thanks for the reply. 
But if I set the charset on the type to ISO 8859-1, then I would expect the
resulting string after decoding to be in ISO 8859-1, not UTF-8.

Original comment by tom_vo...@gmx.de on 4 Dec 2013 at 3:46

@GoogleCodeExporter

What is the exact string you are using in the CHARSET parameter value?  It 
looks like there must be a hyphen between "ISO" and "8859-1", instead of a 
space.  If there is a space, Java will not recognize the charset, which causes 
ez-vcard to decode it using UTF-8.
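Java's charset-name rules explain why the exact spelling matters: a space is not a legal character in a charset name, so `Charset.forName("ISO 8859-1")` throws `IllegalCharsetNameException` instead of returning ISO-8859-1. The sketch below assumes a fall-back-to-UTF-8 policy like the one described above; `CharsetParamDemo` and `resolveCharset` are hypothetical names, not part of ez-vcard.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

class CharsetParamDemo {

    // Hypothetical helper: resolves a CHARSET parameter value, falling back
    // to UTF-8 when Java rejects or does not know the name.
    static Charset resolveCharset(String charsetParam) {
        if (charsetParam == null) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(charsetParam);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // e.g. "ISO 8859-1": the space makes the name illegal
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("ISO-8859-1")); // ISO-8859-1
        System.out.println(resolveCharset("latin1"));     // ISO-8859-1 (alias)
        System.out.println(resolveCharset("ISO 8859-1")); // UTF-8 (illegal name)
    }
}
```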

Original comment by mike.angstadt on 4 Dec 2013 at 4:07

@GoogleCodeExporter

"ISO-8859-1"

Original comment by tom_vo...@gmx.de on 4 Dec 2013 at 4:24

@GoogleCodeExporter

Can you check to see if there are any parser warnings?  ez-vcard will add a
parser warning if there is a problem decoding a quoted-printable value.

To do that with the VCardReader class, call the getWarnings() method.  To do 
that with the Ezvcard class, pass an empty list into the "warnings()" method, 
then print the list after parsing the vCard.

Original comment by mike.angstadt on 4 Dec 2013 at 4:30

@GoogleCodeExporter

No warnings concerning quoted-printable.
Here's what I do:

  Note noteType = new Note(person.getComment());
  noteType.getParameters().setCharset(charset);
  vcard.addNote(noteType);
  ...
  StringWriter writer = new StringWriter();
  VCardWriter vCardWriter = new VCardWriter(writer, VCardVersion.V2_1, null, "\r\n");
  log.debug(vcard.validate(VCardVersion.V2_1));
  Ezvcard.write(vcard).version(VCardVersion.V2_1).go(writer);

Note contains:
"test
äöüß
test"

Result is:
NOTE;CHARSET=ISO-8859-1;ENCODING=quoted-printable:test=0A=C3=A4=C3=B6=C3=BC=
 =C3=9F=0Atest

Decoded:
testäöü �test

:(
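The reported value can be checked by hand. This sketch (hypothetical class and method names) joins the two physical lines, assuming that a trailing `=` followed by CRLF is a quoted-printable soft line break and that the leading space on the next line is folding whitespace. It then decodes the escapes to raw bytes and interprets those bytes with two different charsets.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class QuotedPrintableDecodeDemo {

    // Hypothetical helper: decodes =XX escapes to raw bytes. Assumes soft
    // line breaks and folding are already removed and the input is ASCII.
    static byte[] decodeQuotedPrintable(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '=' && i + 2 < value.length()) {
                out.write(Integer.parseInt(value.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The two lines of the NOTE value from the report, joined with CRLF.
        String folded = "test=0A=C3=A4=C3=B6=C3=BC=\r\n =C3=9F=0Atest";
        // Assumption: "=" + CRLF + space is a soft break plus folding whitespace.
        String unfolded = folded.replace("=\r\n ", "");
        byte[] bytes = decodeQuotedPrintable(unfolded);

        String expected = "test\n\u00e4\u00f6\u00fc\u00df\ntest";
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(expected));      // true
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1).equals(expected)); // false
    }
}
```

If the assumptions hold, the decoded bytes only form the original umlauts when read as UTF-8, which is consistent with the value having been encoded in UTF-8 despite the `CHARSET=ISO-8859-1` parameter.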

Original comment by tom_vo...@gmx.de on 9 Dec 2013 at 5:24

@GoogleCodeExporter

Ah, I see.  Ok, fixed it.  Thanks :D 

Original comment by mike.angstadt on 12 Dec 2013 at 3:49

  • Changed state: Fixed

@GoogleCodeExporter

I thought about this some more.  My first solution didn't solve the root of the 
problem, which is that the character encoding of the ***Writer*** object should 
be used by default when encoding a quoted-printable value.  You shouldn't need 
to manually set the CHARSET parameter.

The fix I've just committed will use the Writer object's character encoding if 
no CHARSET parameter is provided.  If it can't determine the Writer's character 
encoding, it will use your system's default character encoding.  If a CHARSET 
parameter is set, then it will use that character encoding instead of the 
Writer's.
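The resolution order described above can be sketched as follows. `QpCharsetChoice` and `chooseCharset` are hypothetical names, not ez-vcard's actual implementation; the sketch assumes the Writer's encoding can only be discovered when it is an `OutputStreamWriter` (via its `getEncoding()` method).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QpCharsetChoice {

    // Hypothetical helper following the order described above:
    // 1. an explicit CHARSET parameter wins,
    // 2. otherwise the Writer's own encoding, if it can be determined,
    // 3. otherwise the JVM's default charset.
    static Charset chooseCharset(String charsetParam, Writer writer) {
        if (charsetParam != null) {
            // Throws for illegal or unknown names; a real implementation
            // might fall back instead.
            return Charset.forName(charsetParam);
        }
        if (writer instanceof OutputStreamWriter) {
            // getEncoding() returns a historical name such as "UTF8" or
            // "ISO8859_1", which Charset.forName() accepts as an alias.
            String encoding = ((OutputStreamWriter) writer).getEncoding();
            if (encoding != null) {
                return Charset.forName(encoding);
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        Writer latin1Writer = new OutputStreamWriter(new ByteArrayOutputStream(), StandardCharsets.ISO_8859_1);
        System.out.println(chooseCharset(null, latin1Writer));      // ISO-8859-1
        System.out.println(chooseCharset("UTF-8", latin1Writer));   // UTF-8
        System.out.println(chooseCharset(null, new StringWriter())); // JVM default charset
    }
}
```

With the `StringWriter` used in the earlier snippet, this sketch has no encoding to read, so it would fall back to the JVM default charset unless a CHARSET parameter is set.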

Attached is the patched JAR.

Original comment by mike.angstadt on 13 Dec 2013 at 5:29

Attachments:

@GoogleCodeExporter

Hi,

great, thanks!

Original comment by tom_vo...@gmx.de on 16 Dec 2013 at 9:06
