unicode utf-8 encoding causing issue for xml parser #40

Closed
drchanimal opened this issue Apr 10, 2013 · 11 comments

Comments

@drchanimal

Hi,
I am trying to chase down this issue, and it comes down to Builder. Rails ActiveResource uses ActiveSupport's to_xml method, which in turn calls Builder to generate the XML. Builder generates the XML with the encoding set to UTF-8 and emits Unicode characters as-is instead of escaping them as ASCII character entities. However, this causes the XML parser on the receiving end to fail. I believe the reason is that if there is Unicode in the XML, the encoding must be UTF-16. See: http://www.w3schools.com/xml/xml_encoding.asp . The only time it works is when the encoding is set to UTF-16. Thoughts?

@pablito

pablito commented May 3, 2013

@drchanimal,
"UTF-8 is a variable-width encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32." [http://en.wikipedia.org/wiki/UTF-8], so you can (and should, as hinted by the default XML encoding, which is, guess what, UTF-8) use UTF-8 to represent Unicode characters.
If your XML parser is failing, it is probably not receiving a UTF-8 encoded stream. Do you process the resulting string somehow before parsing it?
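To illustrate the point above, here is a minimal Ruby sketch (assuming Ruby 2.0 or later, where `String#bytes` returns an array) showing that UTF-8 represents these characters directly, one to four bytes each, with no need for UTF-16.

```ruby
# -*- coding: utf-8 -*-
# Sketch: UTF-8 encodes every Unicode character in 1 to 4 bytes,
# so non-ASCII text does not require UTF-16.
text = "Iñtërnâtiônàl"

puts text.encoding          # UTF-8
puts text.length            # 13 characters
puts text.bytesize          # 18 bytes: each accented letter takes 2 bytes
p "ñ".bytes                 # [195, 177], i.e. 0xC3 0xB1

# Decoding the raw bytes as UTF-8 yields the identical string.
roundtrip = text.bytes.pack("C*").force_encoding("UTF-8")
puts roundtrip == text      # true
```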

@jimweirich
Owner

UTF-8 should be fine for any XML parser, as long as the string is actually encoded as UTF-8. Can you provide a simple example where Builder does not generate proper UTF-8 output (given UTF-8 input)?
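One common way for output labeled UTF-8 to break a parser is text that is really in another encoding, such as the Windows-1252 ("ANSI") code page. The sketch below (Ruby 1.9 or later; the encodings are chosen for illustration) shows that the same text in Windows-1252 bytes is invalid when merely relabeled as UTF-8, and must be transcoded instead.

```ruby
# -*- coding: utf-8 -*-
# Sketch: the same text as UTF-8 bytes versus Windows-1252 ("ANSI") bytes.
utf8 = "Iñtërnâtiônàl"
ansi = utf8.encode("Windows-1252")   # each accented letter becomes one byte, e.g. ñ -> 0xF1

puts utf8.valid_encoding?                               # true
# Relabeling the ANSI bytes as UTF-8 does not make them valid UTF-8.
puts ansi.dup.force_encoding("UTF-8").valid_encoding?   # false

# Transcoding (not relabeling) converts the ANSI bytes back into UTF-8.
fixed = ansi.encode("UTF-8")
puts fixed == utf8                                      # true
```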

@StandardNerd

I have a similar issue. In the XML file, all special characters are converted, but I need them unchanged for further processing.

require 'builder'

$KCODE = 'UTF8'
xml = Builder::XmlMarkup.new
xml.instruct!(:xml, :encoding => "UTF-8")
xml.sample("Iñtërnâtiônàl")
xml.target!

But when I try this out, I get an XML file with:

I&#241;t&#235;rn&#226;ti&#244;n&#224;l

instead of Iñtërnâtiônàl.
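The reported output consists of XML decimal character references: &#241; is code point 241 (U+00F1, ñ), &#235; is ë, and so on. An XML parser treats both forms as the same text, but code that processes the raw output sees the difference. The sketch below uses two hypothetical helpers, `escape_non_ascii` and `unescape_non_ascii` (not part of Builder's API), to reproduce that escaping and to reverse it for code points above 127 (Ruby 1.9 or later).

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper (not Builder's API): mimic the output reported above by
# turning every character above code point 127 into a decimal reference.
# Markup characters (&, <, >) are deliberately not handled here.
def escape_non_ascii(str)
  str.each_char.map { |ch| ch.ord > 127 ? "&##{ch.ord};" : ch }.join
end

# Hypothetical inverse: turn decimal references above 127 back into UTF-8
# characters. References at or below 127 (e.g. &#38; for "&") are left alone
# so that escaped markup stays escaped.
def unescape_non_ascii(str)
  str.gsub(/&#(\d+);/) do
    code = Regexp.last_match(1).to_i
    code > 127 ? [code].pack("U") : Regexp.last_match(0)
  end
end

escaped = escape_non_ascii("Iñtërnâtiônàl")
puts escaped                       # I&#241;t&#235;rn&#226;ti&#244;n&#224;l
puts unescape_non_ascii(escaped)   # Iñtërnâtiônàl
```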

@jimweirich
Owner

I can't reproduce this.

# -*- coding: utf-8 -*-
require 'builder'

$KCODE = 'UTF8'
xml = Builder::XmlMarkup.new
xml.instruct!(:xml, :encoding => "UTF-8")
xml.sample("Iñtërnâtiônàl")
puts xml.target!

Gives:

$ ruby -v xxx.rb 
ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin12.3.0]
xxx.rb:4: warning: variable $KCODE is no longer effective; ignored
<?xml version="1.0" encoding="UTF-8"?><sample>Iñtërnâtiônàl</sample>
$ 

What version of Ruby are you using? 2.0 reports that $KCODE is ignored.
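For context on that warning: since Ruby 1.9, every string carries its own encoding and $KCODE has no effect; a magic comment like the one at the top of the script above sets the source encoding of the file's literals instead. A minimal check, assuming Ruby 2.0 or later:

```ruby
# -*- coding: utf-8 -*-
# The magic comment above sets the encoding of this file's string literals.
sample = "Iñtërnâtiônàl"

puts __ENCODING__      # source encoding of this file: UTF-8
puts sample.encoding   # UTF-8
puts sample.length     # 13 characters (the string is 18 bytes)
```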

@StandardNerd

I use Ruby 1.8.7

@jimweirich
Owner

Even with 1.8.7 I get:

$ ruby -v -rubygems xxx.rb 
ruby 1.8.7 (2012-06-29 patchlevel 370) [i686-darwin12.2.0]
<?xml version="1.0" encoding="UTF-8"?><sample>Iñtërnâtiônàl</sample>
$ 

Any other environmental issues that may affect this (OS, etc.)?

@StandardNerd

We use Windows Server and MS IIS as Webserver, Ruby 1.8.7, Rails 3.0.20.

Please take a look at my post on Stack Overflow:
http://stackoverflow.com/questions/17067241/ruby-1-8-7-file-encoding-render-to-string-ansi-instead-of-utf-8

I added a few screenshots; maybe the problem is the file encoding of the generated XML file.
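One way to test the file-encoding hypothesis is to inspect the generated file's bytes directly. The sketch below (Ruby 1.9.3 or later; the helper `inspect_xml_file` and the file name `sample.xml` are hypothetical) writes the expected output explicitly as UTF-8 and then reports whether the bytes are valid UTF-8, how many bytes are non-ASCII, and how many numeric character references appear. A file produced by an escaping Builder version would instead show zero non-ASCII bytes and several references.

```ruby
# -*- coding: utf-8 -*-
require "tmpdir"

# Hypothetical diagnostic: summarize how a generated XML file is encoded.
def inspect_xml_file(path)
  raw = File.binread(path)                  # raw bytes, no transcoding
  {
    valid_utf8: raw.dup.force_encoding("UTF-8").valid_encoding?,
    non_ascii_bytes: raw.bytes.count { |b| b > 127 },
    char_refs: raw.scan(/&#\d+;/).length    # e.g. &#241;
  }
end

# Example: write the expected output explicitly as UTF-8, then inspect it.
path = File.join(Dir.tmpdir, "sample.xml")  # hypothetical file name
xml = '<?xml version="1.0" encoding="UTF-8"?><sample>Iñtërnâtiônàl</sample>'
File.open(path, "w:UTF-8") { |f| f.write(xml) }

report = inspect_xml_file(path)
p report
```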

https://plus.google.com/photos/101165525768874358940/albums/5889322171920232337?authkey=CKaPqczE7tGdag

@jimweirich
Owner

You are using the latest version of builder, right? (3.2.2)

@StandardNerd

OK, actually not, but I updated the builder gem to 3.2.2, tested it again, and nothing changed. The special characters are still converted into ASCII character entities.

@StandardNerd

Wait, I got the following output:

C:\Appl_Ruby>gem dependency --reverse-dependencies builder
Gem builder-2.1.2
Used by
actionpack-3.0.11 (builder (~> 2.1.2))
activemodel-3.0.11 (builder (~> 2.1.2))

Gem builder-3.2.2

@jimweirich
Owner

With version 2.1.2 I get the same output as you. It's a version issue. According to the CHANGELOG, you need at least version 2.2.0 to get the behavior you want.
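The reverse-dependency listing suggests why installing 3.2.2 changed nothing. Assuming Rails 3.0.x's actionpack and activemodel declare builder as `~> 2.1.2` (the pessimistic constraint Rails 3.0 used), RubyGems' own version classes show that neither 2.2.0 nor 3.2.2 satisfies it, so the application keeps activating 2.1.2:

```ruby
require "rubygems"

# Assumed constraint from the Rails 3.0.x actionpack/activemodel gemspecs.
# "~> 2.1.2" means ">= 2.1.2 and < 2.2".
rails_constraint = Gem::Requirement.new("~> 2.1.2")

# 2.1.2: the version Rails activates; 2.2.0: the minimum needed per the
# CHANGELOG; 3.2.2: the version installed by hand.
%w[2.1.2 2.2.0 3.2.2].each do |v|
  ok = rails_constraint.satisfied_by?(Gem::Version.new(v))
  puts "builder #{v}: #{ok ? 'allowed' : 'excluded'} by ~> 2.1.2"
end
```

Installing a newer builder side by side therefore does not change which version Rails loads; the constraint itself decides.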
