Editing an element and generating XML files using the {attribute_quote: :quote} option wrongly escapes quotes in attributes #92

Closed
edouard opened this issue Dec 2, 2022 · 7 comments

Comments


edouard commented Dec 2, 2022

require 'rexml'

xml = <<~XML
<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd">
  <file original="test.plist" source-language="en" datatype="plaintext" target-language="en">
    <header>
      <tool tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3" build-num="8E3004b" />
    </header>
    <body>
      <trans-unit id="test">
        <source>test</source>
        <target></target>
      </trans-unit>
      <trans-unit id="We're happy to see you">
        <source>We're happy to see you</source>
        <target></target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML

@doc = REXML::Document.new(xml)
REXML::XPath.first(@doc, '//trans-unit').attributes['id'] = "I'm here"
puts @doc.to_s

Output:

<?xml version='1.0' encoding='UTF-8'?>
<xliff xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd' version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <file datatype='plaintext' original='test.plist' source-language='en' target-language='en'>
    <header>
      <tool build-num='8E3004b' tool-id='com.apple.dt.xcode' tool-name='Xcode' tool-version='8.3.3'/>
    </header>
    <body>
      <trans-unit id='I&apos;m here'>
        <source>test</source>
        <target/>
      </trans-unit>
      <trans-unit id='We&apos;re happy to see you'>
        <source>We're happy to see you</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>

All good! ✅

When using {attribute_quote: :quote} to generate files with double-quoted attributes, without editing anything:

@doc = REXML::Document.new(xml, {attribute_quote: :quote})
puts @doc.to_s

Output:

<?xml version='1.0' encoding='UTF-8'?>
<xliff xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd" version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <file datatype="plaintext" original="test.plist" source-language="en" target-language="en">
    <header>
      <tool build-num="8E3004b" tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3"/>
    </header>
    <body>
      <trans-unit id="test">
        <source>test</source>
        <target/>
      </trans-unit>
      <trans-unit id="We're happy to see you">
        <source>We're happy to see you</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>

All good too! ✅
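
For reference, here is a minimal sketch of how the two quoting modes serialize the same parsed attribute. The one-element document is a made-up example used only to isolate the attribute; the expected results mirror the listings above rather than anything new.

```ruby
require 'rexml/document'

# Hypothetical one-element document, just to isolate the attribute quoting.
source = %(<trans-unit id="I'm here"/>)

default_doc = REXML::Document.new(source)
quoted_doc  = REXML::Document.new(source, { attribute_quote: :quote })

# Default mode: attributes are written with single quotes, so the
# apostrophe inside the value has to be escaped as &apos;.
default_out = default_doc.root.to_s
puts default_out

# :quote mode: attributes are written with double quotes, so the
# apostrophe can stay as it is.
quoted_out = quoted_doc.root.to_s
puts quoted_out
```

In the default mode the escape is required for well-formedness; in :quote mode it is only cosmetic, which is why the escaped output further down looks wrong.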

But if I edit a trans-unit:

@doc = REXML::Document.new(xml, {attribute_quote: :quote})
REXML::XPath.first(@doc, '//trans-unit').attributes['id'] = "I'm here"
puts @doc.to_s

Output:

<?xml version='1.0' encoding='UTF-8'?>
<xliff xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd" version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <file datatype="plaintext" original="test.plist" source-language="en" target-language="en">
    <header>
      <tool build-num="8E3004b" tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3"/>
    </header>
    <body>
      <trans-unit id="I&apos;m here">
        <source>test</source>
        <target/>
      </trans-unit>
      <trans-unit id="We're happy to see you">
        <source>We're happy to see you</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>

Note the <trans-unit id="I&apos;m here">: it should be <trans-unit id="I'm here">. The element we edited has its quote incorrectly HTML-escaped, while the one we didn't edit is correctly left unescaped.

It looks like editing an element makes it lose its context[:attribute_quote].

Member

kou commented Dec 2, 2022

@doc = REXML::Document.new(xml, {attribute_quote: :quote})
REXML::XPath.first(@doc, '//trans-unit').attributes['id'] = REXML::Attribute.new("id", "I'm here")
puts @doc.to_s

Author

edouard commented Dec 5, 2022

Hey @kou, thank you for your answer. Sorry about this, I think I got confused between #91 and #92.

Could you try this? I find the result unexpected.

require 'rexml'

xml = <<~XML
<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd">
  <file original="test.plist" source-language="en" datatype="plaintext" target-language="en">
    <header>
      <tool tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3" build-num="8E3004b" />
    </header>
    <body>
      <trans-unit id="I'm here">
        <source>I'm here</source>
        <target></target>
      </trans-unit>
      <trans-unit id="We're happy to see you">
        <source>We're happy to see you</source>
        <target></target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML

file = 'test.plist'
id = "We're happy to see you"
@doc = REXML::Document.new(xml, {attribute_quote: :quote})
REXML::XPath.first(@doc, "//file[@original='#{file}']/body/trans-unit[@id=\"#{id}\"]").attributes['id'] = REXML::Attribute.new("id", "Hello")
puts @doc.to_s

Output:

<?xml version='1.0' encoding='UTF-8'?>
<xliff xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd" version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <file datatype="plaintext" original="test.plist" source-language="en" target-language="en">
    <header>
      <tool build-num="8E3004b" tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3"/>
    </header>
    <body>
      <trans-unit id="I&apos;m here">
        <source>I'm here</source>
        <target/>
      </trans-unit>
      <trans-unit id="Hello">
        <source>We're happy to see you</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>

Author

edouard commented Dec 5, 2022

More weirdness:

require 'rexml'

xml = <<~XML
<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd">
  <file original="test.plist" source-language="en" datatype="plaintext" target-language="en">
    <header>
      <tool tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3" build-num="8E3004b" />
    </header>
    <body>
      <trans-unit id="I'm here">
        <source>I'm here</source>
        <target></target>
      </trans-unit>
      <trans-unit id="We're happy to see you">
        <source>We're happy to see you</source>
        <target></target>
      </trans-unit>
    </body>
  </file>
</xliff>
XML

file = 'test.plist'
id = "We're happy to see you"
@doc = REXML::Document.new(xml, {attribute_quote: :quote})
REXML::XPath.first(@doc, "//file[@original='#{file}']/body/trans-unit[@id=\"#{id}\"]").attributes['id'] = REXML::Attribute.new("id", "I'm not here")
puts @doc.to_s

id = "I'm not here"
REXML::XPath.first(@doc, "//file[@original='#{file}']/body/trans-unit[@id=\"#{id}\"]/source").text = REXML::Text.new("I'm not here")
puts @doc.to_s

Output of the two puts calls, in order:

<?xml version='1.0' encoding='UTF-8'?>
<xliff xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd" version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <file datatype="plaintext" original="test.plist" source-language="en" target-language="en">
    <header>
      <tool build-num="8E3004b" tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3"/>
    </header>
    <body>
      <trans-unit id="I&apos;m here">
        <source>I'm here</source>
        <target/>
      </trans-unit>
      <trans-unit id="I'm not here">
        <source>We're happy to see you</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>
<?xml version='1.0' encoding='UTF-8'?>
<xliff xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd" version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <file datatype="plaintext" original="test.plist" source-language="en" target-language="en">
    <header>
      <tool build-num="8E3004b" tool-id="com.apple.dt.xcode" tool-name="Xcode" tool-version="8.3.3"/>
    </header>
    <body>
      <trans-unit id="I&apos;m here">
        <source>I'm here</source>
        <target/>
      </trans-unit>
      <trans-unit id="I&apos;m not here">
        <source>I&apos;m not here</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>

Member

kou commented Dec 8, 2022

Ah, it was caused by the internal normalization (' -> &apos;) / unnormalization (&apos; -> ') that is triggered by the [@id=\"#{id}\"] XPath. REXML caches either the normalized data or the unnormalized data, not both. If the one that isn't cached is requested, it's calculated from the other one. That is what causes the ' -> &apos; conversion.
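
The round trip described above can be sketched with the class-level helpers on REXML::Text. This only illustrates the two conversions; it is not meant to reproduce the attribute caching itself.

```ruby
require 'rexml/document'

raw = "I'm here"

# Normalization replaces characters that have predefined XML entities.
normalized = REXML::Text.normalize(raw)
puts normalized   # I&apos;m here

# Unnormalization expands the entity references again.
unnormalized = REXML::Text.unnormalize(normalized)
puts unnormalized # I'm here
```

If only the unnormalized form is kept and the serializer later asks for the normalized one, the value comes back with &apos; even when the source file contained a plain apostrophe.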

So this is the current REXML spec.

BTW, what is your real problem? <trans-unit id="I&apos;m not here"> and <trans-unit id="I'm not here"> are the same in an XML context.
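
A small check of that equivalence, assuming two made-up one-element documents that differ only in how the apostrophe is written:

```ruby
require 'rexml/document'

escaped   = REXML::Document.new(%(<trans-unit id="I&apos;m not here"/>))
unescaped = REXML::Document.new(%(<trans-unit id="I'm not here"/>))

# Reading the attribute returns the unnormalized value in both cases.
escaped_id   = escaped.root.attributes['id']
unescaped_id = unescaped.root.attributes['id']

puts escaped_id == unescaped_id # true
```

Any conforming XML parser should see the same attribute value; the difference only shows up in the bytes of the file.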

Author

edouard commented Dec 8, 2022

First of all, thank you very much for your time looking into it.

Yes, it looks like the normalization that happens when fetching the element via XPath is causing this issue. But unfortunately using [@id=\"#{id}\"] is the only way to find an element containing an HTML-escaped quote, as demonstrated in #91.
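
One possible alternative for the lookup itself, sketched under the assumption that scanning all trans-unit elements is acceptable: select the candidates with a quote-free XPath and compare the id in Ruby. The small document is hypothetical, and this only changes how the element is located; reading the attribute still asks REXML for the unnormalized value, so it says nothing about how the attribute is serialized afterwards.

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down document with apostrophes in the ids.
doc = REXML::Document.new(<<~XML)
  <body>
    <trans-unit id="I'm here"><source>I'm here</source></trans-unit>
    <trans-unit id="We're happy to see you"><source>We're happy to see you</source></trans-unit>
  </body>
XML

id = "We're happy to see you"

# No id is interpolated into the XPath, so there is nothing to quote.
unit = REXML::XPath.match(doc, '//trans-unit').find do |element|
  element.attributes['id'] == id
end

puts unit.elements['source'].text
```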

It sounds like a bug to me when using {attribute_quote: :quote} because I haven’t encountered this issue when not using this option.

You are correct that <trans-unit id="I&apos;m not here"> and <trans-unit id="I'm not here"> are the same in XML context, so it shouldn’t matter too much actually. It's probably just my use-case.

The real problem is more about file cosmetics in this case: I parse XML translation files, which may or may not have some escaped HTML in them, to add translations to their elements. But I have to leave their structure untouched and leave whatever was HTML-escaped as it is. Files are translated on our platform, and once they are translated, we add the translations to these files.

I know my customers won't like it when they upload an original file containing <trans-unit id="I'm not here"> and receive a translated file containing <trans-unit id="I&apos;m not here">. It doesn't feel like consistent behaviour, and it creates unnecessarily large changesets for them when they commit that XML file to their version control.

kou closed this as completed in 20070d0 on Dec 8, 2022

Member

kou commented Dec 8, 2022

OK.
I've changed the current behavior. I hope that this doesn't break backward compatibility...

For the last script, you need to use REXML::Text.new("I'm not here", false, nil, true). The 4th argument (raw=true) is important. Note that the first argument must be valid XML text content when you specify true for the 4th argument. That's your responsibility, not REXML's.
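
A minimal sketch of the difference the 4th argument makes, using a standalone source element that is made up for illustration:

```ruby
require 'rexml/document'

# Default: the text is normalized when it is written out.
plain = REXML::Text.new("I'm not here")
puts plain.to_s # I&apos;m not here

# raw=true (4th argument): the string is written as given, so it must
# already be valid XML text (no bare & or <).
raw = REXML::Text.new("I'm not here", false, nil, true)
puts raw.to_s   # I'm not here

# Hypothetical element, standing in for the <source> node in the script.
source = REXML::Element.new('source')
source.text = raw
puts source.to_s
```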

Author

edouard commented Dec 9, 2022

Thank you!
