Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change xml parser to xml-js #12

Merged
merged 8 commits into from
Mar 14, 2018
Merged

Conversation

probertson
Copy link
Contributor

Just to get this out of the way at the start: this probably seems like a drastic change, and I will freely admit that it does to me.

I'm working on a separate pull request to add support for <ph> tags in XLIFF 1.2. (This is very similar to #6 and could easily be extended to support other tags such as the ones specified there.) However, one issue I have run into is that the XML parsing library that's currently used in this project (xml2js) doesn't support mixed text content and tags properly. Specifically it doesn't preserve the order of the tag and text children -- it just groups all text together and all tags together.

This means that if an XLIFF file has a source value like this:

Hello there <ph>%(name)</ph>, how are you today?

using xml2js we can determine that there is a <ph> child element, but not where within the string it belongs.

I found another XML parsing library, xml-js, that does properly handle child element order. (The library's README includes a brief explanation of the problem.)


function js2xliff(obj, opt, cb) {

if (typeof opt === 'function') {
cb = opt;
opt = { pretty: true, indent: ' ', newline: '\n' };
opt = { indent: ' ' };
Copy link
Contributor Author

@probertson probertson Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only api difference that is caused by switching XML parsing libraries. xml2js supports the pretty, indent, and newline options, but xml-js only exposes spaces, which is functionally like indent and pretty combined. If spaces is 0 then that's the same as pretty: false. Otherwise spaces can be a number (the number of spaces) or actual characters (as shown here, using two space characters).

@adrai
Copy link
Contributor

adrai commented Mar 12, 2018

Can you try to make it compatible to node 4? i.e. replace let with var here: https://github.com/locize/xliff/pull/12/files#diff-cbaa01ed82bcf0de1d7640c93a7bb826R14

@adrai
Copy link
Contributor

adrai commented Mar 13, 2018

...and use a fixed dependency....?

@adrai
Copy link
Contributor

adrai commented Mar 13, 2018

...and it should work in the browser too: nashwaan/xml-js#54

@probertson
Copy link
Contributor Author

@adrai Thanks for the quick feedback. The node 4 compatibility and fixed dependency should be easy. I'll have to look into the "work in the browser" issue -- that looks like it may be beyond me ☹️

@probertson
Copy link
Contributor Author

@adrai This isn't a solution to the browser issue, but just an observation. As I started looking into potential alternatives I noticed that xml2js (the current XML parsing library) also depends on https://github.com/isaacs/sax-js/ -- so it (probably) has the same issue that xml-js has.

I'm continuing to look into this more though to see what I can figure out.

@adrai
Copy link
Contributor

adrai commented Mar 13, 2018

ok, sounds good

@adrai
Copy link
Contributor

adrai commented Mar 13, 2018

As soon as you have more infos or think it's ok like that, I will merge ;-)

@probertson
Copy link
Contributor Author

@adrai Thanks. I've been digging around in the xml-js source and in the sax-js source but I can't see an easy way to resolve that issue. I also haven't found any other XML parsing libraries that handle the mixed element ordering issue other than xml-js. So I'm pretty confident there's no quick workaround. At this point I think that the best course of action is to let the xml-js or sax-js maintainers fix the browser support issue. 😐

If you're okay merging this anyway then that would be very helpful for me!

@adrai
Copy link
Contributor

adrai commented Mar 13, 2018

Ok, will try merge this tomorrow.

@adrai adrai merged commit d6b2946 into locize:master Mar 14, 2018
@adrai
Copy link
Contributor

adrai commented Mar 14, 2018

v3.0.0

@probertson probertson deleted the change-xml-parser branch March 15, 2018 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants