Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOCTYPE declaration cannot be parsed #13

Closed
misitoth opened this issue Mar 6, 2017 · 14 comments
Closed

DOCTYPE declaration cannot be parsed #13

misitoth opened this issue Mar 6, 2017 · 14 comments

Comments

@misitoth
Copy link

misitoth commented Mar 6, 2017

I want to convert standard MLP (Mobile Positioning Protocol) xml to JSON and back.
Here is the first 3 line:

%extension;]>

<svc_init ver="3.0.0">

After it is converted to JSON:
{
"_declaration": {
"_attributes": {
"version": "1.0",
"encoding": "UTF-8"
}
},
"svc_init": {
"_attributes": {
"ver": "3.0.0"
}

convert.xml2json(xml, { compact: true, spaces: 6, ignoreCdata: false });
I lost !DOCTYPE.

Can you add ignoreDoctype to the code?

@nashwaan
Copy link
Owner

nashwaan commented Mar 6, 2017

Sorry, I don't understand the issue here properly.

The first line %extension;]> in your example is not a proper xml format.

Also, do you need ignoreDoctype to keep the <!DOCTYPE>? What do you expect the output json to look like?

@misitoth
Copy link
Author

!DOCTYPE is valid XML tag, but during the convert.xml2json call parser doesn't parse it and after the conversation we lost the information from that tag. Can you keep !DOCTYPE field in JSON and after json2xml? I propose ingoreDoctype in the options and parser can convert !DOCTYPE like ![CDATA].

Example xml was't visible in my first comment.
'

'

So the second line is doctype of MLP(Mobile Location Protocol), I would like to keep in the conversation.

My proposal in JSON (compact mode) is following:
{ "_doctype" : "scv_init SYSTEM MLP_SVC_INIT_300.DTD"}

@misitoth
Copy link
Author

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE svc_init SYSTEM "MLP_SVC_INIT_300.DTD" [<!ENTITY % extension SYSTEM "PIF_EXTENSION_100.DTD">%extension;]>

@nashwaan
Copy link
Owner

Ok now I see the issue. Thanks for providing clear example.

I am working on adding support for !DOCTYPE. Currently, I get this output (not published):

{"_doctype": " svc_init SYSTEM \"MLP_SVC_INIT_300.DTD\" [<!ENTITY % extension SYSTEM \"PIF_EXTENSION_100.DTD\">%extension;]"
}

As you can see, ENTITY is not parsed.
I prefer to produce more meaningful result. Something like:

{"_doctype": {
    "name_": "svc_init SYSTEM \"MLP_SVC_INIT_300.DTD\"",
     "extension" : "SYSTEM \"PIF_EXTENSION_100.DTD\""
    }
}

It seems the content of DOCTYPE is DTD which I am not very familiar with. I need to find time to study it in order to generate meaningful output.

@DenisCarriere
Copy link
Contributor

👍 +1 This issue, I've ran into the same problem.

@nashwaan your proposal by adding _doctype would solve this, except an easier solution would be to extend it using the _comment property since it's the closest one to an inner text without adding any new properties to xml-js.

DOCTYPE (proposed)

Here's my XML example I would like to convert from JSON to XML.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
const json = {
  _declaration: {_attributes: { version: '1.0', encoding: 'utf-8' }},
  _doctype: {
    _comment: 'note SYSTEM "Note.dtd"'
  },
}
const xml = convert.js2xml(json, { compact: true })

DOCTYPE + ENTITY (proposed)

There is also !ENTITY in !DOCTYPE which is described here: https://www.w3schools.com/xml/xml_dtd.asp. Might need to include an extra params (_entity) to support that as well. Entities are typically (always?) used in a list.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>
const json = {
  _declaration: {_attributes: { version: '1.0', encoding: 'utf-8' }},
  _doctype: {
    _comment: 'note',
    _entity: [
      {_comment: 'nbsp "&#xA0;"'},
      {_comment: 'writer "Writer: Donald Duck."'},
      {_comment: 'copyright "Copyright: W3Schools."'},
    ]
  },
}
const xml = convert.js2xml(json, { compact: true })

CC: @misitoth @nashwaan Any thoughts?

@nashwaan
Copy link
Owner

@DenisCarriere , Thank you for your valuable feedback.

I don't think using _comment for any DTD content is the best thing to do, because _comment should be used for XML comments only (i.e <!-- some comment -->). Also, comments can exists in DOCTYPE as well.

I still haven't studied DTD, but my impression is it requires a lot of parsing because of the scenarios it can support. Also, again from my shallow understanding, I think DTD influence how to parse rest of XML contents as it contains some instructions and variables (right?).

What I am planning to do is to put the content of DTD in _doctype like I showed before:

{"_doctype": " svc_init SYSTEM \"MLP_SVC_INIT_300.DTD\" [<!ENTITY % extension SYSTEM \"PIF_EXTENSION_100.DTD\">%extension;]"
}

And if any needs further processing, then he can use DTD parser like https://github.com/calibr/dtd-file.

Or if you are satisfied with this parser or can recommend me a good parser then I can integrate it with this xml-js library.

@DenisCarriere
Copy link
Contributor

@nashwaan Awesome stuff, agreed, maybe the _comment is out of place and trying to solve the DTD implementation might be too complex for what it's worth.

👍 I like the idea of having a simple _doctype property, I think that solves 99% of the issue.

The JS example would be even easier:

const json = {
  _declaration: {_attributes: { version: '1.0', encoding: 'utf-8' }},
  _doctype: 'note SYSTEM "Note.dtd"'
}
const xml = convert.js2xml(json, { compact: true })

Equals

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">

@nashwaan
Copy link
Owner

I have published v1.1.0 to support _doctype.

Please let me know whether this is good or not.

@DenisCarriere
Copy link
Contributor

Woot 🎉 Looks like only 1.0.2 is published.

https://www.npmjs.com/package/xml-js

@nashwaan
Copy link
Owner

Ooops.
I have published it now.

@DenisCarriere
Copy link
Contributor

Woohoo! Works like a charm.

  const json = {
    _declaration: {_attributes: { version: '1.0', encoding: 'utf-8' }},
    ServiceExceptionReport: {
      _attributes: {version: '1.1.1'},
      _doctype: ' ServiceExceptionReport SYSTEM "http://schemas.opengis.net/wms/1.1.1/exception_1_1_1.dtd"',
      ServiceException: {
        _text: message
      }
    }
  }
  const xml = convert.js2xml(json, { compact: true, spaces: spaces })

Results

<?xml version="1.0" encoding="utf-8"?>
<ServiceExceptionReport version="1.1.1">
  <!DOCTYPE ServiceExceptionReport SYSTEM "http://schemas.opengis.net/wms/1.1.1/exception_1_1_1.dtd">
  <ServiceException>foo</ServiceException>
</ServiceExceptionReport>

Only odd thing I've noticed is you need to include a blank space in your text field, if not it immediately places the text right after <!DOCTYPE (don't know if that was intentional or not).

6a6090e#commitcomment-21926878

@nashwaan
Copy link
Owner

@DenisCarriere Have implemented your suggestion and improved other things (see release notes). These are now published in v1.2.0.

@DenisCarriere
Copy link
Contributor

+1 awesome stuff

@nashwaan
Copy link
Owner

nashwaan commented May 4, 2017

@misitoth What do you think of the new approach published in the latest version?

Do you have any comment or feedback before I close this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants