
add support for parsing JSON #15

Closed
pawelmhm opened this issue Oct 7, 2016 · 5 comments

Comments

@pawelmhm
Member

pawelmhm commented Oct 7, 2016

Sometimes I'm dealing with deeply nested JSON and I'd like to query it with XPath. I know I can use JMESPath, but I don't like the JMESPath syntax; I'm used to XPath. Also, some XPath features seem to be missing from JMESPath, e.g. recursive traversal: I can't write //mynode/text() in JMESPath, I have to write root.foo.bar[].foo2.bar2.product.mynode.

It seems that JSON is currently not supported by js2xml. For example, this:

import js2xml
import json

alfa = json.dumps({"aa": "bb"})
print(alfa)
parsed = js2xml.parse(alfa)
print(js2xml.pretty_print(parsed))

fails at the js2xml.parse() call.

Is there some way to add JSON parsing support to js2xml? This would probably require converting a Python dictionary to XML. There are packages that already do this, e.g. https://github.com/delfick/python-dict2xml, so maybe we could learn something from them.
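A minimal sketch of the dictionary-to-XML conversion, assuming only the Python standard library (xml.etree.ElementTree) and dict2xml's convention of repeating the tag once per list item. The names json_to_xml, build and root_tag are hypothetical, not part of js2xml or dict2xml, and the sketch assumes every JSON key is a valid XML tag name.

```python
import json
import xml.etree.ElementTree as ET


def build(parent, key, value):
    """Append `value` under `parent` using `key` as the tag name (hypothetical helper).

    dict2xml-style convention: a list gets no element of its own; each
    list item is emitted as a sibling element named `key`.
    """
    if isinstance(value, list):
        for item in value:
            build(parent, key, item)
        return
    node = ET.SubElement(parent, key)
    if isinstance(value, dict):
        for child_key, child_value in value.items():
            build(node, child_key, child_value)
    elif value is not None:
        # Scalars (str, int, float, bool) become the element's text.
        node.text = str(value)


def json_to_xml(text, root_tag="root"):
    """Parse a JSON document and return it as an ElementTree element."""
    root = ET.Element(root_tag)  # wrapper, since JSON may have several top-level keys
    data = json.loads(text)
    if isinstance(data, dict):
        for key, value in data.items():
            build(root, key, value)
    else:
        build(root, "item", data)
    return root


data = """{"one": {"two": [{"four": {"name": "four1_name"}},
                         {"four": {"name": "four2_name"}}]}}"""
root = json_to_xml(data)

# ElementTree supports a subset of XPath; ".//name" is a recursive search,
# the counterpart of //name in full XPath.
print([el.text for el in root.findall(".//name")])
# → ['four1_name', 'four2_name']
```

The recursive .//name query is exactly the kind of traversal that motivates this issue, and it works without spelling out the full path.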

@Granitosaurus

Granitosaurus commented Oct 7, 2016

Actually, you can parse JSON with js2xml, but it's really ugly and you need to wrap it in a variable assignment:

from lxml import etree
import js2xml

data = """{
    "one": {
        "two": [{
            "four": {
                "name": "four1_name"
            }
        }, {
            "four": {
                "name": "four2_name"
            }
        }]
    }
}"""
print(etree.tostring(js2xml.parse('var foo = ' + data), pretty_print=True, encoding="unicode"))

Will give this:

<program>
  <var name="foo">
    <object>
      <property name="one">
        <object>
          <property name="two">
            <array>
              <object>
                <property name="four">
                  <object>
                    <property name="name">
                      <string>four1_name</string>
                    </property>
                  </object>
                </property>
              </object>
              <object>
                <property name="four">
                  <object>
                    <property name="name">
                      <string>four2_name</string>
                    </property>
                  </object>
                </property>
              </object>
            </array>
          </property>
        </object>
      </property>
    </object>
  </var>
</program>

which works, but is really ugly because it uses property and object as node names instead of the original names from the JSON. In other words, it parses JavaScript out of what is clearly JSON.
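The AST above can still be queried with XPath, as long as JSON keys are matched through the name attribute rather than the element name. A sketch using the standard library's ElementTree (which supports only a subset of XPath) on a copy of the output shown above; with lxml, the full-XPath equivalent would be //property[@name="name"]/string/text().

```python
import xml.etree.ElementTree as ET

# A copy of the js2xml output shown above, with the leaf properties condensed.
ast = ET.fromstring("""
<program>
  <var name="foo">
    <object>
      <property name="one">
        <object>
          <property name="two">
            <array>
              <object>
                <property name="four">
                  <object>
                    <property name="name"><string>four1_name</string></property>
                  </object>
                </property>
              </object>
              <object>
                <property name="four">
                  <object>
                    <property name="name"><string>four2_name</string></property>
                  </object>
                </property>
              </object>
            </array>
          </property>
        </object>
      </property>
    </object>
  </var>
</program>
""".strip())

# Every JSON key became <property name="...">, so the key is matched with an
# attribute predicate and the value is read from the <string> child.
names = [s.text for s in ast.findall(".//property[@name='name']/string")]
print(names)  # → ['four1_name', 'four2_name']
```

It works, but every step of a path needs a property[@name=...] predicate, which is the ugliness described above.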
As Pawel mentioned, there is already a package called dict2xml that does this really well and is pretty simple (around 200 lines of code):

import json
from dict2xml import dict2xml
print(dict2xml(json.loads(data)))

result:

<one>
  <two>
    <four>
      <name>four1_name</name>
    </four>
  </two>
  <two>
    <four>
      <name>four2_name</name>
    </four>
  </two>
</one>

It seems quite simple; maybe we could adapt it into js2xml?

@Granitosaurus

Granitosaurus commented Oct 7, 2016

There's also https://github.com/quandyfactory/dicttoxml, which seems to be a bit more popular, but it's essentially the same as dict2xml except that it adds types to nodes.

import json
from dicttoxml import dicttoxml
from lxml import etree
print(etree.tostring(etree.fromstring(dicttoxml(json.loads(data))), pretty_print=True, encoding="unicode"))

<root>
  <one type="dict">
    <two type="list">
      <item type="dict">
        <four type="dict">
          <name type="str">four1_name</name>
        </four>
      </item>
      <item type="dict">
        <four type="dict">
          <name type="str">four2_name</name>
        </four>
      </item>
    </two>
  </one>
</root>

Personally, I don't like that it wraps every list element in an <item> tag, and it seems to have a few more unnecessary quirks like that, but overall it might be a bit more robust than dict2xml. Both packages are worth looking into for this, imo.
Edit: actually, recent dicttoxml versions allow customizing some of this, to get rid of the list wrapping and of the types added to nodes.
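For XPath querying, the practical difference between the two conventions shows up in rooted paths. A sketch using the standard library's ElementTree on copies of the dict2xml and dicttoxml outputs shown above (whitespace condensed); nothing here calls either library.

```python
import xml.etree.ElementTree as ET

# dict2xml output from above: no wrapper for lists, no type attributes.
dict2xml_out = ET.fromstring("""<one>
  <two><four><name>four1_name</name></four></two>
  <two><four><name>four2_name</name></four></two>
</one>""")

# dicttoxml output from above: a <root> element, lists wrapped in <item>,
# and a type attribute on every node.
dicttoxml_out = ET.fromstring("""<root>
  <one type="dict">
    <two type="list">
      <item type="dict"><four type="dict"><name type="str">four1_name</name></four></item>
      <item type="dict"><four type="dict"><name type="str">four2_name</name></four></item>
    </two>
  </one>
</root>""")

# Rooted paths differ: dicttoxml adds the <root> level and the <item> level.
flat = [n.text for n in dict2xml_out.findall("./two/four/name")]
wrapped = [n.text for n in dicttoxml_out.findall("./one/two/item/four/name")]

# A recursive search is insensitive to both quirks.
recursive = [n.text for n in dicttoxml_out.findall(".//name")]

# The type attributes can also serve as predicates.
typed = [n.text for n in dicttoxml_out.findall(".//name[@type='str']")]

print(flat, wrapped, recursive, typed)
```

All four queries return the same two names; only the explicit paths have to know which convention produced the tree.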

@Granitosaurus

Related JMESPath issue for non-rooted expressions: jmespath/jmespath.py#110

@redapple
Contributor

redapple commented Oct 7, 2016

@pawelmhm, I think using js2xml to query data from JSON is out of scope for this library.

js2xml is very handy for extracting strings, numbers, JavaScript objects and arrays from assignments and function arguments, where writing regexes for them is tedious.

"XPath for JSON" is kind of a different (yet interesting) use-case.
You mention JMESPath, but there are also JSONPath and JSONiq.

As @Granitas mentions, the AST-like tree that js2xml outputs for a JSON dict is not that easy to work with. That is why js2xml has methods to convert JavaScript objects to dicts.
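A sketch of what converting the AST back to Python data can look like, written against the node names visible in the js2xml output earlier in this thread. ast_to_python is a hypothetical helper, not js2xml's own API, and it handles only <object>, <property>, <array> and <string>; other literal types are deliberately left unhandled.

```python
import xml.etree.ElementTree as ET


def ast_to_python(node):
    """Turn a js2xml-style literal node into Python data (hypothetical helper)."""
    if node.tag == "object":
        result = {}
        for prop in node.findall("property"):
            # A <property> holds its value as its single child element.
            result[prop.get("name")] = ast_to_python(prop[0])
        return result
    if node.tag == "array":
        return [ast_to_python(child) for child in node]
    if node.tag == "string":
        return node.text or ""
    raise ValueError(f"node type not handled in this sketch: <{node.tag}>")


# The js2xml output shown earlier, without indentation.
ast = ET.fromstring(
    '<program><var name="foo"><object>'
    '<property name="one"><object><property name="two"><array>'
    '<object><property name="four"><object>'
    '<property name="name"><string>four1_name</string></property>'
    '</object></property></object>'
    '<object><property name="four"><object>'
    '<property name="name"><string>four2_name</string></property>'
    '</object></property></object>'
    '</array></property></object></property>'
    '</object></var></program>'
)

data = ast_to_python(ast.find("./var/object"))
print(data)
```

Once the value is a plain dict again, any dict-based tool (JMESPath, or a dict-to-XML converter) can take over, which keeps querying outside js2xml.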

I would personally leave the querying of data inside dicts out of js2xml.

As for why parsing a JSON object directly, without an assignment, does not work: it has to do with how slimit interprets snippets of code. That may be worth fixing.

@redapple
Contributor

redapple commented Aug 3, 2017

Closing this issue as traversing JSON using XPaths is not the purpose of js2xml.

@redapple redapple closed this as completed Aug 3, 2017

3 participants