Generated xml looks ugly #1327

Closed
Andrej730 opened this issue Jun 30, 2015 · 10 comments · Fixed by #2456

Comments

@Andrej730

Whenever I try to export items as XML, I get something like:

<?xml version="1.0" encoding="utf8"?>
<news><item><name>0</name><content>00000</content></item><item><name>1</name><content>11111</content></item><item><name>2</name><content>22222</content></item></news>

Maybe Scrapy already provides a way to "beautify" it into something like this (like the XML I found in all the docs examples):

<?xml version="1.0" encoding="UTF-8"?>
<news>
   <item>
      <name>0</name>
      <content>00000</content>
   </item>
   <item>
      <name>1</name>
      <content>11111</content>
   </item>
   <item>
      <name>2</name>
      <content>22222</content>
   </item>
</news>
@yiakwy

yiakwy commented Jun 30, 2015

Use a jQuery-like tool; we call it PyQuery, and you can use Pq("name") to get a collection of tags, just like using jQuery in a JavaScript runtime.

@Andrej730
Author

What I really needed was a way to give the exported XML a clean, readable look. I thought that was a common case and that Scrapy might already have an implementation for it.
Anyway, I found a solution for this problem: to beautify the XML I used the xml.dom.minidom parse() function to parse the exported file into a DOM object, and then I saved the result of that object's toprettyxml() method.
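
For readers who want to try the minidom approach described above, here is a minimal sketch. It assumes the exported XML is small enough to fit in memory; the `prettify` helper and the sample data are illustrative and not part of Scrapy. It uses `parseString()` to stay self-contained, whereas `minidom.parse(path)` would read from an exported file instead.

```python
from xml.dom import minidom

# Compact XML similar to the exporter output shown in the report
# (the XML declaration is omitted here for brevity).
compact = (
    "<news><item><name>0</name><content>00000</content></item>"
    "<item><name>1</name><content>11111</content></item></news>"
)


def prettify(xml_text, indent="   "):
    """Parse the whole document into a DOM and re-serialize it indented."""
    dom = minidom.parseString(xml_text)
    return dom.toprettyxml(indent=indent)


pretty = prettify(compact)
print(pretty)
```

Note that this loads the entire document into a DOM tree, so memory use grows with the size of the export.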

@kmike
Member

kmike commented Jul 5, 2015

I think that'd be a nice option to generate human-readable XML, and we can make it default; PRs are welcome :)

@nramirezuy
Contributor

Before you start creating PRs, you must take into account that generated XML files can be big, so building a DOM isn't an option. 😄

@barraponto
Contributor

We use xml.sax.saxutils.XMLGenerator, but it seems like lxml.etree.tostring has a pretty_print keyword argument that indents properly. And since we already depend on lxml, maybe we can leverage that.

http://lxml.de/tutorial.html#the-element-class
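
A minimal sketch of the `pretty_print` idea, assuming lxml is installed and the document fits in memory. The sample data is illustrative, and this is not how Scrapy's exporter is implemented.

```python
from lxml import etree

# Compact XML without any whitespace between elements.
compact = (
    b"<news><item><name>0</name><content>00000</content></item>"
    b"<item><name>1</name><content>11111</content></item></news>"
)

# Parse into an element tree, then serialize it with indentation.
root = etree.fromstring(compact)
pretty = etree.tostring(
    root,
    pretty_print=True,      # indent nested elements
    xml_declaration=True,   # emit the <?xml ...?> header
    encoding="UTF-8",
)
print(pretty.decode("utf-8"))
```

Like the minidom route, this still holds the whole tree in memory before writing anything out.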

@yiakwy

yiakwy commented Jul 10, 2015

@Andrej730 the most effective way is not just to convert the XML to a DOM model in memory but to add a jQuery wrapper on top of it. That is why jQuery is here. By invoking jQuery we can manipulate the DOM efficiently in memory. A DOM tree parser is essential for analyzing files made of hierarchical tag trees. I recommend that you consider it seriously.

@yiakwy

yiakwy commented Jul 10, 2015

@nramirezuy Another method is to create a JavaScript runtime and client code that sends tasks to the JS runtime for processing. Phantom or the V8 engine will help with this.

@nramirezuy
Contributor

@yiakwy I like your enthusiasm but using Phantom or v8 to generate XML is a little bit too much 😄

I found this XMLIndentGenerator

@barraponto
Contributor

As mentioned, you can just use lxml. Here's a Feed Exporter exporting a tidy XML: https://gist.github.com/413fa084152d6845cc3d

@kmike
Member

kmike commented Sep 1, 2015

@barraponto it'd be nice to have a solution which doesn't require building the whole DOM tree, as @nramirezuy suggested.
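
One way to avoid building a DOM is to keep using `xml.sax.saxutils.XMLGenerator` and write the newlines and indentation as character data while streaming. The following is only a sketch under that assumption: the `export_items` function, its parameters, and the flat dict-shaped items are hypothetical and are not taken from Scrapy's code base.

```python
import io
from xml.sax.saxutils import XMLGenerator


def export_items(items, stream, indent=3, root="news", item_tag="item"):
    """Stream items as indented XML one element at a time, with no DOM."""
    xg = XMLGenerator(stream, encoding="utf-8")
    pad = " " * indent
    xg.startDocument()
    xg.startElement(root, {})
    xg.characters("\n")
    for item in items:
        # Each item is written and forgotten, so memory use stays flat.
        xg.characters(pad)
        xg.startElement(item_tag, {})
        xg.characters("\n")
        for key, value in item.items():
            xg.characters(pad * 2)
            xg.startElement(key, {})
            xg.characters(str(value))
            xg.endElement(key)
            xg.characters("\n")
        xg.characters(pad)
        xg.endElement(item_tag)
        xg.characters("\n")
    xg.endElement(root)
    xg.endDocument()


# Illustrative usage with an in-memory text stream.
buf = io.StringIO()
export_items(
    [{"name": "0", "content": "00000"}, {"name": "1", "content": "11111"}],
    buf,
)
output = buf.getvalue()
print(output)
```

Because `items` can be any iterable, including a generator, the output size is not limited by available memory. Nested field values would need a recursive variant that tracks the current depth.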
