Convert various blog dumps to a standard JSON

blog-to-json

This tool will convert various blog dumps to a standard JSON format.

For example:

  • WordPress XML dumps to JSON
  • Disqus XML dumps to JSON

It is opinionated and may at times remove data.

install

git clone https://github.com/russellballestrini/blog-to-json.git
cd blog-to-json
python setup.py develop

WordPress XML to JSON

how to use:

wordpress-xml-to-json example.xml

example of schema:

{
  "homegrown-python-bread-crumb-module": {
    "name": "a-homegrown-python-bread-crumb-module",
    "title": "A homegrown python bread crumb module",
    "timestamp": 1293995686,
    "comments": [
      {
        "date": "2011-04-03 10:33:07",
        "timestamp": 1301841187,
        "content": "Hi, this was just what I needed, did a few modifications but basically worked out of the box. Thanks for posting",
        "email": "oops",
        "author": "Kristian",
        "author_ip": "192.168.1.5"
      },
      {
        "date": "2011-04-03 14:19:46",
        "timestamp": 1301854786,
        "content": "I'm interested in the modifications, I just placed the code into bitbucket.  Feel free to branch it.  \n\nI'm also interested in seeing your project that you used it in.  Thanks",
        "email": "oops",
        "author": "Russell Ballestrini",
        "author_ip": "192.168.1.6"
      }
    ],
    "content": "<p><strong>I wrote <a href=\"https://bitbucket.org/russellballestrini/bread/raw/tip/bread.py\">bread.py</a> a few days ago.</strong> <a href=\"https://bitbucket.org/russellballestrini/bread/raw/tip/bread.py\">Bread.py</a> is a simple to use python breadcrumb module. \n</p>\n\n<p>\nThe bread object accepts a url string and grants access to the url crumbs (parts) or url links (list of hrefs to each crumb) .\n</p>\n\n<p>\nI have released <a href=\"https://bitbucket.org/russellballestrini/bread/raw/tip/bread.py\">bread.py</a> into the public domain and you may view the full source code here: <a href=\"https://bitbucket.org/russellballestrini/bread/src\">https://bitbucket.org/russellballestrini/bread/src</a>\n</p>\n\n<p>\n<strong>Update</strong>\n</p>\n\n<p>\nI recently revisited this module and wrote a tutorial on how to <a href=\"http://russell.ballestrini.net/add-a-breadcrumb-subscriber-to-a-pyramid-project-using-4-simple-steps/\">Add a Breadcrumb Subscriber to a Pyramid project using 4 simple steps</a>.\n</p>\n\n<ul>\n<li>Demo of bread.py: <a href=\"http://school.yohdah.com/\">http://school.yohdah.com/</a></li>\n<li>Pyrawiki will use bread.py</li> \n</ul>\n\n<br />\n\n<strong>You should follow me on twitter <a href=\"http://twitter.com/russellbal\" target=\"_blank\">here</a></strong>\n\n<span style=\"font-size: 10px;\">\n<script src=\"https://bitbucket.org/russellballestrini/bread/src/50a1a20fc3f3/bread.py?embed=t\"></script>\n</span>",
    "link": "http://russell.ballestrini.net/a-homegrown-python-bread-crumb-module/",
    "date": "2011-01-02 14:14:46"
  }
}
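
Once the JSON exists, it is easy to consume from Python. The following is a minimal sketch, not part of this tool: it assumes the output shown above has been saved to a file, and the helper names `load_dump` and `summarize_posts` are hypothetical. It also assumes `timestamp` holds Unix epoch seconds, which matches the sample values above.

```python
import json
from datetime import datetime, timezone


def load_dump(path):
    """Read a saved JSON dump from disk (the file path is up to you)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_posts(dump):
    """Return one small summary dict per post in the mapping."""
    summaries = []
    for key, post in dump.items():
        comments = post.get("comments", [])
        # Assumption: "timestamp" is Unix epoch seconds.
        published = datetime.fromtimestamp(post["timestamp"], tz=timezone.utc)
        summaries.append({
            "key": key,
            "title": post.get("title", ""),
            "published_utc": published.isoformat(),
            "comment_authors": [c.get("author", "") for c in comments],
        })
    return summaries
```

For instance, `summarize_posts(load_dump("posts.json"))` would list each post with its title and comment authors, where `posts.json` is a placeholder file name.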

Why?

It's your data, that's why!

I created and used this tool during my Migration from WordPress to Pelican. Others have used this tool to migrate comments from Disqus to Remarkbox.