
blogger2kirby

Python script for moving Blogger blogs (with images and comments) to Kirby CMS.

Blogger allows you to export your blog as a single large XML file. My script takes this file, parses out just the blog posts and comments, and creates a folder and a text file in Markdown format for each post.
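A minimal sketch of the "parses out just the blog posts and comments" step follows. It assumes the export is an Atom feed in which every entry carries a category element whose scheme ends in #kind and whose term ends in #post, #comment, or something else (settings, templates). That layout is an assumption about the export format, so check it against your own file. The sketch uses the standard library's ElementTree to stay self-contained, whereas the script itself uses lxml; entries_by_kind is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed scheme URI that Blogger uses to mark what kind of entry this is.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"


def entries_by_kind(xml_text):
    """Group the feed's <entry> elements into posts and comments.

    Entries of any other kind (settings, templates, ...) are ignored.
    """
    root = ET.fromstring(xml_text)
    result = {"post": [], "comment": []}
    for entry in root.iter(ATOM + "entry"):
        for category in entry.findall(ATOM + "category"):
            if category.get("scheme") != KIND_SCHEME:
                continue  # an ordinary tag, not the kind marker
            kind = category.get("term", "").rsplit("#", 1)[-1]
            if kind in result:
                result[kind].append(entry)
    return result
```

Each returned element can then be queried for its title, content, and published timestamp in the same namespace.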

  • Any images are downloaded and given a unique filename in the post's folder
  • Image links are converted to Kirby's format, for example (image: image01.jpg)
  • Comments are appended to the end of each post, with comment author names/links and timestamps
  • Tags are preserved in the post's metadata
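To illustrate how tags and other metadata can sit alongside the post text, here is a rough sketch of a Kirby content file, in which fields are written as "Name: value" and separated by lines of four hyphens. The field names (Title, Date, Tags, Text) and the helper name kirby_content are assumptions for illustration; the script may use different ones.

```python
def kirby_content(fields):
    """Join (name, value) pairs into the text of a Kirby content file.

    Fields are separated by a line containing only four hyphens.
    A value that itself contains such a line would need escaping,
    which this sketch does not attempt.
    """
    blocks = ["{}: {}".format(name, value) for name, value in fields]
    return "\n\n----\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(kirby_content([
        ("Title", "My first post"),
        ("Date", "2014-11-05"),
        ("Tags", "travel, photos"),
        ("Text", "Hello from Kirby."),
    ]))
```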

The resulting folders can simply be dropped into the content folder of a Kirby-based site, and boom: the blog has moved.
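The image handling described above can be sketched roughly as follows. This is not the script's actual code: it uses a regular expression from the standard library instead of BeautifulSoup to stay self-contained, and the name kirbyfy_images and the imageNN numbering scheme are assumptions modelled on the image01.jpg example.

```python
import os
import re
from urllib.parse import urlparse

# Matches an <img> tag and captures the value of its src attribute.
IMG_TAG = re.compile(r'<img\b[^>]*?\bsrc=["\']([^"\']+)["\'][^>]*>', re.IGNORECASE)


def kirbyfy_images(html):
    """Replace each <img> tag with a Kirby image tag.

    Returns the rewritten HTML and a dict mapping each original image URL
    to the unique local filename it was assigned.
    """
    downloads = {}

    def replace(match):
        url = match.group(1)
        if url not in downloads:
            # Keep the original extension; fall back to .jpg if there is none.
            ext = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"
            downloads[url] = "image{:02d}{}".format(len(downloads) + 1, ext)
        return "(image: {})".format(downloads[url])

    return IMG_TAG.sub(replace, html), downloads
```

Each URL in the returned mapping could then be fetched with requests.get and saved under its assigned filename in the post's folder.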

Requirements

The script is written for Python 3. I couldn't use Python 2 because of problems with Unicode data.

Pandoc (version 1.13.1 or later) is also required for the HTML to Markdown conversion process.
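For orientation, one way to hand HTML to Pandoc from Python is to pipe it through the pandoc command. This is a sketch under the assumption that pandoc is on the PATH; the function name html_to_markdown is hypothetical, and the script's actual invocation and options may differ.

```python
import shutil
import subprocess

# --from and --to select Pandoc's input and output formats.
PANDOC_ARGS = ["pandoc", "--from=html", "--to=markdown"]


def html_to_markdown(html):
    """Convert an HTML fragment to Markdown by piping it through pandoc."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    result = subprocess.run(
        PANDOC_ARGS,
        input=html.encode("utf-8"),  # pandoc reads UTF-8 on stdin
        stdout=subprocess.PIPE,
        check=True,                  # raise if pandoc exits with an error
    )
    return result.stdout.decode("utf-8")
```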

The script also requires these libraries:

  • lxml, for parsing XML
  • BeautifulSoup, for parsing HTML
  • python-rfc3339, for parsing Atom timestamps in RFC 3339 format (https://github.com/tonyg/python-rfc3339). At the time of writing, Python has no native support for parsing strings in this format. There is an open issue for this in Python's bug tracker, and the latest comment on that page identifies this library as the best one for the job.
  • requests, for downloading images over HTTP
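The script relies on python-rfc3339 for timestamps. As a hedged aside, on Python 3.7 or later the standard library's strptime accepts UTC offsets written with a colon (and a trailing Z), so common timestamps can be parsed without an extra dependency. The sketch below is an alternative, not what the script does; parse_rfc3339 is a hypothetical name, and it does not cover every form RFC 3339 allows (for example, more than six fractional digits).

```python
from datetime import datetime


def parse_rfc3339(text):
    """Parse a common RFC 3339 timestamp into a timezone-aware datetime.

    Requires Python 3.7+, where %z accepts offsets such as -08:00 and Z.
    """
    # Try the form with fractional seconds first, then the form without.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not a supported RFC 3339 timestamp: {!r}".format(text))
```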

On my Mac running Yosemite, the simplest way to get all these prerequisites was to install Homebrew, then run the following commands:

brew install python3
pip3 install git+https://github.com/tonyg/python-rfc3339.git
pip3 install lxml
pip3 install beautifulsoup4
pip3 install requests

brew install pandoc
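After installing, a quick check along these lines can confirm that everything is in place. The import names are assumptions worth verifying: beautifulsoup4 is imported as bs4, and python-rfc3339 is assumed here to install a module named rfc3339.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names below are assumptions; adjust them if your installs differ.
    missing = missing_modules(["lxml", "bs4", "rfc3339", "requests"])
    print("Missing modules:", ", ".join(missing) or "none")
    print("pandoc:", shutil.which("pandoc") or "not found on PATH")
```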

Usage

Place your Blogger XML file in the same folder as the script, and name it blog.xml. Then run python3 blogger2kirby.py.

You can also run chmod u+x blogger2kirby.py to make it executable and then just run it as ./blogger2kirby.py, assuming your python3 lives in /usr/local/bin/python3 (if you installed it with Homebrew, that's where it would be).

You will see a lot of messages fly by about the posts being parsed out.

Afterwards there will be a folder named out in the current folder, containing one folder per post, named in the format YYYYMMDD-post-slug. The slug is the filename from the post's original Blogger URI without the .html extension, which makes redirects from the old URLs easy to set up.
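The naming scheme can be sketched like this. The function name post_folder_name and the example URL are hypothetical, and the script's own implementation may differ in detail.

```python
import posixpath
from datetime import datetime
from urllib.parse import urlparse


def post_folder_name(published, post_url):
    """Build a YYYYMMDD-post-slug folder name from a date and a Blogger URL."""
    filename = posixpath.basename(urlparse(post_url).path)
    # Drop the .html extension so the slug matches the old URL's filename.
    slug = filename[:-len(".html")] if filename.endswith(".html") else filename
    return "{}-{}".format(published.strftime("%Y%m%d"), slug)


if __name__ == "__main__":
    print(post_folder_name(
        datetime(2014, 11, 5, 10, 23),
        "http://example.blogspot.com/2014/11/my-first-post.html",
    ))
```

Because the slug is taken straight from the old URL, a redirect rule only needs to map the old filename to the new folder.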

Acknowledgements

This is my first Python script so I'm sure it's very rough in places.

I had googled and stack-overflowed my way about halfway through it when I came across this gist by Lars Kellogg-Stedman: https://gist.github.com/larsks/4022537. His is much better written, but its output is formatted for a different blogging platform, and it doesn't download images or attempt to retain comments. I adopted one of the markdownify functions from that script, and it was his code that put me on to using lxml instead of the included ElementTree library.

This post was very helpful in understanding the Unicode string processing problems I was encountering in Python 2: Solving Unicode Problems in Python 2.7