Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
A script to convert Wordpress XML dump to markdown files
Python
branch: master

This branch is 14 commits behind dreikanter:master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
lib
.editorconfig
.gitignore
LICENSE
README.md
TODO
setup.py
test-export.bat
wp2md.py

README.md

Wordpress to Markdown Exporter

A python script to convert Wordpress XML dump to a set of plain text/markdown files. Intended to be used for migration from Wordpress to public-static website generator, but could also be helpful as general purpose Wordpress content processor.

Installation

The script could be installed by command:

pip install -e git+git://github.com/dreikanter/wp2md#egg=wp2md

It will install wp2md and the following dependencies:

Usage

Export Wordpress data to XML file (Tools → Export → All content):

Wordpress content export

And then run the following command:

python wp2md.py -d /export/path/ wordpress-dump.xml

Where /export/path/ is the directory where post and page files will be generated, and wordpress-dump.xml is the XML file exported by Wordpress.

Use --help parameter to see the complete list of command line options:

usage: wp2md.py [options] source

Export Wordpress XML dump to markdown files

positional arguments:
  source      source XML dump exported from Wordpress

optional arguments:
  -h, --help  show this help message and exit
  -v          verbose logging
  -l FILE     log to file
  -d PATH     destination path for generated files
  -u FMT      <pubDate> date/time parsing format
  -o FMT      <wp:post_date> and <wp:post_date_gmt> parsing format
  -f FMT      date/time fields format for exported data
  -p FMT      date prefix format for generated files
  -m          preprocess content with Markdown (helpful for MD input)
  -n LEN      post name (slug) length limit for file naming
  -r          generate reference links instead of inline
  -ps PATH    post files path (see docs for variable names)
  -pg PATH    page files path
  -dr PATH    draft files path
  -url        keep absolute URLs in hrefs and image srcs
  -b URL      base URL to subtract from hrefs (default is the root)

The output

The script generates a separate file for each post, page and draft, and groups it by configurable directory structure. By default posts are grouped by year-named directories and pages are just stored to the output folder.

Exported files

But you could specify different directory structure and file naming pattern using -ps, -pg and -dr parameters for posts, pages and drafts respectively. For example -ps {year}/{month}/{day}/{title}.md will produce date-based subfolders for blog posts.

Each exported file has a straightforward structure intended for further processing with public-static website generator. It has an INI-like formatted header followed by markdown-formatted post (or page) contents:

title: Я.Субботник в Санкт-Петербурге, 3 декабря
link: http://paradigm.ru/yandex-subbotni
creator: admin
description: 
post_id: 635
post_date: 2011-11-23 22:10:35
post_date_gmt: 2011-11-23 19:10:35
comment_status: open
post_name: yandex-subbotnik
status: publish
post_type: post

# Я.Субботник в Санкт-Петербурге, 3 декабря

Я.Субботник в Санкт-Петербурге пройдет 3 декабря в [офисе Яндекса](http://company.yandex.ru/contacts/spb/).
...

If the post contains comments, they will be included below.

See also

Something went wrong with that request. Please try again.