A script to convert Wordpress XML dump to markdown files

WordPress to Markdown Exporter

Update: I don't have much time to maintain this project, but I would really appreciate community help. If you are looking for an open source project to contribute to, this is a great opportunity. Pull requests are much appreciated, both by me and by migrating WordPress users.

A Python script that converts a WordPress XML dump into a set of plain text/markdown files. It is intended for migrating from WordPress to the public-static website generator, but it can also serve as a general-purpose WordPress content processor.

Installation

The script can be installed with the following command:

pip install git+https://github.com/dreikanter/wp2md

This installs wp2md together with its dependencies.

Usage

Export the WordPress data to an XML file (Tools → Export → All content):

[Screenshot: WordPress content export]

Then run the following command:

wp2md -d /export/path/ wordpress-dump.xml

Here /export/path/ is the directory where post and page files will be generated, and wordpress-dump.xml is the XML file exported by WordPress.

Use the --help option to see the complete list of command line options:

usage: wp2md [options] source

Export WordPress XML dump to markdown files

positional arguments:
  source      source XML dump exported from WordPress

optional arguments:
  -h, --help  show this help message and exit
  -v          verbose logging
  -l FILE     log to file
  -d PATH     destination path for generated files
  -u FMT      <pubDate> date/time parsing format
  -o FMT      <wp:post_date> and <wp:post_date_gmt> parsing format
  -f FMT      date/time fields format for exported data
  -p FMT      date prefix format for generated files
  -m          preprocess content with Markdown (helpful for MD input)
  -n LEN      post name (slug) length limit for file naming
  -r          generate reference links instead of inline
  -ps PATH    post files path (see docs for variable names)
  -pg PATH    page files path
  -dr PATH    draft files path
  -url        keep absolute URLs in hrefs and image srcs
  -b URL      base URL to subtract from hrefs (default is the root)
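The FMT arguments of -u, -o, -f and -p look like date format strings. The README does not say which syntax they use, so the following is only a sketch, assuming they follow the directives of Python's datetime.strptime and strftime. The format strings and sample values below are illustrative assumptions, not the script's documented defaults.

```python
from datetime import datetime

# Assumed formats for illustration; the script's real defaults may differ.
PUBDATE_FMT = "%a, %d %b %Y %H:%M:%S %z"  # shape of a typical <pubDate> value
POST_DATE_FMT = "%Y-%m-%d %H:%M:%S"       # shape of a typical <wp:post_date> value
PREFIX_FMT = "%Y%m%d"                     # hypothetical date prefix for file names


def parse_date(value, fmt):
    """Parse a date string from the XML dump using the given format."""
    return datetime.strptime(value, fmt)


pub = parse_date("Wed, 23 Nov 2011 19:10:35 +0000", PUBDATE_FMT)
post = parse_date("2011-11-23 22:10:35", POST_DATE_FMT)

# Format the parsed date the way a file name prefix might be built.
prefix = post.strftime(PREFIX_FMT)
```

If a dump uses a different date layout, passing a matching format string to the corresponding option should be enough, provided the assumption about strptime syntax holds.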

The output

The script generates a separate file for each post, page and draft, and groups them into a configurable directory structure. By default, posts are grouped into year-named directories and pages are stored directly in the output folder.

[Screenshot: exported files]

You can specify a different directory structure and file naming pattern using the -ps, -pg and -dr parameters for posts, pages and drafts respectively. For example, -ps {year}/{month}/{day}/{title}.md will produce date-based subfolders for blog posts.
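To make the pattern above concrete, here is a minimal sketch of how such a path template could be expanded. It assumes the placeholders behave like Python str.format fields. The build_path helper is hypothetical and is not taken from the wp2md source, and whether {title} holds the post title or its slug is also an assumption.

```python
from datetime import datetime


def build_path(pattern, title, date):
    """Expand a path pattern such as '{year}/{month}/{day}/{title}.md'.

    Hypothetical helper: the real script may use other variable names
    or a different substitution mechanism.
    """
    fields = {
        "year": date.strftime("%Y"),
        "month": date.strftime("%m"),  # zero-padded month
        "day": date.strftime("%d"),    # zero-padded day
        "title": title,
    }
    return pattern.format(**fields)


path = build_path(
    "{year}/{month}/{day}/{title}.md",
    "yandex-subbotnik",
    datetime(2011, 11, 23, 22, 10, 35),
)
```

With these inputs the post would land in a year/month/day folder hierarchy under the destination directory.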

Each exported file has a straightforward structure intended for further processing with the public-static website generator: an INI-like header followed by the markdown-formatted post (or page) contents:

title: Я.Субботник в Санкт-Петербурге, 3 декабря
link: http://paradigm.ru/yandex-subbotni
creator: admin
description: 
post_id: 635
post_date: 2011-11-23 22:10:35
post_date_gmt: 2011-11-23 19:10:35
comment_status: open
post_name: yandex-subbotnik
status: publish
post_type: post

# Я.Субботник в Санкт-Петербурге, 3 декабря

Я.Субботник в Санкт-Петербурге пройдет 3 декабря в [офисе Яндекса](http://company.yandex.ru/contacts/spb/).
...

If the post contains comments, they will be included below.
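For further processing, an exported file can be split back into its header fields and markdown body. The sketch below assumes that the header ends at the first blank line and that every header line has the form "key: value" on a single line, as in the example above. The parse_exported function is a hypothetical helper, not part of wp2md, and the sample text is made up for illustration.

```python
def parse_exported(text):
    """Split an exported file into (header dict, markdown body).

    Assumes the header is terminated by the first blank line and that
    no header value spans multiple lines.
    """
    header_part, _, body = text.partition("\n\n")
    header = {}
    for line in header_part.splitlines():
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header, body.lstrip("\n")


sample = (
    "title: Example post\n"
    "link: http://example.com/example-post\n"
    "description: \n"
    "post_id: 635\n"
    "\n"
    "# Example post\n"
    "\n"
    "Body text.\n"
)

header, body = parse_exported(sample)
```

All values come back as strings, so fields such as post_id or post_date need to be converted by the caller if typed values are required.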

Copyright and licensing

Copyright © 2013 by Alex Musayev.
License: GNU (see LICENSE).

Project home: https://github.com/dreikanter/wp2md.
