Skip to content

palaniraja/blog2md

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 

Blogger to Markdown

Convert Blogger & WordPress backup blog posts to hugo compatible markdown documents

Usage: node index.js b|w <BLOGGER BACKUP XML> <OUTPUT DIR>

For Blogger imports, blog posts and comments (as seperate file <postname>-comments.md) will be created in "out" directory

node index.js b your-blogger-backup-export.xml out

For WordPress imports, blog posts and comments (as seperate file <postname>-comments.md) will be created in "out" directory

node index.js w your-wordpress-backup-export.xml out

If you want the comments to be merged in your post file itself. you can use flag m at the end. Defaults to s for seperate comments file

node index.js w your-wordpress-backup-export.xml out m

If converting from WordPress, and you have posts that do not contain HTML, you can use a paragraph-fix flag at the end.

node index.js w your-wordpress-backup-export.xml out m paragraph-fix

Installation (usual node project)

  • Download or Clone this project
  • cd to directory
  • Run npm install to install dependencies
  • Run node index.js <arg...>

Notes to self

Script to convert posts from Blogger to Markdown.

  • Read XML
  • Parse Entries (Posts and comments) (with xpath?)
  • Parse Title, Link, Created, Updated, Content, Link
  • List Post & Respective comment counts
  • Content to MD - pandoc?
  • Parse Images, Files, Videos linked to the posts
  • Create output dir
  • List items that are not downloaded( or can't) along with their .md file for user to proceed

Reasons

  • Wrote this to consolidate and convert my blogs under one roof.
  • Plain simple workflow with hugo
  • Ideas was to download associated assets (images/files) linked to post. Gave up, because it was time consuming and anyhow I need to validate the markdown with assets of converted. And I don't see benefit.
  • Initial assumption was to parse with xpath but I found xml2json.js was easier
  • Also thought pandoc is a overkill and turndown.js was successful, though I had to wrap empty text to md instead of html.
  • I want to retain comments. Believe it or not, There were some good comments.
  • Was sick and spent around ~12 hrs over 5 days in coding and testing with my blog contents over ~150 posts. And also, I find parsing oddly satisfying when it result in success. ¯\_(ツ)_/¯

About

Convert Blogger & Wordpress backup blog posts to hugo compatible markdown documents

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published