rrrene/word-to-markdown


Word to Markdown converter

A Ruby gem to liberate content from the jail that is Word documents

The problem

Link to blog post here

Demo

Getting HTML content out of Microsoft Word

  1. Open the file in Microsoft Word
  2. Select "File" -> "Save as Web Page"
  3. Click "Save"

Usage

require "word-to-markdown"

doc = WordToMarkdown.new("/path/to/export.htm")
=> <WordToMarkdown path="/path/to/export.htm">

doc.to_s  # the converted Markdown
=> "# Test\n\n This is a test"

doc.html
=> "<html>\n\n<head>..."

doc.doc
=> <Nokogiri Document>
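Because `to_s` returns a plain string, converting a whole folder of exports only needs a loop around the calls above. The sketch below is a hypothetical helper, `convert_directory`, that is not part of the gem. It assumes your exports sit in one directory with `.htm` or `.html` extensions, and it takes the conversion step as a block so it can be tried out without the gem installed.

```ruby
require "fileutils"

# Hypothetical helper (not part of the gem): converts every *.htm / *.html
# file in src_dir and writes a matching .md file into out_dir.
# The conversion itself is supplied as a block, for example:
#   convert_directory("exports", "markdown") { |path| WordToMarkdown.new(path).to_s }
# Returns the list of written .md paths.
def convert_directory(src_dir, out_dir)
  FileUtils.mkdir_p(out_dir)
  Dir.glob(File.join(src_dir, "*.{htm,html}")).sort.map do |path|
    markdown = yield(path)
    # "report.htm" -> "report.md"
    target = File.join(out_dir, File.basename(path, File.extname(path)) + ".md")
    File.write(target, markdown)
    target
  end
end
```

Passing the converter in as a block keeps the file handling separate from the gem, so you can swap in a different post-processing step without touching the loop.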

Supports

  • Paragraphs
  • Numbered lists
  • Unnumbered lists
  • Italic
  • Bold
  • Explicit headings (e.g., selected as "Heading 1" or "Heading 2")
  • Implicit headings (e.g., text with a larger font size relative to paragraph text)
  • Images
  • Tables
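To make the "implicit headings" item concrete, here is a simplified, self-contained illustration of the idea. It is not the gem's implementation: `Paragraph`, `body_font_size`, and `implicit_headings_to_markdown` are hypothetical names, and the rule used here (the most common font size is body text, and each larger size is ranked into a heading level, largest first) is an assumption made for the example.

```ruby
# Hypothetical model of a paragraph pulled out of Word's HTML export.
Paragraph = Struct.new(:text, :font_size)

# Assumption: the most frequent font size in the document is the body-text size.
def body_font_size(paragraphs)
  paragraphs.group_by(&:font_size).max_by { |_size, group| group.size }.first
end

# Ranks every font size larger than the body size (largest first) and maps
# rank 0 to "#", rank 1 to "##", and so on, capped at six levels.
# Paragraphs at or below the body size are emitted as plain text.
def implicit_headings_to_markdown(paragraphs)
  body = body_font_size(paragraphs)
  heading_sizes = paragraphs.map(&:font_size).select { |size| size > body }.uniq.sort.reverse
  paragraphs.map do |para|
    rank = heading_sizes.index(para.font_size)
    rank ? "#{'#' * [rank + 1, 6].min} #{para.text}" : para.text
  end.join("\n\n")
end
```

With three 11pt paragraphs, one 24pt title, and one 16pt section label, the 24pt text becomes `# Title` and the 16pt text becomes `## Section`; text smaller than the body size is left alone.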

Future Support

  • Nested lists

Testing

script/cibuild

Server

The development version of the gem contains a lightweight server for converting Word documents as a service.

To run the server, run script/server and open localhost:9292 in your browser. The server can also be run on Heroku.

A live version runs at word-to-markdown.herokuapp.com.

You can also use it as an API: POST raw HTML to /raw, and the response contains the converted Markdown.
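A minimal client for the /raw endpoint can be written with Ruby's standard `net/http`. This is a sketch under assumptions: `build_raw_request` and `convert_remote` are hypothetical names, the base URL defaults to the local server on port 9292, and it assumes /raw reads the HTML from the raw request body with a `text/html` content type. Check the server code before relying on that, since it might instead expect a form field.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds the POST request for the /raw endpoint.
# Assumption: the server reads the HTML straight from the request body.
def build_raw_request(html, base_url: "http://localhost:9292")
  uri = URI.join(base_url, "/raw")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "text/html; charset=utf-8"
  request.body = html
  [uri, request]
end

# Hypothetical helper: sends the request and returns the Markdown body.
def convert_remote(html, base_url: "http://localhost:9292")
  uri, request = build_raw_request(html, base_url: base_url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.value # raises an error unless the response is 2xx
  response.body
end
```

Splitting request construction from sending makes it easy to inspect exactly what would be posted before pointing the client at a running server.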
