At The Marshall Project, stories are edited in Google Docs. I wrote a quick tool to convert the HTML export from a Google Doc to Markdown. (Internally, our stories are stored as Markdown.) Turns out, parsing CSS with regexes is not a great idea. This gem is the next iteration.
Here's the strategy:
- Inline the CSS (for example, `font-style: italic;`) based on the `.c01` (etc.) classes.
- Parse the inline styles into a hash of CSS properties.
- Wrap each `<span>` in a bold tag, an `<em>`, or both, based on the CSS properties on it. A single `<span>` may get wrapped multiple times if the text is both bold and italic, for example. Then remove all the `<span>`s.
- Pass this cleaned HTML to `kramdown` to yield Markdown.
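The middle two steps can be sketched in plain Ruby. This is a minimal illustration, not the gem's actual code: `parse_inline_style` and `wrap_text` are hypothetical helper names, the use of `<strong>` for bold is an assumption, and the naive `split`-based parsing only copes with simple declarations like the ones Google Docs emits for bold and italic.

```ruby
# Hypothetical helpers for illustration; these are not part of the gem's API.

# Turn an inline style string such as "font-weight: bold; font-style: italic"
# into a hash of property => value. Deliberately simplistic: it does not
# handle values that themselves contain ";" (for example data URLs).
def parse_inline_style(style)
  style.to_s.split(';').each_with_object({}) do |declaration, props|
    name, value = declaration.split(':', 2)
    next if name.nil? || value.nil?

    props[name.strip.downcase] = value.strip.downcase
  end
end

# Wrap text in <em> and/or <strong> depending on the parsed properties.
# Assumption: bold is rendered as <strong>, and "700" counts as bold.
def wrap_text(text, props)
  wrapped = text
  wrapped = "<em>#{wrapped}</em>" if props['font-style'] == 'italic'
  wrapped = "<strong>#{wrapped}</strong>" if %w[bold 700].include?(props['font-weight'])
  wrapped
end

props = parse_inline_style('font-weight: bold; font-style: italic; color: #000')
puts wrap_text('hello', props)
# prints: <strong><em>hello</em></strong>
```

Text that is both bold and italic is wrapped twice, once per matching property, which mirrors the "wrapped multiple times" behavior described above.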
Add this line to your application's Gemfile:

    gem 'googledoc_markdown', github: 'ivarvong/googledoc_markdown', tag: 'v0.1.1'

And then execute `bundle install`.
This gem is not stable and probably shouldn't be used yet. The spec might be useful.

    require 'googledoc_markdown'
    converter = GoogledocMarkdown::Converter.new(html: your_google_doc_html)
    markdown = converter.to_markdown

After checking out the repo, run `bin/setup` to install dependencies. Then, run `guard` to run the tests.
Bug reports and pull requests are welcome on GitHub at https://github.com/ivarvong/googledoc_markdown.
The gem is available as open source under the terms of the MIT License.