Skip to content
Convert HTML/Word DOCX into Asciidoctor syntax
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
bin
lib
spec
.hound.yml
.rubocop.yml
.travis.yml
Gemfile
LICENSE.txt
README.adoc
Rakefile
appveyor.yml
reverse_asciidoctor.gemspec

README.adoc

reverse_asciidoctor

Gem Version Build Status Code Climate Appveyor Build Status

Transforms HTML into asciidoctor.

Requirements

  1. Nokogiri

  2. Ruby 1.9.3 or higher

Installation

Install the gem

[sudo] gem install reverse_asciidoctor

or add it to your Gemfile

gem 'reverse_asciidoctor'

Usage

Ruby

You can convert html content as string or Nokogiri document:

input  = '<strong>feelings</strong>'
result = ReverseAsciidoctor.convert input
result.inspect # " *feelings* "

Commandline

It’s also possible to convert html files to markdown using the binary:

$ bin/reverse_asciidoctor file.html > file.adoc
$ cat file.html | bin/reverse_asciidoctor > file.adoc

In addition, the bin/w2a script (adapted from the bin/w2m script in Ben Balter’s word-to-markdown) extracts HTML from Word docx documents, and converts it to Asciidoc.

$ bundle exec bin/w2a document.docx > document.adoc

The script presumes that LibreOffice has already been installed: it uses LibreOffice’s export to XHTML. LibreOffice’s export of XHTML is superior to the native Microsoft Word export to HTML: it exports lists (which Word keeps as paragraphs), and it exports OOMML into MathML. On the other hand, the LibreOffice export relies on default styling being used in the document, and it may not cope with ordered lists or headings with customised appearance. For best results, reset the styles in the document you’re converting to those in the default Normal.dot template.

If you wish to convert the MathML in the document to AsciiMath, run the script with the --mathml2asciimath option:

$ bundle exec bin/w2a --mathml2asciimath document.docx > document.adoc

Note that some information in OOMML is not preserved in the export to MathML from LibreOffice; in particular, font shifts such as double-struck fonts.

Note that the LibreOffice exporter does seem to drop some text (possibly associated with MathML); use with caution.

Features

As a port of reverse_markdown, reverse_asciidoctor shares its features:

  • Module based - if you miss a tag, just add it

  • Can deal with nested lists

  • Inline and block code is supported

  • Supports blockquote

It supports the following html tags supported by reverse_markdown:

  • a

  • blockquote

  • br

  • code, tt (added: kbd, samp, var)

  • div, article

  • em, i (added: cite)

  • h1, h2, h3, h4, h5, h6, hr

  • img

  • li, ol, ul (added: dir)

  • p, pre

  • strong, b

  • table, td, th, tr

Note
  • reverse_asciidoctor does not support del or strike, because Asciidoctor out of the box does not

  • As with reverse_markdown, pre is only treated as sourcecode if it is contained in a div@class = highlight- element, or has a @brush attribute naming the language (Confluence).

  • The gem does not support p@align, because Asciidoctor doesn’t

In addition, it supports:

  • aside

  • audio, video (with @src attributes)

  • figure, figcaption

  • mark

  • q

  • sub, sup

  • @id anchors

  • blockquote@cite

  • img/@width, img/@height

  • ol/@style, ol/@start, ol/@reversed, ul/@type

  • td/@colspan, td/@rowspan, td@/align, td@/valign

  • table/caption, table/@width, table/@frame (partial), table/@rules (partial)

  • Lists and paragraphs within cells

    • Not tables within cells: Asciidoctor cannot deal with nested tabls

It also supports MathML…​ sort of.

  • Asciidoctor supports AsciiMath and LaTeX for stem expressions. HTML uses MathML. The gem will recognise MathML expressions in HTML, and will wrap them in Asciidoctor \$ \$ macros. The result of this gem is not actually legal Asciidoctor for stem: Asciidoctor will presumably think this is AsciiMath in the \$ \$ macro, try to pass it into MathJax as AsciiMath, and fail. But of course, MathJax has no problem with MathML, and some postprocessing on the Asciidoctor output can ensure that the MathML is treated by MathJax (or whatever else uses the output) as such; so this is still much better than nothing for stem processing.

    • On the other hand, if you are using this gem in the context of Metanorma, Metanorma Asciidoctor accepts MathML as a native mathematical format. So you do not need to convert the MathML to AsciiMath.

    • The gem will optionally invoke the https://github.com/metanorma/mathml2asciimath gem, to convert MathML to AsciiMath. The conversion is not perfect, and will need to be post-edited; but it’s a lot better than nothing.

The gem does not support:

  • col, colgroup

  • source, picture

  • bdi, bdo, ruby, rt, rp, wbr

  • frame, frameset, iframe, noframes, noscript, script, input, output, progress

  • map, canvas, dialog, embed, object, param, svg, track

  • fieldset, button, datalist, form, label, legend, menu, menulist, optgroup, option, select, textarea

  • big, dfn, font, s, small, span, strike, u

  • center

  • data, meter

  • del, ins

  • footer, header, main, nav, details, section, summary, template

Configuration

The following options are available:

  • unknown_tags (default pass_through) - how to handle unknown tags. Valid options are:

    • pass_through - Include the unknown tag completely into the result

    • drop - Drop the unknown tag and its content

    • bypass - Ignore the unknown tag but try to convert its content

    • raise - Raise an error to let you know

  • tag_border (default ' ') - how to handle tag borders. valid options are:

    • ' ' - Add whitespace if there is none at tag borders.

    • '' - Do not not add whitespace.

  • mathml2asciimath - if true, will use the https://github.com/metanorma/mathml2asciimath gem to convert MathML to AsciiMath

As options

Just pass your chosen configuration options in after the input. The given options will last for this operation only.

ReverseAsciidoctor.convert(input, unknown_tags: :raise, mathml2asciimath: true)

Preconfigure

Or configure it block style on a initializer level. These configurations will last for all conversions until they are set to something different.

ReverseAsciidoctor.config do |config|
  config.unknown_tags     = :bypass
  config.mathml2asciimath  = true
  config.tag_border  = ''
end
You can’t perform that action at this time.