Conversion rules

Rand McKinney edited this page Aug 15, 2016 · 10 revisions

Overview

Converting Confluence HTML to Jekyll Markdown is essentially a series of "search and replace" operations applied to every Confluence HTML file. The output for each file is a Markdown file.

File names

In the Confluence export, file names are of the form Text-name_number.html, where the number is the Confluence revision number. Since we don't care about the revision number, strip the underscore and the number from the file name.

So, for example:

Connecting-to-MySQL_9634182.html

Would be converted to

Connecting-to-MySQL.md

NOTE: Some articles whose titles include special characters (usually a colon) have ONLY a number as the file name, e.g. 9634118.html. We can either handle these manually, or extract a text file name from the article title (see below).
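
As a rough illustration of the renaming rule above, here is a minimal Python sketch. The function name markdown_file_name is hypothetical, and the sketch assumes that number-only names (such as 9634118.html) are simply given a .md extension and left for manual handling.

```python
import re


def markdown_file_name(html_name: str) -> str:
    """Drop a trailing _<revision number> and swap .html for .md."""
    stem = re.sub(r"\.html$", "", html_name)
    # Only strips "_digits" at the very end, so a number-only name is untouched.
    stem = re.sub(r"_\d+$", "", stem)
    return stem + ".md"


print(markdown_file_name("Connecting-to-MySQL_9634182.html"))
```

Number-only file names pass through with just the extension changed, so they remain easy to spot and rename later.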

Article title

Article title is in this block:

<h1 id="title-heading" class="pagetitle">
  <span id="title-text">... Article title ... </span>
</h1>

Use the contents of the <span id="title-text"> tag as the value for the title property in the article front-matter.

NOTE: If the title includes a colon character (:), Jekyll requires the title property to be quoted. In the Confluence export, these articles will have file names that are numbers instead of text.
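
One possible way to pull out the title and apply the quoting rule is sketched below in Python. The helper names extract_title and title_property are hypothetical, and the regular expression assumes the span appears exactly as shown above, with no nested tags inside it.

```python
import re

TITLE_RE = re.compile(r'<span id="title-text">(.*?)</span>', re.DOTALL)


def extract_title(page: str) -> str:
    """Return the text inside <span id="title-text">, whitespace-normalized."""
    match = TITLE_RE.search(page)
    if match is None:
        raise ValueError("no title-text span found")
    return " ".join(match.group(1).split())


def title_property(title: str) -> str:
    """Build the front-matter title line, quoting titles that contain a colon."""
    if ":" in title:
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        return f'title: "{escaped}"'
    return f"title: {title}"
```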

Front matter

Every markdown file must start with some Jekyll front-matter that looks like this:

---
title: The article title goes here
lang: en
layout: page
keywords: LoopBack
tags:
sidebar: lb2_sidebar
permalink: /doc/en/lb2/The-file-name-goes-here.html
summary:
---

NOTE: The three dashes before and after front-matter are required.

In general, we don't have a consistent summary for every article, so we'll leave the summary property blank. Confluence export apparently does not include "labels" data, so we'll also leave the tags property blank. This seems pretty lame on the part of Confluence (Atlassian).
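
The front-matter template above could be generated along these lines. This is only a sketch: the function name front_matter and its two parameters are hypothetical, the fixed values are copied from the template, and titles containing a colon are quoted as Jekyll requires.

```python
def front_matter(title: str, file_stem: str) -> str:
    """Return the Jekyll front-matter block for one article."""
    if ":" in title:
        # Jekyll requires quoting when the title contains a colon.
        title = '"' + title.replace('"', '\\"') + '"'
    lines = [
        "---",
        f"title: {title}",
        "lang: en",
        "layout: page",
        "keywords: LoopBack",
        "tags:",  # left blank: the export has no labels data
        "sidebar: lb2_sidebar",
        f"permalink: /doc/en/lb2/{file_stem}.html",
        "summary:",  # left blank: no consistent summary exists
        "---",
    ]
    return "\n".join(lines) + "\n"


print(front_matter("Connecting to MySQL", "Connecting-to-MySQL"))
```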

Article content

The actual article content is in:

... Content here ...
</div>

Everything above and below this, i.e. outside of this tag, can be discarded.

HTML to discard

Some pages may have these, which should just be discarded.

Injected CSS

Discard all of this:

- Any <span>...</span> tags: strip them out, but keep what's inside the tags.
- Injected CSS: <style type='text/css'>/*<![CDATA[*/ .... /*]]>*/</style>
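
The two discard rules above might be implemented roughly as follows. The function name discard_noise is hypothetical, and regular expressions are only a reasonable fit here because the export is machine-generated and fairly uniform.

```python
import re

STYLE_RE = re.compile(r"<style\b[^>]*>.*?</style>", re.DOTALL | re.IGNORECASE)
SPAN_TAG_RE = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def discard_noise(page: str) -> str:
    """Remove injected <style> blocks and unwrap <span> tags."""
    page = STYLE_RE.sub("", page)  # drop the style element and its contents
    return SPAN_TAG_RE.sub("", page)  # drop only the span tags, keep the text
```

Note that unwrapping spans also affects the title span and the icon spans inside the Notes, Warnings, and Tips markup, so in a full pipeline this step would probably need to run after those have been processed.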

Confluence-generated TOC

Since our Jekyll theme has its own automatically generated TOCs (http://idratherbewriting.com/documentation-theme-jekyll/mydoc_pages.html#automatic-mini-tocs), we should discard this HTML (which occurs only in some pages):

...

The class selector rbtoc1470354523244 varies by page.

Links

We need to rewrite links whose href destination URL begins with https://docs.strongloop.com/display/APIC/ so that they point to the corresponding new page instead of the old one. All other links should be left "as is".

Convert

<a href="https://docs.strongloop.com/display/APIC/Creating+model+relations" rel="nofollow">
 Creating model relations
</a>

To

[Creating model relations](/doc/{{page.lang}}/lb2/Creating-model-relations.html)

Note the inclusion of the language page property for localization.
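
A hedged Python sketch of this link rewrite follows. The function name convert_links is hypothetical, and the mapping of + to - in the page name is inferred from the single example above rather than from a stated rule. Links that carry a fragment or query string are deliberately left untouched by this sketch.

```python
import re

OLD_PREFIX = "https://docs.strongloop.com/display/APIC/"
LINK_RE = re.compile(
    r'<a\s+href="' + re.escape(OLD_PREFIX) + r'([^"#?]+)"[^>]*>(.*?)</a>',
    re.DOTALL,
)


def convert_links(page: str) -> str:
    """Rewrite links to the old docs site as Markdown links to the new pages."""

    def replace(match: re.Match) -> str:
        name = match.group(1).replace("+", "-")  # assumption: "+" becomes "-"
        text = " ".join(match.group(2).split())
        # Doubled braces emit a literal {{page.lang}} for Liquid to resolve.
        return f"[{text}](/doc/{{{{page.lang}}}}/lb2/{name}.html)"

    return LINK_RE.sub(replace, page)
```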

Headings

Convert headings as follows:

Confluence HTML    Markdown
<h2> .. </h2>      ##
<h3> .. </h3>      ###
<h4> .. </h4>      ####
<h5> .. </h5>      #####
<h6> .. </h6>      ######
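
The heading mapping can be sketched in Python as below. The function name convert_headings is hypothetical, and the sketch assumes heading text contains no nested tags that need their own conversion.

```python
import re

HEADING_RE = re.compile(r"<h([2-6])\b[^>]*>(.*?)</h\1>", re.DOTALL | re.IGNORECASE)


def convert_headings(page: str) -> str:
    """Turn <h2>..<h6> elements into Markdown headings of the same level."""

    def replace(match: re.Match) -> str:
        level = int(match.group(1))
        text = " ".join(match.group(2).split())
        return f"\n{'#' * level} {text}\n"

    return HEADING_RE.sub(replace, page)
```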

Images

We'll copy all the image files (.png files, etc.) into the /images folder. It's not clear that it would be helpful to separate the LB2 images from LB3, etc. It might be easier just to keep all the image files in the same place.

Convert image tags to the Jekyll template image include.

So, for example, this HTML:

<img class="confluence-embedded-image" height="388" width="700" src="attachments/9634213/9830499.png" data-image-src="attachments/9634213/9830499.png">

Converts to:

{% include image.html file="9830499.png" alt="" %}

NOTE: Most of the image content won't have an alt attribute, but let's add the attribute to the Jekyll include, to make it easier to add it later.
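
One way to perform the image conversion is sketched here. The function name convert_images is hypothetical, and the sketch assumes attribute values are always double-quoted, as in the example above. Only the base file name of src is kept, since all images are copied into a single folder.

```python
import posixpath
import re

IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ATTR_RE = re.compile(r'\b([\w-]+)="([^"]*)"')


def convert_images(page: str) -> str:
    """Replace <img> tags with the Jekyll image include."""

    def replace(match: re.Match) -> str:
        attrs = dict(ATTR_RE.findall(match.group(0)))
        if "src" not in attrs:
            return match.group(0)  # leave unexpected markup alone
        file_name = posixpath.basename(attrs["src"])
        alt = attrs.get("alt", "")  # usually empty; kept so it can be filled in later
        return f'{{% include image.html file="{file_name}" alt="{alt}" %}}'

    return IMG_RE.sub(replace, page)
```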

Other conversions

Other standard HTML -> Markdown conversion applies; for example, <p> ... </p> just becomes a block of text preceded by a blank line.

Code blocks

JavaScript code blocks like this:

<pre class="theme: Emacs; brush: jscript; gutter: false" style="font-size:12px;">
... Code here ...
</pre>

Convert to:

```js
... Code here ...
```
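
A possible implementation of this rule is sketched below. The function name convert_js_blocks is hypothetical. The sketch assumes the export HTML-escapes characters such as < inside the pre element, so it unescapes them. The fence string is built at run time only to keep literal triple backticks out of the example.

```python
import html
import re

FENCE = "`" * 3  # three backticks
PRE_RE = re.compile(
    r"<pre\b[^>]*\bbrush:\s*jscript\b[^>]*>(.*?)</pre>",
    re.DOTALL,
)


def convert_js_blocks(page: str) -> str:
    """Convert Confluence JavaScript <pre> blocks to fenced js code blocks."""

    def replace(match: re.Match) -> str:
        code = html.unescape(match.group(1)).strip("\n")
        return f"\n{FENCE}js\n{code}\n{FENCE}\n"

    return PRE_RE.sub(replace, page)
```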

Notes, Warnings, Tips, etc.

These Confluence macros convert to Jekyll alerts. Unfortunately, the names are confusingly different.

We use the Confluence macro names here, but what really matters is the div class attribute.

Information

    <div class="aui-message hint shadowed information-macro">
      <span class="aui-icon icon-hint">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include note.html content="... Text content ..." %}

Tip

    <div class="aui-message success shadowed information-macro">
      <span class="aui-icon icon-success">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include tip.html content="... Text content ..." %}

Warning

    <div class="aui-message problem shadowed information-macro">
      <span class="aui-icon icon-problem">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include warning.html content="... Text content ..." %}

Note

    <div class="aui-message warning shadowed information-macro">
      <span class="aui-icon icon-warning">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include important.html content="... Text content ..." %}
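
The four mappings above can be captured in a single lookup table keyed on the div class, as in this sketch. The names ALERT_INCLUDES and convert_alerts are hypothetical. The pattern assumes the markup looks exactly like the examples above and that the message content has no nested div. Replacing double quotes with &quot; is this sketch's own choice for keeping the include parameter well-formed.

```python
import re

# Confluence div class -> Jekyll alert include
ALERT_INCLUDES = {
    "hint": "note.html",  # Confluence "Information"
    "success": "tip.html",  # Confluence "Tip"
    "problem": "warning.html",  # Confluence "Warning"
    "warning": "important.html",  # Confluence "Note"
}

ALERT_RE = re.compile(
    r'<div class="aui-message (hint|success|problem|warning) shadowed information-macro">\s*'
    r'<span class="aui-icon icon-\1">Icon</span>\s*'
    r'<div class="message-content">(.*?)</div>\s*'
    r"</div>",
    re.DOTALL,
)


def convert_alerts(page: str) -> str:
    """Replace Confluence information macros with Jekyll alert includes."""

    def replace(match: re.Match) -> str:
        include = ALERT_INCLUDES[match.group(1)]
        content = " ".join(match.group(2).split()).replace('"', "&quot;")
        return f'{{% include {include} content="{content}" %}}'

    return ALERT_RE.sub(replace, page)
```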

Other macros

Review comment: This macro is used for internal review comments. The text should be hidden from readers.

In Confluence:

<div style="border: 1px dashed gray; background-color: #FFFF99; max-width: 800px; padding: 10px; margin: 5px 20px 5px 20px; " class="sl-hidden">
...
</div>

Markdown:

<div class="sl-hidden">
...
</div>
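
Since the only change here is dropping the inline style, a small substitution may be enough. The sketch below uses a hypothetical function name, simplify_review_comments, and assumes the class attribute is written exactly as class="sl-hidden".

```python
import re

# Match any opening <div ...> whose attributes include class="sl-hidden".
HIDDEN_DIV_RE = re.compile(r'<div\b(?=[^>]*\bclass="sl-hidden")[^>]*>')


def simplify_review_comments(page: str) -> str:
    """Strip every attribute except the class from review-comment divs."""
    return HIDDEN_DIV_RE.sub('<div class="sl-hidden">', page)
```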