
Convert RST to DocBook XML

License: GPL 3+

The :program:`rstxml2db` script converts RST XML files to DocBook XML.

Quick Start

To use the program without :command:`pip` or a virtual environment, run the following command from the root of your clone of this repository:

$ PYTHONPATH=src python3 -m rstxml2db -h

Installing

To install :program:`rstxml2db` in a Python virtual environment, use the following steps:

  1. Clone this repository:

    $ git clone https://github.com/openSUSE/rstxml2docbook.git
    $ cd rstxml2docbook
    
  2. Create a Python 3 environment and activate it:

    $ python3 -m venv .env
    $ source .env/bin/activate
    
  3. Update the pip and setuptools modules:

    $ pip install -U pip setuptools
    
  4. Install the package:

    $ ./setup.py develop
    

If you need to install it directly from GitHub, pass this URL to :command:`pip`:

$ pip install git+https://github.com/openSUSE/rstxml2docbook.git@develop

After the installation in your Python virtual environment, two executable scripts are available: :program:`rstxml2db` and :program:`rstxml2docbook`. Both are identical; the two names exist only for convenience.

Workflow

The script performs the following steps:

  1. Reads the intermediate XML files from a previous Sphinx conversion step (see :ref:`sec.build.xml.files`).
  2. Resolves any references to external files and creates a single XML tree in memory.
  3. Transforms the tree into DocBook with XSLT and, if requested, splits it into several smaller files.
  4. Writes the result to stdout or saves it into one or more files, depending on whether splitting mode is activated.

Building the Intermediate XML Files

Usually, you first create the intermediate XML files by selecting the XML builder with the :option:`-b` option:

$ sphinx-build -b xml -d .../build/html.doctree src/ xml/

The src/ directory contains all of your RST files, whereas the xml/ directory is the output directory.

Each RST file generates a corresponding XML file.
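As a quick sanity check after the build, the following Python sketch predicts which XML file the XML builder should write for each RST file and lists the ones that are missing. It assumes the src/ and xml/ layout from the command above and that all sources use the .rst suffix; files excluded from the Sphinx build (for example through exclude_patterns) would be reported as false positives. The helper names are hypothetical and not part of rstxml2db.

```python
from pathlib import Path, PurePosixPath


def expected_xml_path(rst_file: str, srcdir: str = "src", outdir: str = "xml") -> PurePosixPath:
    """Return where the XML builder should write the XML for rst_file:
    the same path relative to srcdir, placed below outdir, with .xml instead of .rst."""
    relative = PurePosixPath(rst_file).relative_to(srcdir)
    return PurePosixPath(outdir) / relative.with_suffix(".xml")


def missing_xml_files(srcdir: str = "src", outdir: str = "xml") -> list:
    """List the expected XML files that do not exist below outdir."""
    missing = []
    for rst in sorted(Path(srcdir).rglob("*.rst")):
        # Mirror the source tree: src/a/b.rst -> xml/a/b.xml
        xml = Path(outdir) / rst.relative_to(srcdir).with_suffix(".xml")
        if not xml.exists():
            missing.append(xml)
    return missing
```

After a successful sphinx-build run, missing_xml_files("src", "xml") should return an empty list unless some sources were excluded from the build.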

Building the DocBook Files

After you have created the intermediate XML files, run the :program:`rstxml2db` script. It reads in all XML files and creates DocBook files, for example:

$ rstxml2db xml/index.xml

By default, this command reads the :file:`index.xml` file and generates several DocBook files, all located in the out/ directory.

If you need a single DocBook file, use the :option:`-ns` option to print the resulting DocBook file to stdout.

The Internal Workflow

Converting RST XML files into DocBook involves these steps:

  1. Load the :file:`index.xml` file.
  2. Resolve all external references to other files; create one single RST XML tree.
  3. If :option:`--legalnotice` is used, add the legalnotice file into bookinfo.
  4. If :option:`--conventions` is used, replace the first chapter with the preface content.
  5. Clean up XML:
    1. Remove IDs with no corresponding <xref/>.
    2. Convert absolute column widths into relative values.
    3. Add a processing instruction to <screen> if the maximum number of characters per line inside the screen exceeds a certain value.
  6. Output the tree, either by saving it or by printing it to stdout.
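The clean-up sub-steps (removing unused IDs, converting column widths, marking wide screens) can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration, not the rstxml2db implementation: it assumes a DocBook 5 tree, rewrites numeric colwidth values as proportional N* widths that add up to roughly 100, and uses a hypothetical processing instruction target (rstxml2db-wide) together with an assumed limit of 80 characters per line.

```python
import xml.etree.ElementTree as ET

# Assumption: the tree uses the DocBook 5 namespace.
DB = "http://docbook.org/ns/docbook"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def remove_unused_ids(root: ET.Element) -> None:
    """Drop xml:id attributes that no <xref linkend="..."/> points to."""
    used = {x.get("linkend") for x in root.iter(f"{{{DB}}}xref")}
    for elem in root.iter():
        if elem.get(XML_ID) is not None and elem.get(XML_ID) not in used:
            del elem.attrib[XML_ID]


def relative_colwidths(root: ET.Element) -> None:
    """Turn numeric colwidth values into proportional N* values per tgroup."""
    for tgroup in root.iter(f"{{{DB}}}tgroup"):
        colspecs = tgroup.findall(f"{{{DB}}}colspec")
        try:
            widths = [float(cs.get("colwidth", "")) for cs in colspecs]
        except ValueError:
            continue  # leave tables with missing or non-numeric widths alone
        total = sum(widths)
        if total <= 0:
            continue
        for cs, width in zip(colspecs, widths):
            cs.set("colwidth", f"{round(100 * width / total)}*")


def mark_wide_screens(root: ET.Element, maxchars: int = 80) -> None:
    """Add a (hypothetical) PI to each <screen> whose longest line exceeds maxchars."""
    for screen in list(root.iter(f"{{{DB}}}screen")):
        lines = "".join(screen.itertext()).splitlines()
        longest = max((len(line) for line in lines), default=0)
        if longest > maxchars:
            pi = ET.ProcessingInstruction("rstxml2db-wide", f'maxchars="{longest}"')
            # Place the PI before the existing text content of <screen>.
            pi.tail, screen.text = screen.text, None
            screen.insert(0, pi)
```

Each function modifies the tree in place, so they can be called one after another on the same root before the tree is written out.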

The transformation from separate RST XML files into a single RST XML tree relies mainly on the list_item[@classes='toctree-l1'] elements. Every file referenced there is included; everything else is copied as is.
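This resolution step can be illustrated with a short Python sketch based on xml.etree.ElementTree. It assumes that each toctree-l1 list item contains a reference whose @refuri, minus any #fragment, names the file to include, and that a caller-supplied load function returns the parsed tree for such a name. In the sketch, the top-level content of the referenced files replaces the toctree list; the names resolve and load are hypothetical, and the real script may place the included content differently.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Set

Loader = Callable[[str], ET.Element]


def resolve(docname: str, load: Loader, seen: Optional[Set[str]] = None) -> ET.Element:
    """Return the tree for docname with each toctree list replaced by the
    (recursively resolved) content of the files it references."""
    seen = set() if seen is None else seen
    seen.add(docname)  # guard against files that include each other
    root = load(docname)
    parents = {child: elem for elem in root.iter() for child in elem}
    for blist in list(root.iter("bullet_list")):
        items = [i for i in blist.findall("list_item")
                 if "toctree-l1" in (i.get("classes") or "").split()]
        if not items:
            continue  # an ordinary list: copy it as it is
        included = []
        for item in items:
            ref = item.find(".//reference[@refuri]")
            if ref is None:
                continue
            target = ref.get("refuri").split("#", 1)[0]
            if target and target not in seen:
                included.extend(list(resolve(target, load, seen)))
        # Swap the toctree list for the included top-level elements.
        parent = parents[blist]
        pos = list(parent).index(blist)
        parent.remove(blist)
        for offset, node in enumerate(included):
            parent.insert(pos + offset, node)
    return root
```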

The transformation from the single RST XML tree into DocBook 5 uses the :file:`rstxml2db.xsl` stylesheet.

Things to Know During Conversion

Internally, the conversion creates a single RST XML tree that contains all the information needed.

For example, the following things work:

  • Internal referencing from one section to another (element reference[@internal='True'])
  • Internal references to a glossary entry (element reference[@internal='True'], but with @refuri containing a # character)
  • External referencing to a remote site (element reference[@refuri])
  • Different, nested sections are correctly converted into the DocBook structures (book, chapter, section, etc.)
  • Admonition elements
  • Tables and figures
  • Lists like bullet_list, definition_list, and enumerated_list
  • Glossary entries
  • Inline elements such as strong and literal_emphasis
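To see which reference kinds a merged RST XML tree contains, the attribute rules for references listed above can be expressed as a small classifier. This hypothetical Python sketch follows only those rules; the real stylesheet may distinguish more cases.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def classify_reference(ref: ET.Element) -> str:
    """Classify a docutils <reference> element by the rules listed above."""
    refuri = ref.get("refuri")
    if ref.get("internal") == "True":
        # Internal references whose @refuri contains '#' point to glossary entries.
        return "glossary" if refuri and "#" in refuri else "internal"
    if refuri:
        return "external"  # a link to a remote site
    return "unknown"


def count_references(root: ET.Element) -> Counter:
    """Count the reference kinds found anywhere in the tree."""
    return Counter(classify_reference(r) for r in root.iter("reference"))
```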

The following issues are still problematic:

  • Double IDs: When RST contains the same title more than once, the RST XML builder generates the same IDs. I consider this a bug.
  • Invalid Structures: RST allows structures that are not valid in DocBook. For example, adding more paragraphs after the last section at a given level leads to validation errors in DocBook. The script currently does not detect these structural issues, so you need to adapt the structure manually.
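Because the script does not detect these problems, a small pre-flight check on the RST XML files can help. The following hypothetical Python sketch reports IDs that occur more than once in the docutils ids attributes, and any non-section element that follows a section among its siblings. It is deliberately simple: elements such as target or comment placed after a section are reported as well, so treat its output as hints rather than definite validation errors.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(root: ET.Element) -> list:
    """Return IDs that appear more than once (ids is a space-separated list)."""
    counts = Counter(i for e in root.iter() for i in (e.get("ids") or "").split())
    return sorted(i for i, n in counts.items() if n > 1)


def content_after_sections(root: ET.Element) -> list:
    """Return (parent ids, tag) for each non-section element that follows a section."""
    problems = []
    for parent in root.iter():
        seen_section = False
        for child in parent:
            if child.tag == "section":
                seen_section = True
            elif seen_section:
                problems.append((parent.get("ids", ""), child.tag))
    return problems
```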