a collection of Pandoc writers and filters to export indices

massifrg/pandoc-export-index

pandoc-export-index

This software is a collection of Pandoc Lua filters and custom writers to export a document with indices in these formats:

  • InDesign ICML

  • docx

  • odt

  • ConTeXt

  • LaTeX

Currently there's only a Writer to export an index (one level only) to an ICML standalone document (-s option in Pandoc).

What is needed to specify indices

For indices, you need:

  • the names of the indices (many formats support only one index);

  • the terms (topics) for every index;

  • the references to those terms in the text.

An example document

In the doc directory there's an indices-example.md document.

Defining indices for Pandoc

This software considers an index definition a Div block with

  • an index class; this is mandatory, because it is what makes the Div an index database;

  • an index-name attribute; this is optional; if not set, its value is considered to be "index"; please use simple names without numbers or symbols for indices, like "index", "names", "topics", "biblio", "subjects", etc.;

  • a ref-class attribute that specifies the class that Span inlines must have to be considered references to this index; this is optional; if not set, its value is considered to be "index-ref";

  • a put-index-ref attribute that can be "before" or "after", see below; this is optional; if not set, its value is considered to be "after".

Why a Div? Because it's a Block that carries arbitrary data within the Attr structure; and it's a container of Blocks.
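The defaults listed above can be sketched in plain Lua. This is only an illustration under stated assumptions, not the repository's code: index_settings is a hypothetical helper, and the result fields (name, refClass, refWhere) simply borrow the names that appear in the indices2json.lua output further down.

```lua
-- Hypothetical helper, not part of this repository: it only mirrors the
-- defaults listed above. `div` is anything shaped like a Pandoc Div,
-- i.e. with a `classes` list and an `attributes` table.
local function has_class(classes, name)
  for _, class in ipairs(classes) do
    if class == name then
      return true
    end
  end
  return false
end

local function index_settings(div)
  if not has_class(div.classes, "index") then
    return nil -- no "index" class: a generic Div, not an index definition
  end
  local attrs = div.attributes
  return {
    name = attrs["index-name"] or "index",
    refClass = attrs["ref-class"] or "index-ref",
    refWhere = attrs["put-index-ref"] or "after",
  }
end

-- Roughly what Pandoc would hand over for the Markdown fenced Div
--   ::: {.index index-name="names" ref-class="name-ref"}
local settings = index_settings({
  classes = { "index" },
  attributes = { ["index-name"] = "names", ["ref-class"] = "name-ref" },
})
print(settings.name, settings.refClass, settings.refWhere)
```

Since put-index-ref is not set in the example, refWhere falls back to "after".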

References

Index references are Span inlines with:

  • a class that matches the ref-class of an index defined somewhere in the document; this is mandatory, since it's what makes this Span an index reference;

  • an idref attribute that matches the id attribute of a term of that index; this is optional, but if not set, you won't get this occurrence in the index;

  • an optional indexed-text attribute with the text it refers to; this is useful when you use empty references (an empty Span placed just to the left or to the right of the text it refers to).

Why a Span? Because it's among inlines and carries arbitrary data within the Attr structure.
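As a rough sketch of the rules above, a Span could be checked like this. reference_info is a hypothetical name, and returning nil when idref is missing is just one way to model "you won't get this occurrence in the index"; the actual filters may handle that case differently.

```lua
-- Hypothetical helper, not the repository's code. `span` is anything shaped
-- like a Pandoc Span: a `classes` list and an `attributes` table.
local function reference_info(span, ref_class)
  local matches = false
  for _, class in ipairs(span.classes) do
    if class == ref_class then
      matches = true
      break
    end
  end
  if not matches then
    return nil -- an ordinary Span, not a reference to this index
  end
  local idref = span.attributes["idref"]
  if not idref then
    return nil -- no idref: this occurrence would not show up in the index
  end
  return { idref = idref, indexedText = span.attributes["indexed-text"] }
end

-- Roughly the empty Markdown Span
--   []{.index-ref idref="labor" indexed-text="labor"}
local ref = reference_info({
  classes = { "index-ref" },
  attributes = { idref = "labor", ["indexed-text"] = "labor" },
}, "index-ref")
print(ref.idref, ref.indexedText)
```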

Terms (topics)

Index terms are Div blocks with:

  • an index-term class; this is mandatory, because it's what makes this Div an index term, instead of a generic Div;

  • an id; this is mandatory too, otherwise you can't reference this term in the text;

  • an index-name attribute whose value matches the one of an index; this is optional, especially when the term Div is inside an index Div;

  • an optional sort-key attribute, specifying a simple text according to which the term must be sorted; generally the filters and writers of this repository don't do sorting.

Why a Div? A Para or a Plain is enough in many cases, but they have no data attached (no Attr). An index topic could also be quite long and multi-paragraph (e.g. think of an index of people with biographical profiles, or a glossary with references to the pages where a topic is discussed).
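A term Div can be read in much the same way. The sketch below is an assumption-laden illustration: term_info is a hypothetical helper, and letting an explicit index-name attribute win over the enclosing index is a guess, not documented behaviour. The sort key is left unset when the attribute is missing.

```lua
-- Hypothetical helper, not the repository's code. `div` is anything shaped
-- like a Pandoc Div: `classes`, `identifier` and `attributes`.
local function term_info(div, enclosing_index)
  local is_term = false
  for _, class in ipairs(div.classes) do
    if class == "index-term" then
      is_term = true
      break
    end
  end
  if not is_term then
    return nil -- a generic Div
  end
  if not div.identifier or div.identifier == "" then
    return nil -- without an id the term cannot be referenced
  end
  return {
    id = div.identifier,
    -- assumption: an explicit attribute wins over the enclosing index Div
    indexName = div.attributes["index-name"] or enclosing_index,
    sortKey = div.attributes["sort-key"], -- nil when not given
  }
end

-- Roughly the Markdown  ::: {#labor .index-term sort-key="labor"}
-- found inside an index Div named "subjects"
local term = term_info({
  classes = { "index-term" },
  identifier = "labor",
  attributes = { ["sort-key"] = "labor" },
}, "subjects")
print(term.id, term.indexName, term.sortKey)
```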

Currently there's no support for sub-topics, but it's planned.

How indices are modelled in different formats

AFAIK we can divide formats into two families from the indexing point of view:

  • ICML, docx, odt: there's a database of terms and references to them in the text; rendering indices in HTML and epub could follow this model too;

  • ConTeXt, LaTeX: the database is built incrementally from macro calls like \index{term}, \index{head+sub} (ConTeXt), \index{head!sub} (LaTeX).

This package follows the first model, so writers for ConTeXt and LaTeX should do some work to adapt it.

In ConTeXt I know it's possible, because I used this workaround in a project of mine:

\defineregister[myIndex][deeptextcommand=\IdToTerm]

\starttext
... foo\myIndex[foo]{fooId} bar\myIndex[bar]{barId} ...

\placeregister[myIndex]
\stoptext

where \IdToTerm is a macro that gets an id as input and places the TeX tokens of the corresponding term, while \myIndex must be followed by two parameters: the sorting key in brackets and the term id in braces.
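For a future ConTeXt writer, producing that macro call from a reference could look like the sketch below. context_reference is a hypothetical function, and no escaping of TeX special characters is attempted, so it assumes plain sort keys and ids.

```lua
-- Hypothetical helper: builds the register call used in the workaround above,
-- i.e. \<register>[<sort key>]{<term id>}. No TeX escaping is done here.
local function context_reference(register, sort_key, term_id)
  return string.format("\\%s[%s]{%s}", register, sort_key, term_id)
end

-- The call is placed right after the text it refers to, as in the example.
local rendered = "foo" .. context_reference("myIndex", "foo", "fooId")
print(rendered) -- prints: foo\myIndex[foo]{fooId}
```

Inside a Pandoc filter or writer, such a string would probably be wrapped in pandoc.RawInline("context", ...) so that it reaches the output untouched.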

Extracting indices as JSON objects: the indices2json.lua Writer

indices2json.lua is a custom writer that extracts the indices and terms defined in a document as JSON objects, which you can then use to build an external database.

Example: enter the src directory and type

pandoc -f markdown -t indices2json.lua ../test/test.md

and you'll get something like this:

{
  "indices": [
    {
      "name": "subjects",
      "prefix": "subjects",
      "refClass": "index-ref",
      "refWhere": "after"
    }
  ],
  "terms": {
    "subjects": [
      {
        "blocks": [
          {
            "c": [
              {
                "c": "Consequo",
                "t": "Str"
              }
            ],
            "t": "Para"
          }
        ],
        "id": "consequo",
        "sortKey": "consequo",
        "text": "Consequo\n"
      },
      {
        "blocks": [
          {
            "c": [
              {
                "c": [
                  {
                    "c": "Labor",
                    "t": "Str"
                  }
                ],
                "t": "Emph"
              }
            ],
            "t": "Para"
          }
        ],
        "id": "labor",
        "sortKey": "labor",
        "text": "Labor\n"
      }
    ]
  }
}
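As a sketch of the "external database" idea, the structure above can be turned into a lookup from index name and term id to the term. The Lua table below mirrors the JSON by hand, with the "blocks" fields left out; in practice you would decode the writer's output with a JSON library of your choice. build_lookup is a hypothetical name.

```lua
-- A Lua table mirroring the JSON output shown above ("blocks" omitted).
local extracted = {
  indices = {
    { name = "subjects", prefix = "subjects", refClass = "index-ref", refWhere = "after" },
  },
  terms = {
    subjects = {
      { id = "consequo", sortKey = "consequo", text = "Consequo\n" },
      { id = "labor", sortKey = "labor", text = "Labor\n" },
    },
  },
}

-- Hypothetical helper: lookup[indexName][termId] gives the term table.
local function build_lookup(data)
  local lookup = {}
  for _, index in ipairs(data.indices) do
    local by_id = {}
    -- an index may be defined without any terms
    for _, term in ipairs(data.terms[index.name] or {}) do
      by_id[term.id] = term
    end
    lookup[index.name] = by_id
  end
  return lookup
end

local lookup = build_lookup(extracted)
print(lookup.subjects.labor.sortKey)
```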

Exporting an index to ICML: the icml_with_index.lua Writer

InDesign has only one index, so you can't define more indices inside a document (actually there's a workaround, using the first level of the index to discriminate among different indices, but it may be an option for future versions of this software).

In ICML, the actual index is in a <Index> element that lives outside the main <Story> element, so you can't add it through a filter, because filters can only modify the contents of the <Story> element.

So it looks like the only way to add an index is through templates, and a custom writer:

pandoc -f markdown -t icml_with_index.lua -s test.md

The custom writer can modify the default template for ICML on the fly, putting an $index$ before <Story Self="pandoc_story", and then filling the index variable with the index contents.

Here's the custom writer's main function:

function Writer(doc, opts)
  -- collect index definitions and terms from the whole document;
  -- indices and terms are not local to this function
  local collected = pandocIndices.collectIndices(doc)
  indices = collected.indices
  terms = collected.terms
  -- apply the index filters in order, each one to the result of the previous
  local filtered = doc
  for i = 1, #indices_filters do
    logging_info("applying filter #" .. i)
    local filter = indices_filters[i]
    filtered = filtered:walk(filter)
  end
  -- make a clone of opts and add the index variable
  local options = pandoc.WriterOptions(opts)
  options.variables.index = index_var
  return pandoc.write(filtered, 'icml', options)
end

Some filters are applied to collect index data and fill the index_var variable, whose value is put into options.variables.index before calling pandoc.write(filtered, 'icml', options). The writer then replaces $index$ in the template with the value of options.variables.index.
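The template change itself can be sketched in plain Lua. This is not the writer's actual code: add_index_placeholder is a hypothetical helper, and the sample string is a tiny stand-in for the real ICML template.

```lua
-- Hypothetical helper, not the repository's code: insert the $index$
-- placeholder right before the opening tag of the main story.
local function add_index_placeholder(template)
  local marker = '<Story Self="pandoc_story"'
  -- plain find (4th argument true), so the marker is not read as a pattern
  local start = template:find(marker, 1, true)
  if not start then
    return template, false -- marker not found: leave the template untouched
  end
  return template:sub(1, start - 1) .. "$index$\n" .. template:sub(start), true
end

-- A tiny stand-in for the real ICML template, for illustration only.
local sample = '<Document>\n<Story Self="pandoc_story">\n</Story>\n</Document>'
local patched, found = add_index_placeholder(sample)
print(found)
print(patched)
```

Inside Pandoc, the raw template text would presumably come from pandoc.template.default('icml'); whether the writer obtains it exactly this way is an assumption.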

Exporting indices to DOCX: the docx_index.lua filter

docx_index.lua is a filter that injects references to index terms in the text.

Here's an example:

pandoc -f markdown -t docx -o doc-with-index.docx -L docx_index.lua doc.md

When you open the resulting DOCX file, you won't see an index. You must create it explicitly (e.g. References -> Insert index) with your word processing app (e.g. Word).

Exporting indices to ODT: the odt_index.lua filter

odt_index.lua is a filter that injects references to index terms in the text.

Here's an example:

pandoc -f markdown -t odt -o doc-with-index.odt -L odt_index.lua doc.md

When you open the resulting ODT file, you won't see an index. You must create it explicitly with your word processing app. In LibreOffice, you can click on Insert - Table of Contents and Index - Table of Contents, Index or Bibliography.

Though LibreOffice supports many indices, for now the only one that is created is the alphabetical index.

Version

The current version is 0.3.0 (2024, April 25th).

Acknowledgements

This software

  • provides custom writers and filters for Pandoc;

  • and makes use of William Lupton's logging.lua module.
