Skip to content

gruenerBogen/Serlo-Markup-Prototype

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Serlo-Markup-Prototype

A prototype for a markup language on serlo.org.

Idea

In order to develop a markup language for serlo.org, we construct a prototype converter between the Serlo API and the markup language. This allows for testing the markup language with real contents by converting them back and forth. The conversion from markup language to the API will be used to display the edited contents.

To implement these goals, the following tools will be written: A library which handles the conversion and a few command line tools for interfacing the library.

Currently, the idea is to use AsciiDoc as the underlying markup language. This is done for two main reasons: Firstly it is a language which naturally allows for content type extensions. This allows the vanilla use of an already defined markup language. Secondly this vanilla use of a markup language gives rise to the hope that some (external) editors already provide syntax highlighting for the markup language.

Executables

Using the command line, one can convert an article from serlo.org to AsciiDoc via the executable serlo2adoc. You can find a manpage for it int the folder doc. You can use cabal to build and install the library and executable. To get an impression, on what the programm can convert, you can look at the test article which roughly contains everything which can be converted.

Installation instructions are provided in INSTALL.adoc.

For downloading articles one can use the helper python scripts in the folder helper. They take as argument an article ID and print the article’s contents to the standard output. The script fetch-article.py only fetches the article without extracting it’s contents from the API request. This contains more metadata. The script fetch-article-content.py only outputs the article’s contents. When you redirect the output from the latter script, you can feed the file directly into serlo2adoc to see a conversion of the article to AsciiDoc.

State of Implementation

The conversion is done in a three step process: First there is a conversion from the JSON created by the Serlo API to an internal representation for the Serlo content. In the second step this internal representation is converted to an internal representation of an AsciiDoc document. The third step consists of converting the internal AsciiDoc representation to the actual AsciiDoc document. For the conversion in the other direction, both steps get reversed. Hence the implemtation consists of two parts: A translation between the Serlo API and the internal model and a translation between the internal model and the markup language. Graphically:

Serlo API JSON <--> internal Serlo content model <--> internal AsciiDoc model <--> AsciiDoc

Translation between API JSON and internal model

The Serlo Editor constists of a collection of plugins. These plugins each have a particular state, which determines their content. To track progress it is therefore useful to collect the plugins with which the translator can deal with. These are:

  • text (partially, for details see below)

  • injection (only from JSON to internal model)

  • rows

  • spoiler

  • image (only from JSON to internal model)

  • important (only from JSON to internal model)

The text plugin itself is complicated enough to deserve its own treatment: The text plugin itself has a substructure which denotes the text formatting. Currently supported are

  • bold/italic text

  • headings

  • normal text

  • paragraphs

  • maths

  • unordered lists

AsciiDoc parsing and printing

The internal model to represent an AsciiDoc document is still under heavy development. Regarding the conversion, there are two directions to cover: The printing, that is converting the internal model to an actual document, and the parsing, that is converting the actual document to the internal model. On the printing side, the following parts are working:

  • document header

  • sections

  • paragraph content (at least some parts)

The printable parts of the paragraph content are

  • plain text

  • bold/italic/highlighted text

  • subscript and superscript

  • maths

  • inline macro calls

Some representation for maths contents does exist, but it cant be rendered right now because it is unclear how to render it.

The parser is currently only capable of parsing the header and some blocks. These are

  • section

  • paragraph (for details see below)

Not all syntax elements of a paragraph are parseable right now. Currently the parser recognises the following syntax elements inside a paragraph

  • plain text (anything which is never a special char)

  • hard line breaks

  • end of a paragraph

  • macro calls

  • unconstrained bold/italic/monospace/highlighted text

  • sub- and superscript text

  • inline maths (asciimath and latex)

The unconstrained bold/italic/monospace implementation deviates from the specification by allowing stacking the formatting marks in any order.

Mapping between the internal models

There need to be two conversions for the internal model. Right now, there is a partial implementation to convert the Serlo internal model to the AsciiDoc internal model. In that, the following can be converted to AsciiDoc

  • rows

  • paragraphs (partially, see below)

  • spoilers

  • injections

  • images

  • important

For paragraphs, the following parts can be converted

  • plain text

  • bold/italic text

  • maths

Caveats

The Serlo editor encoding is not one of the best documented pieces of software. Nevertheless the source code gives understandable state descriptions for some plugins. Sadly, the text plugin, which is the bread and butter plugin for article creation, is so complicated, that the editor code itself is not a good documentation for the underlying data structure. Hence the API-side implementation of the text plugin is highly explorative.

Similar caveats concern the overall code quality. It is explorative code which probably ignores a bunch of best practices.

About

A prototype for a markup language on serlo.org

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published