Semantic Authoring Markdown

XML was designed to be human readable, but not human writable. Graphical editors help, though they have their own issues (such as finding the right insertion point for a permitted element). They also share a deeper problem: when they hide the tags, they also hide the structure.

Semantic Authoring Markdown (SAM) brings the ideas behind Markdown -- a natural syntax for writing HTML documents -- to structured writing. It provides a syntax that captures structure, like XML, but is easy to write in a text editor, like Markdown.


See the documentation at


The SAM language is still being defined, and both the language itself and the way it is serialized and represented internally by the parser are subject to change. I hope to stabilize the language definition soon.

Backward-incompatible changes

Since SAM is under active development, there may be backward-incompatible changes in the language. They will be noted here as they occur, up until the point where the language and its serialization are considered final.

  • Revision 227cb3dd7bb322f5579858806071c1ff8456c0b6 introduced a change in the way the XML representation of a record is generated. A record used to be output as "row". It is now output as "record".

  • Revision 3fdd6528d88b1a7f0a72c10ce5b5e768433eaf19 introduced a change in how inline code is serialized. It is now serialized as a <code> element rather than as a <phrase> element with an <annotation type="code"> nested element.

  • Revision 8e8c6a0b4c9c41bd72fab5fd53e3d967e9688110 removed the === flag for a block of embedded code, which had been briefly introduced in an earlier revision. Blocks of embedded code should now be represented as regular code blocks using an encoding attribute (=svg) rather than a language attribute (svg).

  • Revision fac3fea6a9570a20c825369417ab2eaf94d34d2b made annotation lookup case insensitive. Case-sensitive lookup can be turned on with the declaration !annotation-lookup: case sensitive

  • Revision 828ef33d291f1364a6edf036588ac5f21fac0abb addressed issue #142 by detecting recursive includes. This had the side effect of changing the behavior when the parser encounters an error in an included file. Previously this error was demoted to a warning (not sure why). Now it is treated as an error and stops the parser. Without this change, the error would not get noted in the error count for batch processing, which is clearly not a good idea. To allow for more lenient error handling while retaining appropriate error reporting, we would need to introduce a reportable non-fatal error type. Issue #148 has been raised to consider this.

  • Revision e0fa711d14219cbad19636515e2dc2bbe3a82f28:

    • Changed the format of error messages to report the original line on which the error occurred rather than a representation of the object created.

    • Changed the format produced by the __str__() method on doc structure objects to a normalized representation of the input text generated by the new regurgitate() method.

    • Changed the serialization of citations on a block so they come before the title, not after it.

    • Changed the object model of Blockinsert and Inlineinsert objects to make the type and item value separate fields of the object rather than simple attributes.

    • Changed serialization of block embed from "embed" to "embedblock" to be consistent with "codeblock".

    • Changed type of embedded blocks from Codeblock to Embedblock.

    • Removed support for embedded XML fragments per the discussion of issue #145. SAM has outgrown this feature, which is incompatible with the plan to introduce SAM Schemas.

  • Revision 1d16fd6d0544c32fa23930f303989b1b4a82c477 addressed #157 by changing the serialization of citations as described in #157 and adding support for the use of keys in citations.

  • Revision ad4365064bdfe61fa43228991a31b3174feb2957 removed the smart quotes parser option (the command line flag that turned smart quotes on and off) and introduced the !smart-quotes document declaration, along with the option to add custom smart quotes rules to the parser.

  • Revision b4ca40baa03233ff306ed20a59da92668e4e0872 changed the syntax for inserting a value by reference. It used to be >(#foo), but this was confusing because parentheses are used to create names and ids, not to reference them. The syntax for referencing a name or id is [#foo], so the syntax for inserting a value by reference became >[#foo]. This applies to strings, ids, names, fragments, and keys. Note that the syntax for inserting a value by URI still uses parentheses, since this is new information, not a reference to another internal value. Also note the difference between [#foo], which means generate a reference to the content with the name foo, and >[#foo], which means insert the content with the name foo at the current location. (These are, of course, operations performed at the application layer, not by the parser.) This change was later reversed in revision 3e9b8f6fd8cddf9cbedb25c44ab48323216ce71e.

  • Starting with revision dd07a4b798fcaa14a722a345b5ab8e07c3df42a1, the way attributes are modeled internally changed. Instead of using a separate Attribute object, attributes became Python attributes on the relevant block or phrase object. This does not affect command line use but does affect programmatic access to the document structure.

  • Starting with revision dd07a4b798fcaa14a722a345b5ab8e07c3df42a1, the use of fragment references with the syntax [~foo] was removed (see issue #166). Fragments can be inserted by name or ID just like any other block type.

  • From revision 3e9b8f6fd8cddf9cbedb25c44ab48323216ce71e:

    • The change to insert by reference in b4ca40baa03233ff306ed20a59da92668e4e0872 is reversed. It caused slightly more confusion than the old version.

    • The ~ symbol for referencing a fragment is removed. Fragments should be referenced by name or id.

    • The strings feature has been renamed "variable". This chiefly affects the serialization of variable definitions and references.

  • In revision 1f20902624d29dab002353df8374952c63fff81d the serialization of citations has been changed to support compound identifiers and to support easier processing of citations. See the language docs for details.

  • In revision defbc97c9bd592ab454296852c3d9a65e1007996 the command line options changed to support three different output modes as subcommands. Other options changed as well. See above for the new command line options. The serialization interface to the parser also changed. When calling the parser from a Python program, you no longer call samparser.serialize('xml') or samparser.serialize('html'); call samparser.doc.serialize_xml() or samparser.doc.serialize_html() instead.

Please report any other backward incompatibilities you find so they can be added to this list.

