A tool for converting IETF ABNF grammars to Invisible XML

nineml/abnf2ixml

abnf2ixml

This repository contains a small set of tools to parse ABNF grammars from IETF specifications and convert them to Invisible XML. I originally described this project with a short presentation during the “open mic” session at Balisage 2023.

The build.gradle script automates a few examples. Try:

./gradlew parse_uri
./gradlew parse_json
./gradlew parse_jsonpath

(On Windows, use .\gradlew.bat instead.)

The steps involved are:

  1. Extract the ABNF grammar from the specification. See src/main/abnf for examples.
  2. Process the ABNF grammar with src/main/ixml/ABNFp.ixml. This converts the ABNF grammar into XML. The ABNFp.ixml grammar is a slightly edited version of the Invisible XML sample ABNF.ixml grammar.
  3. Transform the XML grammar into “VXML”, the visible XML grammar. (That is, the XML serialization of an iXML grammar.)
  4. Parse your input with the resulting VXML grammar.

Managing marks

A straight conversion of ABNF to iXML will parse inputs that conform to the original ABNF grammar, but there’s no provision for “marks” in the ABNF. This results in every nonterminal being present in the output. For example, parsing https://balisage.net/2023/Program.html#W1330 produces:

<URI>
   <scheme>https</scheme>:
   <hier-part>//
      <authority>
         <host>
            <reg-name>
               <unreserved>b</unreserved>
               <unreserved>a</unreserved>
               <unreserved>l</unreserved>
               <unreserved>i</unreserved>
               <unreserved>s</unreserved>
               <unreserved>a</unreserved>
               <unreserved>g</unreserved>
               <unreserved>e</unreserved>
               <unreserved>.</unreserved>
               <unreserved>n</unreserved>
               <unreserved>e</unreserved>
               <unreserved>t</unreserved>
            </reg-name>
         </host>
      </authority>
      <path-abempty>/
         <segment>
            <pchar>
               <unreserved>2</unreserved>
            </pchar>
            <pchar>
               <unreserved>0</unreserved>
            </pchar>
            <pchar>
               <unreserved>2</unreserved>
            </pchar>
            <pchar>
               <unreserved>3</unreserved>
            </pchar>
         </segment>/
         <segment>
            <pchar>
               <unreserved>P</unreserved>
            </pchar>
            <pchar>
               <unreserved>r</unreserved>
            </pchar>
            <pchar>
               <unreserved>o</unreserved>
            </pchar>
            <pchar>
               <unreserved>g</unreserved>
            </pchar>
            <pchar>
               <unreserved>r</unreserved>
            </pchar>
            <pchar>
               <unreserved>a</unreserved>
            </pchar>
            <pchar>
               <unreserved>m</unreserved>
            </pchar>
            <pchar>
               <unreserved>.</unreserved>
            </pchar>
            <pchar>
               <unreserved>h</unreserved>
            </pchar>
            <pchar>
               <unreserved>t</unreserved>
            </pchar>
            <pchar>
               <unreserved>m</unreserved>
            </pchar>
            <pchar>
               <unreserved>l</unreserved>
            </pchar>
         </segment>
      </path-abempty>
   </hier-part>#
   <fragment>
      <pchar>
         <unreserved>W</unreserved>
      </pchar>
      <pchar>
         <unreserved>1</unreserved>
      </pchar>
      <pchar>
         <unreserved>3</unreserved>
      </pchar>
      <pchar>
         <unreserved>3</unreserved>
      </pchar>
      <pchar>
         <unreserved>0</unreserved>
      </pchar>
   </fragment>
</URI>

Editing the generated VXML file by hand to add marks would be the wrong approach, so the stylesheet that converts the ABNF XML to VXML takes a marks file as a parameter.

A marks file lets you identify the elements in the ABNF XML that should have marks applied. The syntax of the marks file is defined by the src/main/xslt/marks.ixml grammar.

With this marks file applied:

mark rule unreserved with "-"
mark rule pchar with "-"
mark token //char-val[. = ('/', ':', '//')] with "-"
mark token /rulelist/rule[rulename = 'URI']//char-val with '-'

The resulting URI parse is much less “noisy”:

<URI>
   <scheme>https</scheme>
   <hier-part>
      <authority>
         <host>
            <reg-name>balisage.net</reg-name>
         </host>
      </authority>
      <path-abempty>
         <segment>2023</segment>
         <segment>Program.html</segment>
      </path-abempty>
   </hier-part>
   <fragment>W1330</fragment>
</URI>
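
To make the effect of the "-" mark on a nonterminal concrete, here is a minimal Python sketch that imitates it after the fact: elements whose names are hidden disappear from the output, but their content is kept. This is only an illustration. The project applies marks to the grammar through the stylesheet, not to the parse result, and the function name `apply_hidden_marks` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_hidden_marks(elem, hidden):
    """Serialize elem, dropping the tags (but not the content) of hidden elements.

    Illustration only: attributes are ignored, and the real tool marks the
    grammar rather than rewriting the parse output.
    """
    inner = escape(elem.text or "") + "".join(
        apply_hidden_marks(child, hidden) + escape(child.tail or "")
        for child in elem
    )
    if elem.tag in hidden:
        return inner  # behaves like a "-" mark: no element, content kept
    return f"<{elem.tag}>{inner}</{elem.tag}>"


# A compact version of the first <segment> from the unmarked parse above.
RAW = (
    "<segment>"
    "<pchar><unreserved>2</unreserved></pchar>"
    "<pchar><unreserved>0</unreserved></pchar>"
    "<pchar><unreserved>2</unreserved></pchar>"
    "<pchar><unreserved>3</unreserved></pchar>"
    "</segment>"
)

print(apply_hidden_marks(ET.fromstring(RAW), {"pchar", "unreserved"}))
# prints: <segment>2023</segment>
```

Marking a token with "-" is the analogous operation for literals: the matched text, such as ":" or "//", is simply omitted from the output.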

The start symbol

In iXML, the first rule defined in a grammar must be used as the start symbol. That’s not always convenient for grammars converted from ABNF. The grammar for URIs, for example, includes several nonterminals not reachable from the first rule in the grammar. Using the converted grammar directly, you can’t parse an absolute path such as /uri/path.

CoffeePot provides a --start-symbol option that allows you to select an alternate start symbol:

coffeepot -g:uri.vxml --start-symbol:path --pretty-print /uri/path

This will produce:

<path xmlns:ixml='http://invisiblexml.org/NS' ixml:state='ambiguous'>
   <path-absolute>
      <segment-nz>uri</segment-nz>
      <segment>path</segment>
   </path-absolute>
</path>

Specifying an alternate start symbol is nonconformant behavior. It might be better to provide a new top-level nonterminal that makes all of the possibilities reachable.
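
One way to find candidates for such a top-level nonterminal is a reachability check over the grammar. The Python sketch below is not part of this repository. It assumes the grammar has already been reduced to a mapping from each rule name to the names it references, uses a rule table abridged by hand from the RFC 3986 URI grammar, and invents the rule name `any-uri` purely for illustration.

```python
from collections import deque


def unreachable_roots(rules):
    """Return rules that are unreachable from the first rule and that no
    other rule references. (Cycles among unreachable rules are not handled.)
    """
    start = next(iter(rules))  # iXML treats the first rule as the start symbol
    seen = {start}
    queue = deque([start])
    while queue:
        for ref in rules.get(queue.popleft(), ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    referenced = {ref for refs in rules.values() for ref in refs}
    return [name for name in rules if name not in seen and name not in referenced]


# Abridged by hand; rules that only reference terminals are left out.
URI_RULES = {
    "URI": {"scheme", "hier-part", "query", "fragment"},
    "hier-part": {"authority", "path-abempty", "path-absolute",
                  "path-rootless", "path-empty"},
    "URI-reference": {"URI", "relative-ref"},
    "absolute-URI": {"scheme", "hier-part", "query"},
    "relative-ref": {"relative-part", "query", "fragment"},
    "relative-part": {"authority", "path-abempty", "path-absolute",
                      "path-noscheme", "path-empty"},
    "path": {"path-abempty", "path-absolute", "path-noscheme",
             "path-rootless", "path-empty"},
}

roots = unreachable_roots(URI_RULES)
# "any-uri" is a hypothetical name for the new first rule, in ABNF syntax.
print("any-uri = " + " / ".join(roots))
# prints: any-uri = URI-reference / absolute-URI / path
```

A rule like this would need to be placed first in the ABNF source so that it becomes the start symbol after conversion. Because the alternatives overlap, the combined grammar is likely to be ambiguous for many inputs.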
