GitHub - milasudril/coin: Compact Infoset Notation



{!:This file demonstrates the CoIN file format}{html:
	{head:
		{meta@charset=UTF-8:}
		{title:CoIN: Compact Infoset Notation}
	}
	{body:
		{h1:Compact Infoset Notation}
		{p:Compact Infoset Notation, or CoIN, is a way of writing XML-like documents in a more compact form. CoIN allows smaller and more readable files to be written than XML does, while maintaining similar markup features. The following table shows a comparison between CoIN and XML.}

		{table:
			{tr:{th:XML}{:CoIN}}
			{!:A tag name can be omitted to repeat the previous tag at the same level. Comments do not affect the previous tag.}
			{:{td:{code:<element>content</element>}}                               {:{code:\{element:{var:content}\}}}}
			{:{td:{code:<element foo="a" bar="b">content</element>}}               {:{code:\{element@foo=a;bar=b:{var:content}\}}}}
			{:{td:{code:<element />}}                                              {:{code:\{element:\}}}}
			{:{td:{code:<element foo="1">a</element><element foo="1">b</element>}} {:{code:\{element@foo=1:a\}\{:b\}}}}
			{:{td:{code:<!--comment-->}}                                           {:{code:\{!:comment\}}}}
			{:{td:{code:<![CDATA[character data]]>}}                               {:Not needed, use the general escape character}}
			{:{td:Different escape sequences ({code:&lt;} et al.)}                 {:{code:\\{var:reserved character}}}}
			{:{td:Custom character entities}                                       {:Feature not present}}
			{:{td:Stylesheet or DTD references}                                    {:Feature not present}}
		}

		{p:There are three differences between CoIN and XML that are worth mentioning. First, CoIN uses braces as tag markers instead of < and >. Second, CoIN takes advantage of the fact that a "well-formed" document must not have interleaved elements, and uses an end brace to terminate the current element. The content of the element is written between {code::} and the end brace ({code:\}}). This results in smaller and less verbose documents. Third, CoIN uses a general escape mechanism instead of various escape constructs.}

		{p:Not repeating the element name for the end tag saves some space. In CoIN, it is also possible to use element repetition when the previous tag is identical to the current tag (this includes its name {em:and} its attributes).}

		{h2:Compatibility notes}
		{p:By construction, CoIN syntax allows for arbitrary tag and attribute names. It is however good practice to keep tag and attribute names valid XML. A future reference implementation may issue a warning when a name would not be valid XML.}

		{p:Whitespace at the beginning of an input file is ignored. Otherwise, this implementation does not do any whitespace normalization. One reason for this is that no whitespace character is ever used as a token delimiter. Another reason is that normalization would require a set of rules to be written, and it may limit what can be stored. In particular, CoIN has no problem storing a newline character inside an attribute.}

		{h2:Rules for writing CoIN documents}

		{p:While CoIN is quite liberal, there are some rules that must be followed, in order for a CoIN document to be valid.}

		{ul:
			{li:{p:An attribute list cannot be empty. This means that the following is illegal:}
				{pre:\{foo@:\}}
				{p:{strong:Reason:} It looks ugly and is redundant}
			}
			{li:An attribute name must not be empty}
			{:{p:Element begin and end markers must be escaped wherever they are not used as part of the syntax, including within tags:}
				{:{strong:Legal:}   {code:\{foo@\\\{bar\\\}=\\\{hello\\\}:\}}}
				{:{strong:Illegal:} {code:\{foo@\{bar\}=\{hello\}:\}}}
				{:{strong:Reason:} It would have meant that a new element begins within the attribute list, which is not possible.}
			}
			{:{p:An empty element must have a content separator. This means that the following is illegal:}
				{pre:\{foo\}}
				{p:{strong:Reason:} Support for this notation requires extra rules in the tokenizer}
			}
		}
	}
}
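
The comparison table and the rules above can be illustrated with a small converter. The following is a minimal sketch in Python, not the reference implementation from this repository: the names coin_to_xml and CoinError are hypothetical, and the grammar is only what this readme describes (tag name, optional @key=value;key=value list, the : content separator, \ as the general escape, {!:...} comments, and {:...} to repeat the previous tag at the same level). Details the readme does not state, such as whether a trailing ; is allowed in an attribute list, are assumptions of the sketch.

```python
from xml.sax.saxutils import escape, quoteattr


class CoinError(ValueError):
    """Raised when the input breaks one of the rules listed in the readme."""


def coin_to_xml(src):
    """Convert CoIN text to XML text (illustrative sketch only)."""
    src = src.lstrip()  # whitespace at the beginning of the input is ignored
    pos = 0

    def read_until(stops):
        """Return (text, stop) up to the next unescaped character in stops."""
        nonlocal pos
        out = []
        while pos < len(src):
            ch = src[pos]
            pos += 1
            if ch == "\\":  # general escape: the next character is literal
                if pos == len(src):
                    raise CoinError("dangling escape character")
                out.append(src[pos])
                pos += 1
            elif ch in stops:
                return "".join(out), ch
            else:
                out.append(ch)
        return "".join(out), ""  # end of input reached

    def parse_tag():
        """Parse 'name' or 'name@k=v;k=v' up to the ':' content separator."""
        name, stop = read_until("@:{}")
        attrs = []
        if stop == "@":
            while stop != ":":
                key, stop = read_until("=;:{}")
                if stop != "=" or not key:
                    raise CoinError("empty or malformed attribute")
                value, stop = read_until(";:{}")
                if stop not in (";", ":"):
                    raise CoinError("unterminated attribute list")
                attrs.append((key, value))
        elif stop != ":":
            raise CoinError("missing content separator ':'")
        return name, attrs

    def parse_content(top, raw=False):
        """Parse text and child elements up to the closing brace."""
        parts = []
        prev = None  # previous non-comment tag at this level
        while True:
            text, stop = read_until("{}")
            parts.append(text if raw else escape(text))
            if stop != "{":
                # top level must end at end of input, nested levels at '}'
                if (stop == "") != top:
                    raise CoinError("unbalanced braces")
                return "".join(parts)
            name, attrs = parse_tag()
            if name == "!":  # comments do not affect the previous tag
                parts.append("<!--" + parse_content(False, raw=True) + "-->")
                continue
            if name == "":  # element repetition
                if prev is None or attrs:
                    raise CoinError("no previous tag to repeat")
                name, attrs = prev
            prev = (name, attrs)
            head = name + "".join(f" {k}={quoteattr(v)}" for k, v in attrs)
            body = parse_content(False, raw)
            parts.append(f"<{head}>{body}</{name}>" if body else f"<{head} />")

    return parse_content(True)
```

Under these assumptions, coin_to_xml("{element@foo=1:a}{:b}") yields the two XML elements shown in the repetition row of the table, while the illegal forms from the rule list ({foo@:}, {foo}, and unescaped braces inside an attribute list) raise CoinError. The sketch does not check that tag and attribute names are valid XML.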