<?xml version="1.0"?>
<!DOCTYPE book
PUBLIC "-//OASIS//DTD DocBook V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book>
<bookinfo>
<title>
YAML&nbsp;Ain<q/>t&nbsp;Markup&nbsp;Language&nbsp;(<trademark>YAML</trademark>)
Version&nbsp;1.2
</title>
<subtitle>
Working&nbsp;Draft&nbsp;2009-03-20
</subtitle>
<authorgroup>
<author>
<firstname>Oren</firstname>
<surname>Ben-Kiki</surname>
<email>oren@ben-kiki.org</email>
</author>
<author>
<firstname>Clark</firstname>
<surname>Evans</surname>
<email>cce@clarkevans.com</email>
</author>
<author>
<firstname>Ingy</firstname>
<surname>d&ouml;t Net</surname>
<email>ingy@ttul.org</email>
</author>
</authorgroup>
<copyright>
<year>2001-2009</year>
<holder>Oren Ben-Kiki<fo>,&nbsp;</fo></holder>
<holder>Clark Evans<fo>,&nbsp;</fo></holder>
<holder>Ingy d&ouml;t Net</holder>
</copyright>
<legalnotice>
<fo>&nbsp;<sbr/>&nbsp;<sbr/></fo>
This document may be freely copied, provided it is not modified.
</legalnotice>
<releaseinfo id="releaseinfo">
<emphasis>This version:</emphasis><sbr/>
&nbsp;&nbsp;HTML: <ulink url="http://yaml.org/spec/cvs/current.html"/><sbr/>
&nbsp;&nbsp;PDF: <ulink url="http://yaml.org/spec/cvs/current.pdf"/><sbr/>
&nbsp;&nbsp;PS: <ulink url="http://yaml.org/spec/cvs/current.ps"/>
</releaseinfo>
<abstract>
<title>Status of this Document</title>
<para>
This specification is a "last call for comments" prior to finalizing the
YAML specification. It reflects the consensus reached by members of the
yaml-core mailing list at <ulink
url="http://lists.sourceforge.net/lists/listinfo/yaml-core"/>. Any
questions regarding this draft should be raised on this list.
</para>
<para>
We wish to thank implementers, who have tirelessly tracked earlier
versions of this specification, as well as our fabulous user community
whose feedback has both validated and clarified our direction. We ask
them to review this document for any last-minute corrections.
</para>
</abstract>
<abstract>
<title>Abstract</title>
<para>
<trademark>YAML</trademark> (rhymes with <quote>camel</quote>) is a
human-friendly, cross-language, Unicode-based data serialization
language designed around the common native data types of agile
programming languages. It is broadly useful for programming needs
ranging from configuration files to Internet messaging to object
persistence to data auditing. Together with the <ulink
url="http://www.unicode.org/">Unicode standard for characters</ulink>,
this specification provides all the information necessary to understand
YAML Version 1.2 and to create programs that process YAML information.
</para>
</abstract>
</bookinfo>
<chapter id="Introduction">
<title>Introduction</title>
<para>
<quote>YAML Ain<q/>t Markup Language</quote> (abbreviated YAML) is a data
serialization language designed to be human-friendly and work well with
modern programming languages for common everyday tasks. This
specification is both an introduction to the YAML language and the
concepts supporting it, and also a complete specification of the
information needed to develop <refterm
primary="application">applications</refterm> for processing YAML.
</para>
<para>
Open, interoperable and readily understandable tools have advanced
computing immensely. YAML was designed from the start to be useful and
friendly to people working with data. It uses Unicode <refterm
primary="printable character">printable</refterm> characters, <refterm
primary="indicator">some</refterm> of which provide structural
information while the rest contain the data itself. YAML achieves a
unique cleanness by minimizing the number of structural characters and
allowing the data to show itself in a natural and meaningful way. For
example, <refterm primary="space"
secondary="indentation">indentation</refterm> may be used for structure,
<refterm primary=": mapping value">colons</refterm> separate <refterm
primary="key: value pair" >key:&nbsp;value pairs</refterm>, and <refterm
primary="- block sequence entry">dashes</refterm> are used to create
<quote>bullet</quote> <refterm primary="sequence">lists</refterm>.
</para>
<para>
There are myriad flavors of <refterm primary="native data structure">data
structures</refterm>, but they can all be adequately <refterm
primary="represent">represented</refterm> with three basic primitives:
<refterm primary="mapping">mappings</refterm> (hashes/dictionaries),
<refterm primary="sequence">sequences</refterm> (arrays/lists) and
<refterm primary="scalar">scalars</refterm> (strings/numbers). YAML
leverages these primitives, and adds a simple typing system and <refterm
primary="alias">aliasing</refterm> mechanism to form a complete language
for <refterm primary="serialize">serializing</refterm> any <refterm
primary="native data structure">native data structure</refterm>. While
most programming languages can use YAML for data serialization, YAML
excels in working with those languages that are fundamentally built
around the three basic primitives. These include the new wave of agile
languages such as Perl, Python, PHP, Ruby, and JavaScript.
</para>
<para>
There are hundreds of different languages for programming, but only a
handful of languages for storing and transferring data. Even though its
potential is virtually boundless, YAML was specifically created to work
well for common use cases such as configuration files, log files,
interprocess messaging, cross-language data sharing, object persistence,
and debugging of complex data structures. When data is easy to view and
understand, programming becomes a simpler task.
</para>
<sect1>
<title>Goals</title>
<para>
The design goals for YAML are, in decreasing priority:
</para>
<orderedlist>
<listitem>
YAML is easily readable by humans.
</listitem>
<listitem>
YAML matches the <refterm primary="native data structure">native data
structures</refterm> of agile languages.
</listitem>
<listitem>
YAML data is portable between programming languages.
</listitem>
<listitem>
YAML has a consistent model to support generic tools.
</listitem>
<listitem>
YAML supports one-pass processing.
</listitem>
<listitem>
YAML is expressive and extensible.
</listitem>
<listitem>
YAML is easy to implement and use.
</listitem>
</orderedlist>
</sect1>
<sect1>
<title>Prior Art</title>
<keep-together>
<para>
YAML<q/>s initial direction was set by the data serialization and
markup language discussions among <ulink
url="http://www.docuverse.com/smldev/">SML-DEV members</ulink>. Later
on, it directly incorporated experience from Ingy d&ouml;t Net<q/>s
Perl module <ulink
url="http://search.cpan.org/doc/INGY/Data-Denter-0.13/Denter.pod"
>Data::Denter</ulink>. Since then, YAML has matured through ideas and
support from its user community.
</para>
<para>
YAML integrates and builds upon concepts described by <ulink
url="http://cm.bell-labs.com/cm/cs/cbook/index.html">C</ulink>,
<ulink url="http://java.sun.com/">Java</ulink>, <ulink
url="http://www.perl.org/">Perl</ulink>, <ulink
url="http://www.python.org/">Python</ulink>, <ulink
url="http://www.ruby-lang.org/">Ruby</ulink>, <ulink
url="http://www.ietf.org/rfc/rfc0822.txt">RFC0822</ulink> (MAIL),
<ulink
url="http://www.ics.uci.edu/pub/ietf/html/rfc1866.txt">RFC1866</ulink>
(HTML), <ulink
url="http://www.ietf.org/rfc/rfc2045.txt">RFC2045</ulink> (MIME),
<ulink url="http://www.ietf.org/rfc/rfc2396.txt">RFC2396</ulink>
(URI), <ulink url="http://www.w3.org/TR/REC-xml.html">XML</ulink>,
<ulink url="http://www.saxproject.org/">SAX</ulink>, <ulink
url="http://www.w3.org/TR/SOAP">SOAP</ulink>, and <ulink
url="http://www.json.org/">JSON</ulink>.
</para>
<para>
The syntax of YAML was motivated by Internet Mail (RFC0822) and
remains partially compatible with that standard. Further, borrowing
from MIME (RFC2045), YAML<q/>s top-level production is a <refterm
primary="stream">stream</refterm> of independent <refterm
primary="document">documents</refterm>, ideal for message-based
distributed processing systems.
</para>
<para>
YAML<q/>s <refterm primary="space"
secondary="indentation">indentation</refterm>-based scoping is
similar to Python<q/>s (without the ambiguities caused by <refterm
primary="tab">tabs</refterm>). <refterm primary="style"
secondary="block">Indented blocks</refterm> facilitate easy
inspection of the data<q/>s structure. YAML<q/>s <refterm
primary="style" secondary="block" tertiary="literal">literal
style</refterm> leverages this by enabling formatted text to be
cleanly mixed within an <refterm primary="space"
secondary="indentation">indented</refterm> structure without
troublesome <refterm primary="escaping" secondary="in double-quoted
scalars">escaping</refterm>. YAML also allows the use of traditional
<refterm primary="indicator">indicator</refterm>-based scoping
similar to JSON<q/>s and Perl<q/>s. Such <refterm primary="style"
secondary="flow">flow content</refterm> can be freely nested inside
<refterm primary="style" secondary="block">indented blocks</refterm>.
</para>
<para>
YAML<q/>s <refterm primary="style" secondary="flow"
tertiary="double-quoted">double-quoted style</refterm> uses familiar
C-style <refterm primary="escaping" secondary="in double-quoted
scalars">escape sequences</refterm>. This enables ASCII encoding of
non-<refterm primary="printable character">printable</refterm> or
8-bit (ISO 8859-1) characters such as <link
linkend="ns-esc-8-bit"><uquote>\x3B</uquote></link>. Non-<refterm
primary="printable character">printable</refterm> 16-bit Unicode and
32-bit (ISO/IEC 10646) characters are supported with <refterm
primary="escaping" secondary="in double-quoted scalars">escape
sequences</refterm> such as <link
linkend="ns-esc-16-bit"><uquote>\u003B</uquote></link> and <link
linkend="ns-esc-32-bit"><uquote>\U0000003B</uquote></link>.
</para>
<para>
Motivated by HTML<q/>s end-of-line normalization, YAML<q/>s <refterm
primary="line folding">line folding</refterm> employs an intuitive
method of handling <refterm primary="line break">line
breaks</refterm>. A single <refterm primary="line break">line
break</refterm> is <refterm primary="line folding">folded</refterm>
into a single <refterm primary="space">space</refterm>, while
<refterm primary="empty line">empty lines</refterm> are interpreted
as <refterm primary="line break">line break</refterm> characters.
This technique allows for paragraphs to be word-wrapped without
affecting the <refterm primary="scalar" secondary="canonical
form">canonical form</refterm> of the <refterm
primary="scalar">scalar content</refterm>.
</para>
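<para>
A minimal sketch of this behavior, using a double-quoted scalar with
purely illustrative text: the single line break after the word
"wrapped" is folded into a space, while the empty line yields one line
break. The resulting content is the same as that of the single-line
scalar "These words are wrapped onto two lines.\nThis starts a new line."
</para>
<example>
<title>Line Folding in a Quoted Scalar</title>
<programlisting>"These words are wrapped<sbr/>
 onto two lines.

 This starts a new line."
</programlisting>
</example>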
<para>
YAML<q/>s core type system is based on the requirements of agile
languages such as Perl, Python, and Ruby. YAML directly supports both
<refterm primary="collection">collections</refterm> (<refterm
primary="mapping">mappings</refterm>, <refterm
primary="sequence">sequences</refterm>) and <refterm
primary="scalar">scalars</refterm>. Support for these common types
enables programmers to use their language<q/>s <refterm
primary="native data structure">native data structures</refterm> for
YAML manipulation, instead of requiring a special document object
model (DOM).
</para>
<para>
Like XML<q/>s SOAP, YAML supports <refterm
primary="serialize">serializing</refterm> a graph of <refterm
primary="native data structure">native data structures</refterm>
through an <refterm primary="alias">aliasing</refterm> mechanism.
Also like SOAP, YAML provides for <refterm
primary="application">application</refterm>-defined <refterm
primary="tag">types</refterm>. This allows YAML to <refterm
primary="represent">represent</refterm> rich data structures required
for modern distributed computing. YAML provides globally unique
<refterm primary="tag" secondary="global">type names</refterm> using
a namespace mechanism inspired by Java<q/>s DNS-based package naming
convention and XML<q/>s URI-based namespaces. In addition, YAML
allows for private <refterm primary="tag"
secondary="local">types</refterm> specific to a single <refterm
primary="application">application</refterm>.
</para>
<para>
YAML was designed to support incremental interfaces that include both
input (<uquote>getNextEvent()</uquote>) and output
(<uquote>sendNextEvent()</uquote>) one-pass interfaces. Together,
these enable YAML to support the processing of large <refterm
primary="document">documents</refterm> (e.g. transaction logs) or
continuous <refterm primary="stream">streams</refterm> (e.g. feeds
from a production machine).
</para>
</keep-together>
</sect1>
<sect1>
<title>Relation to XML</title>
<para>
Newcomers to YAML often search for its correlation to the eXtensible
Markup Language (XML). Although the two languages may actually compete
in several application domains, there is no direct correlation between
them.
</para>
<para>
YAML is primarily a data serialization language. XML was designed to be
backwards compatible with the Standard Generalized Markup Language
(SGML), which was designed to support structured documentation. XML
therefore had many design constraints placed on it that YAML does not
share. XML is a pioneer in many domains; YAML is the result of lessons
learned from XML and other technologies.
</para>
<para>
It should be mentioned that there are ongoing efforts to define
standard XML/YAML mappings. This generally requires that a subset of
each language be used. For more information on using both XML and YAML,
please visit <ulink url="http://yaml.org/xml"/>.
</para>
</sect1>
<sect1>
<title>Relation to JSON</title>
<para>
Both JSON and YAML aim to be human-readable data interchange formats.
However, JSON and YAML have different priorities. JSON<q/>s foremost
design goals are simplicity and universality. Thus, JSON is trivial to
generate and parse, at the cost of reduced human readability. It also
uses a lowest common denominator information model, ensuring any JSON
data can be easily processed by every modern programming environment.
</para>
<para>
In contrast, YAML<q/>s foremost design goals are human readability and
support for <refterm primary="serialize">serializing</refterm>
arbitrary <refterm primary="native data structure">native data
structures</refterm>. Thus, YAML allows for extremely readable files,
but is more complex to generate and parse. In addition, YAML ventures
beyond the lowest common denominator data types, requiring more complex
processing when crossing between different programming environments.
</para>
<para>
YAML can therefore be viewed as a natural superset of JSON, offering
improved human readability and a more complete information model. This
is also the case in practice; every JSON file is also a valid YAML
file. This makes it easy to migrate from JSON to YAML if/when the
additional features are required.
</para>
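<para>
For instance, the following JSON text (the keys and values are
illustrative only) can also be read, without modification, as a YAML
document containing a single flow mapping.
</para>
<example>
<title>JSON Text Read as YAML</title>
<programlisting>{<sbr/>
  "name": "Mark McGwire",
  "hr": 65,
  "avg": 0.278
}
</programlisting>
</example>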
<para>
It may be useful to define an intermediate format between YAML and JSON.
Such a format would be trivial to parse (but not very human readable),
like JSON. At the same time, it would allow for <refterm
primary="serialize">serializing</refterm> arbitrary <refterm
primary="native data structure">native data structures</refterm>, like
YAML. Such a format might also serve as YAML<q/>s "canonical format".
</para>
<para>
Defining such a <quote>YSON</quote> format (YSON is a Serialized Object
Notation) can be done either by enhancing the JSON specification or by
restricting the YAML specification. Such a definition is beyond the
scope of this specification.
</para>
</sect1>
<sect1>
<title>Terminology</title>
<keep-together>
<para>
This specification uses key words based on <ulink
url="http://www.ietf.org/rfc/rfc2119.txt">RFC2119</ulink> to indicate
requirement level. In particular, the following words are used to
describe the actions of a YAML <refterm
primary="processor">processor</refterm>:
</para>
<variablelist>
<varlistentry>
<term>May</term>
<listitem>
The word <emphasis>may</emphasis>, or the adjective
<emphasis>optional</emphasis>, mean that conforming YAML <refterm
primary="processor">processors</refterm> are permitted to, but
<defterm primary="need not">need not</defterm> behave as
described.
</listitem>
</varlistentry>
<varlistentry>
<term>Should</term>
<listitem>
The word <emphasis>should</emphasis>, or the adjective
<emphasis>recommended</emphasis>, mean that there could be
reasons for a YAML <refterm
primary="processor">processor</refterm> to deviate from the
behavior described, but that such deviation could hurt
interoperability and should therefore be advertised with
appropriate notice.
</listitem>
</varlistentry>
<varlistentry>
<term>Must</term>
<listitem>
The word <emphasis>must</emphasis>, or the term <defterm
primary="required">required</defterm> or <defterm
primary="shall">shall</defterm>, mean that the behavior described
is an absolute requirement of the specification.
</listitem>
</varlistentry>
</variablelist>
</keep-together>
</sect1>
<para>
The rest of this document is arranged as follows. Chapter <link
linkend="Preview">2</link> provides a short preview of the main YAML
features. Chapter <link linkend="Processing">3</link> describes the YAML
information model, and the processes for converting from and to this
model and the YAML text format. The bulk of the document, chapters <link
linkend="Syntax">4</link> through <link linkend="YAML">9</link>, formally
define this text format. Finally, chapter <link
linkend="Syntax">10</link> recommends basic YAML schemas.
</para>
</chapter>
<chapter id="Preview">
<title>Preview</title>
<para>
This section provides a quick glimpse into the expressive power of YAML.
It is not expected that the first-time reader grok all of the examples.
Rather, these selections are used as motivation for the remainder of the
specification.
</para>
<sect1>
<title>Collections</title>
<keep-together>
<para>
YAML<q/>s <refterm primary="style" secondary="block"
tertiary="collection">block collections</refterm> use <refterm
primary="space" secondary="indentation">indentation</refterm> for
scope and begin each entry on its own line. <refterm primary="style"
secondary="block" tertiary="sequence">Block sequences</refterm>
indicate each entry with a dash and space&nbsp;( <refterm primary="-
block sequence entry"><uquote>-&nbsp;</uquote></refterm>). <refterm
primary="mapping">Mappings</refterm> use a colon and
space&nbsp;(<refterm primary=": mapping
value"><uquote>:&nbsp;</uquote></refterm>) to mark each <refterm
primary="key: value pair">key:&nbsp;value pair</refterm>.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>
Sequence of Scalars<sbr/>
(ball players)
</title>
<programlisting>- Mark McGwire<sbr/>
- Sammy Sosa
- Ken Griffey
</programlisting>
</example>
</member>
<member>
<example>
<title>
Mapping Scalars to Scalars<sbr/>
(player statistics)
</title>
<programlisting>hr: 65 # Home runs<sbr/>
avg: 0.278 # Batting average
rbi: 147 # Runs Batted In
</programlisting>
</example>
</member>
<member>
<example>
<title>
Mapping Scalars to Sequences<sbr/>
(ball clubs in each league)
</title>
<programlisting>american:<sbr/>
- Boston Red Sox
- Detroit Tigers
- New York Yankees
national:
- New York Mets
- Chicago Cubs
- Atlanta Braves
</programlisting>
</example>
</member>
<member>
<example>
<title>
Sequence of Mappings<sbr/>
(players<q/> statistics)
</title>
<programlisting>-<sbr/>
  name: Mark McGwire
  hr:   65
  avg:  0.278
-
  name: Sammy Sosa
  hr:   63
  avg:  0.288
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
<keep-together>
<para>
YAML also has <refterm primary="style" secondary="flow">flow
styles</refterm>, using explicit <refterm
primary="indicator">indicators</refterm> rather than <refterm
primary="space" secondary="indentation">indentation</refterm> to
denote scope. The <refterm primary="style" secondary="flow"
tertiary="sequence">flow sequence</refterm> is written as a <refterm
primary=", end flow entry">comma</refterm> separated list within
<refterm primary="[ start flow sequence">square</refterm> <refterm
primary="] end flow sequence">brackets</refterm>. In a similar
manner, the <refterm primary="style" secondary="flow"
tertiary="mapping">flow mapping</refterm> uses <refterm primary="{
start flow mapping">curly</refterm> <refterm primary="} end flow
mapping">braces</refterm>.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>Sequence of Sequences</title>
<programlisting>- [name , hr, avg ]<sbr/>
- [Mark McGwire, 65, 0.278]
- [Sammy Sosa , 63, 0.288]
</programlisting>
</example>
</member>
<member>
<example>
<title>Mapping of Mappings</title>
<programlisting>Mark McGwire: {hr: 65, avg: 0.278}<sbr/>
Sammy Sosa: {
    hr: 63,
    avg: 0.288
  }
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
</sect1>
<sect1>
<title>Structures</title>
<keep-together>
<para>
YAML uses three dashes&nbsp;(<refterm primary="marker"
secondary="directives end"><uquote>---</uquote></refterm>) to
separate <refterm primary="directive">directives</refterm> from
<refterm primary="document">document</refterm> <refterm
primary="content">content</refterm>. This also serves to signal the
start of a document if no <refterm
primary="directive">directives</refterm> are present. Three
dots&nbsp;( <refterm primary="marker" secondary="document
end"><uquote>...</uquote></refterm>) indicate the end of a document
without starting a new one, for use in communication channels.
<refterm primary="comment">Comments</refterm> begin with an
octothorpe (also called a <quote>hash</quote>, <quote>sharp</quote>,
<quote>pound</quote>, or <quote>number sign</quote> - <refterm
primary="# comment"> <uquote>#</uquote></refterm>).
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>
Two Documents in a Stream<sbr/>
(each with a leading comment)
</title>
<programlisting># Ranking of 1998 home runs<sbr/>
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey
# Team ranking
---
- Chicago Cubs
- St Louis Cardinals
</programlisting>
</example>
</member>
<member>
<example>
<title>
Play by Play Feed<sbr/>
from a Game
</title>
<programlisting>---<sbr/>
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
---
time: 20:03:47
player: Sammy Sosa
action: grand slam
...
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
<keep-together>
<para>
Repeated <refterm primary="node">nodes</refterm> (objects) are first
<refterm primary="alias" secondary="identified">identified</refterm>
by an <refterm primary="anchor">anchor</refterm> (marked with the
ampersand&nbsp;-&nbsp;<refterm primary="&amp;
anchor"><uquote>&amp;</uquote></refterm>), and are then <refterm
primary="alias">aliased</refterm> (referenced with an
asterisk&nbsp;-&nbsp;<refterm primary="*
alias"><uquote>*</uquote></refterm>) thereafter.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>
Single Document with<sbr/>
Two Comments
</title>
<programlisting>---<sbr/>
hr: # 1998 hr ranking
- Mark McGwire
- Sammy Sosa
rbi:
# 1998 rbi ranking
- Sammy Sosa
- Ken Griffey
</programlisting>
</example>
</member>
<member>
<example>
<title>
Node for <uquote>Sammy Sosa</uquote><sbr/>
appears twice in this document
</title>
<programlisting>---<sbr/>
hr:
- Mark McGwire
# Following node labeled SS
- &amp;SS Sammy Sosa
rbi:
- *SS # Subsequent occurrence
- Ken Griffey
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
<keep-together>
<para>
A question mark and space&nbsp;(<refterm primary="? mapping
key"><uquote>?&nbsp;</uquote></refterm>) indicate a complex <refterm
primary="mapping">mapping</refterm> <refterm
primary="key">key</refterm>. Within a <refterm primary="style"
secondary="block" tertiary="collection">block collection</refterm>,
<refterm primary="key: value pair">key:&nbsp;value pairs</refterm>
can start immediately following the <refterm primary="- block
sequence entry">dash</refterm>, <refterm primary=": mapping
value">colon</refterm>, or <refterm primary="? mapping key">question
mark</refterm>.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>Mapping between Sequences</title>
<programlisting>? - Detroit Tigers<sbr/>
  - Chicago Cubs
:
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]
</programlisting>
</example>
</member>
<member>
<example>
<title>In-Line Nested Mapping</title>
<programlisting>---<sbr/>
# Products purchased
- item    : Super Hoop
  quantity: 1
- item    : Basketball
  quantity: 4
- item    : Big Shoes
  quantity: 1
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
</sect1>
<sect1>
<title>Scalars</title>
<keep-together>
<para>
<refterm primary="scalar">Scalar content</refterm> can be written in
<refterm primary="style" secondary="block">block</refterm> form,
using a <refterm primary="style" secondary="block"
tertiary="literal">literal style</refterm>&nbsp;(indicated by
<refterm primary="| literal style"><uquote>|</uquote></refterm>)
where all <refterm primary="line break">line breaks</refterm> are
significant. Alternatively, they can be written with the <refterm
primary="style" secondary="block" tertiary="folded">folded
style</refterm>&nbsp;<refterm primary="&gt; folded style">(denoted by
<uquote>&gt;</uquote></refterm>) where each <refterm primary="line
break">line break</refterm> is <refterm primary="line
folding">folded</refterm> to a <refterm
primary="space">space</refterm> unless it ends an <refterm
primary="empty line">empty</refterm> or a <refterm
primary="more-indented">more-indented</refterm> line.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>
In literals,<sbr/>
newlines are preserved
</title>
<programlisting># ASCII Art<sbr/>
--- |
  \//||\/||
  // ||  ||__
</programlisting>
</example>
</member>
<member>
<example>
<title>
In the folded scalars,<sbr/>
newlines become spaces
</title>
<programlisting>--- &gt;<sbr/>
  Mark McGwire's
  year was crippled
  by a knee injury.
</programlisting>
</example>
</member>
<member>
<example>
<title>
Folded newlines are preserved<sbr/>
for "more indented" and blank lines
</title>
<programlisting>&gt;<sbr/>
 Sammy Sosa completed another
 fine season with great stats.

   63 Home Runs
   0.288 Batting Average

 What a year!
</programlisting>
</example>
</member>
<member>
<example>
<title>
Indentation determines scope<sbr/>
&nbsp;
</title>
<programlisting>name: Mark McGwire<sbr/>
accomplishment: &gt;
  Mark set a major league
  home run record in 1998.
stats: |
  65 Home Runs
  0.278 Batting Average
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
<keep-together>
<para>
YAML<q/>s <refterm primary="style" secondary="flow"
tertiary="scalar">flow scalars</refterm> include the <refterm
primary="style" secondary="flow" tertiary="plain">plain
style</refterm> (most examples thus far) and two quoted styles. The
<refterm primary="style" secondary="flow"
tertiary="double-quoted">double-quoted style</refterm> provides
<refterm primary="escaping" secondary="in double-quoted
scalars">escape sequences</refterm>. The <refterm primary="style"
secondary="flow" tertiary="single-quoted">single-quoted
style</refterm> is useful when <refterm primary="escaping"
secondary="in double-quoted scalars">escaping</refterm> is not
needed. All <refterm primary="style" secondary="flow"
tertiary="scalar">flow scalars</refterm> can span multiple lines;
<refterm primary="line break">line breaks</refterm> are always
<refterm primary="line folding">folded</refterm>.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>Quoted Scalars</title>
<programlisting>unicode: "Sosa did fine.\u263A"<sbr/>
control: "\b1998\t1999\t2000\n"
hex esc: "\x0d\x0a is \r\n"
single: '"Howdy!" he cried.'
quoted: ' # Not a ''comment''.'
tie-fighter: '|\-*-/|'
</programlisting>
</example>
</member>
<member>
<example>
<title>Multi-line Flow Scalars</title>
<programlisting>plain:<sbr/>
  This unquoted scalar
  spans many lines.

quoted: "So does this
  quoted scalar.\n"
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
</sect1>
<sect1>
<title>Tags</title>
<keep-together>
<para>
In YAML, <refterm primary="tag" secondary="non-specific">untagged
nodes</refterm> are given a type depending on the <refterm
primary="application">application</refterm>. The examples in this
specification generally use the <refterm primary="tag"
secondary="repository" tertiary="seq">
<userinput>seq</userinput></refterm>, <refterm primary="tag"
secondary="repository" tertiary="map">
<userinput>map</userinput></refterm> and <refterm primary="tag"
secondary="repository" tertiary="str">
<userinput>str</userinput></refterm> types from the <refterm
primary="schema" secondary="failsafe">failsafe schema</refterm>. A
few examples also use the <refterm primary="tag"
secondary="repository"
tertiary="int"><userinput>int</userinput></refterm>, <refterm
primary="tag" secondary="repository"
tertiary="float"><userinput>float</userinput></refterm>, and <refterm
primary="tag" secondary="repository"
tertiary="null"><userinput>null</userinput></refterm> types from the
<refterm primary="schema" secondary="JSON">JSON schema</refterm>. The
<refterm primary="tag" secondary="repository">repository</refterm>
includes additional types such as <ulink
url="http://yaml.org/type/binary.html"
><userinput>binary</userinput></ulink>, <ulink
url="http://yaml.org/type/omap.html"><userinput>omap</userinput></ulink>,
<ulink
url="http://yaml.org/type/set.html"><userinput>set</userinput></ulink>
and others.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>Integers</title>
<programlisting>canonical: 12345<sbr/>
decimal: +12345
octal: 014
hexadecimal: 0xC
</programlisting>
</example>
</member>
<member>
<example>
<title>Floating Point</title>
<programlisting>canonical: 1.23015e+3<sbr/>
exponential: 12.3015e+02
fixed: 1230.15
negative infinity: -.inf
not a number: .NaN
</programlisting>
</example>
</member>
<member>
<example>
<title>Miscellaneous</title>
<programlisting>null:<sbr/>
true: boolean
false: boolean
string: '12345'
</programlisting>
</example>
</member>
<member>
<example>
<title>Timestamps</title>
<programlisting>canonical: 2001-12-15T02:59:43.1Z<sbr/>
iso8601: 2001-12-14t21:59:43.10-05:00
spaced: 2001-12-14 21:59:43.10 -5
date: 2002-12-14
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
<keep-together>
<para>
Explicit typing is denoted with a <refterm
primary="tag">tag</refterm> using the exclamation point (<refterm
primary="! tag indicator"><uquote>!</uquote></refterm>) symbol.
<refterm primary="tag" secondary="global">Global tags</refterm> are
URIs and may be specified in a <refterm primary="tag"
secondary="shorthand">tag shorthand</refterm> form using a <refterm
primary="tag" secondary="handle">handle</refterm>. <refterm
primary="application">Application</refterm>-specific <refterm
primary="tag" secondary="local">local tags</refterm> may also be
used.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>Various Explicit Tags</title>
<programlisting>---<sbr/>
not-date: !!str 2002-04-28
picture: !!binary |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=

application specific tag: !something |
 The semantics of the tag
 above may be different for
 different documents.
</programlisting>
</example>
</member>
<member>
<example>
<title>Global Tags</title>
<programlisting>%TAG ! tag:clarkevans.com,2002:<sbr/>
--- !shape
  # Use the ! handle for presenting
  # tag:clarkevans.com,2002:circle
- !circle
  center: &amp;ORIGIN {x: 73, y: 129}
  radius: 7
- !line
  start: *ORIGIN
  finish: { x: 89, y: 102 }
- !label
  start: *ORIGIN
  color: 0xFFEEBB
  text: Pretty vector drawing.
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
<keep-together>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>Unordered Sets</title>
<programlisting># Sets are represented as a<sbr/>
# Mapping where each key is
# associated with a null value
--- !!set
? Mark McGwire
? Sammy Sosa
? Ken Griffey
</programlisting>
</example>
</member>
<member>
<example>
<title>Ordered Mappings</title>
<programlisting># Ordered maps are represented as<sbr/>
# A sequence of mappings, with
# each mapping having one key
--- !!omap
- Mark McGwire: 65
- Sammy Sosa: 63
- Ken Griffey: 58
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
</sect1>
<sect1>
<title>Full Length Example</title>
<keep-together>
<para>
Below are two full-length examples of YAML. On the left is a sample
invoice; on the right is a sample log file.
</para>
<simplelist type="horiz" columns="2">
<member>
<example>
<title>Invoice</title>
<programlisting>--- !&lt;tag:clarkevans.com,2002:invoice&gt;<sbr/>
invoice: 34843
date   : 2001-01-23
bill-to: &amp;id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments:
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.
</programlisting>
</example>
</member>
<member>
<example>
<title>Log File</title>
<programlisting>---<sbr/>
Time: 2001-11-23 15:01:42 -5
User: ed
Warning:
  This is an error message
  for the log file
---
Time: 2001-11-23 15:02:31 -5
User: ed
Warning:
  A slightly different error
  message.
---
Date: 2001-11-23 15:03:17 -5
User: ed
Fatal:
  Unknown variable "bar"
Stack:
  - file: TopClass.py
    line: 23
    code: |
      x = MoreObject("345\n")
  - file: MoreClass.py
    line: 58
    code: |-
      foo = bar
</programlisting>
</example>
</member>
</simplelist>
</keep-together>
</sect1>
</chapter>
<chapter id="Processing">
<title>Processing YAML Information</title>
<para>
YAML is both a text format and a method for <refterm
primary="present">presenting</refterm> any <refterm primary="native data
structure">native data structure</refterm> in this format. Therefore,
this specification defines two concepts: a class of data objects called
YAML <refterm primary="representation">representations</refterm>, and a
syntax for <refterm primary="present">presenting</refterm> YAML <refterm
primary="representation">representations</refterm> as a series of
characters, called a YAML <refterm primary="stream">stream</refterm>. A
YAML <defterm primary="processor">processor</defterm> is a tool for
converting information between these complementary views. It is assumed
that a YAML processor does its work on behalf of another module, called
an <defterm primary="application">application</defterm>. This chapter
describes the information structures a YAML processor must provide to or
obtain from the application.
</para>
<para>
YAML information is used in two ways: for machine processing, and for
human consumption. The challenge of reconciling these two perspectives is
best met in three distinct translation stages: <refterm
primary="representation">representation</refterm>, <refterm
primary="serialization">serialization</refterm>, and <refterm
primary="presentation">presentation</refterm>. <refterm
primary="representation">Representation</refterm> addresses how YAML
views <refterm primary="native data structure">native data
structures</refterm> to achieve portability between programming
environments. <refterm primary="serialization">Serialization</refterm>
concerns itself with turning a YAML <refterm
primary="representation">representation</refterm> into a serial form,
that is, a form with sequential access constraints. <refterm
primary="presentation">Presentation</refterm> deals with the formatting
of a YAML <refterm primary="serialization">serialization</refterm> as a
series of characters in a human-friendly manner.
</para>
<figure>
<title>Processing Overview</title>
<mediaobject>
<imageobject>
<imagedata fileref="overview2.eps" format="eps"/>
</imageobject>
</mediaobject>
</figure>
<para>
A YAML processor need not expose the <refterm
primary="serialization">serialization</refterm> or <refterm
primary="representation">representation</refterm> stages. It may
translate directly between <refterm primary="native data
structure">native data structures</refterm> and a character <refterm
primary="stream">stream</refterm> (<defterm primary="dump">dump</defterm>
and <defterm primary="load">load</defterm> in the diagram above).
However, such a direct translation should take place so that the <refterm
primary="native data structure">native data structures</refterm> are
<refterm primary="construct">constructed</refterm> only from information
available in the <refterm
primary="representation">representation</refterm>.
</para>
<sect1>
<title>Processes</title>
<para>
This section details the processes shown in the diagram above. Note
that a YAML <refterm primary="processor">processor</refterm> need not
provide all these processes. For example, a YAML library may provide
only YAML input ability, for loading configuration files, or only
output ability, for sending data to other <refterm
primary="application">applications</refterm>.
</para>
<sect2>
<title>Representing Native Data Structures</title>
<keep-together>
<para>
YAML <defterm primary="represent">represents</defterm> any <defterm
primary="native data structure">native data structure</defterm> using
three <refterm primary="kind">node kinds</refterm>: <refterm
primary="sequence">sequence</refterm> - an ordered series of entries;
<refterm primary="mapping">mapping</refterm> - an unordered
association of <refterm primary="equality">unique</refterm> <refterm
primary="key">keys</refterm> to <refterm
primary="value">values</refterm>; and <refterm
primary="scalar">scalar</refterm> - any datum with opaque structure
<refterm primary="present">presentable</refterm> as a series of
Unicode characters. Combined, these primitives generate directed
graph structures. These primitives were chosen because they are both
powerful and familiar: the <refterm
primary="sequence">sequence</refterm> corresponds to a Perl array and
a Python list; the <refterm primary="mapping">mapping</refterm>
corresponds to a Perl hash table and a Python dictionary. The
<refterm primary="scalar">scalar</refterm> represents strings,
integers, dates, and other atomic data types.
</para>
<para>
Each YAML <refterm primary="node">node</refterm> requires, in
addition to its <refterm primary="kind">kind</refterm> and <refterm
primary="content">content</refterm>, a <refterm
primary="tag">tag</refterm> specifying its data type. Type specifiers
are either <refterm primary="tag" secondary="global">global</refterm>
URIs, or are <refterm primary="tag" secondary="local">local</refterm>
in scope to a single <refterm
primary="application">application</refterm>. For example, an integer
is represented in YAML with a <refterm
primary="scalar">scalar</refterm> plus the <refterm primary="tag"
secondary="global">global tag</refterm>
<uquote>tag:yaml.org,2002:int</uquote>. Similarly, an invoice object,
particular to a given organization, could be represented as a
<refterm primary="mapping">mapping</refterm> together with the
<refterm primary="tag" secondary="local">local tag</refterm>
<uquote>!invoice</uquote>. This simple model can represent any data
structure independent of programming language.
</para>
</keep-together>
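<para>
As an illustrative sketch only, the two cases described above might be
presented as follows. The key names and values are hypothetical. The
<uquote>!!int</uquote> shorthand is assumed to expand to the global tag
<uquote>tag:yaml.org,2002:int</uquote>, while <uquote>!invoice</uquote>
is a local tag whose meaning is private to the application.
</para>
<example>
<title>Global and Local Tags on Nodes (illustrative)</title>
<programlisting>--- !invoice<sbr/>
number: !!int 34843
customer: Chris Dumars
</programlisting>
</example>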
</sect2>
<sect2>
<title>Serializing the Representation Graph</title>
<para>
For sequential access mediums, such as an event callback API, a YAML
<refterm primary="representation">representation</refterm> must be
<defterm primary="serialize">serialized</defterm> to an ordered tree.
Since in a YAML <refterm
primary="representation">representation</refterm>, <refterm
primary="key">mapping keys</refterm> are unordered and <refterm
primary="node">nodes</refterm> may be referenced more than once (have
more than one incoming <quote>arrow</quote>), the serialization
process is required to impose an <refterm primary="key"
secondary="order">ordering</refterm> on the <refterm
primary="key">mapping keys</refterm> and to replace the second and
subsequent references to a given <refterm
primary="node">node</refterm> with placeholders called <refterm
primary="alias">aliases</refterm>. YAML does not specify how these
<defterm primary="serialization" secondary="detail">serialization
details</defterm> are chosen. It is up to the YAML <refterm
primary="processor">processor</refterm> to come up with
human-friendly <refterm primary="key" secondary="order">key
order</refterm> and <refterm primary="anchor">anchor</refterm> names,
possibly with the help of the <refterm
primary="application">application</refterm>. The result of this
process, a YAML <refterm primary="serialization">serialization
tree</refterm>, can then be traversed to produce a series of event
calls for one-pass processing of YAML data.
</para>
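<para>
A minimal sketch of these serialization details, reusing names from the
invoice example: a single mapping node is referenced from two keys. The
anchor name <uquote>id001</uquote> and the order in which the two keys
appear are choices a processor might make; neither carries any meaning
in the representation.
</para>
<example>
<title>Serialization Details (illustrative)</title>
<programlisting>bill-to: &amp;id001<sbr/>
    given  : Chris
    family : Dumars
ship-to: *id001
</programlisting>
</example>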
</sect2>
<sect2>
<title>Presenting the Serialization Tree</title>
<para>
The final output process is <defterm
primary="present">presenting</defterm> the YAML <refterm
primary="serialization">serializations</refterm> as a character
<refterm primary="stream">stream</refterm> in a human-friendly
manner. To maximize human readability, YAML offers a rich set of
stylistic options which go far beyond the minimal functional needs of
simple data storage. Therefore the YAML <refterm
primary="processor">processor</refterm> is required to introduce
various <defterm primary="presentation"
secondary="detail">presentation details</defterm> when creating the
<refterm primary="stream">stream</refterm>, such as the choice of
<refterm primary="style">node styles</refterm>, how to <refterm
primary="scalar" secondary="content format">format scalar
content</refterm>, the amount of <refterm primary="space"
secondary="indentation">indentation</refterm>, which <refterm
primary="tag" secondary="handle">tag handles</refterm> to use, the
<refterm primary="tag">node tags</refterm> to leave <refterm
primary="tag" secondary="non-specific">unspecified</refterm>, the set
of <refterm primary="directive" >directives</refterm> to provide and
possibly even what <refterm primary="comment">comments</refterm> to
add. While some of this can be done with the help of the <refterm
primary="application">application</refterm>, in general this process
should be guided by the preferences of the user.
</para>
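<para>
For illustration, the two documents sketched below differ only in
presentation details (node style, spacing and comments). Assuming the
same tag resolution is applied to both, they would be expected to yield
the same serialization events.
</para>
<example>
<title>Two Presentations of the Same Content (illustrative)</title>
<programlisting># Block style, with comments<sbr/>
---
hr:  65    # Home runs
avg: 0.278 # Batting average
---
# Flow style, same content
{hr: 65, avg: 0.278}
</programlisting>
</example>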
</sect2>
<sect2>
<title>Parsing the Presentation Stream</title>
<para>
<defterm primary="parse">Parsing</defterm> is the inverse process of
<refterm primary="present">presentation</refterm>: it takes a
<refterm primary="stream">stream</refterm> of characters and produces
a series of events. Parsing discards all the <refterm
primary="presentation" secondary="detail">details</refterm>
introduced in the <refterm primary="present">presentation</refterm>
process, reporting only the <refterm
primary="serialization">serialization</refterm> events. Parsing can
fail due to <refterm primary="stream"
secondary="ill-formed">ill-formed</refterm> input.
</para>
</sect2>
<sect2>
<title>Composing the Representation Graph</title>
<para>
<defterm primary="compose">Composing</defterm> takes a series of
<refterm primary="serialization">serialization</refterm> events and
produces a <refterm primary="representation">representation
graph</refterm>. Composing discards all the <refterm
primary="serialization" secondary="detail">details</refterm>
introduced in the <refterm
primary="serialize">serialization</refterm> process, producing only
the <refterm primary="representation">representation graph</refterm>.
Composing can fail for any of several reasons, detailed <refterm
primary="load" secondary="failure point">below</refterm>.
</para>
</sect2>
<sect2>
<title>Constructing Native Data Structures</title>
<para>
The final input process is <defterm
primary="construct">constructing</defterm> <refterm primary="native
data structure">native data structures</refterm> from the YAML
<refterm primary="representation">representation</refterm>.
Construction must be based only on the information available in the
<refterm primary="representation">representation</refterm>, and not
on additional <refterm
primary="serialization">serialization</refterm> or <refterm
primary="presentation" secondary="detail">presentation
details</refterm> such as <refterm
primary="comment">comments</refterm>, <refterm
primary="directive">directives</refterm>, <refterm primary="key"
secondary="order">mapping key order</refterm>, <refterm
primary="style">node styles</refterm>, <refterm primary="scalar"
secondary="content format">scalar content format</refterm>, <refterm
primary="space" secondary="indentation">indentation</refterm> levels,
etc. Construction can fail due to the <refterm primary="tag"
secondary="unavailable">unavailability</refterm> of the required
<refterm primary="native data structure">native data types</refterm>.
</para>
</sect2>
</sect1>
<pagebreak/>
<sect1>
<title>Information Models</title>
<para>
This section specifies the formal details of the results of the above
processes. To maximize data portability between programming languages
and implementations, users of YAML should be mindful of the distinction
between <refterm primary="serialization">serialization</refterm> or
<refterm primary="presentation">presentation</refterm> properties and
those which are part of the YAML <refterm
primary="representation">representation</refterm>. Thus, while imposing
an <refterm primary="key" secondary="order">order</refterm> on <refterm
primary="key">mapping keys</refterm> is necessary for flattening YAML
<refterm primary="representation">representations</refterm> to a
sequential access medium, this <refterm primary="serialization"
secondary="detail">serialization detail</refterm> must not be used to
convey <refterm primary="application">application</refterm> level
information. In a similar manner, while an <refterm primary="space"
secondary="indentation">indentation</refterm> technique and a choice of
<refterm primary="style">node style</refterm> are needed for
human readability, these <refterm primary="presentation"
secondary="detail">presentation details</refterm> are neither part of
the YAML <refterm primary="serialization">serialization</refterm> nor
the YAML <refterm primary="representation">representation</refterm>. By
carefully separating properties needed for <refterm
primary="serialization">serialization</refterm> and <refterm
primary="presentation">presentation</refterm>, YAML <refterm
primary="representation">representations</refterm> of <refterm
primary="application">application</refterm> information will be
consistent and portable between various programming environments.
</para>
<para>
The following diagram summarizes the three <defterm
primary="information model">information models</defterm>. Full arrows
denote composition, hollow arrows denote inheritance,
<uquote>1</uquote> and <uquote>*</uquote> denote <quote>one</quote> and
<quote>many</quote> relationships. A single <uquote>+</uquote> denotes
<refterm primary="serialization">serialization</refterm> details, a
double <uquote>++</uquote> denotes <refterm
primary="presentation">presentation</refterm> details.
</para>
<figure>
<title>Information Models</title>
<mediaobject>
<imageobject>
<imagedata fileref="model2.eps" format="eps"/>
</imageobject>
</mediaobject>
</figure>
<pagebreak/>
<sect2>
<title>Representation Graph</title>
<para>
YAML<q/>s <defterm primary="representation">representation</defterm>
of <refterm primary="native data structure">native data
structures</refterm> is a rooted, connected, directed graph of
<refterm primary="tag">tagged</refterm> <refterm
primary="node">nodes</refterm>. By <quote>directed graph</quote> we
mean a set of <refterm primary="node">nodes</refterm> and directed
edges (<quote>arrows</quote>), where each edge connects one <refterm
primary="node">node</refterm> to another (see <ulink
url="http://www.nist.gov/dads/HTML/directedGraph.html">a formal
definition</ulink>). All the <refterm primary="node">nodes</refterm>
must be reachable from the <defterm primary="node"
secondary="root">root node</defterm> via such edges. Note that the
YAML graph may include cycles, and a <refterm
primary="node">node</refterm> may have more than one incoming edge.
</para>
<para>
<refterm primary="node">Nodes</refterm> that are defined in terms of
other <refterm primary="node">nodes</refterm> are <refterm
primary="collection">collections</refterm>; <refterm
primary="node">nodes</refterm> that are independent of any other
<refterm primary="node">nodes</refterm> are <refterm
primary="scalar">scalars</refterm>. YAML supports two <refterm
primary="kind">kinds</refterm> of <refterm
primary="collection">collection nodes</refterm>: <refterm
primary="sequence">sequences</refterm> and <refterm
primary="mapping">mappings</refterm>. <refterm
primary="mapping">Mapping nodes</refterm> are somewhat tricky because
their <refterm primary="key">keys</refterm> are unordered and must be
<refterm primary="equality">unique</refterm>.
</para>
<figure>
<title>Representation Model</title>
<mediaobject>
<imageobject>
<imagedata fileref="represent2.eps" format="eps"/>
</imageobject>
</mediaobject>
</figure>
<sect3>
<title>Nodes</title>
<para>
A YAML <defterm primary="node">node</defterm> <refterm
primary="representation">represents</refterm> a single <refterm
primary="native data structure">native data structure</refterm>.
Such nodes have <defterm primary="content">content</defterm> of one
of three <defterm primary="kind">kinds</defterm>: scalar, sequence,
or mapping. In addition, each node has a <refterm
primary="tag">tag</refterm> which serves to restrict the set of
possible values the content can have.
</para>
<variablelist>
<varlistentry>
<term>Scalar</term>
<listitem>
The content of a <defterm primary="scalar">scalar</defterm>
node is an opaque datum that can be <refterm
primary="present">presented</refterm> as a series of zero or
more Unicode characters.
</listitem>
</varlistentry>
<varlistentry>
<term>Sequence</term>
<listitem>
The content of a <defterm primary="sequence">sequence</defterm>
node is an ordered series of zero or more nodes. In particular,
a sequence may contain the same node more than once. It could
even contain itself (directly or indirectly).
</listitem>
</varlistentry>
</variablelist>
<pagebreak/>
<variablelist>
<varlistentry>
<term>Mapping</term>
<listitem>
The content of a <defterm primary="mapping">mapping</defterm>
node is an unordered set of <defterm
primary="key">key:</defterm>&nbsp;<defterm
primary="value">value</defterm> node <defterm primary="key:
value pair">pairs</defterm>, with the restriction that each of
the keys is <refterm primary="equality">unique</refterm>. YAML
places no further restrictions on the nodes. In particular,
keys may be arbitrary nodes, the same node may be used as the
value of several key:&nbsp;value pairs, and a mapping could
even contain itself as a key or a value (directly or
indirectly).
</listitem>
</varlistentry>
</variablelist>
<para>
When appropriate, it is convenient to consider sequences and
mappings together, as <defterm
primary="collection">collections</defterm>. In this view, sequences
are treated as mappings with integer keys starting at zero. Having
a unified collections view for sequences and mappings is helpful
both for theoretical analysis and for creating practical YAML tools
and APIs. This strategy is also used by the JavaScript programming
language.
</para>
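<para>
The three kinds can be sketched in a single small document. The key
names below are hypothetical. The last entry of the sequence is an
alias, so the sequence contains the same mapping node twice.
</para>
<example>
<title>Node Kinds (illustrative)</title>
<programlisting>title: Example          # a scalar node<sbr/>
items:                  # a sequence node
- &amp;first {id: 1}        # a mapping node
- {id: 2}
- *first                # the first entry again
</programlisting>
</example>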
</sect3>
<sect3>
<title>Tags</title>
<para>
YAML <refterm primary="represent">represents</refterm> type
information of <refterm primary="native data structure">native data
structures</refterm> with a simple identifier, called a <defterm
primary="tag">tag</defterm>. <defterm primary="tag"
secondary="global">Global tags</defterm> are <ulink
url="http://www.ietf.org/rfc/rfc2396.txt">URIs</ulink> and hence
globally unique across all <refterm
primary="application">applications</refterm>. The
<uquote>tag:</uquote> <ulink
url="http://www.faqs.org/rfcs/rfc4151.html">URI scheme</ulink> is
recommended for all global YAML tags. In contrast, <defterm
primary="tag" secondary="local">local tags</defterm> are specific
to a single <refterm primary="application">application</refterm>.
Local tags start with <defterm primary="! tag indicator"
secondary="! local tag"><uquote>!</uquote></defterm>, are not URIs
and are not expected to be globally unique. YAML provides a
<refterm primary="directive" secondary="TAG"><uquote>TAG</uquote>
directive</refterm> to make tag notation less verbose; it also
offers easy migration from local to global tags. To ensure this,
local tags are restricted to the URI character set and use URI
character <refterm primary="% escaping in URI">escaping</refterm>.
</para>
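<para>
As a sketch of this migration path, the stream below presents the same
node twice. In the first document <uquote>!foo</uquote> is a local tag.
In the second, a <uquote>TAG</uquote> directive redefines the
<uquote>!</uquote> handle, so the unchanged shorthand would denote the
global tag <uquote>tag:example.com,2000:app/foo</uquote>. The domain and
tag name are placeholders chosen for illustration.
</para>
<example>
<title>Migrating from Local to Global Tags (illustrative)</title>
<programlisting># Local tag<sbr/>
!foo "bar"
...
# Global tag
%TAG ! tag:example.com,2000:app/
---
!foo "bar"
</programlisting>
</example>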
<para>
YAML does not mandate any special relationship between different
tags that begin with the same substring. Tags ending with URI
fragments (containing <uquote>#</uquote>) are no exception; tags
that share the same base URI but differ in their fragment part are
considered to be different, independent tags. By convention,
fragments are used to identify different <quote>variants</quote> of
a tag, while <uquote>/</uquote> is used to define nested tag
<quote>namespace</quote> hierarchies. However, this is merely a
convention, and each tag may employ its own rules. For example,
Perl tags may use <uquote>::</uquote> to express namespace
hierarchies, Java tags may use <uquote>.</uquote>, etc.
</para>
<para>
YAML tags are used to associate meta information with each <refterm
primary="node">node</refterm>. In particular, each tag must specify
the expected <refterm primary="kind">node kind</refterm> (<refterm
primary="scalar">scalar</refterm>, <refterm
primary="sequence">sequence</refterm>, or <refterm
primary="mapping">mapping</refterm>). <refterm
primary="scalar">Scalar</refterm> tags must also provide a
mechanism for converting <refterm primary="scalar"
secondary="content format">formatted content</refterm> to a
<refterm primary="scalar" secondary="canonical form">canonical
form</refterm> for supporting <refterm
primary="equality">equality</refterm> testing. Furthermore, a tag
may provide additional information such as the set of allowed
<refterm primary="content">content</refterm> values for validation,
a mechanism for <refterm primary="tag" secondary="resolution">tag
resolution</refterm>, or any other data that is applicable to all
of the tag<q/>s <refterm primary="node">nodes</refterm>.
</para>
</sect3>
<sect3>
<title>Node Comparison</title>
<para>
Since YAML <refterm primary="mapping">mappings</refterm> require
<refterm primary="key">key</refterm> uniqueness, <refterm
primary="representation">representations</refterm> must include a
mechanism for testing the equality of <refterm
primary="node">nodes</refterm>. This is non-trivial since YAML
allows various ways to <refterm primary="scalar" secondary="content
format">format scalar content</refterm>. For example, the integer
eleven can be written as <uquote>013</uquote> (octal) or
<uquote>0xB</uquote> (hexadecimal). If both forms are used as
<refterm primary="key">keys</refterm> in the same <refterm
primary="mapping">mapping</refterm>, only a YAML <refterm
primary="processor">processor</refterm> which recognizes integer
<refterm primary="scalar" secondary="content
format">formats</refterm> would correctly flag the duplicate
<refterm primary="key">key</refterm> as an error.
</para>
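<para>
The situation described above can be sketched as follows, using the
integer notation of the preceding paragraph. The example assumes a
processor that resolves both plain scalars to an integer tag and
recognizes octal and hexadecimal formats; such a processor would be
expected to reject the mapping because of its duplicate key, whereas a
processor that treats both keys as strings would not notice the
problem.
</para>
<example>
<title>Duplicate Keys in Different Formats (illustrative)</title>
<programlisting># Both keys denote the integer eleven<sbr/>
{ 013: octal, 0xB: hexadecimal }
</programlisting>
</example>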
<variablelist>
<varlistentry>
<term>Canonical Form</term>
<listitem>
YAML supports the need for <refterm
primary="scalar">scalar</refterm> equality by requiring that
every <refterm primary="scalar">scalar</refterm>&nbsp;<refterm
primary="tag">tag</refterm> must specify a mechanism for
producing the <defterm primary="scalar" secondary="canonical
form">canonical form</defterm> of any <refterm primary="scalar"
secondary="content format">formatted content</refterm>. This
form is a Unicode character string which also <refterm
primary="present">presents</refterm> the same <refterm
primary="scalar" secondary="content format">content</refterm>,
and can be used for equality testing. While this requirement is
stronger than a well-defined equality operator, it has other
uses, such as the production of digital signatures.
</listitem>
</varlistentry>
<varlistentry>
<term>Equality</term>
<listitem>
Two <refterm primary="node">nodes</refterm> must have the same
<refterm primary="tag">tag</refterm> and <refterm
primary="content">content</refterm> to be <defterm
primary="equality">equal</defterm>. Since each <refterm
primary="tag">tag</refterm> applies to exactly one <refterm
primary="kind">kind</refterm>, this implies that the two
<refterm primary="node">nodes</refterm> must have the same
<refterm primary="kind">kind</refterm> to be equal. Two
<refterm primary="scalar">scalars</refterm> are equal only
when their <refterm primary="tag">tags</refterm> and canonical
forms are equal character-by-character. Equality of <refterm
primary="collection">collections</refterm> is defined
recursively. Two <refterm
primary="sequence">sequences</refterm> are equal only when
they have the same <refterm primary="tag">tag</refterm> and
length, and each <refterm primary="node">node</refterm> in one
<refterm primary="sequence">sequence</refterm> is equal to the
corresponding <refterm primary="node">node</refterm> in the
other <refterm primary="sequence">sequence</refterm>. Two
<refterm primary="mapping">mappings</refterm> are equal only
when they have the same <refterm primary="tag">tag</refterm>
and an equal set of <refterm primary="key">keys</refterm>, and
each <refterm primary="key">key</refterm> in this set is
associated with equal <refterm
primary="value">values</refterm> in both <refterm
primary="mapping">mappings</refterm>.
</listitem>
</varlistentry>
<varlistentry>
<term>Identity</term>
<listitem>
Two <refterm primary="node">nodes</refterm> are <defterm
primary="identity">identical</defterm> only when they <refterm
primary="represent">represent</refterm> the same <refterm
primary="native data structure">native data
structure</refterm>. Typically, this corresponds to a single
memory address. Identity should not be confused with equality;
two equal <refterm primary="node">nodes</refterm> need not have
the same identity. A YAML <refterm
primary="processor">processor</refterm> may treat equal
<refterm primary="scalar">scalars</refterm> as if they were
identical. In contrast, the separate identity of two distinct
but equal <refterm primary="collection">collections</refterm>
must be preserved.
</listitem>
</varlistentry>
</variablelist>
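<para>
The difference between equality and identity can be sketched with three
sequence entries. The first two are separate nodes with the same tag and
content, and so are equal but not identical. The third is an alias and
therefore denotes the very same node as the first. The anchor name is
arbitrary.
</para>
<example>
<title>Equality and Identity (illustrative)</title>
<programlisting>- &amp;a [1, 2]   # a sequence node<sbr/>
- [1, 2]      # equal to the first entry, but a separate node
- *a          # identical to the first entry
</programlisting>
</example>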
</sect3>
</sect2>
<sect2>
<title>Serialization Tree</title>
<para>
To express a YAML <refterm
primary="representation">representation</refterm> using a serial API,
it is necessary to impose an <refterm primary="key"
secondary="order">order</refterm> on <refterm primary="key" >mapping
keys</refterm> and employ <refterm primary="alias">alias
nodes</refterm> to indicate a subsequent occurrence of a previously
encountered <refterm primary="node">node</refterm>. The result of
this process is a <defterm primary="serialization">serialization
tree</defterm>, where each <refterm primary="node">node</refterm> has
an ordered set of children. This tree can be traversed for a serial
event-based API. <refterm primary="construct">Construction</refterm>
of <refterm primary="native data structure">native data
structures</refterm> from the serial interface should not use
<refterm primary="key" secondary="order">key order</refterm> or
<refterm primary="anchor">anchor names</refterm> for the preservation of
<refterm primary="application">application</refterm> data.
</para>
<figure>
<title>Serialization Model</title>
<mediaobject>
<imageobject>
<imagedata fileref="serialize2.eps" format="eps"/>
</imageobject>
</mediaobject>
</figure>
<sect3>
<title>Key Order</title>
<para>
In the <refterm primary="representation">representation</refterm>
model, <refterm primary="key">mapping keys</refterm> do not have an
order. To <refterm primary="serialize">serialize</refterm> a
<refterm primary="mapping">mapping</refterm>, it is necessary to
impose an <defterm primary="key"
secondary="order">ordering</defterm> on its <refterm primary="key"
>keys</refterm>. This order is a <refterm primary="serialization"
secondary="detail">serialization detail</refterm> and should not be
used when <refterm primary="compose">composing</refterm> the
<refterm primary="representation">representation graph</refterm>
(and hence for the preservation of <refterm
primary="application">application</refterm> data). In every case
where <refterm primary="node">node</refterm> order is significant,
a <refterm primary="sequence">sequence</refterm> must be used. For
example, an ordered <refterm primary="mapping">mapping</refterm>
can be <refterm primary="represent">represented</refterm> as a
<refterm primary="sequence">sequence</refterm> of <refterm
primary="mapping">mappings</refterm>, where each <refterm
primary="mapping">mapping</refterm> is a single <refterm
primary="key: value pair">key:&nbsp;value pair</refterm>. YAML
provides convenient compact notation for this case.
</para>
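<para>
A minimal sketch of this technique, using hypothetical key names: the
order of the three steps is significant, so they are presented as a
sequence in which each entry is a mapping with a single
key:&nbsp;value pair.
</para>
<example>
<title>Ordered Pairs as a Sequence (illustrative)</title>
<programlisting>steps:<sbr/>
- fetch: sources
- build: binaries
- test: binaries
</programlisting>
</example>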
</sect3>
<sect3>
<title>Anchors and Aliases</title>
<para>
In the <refterm primary="representation">representation
graph</refterm>, a <refterm primary="node">node</refterm> may
appear in more than one <refterm
primary="collection">collection</refterm>. When <refterm
primary="serialize">serializing</refterm> such data, the first
occurrence of the <refterm primary="node">node</refterm> is
<defterm primary="alias"
secondary="identified">identified</defterm> by an <defterm
primary="anchor">anchor</defterm>. Each subsequent occurrence is
<refterm primary="serialize">serialized</refterm> as an <refterm
primary="alias">alias node</refterm> which refers back to this
anchor. Otherwise, anchor names are a <refterm
primary="serialization" secondary="detail">serialization
detail</refterm> and are discarded once <refterm
primary="compose">composing</refterm> is completed. When <refterm
primary="compose">composing</refterm> a <refterm
primary="representation">representation graph</refterm> from
<refterm primary="serialize">serialized</refterm> events, an alias
node refers to the most recent <refterm
primary="node">node</refterm> in the <refterm
primary="serialization">serialization</refterm> having the
specified anchor. Therefore, anchors need not be unique within a
<refterm primary="serialization">serialization</refterm>. In
addition, an anchor need not have an alias node referring to it. It
is therefore possible to provide an anchor for all <refterm
primary="node">nodes</refterm> in a <refterm
primary="serialization">serialization</refterm>.
</para>
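<para>
These rules can be sketched as follows. The key and anchor names are
arbitrary. Each alias refers to the most recent node carrying the named
anchor, so reusing an anchor name changes what later aliases denote, and
an anchor may go unused.
</para>
<example>
<title>Anchor Reuse (illustrative)</title>
<programlisting>first: &amp;val Foo     # anchor defined<sbr/>
second: *val        # alias to "Foo"
third: &amp;val Bar     # anchor name reused
fourth: *val        # alias to "Bar", the most recent node
fifth: &amp;unused Baz  # anchor with no alias
</programlisting>
</example>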
</sect3>
</sect2>
<sect2>
<title>Presentation Stream</title>
<para>
A YAML <defterm primary="presentation">presentation</defterm> is a
<refterm primary="stream">stream</refterm> of Unicode characters
making use of <refterm primary="style">styles</refterm>, <refterm
primary="scalar" secondary="content format">scalar content
formats</refterm>, <refterm primary="comment" >comments</refterm>,
<refterm primary="directive">directives</refterm> and other <refterm
primary="presentation" secondary="detail">presentation
details</refterm> to <refterm primary="present">present</refterm> a
YAML <refterm primary="serialization">serialization</refterm> in a
human-readable way. Although a YAML <refterm
primary="processor">processor</refterm> may provide these <refterm
primary="presentation" secondary="detail">details</refterm> when
<refterm primary="parse">parsing</refterm>, they should not be
reflected in the resulting <refterm
primary="serialization">serialization</refterm>. YAML allows several
<refterm primary="serialization">serialization trees</refterm> to be
contained in the same YAML character stream, as a series of <refterm
primary="document">documents</refterm> separated by <refterm
primary="marker">markers</refterm>. Documents appearing in the same
stream are independent; that is, a <refterm
primary="node">node</refterm> must not appear in more than one
<refterm primary="serialization">serialization tree</refterm> or
<refterm primary="representation">representation graph</refterm>.
</para>
<figure>
<title>Presentation Model</title>
<mediaobject>
<imageobject>
<imagedata fileref="present2.eps" format="eps"/>
</imageobject>
</mediaobject>
</figure>
<pagebreak/>
<sect3>
<title>Node Styles</title>
<para>
Each <refterm primary="node">node</refterm> is presented in some
<defterm primary="style">style</defterm>, depending on its <refterm
primary="kind">kind</refterm>. The node style is a <refterm
primary="presentation" secondary="detail">presentation
detail</refterm> and is not reflected in the <refterm
primary="serialization">serialization tree</refterm> or <refterm
primary="representation">representation graph</refterm>. There are
two groups of styles. <refterm primary="style"
secondary="block">Block styles</refterm> use <refterm
primary="space" secondary="indentation">indentation</refterm> to
denote structure. In contrast, <refterm primary="style"
secondary="flow">flow styles</refterm> rely on explicit
<refterm primary="indicator">indicators</refterm>.
</para>
<para>
YAML provides a rich set of <defterm primary="style"
secondary="scalar">scalar styles</defterm>. <refterm
primary="style" secondary="block" tertiary="scalar">Block
scalar</refterm> styles include the <refterm primary="style"
secondary="block" tertiary="literal">literal style</refterm> and
the <refterm primary="style" secondary="block"
tertiary="folded">folded style</refterm>. <refterm primary="style"
secondary="flow" tertiary="scalar">Flow scalar</refterm> styles
include the <refterm primary="style" secondary="flow"
tertiary="plain">plain style</refterm> and two quoted styles, the
<refterm primary="style" secondary="flow"
tertiary="single-quoted">single-quoted style</refterm> and the
<refterm primary="style" secondary="flow"
tertiary="double-quoted">double-quoted style</refterm>. These
styles offer a range of trade-offs between expressive power and
readability.
</para>
<para>
Normally, <refterm primary="style" secondary="block"
tertiary="sequence">block sequences</refterm> and <refterm
primary="style" secondary="block"
tertiary="mapping">mappings</refterm> begin on the next line. In
some cases, YAML also allows nested <refterm primary="style"
secondary="block">block</refterm> <refterm
primary="collection">collections</refterm> to start <refterm
primary="style" secondary="in-line collection">in-line</refterm>
for a more compact notation. In addition, YAML provides a compact
notation for <refterm primary="style" secondary="flow"
tertiary="mapping">flow mappings</refterm> with a single <refterm
primary="key: value pair">key:&nbsp;value pair</refterm>, nested
inside a <refterm primary="style" secondary="flow"
tertiary="sequence">flow sequence</refterm>. These allow for a
natural <quote>ordered mapping</quote> notation.
</para>
<figure>
<title>Kind/Style Combinations</title>
<mediaobject>
<imageobject>
<imagedata fileref="styles2.eps" format="eps"/>
</imageobject>
</mediaobject>
</figure>
</sect3>
<sect3>
<title>Scalar Formats</title>
<para>
YAML allows <refterm primary="scalar">scalars</refterm> to be
<refterm primary="present">presented</refterm> in several <defterm
primary="scalar" secondary="content format">formats</defterm>. For
example, the integer <uquote>11</uquote> might also be written as
<uquote>0xB</uquote>. <refterm primary="tag">Tags</refterm> must
specify a mechanism for converting the formatted content to a
<refterm primary="scalar" secondary="canonical form">canonical
form</refterm> for use in <refterm
primary="equality">equality</refterm> testing. Like <refterm
primary="style">node style</refterm>, the format is a <refterm
primary="presentation" secondary="detail">presentation
detail</refterm> and is not reflected in the <refterm
primary="serialization">serialization tree</refterm> or <refterm
primary="representation">representation graph</refterm>.
</para>
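<para>
A minimal Python sketch of such a conversion follows. It assumes, for
illustration only, that integer content is either decimal or hexadecimal
with a <userinput>0x</userinput> prefix; the function name
<userinput>canonical_int</userinput> and the exact patterns are assumptions
of this sketch, not definitions made by this specification.
</para>
<programlisting><![CDATA[
```python
import re

# Assumed integer formats for this sketch: an optional sign followed by
# decimal digits, or by a "0x" prefix and hexadecimal digits.
_DECIMAL = re.compile(r"[-+]?[0-9]+")
_HEX = re.compile(r"[-+]?0x[0-9a-fA-F]+")


def canonical_int(content: str) -> str:
    """Convert formatted integer content to a canonical decimal string."""
    if _HEX.fullmatch(content):
        return str(int(content, 16))
    if _DECIMAL.fullmatch(content):
        return str(int(content, 10))
    raise ValueError(f"not a recognized integer format: {content!r}")


# Two differently formatted scalars compare equal once canonicalized.
same = canonical_int("11") == canonical_int("0xB")
```
]]></programlisting>
<para>
Equality testing then compares the canonical forms rather than the
formatted content as written in the stream.
</para>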
</sect3>
<sect3>
<title>Comments</title>
<para>
<refterm primary="comment">Comments</refterm> are a <refterm
primary="presentation" secondary="detail">presentation
detail</refterm> and must not have any effect on the <refterm
primary="serialization">serialization tree</refterm> or <refterm
primary="representation">representation graph</refterm>. In
particular, comments are not associated with a particular <refterm
primary="node">node</refterm>. The usual purpose of a comment is to
communicate between the human maintainers of a file. A typical
example is comments in a configuration file. Comments must not
appear inside <refterm primary="scalar">scalars</refterm>, but may
be interleaved with such <refterm
primary="scalar">scalars</refterm> inside <refterm
primary="collection">collections</refterm>.
</para>
</sect3>
<pagebreak/>
<sect3>
<title>Directives</title>
<para>
Each <refterm primary="document">document</refterm> may be
associated with a set of <refterm
primary="directive">directives</refterm>. A directive has a name
and an optional sequence of parameters. Directives are instructions
to the YAML <refterm primary="processor">processor</refterm>, and
like all other <refterm primary="presentation"
secondary="detail">presentation details</refterm> are not reflected
in the YAML <refterm primary="serialization">serialization
tree</refterm> or <refterm primary="representation">representation
graph</refterm>. This version of YAML defines two directives,
<refterm primary="directive"
secondary="YAML"><uquote>YAML</uquote></refterm> and <refterm
primary="directive" secondary="TAG"><uquote>TAG</uquote></refterm>.
All other directives are <refterm primary="directive"
secondary="reserved">reserved</refterm> for future versions of
YAML.
</para>
</sect3>
</sect2>
</sect1>
<sect1>
<title>Loading Failure Points</title>
<para>
The process of <refterm primary="load">loading</refterm> <refterm
primary="native data structure">native data structures</refterm> from a
YAML <refterm primary="stream">stream</refterm> has several potential
<defterm primary="load" secondary="failure point">failure
points</defterm>. The character <refterm
primary="stream">stream</refterm> may be <refterm primary="stream"
secondary="ill-formed">ill-formed</refterm>, <refterm
primary="alias">aliases</refterm> may be <refterm primary="alias"
secondary="unidentified">unidentified</refterm>, <refterm primary="tag"
secondary="non-specific">unspecified tags</refterm> may be <refterm
primary="tag" secondary="unresolved">unresolvable</refterm>, <refterm
primary="tag">tags</refterm> may be <refterm primary="tag"
secondary="unrecognized">unrecognized</refterm>, the <refterm
primary="content">content</refterm> may be <refterm primary="invalid
content">invalid</refterm>, and a native type may be <refterm
primary="tag" secondary="unavailable">unavailable</refterm>. Each of
these failures results in incomplete loading.
</para>
<para>
A <defterm primary="representation" secondary="partial">partial
representation</defterm> need not <refterm primary="tag"
secondary="resolution">resolve</refterm> the <refterm primary="tag"
>tag</refterm> of each <refterm primary="node">node</refterm>, and the
<refterm primary="scalar" secondary="canonical form">canonical
form</refterm> of <refterm primary="scalar" secondary="content
format">formatted scalar content</refterm> need not be available. This
weaker representation is useful for cases of incomplete knowledge of
the types used in the <refterm primary="document">document</refterm>.
In contrast, a <defterm primary="representation"
secondary="complete">complete representation</defterm> specifies the
<refterm primary="tag">tag</refterm> of each <refterm
primary="node">node</refterm>, and provides the <refterm
primary="scalar" secondary="canonical form">canonical form</refterm> of
<refterm primary="scalar" secondary="content format">formatted scalar
content</refterm>, allowing for <refterm
primary="equality">equality</refterm> testing. A complete
representation is required in order to <refterm
primary="construct">construct</refterm> <refterm primary="native data
structure">native data structures</refterm>.
</para>
<figure>
<title>Loading Failure Points</title>
<mediaobject>
<imageobject>
<imagedata fileref="validity2.eps" format="eps"/>
</imageobject>
</mediaobject>
</figure>
<sect2>
<title>Well-Formed Streams and Identified Aliases</title>
<para>
A <refterm primary="stream"
secondary="well-formed">well-formed</refterm> character <refterm
primary="stream">stream</refterm> must match the BNF productions
specified in the following chapters. Successful loading also requires
that each <refterm primary="alias">alias</refterm> refer to a
previous <refterm primary="node">node</refterm> <refterm
primary="alias" secondary="identified">identified</refterm> by the
<refterm primary="anchor">anchor</refterm>. A YAML <refterm
primary="processor">processor</refterm> should reject <defterm
primary="stream" secondary="ill-formed">ill-formed streams</defterm>
and <defterm primary="alias" secondary="unidentified">unidentified
aliases</defterm>. A YAML <refterm
primary="processor">processor</refterm> may recover from syntax
errors, possibly by ignoring certain parts of the input, but it must
provide a mechanism for reporting such errors.
</para>
</sect2>
<sect2>
<title>Resolved Tags</title>
<para>
Typically, most <refterm primary="tag">tags</refterm> are not
explicitly specified in the character <refterm
primary="stream">stream</refterm>. During <refterm
primary="parse">parsing</refterm>, <refterm
primary="node">nodes</refterm> lacking an explicit <refterm
primary="tag">tag</refterm> are given a <defterm primary="tag"
secondary="non-specific">non-specific tag</defterm>: <defterm
primary="! tag indicator" secondary="! non-specific
tag"><uquote>!</uquote></defterm> for non-<refterm primary="style"
secondary="flow" tertiary="plain">plain scalars</refterm>, and
<defterm primary="? non-specific tag"><uquote>?</uquote></defterm>
for all other <refterm primary="node">nodes</refterm>. <refterm
primary="compose">Composing</refterm> a <refterm
primary="representation" secondary="complete">complete
representation</refterm> requires each such non-specific tag to be
<defterm primary="tag" secondary="resolution">resolved</defterm> to a
<defterm primary="tag" secondary="specific">specific tag</defterm>,
be it a <refterm primary="tag" secondary="global">global
tag</refterm> or a <refterm primary="tag" secondary="local">local
tag</refterm>.
</para>
<para>
Resolving the <refterm primary="tag">tag</refterm> of a <refterm
primary="node">node</refterm> must only depend on the following three
parameters: (1) the non-specific tag of the <refterm
primary="node">node</refterm>, (2) the path leading from the <refterm
primary="node" secondary="root">root</refterm> to the <refterm
primary="node">node</refterm>, and (3) the <refterm
primary="content">content</refterm> (and hence the <refterm
primary="kind">kind</refterm>) of the <refterm
primary="node">node</refterm>. When a <refterm
primary="node">node</refterm> has more than one occurrence (using
<refterm primary="alias">aliases</refterm>), tag resolution must
depend only on the path to the first (<refterm
primary="anchor">anchored</refterm>) occurrence of the <refterm
primary="node">node</refterm>.
</para>
<para>
Note that resolution must not consider <refterm
primary="presentation" secondary="detail">presentation
details</refterm> such as <refterm
primary="comment">comments</refterm>, <refterm primary="space"
secondary="indentation">indentation</refterm> and <refterm
primary="style">node style</refterm>. Also, resolution must not
consider the <refterm primary="content">content</refterm> of any
other <refterm primary="node">node</refterm>, except for the <refterm
primary="content">content</refterm> of the <refterm primary="key">key
nodes</refterm> directly along the path leading from the <refterm
primary="node" secondary="root">root</refterm> to the resolved
<refterm primary="node" >node</refterm>. Finally, resolution must not
consider the <refterm primary="content">content</refterm> of a
sibling <refterm primary="node">node</refterm> in a <refterm
primary="collection">collection</refterm>, or the <refterm
primary="content">content</refterm> of the <refterm
primary="value">value node</refterm> associated with a <refterm
primary="key">key node</refterm> being resolved.
</para>
<para>
These rules ensure that tag resolution can be performed as soon as a
<refterm primary="node">node</refterm> is first encountered in the
<refterm primary="stream">stream</refterm>, typically before its
<refterm primary="content">content</refterm> is <refterm
primary="parse">parsed</refterm>. Also, tag resolution only requires
referring to a relatively small number of previously parsed <refterm
primary="node">nodes</refterm>. Thus, in most cases, tag resolution
in one-pass <refterm primary="processor">processors</refterm> is both
possible and practical.
</para>
<para>
YAML <refterm primary="processor">processors</refterm> should resolve
<refterm primary="node">nodes</refterm> having the <uquote>!</uquote>
non-specific tag as <uquote>tag:yaml.org,2002:seq</uquote>,
<uquote>tag:yaml.org,2002:map</uquote> or
<uquote>tag:yaml.org,2002:str</uquote> depending on their <refterm
primary="kind">kind</refterm>. This <defterm primary="tag"
secondary="resolution" tertiary="convention">tag resolution
convention</defterm> allows the author of a YAML character <refterm
primary="stream">stream</refterm> to effectively
<quote>disable</quote> the tag resolution process. By explicitly
specifying a <uquote>!</uquote> non-specific <refterm primary="tag"
secondary="property">tag property</refterm>, the <refterm
primary="node">node</refterm> would then be resolved to a
<quote>vanilla</quote> <refterm
primary="sequence">sequence</refterm>, <refterm
primary="mapping">mapping</refterm>, or string, according to its
<refterm primary="kind">kind</refterm>.
</para>
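<para>
The convention for the <uquote>!</uquote> non-specific tag amounts to a
lookup keyed on the node kind. The following Python sketch shows this; the
kind names and the <userinput>resolve_bang_tag</userinput> function are
hypothetical, while the tag URIs are the ones listed in the convention.
</para>
<programlisting><![CDATA[
```python
# Tag URIs for the "vanilla" types, keyed by an assumed node kind name.
VANILLA_TAGS = {
    "sequence": "tag:yaml.org,2002:seq",
    "mapping": "tag:yaml.org,2002:map",
    "scalar": "tag:yaml.org,2002:str",
}


def resolve_bang_tag(kind: str) -> str:
    """Resolve the "!" non-specific tag using only the kind of the node."""
    try:
        return VANILLA_TAGS[kind]
    except KeyError:
        raise ValueError(f"unknown node kind: {kind!r}") from None
```
]]></programlisting>
<para>
Note that the content of the node is deliberately ignored here: a scalar
carrying the <uquote>!</uquote> tag resolves to a string whatever it looks
like.
</para>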
<para>
<refterm primary="application">Application</refterm> specific tag
resolution rules should be restricted to resolving the
<uquote>?</uquote> non-specific tag, most commonly to resolving
<refterm primary="style" secondary="flow" tertiary="plain">plain
scalars</refterm>. These may be matched against a set of regular
expressions to provide automatic resolution of integers, floats,
timestamps, and similar types. An <refterm
primary="application">application</refterm> may also match the
<refterm primary="content">content</refterm> of <refterm
primary="mapping">mapping nodes</refterm> against sets of expected
<refterm primary="key">keys</refterm> to automatically resolve
points, complex numbers, and similar types. Resolved <refterm
primary="sequence">sequence node</refterm> types such as the
<quote>ordered mapping</quote> are also possible.
</para>
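<para>
As a hedged illustration, regular-expression resolution of plain scalars
having the <uquote>?</uquote> non-specific tag might look as follows in
Python. The rule table, the patterns, the choice of tags and the string
fallback are all assumptions of this sketch; an application is free to
define different rules.
</para>
<programlisting><![CDATA[
```python
import re
from typing import List, Pattern, Tuple

# Hypothetical application rules, tried in order. Both the patterns and the
# tags they map to are illustrative assumptions.
RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"[-+]?[0-9]+"), "tag:yaml.org,2002:int"),
    (re.compile(r"[-+]?[0-9]*\.[0-9]+"), "tag:yaml.org,2002:float"),
]

# Assumed fallback when no rule matches.
FALLBACK_TAG = "tag:yaml.org,2002:str"


def resolve_plain_scalar(content: str) -> str:
    """Resolve the "?" tag of a plain scalar from its content alone."""
    for pattern, tag in RULES:
        if pattern.fullmatch(content):
            return tag
    return FALLBACK_TAG
```
]]></programlisting>
<para>
The function inspects only the content of the node being resolved, which
is consistent with the restriction that resolution must not consider
sibling nodes or presentation details.
</para>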
<para>
That said, tag resolution is specific to the <refterm
primary="application">application</refterm>. YAML <refterm
primary="processor">processors</refterm> should therefore provide a
mechanism allowing the <refterm
primary="application">application</refterm> to override and expand
these default tag resolution rules.
</para>
<para>
If a <refterm primary="document">document</refterm> contains <defterm
primary="tag" secondary="unresolved">unresolved tags</defterm>, the
YAML <refterm primary="processor">processor</refterm> is unable to
<refterm primary="compose">compose</refterm> a <refterm
primary="representation" secondary="complete">complete
representation</refterm> graph. In such a case, the YAML <refterm
primary="processor">processor</refterm> may <refterm
primary="compose">compose</refterm> a <refterm
primary="representation" secondary="partial">partial
representation</refterm>, based on each <refterm
primary="kind">node<q/>s kind</refterm> and allowing for non-specific
tags.
</para>
</sect2>
<sect2>
<title>Recognized and Valid Tags</title>
<para>
To be <defterm primary="content" secondary="valid">valid</defterm>, a
<refterm primary="node">node</refterm> must have a <refterm
primary="tag">tag</refterm> which is <defterm primary="tag"
secondary="recognized">recognized</defterm> by the YAML <refterm
primary="processor">processor</refterm> and its <refterm
primary="content">content</refterm> must satisfy the constraints
imposed by this <refterm primary="tag">tag</refterm>. If a <refterm
primary="document">document</refterm> contains a <refterm
primary="scalar">scalar node</refterm> with an <defterm primary="tag"
secondary="unrecognized">unrecognized tag</defterm> or <defterm
primary="invalid content">invalid content</defterm>, only a <refterm
primary="representation" secondary="partial">partial
representation</refterm> may be <refterm
primary="compose">composed</refterm>. In contrast, a YAML <refterm
primary="processor">processor</refterm> can always <refterm
primary="compose">compose</refterm> a <refterm
primary="representation" secondary="complete">complete
representation</refterm> for an unrecognized or an invalid <refterm
primary="collection">collection</refterm>, since <refterm
primary="collection">collection</refterm> <refterm
primary="equality">equality</refterm> does not depend upon knowledge
of the <refterm primary="collection">collection<q/>s</refterm> data
type. However, such a <refterm primary="representation"
secondary="complete">complete representation</refterm> cannot be
used to <refterm primary="construct">construct</refterm> a <refterm
primary="native data structure">native data structure</refterm>.
</para>
</sect2>
<sect2>
<title>Available Tags</title>
<para>
In a given processing environment, there need not be an <defterm
primary="tag" secondary="available">available</defterm> native type
corresponding to a given <refterm primary="tag">tag</refterm>. If a
<refterm primary="tag">node<q/>s tag</refterm> is <defterm
primary="tag" secondary="unavailable">unavailable</defterm>, a YAML
<refterm primary="processor">processor</refterm> will not be able to
<refterm primary="construct">construct</refterm> a <refterm
primary="native data structure">native data structure</refterm> for
it. In this case, a <refterm primary="representation"
secondary="complete">complete representation</refterm> may still be
<refterm primary="compose">composed</refterm>, and an <refterm
primary="application">application</refterm> may wish to use this
<refterm primary="representation">representation</refterm> directly.
</para>
</sect2>
</sect1>
</chapter>
<chapter id="Syntax">
<title>Syntax Conventions</title>
<para>
The following chapters formally define the syntax of YAML character
<refterm primary="stream">streams</refterm>, using parameterized BNF
productions. Each BNF production is both named and numbered for easy
reference. Whenever possible, basic structures are specified before the
more complex structures using them in a <quote>bottom up</quote> fashion.
</para>
<para>
The order of alternatives inside a production is significant. Subsequent
alternatives are only considered when previous ones fail. See, for
example, the <nonterminal
def="#b-break"><userinput>b-break</userinput></nonterminal> production.
In addition, production matching is expected to be greedy. Optional
(<userinput>?</userinput>), zero-or-more (<userinput>*</userinput>) and
one-or-more (<userinput>+</userinput>) patterns are always expected to
match as much of the input as possible.
</para>
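<para>
The effect of ordered alternatives and greedy repetition can be sketched in
Python. The sketch assumes a line break production whose alternatives are
CR LF, CR and LF, in that order; the helper names
<userinput>match_first</userinput> and <userinput>match_star</userinput>
are hypothetical and do not correspond to any production in this
specification.
</para>
<programlisting><![CDATA[
```python
from typing import Optional, Sequence

# Assumed alternatives for a line break, in significant order.
BREAK_ALTERNATIVES: Sequence[str] = ("\r\n", "\r", "\n")


def match_first(text: str, pos: int, alternatives: Sequence[str]) -> Optional[int]:
    """Try each alternative in order; return the end offset of the first match."""
    for alternative in alternatives:
        if text.startswith(alternative, pos):
            return pos + len(alternative)
    return None  # every alternative failed


def match_star(text: str, pos: int, alternatives: Sequence[str]) -> int:
    """Greedy zero-or-more: consume as many repetitions as possible."""
    while True:
        end = match_first(text, pos, alternatives)
        if end is None:
            return pos
        pos = end


# With CR LF listed first, the pair is consumed as a single line break.
ordered_end = match_first("\r\nx", 0, BREAK_ALTERNATIVES)
# If CR were listed first, only one character would be consumed.
misordered_end = match_first("\r\nx", 0, ("\r", "\r\n"))
```
]]></programlisting>
<para>
Listing the two-character alternative first is what allows a CR LF pair to
be treated as one line break rather than two.
</para>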
<para>
The productions are accompanied by examples, which are given side by side
with equivalent YAML text in an explanatory format. This format uses
only <refterm primary="style" secondary="flow" tertiary="collection">flow
collections</refterm>, <refterm primary="style" secondary="flow"
tertiary="double-quoted">double-quoted scalars</refterm>, and explicit
<refterm primary="tag">tags</refterm> for each <refterm
primary="node">node</refterm>.
</para>
<para>
A reference implementation using the productions is available as the
<ulink
url="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/YamlReference"
>YamlReference</ulink> Haskell package. This reference implementation is
also available as an interactive web application at <ulink
url="http://dev.yaml.org/ypaste"/>.
</para>
<sect1>
<title>Production Naming Conventions</title>
<para>
To make it easier to follow production combinations, production names
use a Hungarian-style naming convention. Each production is given a
prefix based on the type of characters it begins and ends with.
</para>
<variablelist>
<varlistentry>
<term><userinput>e-</userinput></term>
<listitem>
A production matching no characters.
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>c-</userinput></term>
<listitem>
A production starting and ending with a special character.
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>b-</userinput></term>
<listitem>
A production matching a single <refterm primary="line break">line
break</refterm>.
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>nb-</userinput></term>
<listitem>
A production starting and ending with a non-<refterm primary="line
break">break</refterm> character.
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>s-</userinput></term>
<listitem>
A production starting and ending with a <refterm primary="space"
secondary="white">white space</refterm> character.
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>ns-</userinput></term>
<listitem>
A production starting and ending with a non-<refterm
primary="space" secondary="white">space</refterm> character.
</listitem>
</varlistentry>
<varlistentry>
<term>
<varname>X</varname><userinput>-</userinput
><varname>Y</varname><userinput>-</userinput>
</term>
<listitem>
A production starting with an
<varname>X</varname><userinput>-</userinput> character and ending
with a <varname>Y</varname><userinput>-</userinput> character.
</listitem>
</varlistentry>
<varlistentry>
<term><userinput>l-</userinput></term>
<listitem>
A production matching complete line(s).
</listitem>
</varlistentry>
<varlistentry>
<term>
<varname>X</varname><userinput>+</userinput>,
<varname>X</varname><userinput>-</userinput
><varname>Y</varname><userinput>+</userinput>
</term>
<listitem>
A production as above, with the additional property that the
matched content <refterm primary="space"
secondary="indentation">indentation</refterm> level is greater than
the specified <varname>n</varname> parameter.
</listitem>
</varlistentry>
</variablelist>
</sect1>
<sect1>
<title>Production Parameters</title>
<keep-together>
<para>
YAML<q/>s syntax is designed for maximal human readability. This
requires <refterm primary="parse">parsing</refterm> to depend on the
surrounding text. For notational compactness, this dependency is
expressed using parameterized BNF productions.
</para>
<para>
This sensitivity is the cause of most of the complexity of the YAML
syntax definition. It is further complicated by the need to accommodate the
human tendency to look ahead when interpreting text. These
complications are of course the source of most of YAML<q/>s power to
<refterm primary="presentation">present</refterm> data in a very
human readable way.
</para>
<para>
Productions use any of the following parameters:
</para>
<variablelist termlength="15">
<varlistentry>
<term>
Indentation: <varname>n</varname> or <varname>m</varname>
</term>
<listitem>
Many productions use an explicit <refterm primary="space"
secondary="indentation">indentation</refterm> level parameter.
This is less elegant than Python<q/>s <quote>indent</quote> and
<quote>undent</quote> conceptual tokens. However, it is required
to formally express YAML<q/>s indentation rules.
</listitem>
</varlistentry>
</variablelist>
<variablelist termlength="15">
<varlistentry>
<term>Context: <varname>c</varname></term>
<listitem>
<keep-together>
<para>
This parameter allows productions to tweak their behavior
according to their surroundings. YAML supports two groups of
<defterm primary="context">contexts</defterm>, distinguishing
between <refterm primary="style" secondary="block">block
styles</refterm> and <refterm primary="style"
secondary="flow">flow styles</refterm>.
</para>
<para>
In <refterm primary="style" secondary="block">block
styles</refterm>, <refterm primary="space"
secondary="indentation">indentation</refterm> is used to
delineate structure. To capture human perception of <refterm
primary="space" secondary="indentation">indentation</refterm>
the rules require special treatment of the <refterm
primary="- block sequence entry"><uquote>-</uquote></refterm>
character, used in <refterm primary="style" secondary="block"
tertiary="sequence">block sequences</refterm>. Hence in some
cases productions need to behave differently inside <refterm
primary="style" secondary="block" tertiary="sequence">block
sequences</refterm> (<defterm primary="context"
secondary="block-in">block-in context</defterm>) and outside
them (<defterm primary="context"
secondary="block-out">block-out context</defterm>).
</para>
<para>
In <refterm primary="style" secondary="flow">flow
styles</refterm>, explicit <refterm
primary="indicator">indicators</refterm> are used to
delineate structure. These styles can be viewed as the
natural extension of JSON to cover <refterm
primary="tag">tagged</refterm>, <refterm primary="style"
secondary="flow"
tertiary="single-quoted">single-quoted</refterm> and <refterm
primary="style" secondary="flow" tertiary="plain">plain
scalars</refterm>. Since the latter have no delineating
<refterm primary="indicator">indicators</refterm>, they are
subject to some restrictions to avoid ambiguities. These
restrictions depend on where they appear: as implicit keys
directly inside a <refterm primary="style" secondary="block"
tertiary="mapping">block mapping</refterm> (<defterm
primary="context" secondary="block-key">block-key</defterm>);
as implicit keys inside a <refterm primary="style"
secondary="flow" tertiary="mapping">flow mapping</refterm>
(<defterm primary="context"
secondary="flow-key">flow-key</defterm>); as values inside a
<refterm primary="style" secondary="flow"
tertiary="collection">flow collection</refterm> (<defterm
primary="context" secondary="flow-in">flow-in</defterm>); or
as values outside one (<defterm primary="context"
secondary="flow-out">flow-out</defterm>).
</para>
</keep-together>
</listitem>
</varlistentry>
<varlistentry>
<term>(Block) Chomping: <varname>t</varname></term>
<listitem>
Block scalars offer three possible mechanisms for <refterm
primary="chomping">chomping</refterm> any trailing <refterm
primary="line break">line breaks</refterm>: <refterm
primary="chomping" secondary="strip">strip</refterm>, <refterm
primary="chomping" secondary="clip">clip</refterm> and <refterm
primary="chomping" secondary="keep">keep</refterm>. Unlike the
previous parameters, this only controls interpretation; the
<refterm primary="line break">line breaks</refterm> are valid in
either case.
</listitem>
</varlistentry>
</variablelist>
</keep-together>
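<para>
The chomping parameter can be illustrated with a short Python sketch. It
assumes that strip removes all trailing line breaks, clip keeps a single
final line break, and keep preserves them all; it also assumes LF as the
only line break and treats content consisting solely of line breaks as
empty when clipping. The <userinput>chomp</userinput> function is a
hypothetical helper, not a production of this specification.
</para>
<programlisting><![CDATA[
```python
def chomp(text: str, mode: str) -> str:
    """Apply an assumed strip, clip or keep interpretation to trailing breaks."""
    body = text.rstrip("\n")
    trailing = text[len(body):]  # the run of trailing line breaks, if any
    if mode == "strip":
        return body
    if mode == "clip":
        # Keep one final line break, but only if there is content before it.
        return body + "\n" if trailing and body else body
    if mode == "keep":
        return text
    raise ValueError(f"unknown chomping mode: {mode!r}")
```
]]></programlisting>
<para>
The same input is accepted under every mode; only the resulting scalar
content differs.
</para>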
</sect1>
</chapter>
<chapter id="Characters">
<title>Characters</title>
<sect1>
<title>Character Set</title>
<keep-together>
<para>
To ensure readability, YAML <refterm
primary="stream">streams</refterm> use only the <defterm
primary="printable character">printable</defterm> subset of the
Unicode character set. The allowed character range explicitly
excludes the C0 control block <userinput>#x0-#x1F</userinput> (except
for TAB <userinput>#x9</userinput>, LF <userinput>#xA</userinput>,
and CR <userinput>#xD</userinput> which are allowed), DEL
<userinput>#x7F</userinput>, the C1 control block
<userinput>#x80-#x9F</userinput> (except for NEL
<userinput>#x85</userinput> which is allowed), the surrogate block
<userinput>#xD800-#xDFFF</userinput>, <userinput>#xFFFE</userinput>,
and <userinput>#xFFFF</userinput>.
</para>
<para>
On input, a YAML <refterm primary="processor">processor</refterm>
must accept all Unicode characters except those explicitly excluded
above.
</para>
<para>
On output, a YAML <refterm primary="processor">processor</refterm>
must only produce acceptable characters. Any excluded characters must
be <refterm primary="present">presented</refterm> using <refterm
primary="escaping" secondary="in double-quoted
scalars">escape</refterm> sequences. In addition, any allowed
characters known to be non-printable should also be <refterm
primary="escaping" secondary="in double-quoted
scalars">escaped</refterm>. This isn<q/>t mandatory since a full
implementation would require extensive character property tables.
</para>
<productionset>
<production id="c-printable">
<lhs>c-printable</lhs>
<rhs>
&nbsp;&nbsp;#x9 | #xA | #xD | [#x20-#x7E]&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;/* 8 bit */<sbr/>
| #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD] /* 16 bit */<sbr/>
| [#x10000-#x10FFFF]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;/* 32 bit */
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
To ensure <refterm primary="JSON compatibility">JSON
compatibility</refterm>, YAML <refterm
primary="processor">processors</refterm> must allow all non-control
characters inside <refterm primary="style" secondary="flow"
tertiary="double-quoted">quoted</refterm> <refterm primary="style"
secondary="flow" tertiary="single-quoted">scalars</refterm>. To
ensure readability, non-printable characters should be <refterm
primary="escaping" secondary="in double-quoted
scalars">escaped</refterm> on output, even inside such <refterm
primary="style" secondary="flow" tertiary="double-quoted">quoted
scalars</refterm>.
</para>
<productionset>
<production id="c-json">
<lhs>c-json</lhs>
<rhs>
#x9 | #xA | #xD | [#x20-#x10FFFF]
</rhs>
</production>
</productionset>
</keep-together>
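<para>
The two character classes can be transcribed directly into Python
predicates over code points. The function names are hypothetical; the
ranges are copied from the <userinput>c-printable</userinput> and
<userinput>c-json</userinput> productions.
</para>
<programlisting><![CDATA[
```python
def is_c_printable(cp: int) -> bool:
    """True if the code point is in the printable subset (c-printable)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0x7E          # 8 bit
        or cp == 0x85
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD      # 16 bit
        or 0x10000 <= cp <= 0x10FFFF   # 32 bit
    )


def is_c_json(cp: int) -> bool:
    """True if the code point is allowed inside quoted scalars (c-json)."""
    return cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0x10FFFF


# DEL is accepted inside quoted scalars but is not printable, so it
# should be escaped on output.
del_needs_escape = is_c_json(0x7F) and not is_c_printable(0x7F)
```
]]></programlisting>
<para>
A processor could use the first predicate to decide which characters to
escape on output and the second to decide which characters to accept
inside quoted scalars on input.
</para>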
</sect1>
<sect1>
<title>Character Encodings</title>
<para>
All characters mentioned in this specification are Unicode code points.
Each such code point is written as one or more bytes depending on the
<defterm primary="character encoding">character encoding</defterm>
used. Note that in UTF-16, characters above
<userinput>#xFFFF</userinput> are written as four bytes, using a
surrogate pair.
</para>
<para>
The character encoding is a <refterm primary="presentation"
secondary="detail">presentation detail</refterm> and must not be used
to convey <refterm primary="content">content</refterm> information.
</para>
<para>
On input, a YAML <refterm primary="processor">processor</refterm> must
support the UTF-8 and UTF-16 character encodings. For <refterm
primary="JSON compatibility">JSON compatibility</refterm>, the UTF-32
encodings must also be supported.
</para>
<para>
If a character <refterm primary="stream">stream</refterm> begins with a
<defterm primary="byte order mark">byte order mark</defterm>, the
character encoding will be taken to be as indicated by the byte
order mark. Otherwise, the <refterm primary="stream">stream</refterm>
must begin with an ASCII character. This allows the encoding to be
deduced by the pattern of null (<userinput>#x00</userinput>)
characters.
</para>
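<para>
A minimal Python sketch of this deduction follows. It checks byte order
marks and null byte patterns for UTF-32 before UTF-16, so that the longer
patterns are not mistaken for the shorter ones. The
<userinput>detect_encoding</userinput> function is a hypothetical helper,
and falling back to UTF-8 when nothing else matches is an assumption of
this sketch.
</para>
<programlisting><![CDATA[
```python
def detect_encoding(data: bytes) -> str:
    """Deduce the character encoding from the first few bytes of a stream."""
    head = data[:4]
    # UTF-32, big-endian: explicit BOM, then ASCII first character.
    if head == b"\x00\x00\xfe\xff":
        return "UTF-32BE"
    if len(head) == 4 and head[:3] == b"\x00\x00\x00":
        return "UTF-32BE"
    # UTF-32, little-endian: explicit BOM, then ASCII first character.
    if head == b"\xff\xfe\x00\x00":
        return "UTF-32LE"
    if len(head) == 4 and head[1:] == b"\x00\x00\x00":
        return "UTF-32LE"
    # UTF-16, big-endian: explicit BOM, then ASCII first character.
    if head[:2] == b"\xfe\xff":
        return "UTF-16BE"
    if len(head) >= 2 and head[0] == 0x00:
        return "UTF-16BE"
    # UTF-16, little-endian: explicit BOM, then ASCII first character.
    if head[:2] == b"\xff\xfe":
        return "UTF-16LE"
    if len(head) >= 2 and head[1] == 0x00:
        return "UTF-16LE"
    # UTF-8 with an explicit BOM.
    if head[:3] == b"\xef\xbb\xbf":
        return "UTF-8"
    # Assumed fallback for this sketch when no pattern matches.
    return "UTF-8"
```
]]></programlisting>
<para>
The order of the tests matters: the UTF-32LE byte order mark begins with
the same two bytes as the UTF-16LE one, so the four-byte pattern has to be
tried first.
</para>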
<keep-together>
<para>
The encoding can therefore be deduced by matching the first few bytes
of the <refterm primary="stream">stream</refterm> with the following
table rows (in order):
</para>
<informaltable>
<tgroup cols="6">
<tbody>
<row>
<entry></entry>
<entry>&nbsp;<emphasis>Byte0&nbsp;</emphasis></entry>
<entry>&nbsp;<emphasis>Byte1&nbsp;</emphasis></entry>
<entry>&nbsp;<emphasis>Byte2&nbsp;</emphasis></entry>
<entry>&nbsp;<emphasis>Byte3&nbsp;</emphasis></entry>
<entry>&nbsp;<emphasis>Encoding</emphasis></entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;Explicit&nbsp;BOM</emphasis></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<userinput>#xFE</userinput></entry>
<entry>&nbsp;<userinput>#xFF</userinput></entry>
<entry>&nbsp;UTF-32BE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;ASCII&nbsp;first&nbsp;character</emphasis></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<emphasis>any</emphasis></entry>
<entry>&nbsp;UTF-32BE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;Explicit&nbsp;BOM</emphasis></entry>
<entry>&nbsp;<userinput>#xFF</userinput></entry>
<entry>&nbsp;<userinput>#xFE</userinput></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;UTF-32LE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;ASCII&nbsp;first&nbsp;character</emphasis></entry>
<entry>&nbsp;<emphasis>any</emphasis></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;UTF-32LE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;Explicit&nbsp;BOM</emphasis></entry>
<entry>&nbsp;<userinput>#xFE</userinput></entry>
<entry>&nbsp;<userinput>#xFF</userinput></entry>
<entry></entry>
<entry></entry>
<entry>&nbsp;UTF-16BE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;ASCII&nbsp;first&nbsp;character</emphasis></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry>&nbsp;<emphasis>any</emphasis></entry>
<entry></entry>
<entry></entry>
<entry>&nbsp;UTF-16BE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;Explicit&nbsp;BOM</emphasis></entry>
<entry>&nbsp;<userinput>#xFF</userinput></entry>
<entry>&nbsp;<userinput>#xFE</userinput></entry>
<entry></entry>
<entry></entry>
<entry>&nbsp;UTF-16LE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;ASCII&nbsp;first&nbsp;character</emphasis></entry>
<entry>&nbsp;<emphasis>any</emphasis></entry>
<entry>&nbsp;<userinput>#x00</userinput></entry>
<entry></entry>
<entry></entry>
<entry>&nbsp;UTF-16LE</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;Explicit&nbsp;BOM</emphasis></entry>
<entry>&nbsp;<userinput>#xEF</userinput></entry>
<entry>&nbsp;<userinput>#xBB</userinput></entry>
<entry>&nbsp;<userinput>#xBF</userinput></entry>
<entry></entry>
<entry>&nbsp;UTF-8</entry>
</row>
<row>
<entry>&nbsp;<emphasis>&nbsp;Default</emphasis></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry></entry>
<entry>&nbsp;UTF-8</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</keep-together>
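<para>
The table lookup above can be sketched in Python as follows. This is a
non-normative illustration: the function name
<userinput>detect_encoding</userinput> and its string return values are
assumptions made for this example, and an actual processor would
typically inspect a buffered input stream rather than a byte string.
</para>
<programlisting>
```python
# Non-normative sketch; detect_encoding is a hypothetical helper name.
def detect_encoding(prefix: bytes) -> str:
    """Return the encoding implied by the first few bytes of a stream."""
    # The rows are tested in the same order as in the table.
    if prefix.startswith(b"\x00\x00\xfe\xff"):
        return "UTF-32BE"  # explicit BOM
    if len(prefix) >= 4 and prefix.startswith(b"\x00\x00\x00"):
        return "UTF-32BE"  # ASCII first character
    if prefix.startswith(b"\xff\xfe\x00\x00"):
        return "UTF-32LE"  # explicit BOM
    if len(prefix) >= 4 and prefix[1:4] == b"\x00\x00\x00":
        return "UTF-32LE"  # ASCII first character
    if prefix.startswith(b"\xfe\xff"):
        return "UTF-16BE"  # explicit BOM
    if len(prefix) >= 2 and prefix[0:1] == b"\x00":
        return "UTF-16BE"  # ASCII first character
    if prefix.startswith(b"\xff\xfe"):
        return "UTF-16LE"  # explicit BOM
    if len(prefix) >= 2 and prefix[1:2] == b"\x00":
        return "UTF-16LE"  # ASCII first character
    # A UTF-8 BOM and the default case both select UTF-8.
    return "UTF-8"
```
</programlisting>
<para>
Under these assumptions, a stream starting with the bytes
<userinput>#x23</userinput> <userinput>#x00</userinput> is reported as
UTF-16LE, while an empty stream or one starting with plain ASCII bytes
falls through to UTF-8.
</para>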
<para>
The recommended output encoding is UTF-8. If another encoding is
used, it is recommended that an explicit byte order mark be used,
even if the first <refterm primary="stream">stream</refterm>
character is ASCII.
</para>
<keep-together>
<para>
For more information about the byte order mark and the Unicode
character encoding schemes see the <ulink
url="http://www.unicode.org/unicode/faq/utf_bom.html">Unicode
FAQ</ulink>.
</para>
<productionset>
<production id="c-byte-order-mark">
<lhs>c-byte-order-mark</lhs>
<rhs>
#xFEFF
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
In the examples, byte order mark characters are displayed as
<uquote>&hArr;</uquote>.
</para>
<example>
<title>Byte Order Mark</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting><hl1>&hArr;</hl1># Comment only.<sbr/>
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-byte-order-mark">c-byte-order-mark</link></hl1>
</synopsis>
</member>
<member>
<programlisting># This stream contains no<sbr/>
# documents, only comments.
</programlisting>
</member>
</simplelist>
</example>
</keep-together>
<example>
<title>Invalid Byte Order Mark</title>
<simplelist type="horiz" columns="2">
<member>
<screen>- Invalid use of BOM<sbr/>
<hl1>&hArr;</hl1>
- Inside a document.
</screen>
</member>
<member>
<screen>ERROR:<sbr/>
A <hl1>BOM</hl1> must not appear
inside a document.
</screen>
</member>
</simplelist>
</example>
</sect1>
<sect1>
<title>Indicator Characters</title>
<para>
<defterm primary="indicator">Indicators</defterm> are characters that
have special semantics.
</para>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-sequence-entry">
<lhs>c-sequence-entry</lhs>
<rhs>
<quote>-</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="- block sequence
entry"><uquote>-</uquote></refterm> (<userinput>#x2D</userinput>,
hyphen) denotes a <refterm primary="style" secondary="block"
tertiary="sequence">block sequence</refterm> entry.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-mapping-key">
<lhs>c-mapping-key</lhs>
<rhs>
<quote>?</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="? mapping key"><uquote>?</uquote></refterm>
(<userinput>#x3F</userinput>, question mark) denotes a <refterm
primary="key">mapping key</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-mapping-value">
<lhs>c-mapping-value</lhs>
<rhs>
<quote>:</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary=": mapping value"><uquote>:</uquote></refterm>
(<userinput>#x3A</userinput>, colon) denotes a <refterm
primary="value">mapping value</refterm>.
</member>
</simplelist>
<example>
<title>Block Structure Indicators</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting>sequence<hl3>:</hl3><sbr/>
<hl1>-</hl1> one
<hl1>-</hl1> two
mapping<hl3>:</hl3>
<hl2>?</hl2> sky
<hl3>:</hl3> blue
sea <hl3>:</hl3> green
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-sequence-entry">c-sequence-entry</link></hl1>
<hl2><link linkend="c-mapping-key">c-mapping-key</link></hl2> <hl3><link linkend="c-mapping-value">c-mapping-value</link></hl3>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!map {
? !!str "sequence"
: !!seq [ !!str "one", !!str "two" ],
? !!str "mapping"
: !!map {
? !!str "sky" : !!str "blue",
? !!str "sea" : !!str "green",
},
}
</programlisting>
</member>
</simplelist>
</example>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-collect-entry">
<lhs>c-collect-entry</lhs>
<rhs>
<quote>,</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary=", end flow entry"><uquote>,</uquote></refterm>
(<userinput>#x2C</userinput>, comma) ends a <refterm primary="style"
secondary="flow" tertiary="collection">flow collection</refterm>
entry.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-sequence-start">
<lhs>c-sequence-start</lhs>
<rhs>
<quote>[</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="[ start flow
sequence"><uquote>[</uquote></refterm> (<userinput>#x5B</userinput>,
left bracket) starts a <refterm primary="style" secondary="flow"
tertiary="sequence">flow sequence</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-sequence-end">
<lhs>c-sequence-end</lhs>
<rhs>
<quote>]</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="] end flow sequence"><uquote>]</uquote></refterm>
(<userinput>#x5D</userinput>, right bracket) ends a <refterm
primary="style" secondary="flow" tertiary="sequence">flow
sequence</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-mapping-start">
<lhs>c-mapping-start</lhs>
<rhs>
<quote>{</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="{ start flow
mapping"><uquote>{</uquote></refterm> (<userinput>#x7B</userinput>,
left brace) starts a <refterm primary="style" secondary="flow"
tertiary="mapping">flow mapping</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-mapping-end">
<lhs>c-mapping-end</lhs>
<rhs>
<quote>}</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="} end flow mapping"><uquote>}</uquote></refterm>
(<userinput>#x7D</userinput>, right brace) ends a <refterm
primary="style" secondary="flow" tertiary="mapping">flow
mapping</refterm>.
</member>
</simplelist>
<example>
<title>Flow Collection Indicators</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting>sequence: <hl1>[</hl1> one<hl3>,</hl3> two<hl3>,</hl3> <hl1>]</hl1><sbr/>
mapping: <hl2>{</hl2> sky: blue<hl3>,</hl3> sea: green <hl2>}</hl2>
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-sequence-start">c-sequence-start</link></hl1> <hl1><link linkend="c-sequence-end">c-sequence-end</link></hl1>
<hl2><link linkend="c-mapping-start">c-mapping-start</link></hl2> <hl2><link linkend="c-mapping-end">c-mapping-end</link></hl2>
<hl3><link linkend="c-collect-entry">c-collect-entry</link></hl3>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!map {
? !!str "sequence"
: !!seq [ !!str "one", !!str "two" ],
? !!str "mapping"
: !!map {
? !!str "sky" : !!str "blue",
? !!str "sea" : !!str "green",
},
}
</programlisting>
</member>
</simplelist>
</example>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-comment">
<lhs>c-comment</lhs>
<rhs>
<quote>#</quote>
</rhs>
</production>
</productionset>
</member>
<member>
An <refterm primary="# comment"><uquote>#</uquote></refterm>
(<userinput>#x23</userinput>, octothorpe, hash, sharp, pound, number
sign) denotes a <refterm primary="comment">comment</refterm>.
</member>
</simplelist>
<example>
<title>Comment Indicator</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting><hl1>#</hl1> Comment only.<sbr/>
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-comment">c-comment</link></hl1>
</synopsis>
</member>
<member>
<programlisting># This stream contains no<sbr/>
# documents, only comments.
</programlisting>
</member>
</simplelist>
</example>
<pagebreak/>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-anchor">
<lhs>c-anchor</lhs>
<rhs>
<quote>&amp;</quote>
</rhs>
</production>
</productionset>
</member>
<member>
An <refterm primary="&amp; anchor"><uquote>&amp;</uquote></refterm>
(<userinput>#x26</userinput>, ampersand) denotes a <refterm
primary="anchor">node<q/>s anchor property</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-alias">
<lhs>c-alias</lhs>
<rhs>
<quote>*</quote>
</rhs>
</production>
</productionset>
</member>
<member>
An <refterm primary="* alias"><uquote>*</uquote></refterm>
(<userinput>#x2A</userinput>, asterisk) denotes an <refterm
primary="alias">alias node</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-tag">
<lhs>c-tag</lhs>
<rhs>
<quote>!</quote>
</rhs>
</production>
</productionset>
</member>
<member>
The <refterm primary="! tag indicator"><uquote>!</uquote></refterm>
(<userinput>#x21</userinput>, exclamation) is heavily overloaded for
specifying <refterm primary="tag">node tags</refterm>. It is used to
denote <refterm primary="tag" secondary="handle">tag
handles</refterm> used in <refterm primary="directive"
secondary="TAG">tag directives</refterm> and <refterm primary="tag"
secondary="property">tag properties</refterm>; to denote <refterm
primary="tag" secondary="local">local tags</refterm>; and as the
<refterm primary="tag" secondary="non-specific">non-specific
tag</refterm> for non-<refterm primary="style" secondary="flow"
tertiary="plain">plain scalars</refterm>.
</member>
</simplelist>
<example>
<title>Node Property Indicators</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting>anchored: <hl1>!</hl1>local <hl2>&amp;</hl2>anchor value<sbr/>
alias: <hl3>*</hl3>anchor
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-tag">c-tag</link></hl1> <hl2><link linkend="c-anchor">c-anchor</link></hl2> <hl3><link linkend="c-alias">c-alias</link></hl3>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!map {
? !!str "anchored"
: !local &amp;A1 "value",
? !!str "alias"
: *A1,
}
</programlisting>
</member>
</simplelist>
</example>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-literal">
<lhs>c-literal</lhs>
<rhs>
<quote>|</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="| literal style"><uquote>|</uquote></refterm>
(<userinput>#x7C</userinput>, vertical bar) denotes a <refterm
primary="style" secondary="block" tertiary="literal">literal block
scalar</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-folded">
<lhs>c-folded</lhs>
<rhs>
<quote>&gt;</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="&gt; folded
style"><uquote>&gt;</uquote></refterm> (<userinput>#x3E</userinput>,
greater than) denotes a <refterm primary="style" secondary="block"
tertiary="folded">folded block scalar</refterm>.
</member>
</simplelist>
<example>
<title>Block Scalar Indicators</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting>literal: <hl1>|</hl1><sbr/>
some
text
folded: <hl2>&gt;</hl2>
some
text
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-literal">c-literal</link></hl1> <hl2><link linkend="c-folded">c-folded</link></hl2>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!map {
? !!str "literal"
: !!str "some\ntext\n",
? !!str "folded"
: !!str "some text\n",
}
</programlisting>
</member>
</simplelist>
</example>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-single-quote">
<lhs>c-single-quote</lhs>
<rhs>
<quote>'</quote>
</rhs>
</production>
</productionset>
</member>
<member>
An <refterm primary="' single-quoted
style"><uquote>'</uquote></refterm> (<userinput>#x27</userinput>,
apostrophe, single quote) surrounds a <refterm primary="style"
secondary="flow" tertiary="single-quoted">single-quoted flow
scalar</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-double-quote">
<lhs>c-double-quote</lhs>
<rhs>
<quote>"</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary='" double-quoted
style'><uquote>"</uquote></refterm> (<userinput>#x22</userinput>,
double quote) surrounds a <refterm primary="style" secondary="flow"
tertiary="double-quoted">double-quoted flow scalar</refterm>.
</member>
</simplelist>
<example>
<title>Quoted Scalar Indicators</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting>single: <hl1>'</hl1>text<hl1>'</hl1><sbr/>
double: <hl2>"</hl2>text<hl2>"</hl2>
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-single-quote">c-single-quote</link></hl1> <hl2><link linkend="c-double-quote">c-double-quote</link></hl2>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!map {
? !!str "single"
: !!str "text",
? !!str "double"
: !!str "text",
}
</programlisting>
</member>
</simplelist>
</example>
<pagebreak/>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-directive">
<lhs>c-directive</lhs>
<rhs>
<quote>%</quote>
</rhs>
</production>
</productionset>
</member>
<member>
A <refterm primary="% directive"><uquote>%</uquote></refterm>
(<userinput>#x25</userinput>, percent) denotes a <refterm
primary="directive">directive</refterm> line.
</member>
</simplelist>
<example>
<title>Directive Indicator</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting><hl1>%</hl1>YAML 1.2<sbr/>
--- text
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-directive">c-directive</link></hl1>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!str "text"
</programlisting>
</member>
</simplelist>
</example>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="c-reserved">
<lhs>c-reserved</lhs>
<rhs>
<quote>@</quote> | <quote>`</quote>
</rhs>
</production>
</productionset>
</member>
<member>
The <defterm primary="@ reserved
indicator"><uquote>@</uquote></defterm> (<userinput>#x40</userinput>,
at) and <defterm primary="' reserved
indicator"><uquote>`</uquote></defterm> (<userinput>#x60</userinput>,
grave accent) are <defterm primary="indicator"
secondary="reserved">reserved</defterm> for future use.
</member>
</simplelist>
<example>
<title>Invalid use of Reserved Indicators</title>
<simplelist type="horiz" columns="2">
<member>
<screen>commercial-at: <hl1>@</hl1>text<sbr/>
grave-accent: <hl1>`</hl1>text
</screen>
</member>
<member>
<screen>ERROR:<sbr/>
<hl1>Reserved indicators</hl1> can't
start a plain scalar.
</screen>
</member>
</simplelist>
</example>
<keep-together>
<para>
Any indicator character:
</para>
<productionset>
<production id="c-indicator">
<lhs>c-indicator</lhs>
<rhs>
&nbsp;&nbsp;<nonterminal
def="#c-sequence-entry"><quote>-</quote></nonterminal>
| <nonterminal
def="#c-mapping-key"><quote>?</quote></nonterminal>
| <nonterminal
def="#c-mapping-value"><quote>:</quote></nonterminal>
| <nonterminal
def="#c-collect-entry"><quote>,</quote></nonterminal>
| <nonterminal
def="#c-sequence-start"><quote>[</quote></nonterminal>
| <nonterminal
def="#c-sequence-end"><quote>]</quote></nonterminal>
| <nonterminal
def="#c-mapping-start"><quote>{</quote></nonterminal>
| <nonterminal
def="#c-mapping-end"><quote>}</quote></nonterminal><sbr/>
| <nonterminal
def="#c-comment"><quote>#</quote></nonterminal>
| <nonterminal
def="#c-anchor"><quote>&amp;</quote></nonterminal>
| <nonterminal def="#c-alias"><quote>*</quote></nonterminal>
| <nonterminal
def="#c-tag"><quote>!</quote></nonterminal>
| <nonterminal
def="#c-literal"><quote>|</quote></nonterminal>
| <nonterminal
def="#c-folded"><quote>&gt;</quote></nonterminal>
| <nonterminal
def="#c-single-quote"><quote>'</quote></nonterminal>
| <nonterminal
def="#c-double-quote"><quote>"</quote></nonterminal><sbr/>
| <nonterminal
def="#c-directive"><quote>%</quote></nonterminal>
| <nonterminal def="#c-reserved"><quote>@</quote>
| <quote>`</quote></nonterminal>
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
The <refterm primary="[ start flow
sequence"><uquote>[</uquote></refterm>, <refterm primary="] end flow
sequence"><uquote>]</uquote></refterm>, <refterm primary="{ start
flow mapping"><uquote>{</uquote></refterm>, <refterm primary="} end
flow mapping"><uquote>}</uquote></refterm> and <refterm primary=",
end flow entry"><uquote>,</uquote></refterm> indicators denote
structure in <refterm primary="style" secondary="flow"
tertiary="collection">flow collections</refterm>. They are therefore
forbidden in some cases, to avoid ambiguity in several constructs.
This is handled on a case-by-case basis by the relevant productions.
</para>
<productionset>
<production id="c-flow-indicator">
<lhs>c-flow-indicator</lhs>
<rhs>
<nonterminal
def="#c-collect-entry"><quote>,</quote></nonterminal>
| <nonterminal
def="#c-sequence-start"><quote>[</quote></nonterminal>
| <nonterminal
def="#c-sequence-end"><quote>]</quote></nonterminal>
| <nonterminal
def="#c-mapping-start"><quote>{</quote></nonterminal>
| <nonterminal
def="#c-mapping-end"><quote>}</quote></nonterminal><sbr/>
</rhs>
</production>
</productionset>
</keep-together>
</sect1>
<sect1>
<title>Line Break Characters</title>
<keep-together>
<para>
YAML recognizes the following ASCII <defterm primary="line
break">line break</defterm> characters. Note that the form feed
(<userinput>#x0C</userinput>) is not considered to be a line break
character.
</para>
<productionset>
<production id="b-line-feed">
<lhs>b-line-feed</lhs>
<rhs>
#xA &nbsp;&nbsp;&nbsp;/* LF */
</rhs>
</production>
<production id="b-carriage-return">
<lhs>b-carriage-return</lhs>
<rhs>
#xD &nbsp;&nbsp;&nbsp;/* CR */
</rhs>
</production>
<production id="b-char">
<lhs>b-char</lhs>
<rhs>
<nonterminal def="#b-line-feed"/>
| <nonterminal def="#b-carriage-return"/>
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
All other characters are considered to be non-break characters. Note that
these include the <defterm primary="line break"
secondary="non-ASCII">non-ASCII line breaks</defterm>: next line
(<userinput>#x85</userinput>), line separator
(<userinput>#x2028</userinput>) and paragraph separator
(<userinput>#x2029</userinput>).
</para>
<para>
<anchor id="non-ASCII line breaks"/> <refterm primary="YAML 1.1
processing">YAML version 1.1</refterm> did support the above line
break characters; however, JSON does not. Hence, to ensure <refterm
primary="JSON compatibility">JSON compatibility</refterm>, YAML
treats them as non-break characters as of version 1.2. In theory this
would cause incompatibility with <refterm primary="YAML 1.1
processing">version 1.1</refterm>; in practice these characters were
rarely (if ever) used. YAML 1.2 <refterm
primary="processor">processors</refterm> <refterm
primary="parse">parsing</refterm> a <refterm primary="YAML 1.1
processing">version 1.1</refterm> <refterm
primary="document">document</refterm> should therefore treat these
line breaks as non-break characters, with an appropriate warning.
</para>
<productionset>
<production id="nb-char">
<lhs>nb-char</lhs>
<rhs>
<nonterminal def="#c-printable"/>
- <nonterminal def="#b-char"/>
- <nonterminal def="#c-byte-order-mark"/>
</rhs>
</production>
<production id="nb-json">
<lhs>nb-json</lhs>
<rhs>
<nonterminal def="#c-json"/>
- <nonterminal def="#b-char"/>
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
Line breaks are interpreted differently by different systems, and
have several widely used forms.
</para>
<productionset>
<production id="b-break">
<lhs>b-break</lhs>
<rhs>
&nbsp;&nbsp;( <nonterminal def="#b-carriage-return"/>
<nonterminal def="#b-line-feed"/> ) /* DOS, Windows */<sbr/>
| <nonterminal def="#b-carriage-return"/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /* MacOS up to 9.x */<sbr/>
| <nonterminal def="#b-line-feed"/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
&nbsp; /* UNIX, MacOS X */
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
Line breaks inside <refterm primary="scalar">scalar
content</refterm> must be <defterm primary="line break"
secondary="normalization">normalized</defterm> by the YAML <refterm
primary="processor">processor</refterm>. Each such line break must be
<refterm primary="parse">parsed</refterm> into a single line feed
character.
</para>
<para>
The original line break form is a <refterm primary="presentation"
secondary="detail">presentation detail</refterm> and must not be used
to convey <refterm primary="content">content</refterm> information.
</para>
<productionset>
<production id="b-as-line-feed">
<lhs>b-as-line-feed</lhs>
<rhs>
<nonterminal def="#b-break"/>
</rhs>
</production>
</productionset>
</keep-together>
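<para>
A minimal, non-normative sketch of this normalization in Python,
assuming the scalar content is already available as a decoded string.
The helper name <userinput>normalize_line_breaks</userinput> is
hypothetical.
</para>
<programlisting>
```python
# Non-normative sketch; normalize_line_breaks is a hypothetical helper name.
def normalize_line_breaks(text: str) -> str:
    """Map each CR LF pair, lone CR, or lone LF to a single line feed."""
    # Replace CR LF pairs first so that each pair yields one line feed,
    # then convert any remaining lone carriage returns.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```
</programlisting>
<para>
Handling the two-character form first matters: converting lone carriage
returns before the pairs would turn each DOS-style break into two line
feeds. The non-ASCII line breaks are deliberately left untouched, since
they are non-break characters as of version 1.2.
</para>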
<keep-together>
<para>
Outside <refterm primary="scalar">scalar content</refterm>, YAML
allows any line break to be used to terminate lines.
</para>
<productionset>
<production id="b-non-content">
<lhs>b-non-content</lhs>
<rhs>
<nonterminal def="#b-break"/>
</rhs>
</production>
</productionset>
</keep-together>
<para>
On output, a YAML <refterm primary="processor">processor</refterm> is
free to emit line breaks using whatever convention is most appropriate.
</para>
<keep-together>
<para>
In the examples, line breaks are sometimes displayed using the
<uquote>&darr;</uquote> glyph for clarity.
</para>
<example>
<title>Line Break Characters</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting>|<sbr/>
Line break (no glyph)
Line break (glyphed)<hl1>&darr;</hl1>
</programlisting>
<synopsis>Legend:
<hl1><link linkend="b-break">b-break</link></hl1>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!str "Line break (no glyph)\n\
Line break (glyphed)\n"
</programlisting>
</member>
</simplelist>
</example>
</keep-together>
</sect1>
<sect1>
<title>White Space Characters</title>
<keep-together>
<para>
YAML recognizes two <defterm primary="space" secondary="white">white
space</defterm> characters: <defterm primary="space">space</defterm>
and <defterm primary="tab">tab</defterm>.
</para>
<productionset>
<production id="s-space">
<lhs>s-space</lhs>
<rhs>
#x20 /* SP */
</rhs>
</production>
<production id="s-tab">
<lhs>s-tab</lhs>
<rhs>
#x9 &nbsp;/* TAB */
</rhs>
</production>
<production id="s-white">
<lhs>s-white</lhs>
<rhs>
<nonterminal def="#s-space"/> | <nonterminal def="#s-tab"/>
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
The rest of the (<refterm primary="printable
character">printable</refterm>) non-<refterm primary="line
break">break</refterm> characters are considered to be non-space
characters.
</para>
<productionset>
<production id="ns-char">
<lhs>ns-char</lhs>
<rhs>
<nonterminal def="#nb-char"/> - <nonterminal def="#s-white"/>
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<para>
In the examples, tab characters are displayed as the glyph
<uquote>&rarr;</uquote>. Space characters are sometimes displayed as
the glyph <uquote>&middot;</uquote> for clarity.
</para>
<example>
<title>Tabs and Spaces</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting># Tabs and spaces<sbr/>
quoted:<hl1>&middot;</hl1>"Quoted <hl2>&rarr;</hl2>"
block:<hl2>&rarr;</hl2>|
<hl1>&middot;&middot;</hl1>void main() {
<hl1>&middot;&middot;</hl1><hl2>&rarr;</hl2>printf("Hello, world!\n");
<hl1>&middot;&middot;</hl1>}
</programlisting>
<synopsis>Legend:
<hl1><link linkend="s-space">s-space</link></hl1> <hl2><link linkend="s-tab">s-tab</link></hl2>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
!!map {
? !!str "quoted"
: "Quoted \t",
? !!str "block"
: "void main() {\n\
\tprintf(\"Hello, world!\\n\");\n\
}\n",
}
</programlisting>
</member>
</simplelist>
</example>
</keep-together>
</sect1>
<sect1>
<title>Miscellaneous Characters</title>
<keep-together>
<para>
The YAML syntax productions make use of the following additional
character classes:
</para>
<itemizedlist>
<listitem>
A decimal digit for numbers:
</listitem>
</itemizedlist>
<productionset>
<production id="ns-dec-digit">
<lhs>ns-dec-digit</lhs>
<rhs>
[#x30-#x39] /* 0-9 */
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<itemizedlist>
<listitem>
A hexadecimal digit for <refterm primary="escaping" secondary="in
double-quoted scalars">escape sequences</refterm>:
</listitem>
</itemizedlist>
<productionset>
<production id="ns-hex-digit">
<lhs>ns-hex-digit</lhs>
<rhs>
&nbsp;&nbsp;<nonterminal def="#ns-dec-digit"/><sbr/>
| [#x41-#x46] /* A-F */ | [#x61-#x66] /* a-f */
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<itemizedlist>
<listitem>
ASCII letter (alphabetic) characters:
</listitem>
</itemizedlist>
<productionset>
<production id="ns-ascii-letter">
<lhs>ns-ascii-letter</lhs>
<rhs>
[#x41-#x5A] /* A-Z */ | [#x61-#x7A] /* a-z */
</rhs>
</production>
</productionset>
</keep-together>
<keep-together>
<itemizedlist>
<listitem>
Word (alphanumeric) characters for identifiers:
</listitem>
</itemizedlist>
<productionset>
<production id="ns-word-char">
<lhs>ns-word-char</lhs>
<rhs>
<nonterminal def="#ns-dec-digit"/>
| <nonterminal def="#ns-ascii-letter"/>
| <quote>-</quote>
</rhs>
</production>
</productionset>
</keep-together>
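<para>
The character classes defined so far can be sketched as simple Python
predicates. This is a non-normative illustration; the constant and
function names are assumptions chosen to mirror the production names.
</para>
<programlisting>
```python
# Non-normative sketch; all names below are hypothetical.
DEC_DIGITS = "0123456789"
HEX_LETTERS = "ABCDEFabcdef"
ASCII_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def is_dec_digit(ch: str) -> bool:
    """ns-dec-digit: an ASCII digit 0-9."""
    return len(ch) == 1 and ch in DEC_DIGITS


def is_hex_digit(ch: str) -> bool:
    """ns-hex-digit: a decimal digit or a letter A-F or a-f."""
    return is_dec_digit(ch) or (len(ch) == 1 and ch in HEX_LETTERS)


def is_ascii_letter(ch: str) -> bool:
    """ns-ascii-letter: a letter A-Z or a-z."""
    return len(ch) == 1 and ch in ASCII_LETTERS


def is_word_char(ch: str) -> bool:
    """ns-word-char: a decimal digit, an ASCII letter, or a hyphen."""
    return is_dec_digit(ch) or is_ascii_letter(ch) or ch == "-"
```
</programlisting>
<para>
Explicit ASCII tables are used here instead of Python's
<userinput>str.isdigit</userinput> and
<userinput>str.isalpha</userinput>, because those methods also accept
non-ASCII digits and letters that the productions exclude.
</para>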
<keep-together>
<itemizedlist>
<listitem>
<para>
URI characters for <refterm primary="tag">tags</refterm>, as
specified in <ulink
url="http://www.ietf.org/rfc/rfc2396.txt">RFC2396</ulink>, with
the addition of the <uquote>[</uquote> and <uquote>]</uquote> for
presenting IPv6 addresses as proposed in <ulink
url="http://www.ietf.org/rfc/rfc2732.txt">RFC2732</ulink>.
</para>
<para>
By convention, any URI characters other than the allowed
printable ASCII characters are first <defterm primary="character
encoding" secondary="in URI">encoded</defterm> in UTF-8, and then
each byte is <defterm primary="escaping" secondary="in
URIs">escaped</defterm> using the <defterm primary="% escaping in
URI"><uquote>%</uquote></defterm> character.
</para>
</listitem>
</itemizedlist>
<productionset>
<production id="ns-uri-char">
<lhs>ns-uri-char</lhs>
<rhs>
&nbsp;&nbsp;<nonterminal def="#ns-word-char"/>
| <quote>%</quote> <nonterminal def="#ns-hex-digit"/>
<nonterminal def="#ns-hex-digit"/><sbr/>
| <quote>;</quote> | <quote>/</quote> | <quote>?</quote>
| <quote>:</quote> | <quote>@</quote> | <quote>&amp;</quote>
| <quote>=</quote> | <quote>+</quote> | <quote>$</quote>
| <quote>,</quote><sbr/>
| <quote>_</quote> | <quote>.</quote> | <quote>!</quote>
| <quote>~</quote> | <quote>*</quote> | <quote>'</quote>
| <quote>(</quote> | <quote>)</quote> | <quote>[</quote>
| <quote>]</quote>
</rhs>
</production>
</productionset>
</keep-together>
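<para>
The escaping convention described above can be sketched in Python as
follows. This is a non-normative illustration: the names
<userinput>URI_SAFE</userinput> and
<userinput>escape_uri_chars</userinput> are hypothetical, and the sketch
assumes its input is unescaped text, so a literal
<uquote>%</uquote> is itself escaped.
</para>
<programlisting>
```python
# Non-normative sketch; URI_SAFE and escape_uri_chars are hypothetical names.
# The ampersand is written as \x26 to keep this listing free of XML markup
# characters.
URI_SAFE = set(
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "-;/?:@\x26=+$,_.!~*'()[]"
)


def escape_uri_chars(text: str) -> str:
    """Escape every character that is not directly allowed in a URI."""
    out = []
    for ch in text:
        if ch in URI_SAFE:
            out.append(ch)
        else:
            # Encode the character in UTF-8, then escape each byte as
            # a percent sign followed by two hexadecimal digits.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)
```
</programlisting>
<para>
For example, under these assumptions a space becomes
<userinput>%20</userinput> and the two-byte UTF-8 encoding of
<userinput>#xE9</userinput> becomes <userinput>%C3%A9</userinput>.
</para>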
<keep-together>
<itemizedlist>
<listitem>
The <link linkend="c-tag"><uquote>!</uquote></link> character is
used to indicate the end of a <refterm primary="tag"
secondary="handle" tertiary="named">named tag handle</refterm>;
hence its use in <refterm primary="tag" secondary="shorthand">tag
shorthands</refterm> is restricted. In addition, such <refterm
primary="tag" secondary="shorthand">shorthands</refterm> must not
contain the <refterm primary="[ start flow
sequence"><uquote>[</uquote></refterm>, <refterm primary="] end
flow sequence"><uquote>]</uquote></refterm>, <refterm primary="{
start flow mapping"><uquote>{</uquote></refterm>, <refterm
primary="} end flow mapping"><uquote>}</uquote></refterm> and
<refterm primary=", end flow entry"><uquote>,</uquote></refterm>
characters. These characters would cause ambiguity with <refterm
primary="style" secondary="flow" tertiary="collection">flow
collection</refterm> structures.
</listitem>
</itemizedlist>
<productionset>
<production id="ns-tag-char">
<lhs>ns-tag-char</lhs>
<rhs>
<nonterminal def="#ns-uri-char"/>
- <nonterminal def="#c-tag"><quote>!</quote></nonterminal>
- <nonterminal def="#c-flow-indicator"/>
</rhs>
</production>
</productionset>
</keep-together>
</sect1>
<sect1>
<title>Escaped Characters</title>
<keep-together>
<para>
All non-<refterm primary="printable character">printable</refterm>
characters must be <defterm primary="escaping" secondary="in
double-quoted scalars">escaped</defterm>. YAML escape sequences use
the <defterm primary="\ escaping in double-quoted
scalars"><uquote>\</uquote></defterm> notation common to most modern
computer languages. Each escape sequence must be <refterm
primary="parse">parsed</refterm> into the appropriate Unicode
character.
</para>
<para>
The original escape sequence form is a <refterm
primary="presentation" secondary="detail">presentation
detail</refterm> and must not be used to convey <refterm
primary="content">content</refterm> information.
</para>
<para>
Note that escape sequences are only interpreted in <refterm
primary="style" secondary="flow"
tertiary="double-quoted">double-quoted scalars</refterm>. In all
other <refterm primary="style" secondary="scalar">scalar
styles</refterm>, the <uquote>\</uquote> character has no special
meaning and non-<refterm primary="printable
character">printable</refterm> characters are not available.
</para>
<productionset>
<production id="c-escape">
<lhs>c-escape</lhs>
<rhs>
<quote>\</quote>
</rhs>
</production>
</productionset>
</keep-together>
<para>
YAML escape sequences are a superset of C<q/>s escape sequences:
</para>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-null">
<lhs>ns-esc-null</lhs>
<rhs>
<quote>0</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII null (<userinput>#x0</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-bell">
<lhs>ns-esc-bell</lhs>
<rhs>
<quote>a</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII bell (<userinput>#x7</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-backspace">
<lhs>ns-esc-backspace</lhs>
<rhs>
<quote>b</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII backspace (<userinput>#x8</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-horizontal-tab">
<lhs>ns-esc-horizontal-tab</lhs>
<rhs>
<quote>t</quote>
| #x9
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII horizontal tab (<userinput>#x9</userinput>) character.
This is useful at the start or the end of a line to force a leading
or trailing tab to become part of the <refterm
primary="content">content</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-line-feed">
<lhs>ns-esc-line-feed</lhs>
<rhs>
<quote>n</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII line feed (<userinput>#xA</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-vertical-tab">
<lhs>ns-esc-vertical-tab</lhs>
<rhs>
<quote>v</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII vertical tab (<userinput>#xB</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-form-feed">
<lhs>ns-esc-form-feed</lhs>
<rhs>
<quote>f</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII form feed (<userinput>#xC</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-carriage-return">
<lhs>ns-esc-carriage-return</lhs>
<rhs>
<quote>r</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII carriage return (<userinput>#xD</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-escape">
<lhs>ns-esc-escape</lhs>
<rhs>
<quote>e</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII escape (<userinput>#x1B</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-space">
<lhs>ns-esc-space</lhs>
<rhs>
#x20
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII space (<userinput>#x20</userinput>) character. This is
useful at the start or the end of a line to force a leading or
trailing space to become part of the <refterm
primary="content">content</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-double-quote">
<lhs>ns-esc-double-quote</lhs>
<rhs>
<nonterminal def="#c-double-quote"><quote>"</quote></nonterminal>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII double quote (<userinput>#x22</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-slash">
<lhs>ns-esc-slash</lhs>
<rhs>
<quote>/</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII slash (<userinput>#x2F</userinput>) character. This is
required for <refterm primary="JSON compatibility">JSON
compatibility</refterm>.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-backslash">
<lhs>ns-esc-backslash</lhs>
<rhs>
<nonterminal def="#c-escape"><quote>\</quote></nonterminal>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped ASCII backslash (<userinput>#x5C</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-next-line">
<lhs>ns-esc-next-line</lhs>
<rhs>
<quote>N</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped Unicode next line (<userinput>#x85</userinput>) character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-non-breaking-space">
<lhs>ns-esc-non-breaking-space</lhs>
<rhs>
<quote>_</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped Unicode non-breaking space (<userinput>#xA0</userinput>)
character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-line-separator">
<lhs>ns-esc-line-separator</lhs>
<rhs>
<quote>L</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped Unicode line separator (<userinput>#x2028</userinput>)
character.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-paragraph-separator">
<lhs>ns-esc-paragraph-separator</lhs>
<rhs>
<quote>P</quote>
</rhs>
</production>
</productionset>
</member>
<member>
Escaped Unicode paragraph separator (<userinput>#x2029</userinput>)
character.
</member>
</simplelist>
<pagebreak/>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-8-bit">
<lhs>ns-esc-8-bit</lhs>
<rhs>
<quote>x</quote><sbr/>
( <nonterminal def="#ns-hex-digit"/> &times; 2 )
</rhs>
</production>
</productionset>
</member>
<member>
Escaped 8-bit Unicode character, written as two hexadecimal digits.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-16-bit">
<lhs>ns-esc-16-bit</lhs>
<rhs>
<quote>u</quote><sbr/>
( <nonterminal def="#ns-hex-digit"/> &times; 4 )
</rhs>
</production>
</productionset>
</member>
<member>
Escaped 16-bit Unicode character, written as four hexadecimal digits.
</member>
</simplelist>
<simplelist type="horiz" columns="2">
<member>
<productionset>
<production id="ns-esc-32-bit">
<lhs>ns-esc-32-bit</lhs>
<rhs>
<quote>U</quote><sbr/>
( <nonterminal def="#ns-hex-digit"/> &times; 8 )
</rhs>
</production>
</productionset>
</member>
<member>
Escaped 32-bit Unicode character, written as eight hexadecimal digits.
</member>
</simplelist>
<keep-together>
<para>
Any escaped character:
</para>
<productionset>
<production id="c-ns-esc-char">
<lhs>c-ns-esc-char</lhs>
<rhs>
<nonterminal def="#c-escape"><quote>\</quote></nonterminal><sbr/>
(&nbsp;<nonterminal def="#ns-esc-null"/>
| <nonterminal def="#ns-esc-bell"/>
| <nonterminal def="#ns-esc-backspace"/><sbr/>
| <nonterminal def="#ns-esc-horizontal-tab"/>
| <nonterminal def="#ns-esc-line-feed"/><sbr/>
| <nonterminal def="#ns-esc-vertical-tab"/>
| <nonterminal def="#ns-esc-form-feed"/><sbr/>
| <nonterminal def="#ns-esc-carriage-return"/>
| <nonterminal def="#ns-esc-escape"/>
| <nonterminal def="#ns-esc-space"/><sbr/>
| <nonterminal def="#ns-esc-double-quote"/>
| <nonterminal def="#ns-esc-slash"/>
| <nonterminal def="#ns-esc-backslash"/><sbr/>
| <nonterminal def="#ns-esc-next-line"/>
| <nonterminal def="#ns-esc-non-breaking-space"/><sbr/>
| <nonterminal def="#ns-esc-line-separator"/>
| <nonterminal def="#ns-esc-paragraph-separator"/><sbr/>
| <nonterminal def="#ns-esc-8-bit"/>
| <nonterminal def="#ns-esc-16-bit"/>
| <nonterminal def="#ns-esc-32-bit"/> )<sbr/>
</rhs>
</production>
</productionset>
</keep-together>
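<para>
As a non-normative illustration, the escape productions above can be
sketched as a small decoder. The following Python sketch is not part of
any YAML processor; the names <userinput>SIMPLE_ESCAPES</userinput>,
<userinput>HEX_ESCAPES</userinput>, <userinput>HEX_DIGITS</userinput> and
<userinput>decode_escapes</userinput> are hypothetical, and the input is
assumed to be the text between the quotes of a single-line double-quoted
scalar.
</para>
<programlisting>
```python
# Illustrative sketch only: hypothetical names, not a full YAML parser.
# Maps the character following the backslash to the character it denotes.
SIMPLE_ESCAPES = {
    "0": "\x00",     # ns-esc-null
    "a": "\x07",     # ns-esc-bell
    "b": "\x08",     # ns-esc-backspace
    "t": "\x09",     # ns-esc-horizontal-tab, written as "t"
    "\t": "\x09",    # ns-esc-horizontal-tab, written as a literal tab
    "n": "\x0a",     # ns-esc-line-feed
    "v": "\x0b",     # ns-esc-vertical-tab
    "f": "\x0c",     # ns-esc-form-feed
    "r": "\x0d",     # ns-esc-carriage-return
    "e": "\x1b",     # ns-esc-escape
    " ": "\x20",     # ns-esc-space
    '"': "\x22",     # ns-esc-double-quote
    "/": "\x2f",     # ns-esc-slash
    "\\": "\x5c",    # ns-esc-backslash
    "N": "\x85",     # ns-esc-next-line
    "_": "\xa0",     # ns-esc-non-breaking-space
    "L": "\u2028",   # ns-esc-line-separator
    "P": "\u2029",   # ns-esc-paragraph-separator
}

# Number of hexadecimal digits required after "x", "u" and "U".
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_escapes(text):
    """Replace each escape sequence in text with its Unicode character."""
    out = []
    i = 0
    while i != len(text):
        ch = text[i]
        if ch != "\\":
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(text):
            raise ValueError("backslash at end of input")
        code = text[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in HEX_ESCAPES:
            width = HEX_ESCAPES[code]
            digits = text[i + 2 : i + 2 + width]
            if len(digits) != width or not set(digits).issubset(HEX_DIGITS):
                raise ValueError("malformed hexadecimal escape")
            value = int(digits, 16)
            if value > 0x10FFFF:
                raise ValueError("escape is outside the Unicode range")
            out.append(chr(value))
            i += 2 + width
        else:
            raise ValueError("unknown escape sequence")
    return "".join(out)
```
</programlisting>
<para>
Under these assumptions, decoding the text
<userinput>\x41 \u0041 \U00000041</userinput> yields
<userinput>A A A</userinput>. The sketch deliberately rejects a backslash
followed by a line break, since escaped line breaks are handled by the
line folding rules of double-quoted scalars rather than by the escape
productions listed here.
</para>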
<example>
<title>Escaped Characters</title>
<simplelist type="horiz" columns="2">
<member>
<programlisting>"Fun with <hl1>\\</hl1><sbr/>
<hl1>\"</hl1> <hl1>\a</hl1> <hl1>\b</hl1> <hl1>\e</hl1> <hl1>\f</hl1> <hl1>\&darr;</hl1>
<hl1>\n</hl1> <hl1>\r</hl1> <hl1>\t</hl1> <hl1>\v</hl1> <hl1>\0</hl1> <hl1>\&darr;</hl1>
<hl1>\&nbsp;</hl1> <hl1>\_</hl1> <hl1>\N</hl1> <hl1>\L</hl1> <hl1>\P</hl1> <hl1>\&darr;</hl1>
<hl1>\x41</hl1> <hl1>\u0041</hl1> <hl1>\U00000041</hl1>"
</programlisting>
<synopsis>Legend:
<hl1><link linkend="c-ns-esc-char">c-ns-esc-char</link></hl1>
</synopsis>
</member>
<member>
<programlisting>%YAML 1.2<sbr/>
---
"Fun with \x5C
\x22 \x07 \x08 \x1B \x0C
\x0A \x0D \x09 \x0B \x00
\x20 \xA0 \x85 \u2028 \u2029
A A A"
</programlisting>
</member&g