Welcome to the scala-xml wiki! This is a Scala Incubator project for discussing and prototyping Scala support for XML.

Using the Scala Incubator’s ‘scala-xml’ project

  • Download the Scala trunk code from GitHub into a directory called ‘scala.git’. The checkout URL is: git://github.com/paulp/scala.git
  • Download the Scala Incubator ‘scala-xml’ code from GitHub into a directory called ‘scala-xml.git’. This must have the same parent directory as ‘scala.git’. The checkout URL is: git@github.com:scala-incubator/scala-xml.git
  • You need Ant and a JDK installed and on your path.
  • In ‘scala.git’, run ‘ant all.clean’, then ‘ant’, then ‘ant test’. The tests should run successfully. When you run ‘ant test’, ‘JAVA_HOME’ must point to a JDK, not a JRE.
  • In ‘scala-xml.git’, run ‘ant all.clean’, then ‘ant build’. This compiles the incubator code, unpacks the JARs created in ‘scala.git’, overwrites their classes with the classes compiled from ‘scala-xml.git’, and then rebuilds the JARs. This produces a complete, working set of Scala JARs without the Scala Incubator project having to contain a full copy of the Scala sources.
  • IntelliJ project files are in ‘src/intellij’. There are currently no project files for other IDEs (contributions welcome).

Short Overview of XML in Scala

Scala differs from many programming languages by integrating XML as a first-class data type. If you run ‘scala’ to get the command-line interpreter, and then type

val someXml = <a>this is some xml</a>

the result is

someXml: scala.xml.Elem = <a>this is some xml</a>

The XML has been interpreted as XML: the result type is “scala.xml.Elem”. The result isn’t just a string; Scala understands the angle-bracket syntax of XML.

By contrast, if you type

val someValue = "<a>this is some xml</a>"

with double quotes around the XML, what you get is

someValue: java.lang.String = <a>this is some xml</a>

i.e. just a string.
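A slightly longer sketch of the same idea follows. It assumes the scala-xml module is on the classpath (since Scala 2.11 the XML library is published as the separate `org.scala-lang.modules` `scala-xml` artifact rather than being part of the core library). The object name `XmlLiteralDemo` and the sample `<item>` element are illustrative only. Because a literal is a structured `Elem`, it can be queried with the `\` operator, which a plain string cannot.

```scala
// Assumes the scala-xml module is on the classpath, e.g. in sbt:
//   libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "<version>"
import scala.xml.{Elem, NodeSeq}

object XmlLiteralDemo {
  // An XML literal: the compiler builds a scala.xml.Elem, not a String
  val someXml: Elem = <a>this is some xml</a>

  // The same characters inside double quotes are only a java.lang.String
  val someValue: String = "<a>this is some xml</a>"

  // Illustrative element with an attribute and a child element
  val item: Elem = <item id="42"><name>widget</name></item>

  // `\` selects direct children; a leading "@" selects an attribute
  val names: NodeSeq = item \ "name"
  val id: String = (item \ "@id").text

  def main(args: Array[String]): Unit = {
    println(someXml.label) // element name: a
    println(someXml.text)  // text content: this is some xml
    println(names.text)    // widget
    println(id)            // 42
  }
}
```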

What is XML?

It may seem funny to ask what XML is. After all, if you are reading this page, you almost certainly know and care about XML. However, the term “XML” is used to refer to both

In building the libraries (API) for Scala, it is important to be clear about which parts of the XML ecosystem will be covered, and how deeply they will be integrated into Scala. For example, as shown above, basic XML parsing is built into Scala. However, there are currently no Scala implementations of XSLT or XQuery. If you need them, you can call Java APIs from Scala just as you would from Java (and, hopefully soon, call .NET APIs just as you would from .NET). When considering how Scala can best support the needs of XML users, the choices are typically along these lines:

  • do nothing in particular, and let users call the underlying non-Scala (Java or .NET) API directly;
  • provide a Scala layer that makes the underlying API easier or more concise to use;
  • fully or partially re-implement the underlying non-Scala API as an equivalent Scala API.
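As a sketch of the first two options, here is XSLT (which, as noted above, has no Scala implementation) driven from Scala through the JDK's standard JAXP transformation API, `javax.xml.transform`. Calling `TransformerFactory` directly corresponds to the first option; wrapping it in a small string-in, string-out helper is a minimal instance of the second. The names `XsltFromScala` and `transform`, and the sample stylesheet, are hypothetical and are not part of scala-xml.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

object XsltFromScala {
  // Hypothetical thin Scala layer over the JDK's JAXP (TrAX) API:
  // compiles `stylesheet` and applies it to one input document.
  def transform(stylesheet: String, input: String): String = {
    val factory = TransformerFactory.newInstance()
    val transformer =
      factory.newTransformer(new StreamSource(new StringReader(stylesheet)))
    val out = new StringWriter()
    transformer.transform(new StreamSource(new StringReader(input)), new StreamResult(out))
    out.toString
  }

  // Illustrative XSLT 1.0 stylesheet: emits the text of every <name>, comma-separated
  val stylesheet: String =
    """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      |  <xsl:output method="text"/>
      |  <xsl:template match="/">
      |    <xsl:for-each select="//name">
      |      <xsl:value-of select="."/>
      |      <xsl:if test="position() != last()">,</xsl:if>
      |    </xsl:for-each>
      |  </xsl:template>
      |</xsl:stylesheet>""".stripMargin

  def main(args: Array[String]): Unit = {
    val input = "<people><name>Ada</name><name>Alan</name></people>"
    println(transform(stylesheet, input))
  }
}
```

The helper recompiles the stylesheet on every call to keep the sketch short; a real Scala layer would more likely cache a compiled `javax.xml.transform.Templates` object and create transformers from it.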

Further questions worth asking when implementing XML functionality (for any language, not just Scala) are:

  • Are you providing support for reading/writing in-memory strings in XML?
  • Are you providing support for reading/writing files in XML? (This is similar to reading/writing strings, but you also need to handle character encodings.)
  • Are you providing support for reading/writing huge files (i.e. too large to hold in memory at one time)? Is it forwards-only support, or full random-access support? Does it apply to reading huge files, writing huge files, or both?
  • Are you providing support for XML namespaces? If so, how are the prefix bindings specified?
  • Do you provide fine control of how the XML is formatted when written to files, e.g. wrapping, indenting, ordering of attributes in an element, location of namespace declarations, prefixes used for particular namespaces?
  • Are you providing support for validating XML, e.g. using DTDs, XML Schemas, RELAX NG, or Schematron? If so, does the validation operate before reading, any time from after reading to before writing, or after writing?
  • Are you providing support for custom user-written validations?
  • Do you provide an API for accessing XML based on some standard information model, e.g. the W3C XML Infoset, the XML Schema Post-Schema Validation Infoset or the XQuery/XPath Data Model?
  • Is your API read-only, or can you create/manipulate XML data in-memory as well? Do you have a separate read-only API that provides some advantages over the read-write API, e.g. performance advantages?
  • Is your API built on your language’s existing sequence/list/tree/etc. data structures, or do XML objects have their own structures? Do you allow non-XML data objects (strings, decimals, booleans, etc.) to be direct members of sequences/lists/etc., or is every object XML-specific?
  • How does your API deal with unnamed types in XML Schemas (unnamed “local” complex or simple types)? Can it support them without names, or does it require them to be named (either manually or automatically)? APIs with automatically generated names are not friendly to the developers who use them.
  • Are you implementing support for binding XML data to user-defined object models?
  • Are you implementing standard APIs like SAX, DOM, StAX, JAXP, JAXB, .NET XML API?
  • Are you implementing XPath support? If so, XPath 1.0 or 2.0? Are you implementing all of XPath, or a subset? If you are implementing XPath 2.0, which is largely a subset of XQuery 1.0, do you implement XQuery as well? If you implement XQuery, do you implement XSLT 2.0 as well?
  • Do your XPaths only apply to special XML objects, or can they be used with non-XML objects (e.g. in the same manner as Jaxen for Java)?
  • Are you trying to hide the fact that the data is XML from developers, or are you trying to expose all of the XML-specifics? Are you trying to make it possible to work either way?
  • Do you need to support specific XML formats specially, e.g. XHTML?
  • Are you providing equal support for working with both “data-oriented” and “document-oriented” XML (in the sense of XML with little or no mixed content, versus XML that is mostly mixed content like XHTML)?
  • Are you providing support for people who want to work with XML in an order-insensitive way? (By default, order is significant in XML, except for the order of attributes within an element.)
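To make the question about huge files concrete, the sketch below uses the JDK's StAX pull parser (`javax.xml.stream`) from Scala to count elements in a single forwards-only pass, without building an in-memory tree. It reads from an `InputStream` so that the parser, not the caller, works out the character encoding from the XML declaration or byte-order mark. `StreamingCount` and `countElements` are hypothetical names; this is one possible answer to the question, not a design the project has adopted.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object StreamingCount {
  // Counts start tags whose local name equals `localName`, consuming parse
  // events one at a time; no tree of the whole document is ever built.
  def countElements(in: InputStream, localName: String): Int = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(in)
    try {
      var count = 0
      while (reader.hasNext()) {
        // next() advances to the following event and returns its type
        if (reader.next() == XMLStreamConstants.START_ELEMENT &&
            reader.getLocalName() == localName)
          count += 1
      }
      count
    } finally reader.close() // releases parser state; does not close `in`
  }

  def main(args: Array[String]): Unit = {
    // For a very large file, pass a FileInputStream instead
    val doc = "<log><entry/><entry/><other/><entry/></log>"
    val in = new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8))
    println(countElements(in, "entry")) // 3
  }
}
```

Forwards-only access keeps memory use roughly constant regardless of file size; random access would instead need an index or a full in-memory tree.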

Further Topics
