Getting started

Aaron S. Hawley edited this page May 14, 2021 · 22 revisions

The following documentation assumes you're familiar with, and already using, both the Scala programming language and the sbt build tool.

In Scala 2.11 and later, add the following to your build.sbt file's libraryDependencies:

"org.scala-lang.modules" %% "scala-xml" % "1.3.0"

For Scala 2.12 and later, you can use version 2.0 of Scala XML:

"org.scala-lang.modules" %% "scala-xml" % "2.0.0"

You also need sbt version 1.1.2 or later. For earlier versions of sbt, add the following to your build.sbt file:

fork := true

This runs your code in a forked process in sbt 1.1.1 and earlier, so that it doesn't conflict with the version of scala-xml that the Scala compiler depends on.
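Putting the settings above together, a minimal build.sbt might look like the sketch below. The project name and the scalaVersion value are placeholders chosen for illustration, not requirements of scala-xml.

```scala
// build.sbt: a minimal sketch; name and scalaVersion are placeholder values
name := "xml-example"
scalaVersion := "2.13.6"

// %% appends the Scala binary version to the artifact name
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "2.0.0"

// Only needed on sbt 1.1.1 and earlier; uncomment if you are on an older sbt
// fork := true
```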

Note: Without forking, earlier versions of sbt would ignore the scala-xml version you declared and would compile, run, and run tests against the version of scala-xml that the Scala compiler depends on (for Scaladoc).

Note: In Scala 2.12 and earlier, you could use scala-xml from the console in sbt regardless of whether you had included it as a dependency. Furthermore, there was no way to use a version of scala-xml in the sbt console other than the one used by the compiler. This was fixed in Scala 2.13.

Once you've added the dependency, you can use XML literals, for example:

val book: scala.xml.Elem = <book id="b20234">Magic of scala-xml</book>

You can query XML values with an XPath-like syntax:

val id = book \@ "id"
id: String = b20234

val text = book.text
text: String = Magic of scala-xml

XML more often has sub-elements:

val books = 
  <books>
    <book id="b1615">Don Quixote</book>
    <book id="b1867">War and Peace</book>
  </books>

Retrieving the child elements is possible, but a little more complicated:

val titles = (books \ "book").map(book => book.text).toList
titles: List[String] = List(Don Quixote, War and Peace)

Many return types of scala-xml are Scala collections. If you aren't familiar with Scala collections, you should read the documentation for Scala collections.

Extracting the attribute values can be done with:

val ids = (books \ "book").map(book => book \@ "id").toList
ids: List[String] = List(b1615, b1867)

Many users assume they can write it as:

val ids = books \ "book" \@ "id"
ids: String = ""

The above compiles, but fails to return the attributes.

This is because the first path for book returns a collection of book elements:

val bookSeq = books \ "book"
bookSeq: NodeSeq = NodeSeq(<book id="b1615">...</book>, <book id="b1867">...</book>)

And a sequence of book nodes does not have a single attribute.
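To make the difference concrete, here is a small self-contained sketch that puts the failing query next to two working ones. The object name AttributeQueries is hypothetical. The `\\ "@id"` form is an alternative not covered above: it collects id attributes from every descendant element, so it is broader than asking only the book children.

```scala
import scala.xml.Elem

object AttributeQueries {
  val books: Elem =
    <books>
      <book id="b1615">Don Quixote</book>
      <book id="b1867">War and Peace</book>
    </books>

  // \@ applied to a sequence holding more than one node yields an empty string
  val wrong: String = books \ "book" \@ "id"

  // Map over the sequence so that \@ is applied to one book at a time
  val perBook: List[String] = (books \ "book").map(_ \@ "id").toList

  // Descendant search: gathers the id attribute of every element in the tree
  val viaDescendants: List[String] = (books \\ "@id").map(_.text).toList
}
```

In this document both working queries give the same ids because only book elements carry an id attribute. If other elements also had one, the descendant search would include those values as well.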

Finding the text of an XML element by its id:

val quixote = (books \ "book").find(book => (book \@ "id") == "b1615").map(_.text)
quixote: Option[String] = Some(Don Quixote)

Most operations on collections can use Scala's for-comprehension. For example, consider the following XML data representing a purchase order:

<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name> <street>123 Maple Street</street>
    <city>Mill Valley</city> <state>CA</state> <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name> <street>8 Oak Avenue</street>
    <city>Old Town</city> <state>PA</state> <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName> <quantity>1</quantity>
      <USPrice>148.95</USPrice> <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName> <quantity>1</quantity>
      <USPrice>39.98</USPrice> <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>

This example is derived from similar code in the article "Scalable Programming Abstractions for XML Services" by Burak Emir, Sebastian Maneth and Martin Odersky. Here a file is loaded, and prices are retrieved for each item and summed together.

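import scala.xml.XML

val doc = XML.loadFile("po.xml")
var total = BigDecimal(0).setScale(2, scala.math.BigDecimal.RoundingMode.HALF_UP)
// No yield: the loop runs only for its side effects (printing and summing)
for {
  item  <- doc \\ "item"
  price <- item \ "USPrice"
} {
  println("partnum: " + (item \@ "partNum"))
  // Parse the price text as a decimal rather than going through Double
  total += BigDecimal(price.text)
}
println("Grand total " + total)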

The program will output:

partnum: 872-AA
partnum: 926-AA
Grand total 188.93
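The same total can also be computed without a mutable variable. The following is a sketch rather than the article's original code: it inlines a trimmed copy of the purchase order with XML.loadString so that it runs without po.xml, and the object name OrderTotal is hypothetical.

```scala
import scala.xml.XML

object OrderTotal {
  // A trimmed copy of the purchase order, inlined so no file is needed
  val doc = XML.loadString(
    """<purchaseOrder orderDate="1999-10-20">
      |  <items>
      |    <item partNum="872-AA"><USPrice>148.95</USPrice></item>
      |    <item partNum="926-AA"><USPrice>39.98</USPrice></item>
      |  </items>
      |</purchaseOrder>""".stripMargin)

  // The for-comprehension yields one BigDecimal per USPrice element
  val prices: Seq[BigDecimal] =
    for {
      item  <- doc \\ "item"
      price <- item \ "USPrice"
    } yield BigDecimal(price.text)

  // Summing the collection replaces the mutable running total
  val total: BigDecimal = prices.sum
}
```

Here the yield is meaningful: the comprehension builds a collection of prices, and the sum is taken in a separate step.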

To load XML from a file, use scala.xml.XML:

val books = scala.xml.XML.loadFile("books.xml")

To write XML to a file:

scala.xml.XML.save("books.xml", books)
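As a quick check that saving and loading round-trip, the sketch below writes a document to a temporary file and reads it back. The object name SaveAndReload and the use of a temporary file are choices made for this illustration. The encoding is passed explicitly so that the result does not depend on the library's default, and xmlDecl = true asks for an XML declaration at the top of the file.

```scala
import java.nio.file.Files
import scala.xml.XML

object SaveAndReload {
  val books = <books><book id="b1615">Don Quixote</book></books>

  // Temporary file for the illustration; removed when the JVM exits
  private val path = Files.createTempFile("books", ".xml")
  path.toFile.deleteOnExit()

  // Write the document with an explicit encoding and an XML declaration
  XML.save(path.toString, books, enc = "UTF-8", xmlDecl = true)

  // Read it back and query it like any other XML value
  val reloaded = XML.loadFile(path.toString)
  val id: String = reloaded \ "book" \@ "id"
  val title: String = (reloaded \ "book").text
}
```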

To format XML, use scala.xml.PrettyPrinter, whose constructor takes the line width and the indentation step:

val pp = new scala.xml.PrettyPrinter(24, 4)
pp.format(books)

The call to pp.format returns the following XML as a String:

<books>
    <book id="b1615">
        Don Quixote
    </book>
    <book id="b1867">
        War and Peace
    </book>
</books>

To transform your XML based on pattern matches, use the scala.xml.transform.RuleTransformer in combination with one or more scala.xml.transform.RewriteRule definitions.

For example, consider the following XML value for calendar data:

val doc = 
  <calendar>
    <week>
      <day>Monday</day>
      <day>Tuesday</day>
      <day>Wednesday</day>
      <day>Thursday</day>
      <day>Friday</day>
    </week>
    <year>
      <month>January</month>
      <month>February</month>
      <month>March</month>
    </year>
  </calendar>

Here's a rule for abbreviating just the days of the week:

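import scala.xml.{Elem, Node, NodeSeq, Text}
import scala.xml.transform.{RewriteRule, RuleTransformer}

val abbreviateDayRule = new RewriteRule {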
  override def transform(n: Node): Seq[Node] = n match {
    case elem: Elem if elem.label == "day" =>
      elem.copy(child = elem.child collect {
        case Text(data) => Text(data.take(3))
      })
    case n => n
  }
}

You can then create a transformer, and transform the document:

val transform = new RuleTransformer(abbreviateDayRule)
transform(doc)

Producing:

<calendar>
  <week>
    <day>Mon</day>
    <day>Tue</day>
    <day>Wed</day>
    <day>Thu</day>
    <day>Fri</day>
  </week>
  <year>
    <month>January</month>
    <month>February</month>
    <month>March</month>
  </year>
</calendar>

Multiple rules can be combined. Here are two rules, one for adding Saturday and one for removing Friday:

val addSaturdayRule = new RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case elem: Elem if elem.label == "week" =>
      elem.copy(child = (elem.child ++ <day>Saturday</day>))
    case n => n
  }
}
val deleteFridayRule = new RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case elem: Elem if elem.label == "day" && elem.text == "Friday" => NodeSeq.Empty
    case n => n
  }
}
val transform = new RuleTransformer(addSaturdayRule, deleteFridayRule)
transform(doc)

Here the day Friday is removed and Saturday is added:

<calendar>
  <week>
    <day>Monday</day>
    <day>Tuesday</day>
    <day>Wednesday</day>
    <day>Thursday</day>
    <day>Saturday</day>
  </week>
  <year>
    <month>January</month>
    <month>February</month>
    <month>March</month>
  </year>
</calendar>

Keep in mind that rewrite rules won't compose if they modify children, or modify values that other rewrite rules depend on.

For example, abbreviating the days and then trying to delete Friday won't work, because the text "Friday" no longer exists by the time the delete rule looks for it.

val transform = new RuleTransformer(abbreviateDayRule, deleteFridayRule)
transform(doc)

The result still contains Friday, abbreviated to Fri:

<calendar>
  <week>
    <day>Mon</day>
    <day>Tue</day>
    <day>Wed</day>
    <day>Thu</day>
    <day>Fri</day>
  </week>
  <year>
    <month>January</month>
    <month>February</month>
    <month>March</month>
  </year>
</calendar>
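One way around this particular conflict is to pass the rules in the opposite order, so that the deletion runs while the text still reads "Friday". The sketch below assumes, as the example above suggests, that RuleTransformer applies rules in the order they are given. The object name RuleOrdering and the two-day week document are hypothetical and exist only to keep the example short.

```scala
import scala.xml.{Elem, Node, NodeSeq, Text}
import scala.xml.transform.{RewriteRule, RuleTransformer}

object RuleOrdering {
  val week: Elem =
    <week>
      <day>Thursday</day>
      <day>Friday</day>
    </week>

  // Shorten the text of every day element to its first three characters
  val abbreviateDayRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case elem: Elem if elem.label == "day" =>
        elem.copy(child = elem.child.collect { case Text(data) => Text(data.take(3)) })
      case other => other
    }
  }

  // Drop any day element whose text is exactly "Friday"
  val deleteFridayRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case elem: Elem if elem.label == "day" && elem.text == "Friday" => NodeSeq.Empty
      case other => other
    }
  }

  // Delete first, while the text still reads "Friday", then abbreviate
  val transform = new RuleTransformer(deleteFridayRule, abbreviateDayRule)
  val days: List[String] = (transform(week) \\ "day").map(_.text).toList
}
```

With this ordering, days should contain only the abbreviated Thursday. Reordering helps here because only one rule depends on the other's input. Rules that depend on each other in both directions would need to be merged into a single rule instead.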