New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More secure parsing #17

adriaanm opened this Issue Nov 20, 2013 · 15 comments


None yet

adriaanm commented Nov 20, 2013

@jroper says to add the following to XMLLoader.parser:


try { 
  f.setFeature("", false);
  f.setFeature("", true);
} catch {
  case e: ParserConfigurationException => // warn that the SAXParserFactory supplied by the JDK doesn't support this feature, and that the application may therefore be vulnerable to external entity attacks, encourage to define your own parser instead
  case e: SAXNotRecognizedExcetpion => // as above
  case e: SaxNotSupportedException => // as above

This comment has been minimized.

milessabin commented Mar 23, 2015

This IETF BCP is worth a read:


This comment has been minimized.


retronym commented Mar 23, 2015

Thanks, @milessabin!

Quoting the juicy parts:

XML mechanisms that follow external references (Section 4.14) may
also expose an implementation to various threats by causing the
implementation to access external resources automatically. It is
important to disallow arbitrary access to such external references
within XML data from untrusted sources. Many XML grammars define
constructs using URIs for external references; in such cases, the
same precautions must be taken.


The authors would like to thank the following people who have
provided significant contributions to the development of this

Mark Baker, Tim Berners-Lee, Tim Bray, James Clark, ... , Miles Sabin ...

A blast from the past :)


This comment has been minimized.

jroper commented Mar 23, 2015

Something not mentioned in the RFC is the memory issues that doctypes introduce, including the billion laughs vulnerability (recursive entity references leading to exponential memory usage) and the quadratic blowup vulnerability (many references to a single large entity leading to quadratic memory usage). Both of these vulnerabilities allow an attacker, with a small payload (as small as 100B for billion laughs, as small as 200KB for quadratic blowup) to cause a JVM to OOME. While the JDKs XML parser does have protection against billion laughs (recursion limits, but off by default), it doesn't have any protection against quadratic blowup. So the only safe way to handle untrusted XML on the JVM is to disallow doctypes altogether. It would be nice if the JDK XML parsers allowed you to simple ignore doctypes, but unfortunately, they don't, either it accepts the doctype, or fails if the doctype is present.

Disallowing doctypes is likely to cause some problems for users - many systems sending XML will automatically add a doctype, and most users will be frustrated that Scala doesn't accept this. So, we need to consider that if we decide to introduce this as a default. On Play, we do have users every so often report this issue, but we find that after explaining to them that if doctypes were allowed, the attacker could take down their webapp with a single request, they're happy to change the sending system to not include a doctype.


This comment has been minimized.

milessabin commented Mar 23, 2015

Indeed ... I think Billion Laughs came a little later.


This comment has been minimized.

dnvriend commented Mar 27, 2015

Last year I got a warning from a friendly hacker that warned me about this issue in JBoss Fuse 6.0.0 that we used then. We also use spray and accept XML. Other components also use the SecureXML. The xxeFilter is also used by other routes. The code I created then:

object SecureXml {
  def loadString(xml: String): NodeSeq = {
      val spf = SAXParserFactory.newInstance()
      spf.setFeature("", false)
      spf.setFeature("", true)
      val saxParser = spf.newSAXParser()

The 'xxeFilter' is:

trait XxeFilter {
  def log: LoggingAdapter

  implicit def executionContext: ExecutionContext

  def xxeFilter(xxe: String): Future[String] = {"Validating xml: [{}]", xxe)

  def throwError(xxe: String, message: String): String = {
    log.error("{}:\n [{}]", message, xxe)
    throw new Error(message)

  private def xmlMustNotContain(xxe: String): String = xxe match {
    case _ if xxe.toUpperCase.contains("<!DOCTYPE") => throwError(xxe, "XML contains DOCTYPE declaration")
    case _ if xxe.toUpperCase.contains("<!ENTITY") => throwError(xxe, "XML contains ENTITY declaration")
    case _ if xxe.toUpperCase.contains("<!ELEMENT") => throwError(xxe, "XML contains ELEMENT declaration")
    case _ if xxe.toUpperCase.contains("SYSTEM") => throwError(xxe, "XML contains SYSTEM declaration")
    case _ => xxe

  private def checkForXXE(xxe: String): String = {

  private def xmlDataMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("//removed - non open source//") && xxe.contains("""xmlns="//removed non open-source"""") => checkForXXE(xxe)
    // snip
    case _ => throwError(xxe, "received XML does not start with xml declaration or //snip // element or namespace is not correct.")

  private def xmlMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("<?xml") => xmlDataMustStartWith(xxe.substring(xxe.indexOf("?>") + 2, xxe.size).trim)
    case _ => xmlDataMustStartWith(xxe)


trait BasicRoute extends Directives {
  implicit def executionContext: ExecutionContext
  implicit def timeout: Timeout
  implicit def system: ActorSystem

  def badRequestXml = "//snip//"

  def internalServiceErrorXml(description: String, t: Option[Throwable]) = "//snip//"

  def validationError(code: String, description: String) = "//snip//"

  def completeWithBadRequest = complete(BadRequest, badRequestXml.toString())

  def completeWithError(throwable: Throwable) = complete(InternalServerError, internalServiceErrorXml(s"An error occurred: ${throwable.getMessage}", Option(throwable)))

  def completeWithError = complete(InternalServerError, internalServiceErrorXml(s"An error occurred", None))

  def completeWithValidationError(code: String, description: String) = complete(BadRequest, validationError(code, description))

use in route:

trait ARoute extends BasicRoute with XxeFilter {

  lazy val ARoute: Route =
    path("//snip//") {
      post {
        entity(as[String]) { xxe =>
          onComplete(xxeFilter(xxe)) {
            case Success(xml) =>
                camel(xml).map { _ =>
                }.recover { case t: Throwable =>
                }.getOrElse {
            case Failure(ex) => completeWithBadRequest

  def camel(xml: String): Try[Unit]

This comment has been minimized.


adriaanm commented Mar 27, 2015

Thanks, @dnvriend! Maybe we could include this so that others can just call 'xml.beSafe'?


This comment has been minimized.

dnvriend commented Mar 27, 2015

Or just be safe... implicitly :)


This comment has been minimized.

jawshooah commented Aug 3, 2015

Any plans to implement this? I'd love it if scala-xml had XXE prevention bundled in.


This comment has been minimized.


biswanaths commented Sep 11, 2015

Let me have a look at this.


This comment has been minimized.


EdgeCaseBerg commented Oct 1, 2015

curious, has any work been done on this?


This comment has been minimized.


biswanaths commented Mar 14, 2016

Changing the parser features as suggest while creating the parser should be simple enough.

What should be the route to upgrade ? We can change the code but not the data. Anybody who upgrades to latest version there is a possibility that some data path might fail.

  • Option A:
    Provide a non-default secure parser. With warning generated, for using the default non-safe parser.
  • Option B:
    Make default parser to safe and allow a non-default, unsafe parser if anybody wants to fall back on.

Please let me know, how we should proceed with this.


This comment has been minimized.


EdgeCaseBerg commented Mar 15, 2016

If Option A: Then people who upgrade their code will see the warnings right away and know they should put some time into implementing the necessary changes.

If Option B: Then people who upgrade their code will be happily unaware of issues with any of their other software that might be at risk if using non-updated versions.

Option A seems to increase awareness of these type of issues in general which I'd count as a +1 to that approach, but I can see why many would like to have safety by default and would advocate for that change instead.


This comment has been minimized.

v6ak commented Mar 15, 2016

I am against option B for some other major reason: It might increase fear of security updates (as it might break things), which is bad for security. What is worse, everything will compile after the change, unit tests will likely also pass, but the app still might break. (I know, integration tests should theoretically catch it, but still… Even in the case that integration tests catch the issue, it does not encourage that updates are painless.)

Maybe the only advantage of Option B is forcing old code to secure mode. Maybe this goal might be achieved with some tradeoff. For example, we would have three objects:

  • scala.xml.SecureXml – newly added object with secure behavior
  • scala.xml.InsecureXml – newly added object with compatible (potentially insecure) behavior
  • scala.xml.Xml – old deprecated object; By default, it behaves as scala.xml.InsecureXml for compatibility reasons. However, one might globally change the behavior to scala.xml.SecureXml by switching some system property. While I am generally not an advocate of global config, I consider this to be the least evil there.

We might call this tradeoff as Option C.

Finally, there is an Option D. It is like Option C, but with scala.xml.Xml object completely removed. As a result, all programmers would be forced to rewrite the code using the legacy scala.xml.Xml. Or maybe all the methods of scala.xml.Xml object would be marked as synthetic and start throwing an exception.

Scenarios of upgrade:

  1. Programmer upgrades to a new version of the library. A: Programmer is warned and will hopefully take the action. B: Code will compile without warning, but might silently break. C: Programmer is warned and will hopefully take the action. D: Programmer is forced to take the action.
  2. Programmer upgrades another library that evicts the old version. The code is, however, compiled against the old version, maybe because it is part of a 3rd party library. A: Programmer is not warned. B: Code will compile without warning, but might silently break. C: Programmer is not warned, the code will not break (but might be insecure). D: The code will suddenly start throwing weird errors at runtime. (Grrrr!)
  3. Someone wants to force the secure behavior for all the application (with taking the risks of potential incompatibility). I suppose that there is the new XML library version in the project. A: No way. B: Automatically done. C: Can be done by setting a global system property. D: Forced manual action.

The Option A covers only one of these three scenarios.

The Option B covers all these three scenarios. The cost is, however, potential silent BC breaks and discouraging users from security updates.

Option C covers scenarios 1 and 3 and performs definitely better than Option A. While Option C seems to fail at the scenario 2, it is not as bad as it might sound. If the programmer also uses the XML library directly, the scenario 3 might be triggered.

The Option D might look like the safest option, because it forces taking some action, but it is not so simple. At the scenario 3, the programmer can't simply evict the old version of scala-xml. All the relevant libraries are required to be upgraded before, which can actually delay the fix. In such case, the Option D performs significantly worse than others.

In short term: I am in favour of Option C. The Option A looks the second best for me. Options B and D look wrong for me in the short term.

In long term: After the old object or methods being deprecated for some period of time, I'd use the Option D and remove the deprecated scala.xml.Xml object.


This comment has been minimized.


biswanaths commented Sep 22, 2016

Anybody else has any opinion please let us know. I think we should try to get a satisfactory solution to this.


This comment has been minimized.

adrianbn commented May 17, 2017

Just curious since this issue seems old and still open: what is the recommended way to disable insecure features with scala-xml? Maybe override XMLLoader.parser with an implementation that includes the following?

spf.setFeature("", true);
spf.setFeature("", false)
spf.setFeature("", false)

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment