More secure parsing #17
@jroper says to add the following to XMLLoader.parser:
Quoting the juicy parts:
A blast from the past :)
Something not mentioned in the RFC is the memory issues that doctypes introduce, including the billion laughs vulnerability (recursive entity references leading to exponential memory usage) and the quadratic blowup vulnerability (many references to a single large entity leading to quadratic memory usage). Both of these vulnerabilities allow an attacker with a small payload (as small as 100B for billion laughs, as small as 200KB for quadratic blowup) to cause a JVM to OOME. While the JDK's XML parser does have protection against billion laughs (recursion limits, but off by default), it doesn't have any protection against quadratic blowup. So the only safe way to handle untrusted XML on the JVM is to disallow doctypes altogether. It would be nice if the JDK XML parsers allowed you to simply ignore doctypes, but unfortunately they don't: a parser either accepts the doctype or fails if a doctype is present.
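To illustrate the "disallow doctypes altogether" approach described above, here is a minimal sketch against the JDK's built-in SAX parser. The feature URI is the Xerces `disallow-doctype-decl` feature. The names `DoctypeRejection`, `strictParser`, and `accepts` are made up for this example and are not part of scala-xml.

```scala
import java.io.StringReader
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper, for illustration only.
object DoctypeRejection {
  // Xerces feature: any DOCTYPE declaration becomes a fatal parse error.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  // Build a fresh parser that refuses documents containing a doctype.
  def strictParser(): SAXParser = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature(DisallowDoctype, true)
    factory.newSAXParser()
  }

  // True if the document parses, false if the parser rejects it.
  def accepts(xml: String): Boolean =
    try {
      strictParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      true
    } catch {
      // DefaultHandler rethrows fatal errors as SAXParseException.
      case _: SAXParseException => false
    }
}
```

With this setup the parser never expands any entity, because it stops at the doctype itself. That is the trade-off the comment describes: a harmless document that merely carries a doctype is rejected too.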
Disallowing doctypes is likely to cause some problems for users - many systems sending XML will automatically add a doctype, and most users will be frustrated that Scala doesn't accept this. So, we need to consider that if we decide to introduce this as a default. On Play, we do have users every so often report this issue, but we find that after explaining to them that if doctypes were allowed, the attacker could take down their webapp with a single request, they're happy to change the sending system to not include a doctype.
Last year a friendly hacker warned me about this issue in JBoss Fuse 6.0.0, which we used at the time. We also use spray and accept XML. Other components also use the SecureXML. The xxeFilter is also used by other routes. The code I created then:
The 'xxeFilter' is:
use in route:
Changing the parser features as suggested while creating the parser should be simple enough.
What should the upgrade route be? We can change the code but not the data. For anybody who upgrades to the latest version, there is a possibility that some data path might fail.
Please let me know how we should proceed with this.
If Option A: Then people who upgrade their code will see the warnings right away and know they should put some time into implementing the necessary changes.
If Option B: Then people who upgrade their code will be happily unaware of issues with any of their other software that might be at risk if using non-updated versions.
Option A seems to increase awareness of these types of issues in general, which I'd count as a +1 for that approach, but I can see why many would like to have safety by default and would advocate for that change instead.
I am against Option B for another major reason: it might increase fear of security updates (as it might break things), which is bad for security. What is worse, everything will compile after the change and unit tests will likely also pass, but the app still might break. (I know, integration tests should theoretically catch it, but still… Even if integration tests do catch the issue, the experience does not encourage the belief that updates are painless.)
Maybe the only advantage of Option B is forcing old code to secure mode. Maybe this goal might be achieved with some tradeoff. For example, we would have three objects:
We might call this tradeoff Option C.
Finally, there is an Option D. It is like Option C, but with scala.xml.Xml object completely removed. As a result, all programmers would be forced to rewrite the code using the legacy scala.xml.Xml. Or maybe all the methods of scala.xml.Xml object would be marked as synthetic and start throwing an exception.
Scenarios of upgrade:
Option A covers only one of these three scenarios.
Option B covers all three scenarios. The cost, however, is potential silent BC breaks and discouraging users from security updates.
Option C covers scenarios 1 and 3 and performs definitely better than Option A. While Option C seems to fail at scenario 2, it is not as bad as it might sound. If the programmer also uses the XML library directly, scenario 3 might be triggered.
Option D might look like the safest option, because it forces taking some action, but it is not so simple. In scenario 3, the programmer can't simply evict the old version of scala-xml. All the relevant libraries have to be upgraded first, which can actually delay the fix. In such a case, Option D performs significantly worse than the others.
In the short term: I am in favour of Option C. Option A looks second best to me. Options B and D look wrong to me in the short term.
In the long term: once the old object or methods have been deprecated for some period of time, I'd use Option D and remove the deprecated scala.xml.Xml object.
Just curious since this issue seems old and still open: what is the recommended way to disable insecure features with scala-xml? Maybe override XMLLoader.parser with an implementation that includes the following?
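For what it's worth, here is a sketch of what such a hardened parser could look like using only JDK APIs. The feature URIs are standard SAX/Xerces identifiers, but which ones to set is my assumption and not necessarily the list proposed at the top of this issue. `HardenedParser`, `newFactory`, and `newParser` are made-up names.

```scala
import javax.xml.XMLConstants
import javax.xml.parsers.{SAXParser, SAXParserFactory}

// Hypothetical helper, for illustration only; not part of scala-xml.
object HardenedParser {
  // Assumed selection of features; adjust to your own threat model.
  val features: Seq[(String, Boolean)] = Seq(
    // Turns on the JDK's built-in processing limits.
    XMLConstants.FEATURE_SECURE_PROCESSING -> true,
    // Reject any document that carries a DOCTYPE declaration.
    "http://apache.org/xml/features/disallow-doctype-decl" -> true,
    // Defence in depth: never resolve external entities or DTDs.
    "http://xml.org/sax/features/external-general-entities" -> false,
    "http://xml.org/sax/features/external-parameter-entities" -> false,
    "http://apache.org/xml/features/nonvalidating/load-external-dtd" -> false
  )

  // Configure a factory with every feature above applied.
  def newFactory(): SAXParserFactory = {
    val factory = SAXParserFactory.newInstance()
    factory.setXIncludeAware(false)
    features.foreach { case (name, value) => factory.setFeature(name, value) }
    factory
  }

  // SAXParser instances are not thread-safe, so create one per use.
  def newParser(): SAXParser = newFactory().newSAXParser()
}
```

On the scala-xml side, a parser built this way can be returned from an overridden `parser` in your own `XMLLoader[Elem]`, or passed to `XML.withSAXParser(...)`. Note that `withSAXParser` holds on to the single instance you give it, so overriding `parser` as a `def` is probably the safer choice if the loader is shared across threads.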