Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please document a way to not wait for high-quality entropy for hash tables #91

Closed
smcv opened this issue Jul 20, 2017 · 7 comments
Closed

Comments

@smcv
Copy link

smcv commented Jul 20, 2017

dbus-daemon --system is a system service that parses XML configuration using libexpat. It is rather important on many recent OSs, and is typically started very early in the boot process, especially if process 1 is systemd.

@hewittc reported a boot sequence issue on an ARM device running Arch Linux, apparently triggered by a libexpat upgrade: with libexpat >= 2.2.1, system services like systemd-logind and systemd process 1 would frequently fail to connect to the dbus-daemon. This caused subsequent uses of systemd-logind (in particular system logins) to fail, making the system rather broken.

Enabling verbose logging for the dbus-daemon indicated that the dbus-daemon starts 7.2 seconds after boot, stops logging anything at 7.3 seconds, then wakes up again and resumes logging things at 38.1 seconds. That's shortly after the kernel reports this:

[   37.789094] ngsa-2 kernel: random: nonblocking pool is initialized

which I think is the time at which getrandom() stops blocking. So I suspect that the dbus-daemon is blocking in libexpat for approximately 30 seconds while waiting for entropy. 30ish seconds is too long for the timeouts used in systemd, systemd-logind and systemd-networkd - by the time the dbus-daemon has woken up, those system services have already timed out.

I assume that the motivation for using high-quality entropy was to avoid hash collision complexity attacks involving crafted XML files that collide? If this is the case, then dbus-daemon does not actually need this: the only XML that we parse is completely trusted (it's the dbus-daemon's configuration). So I think this failure mode could be avoided by asking libexpat to not wait for high-quality entropy.

Is XML_SetHashSalt() the API I am looking for? It looks as though, if I call that with some arbitrary nonzero hash_salt parameter (1 would do), generate_hash_secret_salt() and hence writeRandomBytes_getrandom() might be skipped? However, I'm a little wary about that because this usage (and the special case that 0 doesn't work) is not documented, and #47 suggests that how it works might change.

Alternatively, I could add a call to some newly added function, XML_UseWeakRandomness() or something, with a configure check to make it compile-time optional. Or both.

@smcv
Copy link
Author

smcv commented Jul 20, 2017

Answering the questions from #90.

(For context, I am a dbus maintainer.)

Is the XML content read from the network? (I wonder how much entropy/DoS matters here.)

It does not matter at all. The XML that dbus-daemon reads is totally trusted: it's the dbus-daemon's own configuration file. So if this high-quality randomness is just a way to defeat hash table algorithmic complexity attacks, we don't need it.

If we had a callback to have the code calling Expat serve entropy to it, the calling code would be dbus (not systemd, right?) and it would need to know that it it's during early boot stage and that serving crappy entropy is okay and be told about that but made to stop that behavior later?

I would prefer to have a way to tell Expat "don't bother using a hash salt, we trust this XML" or "the hash salt is 42", rather than supplying it with a callback to produce entropy and using https://xkcd.com/221/ for that callback - because if I did the latter, I'd be worried that Expat might start using my entropy callback for some other purpose in addition to salt for the hash, for which it would be unsuitable.

Doing this on a per-XML_Parser basis seems good, if that's straightforward to implement - when we instantiate the XML_Parser, we know what it will be parsing and whether we trust it. In the case of dbus-daemon we currently trust all the XML we parse, but that might not be true for other API users.

I don't need to know whether the parsing is taking place during early boot, because if I trust the XML, it's always acceptable to use a weak hash (we aren't going to get pathological input designed to break the hash algorithm anyway), and if I don't trust the XML, it's always wrong to use a weak hash.

@hartwork
Copy link
Member

Hi Simon,

excellent report, thanks for that! Related pull request #90 came in only 8 hours ago. I dumped a few thoughts over there already but let's do the discussion here. @cometzero, please feel invited.

More later.

@smcv
Copy link
Author

smcv commented Jul 20, 2017

if I don't trust the XML, it's always wrong to use a weak hash

Having said that... I'm not sure whether you really need a blocking getrandom() to defeat the algorithmic complexity attack, or whether it's enough to just use a best-effort non-blocking approach. The difference only matters when the system is sufficiently early in its boot process that the kernel is not yet confident that the requested amount of entropy has been collected.

If something is parsing untrusted XML during early boot, and it's something important enough that it runs at that stage, then I'm not sure that the deliberate denial of service caused by malicious people submitting crafted XML files that will make the parser eat CPU time (the attack you are defending against) is actually significantly worse than the accidental denial of service caused by this important service blocking for a while :-)

@smcv
Copy link
Author

smcv commented Jul 20, 2017

@cometzero's report in #90 confirms my informed guess about writeRandomBytes_getrandom() being the function to blame, so that's good to know.

I don't think udev is relevant here: it doesn't use D-Bus or XML. systemd is only relevant here because several of its services make use of D-Bus sufficiently close to the beginning of the boot process that systems with few sources of entropy (embedded systems are typically this) might not have initialized the random pool yet.

@hartwork
Copy link
Member

hartwork commented Jul 20, 2017

For a temporary quickfix, I think XML_SetHashSalt with a non-zero number may work well to communicate "I do not need protection against hash collisions for this XML_Parser instance".
Mid-term, I cannot guarantee that that function will not turn into a no-op (as sizeof(long) is fewer bytes that we want now (issue #47) and there has also been mis-use of XML_SetHashSalt (see #18) at the cost of the end user before #31 helped improve the situation that was part of the problem).

There is no existing API besides XML_SetHashSalt that would allow communicating disinterest in entropy, so XML_SetHashSalt and adapting the call to getrandom may be our only options.

Maybe you're right that entropy for DoS protection is not worth blocking until the nonblocking pool is initialized.

@smcv
Copy link
Author

smcv commented Jul 20, 2017

Mid-term, I cannot guarantee that that function will not turn into a no-op (as sizeof(long) is fewer bytes that we want now, issue #47 and there has also been mis-use of feature (see #18)

I find it hard to have much sympathy for anyone who calls XML_SetHashSalt (parser, 42) and expects it to result in a hash salt with more than 0 bits of entropy!

The code referenced in #18 seems to have been people working around a bug in libexpat's solution to CVE-2012-0876, by disabling that solution and bringing back CVE-2012-0876 (albeit with a different crafted XML document required to get collisions).

I think I'll add a XML_SetHashSalt() call for now - even if it becomes a no-op, it can't make matters any worse. We can add a patch for a new XML_IDontNeedEntropy() (or whatever you call it) if you add one later (please do this before making XML_SetHashSalt() a no-op, if you do).

@hartwork
Copy link
Member

hartwork commented Aug 12, 2017

I believe with #92 merged and released, this ticket can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants