Please document a way to not wait for high-quality entropy for hash tables #91

smcv · 2017-07-20T18:52:20Z

dbus-daemon --system is a system service that parses XML configuration using libexpat. It is rather important on many recent OSs, and is typically started very early in the boot process, especially if process 1 is systemd.

@hewittc reported a boot sequence issue on an ARM device running Arch Linux, apparently triggered by a libexpat upgrade: with libexpat >= 2.2.1, system services like systemd-logind and systemd process 1 would frequently fail to connect to the dbus-daemon. This caused subsequent uses of systemd-logind (in particular system logins) to fail, making the system rather broken.

Enabling verbose logging for the dbus-daemon indicated that the dbus-daemon starts 7.2 seconds after boot, stops logging anything at 7.3 seconds, then wakes up again and resumes logging things at 38.1 seconds. That's shortly after the kernel reports this:

[   37.789094] ngsa-2 kernel: random: nonblocking pool is initialized

which I think is the time at which getrandom() stops blocking. So I suspect that the dbus-daemon is blocking in libexpat for approximately 30 seconds while waiting for entropy. 30ish seconds is too long for the timeouts used in systemd, systemd-logind and systemd-networkd - by the time the dbus-daemon has woken up, those system services have already timed out.

I assume that the motivation for using high-quality entropy was to avoid hash collision complexity attacks involving crafted XML files that collide? If this is the case, then dbus-daemon does not actually need this: the only XML that we parse is completely trusted (it's the dbus-daemon's configuration). So I think this failure mode could be avoided by asking libexpat to not wait for high-quality entropy.

Is XML_SetHashSalt() the API I am looking for? It looks as though, if I call that with some arbitrary nonzero hash_salt parameter (1 would do), generate_hash_secret_salt() and hence writeRandomBytes_getrandom() might be skipped? However, I'm a little wary about that because this usage (and the special case that 0 doesn't work) is not documented, and #47 suggests that how it works might change.

Alternatively, I could add a call to some newly added function, XML_UseWeakRandomness() or something, with a configure check to make it compile-time optional. Or both.

The text was updated successfully, but these errors were encountered:

smcv · 2017-07-20T19:04:14Z

Answering the questions from #90.

(For context, I am a dbus maintainer.)

Is the XML content read from the network? (I wonder how much entropy/DoS matters here.)

It does not matter at all. The XML that dbus-daemon reads is totally trusted: it's the dbus-daemon's own configuration file. So if this high-quality randomness is just a way to defeat hash table algorithmic complexity attacks, we don't need it.

If we had a callback to have the code calling Expat serve entropy to it, the calling code would be dbus (not systemd, right?) and it would need to know that it it's during early boot stage and that serving crappy entropy is okay and be told about that but made to stop that behavior later?

I would prefer to have a way to tell Expat "don't bother using a hash salt, we trust this XML" or "the hash salt is 42", rather than supplying it with a callback to produce entropy and using https://xkcd.com/221/ for that callback - because if I did the latter, I'd be worried that Expat might start using my entropy callback for some other purpose in addition to salt for the hash, for which it would be unsuitable.

Doing this on a per-XML_Parser basis seems good, if that's straightforward to implement - when we instantiate the XML_Parser, we know what it will be parsing and whether we trust it. In the case of dbus-daemon we currently trust all the XML we parse, but that might not be true for other API users.

I don't need to know whether the parsing is taking place during early boot, because if I trust the XML, it's always acceptable to use a weak hash (we aren't going to get pathological input designed to break the hash algorithm anyway), and if I don't trust the XML, it's always wrong to use a weak hash.

hartwork · 2017-07-20T19:06:12Z

Hi Simon,

excellent report, thanks for that! Related pull request #90 came in only 8 hours ago. I dumped a few thoughts over there already but let's do the discussion here. @cometzero, please feel invited.

More later.

smcv · 2017-07-20T19:10:45Z

if I don't trust the XML, it's always wrong to use a weak hash

Having said that... I'm not sure whether you really need a blocking getrandom() to defeat the algorithmic complexity attack, or whether it's enough to just use a best-effort non-blocking approach. The difference only matters when the system is sufficiently early in its boot process that the kernel is not yet confident that the requested amount of entropy has been collected.

If something is parsing untrusted XML during early boot, and it's something important enough that it runs at that stage, then I'm not sure that the deliberate denial of service caused by malicious people submitting crafted XML files that will make the parser eat CPU time (the attack you are defending against) is actually significantly worse than the accidental denial of service caused by this important service blocking for a while :-)

smcv · 2017-07-20T19:14:55Z

@cometzero's report in #90 confirms my informed guess about writeRandomBytes_getrandom() being the function to blame, so that's good to know.

I don't think udev is relevant here: it doesn't use D-Bus or XML. systemd is only relevant here because several of its services make use of D-Bus sufficiently close to the beginning of the boot process that systems with few sources of entropy (embedded systems are typically this) might not have initialized the random pool yet.

hartwork · 2017-07-20T19:36:30Z

For a temporary quickfix, I think XML_SetHashSalt with a non-zero number may work well to communicate "I do not need protection against hash collisions for this XML_Parser instance".
Mid-term, I cannot guarantee that that function will not turn into a no-op (as sizeof(long) is fewer bytes that we want now (issue #47) and there has also been mis-use of XML_SetHashSalt (see #18) at the cost of the end user before #31 helped improve the situation that was part of the problem).

There is no existing API besides XML_SetHashSalt that would allow communicating disinterest in entropy, so XML_SetHashSalt and adapting the call to getrandom may be our only options.

Maybe you're right that entropy for DoS protection is not worth blocking until the nonblocking pool is initialized.

smcv · 2017-07-20T19:48:26Z

Mid-term, I cannot guarantee that that function will not turn into a no-op (as sizeof(long) is fewer bytes that we want now, issue #47 and there has also been mis-use of feature (see #18)

I find it hard to have much sympathy for anyone who calls XML_SetHashSalt (parser, 42) and expects it to result in a hash salt with more than 0 bits of entropy!

The code referenced in #18 seems to have been people working around a bug in libexpat's solution to CVE-2012-0876, by disabling that solution and bringing back CVE-2012-0876 (albeit with a different crafted XML document required to get collisions).

I think I'll add a XML_SetHashSalt() call for now - even if it becomes a no-op, it can't make matters any worse. We can add a patch for a new XML_IDontNeedEntropy() (or whatever you call it) if you add one later (please do this before making XML_SetHashSalt() a no-op, if you do).

hartwork · 2017-08-12T22:05:50Z

I believe with #92 merged and released, this ticket can be closed.

hartwork mentioned this issue Jul 20, 2017

use GRND_NONBLOCK of getrandom call #90

Closed

hartwork mentioned this issue Jul 20, 2017

Nonblock fail getrandom #92

Merged

smcv mentioned this issue Jul 21, 2017

Uses a relatively short timeout when connecting to D-Bus systemd/systemd#6418

Closed

hartwork closed this as completed Aug 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please document a way to not wait for high-quality entropy for hash tables #91

Please document a way to not wait for high-quality entropy for hash tables #91

smcv commented Jul 20, 2017

smcv commented Jul 20, 2017

hartwork commented Jul 20, 2017

smcv commented Jul 20, 2017

smcv commented Jul 20, 2017

hartwork commented Jul 20, 2017 •

edited

Loading

smcv commented Jul 20, 2017

hartwork commented Aug 12, 2017 •

edited

Loading

Please document a way to not wait for high-quality entropy for hash tables #91

Please document a way to not wait for high-quality entropy for hash tables #91

Comments

smcv commented Jul 20, 2017

smcv commented Jul 20, 2017

hartwork commented Jul 20, 2017

smcv commented Jul 20, 2017

smcv commented Jul 20, 2017

hartwork commented Jul 20, 2017 • edited Loading

smcv commented Jul 20, 2017

hartwork commented Aug 12, 2017 • edited Loading

hartwork commented Jul 20, 2017 •

edited

Loading

hartwork commented Aug 12, 2017 •

edited

Loading