multiprocessing.connection challenge implicitly uses MD5 #61460
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = None closed_at = None created_at = <Date 2013-02-20.20:11:36.540> labels = ['3.7', 'library'] title = 'multiprocessing.connection challenge implicitly uses MD5' updated_at = <Date 2020-06-04.17:22:43.520> user = 'https://github.com/davidmalcolm'
activity = <Date 2020-06-04.17:22:43.520> actor = 'miss-islington' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = <Date 2013-02-20.20:11:36.540> creator = 'dmalcolm' dependencies =  files = ['29134'] hgrepos =  issue_num = 17258 keywords = ['patch'] message_count = 7.0 messages = ['182547', '182550', '182553', '309103', '309134', '370705', '370717'] nosy_count = 7.0 nosy_names = ['barry', 'doko', 'pitrou', 'christian.heimes', 'dmalcolm', 'sbt', 'miss-islington'] pr_nums = ['16264', '20380', '20412', '20626'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = None url = 'https://bugs.python.org/issue17258' versions = ['Python 3.7']
Within multiprocessing.connection, deliver_challenge() and
hmac implicitly defaults to using MD5.
MD5 should no longer be used for security purposes. See e.g.
This fails in a FIPS-compliant environment (e.g. with the patches I
There's thus a possibility of an attacker defeating the multiprocessing
I'm attaching a patch which changes multiprocessing to use a clearly
It's not clear to me whether hmac.py should also be changed (this would
[Note to self: I'm tracking this downstream for RHEL as
The statement "MD5 should no longer be used for security purposes" is not entirely correct. MD5 should no longer be used as a cryptographic hash function for signatures. However, HMAC-MD5 is a different story.
The attacks on HMAC-MD5 do not seem to indicate a practical
I agree that we should slowly migrate to a more modern MAC such as HMAC-SHA256. AES-CBC is too hard to get right, and most AES implementations are vulnerable to timing attacks, too.
How about we include the name of the MAC in multiprocessing's wire protocol and define "no MAC name given" as HMAC-MD5? Also, please call it HMAC-SHA256 rather than SHA256.
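A minimal sketch of that idea, assuming a hypothetical `{NAME}` prefix on the message and a hypothetical allow-list of MAC names (neither is taken from the attached patch): a message with no recognized name is treated as HMAC-MD5.

```python
# Hypothetical mapping from wire-level MAC names to hashlib digest names.
MAC_NAMES = {
    "HMAC-MD5": "md5",
    "HMAC-SHA256": "sha256",
}


def split_mac_name(message):
    """Return (hashlib_digest_name, payload) for an incoming message.

    A message without a recognized "{NAME}" prefix is treated as the
    legacy format, i.e. "no MAC name given" means HMAC-MD5.
    """
    if message.startswith(b"{"):
        end = message.find(b"}")
        if end > 0:
            name = message[1:end].decode("ascii", "replace")
            if name in MAC_NAMES:
                return MAC_NAMES[name], message[end + 1:]
    # Fallback: no (known) MAC name, so assume the legacy HMAC-MD5 format.
    return "md5", message
```

A raw legacy payload could in principle begin with bytes that look like a valid prefix, so a real protocol change would have to resolve that ambiguity some other way.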
Banning md5 as a matter of policy may be perfectly sensible.
However, I think the way multiprocessing uses hmac authentication is *not* affected by the collision attacks the advisory talks about. These depend on the attacker being able to determine for himself whether a particular candidate string is a "solution".
But with the way multiprocessing uses hmac authentication there is no way for the attacker to check for himself whether a candidate string has the desired hash: he does not know what the desired hash value is, or even what the hash function is. (The effective hash function, though built on top of md5, depends on the secret key.)
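For readers unfamiliar with the pattern under discussion, here is a rough sketch of a keyed challenge-response exchange built on the standard `hmac` module. The function names and the 20-byte challenge length are illustrative assumptions, not the actual `multiprocessing.connection` code; the digest is passed explicitly rather than relying on a default.

```python
import hmac
import os


def make_challenge(length=20):
    """Server side: generate a fresh random challenge (length is an assumption)."""
    return os.urandom(length)


def answer_challenge(key, challenge, digest_name="sha256"):
    """Client side: prove knowledge of the shared key by MACing the challenge."""
    return hmac.new(key, challenge, digest_name).digest()


def verify_response(key, challenge, response, digest_name="sha256"):
    """Server side: recompute the MAC and compare in constant time."""
    expected = hmac.new(key, challenge, digest_name).digest()
    return hmac.compare_digest(expected, response)
```

Without the secret key, a peer cannot compute the expected value for a fresh challenge, regardless of which hash function sits underneath the HMAC.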
Dave, are you still interested in addressing the issue?
I think it's a good idea to replace HMAC-MD5 in the long run. But instead of hard-coding another hash algorithm, I would like to see an improved handshake protocol that supports flexible authentication algorithms. You could send an algorithm indicator (e.g. HMAC_SHA256) in the request.
It would be really cool to run multiprocessing protocol over TLS with support for SASL with SCRAM or EXTERNAL (TLS cert auth, AF_UNIX PEERCRED, GSSAPI)...
So #20380 is a more complicated version of my draft PR above. In other PRs and issues related to this in the past, I see one claim by @tiran in particular that bothers me - #16264 (comment) - "The change breaks backward compatibility. multiprocessing supports distributed computing across multiple machines and works with multiple Python versions. With the change a controller with Python 3.N+1 would no longer be able to talk to a 3.N server or the other way around."
What evidence is there that multi-python-version use of multiprocessing rather than use as a single Python process launching and controlling a bunch of children is a supported use case? People really should not be using multiprocessing that way. This isn't a distributed computing system.
…tocol (gh-99623) Describe the multiprocessing connection protocol. It isn't a good protocol, but it is what it is. This way we can more easily reason about making changes to it in a backwards compatible way.
bpo-17258: `multiprocessing` now supports stronger HMAC algorithms for inter-process connection authentication rather than only HMAC-MD5.

Signed-off-by: Christian Heimes <email@example.com>

gpshead: I reworked this to be more robust while keeping the idea. The protocol modification idea remains, but we now take advantage of the message length as an indicator of legacy vs modern protocol version. No more regular expression usage. We now default to HMAC-SHA256, but do so in a way that will be compatible when communicating with older clients or older servers. No protocol transition period is needed. More integration tests to verify these claims remain true are required. I'm unaware of anyone depending on multiprocessing connections between different Python versions.

---------

Signed-off-by: Christian Heimes <firstname.lastname@example.org>
Co-authored-by: Gregory P. Smith [Google] <email@example.com>
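As a rough illustration of using message length to tell legacy from modern responses, the sketch below assumes a hypothetical `{digest}` prefix on modern responses and a hypothetical allow-list. It is not the code from the merged change. It relies only on the fact that a raw MD5 digest is 16 bytes, which no prefixed SHA-2 response can be.

```python
import hmac

LEGACY_MD5_DIGEST_LEN = 16              # size of a raw HMAC-MD5 digest
MODERN_DIGESTS = ("sha256", "sha384")   # hypothetical allow-list


def build_response(key, challenge, digest_name="sha256"):
    """Modern peer: prefix the MAC with the digest name (assumed format)."""
    mac = hmac.new(key, challenge, digest_name).digest()
    return b"{" + digest_name.encode("ascii") + b"}" + mac


def classify_response(response):
    """Return (digest_name, mac_bytes), using length to spot legacy peers."""
    if len(response) == LEGACY_MD5_DIGEST_LEN:
        # Exactly 16 bytes: treat as an unprefixed legacy HMAC-MD5 digest.
        return "md5", response
    if response.startswith(b"{"):
        end = response.find(b"}")
        if end > 0:
            name = response[1:end].decode("ascii", "replace")
            if name in MODERN_DIGESTS:
                return name, response[end + 1:]
    raise ValueError("unrecognized response format")
```

Under these assumptions an older peer's bare 16-byte reply is still accepted, while newer peers can negotiate a stronger digest without a separate transition period.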