Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of md5 hash causes FIPS error, requires usedforsecurity=False #2270

Closed
darugar opened this issue Feb 13, 2024 · 14 comments
Closed

Use of md5 hash causes FIPS error, requires usedforsecurity=False #2270

darugar opened this issue Feb 13, 2024 · 14 comments
Labels

Comments

@darugar
Copy link

darugar commented Feb 13, 2024

Use of the md5 hashing algorithm is not permitted in FIPS systems, exhibiting the error:

else hashlib.md5(name.encode("utf8")).hexdigest()[:6],
ValueError: [digital envelope routines] unsupported

This can be fixed via hashlib.md5(usedforsecurity=False):

in database.py:

return hashlib.md5(self.name.encode("utf8"), usedforsecurity=False).hexdigest()[:6]

in init.py:

    md5_suffix = hashlib.md5(s.encode("utf8"), usedforsecurity=False).hexdigest()[:6]

I can do a PR if that's easier - I've tested these updates locally and in FIPS environments and they work, but pytest does not pass tests (master is also failing the pytests in the same way).

Note that a similar issue exists in the pint library where hashlib.blake2b is used - in order to allow datasette to run in FIPS this needs to be modified to, for example, hashlib.sha512 .

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

There's one problem here: it looks like usedforsecurity=False was added in Python 3.9, but Datasette still supports Python 3.8 until at least its EOL in October 2024: https://devguide.python.org/versions/

I'm happy to add it with a Python version check or similar, but would the FIPS scanning system identify something like the following?

import hashlib
import sys

def non_security_md5(text):
    try:
        return hashlib.md5(text.encode("utf8"), usedforsecurity=False).hexdigest()
    except TypeError:
        # usedforsecurity is not supported
        return hashlib.md5(text.encode("utf8")).hexdigest()

If that still trips the security filter I'm not sure what to do here - I want to be able to support Python 3.8.

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

Since this is purely a cosmetic thing and we're pre-Datasette-1.0 I'd be OK swapping MD5 for SHA256 here to please the filter.

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

Code in question:

@property
def color(self):
if self.hash:
return self.hash[:6]
return hashlib.md5(self.name.encode("utf8")).hexdigest()[:6]

def to_css_class(s):
"""
Given a string (e.g. a table name) returns a valid unique CSS class.
For simple cases, just returns the string again. If the string is not a
valid CSS class (we disallow - and _ prefixes even though they are valid
as they may be confused with browser prefixes) we strip invalid characters
and add a 6 char md5 sum suffix, to make sure two tables with identical
names after stripping characters don't end up with the same CSS class.
"""
if css_class_re.match(s):
return s
md5_suffix = hashlib.md5(s.encode("utf8")).hexdigest()[:6]
# Strip leading _, -
s = s.lstrip("_").lstrip("-")
# Replace any whitespace with hyphens
s = "-".join(s.split())
# Remove any remaining invalid characters
s = css_invalid_chars_re.sub("", s)
# Attach the md5 suffix
bits = [b for b in (s, md5_suffix) if b]
return "-".join(bits)

The CSS bit is actually a bigger problem, because changing that will change the CSS class name (and the name used for custom templates) for existing Datasette instances - which could result in customized templates or CSS breaking in ways that people might not easily notice.

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

Here's the origin of that usedforsecurity= flag, back in 2010:

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

Oh wait... is the issue here that in a FIPS enabled system simply calling hashlib.md5(...) triggers a runtime error?

If so, the fix could well be to use usedforsecurity=False in Python 3.9+, avoid that parameter in Python 3.8 and document that Datasette on Python 3.8 is incompatible with FIPS, which I imagine is completely fine.

I had originally assumed that this was about a FIPS scanner that statically analyzes Python code looking for insecure uses of hashlib, but I now see that a runtime error is a more likely mechanism here.

@darugar
Copy link
Author

darugar commented Feb 14, 2024

@simonw exactly, calling hashlib.md5(...) triggers a runtime error, adding the flag apparently disables the check and prevents the runtime error. Adding the flag for 3.9+ and documenting for 3.8 seems quite reasonable.

Not sure if you can do anything about the pint issue, but just as a heads-up that also exhibits a similar runtime error in their use of hashlib.blake2b, which overall prevents datasette from running in FIPS systems (we're patching around that for now).

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

OK, I figured out how to replicate this problem using Docker.

I'm using this image: https://hub.docker.com/r/cyberark/ubuntu-ruby-fips

The hardest part was finding an actively maintained FIPS Docker image! I eventually found it via this search for most recently updated images on Docker Hub matching "fips": https://hub.docker.com/search?q=fips&sort=updated_at&order=desc

So I can start the container with:

docker run -it --rm cyberark/ubuntu-ruby-fips /bin/bash

Then I can install stuff I need with:

apt-gen update && apt-get install -y python3 git python3.10-venv

Then:

cd /tmp
git clone https://github.com/simonw/datasette
cd datasette
python3 -m venv venv
source venv/bin/activate
pip install -e '.[test]'
pytest -n auto

This fails a bunch of tests thanks to the FIPS issue - errors like this:

  File "/tmp/datasette/datasette/database.py", line 77, in color
    return hashlib.md5(self.name.encode("utf8")).hexdigest()[:6]
ValueError: [digital envelope routines] unsupported

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

Applying this patch causes the test suite to pass in that FIPS Docker container:

diff --git a/datasette/database.py b/datasette/database.py
index becb552c..94225c47 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -74,7 +74,7 @@ class Database:
     def color(self):
         if self.hash:
             return self.hash[:6]
-        return hashlib.md5(self.name.encode("utf8")).hexdigest()[:6]
+        return hashlib.md5(self.name.encode("utf8"), usedforsecurity=False).hexdigest()[:6]
 
     def suggest_name(self):
         if self.path:
diff --git a/datasette/utils/__init__.py b/datasette/utils/__init__.py
index f2cd7eb0..d8d187ea 100644
--- a/datasette/utils/__init__.py
+++ b/datasette/utils/__init__.py
@@ -713,7 +713,7 @@ def to_css_class(s):
     """
     if css_class_re.match(s):
         return s
-    md5_suffix = hashlib.md5(s.encode("utf8")).hexdigest()[:6]
+    md5_suffix = hashlib.md5(s.encode("utf8"), usedforsecurity=False).hexdigest()[:6]
     # Strip leading _, -
     s = s.lstrip("_").lstrip("-")
     # Replace any whitespace with hyphens

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

@darugar do you think it's worth releasing a 0.64 with this fix, or can I leave it for a Datasette 1.0 alpha and then Datasette 1.0?

@simonw simonw added the bug label Feb 14, 2024
@simonw
Copy link
Owner

simonw commented Feb 14, 2024

Not sure if you can do anything about the pint issue, but just as a heads-up that also exhibits a similar runtime error in their use of hashlib.blake2b, which overall prevents datasette from running in FIPS systems (we're patching around that for now).

How are you patching that?

I'm considering moving Pint out of Datasette core and trying to get it to work as a plugin instead before 1.0 - this may be just the push I need to make that decision.

@darugar
Copy link
Author

darugar commented Feb 14, 2024

@simonw yes we're patching Pint. Since we need to patch that anyway doing the patch for datasette is not much more effort, but of course it'd be nicer if datasette didn't need a patch :-) Completely up to you depending on timing for 1.0 alpha. Ideal scenario for us would be a datasette release with Pint as a plugin (we don't use it) and the md5 issue fixed in the release, but not a huge deal.

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

Wrote this up as a TIL: https://til.simonwillison.net/python/md5-fips

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

@simonw
Copy link
Owner

simonw commented Feb 14, 2024

And a PR against Pint too, which was easy because they've already dropped support for Python 3.8:

simonw added a commit that referenced this issue Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants