Improve input validation to support Unicode/utf-8 charsets #1001

toholdaquill · 2015-04-21T16:40:42Z

Currently, input validation only supports ASCII characters, e.g.:

def clean(s, also=''):
    # safe characters for every possible word in the wordlist includes capital
    # letters because codename hashes are base32-encoded with capital letters
    ok = ' !#%$&)(+*-1032547698;:=?@acbedgfihkjmlonqpsrutwvyxzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for c in s:
        if c not in ok and c not in also:
            raise CryptoException("invalid input: %s" % s)
    # scrypt.hash requires input of type str. Since the wordlist is all ASCII
    # characters, this conversion is not problematic
    return str(s)

As part of developing i18n support for SecureDrop, the crypto_util.py clean function should be updated to sanitize Unicode/utf-8 input.

This is important for two reasons:

1. If we support codenames in Unicode/utf-8 charsets (issue #999, "Support Diceware wordlists in multiple languages as part of i18n efforts"), then input validation should match the language in question.
2. If SecureDrop decides to accept Transifex translations (see the discussion in the comments on issue #753, "SecureDrop i18n", esp. @runasand's remarks), then those translations could also be validated in the same way.
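
One way the validation could follow the language in question is to derive the allowed characters from the wordlist itself rather than from a hard-coded ASCII string. The sketch below is only an illustration under stated assumptions: `allowed_chars` and `clean_unicode` are hypothetical names, `CryptoException` is a local stand-in for the exception in crypto_util.py, NFC normalization is an assumed design choice, and the code targets Python 3 rather than the Python 2 of the original snippet.

```python
import unicodedata


class CryptoException(Exception):
    """Local stand-in for the exception raised in crypto_util.py."""


def allowed_chars(wordlist, also=''):
    """Hypothetical helper: collect every character used by the wordlist."""
    chars = set(' ')  # words in a codename are separated by spaces
    for word in wordlist:
        # Normalize so composed and decomposed forms compare equal.
        chars.update(unicodedata.normalize('NFC', word))
    chars.update(also)
    return chars


def clean_unicode(s, wordlist, also=''):
    """Hypothetical replacement for clean(): reject characters that do not
    occur anywhere in the given wordlist (or in `also`)."""
    s = unicodedata.normalize('NFC', s)
    ok = allowed_chars(wordlist, also)
    for c in s:
        if c not in ok:
            raise CryptoException("invalid input: %s" % s)
    return s
```

With a German wordlist such as `["größe", "straße"]`, the input `"größe straße"` would pass, while input containing any character absent from the list would raise `CryptoException`. In practice the allowed set would be computed once per wordlist rather than on every call.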


robbintt · 2015-11-08T01:30:18Z

The clean function can be removed. This will solve internationalization for word lists.

clean is only called in crypto_util.py -- it is called twice.

scrypt.hash in Python 2 requires its input to be of type str. The conversion is easy: http://stackoverflow.com/a/1207836

The clean function also handles the conversion from unicode to str, so that conversion must be added elsewhere. I'm not sure what happens to the CryptoException error handling once clean is gone; at a minimum, the raise should be either deliberately preserved or deliberately removed.
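
If clean is removed, the conversion step could live in a small helper of its own. The following is a minimal sketch, not the project's code: `to_hash_input` is a hypothetical name, and it is written for Python 3, where the hash input is `bytes` (in Python 2 the same `.encode('utf-8')` call on a `unicode` object yields a `str`). Encoding explicitly as UTF-8 keeps non-ASCII characters intact, and the NFC normalization is an assumption added so that visually identical input always hashes to the same bytes.

```python
import unicodedata


def to_hash_input(s):
    """Hypothetical helper: turn text into the byte string a hash function expects."""
    if isinstance(s, bytes):
        # Already a byte string; pass it through unchanged.
        return s
    # Normalize first so composed and decomposed spellings encode identically.
    return unicodedata.normalize('NFC', s).encode('utf-8')
```

The result of `to_hash_input(codename)` would then be what gets passed on to scrypt.hash.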

### Security
There is an outstanding security issue with Unicode homoglyphs: https://en.wikipedia.org/wiki/Homoglyph#Unicode_homoglyphs

This is not a huge deal: as long as each word list contains just one language, there shouldn't be any local homoglyphs.
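
The one-language assumption could be checked mechanically. The sketch below is a rough heuristic only: Python's standard library does not expose the Unicode Script property, so it approximates a character's script by the first word of its Unicode name (for example `LATIN` or `CYRILLIC`). The function names are hypothetical, and languages that legitimately mix scripts, such as Japanese, would be flagged by this check.

```python
import unicodedata


def scripts_used(s):
    """Hypothetical helper: approximate the set of scripts among the letters of s."""
    scripts = set()
    for c in s:
        if c.isalpha():
            name = unicodedata.name(c, '')
            if name:
                # e.g. 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'
                scripts.add(name.split(' ')[0])
    return scripts


def is_single_script(s):
    """True when every letter in s appears to come from the same script."""
    return len(scripts_used(s)) <= 1
```

For example, `is_single_script("paypal")` holds for the all-Latin spelling, but not when the second letter is replaced by the Cyrillic "а" (U+0430), which looks identical on screen.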

eloquence · 2020-12-11T21:07:46Z

This is still a valid issue, and we should resolve it in order to address #999.

nabla-c0d3 · 2020-12-19T22:09:20Z

The CryptoUtil.clean() function has been removed as part of #5600.

eloquence · 2021-02-22T19:09:12Z

Thanks @nabla-c0d3! It looks like we're still doing character validation against a limited character set in a different function now.

https://github.com/freedomofpress/securedrop/blob/develop/securedrop/crypto_util.py#L41-L42
https://github.com/freedomofpress/securedrop/blob/develop/securedrop/crypto_util.py#L318

Therefore keeping this issue open.

heartsucker mentioned this issue Sep 3, 2015

Pybabel and German translations #1103

Closed

toholdaquill mentioned this issue Nov 7, 2015

L10n — source interface #1168

Closed

redshiftzero added this to the 1.0 milestone Nov 3, 2016

redshiftzero removed this from the 1.0 milestone May 11, 2017

ghost added the i18n label (anything related to translation or internationalization of SecureDrop) Dec 5, 2017

eloquence mentioned this issue Dec 11, 2020

Add Unicode/utf-8 support for codenames #1000

Closed
