Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve input validation to support Unicode/utf-8 charsets #1001

Open
toholdaquill opened this issue Apr 21, 2015 · 4 comments
Open

Improve input validation to support Unicode/utf-8 charsets #1001

toholdaquill opened this issue Apr 21, 2015 · 4 comments
Labels
i18n Anything related to translation or internationalization of SecureDrop

Comments

@toholdaquill
Copy link
Contributor

Currently, input validation only supports ascii chars, e.g.:

def clean(s, also=''):
    # safe characters for every possible word in the wordlist includes capital
    # letters because codename hashes are base32-encoded with capital letters
    ok = ' !#%$&)(+*-1032547698;:=?@acbedgfihkjmlonqpsrutwvyxzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    for c in s:
        if c not in ok and c not in also:
             raise CryptoException("invalid input: %s" % s)
    # scrypt.hash requires input of type str. Since the wordlist is all ASCII
    # characters, this conversion is not problematic
    return str(s)

As part of developing i18n support for SecureDrop, the crypto_util.py clean function should be updated to sanitize Unicode/utf-8 input.

This is important for two reasons:

@robbintt
Copy link

robbintt commented Nov 8, 2015

The clean function can be removed. This will solve internationalization for word lists.

clean is only called in crypto_util.py -- it is called twice.

scrypt.hash in Python 2 needs str() type. This is easy: http://stackoverflow.com/a/1207836

The clean function does manage conversion from unicode to string, so this must be added elsewhere. Not sure what happens with the error handling for CryptoError now, but this raise should minimally be preserved or removed.

###Security
There is a hanging security issue with unicode homoglyphs: https://en.wikipedia.org/wiki/Homoglyph#Unicode_homoglyphs

This is not a huge deal because:
As long as each word list is just one language there shouldn't be any local homoglyphs.

@redshiftzero redshiftzero added this to the 1.0 milestone Nov 3, 2016
@redshiftzero redshiftzero removed this from the 1.0 milestone May 11, 2017
@ghost ghost added the i18n Anything related to translation or internationalization of SecureDrop label Dec 5, 2017
@eloquence
Copy link
Member

This is still a valid issue, and we should resolve in order to address #999.

@nabla-c0d3
Copy link
Contributor

The CryptoUtil.clean() function has been removed as part of #5600.

@eloquence
Copy link
Member

eloquence commented Feb 22, 2021

Thanks @nabla-c0d3! It looks like we're still doing character validation against a limited character set in a different function now.

https://github.com/freedomofpress/securedrop/blob/develop/securedrop/crypto_util.py#L41-L42
https://github.com/freedomofpress/securedrop/blob/develop/securedrop/crypto_util.py#L318

Therefore keeping this issue open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n Anything related to translation or internationalization of SecureDrop
Projects
None yet
Development

No branches or pull requests

5 participants