French word list quality is subpar #6652
Comments
It would be interesting to see if we could use the list of pages on the French Wiktionary, maybe with some more filtering to achieve our desired word length ratios: https://dumps.wikimedia.org/frwiktionary/20221001/frwiktionary-20221001-all-titles-in-ns0.gz. This would make it straightforward to add in more languages (that have Wiktionaries at least :)) |
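As a rough illustration of the "more filtering" step, here is a minimal Python sketch. The filter rules (lowercase letters only, a 3 to 9 character window) and the function names are assumptions made for this example, not anything the project has agreed on. The titles file may also contain entries that are not French words, so real use would need further vetting.

```python
import gzip


def filter_titles(lines, min_len=3, max_len=9):
    """Keep unique, all-lowercase, purely alphabetic titles within a length window.

    The length window and the character rules are illustrative assumptions.
    """
    kept = set()
    for raw in lines:
        title = raw.strip()
        # isalpha() rejects hyphens, underscores, digits and apostrophes;
        # islower() rejects proper nouns and acronyms.
        if min_len <= len(title) <= max_len and title.isalpha() and title.islower():
            kept.add(title)
    return sorted(kept)


def load_titles(path):
    """Read a gzip-compressed titles dump (one title per line) and filter it."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return filter_titles(fh)
```

Calling `load_titles()` on a downloaded copy of the dump would return a candidate list that still needs review by someone familiar with the language.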
I was surprised to learn today that we actually do have a bunch of checks at runtime(!) that are supposed to verify the quality of word lists: https://github.com/freedomofpress/securedrop/blob/6e4a4363b0da489e77d326d72208ddf56065e8f7/securedrop/passphrases.py. One important note from this is that 1-character "words" are dropped. I think re-evaluating those checks should be part of this. For example, there's a check to verify that generated passphrases are long enough, but it does so by taking the shortest word in the list, multiplying its length by the number of words (7), adding in the spaces, and then verifying the result isn't less than the minimum passphrase length (20). In practice, this is a static calculation. There's a check in the opposite direction making sure the word list doesn't have too-long words; again, it's a static calculation: the cap of a 128-character passphrase means your longest word can't be more than 17 characters. On top of that, I see no reason these checks should happen at runtime; a unit test seems more appropriate. I think looking at the distribution of word lengths is more interesting, as it indicates how short/long the average passphrase will be. |
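The arithmetic behind those two length checks can be sketched as follows. This is not the code in `passphrases.py`: the function names are made up for the example, and it assumes words are joined by single spaces and that both limits are inclusive.

```python
WORDS = 7          # words per passphrase, as stated in the comment above
MIN_LENGTH = 20    # minimum passphrase length, as stated above
MAX_LENGTH = 128   # maximum passphrase length, as stated above


def passphrase_length(word_length, words=WORDS):
    """Length of a passphrase made of `words` words of equal length, space-separated."""
    return word_length * words + (words - 1)


def longest_allowed_word(max_length=MAX_LENGTH, words=WORDS):
    """Largest word length for which the worst-case passphrase still fits the cap."""
    return (max_length - (words - 1)) // words


def shortest_allowed_word(min_length=MIN_LENGTH, words=WORDS):
    """Smallest word length for which the shortest possible passphrase is long enough."""
    needed = min_length - (words - 1)
    return -(-needed // words)  # ceiling division


print(longest_allowed_word())   # 17
print(shortest_allowed_word())  # 2
```

Under these assumptions, any list whose words are between 2 and 17 characters long passes both checks, which is why the result depends only on the constants and the extreme word lengths.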
One reason to check at runtime, as is done now, is to prevent the app from starting at all if the word lists are very insecure or bad (for example, if the list files were somehow modified). |
Indeed. I think that protects us against two situations:
No 2. isn't stoppable since the attacker (who somehow already has write access) could easily disable whatever runtime checks exist as they insert their own word list. So then we're just protecting from situation No 1., which I'm not sure is worth it (e.g. we don't do integrity verification of other files AFAIK). That said, I doubt any new checks we come up with are going to be noticeably slow, so there's certainly no harm in continuing to check some things at runtime (they should just actually check useful things...). |
Yeah, I was mainly thinking about 1., as 2. is a "game-over" scenario, as you mentioned. It looks like being able to swap the word lists used to be a feature (as it's a configuration key in |
The entropy of Diceware passphrases is interesting, and came up in a very similar discussion a while back. The Diceware FAQ says not only do short words not decrease the quality of the passphrases, but they were included by design for usability. The French word list is weaker than the English, but because it has fewer words, not because there are more short ones. I'm not sure what the current state of support for using other wordlists is, but I don't think it's unreasonable for admins to want sources to get passphrases in their preferred languages. Keeping the runtime checks might help prevent an attempt to support that from weakening those passphrases. |
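To make the point about list size concrete, here is a small Python sketch of the usual entropy estimate for passphrases built from uniformly random words. The list sizes are illustrative: 7776 is the size of the classic Diceware list (6^5), and 4096 is a hypothetical smaller list, not the actual size of `fr.txt`.

```python
import math


def passphrase_entropy_bits(list_size, words=7):
    """Entropy in bits of `words` words drawn uniformly and independently from a list."""
    return words * math.log2(list_size)


# Classic Diceware list size versus a hypothetical smaller list.
print(round(passphrase_entropy_bits(7776), 2))  # 90.47
print(round(passphrase_entropy_bits(4096), 2))  # 84.0
```

Word length does not appear anywhere in the formula: only the number of words in the list and the number of words per passphrase matter.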
@legoktm I did a light feasibility assessment on the Also the relevant validation logic for a word list appears to be done in the NOTE
|
I had a chat with @epociask yesterday, and just caught up with the conversation in this issue. First, I'd like to highlight what @rmol said: the security properties of a passphrase that's created from words randomly picked from a word list don't depend on the words themselves; they depend on the size of the list (how many combinations of Because they're meant to be memorized over time, I do believe that there is value in people having access to code names (passphrases) in their own language. I have thought extensively about how such word lists can be created, and I am currently convinced that it is something that should be done in coordination with language specialists. (By that I mean people very familiar with the target language and how it is commonly used.) For illustration, I've written down my thoughts on creating a Spanish word list in the past. (Ongoing project.) Some of it may be useful inspiration as we think about non-English word lists for SecureDrop use. Keep in mind, though, that while a word list designed for use with dice to generate passphrases can be used for SecureDrop, a word list for SecureDrop doesn't necessarily have to meet all the constraints that may be desirable in a word list made to be used with dice. |
@gonzalo-bulnes Somewhat tangential, but do you think it'd be valuable to add word-list-specific tests within the repository instead of performing data validation checks inside the Passphrase constructor? |
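One possible shape for such a test, as a hedged sketch: the helper name, the thresholds, and the inline sample list are all hypothetical, and a real test would load the actual list files from the repository instead.

```python
def word_list_problems(words, min_size, min_word_len=2, max_word_len=17):
    """Return a list of human-readable problems found in a word list (empty if none).

    All thresholds are illustrative assumptions, not the project's actual limits.
    """
    problems = []
    if len(words) < min_size:
        problems.append(f"only {len(words)} words, expected at least {min_size}")
    if len(set(words)) != len(words):
        problems.append("duplicate words")
    too_short = [w for w in words if len(w) < min_word_len]
    if too_short:
        problems.append(f"{len(too_short)} words shorter than {min_word_len}")
    too_long = [w for w in words if len(w) > max_word_len]
    if too_long:
        problems.append(f"{len(too_long)} words longer than {max_word_len}")
    return problems


def test_sample_word_list_is_acceptable():
    # Stand-in data; a repository test would read the real list file here.
    sample = ["chat", "porte", "maison", "fenetre"]
    assert word_list_problems(sample, min_size=4) == []
```

A test runner such as pytest would pick up `test_sample_word_list_is_acceptable` automatically, so a bad list would fail in CI rather than at application start.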
Description
We do not allow our users to set their own passwords, whether they're journalists or sources. Instead, we generate Diceware passphrases for them. This means we rely on word lists to generate passphrases, and we want those lists to be good. Unfortunately, when compared to its English counterpart, the French word list isn't very good.
All of our users who select French automatically get a passphrase derived from this list.
Distribution of word lengths
(Figures: en.txt, fr.txt)
3-letter words make up 1% of en.txt, while words of 3 letters or fewer make up 15% of fr.txt. We also have 26 letters of the alphabet represented and various permutations of characters (for example vwx) that are neither easy to remember nor a word. But the worst bit is that on the whole, you're more likely to get a passphrase with worse quality from the French word list than the English one.
Comments
I don't think the quality is so terrible that it warrants immediate removal, but it'd be great if we found a free (AGPL compatible) dictionary that we could use to generate a new one.
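For anyone who wants to reproduce the word-length comparison from the description, here is a minimal Python sketch. The function names are made up for this example, and it assumes a non-empty list with one word per entry.

```python
from collections import Counter


def length_distribution(words):
    """Map each word length to the fraction of words that have that length."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


def share_at_most(words, max_len):
    """Fraction of words whose length is at most `max_len`."""
    words = list(words)
    return sum(1 for w in words if len(w) <= max_len) / len(words)
```

Running `share_at_most(words, 3)` on the contents of `fr.txt` and `en.txt` (read with something like `open(path).read().split()`) should give figures comparable to the percentages quoted above.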