Changes to French wordlist #6936

Merged: 1 commit merged on Sep 26, 2023


@sts10 (Contributor) commented Aug 29, 2023

Status

Ready for review

Description of Changes

I, a person who doesn't speak French, am proposing the following changes to SecureDrop's French word list:

  1. added some words (without accents) from the BIPS French word list. I assume these are reasonable words to make a passphrase memorable for a French speaker.
  2. removed all words with fewer than 3 characters, most of which seem to not be real words like "af" and "b".
  3. removed a handful of words from a list of profane French words I found online.
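The three changes above amount to a small filter over two lists. The following is a minimal sketch, not the script actually used for this PR: the function name `build_french_list`, its parameters, and the blocklist input are hypothetical, and "without accents" is approximated with `str.isascii()`.

```python
from typing import Iterable, List


def build_french_list(existing: Iterable[str], bip39_french: Iterable[str],
                      blocklist: Iterable[str], min_len: int = 3) -> List[str]:
    """Hypothetical sketch of the three edits described above."""
    # 1. Add BIP-39 French words with no accented characters
    #    (str.isascii() is used as a stand-in for "no accents").
    candidates = set(existing) | {w for w in bip39_french if w.isascii()}
    # 2. Drop words shorter than min_len characters (e.g. "af", "b").
    candidates = {w for w in candidates if len(w) >= min_len}
    # 3. Drop anything on a profanity blocklist (case-insensitive match).
    blocked = {w.lower() for w in blocklist}
    return sorted(w for w in candidates if w.lower() not in blocked)
```

For example, `build_french_list(["af", "b", "maison", "jardin"], ["abaisser", "élève", "maison"], [])` keeps `abaisser`, `jardin`, and `maison`, dropping the two short entries and the accented `élève`.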

After these changes, my proposed word list has a healthy 7,886 words, meaning each word adds 12.945 bits of entropy to a passphrase (the existing list has 7,384 words, meaning each word provides 12.85 bits).
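The per-word figures follow from the usual formula: a word drawn uniformly at random from a list of N words contributes log2(N) bits, and independent draws add up. A quick check (the helper names are illustrative):

```python
import math


def bits_per_word(list_size: int) -> float:
    """Entropy contributed by one word drawn uniformly at random."""
    return math.log2(list_size)


def passphrase_bits(list_size: int, num_words: int) -> float:
    """Total entropy of num_words independent, uniformly drawn words."""
    return num_words * bits_per_word(list_size)


print(f"{bits_per_word(7886):.3f}")  # proposed list: 12.945
print(f"{bits_per_word(7384):.3f}")  # existing list: 12.850
```

The formula assumes every word is equally likely to be picked; it says nothing about how memorable the words are.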

Note that my proposed list, like the word list it is replacing, is not uniquely decodable. In practical terms, this means that a separator like a hyphen or space must be used between the words of every generated passphrase. I'd be happy to create a uniquely decodable list if that is desired.
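Whether a list is uniquely decodable can be checked mechanically with the Sardinas-Patterson algorithm from coding theory. The sketch below is illustrative only (function names are made up, and nothing like it ships with this PR): it reports `False` when some concatenation of words can be split two ways, e.g. "porte" + "feuille" vs. "portefeuille". Each round is quadratic in the list size, so a full-size list may take a while.

```python
from typing import Iterable, Set


def left_quotient(prefixes: Set[str], words: Set[str]) -> Set[str]:
    """Return every w[len(p):] where p is in prefixes and w starts with p."""
    return {w[len(p):] for p in prefixes for w in words if w.startswith(p)}


def is_uniquely_decodable(wordlist: Iterable[str]) -> bool:
    """Sardinas-Patterson test: True if no concatenation of words is ambiguous."""
    code = set(wordlist)
    # S1: "dangling suffixes" left over when one word is a proper prefix of another.
    dangling = left_quotient(code, code) - {""}
    seen = set()
    while dangling:
        # An empty dangling suffix means two different parses end at the same point.
        if "" in dangling:
            return False
        state = frozenset(dangling)
        if state in seen:  # the finite sequence of suffix sets is now cycling
            return True
        seen.add(state)
        dangling = left_quotient(code, dangling) | left_quotient(dangling, code)
    return True
```

A prefix-free list (no word is the start of another) always passes; the test also accepts some lists that are not prefix-free, such as `["a", "ab"]`.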

Changes proposed in this pull request:

  • Edits to SecureDrop's French word list, as described above.

One concern might be the licensing of the BIPS French word list. I can't find any licensing information in what I assume is the relevant GitHub repo.

Testing

I don't think this PR needs testing, but I guess you could generate some French passphrases to make sure nothing weird happens. Also, it might be wise to have a French speaker give the added words a look!
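To eyeball some output, a handful of passphrases can be drawn with Python's `secrets` module. This is a generic sketch, not SecureDrop's own passphrase generator; the one-word-per-line file format, the helper names, and the word count you pass in are assumptions.

```python
import secrets
from pathlib import Path
from typing import List, Sequence


def load_wordlist(path: str) -> List[str]:
    """Read a hypothetical one-word-per-line file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def generate_passphrase(words: Sequence[str], num_words: int, sep: str = " ") -> str:
    """Join num_words words chosen uniformly at random with a CSPRNG.

    A separator is always used because the list is not uniquely decodable.
    """
    return sep.join(secrets.choice(words) for _ in range(num_words))
```

For instance, calling `generate_passphrase(load_wordlist("french.txt"), 7)` a few times (with `french.txt` standing in for wherever the list lives) would give a quick sample to look over.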

Deployment

Any special considerations for deployment?

Changing the passphrase word list shouldn't be too hard to deploy. Existing users who have words in their passphrase that I'm proposing to remove shouldn't be affected. New users will get a series of words from the new list, making a hopefully more memorable passphrase for French speakers.

Checklist

Choose one of the following:

  • I have opened a PR in the docs repo for these changes, or will do so later
  • I would appreciate help with the documentation
  • These changes do not require documentation

Choose one of the following:

  • I have performed a diff review and pasted the contents to the packaging wiki
  • I would like someone else to do the diff review
  • I am silencing an alert related to a production dependency, because (please explain below):

@sts10 sts10 requested a review from a team as a code owner August 29, 2023 05:57
@eaon (Contributor) commented Aug 31, 2023

IMO this would fix #6652

@sts10 (Contributor, Author) commented Aug 31, 2023

Re: the discussion in #6652 about getting French words from Wikipedia: as an experiment, I actually made a separate list using this method as well. I did leave words with accents in that list, though; that's a usability question, I suppose. I'm happy to find 7,776 words from French Wikipedia without accents if we prefer that.
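A rough sketch of the frequency-based approach discussed in #6652: count words in a plain-text corpus (such as a Wikipedia dump converted to text) and keep the most common ones. The tokenizer regex, the function name, and the defaults are assumptions; 7,776 is 6^5, the size of a five-dice Diceware list. Any list produced this way would still need review by a French speaker.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed tokenizer: runs of lowercase Latin letters, including accented ones.
TOKEN_RE = re.compile(r"[a-zà-öø-ÿœ]+")


def frequency_wordlist(lines: Iterable[str], size: int = 7776, min_len: int = 3,
                       allow_accents: bool = False) -> List[str]:
    """Return (sorted) up to `size` of the most frequent words in the corpus."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN_RE.findall(line.lower()))
    picked = []
    for word, _ in counts.most_common():  # ties keep first-seen order
        if len(word) < min_len:
            continue
        if not allow_accents and not word.isascii():
            continue  # skip accented words unless explicitly allowed
        picked.append(word)
        if len(picked) == size:
            break
    return sorted(picked)
```

Raw frequency lists tend to include names, fragments, and other noise, which is one reason a human review pass still matters.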

As pointed out in that issue, this process could be used to create word lists in other languages beyond English and French, though of course it'd be important to have an expert in the given language review each list before use.

@legoktm (Member) commented Sep 6, 2023

Thanks @sts10, this looks pretty good.

I skimmed through bitcoin/bips#152 and saw that they put a lot of effort into selecting good words and pruning out words that are too similar or obscure, so I think we probably don't need too intense a review from a French speaker, but I'm sure we can find someone for that.
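One way to spot-check the merged list for "too similar" words is to flag pairs that differ by a single edit (one insertion, deletion, or substitution). This is an illustrative sketch with made-up function names, not the selection procedure BIP-39 actually used; words are bucketed by length so only same-length and adjacent-length pairs are compared.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insertion, deletion, or substitution."""
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) <= len(b)
    if len(b) - len(a) > 1:
        return False
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: advance in both words
        j += 1      # insertion: skip the extra character in the longer word
    # Leftover characters in either word also count as edits.
    return edits + (len(b) - j) + (len(a) - i) <= 1


def similar_pairs(words: Iterable[str]) -> List[Tuple[str, str]]:
    """List pairs of distinct words that are one edit apart."""
    by_len = defaultdict(list)
    for w in sorted(set(words)):
        by_len[len(w)].append(w)
    pairs = []
    for n, group in by_len.items():
        pairs.extend(p for p in combinations(group, 2) if within_one_edit(*p))
        for longer in by_len.get(n + 1, []):
            pairs.extend((short, longer) for short in group if within_one_edit(short, longer))
    return pairs
```

For example, among `["chat", "chant", "chien", "char"]` this flags `("char", "chat")` and `("chat", "chant")`. Whether such pairs are actually confusing is still a judgment call for a French speaker.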

I am slightly worried about the copyright. https://github.com/bitcoin/bips/blob/master/bip-0039/bip-0039-wordlists.md#french indicates that this isn't a pure mechanical list, it's been pretty well curated (related: https://opensource.stackexchange.com/a/10679). But at the same time I don't think Bitcoin intended to create a proprietary list of words? Surely it's shipped in one of their code repos somewhere that has a clear license statement?

@sts10 (Contributor, Author) commented Sep 6, 2023

> I am slightly worried about the copyright.

Totally understandable!

> Surely it's shipped in one of their code repos somewhere that has a clear license statement?

You'd think so!

Unfortunately it doesn't seem like any of the other code projects under the "Bitcoin" GitHub organization actually contain copies of the BIPS-0039 word lists within them. I don't know enough about Bitcoin to know if these are the only "official" repos, or if there's even such a concept of "official" in Bitcoin world.

For what it's worth, the BIPS repo lists a "reference implementation" for generating passphrases and links to this Python repo, which is MIT-licensed. This project does include copies of the word lists themselves. Maybe that's enough for us? I doubt they'd proudly point to a project that violates their terms?

There are also plenty of "other implementations" listed that use the MIT License.

I'll keep looking this week, maybe ask in a forum.

@sts10 (Contributor, Author) commented Sep 6, 2023

I've learned a bit more from BIP-2:

It seems that individual BIPs can be, and are, licensed under different licenses.

BIP-2 provides a list of recommended licenses. Interestingly, there are also "Not recommended, but acceptable licenses," implying that individual BIP authors have some leeway in choosing how their work is licensed.

Sadly, BIP-39 seems to be one of the BIPs that does NOT (currently) specify a license. Bummer!

That said, I still feel hopeful that, given that the word lists are included in their reference implementation, which is MIT-licensed, we're probably safe.

@rocodes (Contributor) commented Sep 12, 2023

(If we're worried enough, we can also file an issue or contact them asking for explicit permission and/or offering attribution as per their request. It would be great to land these changes, and there's some good will around our project so I'd hope it would be a well-received request.)

@sts10 (Contributor, Author) commented Sep 12, 2023

> (If we're worried enough, we can also file an issue or contact them asking for explicit permission and/or offering attribution as per their request. It would be great to land these changes, and there's some good will around our project so I'd hope it would be a well-received request.)

So just FYI, I'll fess up a bit here and admit that last week I very clumsily tried to slap a permissive license on the BIPS word lists (their repo doesn't allow issues, so I jumped straight to a PR). I should have thought through the legal impossibilities of such a move. Hopefully I didn't antagonize their maintainers!

@legoktm (Member) commented Sep 25, 2023

> For what it's worth, the BIPS repo lists a "reference implementation" for generating passphrases and links to this Python repo, which is MIT-licensed. This project does include copies of the word lists themselves. Maybe that's enough for us? I doubt they'd proudly point to a project that violates their terms?

That's good enough for me IMO, and I'll double-check with the rest of the team that there are no concerns.

Could you add a commit updating the README with the new source information (MIT-licensed trezor/python-mnemonic)?

@sts10 (Contributor, Author) commented Sep 25, 2023

> Could you add a commit updating the README with the new source information (MIT-licensed trezor/python-mnemonic)?

Ah good call. I've done so, in the way I guessed fits. Let me know if that's what you had in mind or not.

@zenmonkeykstop (Contributor)

The MIT-licensed trezor wordlist is identical to the BIPS one. (And the original PR pulls in all said lists in one go.) I think it's OK to follow their lead in this respect.
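A claim like "identical" is cheap to double-check. A minimal sketch, with illustrative function names and with fetching the two files from trezor/python-mnemonic and bitcoin/bips left out, that diffs two one-word-per-line lists while ignoring line-ending and whitespace differences:

```python
import difflib
from typing import List


def parse_wordlist(text: str) -> List[str]:
    """One word per line; ignore blank lines and surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def compare_wordlists(text_a: str, text_b: str) -> List[str]:
    """Return a unified diff of two word lists; an empty result means identical."""
    a, b = parse_wordlist(text_a), parse_wordlist(text_b)
    return list(difflib.unified_diff(a, b, "a", "b", lineterm=""))
```

Comparing parsed words rather than raw bytes avoids false alarms from CRLF vs. LF line endings or a missing trailing newline.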

@legoktm legoktm added this to the SecureDrop 2.7.0 milestone Sep 26, 2023
@legoktm legoktm self-assigned this Sep 26, 2023
* Add words from the BIP-0039 wordlist contained in
  trezor/python-mnemonic (MIT licensed)
* Remove all words with less than 2 characters
* Remove a handful of words from a list of profane French words found
  online

This will not affect existing passphrases and only be used for newly
generated ones.

Fixes freedomofpress#6652.

@legoktm (Member) left a comment

I squashed the two commits and copied what you wrote in the PR description for the commit message too. LGTM! Will merge once CI gives the green check.

Thanks for the contribution @sts10 :)

@zenmonkeykstop zenmonkeykstop merged commit f7e001b into freedomofpress:develop Sep 26, 2023
8 checks passed
@sts10 sts10 deleted the fr-wordlist branch October 23, 2023 18:27
5 participants