### BIP 39 Reference:
* [The BIP](https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki)
* [Original Python Reference Implementation](https://github.com/trezor/python-mnemonic/blob/master/mnemonic/mnemonic.py)
* [Go](https://github.com/tyler-smith/go-bip39)
* [Javascript](https://github.com/iancoleman/jsbip39/blob/master/jsbip39.js)
* [Blog Post](https://jmcintyre.net/?p=180)
* [bitcoinjs/bip39](https://github.com/bitcoinjs/bip39/blob/master/index.js#L93), which has the cleanest implementation.

In [102]:
import requests, os, binascii, hashlib, unicodedata, pbkdf2, hmac

# Grab English Wordlist From BIP Git Repo

In [30]:
english_wordlist_url = "https://raw.githubusercontent.com/bitcoin/bips/master/bip-0039/english.txt"
response = requests.get(english_wordlist_url)
wordlist = response.text.splitlines()
assert len(wordlist) == 2048

# Produce Mnemonic From Random Bytes

In [87]:
def generate_entropy(n):
    return os.urandom(n)

def mnemonic_from_entropy(entropy_bytes):
    entropy_bits_length = len(entropy_bytes) * 8
    checksum_bits_length = entropy_bits_length // 32
    sequence_length = (entropy_bits_length + checksum_bits_length) // 11
    
    # Prepare entropy bits, left-padded so leading zero bits are kept
    entropy_bits = bin(int(binascii.hexlify(entropy_bytes), 16))[2:]
    padded_entropy_bits = entropy_bits.zfill(entropy_bits_length)

    # Prepare checksum: the first entropy_bits_length // 32 bits of SHA-256(entropy)
    entropy_hash = hashlib.sha256(entropy_bytes).hexdigest()
    entropy_hash_bits = bin(int(entropy_hash, 16))[2:]
    padded_entropy_hash_bits = entropy_hash_bits.zfill(256)
    checksum = padded_entropy_hash_bits[:checksum_bits_length]

    # Take 11-bit slices of padded_entropy_bits + checksum
    # Interpret each slice as an int, pluck the word from wordlist at this index
    sequence = padded_entropy_bits + checksum
    result = []
    for i in range(sequence_length):
        index = int(sequence[i * 11:(i + 1) * 11], 2)
        result.append(wordlist[index])
    return " ".join(result)

In [88]:
entropy = generate_entropy(32)
entropy

b'\x93Zi\xc6s40\x19$\x1b\x06\xdfQ\x97\r\x15\xe8\x00\x95A\xbb\x8dNQ\xb3\xc9\x86\xfe\x1b\x8d\xdb!'

In [110]:
mnemonic = mnemonic_from_entropy(entropy)
mnemonic

'nation spy immense toy drink arrive motor general tent million select cloth length enjoy alone immune exact mirror venue assume ticket shoulder hold dance'
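
As a cross-check on the bit-string approach above, the same 11-bit indices can be computed with integer arithmetic and without downloading the wordlist. This is a minimal sketch rather than part of the original notebook: `word_indices` is a hypothetical helper name, and it assumes the entropy length is a multiple of 4 bytes between 16 and 32, as BIP 39 requires.

```python
import hashlib

def word_indices(entropy_bytes):
    """Return the 11-bit wordlist indices for the given entropy (hypothetical helper)."""
    entropy_bits_length = len(entropy_bytes) * 8
    checksum_bits_length = entropy_bits_length // 32

    # The checksum is the top checksum_bits_length bits of the first SHA-256 byte
    # (at most 8 bits are needed, for 256 bits of entropy)
    first_hash_byte = hashlib.sha256(entropy_bytes).digest()[0]
    checksum = first_hash_byte >> (8 - checksum_bits_length)

    # Append the checksum bits to the entropy, viewed as one big integer
    combined = (int.from_bytes(entropy_bytes, "big") << checksum_bits_length) | checksum
    total_bits = entropy_bits_length + checksum_bits_length

    # Read 11 bits at a time, most significant group first
    word_count = total_bits // 11
    return [(combined >> (11 * (word_count - 1 - i))) & 0x7FF for i in range(word_count)]

# 16 bytes (128 bits) give 12 indices; 32 bytes (256 bits) give 24
indices = word_indices(bytes(16))
print(len(indices), indices)
```

For all-zero 128-bit entropy this yields eleven 0s followed by a 3, which matches the BIP 39 test vector "abandon" (eleven times) followed by "about".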

# Produce Seed From Mnemonic

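BIP 39 uses the PBKDF2 key-stretching algorithm to turn the mnemonic sentence (which encodes 128 to 256 bits of entropy) and an optional passphrase into a 512-bit "seed", which is then used to create BIP 32 HD wallets. The mnemonic is the PBKDF2 password, the salt is the string "mnemonic" followed by the passphrase, and the pseudorandom function is HMAC-SHA512 with 2048 iterations.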
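
The derivation described above can also be sketched with only the standard library, since `hashlib.pbkdf2_hmac` implements PBKDF2 directly. The function name `seed_from_mnemonic_stdlib` is a hypothetical name chosen for illustration, and the sketch assumes the BIP 39 parameters: HMAC-SHA512, 2048 iterations, a salt of "mnemonic" plus the passphrase, and NFKD-normalized UTF-8 inputs.

```python
import hashlib
import unicodedata

def seed_from_mnemonic_stdlib(mnemonic, passphrase=""):
    # BIP 39 normalizes both inputs to NFKD before encoding them as UTF-8
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # 2048 rounds of HMAC-SHA512, 64 bytes (512 bits) of output
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)

# Example mnemonic for illustration only; never use a published mnemonic for real funds
example = "abandon " * 11 + "about"
seed = seed_from_mnemonic_stdlib(example, passphrase="TREZOR")
print(len(seed), seed.hex())
```

Changing the passphrase changes the salt, so the same mnemonic yields an unrelated seed for every passphrase.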

[Here's a nice video about PBKDF2](https://www.youtube.com/watch?v=yelMxr7UErk)
* Applications such as gpg, pgp, and openssh use key stretching of this kind.
* bcrypt and scrypt are different algorithms that serve the same purpose as PBKDF2.

Why PBKDF2?

["PBKDF2 with 2048 rounds makes bruteforcing it 2049x harder"](https://www.reddit.com/r/Bitcoin/comments/2cm3zu/does_anyone_know_why_trezor_used_pbkdf2_in_the/cjgszrv/). In addition, the [recovery process](https://www.reddit.com/r/Bitcoin/comments/2cm3zu/does_anyone_know_why_trezor_used_pbkdf2_in_the/cjgtvpb/) leaks the user's 24 words in random order, so brute forcing the correct order would take up to 24! = 620448401733239439360000 attempts.
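
The arithmetic behind that figure is easy to reproduce. The sketch below is a rough upper bound, not a security analysis: it assumes an attacker who knows all 24 words but not their order and who runs the full 2048-round PBKDF2 for every candidate ordering.

```python
import math

ROUNDS = 2048  # PBKDF2 iterations per candidate ordering, per BIP 39

# Number of ways to order 24 distinct words
orderings = math.factorial(24)
# HMAC-SHA512 rounds needed if every ordering is pushed through PBKDF2
total_hmac_rounds = orderings * ROUNDS

print(f"orderings of 24 words: {orderings}")
print(f"HMAC-SHA512 rounds to try them all: {total_hmac_rounds:.3e}")
```

In practice the mnemonic checksum lets an attacker discard most orderings with a single SHA-256 before running PBKDF2, so the real cost is lower than this bound.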


In [111]:
def normalize_string(txt):
    if isinstance(txt, bytes):
        utxt = txt.decode('utf8')
    elif isinstance(txt, str):
        utxt = txt
    else:
        raise TypeError("String value expected")
        
    return unicodedata.normalize('NFKD', utxt)

def seed_from_mnemonic(mnemonic, passphrase=''):
    mnemonic = normalize_string(mnemonic)
    passphrase = normalize_string(passphrase)
    return pbkdf2.PBKDF2(
        mnemonic, u'mnemonic' + passphrase, 
        iterations=2048, 
        macmodule=hmac, 
        digestmodule=hashlib.sha512
    ).read(64)


In [112]:
seed = seed_from_mnemonic(mnemonic)
seed

b'~\x15@\xbdjE\xcbE\xa0\xdctu1:JS \x17w\x11\xd6i\xe3\xa0\x91\x94\xef]^\xd8\xff\xbd\n\x04\xdf;\x81@\xf6<\x0e\r?\x9b\x1f[\xa1\xb3\xcd\xc1\x92\xcf\xc1\x14\xbf\xba\x90\x0fa\xbe\x11\xb5\x88\xcd'
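
`normalize_string` applies NFKD normalization because PBKDF2 operates on bytes, and visually identical Unicode strings can encode to different bytes. A small illustration, using an example string that is not from the original notebook:

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as one code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)

# The strings render identically but encode to different bytes
print(composed.encode("utf-8"), decomposed.encode("utf-8"))

# After NFKD normalization both become the same sequence of code points
same = unicodedata.normalize("NFKD", composed) == unicodedata.normalize("NFKD", decomposed)
print(same)
```

Without this step, the same passphrase typed on two different keyboards or operating systems could produce two different seeds.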