## Understanding Algorand Addresses
#### 03.2 Winter School on Smart Contracts
##### Peter Gruber (peter.gruber@usi.ch)
2022-01-05

* Understand how Algorand addresses are encoded and related
* Understand how the mnemonic passphrase works

In [1]:
import base64
import algosdk

## Algorand addresses
A private key consists of 32 bytes = 256 bits. It is based on the Ed25519 elliptic-curve signature scheme. How many different addresses are there?<br>
$2^{256} \approx 1.16 \times 10^{77}$<br>
Compare this to the number of atoms in the observable universe, which is estimated to be between $10^{70}$ and $10^{80}$.

In [2]:
2**256                      # very hard to read number

115792089237316195423570985008687907853269984665640564039457584007913129639936

In [3]:
f"{2**256:1.3E}"

'1.158E+77'

#### Cracking Algorand
If you can guess 1 million keys per second, how long would it take to guess them all?

In [4]:
guess = 1E6
print( f"{2**256/guess :1.3E} seconds" )
print( f"{2**256/guess/60/60/24 :1.3E} days" )
print( f"{2**256/guess/60/60/24/365 :1.3E} years" )

1.158E+71 seconds
1.340E+66 days
3.672E+63 years


**EXERCISE:** How long would it take if our computer were 1000 times faster? And if we could use all 2 billion computers in the world?

## Encoding & decoding
* A byte can represent $2^8 = 256$ different possibilities, but there are only 10 digits and 26 letters
* Encodings make a compromise between efficiency and readability
* Below are some general examples of decoding and encoding strings and numbers in Python

#### (1) Hex encoding
+ Using numbers `0`-`9` plus letters `a`-`f` $\rightarrow$ 16 symbols
+ One symbol has $16=2^4$ possibilities $\rightarrow$ 4 bits
+ Two symbols = 16 x 16 = 256 possibilities = 1 byte


In [5]:
numbers = [1, 10, 15, 16, 32]
print(numbers)
#print(hex(numbers))
for number in numbers: print(hex(number), end=" ")

[1, 10, 15, 16, 32]
0x1 0xa 0xf 0x10 0x20 

**EXERCISE:** What is the hexadecimal representation of your year of birth?

In [6]:
text = 'Hello Algorand!'.encode('utf-8')
print(len(text))
print(text.hex())
print(len(text.hex()))

15
48656c6c6f20416c676f72616e6421
30
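
The reverse direction can be sketched in the same way. The example below is a minimal illustration using the Python built-ins `bytes.fromhex()` and `int(..., 16)`; the variable names `hex_string` and `recovered` are only illustrative.

```python
# Round trip: text -> bytes -> hex string -> bytes -> text
text = 'Hello Algorand!'.encode('utf-8')
hex_string = text.hex()                 # two hex symbols per byte
recovered = bytes.fromhex(hex_string)   # back to the original bytes
print(recovered.decode('utf-8'))

# A single hex number back to an integer
print(int('0x20', 16))
```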


#### (2) Base 32 encoding
+ Using 32 symbols, e.g. numbers `0`-`9` and letters `A`-`V`
+ Several alphabets exist; Python's `base64.b32encode` follows RFC 4648 and uses letters `A`-`Z` and digits `2`-`7`
+ One symbol has $32 = 2^5$ possibilities $\rightarrow$ 5 bits

In [7]:
print( base64.b32encode(text) )
print( len(base64.b32encode(text)) )

b'JBSWY3DPEBAWYZ3POJQW4ZBB'
24


In [8]:
# Quick check: does it work in both directions?
base64.b32decode('JBSWY3DPEBAWYZ3POJQW4ZBB')

b'Hello Algorand!'
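
To see where the 5 bits per symbol come from, the encoding can be reproduced by hand. The sketch below assumes the RFC 4648 alphabet (`A`-`Z`, `2`-`7`) used by Python's `base64` module; the helper `b32_by_hand()` is a hypothetical name and, to keep things short, only handles inputs whose length is a multiple of 5 bytes, so that no padding is needed.

```python
import base64

# RFC 4648 base32 alphabet, as used by Python's base64.b32encode
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def b32_by_hand(data):
    """Hypothetical helper: base32-encode bytes whose length is a multiple of 5."""
    assert len(data) % 5 == 0, "this sketch does not handle padding"
    # Write all bytes as one long string of bits ...
    bits = "".join(f"{byte:08b}" for byte in data)
    # ... cut it into groups of 5 bits ...
    groups = [bits[i:i + 5] for i in range(0, len(bits), 5)]
    # ... and look up each group in the alphabet
    return "".join(ALPHABET[int(group, 2)] for group in groups)

text = 'Hello Algorand!'.encode('utf-8')   # 15 bytes = 120 bits = 24 groups
by_hand = b32_by_hand(text)
print(by_hand)
print(by_hand == base64.b32encode(text).decode())
```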

#### (3) Base 64 encoding
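+ Using 64 symbols, e.g. numbers `0`-`9`, letters `A`-`Z` and `a`-`z`, and the special characters `+` and `/`
+ Several variants exist (e.g. a URL-safe alphabet with `-` and `_`); Python's `base64.b64encode` uses the standard RFC 4648 alphabet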
+ One symbol has $64 = 2^6$ possibilities $\rightarrow$ 6 bits

In [9]:
print( base64.b64encode(text) )
print( len(base64.b64encode(text)) )

b'SGVsbG8gQWxnb3JhbmQh'
20


#### Comparison

In [10]:
# Calculation of bits
print(15*8)          # Normal text
print(30*4)          # Hex encoding
print(24*5)          # Base 32 encoding
print(20*6)          # Base 64 encoding

120
120
120
120
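
All four counts agree only because 15 bytes = 120 bits is divisible by 4, 5 and 6. For other lengths, base32 and base64 fill the last group with `=` padding characters. The following sketch compares encoded lengths for a few byte counts; the all-zero inputs are placeholders and the byte counts are chosen for illustration.

```python
import base64

# Byte counts chosen for illustration; all-zero bytes are placeholders
for n_bytes in (15, 32, 36, 64):
    data = bytes(n_bytes)
    b32 = base64.b32encode(data).decode()
    b64 = base64.b64encode(data).decode()
    print(f"{n_bytes:2d} bytes: hex {len(data.hex()):3d} | "
          f"base32 {len(b32):3d} ({b32.count('=')} padding) | "
          f"base64 {len(b64):3d} ({b64.count('=')} padding)")
```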


### Algosdk and address encoding

In [11]:
# Account info generation.
private_key, public_key = algosdk.account.generate_account()

In [12]:
print( private_key )
print( public_key )
print( len(private_key) )
print( len(public_key) )

diAeognsn5IV2FTA9cXslqfO4p0pIdlmF6Ca0/rZglujsM+Q5xLF9UurnkSiPZGQwDJMCGYw6G29bn6bBil/ZA==
UOYM7EHHCLC7KS5LTZCKEPMRSDADETAIMYYOQ3N5NZ7JWBRJP5SMFADTGE
88
58


#### Default encoding of Algorand keys
+ Private key is base64 encoded; its 64 bytes contain the 32-byte secret key followed by the 32-byte public key
+ Public key is base32 encoded and contains a 4-byte checksum

In [13]:
len(private_key) * 6 / 2    # 6 bits per symbol, two keys encoded (count includes base64 padding)

264.0

In [14]:
len(public_key) * 5  - 32   # 5 bits per symbol, minus 4 bytes = 32 bits of checksum (2 bits are unused)

258
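
Counting symbols slightly overestimates the key sizes. Decoding gives the exact byte counts. The sketch below uses only the standard library and the example private key printed above; the names `example_private_key`, `secret_part` and `public_part` are illustrative, and the 32/32 split is an assumption based on the statement that the private key contains the public key.

```python
import base64

# Private key copied from the output above (a throwaway example key)
example_private_key = 'diAeognsn5IV2FTA9cXslqfO4p0pIdlmF6Ca0/rZglujsM+Q5xLF9UurnkSiPZGQwDJMCGYw6G29bn6bBil/ZA=='

raw = base64.b64decode(example_private_key)
print(len(raw))                 # number of bytes in the decoded private key

# Assumed layout: first half secret, second half public key
secret_part = raw[:32]
public_part = raw[32:]
print(public_part.hex())        # compare with the hex output in the next section
```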

#### Inside the public key

In [15]:
decoded_addr = algosdk.encoding.decode_address(public_key)

# Show the 32 bytes of the address
for my_byte in decoded_addr:
    print(f'{my_byte:d}', end=' ')
print('\n')

# Show the 256 bits of the address
for my_byte in decoded_addr:
    print(f'{my_byte:>08b}', end=' ')
print('\n')

# alternative hex representation
print(decoded_addr.hex())

163 176 207 144 231 18 197 245 75 171 158 68 162 61 145 144 192 50 76 8 102 48 232 109 189 110 126 155 6 41 127 100 

10100011 10110000 11001111 10010000 11100111 00010010 11000101 11110101 01001011 10101011 10011110 01000100 10100010 00111101 10010001 10010000 11000000 00110010 01001100 00001000 01100110 00110000 11101000 01101101 10111101 01101110 01111110 10011011 00000110 00101001 01111111 01100100 

a3b0cf90e712c5f54bab9e44a23d9190c0324c086630e86dbd6e7e9b06297f64


In [16]:
# Encode the decoded address back to base32 (without the checksum)
address = base64.b32encode(decoded_addr).decode()
print(address)
print(len(address))

UOYM7EHHCLC7KS5LTZCKEPMRSDADETAIMYYOQ3N5NZ7JWBRJP5SA====
56
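
The re-encoded string differs from the original address only in its ending: the padding `====` takes the place of the checksum. As a rough sketch, the original address can be split into key bytes and checksum with the standard library alone. The address is the example from above; `example_address`, `key_bytes` and `checksum` are illustrative names. The checksum itself (documented by Algorand as the last four bytes of a SHA-512/256 hash of the public key) is only extracted here, not recomputed.

```python
import base64

# Address copied from the output above
example_address = 'UOYM7EHHCLC7KS5LTZCKEPMRSDADETAIMYYOQ3N5NZ7JWBRJP5SMFADTGE'

# Algorand addresses omit the base32 padding, so restore it before decoding
padding = '=' * (-len(example_address) % 8)
raw = base64.b32decode(example_address + padding)

# 32 bytes of public key followed by the checksum bytes
key_bytes, checksum = raw[:32], raw[32:]
print(len(raw), len(key_bytes), len(checksum))
print(key_bytes.hex())
print(checksum.hex())

# Re-encoding only the 32 key bytes reproduces the padded string from above
print(base64.b32encode(key_bytes).decode())
```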


### The Mnemonic key or passphrase
Consists of 25 words, each chosen from a list of $2^{11} = 2048$ words. The encoding is simple: we choose the *n-th* word in the **BIP-0039 English word list**.<br>
Each word encodes 11 bits, so the total is

In [17]:
25 * 11

275

24 words would be enough (24 x 11 = 264 bits), but again there is a checksum: a 2-byte hash of the private key

In [18]:
25 * 11 - 16

259
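
To illustrate why 24 words suffice for the key itself, the sketch below cuts 32 bytes into 11-bit numbers, each of which could serve as an index into a 2048-word list. This is an assumption-laden illustration: the helper `to_11_bit_indices()` is hypothetical, the key is a placeholder, and the bit order used by the Algorand SDK may differ, so the indices will generally not match a real passphrase.

```python
import math

def to_11_bit_indices(data):
    """Hypothetical helper: split bytes into 11-bit numbers (zero-padded at the end)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    n_groups = math.ceil(len(bits) / 11)
    bits = bits.ljust(n_groups * 11, "0")     # pad the last group with zeros
    return [int(bits[i:i + 11], 2) for i in range(0, len(bits), 11)]

placeholder_key = bytes(range(32))            # NOT a real secret, just 32 bytes
indices = to_11_bit_indices(placeholder_key)

print(len(indices))                           # number of words needed for the key
print(all(0 <= i < 2048 for i in indices))    # every index fits the word list
```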

#### All 2048 words from the BIP 39 English wordlist
* BIP stands for Bitcoin Improvement Proposal.
* BIP 39 was proposed in 2013
* BIP 39 has its critics, because there are many variants of the standard, see https://electrum.readthedocs.io/en/latest/seedphrase.html#motivation

**Hint** Double-click into the blue bar to hide the output.

In [19]:
print(algosdk.wordlist.word_list_raw())

abandon
ability
able
about
above
absent
absorb
abstract
absurd
abuse
access
accident
account
accuse
achieve
acid
acoustic
acquire
across
act
action
actor
actress
actual
adapt
add
addict
address
adjust
admit
adult
advance
advice
aerobic
affair
afford
afraid
again
age
agent
agree
ahead
aim
air
airport
aisle
alarm
album
alcohol
alert
alien
all
alley
allow
almost
alone
alpha
already
also
alter
always
amateur
amazing
among
amount
amused
analyst
anchor
ancient
anger
angle
angry
animal
ankle
announce
annual
another
answer
antenna
antique
anxiety
any
apart
apology
appear
apple
approve
april
arch
arctic
area
arena
argue
arm
armed
armor
army
around
arrange
arrest
arrive
arrow
art
artefact
artist
artwork
ask
aspect
assault
asset
assist
assume
asthma
athlete
atom
attack
attend
attitude
attract
auction
audit
august
aunt
author
auto
autumn
average
avocado
avoid
awake
aware
away
awesome
awful
awkward
axis
baby
bachelor
bacon
badge
bag
balance
balcony
ball
bamboo
banana
banner
bar
barely
bargain
barre

Mnemonics require only the first 4 letters of each word

In [20]:
passphrase = algosdk.mnemonic.from_private_key(private_key)
print(passphrase)

attract joy speed scene divert gorilla scheme feel retreat blind island just outer judge erupt cancel swamp frown exotic defy volcano reopen swing able pole


In [21]:
# Same as above
algosdk.mnemonic.to_private_key(passphrase)

'diAeognsn5IV2FTA9cXslqfO4p0pIdlmF6Ca0/rZglujsM+Q5xLF9UurnkSiPZGQwDJMCGYw6G29bn6bBil/ZA=='

### Only the first four letters matter

In [22]:
# A quick function that only keeps the first 4 letters per word (see below for details)
def four_letters(string):
    words = string.split(' ')
    short_words=[word[0:4] for word in words]
    return(' '.join(short_words))

In [23]:
# Compare to passphrase above
short_passphrase = four_letters(passphrase)
print(short_passphrase)

attr joy spee scen dive gori sche feel retr blin isla just oute judg erup canc swam frow exot defy volc reop swin able pole


In [24]:
algosdk.mnemonic.to_private_key(short_passphrase)   # same as above!

'diAeognsn5IV2FTA9cXslqfO4p0pIdlmF6Ca0/rZglujsM+Q5xLF9UurnkSiPZGQwDJMCGYw6G29bn6bBil/ZA=='
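
This works because the BIP 39 English list is built so that the first four letters identify each word. The sketch below checks this on a handful of similar-looking words taken from the list printed earlier; `sample_words`, `prefix_to_word` and `expand()` are illustrative names, and the mapping covers only this small sample, not the full list.

```python
# A handful of similar-looking words taken from the BIP 39 list printed above
sample_words = ['act', 'action', 'actor', 'actress', 'actual',
                'arm', 'armed', 'armor', 'army']

# Map each four-letter prefix back to its full word
prefix_to_word = {word[:4]: word for word in sample_words}

# No two words share a prefix, so no entry was overwritten
print(len(prefix_to_word) == len(sample_words))

def expand(short_passphrase):
    """Hypothetical helper: expand abbreviated words using the sample mapping."""
    return ' '.join(prefix_to_word[w] for w in short_passphrase.split(' '))

print(expand('acto army act'))
```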

#### A quick note on passphrases
+ Some software, e.g. the Algorand wallet iOS app, requires the entire words

## Ethereum

Addresses are 42-character hex strings: the prefix `0x` followed by 40 hex digits. They are the last 20 bytes of the Keccak-256 hash of the public key.

In [25]:
eth_add = '0x71C7656EC7ab88b098defB751B7401B5f6d8976F'
# removing 0x
eth_add = '71C7656EC7ab88b098defB751B7401B5f6d8976F'
len(eth_add)

40

In [26]:
# Convert the hex string into its 20 raw bytes
sample_string_bytes = bytes.fromhex(eth_add)
for my_byte in sample_string_bytes:
    print(f'{my_byte:d}', end=' ')
print('\n')

113 199 101 110 199 171 136 176 152 222 251 117 27 116 1 181 246 216 151 111 



In [27]:
# This is what a base32 ETH address would look like
print( base64.b32encode(sample_string_bytes) )
len(base64.b32encode(sample_string_bytes) )

b'OHDWK3WHVOELBGG67N2RW5ABWX3NRF3P'


32

## Appendix: how four_letters() works
A step-by-step construction of the function `four_letters()`

In [28]:
string = 'Welcome to the WSC blockchain school'

In [29]:
string[0:4]                 # first four letters

'Welc'

In [30]:
string.split(' ')           # split at space --> create a list

['Welcome', 'to', 'the', 'WSC', 'blockchain', 'school']

In [31]:
# List comprehension
words = string.split(' ')
[word[0:4] for word in words]

['Welc', 'to', 'the', 'WSC', 'bloc', 'scho']

In [32]:
# Join the list to a string again
words = string.split(' ')
short_words=[word[0:4] for word in words]
print(' '.join(short_words))

Welc to the WSC bloc scho


In [33]:
# package as a function
def four_letters(string):
    words = string.split(' ')
    short_words=[word[0:4] for word in words]
    return(' '.join(short_words))

## Appendix: can you create your own Mnemonic?
* You could choose from 2048 words and tell a story for better memorizing
* However, this does not work in practice, because the 25th word is a checksum that you could not choose correctly by hand
* It is also not advisable, as "hand-picked" values are not very random and easier to crack