# Practical 1: Create a block from scratch using Python

This Jupyter notebook can be downloaded and run locally with Anaconda. Jupyter and Anaconda should be installed in all AUT engineering and computer science labs. The benefit of using Jupyter is that code snippets can be run live (Python is running in the background).

A static version can be found on GitHub at https://github.com/millecodex/COMP842/. All code can be copied and pasted into your favourite text editor or IDE and *might* run with Python 3.x.

You are encouraged to use any programming language you feel comfortable with; this is simply an example using Python (and Jupyter is designed for Python demonstrations). AUT lab computers also have a Java interpreter (and maybe a C++ compiler?) installed.

In [1]:
# hashlib provides many built-in hash functions such as SHA-1 and MD5
# note: the alias `hash` shadows Python's built-in hash() function
import hashlib as hash

Our block of data will contain many fields such as:<br>
 - identifier
 - time
 - previous hash
 - merkle root
 - list of transactions

These can be stored in a Python dictionary, which is a key-value structure:

`dict = {key_1:value_1,
         key_2:value_2,
        .
        .
        .
        key_n:value_n   
}`
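
As a quick illustration of the key-value idea, the sketch below builds a small dictionary, reads a value, then updates and extends it. The name `example` and its keys are made up for this illustration and are not part of the block we build later.

```python
# a small dictionary; the keys and values here are illustrative only
example = {'height': 1, 'time': 0}

# read a value by its key
print(example['height'])

# update an existing key and add a new one
example['time'] = 42
example['note'] = 'hello'

# iterate over key-value pairs (insertion order is preserved in Python 3.7+)
for key, value in example.items():
    print(key, value)
```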

## Initialize a new block. This one will be the _genesis_ block

In [2]:
# initialize a block. Note 'transactions' is initialized as an empty list
block = {
    'height':1,
    'time':0,
    'prevHash':'this is the genesis block',
    'merkleRoot': '',
    'transactions': []
        }
print(block)

{'height': 1, 'time': 0, 'prevHash': 'this is the genesis block', 'merkleRoot': '', 'transactions': []}


## Create a transaction and hash it

Let's create a transaction to store in our blockchain. Remember a transaction is just data; this can be anything represented as a digital object.

In [3]:
# create a transaction (string)
transaction='Pay $1,000,000 to Jeff'
print(transaction)

Pay $1,000,000 to Jeff


To store the transaction object, we will hash it to create a unique identifier of the information.

In [4]:
#hashed_tx = hash.sha1(transaction)
#print(hashed_tx)

```p
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-6b646aa88a56> in <module>
----> 1 hashed_tx = hash.sha1(transaction)
      2 print(hashed_tx)

TypeError: Unicode-objects must be encoded before hashing
```

The error message above is telling us that we cannot hash a string object such as 'Pay $1,000,000 to Jeff'. (Why not?)
First it must be encoded.

In [5]:
encoded_tx = transaction.encode()
print(encoded_tx)

b'Pay $1,000,000 to Jeff'


The `b` prefix tells us that the string is now a bytes object.
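
One way to see why encoding matters: a hash function works on bytes, and the same text can be turned into bytes in more than one way. The sketch below (using a plain `import hashlib` so that it runs on its own) encodes the transaction as UTF-8 and as UTF-16 and hashes both. The variable names are chosen for this illustration.

```python
import hashlib

transaction = 'Pay $1,000,000 to Jeff'

# the same text encoded two different ways gives different bytes
utf8_bytes = transaction.encode('utf-8')    # encode() defaults to UTF-8
utf16_bytes = transaction.encode('utf-16')

print(len(utf8_bytes), len(utf16_bytes))

# different bytes in, different hashes out
utf8_hash = hashlib.sha1(utf8_bytes).hexdigest()
utf16_hash = hashlib.sha1(utf16_bytes).hexdigest()
print(utf8_hash)
print(utf16_hash)
```

Anyone verifying a hash therefore needs to agree on the encoding as well as the text.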

In [6]:
hashed_tx = hash.sha1(encoded_tx)
print(hashed_tx)

<sha1 HASH object @ 0x000002A2D4A73B20>


This shows a SHA-1 hash object at the specified memory address. Unfortunately this isn't human-readable and we can't copy and paste it for verification elsewhere.

The `digest()` and `hexdigest()` methods return a bytes object and a hex string, respectively.

In [7]:
print(hashed_tx.digest())
print(hashed_tx.hexdigest())

b'\xbd\xda`\xde\x96+k\xec\x1b\x7f\x05\xd4\x8c\xe3\x8f\xdb%\xbf\xf2\x1d'
bdda60de962b6bec1b7f05d48ce38fdb25bff21d
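
The two outputs above are the same digest in two representations. A small sketch to check this, written to run on its own (the names `raw` and `text` are chosen for this illustration):

```python
import hashlib

h = hashlib.sha1('Pay $1,000,000 to Jeff'.encode())

raw = h.digest()        # bytes object
text = h.hexdigest()    # str of hexadecimal characters

print(len(raw))     # SHA-1 produces 160 bits, which is 20 bytes
print(len(text))    # two hex characters per byte, so 40 characters

# the two forms carry the same information
print(raw.hex() == text)
print(bytes.fromhex(text) == raw)
```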


## Add the transaction to the block

In [8]:
hex_tx = hashed_tx.hexdigest()
block["transactions"].append(hex_tx)
print(block)

{'height': 1, 'time': 0, 'prevHash': 'this is the genesis block', 'merkleRoot': '', 'transactions': ['bdda60de962b6bec1b7f05d48ce38fdb25bff21d']}


## Create a new block and append it to the chain

This block only has a single transaction (perhaps it's the block reward to Jeff ;) Now we will create a new block and append it to the chain. The block is created in the same manner, except we must update the prevHash field with the hash of the genesis block. This will ensure the state of the blockchain is preserved moving forward.

In [9]:
# some attributes have been hard-coded for simplicity
block2 = {
    'height':2,
    'time':1,
    'prevHash':'null',
    'merkleRoot': 'null',
    'transactions': []
        }
# create a transaction and add it to the block
tx = hash.sha1('Alice +10'.encode()).hexdigest()
block2["transactions"].append(tx)
block2["merkleRoot"] = tx
print(block2)

{'height': 2, 'time': 1, 'prevHash': 'null', 'merkleRoot': '9726fd28f4baeeef320445819ce41b02ca756e19', 'transactions': ['9726fd28f4baeeef320445819ce41b02ca756e19']}
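
Since both blocks share the same fields, the construction could be wrapped in a function. The following is only a sketch: `new_block` is a hypothetical helper, not something defined elsewhere in this notebook, and it assumes a Unix timestamp from `time.time()` in place of the hard-coded `time` values.

```python
import time

def new_block(height, prev_hash):
    """Return a block dictionary with the same fields used above (hypothetical helper)."""
    return {
        'height': height,
        'time': int(time.time()),   # seconds since the Unix epoch
        'prevHash': prev_hash,
        'merkleRoot': '',
        'transactions': []          # a fresh list for every block
    }

example_block = new_block(3, 'null')
print(example_block)
```

Creating the empty list inside the function means each block gets its own `transactions` list rather than sharing one.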


The only thing left is to link the blocks. For this we need to hash the entire genesis block object. Proceeding as before:

In [10]:
hash_block_1 = hash.sha1(block.encode())

AttributeError: 'dict' object has no attribute 'encode'

This error occurs because a Python dictionary has no `encode` method; only strings do. We need to convert the block (dictionary) into a bytes object. To do this we can use the `pickle` module from the standard library. You may know this as serialization. Once pickled, we can hash the result and store it as a hex digest.

In [11]:
import pickle
# convert to a byte object
byte_genesis = pickle.dumps(block)
print(byte_genesis)

# hash to a fixed-length, human-readable SHA-1 hex digest
hash_genesis = hash.sha1(byte_genesis).hexdigest()
print('\n')
print(hash_genesis)

b'\x80\x03}q\x00(X\x06\x00\x00\x00heightq\x01K\x01X\x04\x00\x00\x00timeq\x02K\x00X\x08\x00\x00\x00prevHashq\x03X\x19\x00\x00\x00this is the genesis blockq\x04X\n\x00\x00\x00merkleRootq\x05X\x00\x00\x00\x00q\x06X\x0c\x00\x00\x00transactionsq\x07]q\x08X(\x00\x00\x00bdda60de962b6bec1b7f05d48ce38fdb25bff21dq\tau.'


94c058271f08c4597a8ef04771b81793619b7d6e


The byte_genesis output is much longer than our previous byte outputs. Hashing is advantageous because the output is always a fixed length.
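
The fixed-length property is easy to check directly. The sketch below hashes a one-byte input and a one-million-byte input and compares the digest lengths; the inputs are arbitrary and chosen only for illustration.

```python
import hashlib

short_input = b'a'
long_input = b'a' * 1000000   # one million bytes

short_hash = hashlib.sha1(short_input).hexdigest()
long_hash = hashlib.sha1(long_input).hexdigest()

# input length on the left, hex digest length on the right
print(len(short_input), '->', len(short_hash))
print(len(long_input), '->', len(long_hash))
```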

Set the prevHash pointer in block2 to the hash of the genesis block.

In [12]:
# set the prevHash and print the block
block2["prevHash"] = hash_genesis
for key, value in block2.items():
    print(key+': '+str(value)) 

height: 2
time: 1
prevHash: 94c058271f08c4597a8ef04771b81793619b7d6e
merkleRoot: 9726fd28f4baeeef320445819ce41b02ca756e19
transactions: ['9726fd28f4baeeef320445819ce41b02ca756e19']
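
One caveat: the bytes produced by `pickle.dumps` depend on the pickle protocol, and the default protocol has changed between Python versions, so the genesis hash printed above may differ on your machine. A possible alternative, sketched here under the assumption that every value in a block is JSON-serializable (true for the blocks in this notebook), is to serialize with `json.dumps` and sorted keys. The helpers `hash_block` and `is_linked` are hypothetical names introduced for this example.

```python
import hashlib
import json

def hash_block(block):
    """SHA-1 hex digest of a block serialized as JSON with sorted keys."""
    serialized = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha1(serialized).hexdigest()

def is_linked(parent, child):
    """True if the child's prevHash matches the hash of the parent."""
    return child['prevHash'] == hash_block(parent)

genesis = {'height': 1, 'time': 0, 'prevHash': 'this is the genesis block',
           'merkleRoot': '', 'transactions': []}
child = {'height': 2, 'time': 1, 'prevHash': hash_block(genesis),
         'merkleRoot': '', 'transactions': []}

print(is_linked(genesis, child))

# tampering with the parent breaks the link
genesis['transactions'].append('tampered')
print(is_linked(genesis, child))
```

Sorting the keys means two dictionaries with the same contents hash to the same value regardless of the order in which the keys were added.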


## Modify a transaction to attack the chain

A hash function is deterministic, but its output appears random, with no discernible pattern relating it to the original data. Let's test this by modifying the transaction in the genesis block, rehashing, and comparing the result to the prevHash pointer in block2.

In [13]:
# changing a single transaction modifies the block hash and will invalidate the entire chain
#
# change the dollar sign to a negative sign in the original transaction
new_transaction = 'Pay -1,000,000 to Jeff'
hashed_new_tx=hash.sha1(new_transaction.encode()).hexdigest()
# update the block with the new tx
block["transactions"][0]=hashed_new_tx

# hash the updated block (pickle was already imported above)
byte_genesis_new = pickle.dumps(block)
hash_genesis_new = hash.sha1(byte_genesis_new).hexdigest()

# compare the new hash with the pointer stored in block2
if block2["prevHash"] != hash_genesis_new:
    print('Your chain has been attacked!!')

Your chain has been attacked!!
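
To get a feel for how different the two transaction hashes are, we can count the bits in which they disagree. This is a rough sketch that runs on its own; `sha1_int` is a hypothetical helper. For a well-behaved hash function, roughly half of the output bits are expected to change when the input changes, even by one character.

```python
import hashlib

def sha1_int(text):
    """SHA-1 digest of a string, returned as an integer."""
    return int(hashlib.sha1(text.encode()).hexdigest(), 16)

original = sha1_int('Pay $1,000,000 to Jeff')
modified = sha1_int('Pay -1,000,000 to Jeff')

# XOR leaves a 1 wherever the two digests disagree
differing_bits = bin(original ^ modified).count('1')
print(differing_bits, 'of 160 bits differ')
```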


## Summary

In this tutorial we have:<br>
 - created a block structure including a list of transactions (data)
 - hashed the transaction and added it to the block
 - hashed the entire block
 - added a new block 
 - linked the two blocks with a previous hash field to create a block chain
 
What we have __not__ done is:<br>
 - use a merkle tree to store the transactions
 - store the merkle root in our block structure
 
Python libraries that this code depends on:
 - __[hashlib](https://docs.python.org/3/library/hashlib.html)__
 - __[pickle](https://docs.python.org/3/library/pickle.html)__

## Exercise

Create a merkle root of the transactions from Bitcoin block 566446. A \*.csv file can be downloaded from Blackboard or this __[repo](https://github.com/millecodex/COMP842/blob/master/tx_list_bitcoin_566446.csv)__.