# Introduction
This [Jupyter](https://jupyter.org/) notebook can be run using [colab.research.google.com](https://colab.research.google.com). See [here](https://colab.research.google.com/notebooks/intro.ipynb) for an intro. Alternatively, it can be downloaded and run locally with [Anaconda](https://docs.anaconda.com/anaconda/navigator/). Jupyter and Anaconda should be installed in all AUT engineering and computer science labs. I recommend using the Google web interface.

The benefit of using Jupyter is that code snippets can be run live (Python is running in the background).

The version on GitHub is static; markdown is rendered but code cannot be executed. All code can be copied and pasted into your favourite text editor or IDE and *should* run with Python 3.x ;)

You are encouraged to use any programming language you feel comfortable with; this is simply an example using Python (and Jupyter is designed for Python demonstrations).

---

# Tutorial: Create a block from scratch using Python

Blocks can contain anything digital. This 'data' is what is written to the blockchain when a new block is added. This data could be prices of stocks, Reddit posts, digital signatures, images (not a good idea), or, getting creative, even data from another blockchain.

To be useful, a block will contain several fields, such as:<br>
 - identifier
 - timestamp
 - previous hash (to make the chain)
 - merkle root
 - list of transactions (the data of interest)
 
These can be stored in a Python [dictionary](https://docs.python.org/3/tutorial/datastructures.html?#dictionaries), which is a key-value structure. Think of the key as an index.

`dict = {key_1:value_1,
         key_2:value_2,
        .
        .
        .
        key_n:value_n   
}`
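
As a minimal sketch of the pattern above, here is a small dictionary being created, read, and extended. The name `example` and its keys and values are made up for illustration and are not used elsewhere in this tutorial.

```python
# a small dictionary; the keys and values are illustrative only
example = {
    'identifier': 42,
    'timestamp': 0,
    'transactions': []
}

# look up a value by its key
print(example['identifier'])

# add a new key-value pair, then list all of the keys
example['merkleRoot'] = ''
print(list(example.keys()))
```

It is usually best to avoid naming a variable `dict`, since that would hide Python's built-in type of the same name.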

## Initialize a new block. This one will be the _genesis_ block
Press ```shift+enter``` to run the individual code cell. Or mouse over and click the play button. See the ```Runtime``` menu for all options.

You may have to wait for the environment to initialize if this is the first time. There is a status bar above.

The output will appear directly below the code block.

In [None]:
# initialize a block. Note 'transactions' is initialized as an empty list
block = {
    'height':1,
    'time':0,
    'prevHash':'this is the genesis block',
    'merkleRoot': '',
    'transactions': []
        }
print(block)

## Create a transaction and hash it

Let's create a transaction to store in our blockchain. Remember a transaction is just data; this can be anything represented as a digital object. 

In [None]:
# create a transaction, in this case a string
transaction='Pay $1,000,000 to Jeff'
print(transaction)

To store the transaction object, we will hash it to create a unique identifier of the information. First we will need access to Python's hashing library. Then we will hash the 'transaction' object we created above. In the third line we will output the new hashed value.

The [hash library](https://docs.python.org/3/library/hashlib.html?#module-hashlib) provides many standard hash functions. ```SHA-256```, the Secure Hash Algorithm with a 256-bit output, is particularly famous in cryptocurrency.


In [None]:
# the hash library has many built-in hash functions such as SHA-1 and MD5
import hashlib as hash
hashed_tx = hash.sha1(transaction)
print(hashed_tx)


```p
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-8b54d0eb476f> in <module>()
      1 # the hash library has many built-in hash functions such as SHA-1 and MD5
      2 import hashlib as hash
----> 3 hashed_tx = hash.sha1(transaction)
      4 print(hashed_tx)

TypeError: Unicode-objects must be encoded before hashing
```

Here we have an **error in line three**; note the green arrow ```----> 3```.

The error message is telling us that we cannot hash a string object such as 'Pay $1,000,000 to Jeff'. (Why not?)

First it must be encoded.

In [None]:
encoded_tx = transaction.encode()
print(encoded_tx)

Note the output begins with a ```b'```, which tells us that the string is now a bytes object. We can successfully hash a bytes object.

In [None]:
hashed_tx = hash.sha1(encoded_tx)
print(hashed_tx)

This shows a ```SHA-1``` hash object at the specified memory address. Unfortunately this isn't human-readable and we can't copy and paste it for verification elsewhere. Note that your address will differ from your neighbour's, which will differ from mine: ```0x7fed...```

The `digest()` and `hexdigest()` methods will output byte objects and hex strings respectively. Which type of object would you rather work with?

In [None]:
print(hashed_tx.digest())
print(hashed_tx.hexdigest())
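
To make the relationship between the two outputs concrete, the sketch below shows that the hex string is simply the raw bytes written in hexadecimal. The variable names `h`, `raw`, and `text` are chosen here for illustration.

```python
import hashlib

h = hashlib.sha1('Pay $1,000,000 to Jeff'.encode())

raw = h.digest()        # bytes object: 20 bytes for SHA-1
text = h.hexdigest()    # str: 40 hexadecimal characters

print(len(raw), len(text))

# the hex string is the raw bytes written in hexadecimal
print(raw.hex() == text)

# and it can be converted back into the same bytes
print(bytes.fromhex(text) == raw)
```

The hex string is easier to print, compare by eye, and paste elsewhere, which is why the rest of this tutorial works with `hexdigest()`.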

## Add the transaction to the block

We now have a hashed object. The digest acts as a fingerprint of the data: in practice, only the specific string ```Pay $1,000,000 to Jeff``` should produce the hex output ```bdda60de962b6bec1b7f05d48ce38fdb25bff21d```. A good hash function should also have no practical **collisions**, meaning that if you create a different transaction and hash it, you should not get the same output. (Collisions have since been demonstrated for SHA-1, which is one reason real blockchains use stronger functions such as SHA-256.) The last thing to note is that the output is a **fixed length**: even if the data were very long (say, this whole notebook as a text file), the output would still be 160 bits (40 hex digits).

Let's store this transaction and add it to the ```block``` we created above.
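
The fixed-length property can be checked directly. The following sketch hashes a one-character input and a 100,000-character input; the inputs and variable names are illustrative assumptions and are not part of the tutorial's chain.

```python
import hashlib

short_data = 'a'.encode()
long_data = ('a' * 100_000).encode()

short_hex = hashlib.sha1(short_data).hexdigest()
long_hex = hashlib.sha1(long_data).hexdigest()

# both digests are 40 hex digits, regardless of the input size
print(len(short_hex), len(long_hex))

# hashing the same input again gives the same digest
again = hashlib.sha1('a'.encode()).hexdigest()
print(short_hex == again)

# different inputs give different digests
print(short_hex == long_hex)
```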

In [None]:
hex_tx = hashed_tx.hexdigest()
block["transactions"].append(hex_tx)
print(block)

```transactions``` is a list ```[]``` and we can see the tx output inside it.

## Create a new block and append it to the chain

This block only has a single transaction (perhaps it's the block reward to Jeff ;) Now we will create a new block and append it to the chain. The block is created in the same manner, except we must make a few updates:

1.   the blockheight is now incremented by 1
2.   the time is incremented by 1
3.   the prevHash field is set to the hash of the genesis block. This ensures the state of the blockchain is preserved moving forward.

In [None]:
# some attributes have been hard-coded for simplicity
block2 = {
    'height':2,
    'time':1,
    'prevHash':'null',
    'merkleRoot': 'null',
    'transactions': []
        }
# create a transaction and add it to the block
tx = hash.sha1('Alice +10'.encode()).hexdigest()
block2["transactions"].append(tx)
block2["merkleRoot"] = tx
print(block2)

Note there was only 1 transaction and so this became the ```merkleRoot```. A proper [Merkle root](https://en.wikipedia.org/wiki/Merkle_tree) represents the root of a pairwise transaction tree where every non-leaf node holds the hash of the two child nodes.
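
To illustrate the pairwise idea, here is a simplified sketch of a Merkle root over a list of hex-string hashes. The function name `merkle_root`, the use of SHA-1 on concatenated hex strings, and the rule of duplicating the last hash on an odd-sized level are assumptions made for this example. Bitcoin's real scheme differs in detail (for instance, it uses double SHA-256 on raw bytes).

```python
import hashlib

def merkle_root(tx_hashes):
    """Return a simplified Merkle root for a list of hex-string hashes."""
    if not tx_hashes:
        return ''
    level = list(tx_hashes)
    while len(level) > 1:
        # assumption: duplicate the last hash when a level has an odd count
        if len(level) % 2 == 1:
            level.append(level[-1])
        # hash each adjacent pair to build the next level up
        level = [
            hashlib.sha1((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

# three illustrative transactions, hashed as in the cells above
txs = [hashlib.sha1(s.encode()).hexdigest()
       for s in ['Alice +10', 'Bob -10', 'Carol +5']]
print(merkle_root(txs))

# with a single transaction, the root is just that transaction's hash
print(merkle_root(txs[:1]) == txs[0])
```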

The only thing left is to link the blocks. For this we need to hash the entire genesis block object. Proceeding as before:

In [None]:
hash_block_1 = hash.sha1(block.encode())

Another **error**! This one is Python-specific: a dictionary has no `encode()` method, so we first need to convert the block (dictionary) into a bytes object. To do this we can use Python's built-in pickle module; you may know this process as serialization. Once pickled, the block can be hashed and stored as a hex digest.

In [None]:
import pickle
# convert to a byte object
byte_genesis = pickle.dumps(block)
print(byte_genesis)

# compress to a human-readable SHA-1 digest
hash_genesis = hash.sha1(byte_genesis).hexdigest()
print('\n')
print(hash_genesis)

Earlier we said hashing has the benefit of being fixed length. Here you can see the ```byte_genesis``` output is much longer than our previous byte outputs. 
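
One caveat: the exact bytes that pickle produces can differ between Python versions, because the default pickle protocol has changed over time, so the same block may hash differently on another machine. A common alternative, sketched below as an assumption and not used in the rest of this tutorial, is to serialize with JSON using sorted keys. The helper name `hash_block` and the variables `genesis` and `reordered` are hypothetical.

```python
import hashlib
import json

def hash_block(block):
    """Hash a block dictionary via a sorted-key JSON encoding."""
    # sort_keys=True makes the text independent of key insertion order
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha1(encoded).hexdigest()

genesis = {
    'height': 1,
    'time': 0,
    'prevHash': 'this is the genesis block',
    'merkleRoot': '',
    'transactions': []
}

# same content as above, but with the keys in a different order
reordered = {
    'transactions': [],
    'merkleRoot': '',
    'prevHash': 'this is the genesis block',
    'time': 0,
    'height': 1
}

print(hash_block(genesis) == hash_block(reordered))
```

This only works when every value in the block can be represented in JSON (strings, numbers, lists, dictionaries), which is the case for the blocks built here.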

Set the ```prevHash``` pointer in ```block2``` to the hash of the genesis block.

In [None]:
# set the prevHash and print the block
block2["prevHash"] = hash_genesis
for key, value in block2.items():
    print(key+': '+str(value)) 

### That's the main concept of creating a blockchain!

As noted above, the consensus mechanism is a whole other part, but this (hopefully) shows that coding a blockchain is not that intense.

# Modify a transaction to attack the chain

A hash function is deterministic, but its output appears random, with no discernible pattern relating it to the original data. Let's test this by modifying the transaction in the genesis block, rehashing, and comparing the result to the prevHash pointer in block2.

Changing a single transaction modifies the Merkle root, which modifies the block hash and invalidates the chain from that block onward.


In [None]:
# change the dollar sign to a negative sign in the original transaction
new_transaction = 'Pay -1,000,000 to Jeff'
hashed_new_tx=hash.sha1(new_transaction.encode()).hexdigest()
# update the block with the new tx; recall 'block' is the original or genesis block; our tx was at position 0
block["transactions"][0]=hashed_new_tx

# hash the updated block
import pickle
byte_genesis_new = pickle.dumps(block)
hash_genesis = hash.sha1(byte_genesis_new).hexdigest()

# compare hashes
if block2["prevHash"] != hash_genesis:
    print('Your chain has been attacked!!')
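
The same comparison can be generalized to a chain of any length. The sketch below is self-contained and rebuilds a small two-block chain of its own. The helper names `block_hash` and `is_chain_valid`, and the placeholder transaction strings, are assumptions made for illustration.

```python
import hashlib
import pickle

def block_hash(block):
    """Serialize a block with pickle and return its SHA-1 hex digest."""
    return hashlib.sha1(pickle.dumps(block)).hexdigest()

def is_chain_valid(chain):
    """Check that each block's prevHash matches the hash of the block before it."""
    for previous, current in zip(chain, chain[1:]):
        if current['prevHash'] != block_hash(previous):
            return False
    return True

# rebuild a two-block chain similar to the one in this tutorial
first = {'height': 1, 'time': 0, 'prevHash': 'this is the genesis block',
         'merkleRoot': '', 'transactions': ['tx-hash-placeholder']}
second = {'height': 2, 'time': 1, 'prevHash': block_hash(first),
          'merkleRoot': '', 'transactions': []}
chain = [first, second]

# the chain is intact at this point
print(is_chain_valid(chain))

# tamper with the first block; its hash no longer matches second['prevHash']
first['transactions'][0] = 'tampered'
print(is_chain_valid(chain))
```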

## Summary

In this tutorial we have:<br>
 - created a block structure including a list of transactions (data)
 - hashed the transaction and added it to the block
 - hashed the entire block
 - added a new block 
 - linked the two blocks with a previous hash field to create a block chain
 
What we have __not__ done is:<br>
 - used a real timestamp
 - used a Merkle tree to store the transactions
 - stored a true Merkle root in our block structure
 
Python libraries that this code depends on:
 - [hashlib](https://docs.python.org/3/library/hashlib.html)
 - [pickle](https://docs.python.org/3/library/pickle.html)

Other resources:
- [python 3 docs](https://docs.python.org/3/)
- [Google colab faq](https://research.google.com/colaboratory/faq.html)

## Exercises

1. Create your blocks using a real timestamp. (Is there a difference between this and an indexed method?)

2. Create a Merkle root of the transactions from Bitcoin block [641818](https://blockchair.com/bitcoin/block/641818) mined on August 2nd, 2020. A \*.csv file of the 1870 transactions can be downloaded from this [repo](https://github.com/millecodex/COMP726/blob/master/bitcoin_block_641818.csv). In GitHub click **raw** to see unformatted text. Just use the ```hash``` column representing the tx hash; the other data is not stored in the tree.
