# Questions

# Q1
> Why are hash collisions mathematically inevitable?
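
One way to see this is the pigeonhole principle: a hash maps arbitrarily many inputs to a fixed n-bit output, so once more than 2^n distinct inputs are hashed, two of them must share a digest. A small demonstration, using a hypothetical `tiny_hash` that keeps only the first byte of SHA-256 so there are just 256 possible outputs:

```python
import hashlib
from typing import Dict, Optional, Tuple

def tiny_hash(data: bytes) -> int:
    # Toy 8-bit hash: only the first byte of SHA-256, i.e. 256 possible outputs
    return hashlib.sha256(data).digest()[0]

def first_collision(limit: int = 257) -> Optional[Tuple[bytes, bytes]]:
    seen: Dict[int, bytes] = {}  # maps digest -> first input that produced it
    for i in range(limit):
        msg = f"message {i}".encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    return None

# 257 distinct inputs but only 256 possible outputs, so a collision must appear
print(first_collision())
```

The same argument applies to SHA-256 itself; the output space (2^256) is simply far too large to exhaust in practice.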

# Q2 
> Suppose that Trudy is in a room containing a total of N people (including herself). What is the probability that at least one of the other N - 1 people has the same birthday as Trudy? What is the minimum N such that this probability exceeds 50%?
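
A quick numeric check, assuming 365 equally likely birthdays (no leap years) and independent people: each of the other N - 1 people misses Trudy's birthday with probability 364/365, so P(N) = 1 - (364/365)^(N-1). The helper names below are illustrative.

```python
def p_match_trudy(n: int, days: int = 365) -> float:
    # Probability that at least one of the other n - 1 people shares Trudy's birthday
    return 1 - ((days - 1) / days) ** (n - 1)

def min_n_trudy(target: float = 0.5) -> int:
    # Smallest room size whose probability strictly exceeds the target
    n = 1
    while p_match_trudy(n) <= target:
        n += 1
    return n

n = min_n_trudy()
print(n, p_match_trudy(n))
```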

# Q3 
> What is the probability that two or more people in a room of N people (N ≤ 365) share the same birthday? What is the minimum N such that this probability exceeds 50%?
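
Under the same assumptions, the complementary event is that all N birthdays are distinct, so P(N) = 1 - (365/365)(364/365)...((365 - N + 1)/365). A small sketch that evaluates this product and searches for the smallest qualifying N (function names are illustrative):

```python
def p_shared(n: int, days: int = 365) -> float:
    # 1 - P(all n birthdays are distinct)
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days
    return 1 - p_distinct

def min_n_shared(target: float = 0.5) -> int:
    # Smallest room size whose probability strictly exceeds the target
    n = 1
    while p_shared(n) <= target:
        n += 1
    return n

n = min_n_shared()
print(n, p_shared(n))
```

The gap between this result and the previous question's is the point of the birthday paradox: here every pair of people can match (N(N - 1)/2 pairs), while in the previous question only the N - 1 pairs involving Trudy count.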

# Q4
> What is the main idea behind the birthday attack on hash functions? How does a birthday attack improve efficiency compared to a naive brute-force attack?
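
The core idea: an attacker who only needs *some* colliding pair, rather than a match for one fixed target, can store every digest seen so far and compare each new digest against all of them. For an n-bit hash this takes on the order of 2^(n/2) evaluations instead of roughly 2^n. A sketch on a deliberately weakened hash, assuming SHA-256 truncated to 24 bits (`truncated_hash` and `birthday_attack` are hypothetical helpers):

```python
import hashlib
from itertools import count
from typing import Dict, Tuple

def truncated_hash(data: bytes, bits: int = 24) -> int:
    # Keep only the top `bits` bits of SHA-256 so a collision is feasible
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def birthday_attack(bits: int = 24) -> Tuple[bytes, bytes, int]:
    seen: Dict[int, bytes] = {}  # digest -> message that produced it
    for i in count():
        msg = f"candidate {i}".encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            # Colliding pair plus the number of hashes computed so far
            return seen[h], msg, i + 1
        seen[h] = msg
    raise AssertionError("unreachable: count() never stops")

m1, m2, tries = birthday_attack()
print(m1, m2, tries)
```

With 24 bits, a collision typically appears after a few thousand hashes (around 2^12), whereas finding a second input that matches one specific 24-bit digest would take about 2^24 (roughly 16.8 million) tries on average.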

# Q5 
> What is the main potential issue for a hash function constructed using the Merkle-Damgård construction?
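
One commonly cited weakness is the length-extension property: the digest is simply the final chaining state, so anyone who knows H(m) and the length of m can keep hashing from that state without knowing m. A toy illustration, assuming a made-up 16-byte block size and a compression function built from truncated SHA-256 (not a real hash design; all names are hypothetical):

```python
import hashlib

BLOCK = 16      # toy block size in bytes (assumption for illustration)
IV = bytes(8)   # toy initial chaining value: 8 zero bytes

def compress(state: bytes, block: bytes) -> bytes:
    # Toy compression function: SHA-256 of state || block, truncated to 8 bytes
    return hashlib.sha256(state + block).digest()[:8]

def md_pad(length: int) -> bytes:
    # Merkle-Damgård strengthening: 0x80, zero bytes, then the 8-byte message length
    zeros = (BLOCK - (length + 9) % BLOCK) % BLOCK
    return b"\x80" + bytes(zeros) + length.to_bytes(8, "big")

def md_hash(msg: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    # prior_len lets us resume from a known state after prior_len bytes were absorbed
    data = msg + md_pad(prior_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

# Victim computes a naive MAC: H(secret || message)
secret = b"secret-key"
message = b"amount=10"
tag = md_hash(secret + message)

# Attacker knows tag, message and len(secret), but not the secret itself
glue = md_pad(len(secret) + len(message))
suffix = b"&amount=1000"
forged_tag = md_hash(suffix, state=tag, prior_len=len(secret) + len(message) + len(glue))

# The forged tag is valid for secret || message || glue || suffix
print(forged_tag == md_hash(secret + message + glue + suffix))
```

In a real attack the secret's length is usually unknown but cheap to guess. HMAC, and hashes that are not plain Merkle-Damgård such as SHA-3, are not vulnerable to this extension.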

# Q6 
> (optional) Open discussion on the main theme of the project: can we predict the future?

# Code Project 

# Problem 1 
> Merkle tree implementation

a.	Use SHA-256 or another hash algorithm such as SHA-3. You do not need to implement the hash algorithm yourself unless you want to; explore your favorite package. I believe pycryptodome provides hash algorithms, and Python's built-in `hashlib` does as well.

b.	Each leaf node represents a plain text file filled with any contents you choose.

c.	Test case 1 uses four leaf nodes and test case 2 uses six leaf nodes. Print out the tree structure and the corresponding hashes.
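
The notebook below uses short strings as leaf contents. A minimal sketch of hashing actual text files instead, assuming `hashlib.new` with an algorithm name such as `"sha256"` or `"sha3_256"`, and a bottom-up pairing rule where an odd node out is paired with itself (a common convention, not the one used by `Tree.build`). `leaf_digest` and `merkle_root` are hypothetical helpers:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List

def leaf_digest(path: Path, algorithm: str = "sha256") -> bytes:
    # Hash the raw bytes of one leaf file
    return hashlib.new(algorithm, path.read_bytes()).digest()

def merkle_root(digests: List[bytes], algorithm: str = "sha256") -> bytes:
    # Hash pairs level by level; an odd node out is duplicated (assumed convention)
    level = list(digests)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [hashlib.new(algorithm, level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in "ABCDEF":
        p = Path(tmp) / f"{name}.txt"
        p.write_bytes(f"contents of file {name}".encode())
        paths.append(p)
    leaves = [leaf_digest(p) for p in paths]

print("4 leaves:", merkle_root(leaves[:4]).hex()[:8])
print("6 leaves:", merkle_root(leaves).hex()[:8])
```

For four leaves this pairing has the same shape as the notebook's tree. For six leaves it differs (the last pair is duplicated instead of leaving `C` and `D` one level higher), so the six-leaf root will not match the notebook's even for identical contents.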


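```python
from hashlib import sha256
from typing import List, Optional
from IPython.display import Markdown
from project6.mermaid import Mermaid  # project-local helper (not shown) that renders the tree


class Node:
    """A Merkle tree node: a leaf holds a value, an internal node holds two children."""

    def __init__(self, value: Optional[str] = None):
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.parent: Optional["Node"] = None
        self._value = value

    @property
    def digest(self) -> bytes:
        m = sha256()
        if self.left is not None and self.right is not None:
            # Internal node: hash the concatenation of the children's digests
            m.update(self.left.digest)
            m.update(self.right.digest)
        else:
            # Leaf node: hash the leaf's contents
            assert self._value is not None, "leaf node needs a value"
            m.update(self._value.encode())
        return m.digest()

    @property
    def value(self) -> str:
        # Short hex prefix used as the node label in the rendered tree
        return self.digest.hex()[:8]


class Tree:
    @staticmethod
    def build(nodes: List[Node]) -> Node:
        # Link nodes in heap order: node i has children 2i+1 and 2i+2.
        # Empty internal nodes must come first, followed by the leaves.
        n = len(nodes)
        for i in range(n):
            if i * 2 + 1 < n:
                nodes[i].left = nodes[i * 2 + 1]
                nodes[i * 2 + 1].parent = nodes[i]
            if i * 2 + 2 < n:
                nodes[i].right = nodes[i * 2 + 2]
                nodes[i * 2 + 2].parent = nodes[i]
        return nodes[0]
```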
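
Because `Tree.build` uses heap indexing, the same root can be computed over a flat list: with n leaves there are n - 1 internal nodes at indices 0..n-2, and every one of them has two children. A minimal sketch, assuming the same SHA-256 leaf and pair hashing as `Node.digest` (`heap_merkle_root` is a hypothetical helper):

```python
import hashlib
from typing import List

def heap_merkle_root(leaf_values: List[str]) -> bytes:
    # Same layout as Tree.build: internal nodes 0..n-2, leaves n-1..2n-2
    n = len(leaf_values)
    digests: List[bytes] = [b""] * (n - 1) + [
        hashlib.sha256(v.encode()).digest() for v in leaf_values
    ]
    # Fill internal nodes from the bottom up so both children exist before the parent
    for i in range(n - 2, -1, -1):
        digests[i] = hashlib.sha256(digests[2 * i + 1] + digests[2 * i + 2]).digest()
    return digests[0]

print(heap_merkle_root(["A", "B", "C", "D"]).hex()[:8])
print(heap_merkle_root(["C", "D", "A", "B", "E", "F"]).hex()[:8])
```

Since the hashing rules are identical, these prefixes should agree with the root labels in the rendered trees.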


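```python
# Tree.build expects heap order: n - 1 empty internal nodes, then the n leaves
four_leaves = [
    Node(),
    Node(),
    Node(),
    Node('A'),
    Node('B'),
    Node('C'),
    Node('D'),
]
# Six leaves need five internal nodes; with this ordering C and D sit one level above A, B, E and F
six_leaves = [
    Node(),
    Node(),
    Node(),
    Node(),
    Node(),
    Node('C'),
    Node('D'),
    Node('A'),
    Node('B'),
    Node('E'),
    Node('F'),
]

root_four_leaves = Tree.build(four_leaves)
root_six_leaves = Tree.build(six_leaves)
```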

```python
Markdown(f"""
### 4 Leaf Nodes

{Mermaid.render_binary_search_tree(root_four_leaves)}
""")
```


### 4 Leaf Nodes

```mermaid
flowchart TD

_ROOT[1b3faa3f]
_ROOT[1b3faa3f] --> _ROOT-L1[63956f0c]
_ROOT-L1[63956f0c] --> _ROOT-L1-L2[559aead0]
_ROOT-L1[63956f0c] --> _ROOT-L1-R2[df7e70e5]
_ROOT[1b3faa3f] --> _ROOT-R1[98a2fbfd]
_ROOT-R1[98a2fbfd] --> _ROOT-R1-L2[6b23c0d5]
_ROOT-R1[98a2fbfd] --> _ROOT-R1-R2[3f39d5c3]
```



```python
Markdown(f"""
### 6 Leaf Nodes

{Mermaid.render_binary_search_tree(root_six_leaves)}
""")
```


### 6 Leaf Nodes

```mermaid
flowchart TD

_ROOT[e94c754c]
_ROOT[e94c754c] --> _ROOT-L1[2893d352]
_ROOT-L1[2893d352] --> _ROOT-L1-L2[63956f0c]
_ROOT-L1-L2[63956f0c] --> _ROOT-L1-L2-L3[559aead0]
_ROOT-L1-L2[63956f0c] --> _ROOT-L1-L2-R3[df7e70e5]
_ROOT-L1[2893d352] --> _ROOT-L1-R2[ea737dc6]
_ROOT-L1-R2[ea737dc6] --> _ROOT-L1-R2-L3[a9f51566]
_ROOT-L1-R2[ea737dc6] --> _ROOT-L1-R2-R3[f67ab10a]
_ROOT[e94c754c] --> _ROOT-R1[98a2fbfd]
_ROOT-R1[98a2fbfd] --> _ROOT-R1-L2[6b23c0d5]
_ROOT-R1[98a2fbfd] --> _ROOT-R1-R2[3f39d5c3]
```



# Problem 2 
> Root hash Observation

a.	Using the four-leaf tree, alter one of the leaf nodes (change some content in its text file) and compare the new root hash with the original one. What can you say?

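Any change to a leaf's contents changes that leaf's hash, which in turn changes the hash of every ancestor up to the root, so the root hash changes as well. Only the nodes on that path are affected: in the example below, the sibling leaf (`df7e70e5`) and the right subtree (`98a2fbfd`) keep their original hashes.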
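
A small sketch that makes this concrete by listing which node hashes differ after changing leaf `A` to `Z`, assuming the same heap layout and SHA-256 hashing as `Tree.build` and `Node.digest` (`heap_digests` is a hypothetical helper):

```python
import hashlib
from typing import List

def heap_digests(leaf_values: List[str]) -> List[bytes]:
    # Heap layout as in Tree.build: internal nodes 0..n-2, leaves n-1..2n-2
    n = len(leaf_values)
    d = [b""] * (n - 1) + [hashlib.sha256(v.encode()).digest() for v in leaf_values]
    for i in range(n - 2, -1, -1):
        d[i] = hashlib.sha256(d[2 * i + 1] + d[2 * i + 2]).digest()
    return d

before = heap_digests(["A", "B", "C", "D"])
after = heap_digests(["Z", "B", "C", "D"])
changed = [i for i, (x, y) in enumerate(zip(before, after)) if x != y]
print(changed)  # → [0, 1, 3]: the root, the changed leaf's parent, and the leaf itself
```

Only the nodes on the path from the modified leaf to the root are recomputed, which is why a Merkle root detects any change while still letting unchanged subtrees be reused.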


```python
# Same four leaves as before, but leaf A is replaced by Z
four_leaves = [
    Node(),
    Node(),
    Node(),
    Node('Z'),
    Node('B'),
    Node('C'),
    Node('D'),
]

root_four_leaves = Tree.build(four_leaves)
```

```python
Markdown(f"""
### 4 Leaf Nodes with change

`A` has been changed to `Z`. Root hash is now `81205ce6` instead of `1b3faa3f`.

{Mermaid.render_binary_search_tree(root_four_leaves)}
""")
```


### 4 Leaf Nodes with change

`A` has been changed to `Z`. Root hash is now `81205ce6` instead of `1b3faa3f`.

```mermaid
flowchart TD

_ROOT[81205ce6]
_ROOT[81205ce6] --> _ROOT-L1[0c87492c]
_ROOT-L1[0c87492c] --> _ROOT-L1-L2[bbeebd87]
_ROOT-L1[0c87492c] --> _ROOT-L1-R2[df7e70e5]
_ROOT[81205ce6] --> _ROOT-R1[98a2fbfd]
_ROOT-R1[98a2fbfd] --> _ROOT-R1-L2[6b23c0d5]
_ROOT-R1[98a2fbfd] --> _ROOT-R1-R2[3f39d5c3]
```



# Problem 3 
> Hash Collision and Hash Puzzle

a.	Take any one text file attached to a leaf node and generate as many text files as possible with the same meaning. For example, adding extra spaces does not change the meaning of the file but should produce a different hash. Test your luck and see whether you can find a hash collision.

b.	In general, with SHA-256 you would need extraordinary luck to find a collision. Switch to a weaker hash, say MD4. I guess that unless you use a specific technique such as herding, you will not be able to find a collision either. Again, if you are lucky, you might.

c.	Then move on to the hash puzzle: finding a hash that satisfies set conditions, for example, that the first 8 bits are zero. For this project, start with 1 leading zero bit and gradually move up to 8 leading zero bits. You can stop at any time.

d.	Summarize what you have found.
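
A minimal sketch of steps a to c, assuming whitespace variants made by appending trailing spaces, a leading-zero-bit puzzle over `data + str(nonce)`, and MD5 as a stand-in whenever the local OpenSSL build no longer provides MD4. All helper names are hypothetical:

```python
import hashlib
from itertools import count
from typing import Dict, List, Optional, Tuple

def whitespace_variants(text: str, n: int) -> List[str]:
    # Same meaning, different bytes: the i-th variant has i trailing spaces
    return [text + " " * i for i in range(n)]

def find_collision(msgs: List[str], algorithm: str = "sha256") -> Optional[Tuple[str, str]]:
    # Return the first pair of messages with identical digests, or None
    seen: Dict[bytes, str] = {}
    for m in msgs:
        digest = hashlib.new(algorithm, m.encode()).digest()
        if digest in seen:
            return seen[digest], m
        seen[digest] = m
    return None

def solve_puzzle(data: str, zero_bits: int) -> Tuple[int, str]:
    # Smallest nonce such that SHA-256(data + str(nonce)) starts with `zero_bits` zero bits
    for nonce in count():
        digest = hashlib.sha256(f"{data}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - zero_bits) == 0:
            return nonce, digest.hex()
    raise AssertionError("unreachable: count() never stops")

# MD4 is only available if the local OpenSSL build still provides it
try:
    hashlib.new("md4")
    weak = "md4"
except ValueError:
    weak = "md5"  # stand-in weak hash when MD4 is unavailable

text = "The quick brown fox jumps over the lazy dog."
variants = whitespace_variants(text, 1000)
print("sha256 collision:", find_collision(variants, "sha256"))
print(f"{weak} collision:", find_collision(variants, weak))
for bits in range(1, 9):
    nonce, hex_digest = solve_puzzle(text, bits)
    print(bits, nonce, hex_digest[:8])
```

Each extra zero bit halves the fraction of digests that qualify, so the expected number of attempts roughly doubles per bit (about 2^k tries for k leading zero bits).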
