# Simulating IPFS and IPNS Systems

This notebook provides a Python-based simulation of IPFS (InterPlanetary File System) and IPNS (InterPlanetary Naming System) to test various linking strategies for storing and retrieving IPAROs.

The notebook uses three classes to simulate these systems:
- **IPARO**: Represents the storage object on IPFS.
- **IPNS**: Keeps track of the latest capture for different websites.
- **IPFS**: Simulates the hashing, storage, and retrieval of IPARO objects.

Each strategy is evaluated by counting the operations it triggers on the simulated IPFS and IPNS.

In [1]:
# Importing the necessary libraries
import hashlib
import random

## IPARO Object

**Properties:**
- `CID`: The CID (Content Identifier) generated by IPFS.
- `Data`: The data of the capture.
- `Linked Node CID(s)`: The CID(s) of the nodes linked to it.
- `Timestamp`: The time of the capture, in seconds since the epoch.

**Functions:**
- `get_cid`: Returns the CID of the IPARO.
- `get_linked_cids`: Returns the CID(s) of the linked node(s).
- `get_content`: Returns the content of the IPARO.
- `get_timestamp`: Returns the timestamp of the IPARO.
- `__str__`: Returns a string representation of the IPARO object.

In [2]:
class IPARO:
    def __init__(self, cid: str, linked_cids: list, content: str, timestamp: float):
        """
        Initialize an IPARO object with its CID, linked CID(s), content, and timestamp.

        Args:
            cid (str): The CID of the IPARO.
            linked_cids (list): List of CIDs of linked nodes.
            content (str): The content of the IPARO.
            timestamp (float): The capture time in seconds since the epoch.
        """
        self.__cid = cid
        self.__linked_cids = linked_cids
        self.__content = content
        self.__timestamp = timestamp

    def get_cid(self) -> str:
        '''
        Returns the CID of the IPARO.

        Returns:
            str: The CID of the IPARO.
        '''
        return self.__cid

    def get_linked_cids(self) -> list:
        '''
        Returns the CID(s) of linked nodes.

        Returns:
            list: List of linked node CIDs.
        '''
        return self.__linked_cids

    def get_content(self) -> str:
        '''
        Returns the content of the IPARO.

        Returns:
            str: The content stored in the IPARO.
        '''
        return self.__content

    def get_timestamp(self) -> float:
        '''
        Returns the timestamp of the IPARO.

        Returns:
            float: The timestamp in seconds since the epoch.
        '''
        return self.__timestamp

    def __str__(self):
        '''
        Returns a string representation of the IPARO object.

        Returns:
            str: A string containing the CID, content, linked CID(s), and timestamp of the IPARO.
        '''
        iparo = {
            "CID": self.__cid,
            "Content": self.__content,
            "Linked CID(s)": self.__linked_cids,
            "Timestamp": self.__timestamp,
        }
        return str(iparo)

## IPNS Object

**Description:**
- The IPNS class stores and maps the latest CID of a website.
- Tracks the number of operations (get and update) performed.

**Functions:**
- `update`: Updates the latest CID of a website.
- `get_cid`: Retrieves the CID of the latest capture for a website.
- `get_counts`: Returns the number of operations performed.
- `reset_counts`: Resets the counters for operations.

In [3]:
class IPNS:
    def __init__(self):
        """
        Initialize the IPNS object with an empty hashmap for storing CIDs 
        and counters for tracking operations.
        """
        self.data = {}
        self.update_count = 0
        self.get_count = 0

    def update(self, url, cid):
        '''
        Updates the latest CID for a given URL.

        Args:
            url (str): The URL of the website.
            cid (str): The CID of the latest capture.
        '''
        self.update_count += 1
        self.data[url] = cid

    def get_cid(self, url) -> str:
        '''
        Retrieves the latest CID for a given URL.

        Args:
            url (str): The URL of the website.

        Returns:
            str: The CID of the latest capture for the given URL.
        '''
        self.get_count += 1
        return self.data[url]

    def get_counts(self) -> dict:
        '''
        Returns the number of update and get operations performed.

        Returns:
            dict: Dictionary with the counts of update and get operations.
        '''
        counts = {"get": self.get_count, "update": self.update_count}
        return counts

    def reset_counts(self):
        """
        Resets the operation counters.
        """
        self.update_count = 0
        self.get_count = 0

## IPFS Object

**Description:**
- The IPFS class stores the nodes and simulates the hashing, storage, and retrieval operations.
- Tracks the number of operations (hash, store, retrieve).

**Functions:**
- `hash`: Hashes the content of a node to generate its CID.
- `store`: Stores a node with its CID.
- `retrieve`: Retrieves a node using its CID.
- `get_counts`: Returns the number of operations performed.
- `reset_counts`: Resets the counters for operations.

In [4]:
class IPFS:
    def __init__(self):
        '''
        Initialize the IPFS object with an empty hashmap for storing nodes
        and counters for tracking operations.
        '''
        self.data = {}
        self.hash_count = 0
        self.store_count = 0
        self.retrieve_count = 0

    def hash(self, content: str) -> str:
        '''
        Hashes the content to generate a CID.

        Args:
            content (str): The content of the node.

        Returns:
            str: The generated CID.
        '''
        sha256_hash = hashlib.sha256(content.encode()).hexdigest()
        self.hash_count += 1
        return 'Qm' + sha256_hash[:34]

    def store(self, cid: str, node: IPARO):
        '''
        Stores a node with its CID.

        Args:
            cid (str): The CID of the node.
            node (IPARO): The IPARO object to store.
        '''
        self.store_count += 1
        self.data[cid] = node

    def retrieve(self, cid) -> IPARO:
        '''
        Retrieves a node using its CID.

        Args:
            cid (str): The CID of the node to retrieve.

        Returns:
            IPARO: The retrieved IPARO object.
        '''
        self.retrieve_count += 1
        return self.data[cid]

    def get_counts(self) -> dict:
        '''
        Returns the number of hash, store, and retrieve operations performed.

        Returns:
            dict: Dictionary with counts of hash, store, and retrieve operations.
        '''
        counts = {"hash": self.hash_count, "store": self.store_count,
                  "retrieve": self.retrieve_count}
        return counts

    def reset_counts(self):
        """
        Resets the operation counters.
        """
        self.hash_count = 0
        self.store_count = 0
        self.retrieve_count = 0

    def reset_data(self):
        """Removes all stored nodes. The operation counters are left unchanged."""
        self.data = {}

    def get_data(self) -> dict:
        """Returns the data stored by IPFS (for debugging)."""
        return self.data
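
As a quick check of the `hash` scheme above, the following minimal sketch reproduces it outside the class. The helper name `simulated_cid` is an assumption made for illustration. It only mimics the look of a CID: real IPFS CIDv0 identifiers are Base58-encoded multihashes, not truncated hex digests.

```python
import hashlib


def simulated_cid(content: str) -> str:
    """Mimic IPFS.hash: 'Qm' plus the first 34 hex characters of SHA-256."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    return "Qm" + digest[:34]


cid_a = simulated_cid("Node 0")
cid_b = simulated_cid("Node 0")
cid_c = simulated_cid("Node 1")

print(len(cid_a))        # 36 characters: 2 for the prefix, 34 from the digest
print(cid_a == cid_b)    # identical content yields an identical identifier
print(cid_a == cid_c)    # different content yields a different identifier
```

Because the identifier depends only on the hashed string, the storing cells below include a timestamp in that string so that two captures with the same content still receive distinct CIDs.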

## Initialization and Operation Tracking

Here, we initialize the IPFS and IPNS objects and define a helper function `get_op_counts()` to display the number of operations performed.


In [5]:
# Initializing the simulated IPFS and IPNS
ipfs = IPFS()
ipns = IPNS()


def get_op_counts():
    '''
    Displays the number of operations performed by IPNS and IPFS.
    '''
    print("Number of operations IPNS performed:")
    print(ipns.get_counts())
    print("Number of operations IPFS performed:")
    print(ipfs.get_counts())

## Testing Different Linking Strategies

### 1. Linking to Only the Previous Node

In this test, each node will link only to the previous node in the chain. This strategy will be used to simulate a simple sequential storage system.


#### Storing Nodes

In [6]:
import time

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)


# Automate the creation of additional nodes
for i in range(1, NODE_NUM):
    content = f"Node {i}"
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    linked_cids = [ipns.get_cid(URL)]  # Link to the previous node
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 99, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 0}


#### Retrieving Nodes

The following section tests retrieval of nodes by simulating a random node search.


In [7]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Traverse back through the linked nodes to find the target
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
while node.get_content() != target_content:
    node = ipfs.retrieve(node.get_linked_cids()[0])

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 94
Found node: {'CID': 'Qm28de24571502f3dd124f62479d22c09928', 'Content': 'Node 94', 'Linked CID(s)': ['Qm132611ae6e39d2ac33380d2eaa7e3142fc'], 'Timestamp': 1732065991.908295}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 6}
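
The retrieve count can be predicted without running the search. The sketch below assumes the search starts at the latest node and follows one link per step, as in the cell above. `chain_retrievals` is a hypothetical helper, not part of the notebook's classes.

```python
def chain_retrievals(node_num: int, target: int) -> int:
    """Retrievals needed to reach `target` when walking back from the latest node.

    Nodes are numbered 0 .. node_num - 1 and each links only to its predecessor,
    so the walk touches the latest node and every node down to the target.
    """
    if not 0 <= target < node_num:
        raise ValueError("target must be an existing node index")
    latest = node_num - 1
    return latest - target + 1


print(chain_retrievals(100, 94))  # 6, matching the recorded run
print(chain_retrievals(100, 0))   # 100, the worst case
```

Averaged over a uniformly random target, this comes to (N + 1) / 2 retrievals, or 50.5 for 100 nodes.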


### 2. Linking to All Previous Nodes

In this test, each node will link to all the previous nodes in the chain.

#### Storing Nodes

In [8]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

From here, there are two ways to create a new node under this linking strategy. Because this is a simulation, no data corruption can occur, but that may not hold in practice. When retrieving the latest node, which should contain the CIDs of all previous nodes, one of two scenarios applies:
1. The data is intact and the CIDs in the list are "correct" (which we cannot know for sure), so we can simply copy the list into the new node.
2. The data is corrupt and one or more of the CIDs are wrong or incomplete, in which case we have to recheck every CID to rebuild the list of linked CIDs (not to mention repairing all the corrupted nodes).

For the purpose of this simulation, we retrieve every CID in the latest IPARO's linked CID list, simulating the worst-case scenario every time.
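
One way such a per-CID check could look is sketched below. This is an assumption, not something the notebook implements: it treats a node as intact when re-hashing its content and timestamp reproduces its CID, using the same truncated SHA-256 scheme as the simulated `IPFS.hash`. The names `compute_cid` and `is_intact`, and the plain dictionaries standing in for `IPARO` objects, are hypothetical.

```python
import hashlib


def compute_cid(content: str, timestamp: float) -> str:
    """Rebuild the simulated CID from the fields that were hashed at store time."""
    to_be_hashed = str({"content": content, "timestamp": timestamp})
    return "Qm" + hashlib.sha256(to_be_hashed.encode()).hexdigest()[:34]


def is_intact(record: dict) -> bool:
    """Return True if the record's CID still matches its content and timestamp."""
    return record["cid"] == compute_cid(record["content"], record["timestamp"])


good = {"content": "Node 0", "timestamp": 1732065991.0}
good["cid"] = compute_cid(good["content"], good["timestamp"])

# Simulate corruption by changing the content after the CID was assigned
corrupt = dict(good, content="Node 0 (damaged)")

print(is_intact(good))     # True
print(is_intact(corrupt))  # False
```

A check of this kind would slot in where the storing loop retrieves each linked CID. It does not cover the linked CID list itself, because the notebook does not include that list in the hashed string.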

In [9]:
# To automate adding the rest of the nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.get_linked_cids()
    # Copy the list so that appending below does not modify the latest node's links
    linked_cids = list(latest_node_linked_cids)
    for link_cid in latest_node_linked_cids:
        ipfs.retrieve(link_cid)
        # Checking and repairing nodes goes here
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

# print(ipfs.get_data())
get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 4955}


In this worst-case scenario, where we retrieve and verify every CID linked from an IPARO, the retrieve count approaches 5,000 when storing 100 nodes (6 of the 4955 retrievals shown were carried over from the previous search, because `reset_data` does not reset the counters).\
The trade-off is that every node can be reached directly from the latest node.
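
The size of the retrieve count follows from the structure of the storing loop. The sketch below assumes that loop performs one retrieval for the latest node plus one for each CID it links to. `link_all_store_retrievals` is a hypothetical helper.

```python
def link_all_store_retrievals(node_num: int) -> int:
    """Retrievals performed while storing nodes 2 .. node_num - 1.

    When node i is created, the latest node is i - 1, which links to i - 1
    earlier nodes. Storing node i therefore costs 1 + (i - 1) = i retrievals.
    """
    return sum(i for i in range(2, node_num))


print(link_all_store_retrievals(100))  # 4949
```

Adding the 6 retrievals left on the counter by the earlier search gives the 4955 reported in the output. The cost grows quadratically with the number of nodes, roughly N²/2.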

#### Retrieving Nodes

The following section tests retrieval of nodes by simulating a random node search.

In [10]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check the latest node, then scan the CIDs it links to
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
if node.get_content() != target_content:
    for linked_cid in node.get_linked_cids():
        node = ipfs.retrieve(linked_cid)
        if node.get_content() == target_content:
            break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 42
Found node: {'CID': 'Qm9bbc52fbd4529a6bb0aa22a371c19a05a1', 'Content': 'Node 42', 'Linked CID(s)': ['Qm3bfbe4bddcd39c398727c9fe56691a97c9', 'Qm20251bc29c597374d12e4bf1a6ccd936bf', 'Qm362ee97b028ce03436e084cd2f4f94f7f4', 'Qm2a9bd593006fbd0b84150f7951ce4dc324', 'Qmacbb9d9e69ef5b63649f7c3ec8b0d4d7ad', 'Qm04de198dcd287a1c63152f9d09bfecd07a', 'Qm97eba64d75445f64b6c52af91b842e0c89', 'Qm1c0581c235d7655e7336f44ddfc1e90755', 'Qm7b49e158aba728b47665c7493b6ba3b0dd', 'Qm835348af2f32a651c17990daceac738aa6', 'Qm1151c18844ee168417364ff5b5e3f58621', 'Qmfb239c7a2e9b290932b7bb968a79e23a19', 'Qm7198e61aac8e35e22658e1aca990db39bf', 'Qm4e52e13eaec07a246f45e5b129f81c144a', 'Qm7f179472f71dee2dc3132496a4b9befb3e', 'Qm3e83b2057d411686b0f1ab8426242e5faf', 'Qmf6db185d7f4682a233653ce25a74a7aea7', 'Qm6a6915acd0ebb1bd4f71550eafe224d802', 'Qm093be6239ce965ba709e8a13c6d000b10c', 'Qm6501c4a4a13f6bba1f3b5ae39d5f85ca5f', 'Qm074b479ac4ee59dbb145d2c11646c4338b', 'Qm3e54a0bf09e618e5cf

### 3. Linking to Previous and First Node

In this test, each node will link to the previous node and the first node in the chain.

#### Storing Nodes

In [11]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

In [12]:
# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    linked_cids = []
    latest_node_linked_cids = latest_node.get_linked_cids()
    linked_cids.append(latest_node_linked_cids[0])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

# print(ipfs.get_data())
get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 142}


#### Retrieving Nodes

In [13]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check the latest node, then the first node, then walk back through the chain
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
if node.get_content() != target_content:
    linked_cids = node.get_linked_cids()
    first_node = ipfs.retrieve(linked_cids[0])
    if first_node.get_content() == target_content:
        node = first_node
    else:
        # Index 1 of each link list holds the previous node
        while node.get_content() != target_content:
            node = ipfs.retrieve(node.get_linked_cids()[1])

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 85
Found node: {'CID': 'Qm820ddadec30147ac577de08c9180407ef5', 'Content': 'Node 85', 'Linked CID(s)': ['Qm656bd171d76fe39c63a742e41698897c8d', 'Qmc32f3bd8c4f250befe632727ae84203bf0'], 'Timestamp': 1732065991.994444}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 16}
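
The count can again be predicted. The sketch below assumes the search retrieves the latest node, then the first node, and then walks back one node at a time along the previous-node link. `prev_first_retrievals` is a hypothetical helper.

```python
def prev_first_retrievals(node_num: int, target: int) -> int:
    """Retrievals needed to find `target` with previous-and-first linking."""
    if not 0 <= target < node_num:
        raise ValueError("target must be an existing node index")
    latest = node_num - 1
    if target == latest:
        return 1                      # only the latest node is retrieved
    if target == 0:
        return 2                      # latest node, then the first node
    # Latest node, first node, then one retrieval per step back to the target
    return 2 + (latest - target)


print(prev_first_retrievals(100, 85))  # 16, matching the recorded run
print(prev_first_retrievals(100, 0))   # 2
```

Under these assumptions, the link to the first node pays off only when the target is the first node itself. For every other target it adds one retrieval compared with linking to the previous node alone.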


### 4. Linking to K Previous Nodes and First Node

In this test, each node will link to the K previous nodes and the first node in the chain.
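
The link list behaves like a sliding window with a pinned first entry. The sketch below uses integers in place of CIDs. `next_links` is a hypothetical helper that follows the same update rule as the storing loop in this section, assuming K is at least 1.

```python
def next_links(latest_links: list, latest_id: int, k: int) -> list:
    """Build the link list of a new node from the latest node's links.

    Index 0 always holds the first node; the remaining entries are the most
    recent nodes, capped at k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    first = latest_links[0]
    recent = latest_links[1:] + [latest_id]
    return [first] + recent[-k:]


K = 5
links = {1: [0]}                      # node 1 links only to node 0
for i in range(2, 100):
    links[i] = next_links(links[i - 1], i - 1, K)

print(links[3])   # [0, 1, 2]
print(links[48])  # [0, 43, 44, 45, 46, 47]
```

Until K + 1 entries exist, the list simply grows. After that, the oldest of the previous nodes is dropped each time a new node is added.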

#### Storing Nodes

In [14]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"
K = 5

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

In [15]:
# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.get_linked_cids()
    # Keep the first node, then the K most recent nodes
    linked_cids = [latest_node_linked_cids[0]]
    if len(latest_node_linked_cids) == K+1:
        # The window is full: drop the oldest of the previous nodes
        linked_cids.extend(latest_node_linked_cids[2:])
    else:
        linked_cids.extend(latest_node_linked_cids[1:])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

# print(ipfs.get_data())
get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 114}


#### Retrieving Nodes

In [16]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check the latest node, then the first node, then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
found = node.get_content() == target_content
linked_cids = node.get_linked_cids()
if not found:
    first_node = ipfs.retrieve(linked_cids[0])
    if first_node.get_content() == target_content:
        node = first_node
        found = True
if not found:
    # Run through the linked CIDs of the latest node to check
    for cid in linked_cids:
        node = ipfs.retrieve(cid)
        if node.get_content() == target_content:
            found = True
            break
while not found:
    if len(linked_cids) < 2:
        print('Can\'t find node')
        break
    # Jump to the node at index 1 and scan its previous-node links
    node = ipfs.retrieve(linked_cids[1])
    linked_cids = node.get_linked_cids()
    for cid in linked_cids[1:]:
        node = ipfs.retrieve(cid)
        if node.get_content() == target_content:
            found = True
            break

# Output the found node
if found:
    print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 48
Found node: {'CID': 'Qmcb7a8450120e832d7c76137f4021134176', 'Content': 'Node 48', 'Linked CID(s)': ['Qm5352c38f80d9d8d7caefb9ff9c29dbf324', 'Qma11ce9161ca9a3e719637216ab0b6e716b', 'Qmdcdbdf686e632621bb149fe06f309de850', 'Qmcb3eb99e7f0a27971e7fcc07f8d18fa8d6', 'Qm7790335b990a338bf4be384b6571e3a16f', 'Qm810d790dc6bea7f0f66835499f070218fc'], 'Timestamp': 1732065992.0272117}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 68}
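
The retrieve count for this strategy can be modelled on integer node IDs. The sketch below assumes the search checks the latest node, then the first node, then scans every link of the latest node, and after that repeatedly jumps to the node at index 1 and scans its previous-node links. `build_links` and `count_retrievals` are hypothetical helpers and assume K is at least 1.

```python
def build_links(node_num: int, k: int) -> dict:
    """Link lists for nodes 0 .. node_num - 1, using integers in place of CIDs."""
    links = {0: [], 1: [0]}
    for i in range(2, node_num):
        prev = links[i - 1]
        links[i] = [prev[0]] + (prev[1:] + [i - 1])[-k:]
    return links


def count_retrievals(node_num: int, k: int, target: int) -> int:
    """Count retrievals for the search order described in the lead-in."""
    links = build_links(node_num, k)
    latest = node_num - 1
    count = 1                          # retrieve the latest node
    if target == latest:
        return count
    window = links[latest]
    count += 1                         # retrieve the first node
    if target == window[0]:
        return count
    # Scan every link of the latest node (the first node is retrieved again)
    for node_id in window:
        count += 1
        if node_id == target:
            return count
    # Jump to the node at index 1, then scan its previous-node links
    while len(window) >= 2:
        count += 1
        window = links[window[1]]
        for node_id in window[1:]:
            count += 1
            if node_id == target:
                return count
    raise LookupError("target not reachable")


print(count_retrievals(100, 5, 48))  # 68, matching the recorded run
```

In this model each full window of K nodes costs K + 1 retrievals, because every node in the window is retrieved to inspect its content and the node used for the jump is retrieved a second time. That is why the count for node 48 is higher here than a one-link-per-step walk would need.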


### Other Strategies to Be Tested