# Simulating IPFS and IPNS Systems

This notebook provides a Python-based simulation of IPFS (InterPlanetary File System) and IPNS (InterPlanetary Naming System) to test various linking strategies for storing and retrieving IPAROs.

The notebook uses three classes to simulate these systems:
- **IPARO**: Represents the storage object on IPFS.
- **IPNS**: Keeps track of the latest capture for different websites.
- **IPFS**: Simulates the hashing, storage, and retrieval of IPARO objects.

The goal of the simulation is to test various linking strategies.

In [2]:
# Importing the necessary libraries
import hashlib
import random
import datetime

## IPARO Object

**Properties:**
- `CID`: The CID (Content Identifier) generated by IPFS.
- `Data`: The data of the capture.
- `Linked Node CID(s)`: The CID(s) of the nodes linked to it.

**Functions:**
- `get_cid`: Returns the CID of the IPARO.
- `get_linked_cids`: Returns the CID(s) of the linked node(s).
- `get_content`: Returns the content of the IPARO.
- `__str__`: Returns a string representation of the IPARO object.

In [3]:
class IPARO:
    def __init__(self, cid: str, number: int, linked_cids: list, content: str, timestamp: str):
        """
        Initialize an IPARO object with its CID, linked CID(s), and content.

        Args:
            cid (str): The CID of the IPARO.
            linked_cids (list): List of CIDs of linked nodes.
            content (str): The content of the IPARO.
        """
        self.cid = cid
        self.linked_cids = linked_cids
        self.content = content
        self.timestamp = timestamp
        self.number = number

    def __str__(self):
        '''
        Returns a string representation of the IPARO object.

        Returns:
            str: A string containing the CID, linked CID(s), and content of the IPARO.
        '''
        iparo = {
            "CID": self.cid,
            "Content": self.content,
            "Linked CID(s)": self.linked_cids,
            "Timestamp": self.timestamp,
        }
        return str(iparo)

## IPNS Object

**Description:**
- The IPNS class stores and maps the latest CID of a website.
- Tracks the number of operations (get and update) performed.

**Functions:**
- `update`: Updates the latest CID of a website.
- `get_cid`: Retrieves the CID of the latest capture for a website.
- `get_counts`: Returns the number of operations performed.
- `reset_counts`: Resets the counters for operations.

In [4]:
class IPNS:
    def __init__(self):
        """
        Initialize the IPNS object with an empty hashmap for storing CIDs 
        and counters for tracking operations.
        """
        self.data = {}
        self.update_count = 0
        self.get_count = 0

    def update(self, url, cid):
        '''
        Updates the latest CID for a given URL.

        Args:
            url (str): The URL of the website.
            cid (str): The CID of the latest capture.
        '''
        self.update_count += 1
        self.data[url] = cid

    def get_cid(self, url) -> str:
        '''
        Retrieves the latest CID for a given URL.

        Args:
            url (str): The URL of the website.

        Returns:
            str: The CID of the latest capture for the given URL.
        '''
        self.get_count += 1
        return self.data[url]

    def get_counts(self) -> dict:
        '''
        Returns the number of update and get operations performed.

        Returns:
            dict: Dictionary with the counts of update and get operations.
        '''
        counts = {"get": self.get_count, "update": self.update_count}
        return counts

    def reset_counts(self):
        """
        Resets the operation counters.
        """
        self.update_count = 0
        self.get_count = 0

## IPFS Object

**Description:**
- The IPFS class stores the nodes and simulates the hashing, storage, and retrieval operations.
- Tracks the number of operations (hash, store, retrieve).

**Functions:**
- `hash`: Hashes the content of a node to generate its CID.
- `store`: Stores a node with its CID.
- `retrieve`: Retrieves a node using its CID.
- `get_counts`: Returns the number of operations performed.
- `reset_counts`: Resets the counters for operations.

In [5]:
class IPFS:
    def __init__(self):
        '''
        Initialize the IPFS object with an empty hashmap for storing nodes
        and counters for tracking operations.
        '''
        self.data = {}
        self.hash_count = 0
        self.store_count = 0
        self.retrieve_count = 0

    def hash(self, content: str) -> str:
        '''
        Hashes the content to generate a CID.

        Args:
            content (str): The content of the node.

        Returns:
            str: The generated CID.
        '''
        sha256_hash = hashlib.sha256(content.encode()).hexdigest()
        self.hash_count += 1
        return 'Qm' + sha256_hash[:34]

    def store(self, cid: str, node: IPARO):
        '''
        Stores a node with its CID.

        Args:
            cid (str): The CID of the node.
            node (IPARO): The IPARO object to store.
        '''
        self.store_count += 1
        self.data[cid] = node

    def retrieve(self, cid) -> IPARO:
        '''
        Retrieves a node using its CID.

        Args:
            cid (str): The CID of the node to retrieve.

        Returns:
            IPARO: The retrieved IPARO object.
        '''
        self.retrieve_count += 1
        return self.data[cid]

    def get_counts(self) -> dict:
        '''
        Returns the number of hash, store, and retrieve operations performed.

        Returns:
            dict: Dictionary with counts of hash, store, and retrieve operations.
        '''
        counts = {"hash": self.hash_count, "store": self.store_count,
                  "retrieve": self.retrieve_count}
        return counts

    def reset_counts(self):
        """
        Resets the operation counters.
        """
        self.hash_count = 0
        self.store_count = 0
        self.retrieve_count = 0

    def reset_data(self):
        self.data = {}

    def get_data(self) -> dict:
        """Returns the data stored by IPFS (for debugging)."""
        return self.data

### IPAROFactory

In [None]:
class IPAROFactory:        
    def create_iparo(self, url: str, content: str) -> IPARO:
        """
        Create an IPARO object, hash its content to generate a CID, and store it in IPFS.

        Args:
            url (str): The URL associated with the IPARO.
            content (str): The content to be captured.

        Returns:
            IPARO: The created IPARO object.
        """
        timestamp = datetime.now().isoformat(timespec='seconds')
        cid = self.ipfs.hash(content)
        linked_cid = {}
        number = len(self.ipfs.get_data()) # not sure yet

        iparo = IPARO(url=url, cid=cid, number=number, linked_cids=linked_cid, content=content, timestamp=timestamp)

        self.ipfs.store(cid, iparo)

        return iparo

### Link Strategies

In [None]:
class LinkStrategy:
    # abstract method for different strategies to use
    def get_linked_nodes(self, url: str, latest_node_cids: dict, num_nodes: int) -> dict:
        """
        return a dictionary of cids and timestamp for linkning
        """
        pass
       

## Initialization and Operation Tracking

Here, we initialize the IPFS and IPNS objects and define a helper function `get_op_counts()` to display the number of operations performed.


In [6]:
# Initializing the simulated IPFS and IPNS
ipfs = IPFS()
ipns = IPNS()


def get_op_counts():
    '''
    Displays the number of operations performed by IPNS and IPFS.
    '''
    print("Number of operations IPNS performed:")
    print(ipns.get_counts())
    print("Number of operations IPFS performed:")
    print(ipfs.get_counts())

## Testing Different Linking Strategies

### 1. Linking to Only the Previous Node

In this test, each node will link only to the previous node in the chain. This strategy will be used to simulate a simple sequential storage system.


#### Storing Nodes

In [7]:
import time

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)


# Automate the creation of additional nodes
for i in range(1, NODE_NUM):
    content = f"Node {i}"
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    linked_cids = [ipns.get_cid(URL)]  # Link to the previous node
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 99, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 0}


#### Retrieving Nodes

The following section tests retrieval of nodes by simulating a random node search.


In [8]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Traverse back through the linked nodes to find the target
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
while node.content != target_content:
    node = ipfs.retrieve(node.linked_cids[0])

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 71
Found node: {'CID': 'Qma6d8e2423552ef3951f50d899e8bc70efe', 'Content': 'Node 71', 'Linked CID(s)': ['Qme4c48c5c42580d13db900168d4d6907614'], 'Timestamp': 1737586700.179211}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 29}


### 2. Linking to all previous nodes

In this test, each node will link to all the previous nodes in the chain.

#### Storing Nodes

In [9]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

From here, there are 2 ways of creating a new node for this linking strategy, since this is a simulation, no data corruption can happen but that might not be true in practice. When retrieving the latest node which should contain the CIDs of all the previous node, two scenarios can happen:
1. The data is intact and the CIDs in the list is "correct" (which we really can't know for sure) and we can just add it to the new node we're creating
2. The data is corrupt and one or more of the CIDs is wrong or unfinished, in which case we have to recheck every CID to rebuild a new list of linked CIDs (not to mention fixing all the corrupted nodes)

So, for the purpose of this simulation, we will perform a check for every CID in the linked CID list of an IPARO to simulate the worst case scenario every time

In [10]:
# To automate adding the rest of the nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    linked_cids = []
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = latest_node_linked_cids
    for link_cid in latest_node_linked_cids:
        ipfs.retrieve(link_cid)
        # Checking and repairing nodes goes here
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

# print(ipfs.get_data())
get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 4978}


In this worst case scenario, where we have to retrieve and verify every CIDs in the linked CIDs of an IPARO, the retrieve count goes to almost 5000 (if we're storing 100 nodes)\
Of course this trade off makes it really easy to navigate to all the nodes just from the latest nodes

#### Retrieving nodes

The following section tests retrieval of nodes by simulating a random node search.

In [11]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Traverse back through the linked nodes to find the target
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
for linked_cid in linked_cids:
    node = ipfs.retrieve(linked_cid)
    if node.content == target_content:
        break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 60
Found node: {'CID': 'Qm1da1654734bcd4c1916abecb53187078a0', 'Content': 'Node 60', 'Linked CID(s)': ['Qm87e94bd507c616d1722ff02dd064c75f07', 'Qm252912dc39ee661294bbeb314735696322', 'Qm85fff761585c14ca022e3cb5ef98bd5b53', 'Qmd819950d6e7f6ff0ae2cd8fbb2eb4f43fc', 'Qm028bf8e8b2db4872aec23e70769408f7de', 'Qmdbe64536d592c6d25aeb2ac0b33471bc34', 'Qme691d3791804c55ac5eb274500224cd399', 'Qm00ad1ba47942cacf117a518af3e7bab22c', 'Qm3d896f5e7c6e56b786f74729bc00b1757b', 'Qm4b3af7388608379f5159b11923e17d8f63', 'Qm49fec5db675584fea5647b927603d283ff', 'Qm2c0b84269832a02e83d101648515340bf4', 'Qm42d798dd864b6b8f28270e76341206f164', 'Qm4683fb47cc9a69f84b82122a60fff9200e', 'Qm3289b0cedae6d96b6aafbe38bfe672932b', 'Qm7aabef6d02221b0930a0ff9cf72dc3cbcb', 'Qm926210aeb5e482cdb64971669029ef9c4f', 'Qm304c8f8dabf8ea5a85e9cc5547d32a2373', 'Qm35cf86b224bb43b71c4cbd36c1ab2f08b8', 'Qm45e64c5fbb3cb9416fcdb9ee2062308faa', 'Qm2af12ba3fe8961a200f6787282780d1cd3', 'Qme466eab1165dde6833

### 3. Linking to previous and first node

In this test, each node will link to the previous node and the first node in the chain.

#### Storing Nodes

In [12]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

In [13]:
# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    linked_cids = []
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids.append(latest_node_linked_cids[0])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

# print(ipfs.get_data())
get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 160}


#### Retrieving Nodes

In [14]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    while True:
        node = ipfs.retrieve(linked_cids[1])
        linked_cids = node.linked_cids
        if node.content == target_content:
            break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 43
Found node: {'CID': 'Qmc5a1d37f984ab18bb1ac81dab2965af48c', 'Content': 'Node 43', 'Linked CID(s)': ['Qm6d875a687a6fe84af3f690e499a362c675', 'Qmcd7ab5bb680959cb249e5f9ee094d940b3'], 'Timestamp': 1737586700.2084281}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 58}


### 4. Linking to K-previous and first node

In this test, each node will link to K previous node and the first node in the chain.

#### Storing Nodes

In [15]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"
K = 5

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

In [16]:
# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []
    linked_cids.append(latest_node_linked_cids[0])
    if len(latest_node_linked_cids) == K+1:
        linked_cids.extend(latest_node_linked_cids[2:])
    else:
        linked_cids.extend(latest_node_linked_cids[1:])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 156}


#### Retrieving Nodes

In [17]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    # run through the rest of the linked cids list to check
    for cid in linked_cids:
        node = ipfs.retrieve(cid)
        if node.content == target_content:
            break
    # Get the linked cids of the node at index 1
    node = ipfs.retrieve(linked_cids[1])
    linked_cids = node.linked_cids
    while True:
        # Flag to determine if we should exit the while loop
        found = False

        # Run through the rest of the linked CIDs list to check
        for cid in linked_cids[1:]:
            node = ipfs.retrieve(cid)
            if node.content == target_content:
                found = True
                break

        if found:
            break  # Exit the while loop if the node was found

        if len(linked_cids) >= 2:
            node = ipfs.retrieve(linked_cids[1])
            linked_cids = node.linked_cids
        else:
            print('Can\'t find node')
            break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 61
Found node: {'CID': 'Qme8ee85530047974dd6fce81f94f7eb61e1', 'Content': 'Node 61', 'Linked CID(s)': ['Qm7002299cee28720f7f6610a5deef274c72', 'Qm035b891928734dcc4e958d10b72c64f155', 'Qme3404c45ce32b2cae3e52802048f287bd0', 'Qm7a116eda93546b04ed5b5da93206324156', 'Qmac3ec569cd2fe5b69f471e9dd4efe1ce0d', 'Qmaeb8a331a50728c73f59bff02269bde370'], 'Timestamp': 1737586700.223305}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 48}


### 5. Linking to K-random and first node

In this test, each node will link to a random K previous node and the first node in the chain.

#### Storing Nodes

In [18]:
import random

# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 30
URL = "example.com"
Kmin = 5
Kmax = 10

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []
    linked_cids.append(latest_node_linked_cids[0])
    K = random.randint(Kmin, Kmax)
    # Check if the number of linked CIDs is greater than K and add K-1 random linked CIDs
    if len(latest_node_linked_cids) > K:
        linked_cids.extend(random.sample(latest_node_linked_cids[1:], K-1))
    # If the number of linked CIDs is less than K add all the linked CIDs
    else:
        linked_cids.extend(latest_node_linked_cids[1:])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    print("K = ", K+1)
    print("Length of linked_cids: ", len(linked_cids))
    print(node)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

K =  11
Length of linked_cids:  2
{'CID': 'Qm4589407531b9ac9d8068039d2fd4281f1d', 'Content': 'Node 2', 'Linked CID(s)': ['Qm58bc5abb4e0ae4b7b18942a800381e293d', 'Qm1e263465f816c6b18846d14a7560ad48a9'], 'Timestamp': 1737586700.235276}
K =  10
Length of linked_cids:  3
{'CID': 'Qmd9d0389117e4d0305757d16762970ce24b', 'Content': 'Node 3', 'Linked CID(s)': ['Qm58bc5abb4e0ae4b7b18942a800381e293d', 'Qm1e263465f816c6b18846d14a7560ad48a9', 'Qm4589407531b9ac9d8068039d2fd4281f1d'], 'Timestamp': 1737586700.235373}
K =  10
Length of linked_cids:  4
{'CID': 'Qm68bea1e9c601261a9817d4256f7ff6b1d9', 'Content': 'Node 4', 'Linked CID(s)': ['Qm58bc5abb4e0ae4b7b18942a800381e293d', 'Qm1e263465f816c6b18846d14a7560ad48a9', 'Qm4589407531b9ac9d8068039d2fd4281f1d', 'Qmd9d0389117e4d0305757d16762970ce24b'], 'Timestamp': 1737586700.235392}
K =  7
Length of linked_cids:  5
{'CID': 'Qm28cbb9c52f43bd1f7e312e4d11b18b9ed4', 'Content': 'Node 5', 'Linked CID(s)': ['Qm58bc5abb4e0ae4b7b18942a800381e293d', 'Qm1e263465f816c6b

#### Retrieving nodes

In [19]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    # run through the rest of the linked cids list to check
    for cid in linked_cids:
        node = ipfs.retrieve(cid)
        if node.content == target_content:
            break
    # Get the linked cids of the node at index 1
    node = ipfs.retrieve(linked_cids[1])
    linked_cids = node.linked_cids
    while True:
        # Flag to determine if we should exit the while loop
        found = False

        # Run through the rest of the linked CIDs list to check
        for cid in linked_cids[1:]:
            node = ipfs.retrieve(cid)
            if node.content == target_content:
                found = True
                break

        if found:
            break  # Exit the while loop if the node was found

        if len(linked_cids) >= 2:
            node = ipfs.retrieve(linked_cids[1])
            linked_cids = node.linked_cids
        else:
            print('Can\'t find node')
            break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 9
Can't find node
Found node: {'CID': 'Qm1e263465f816c6b18846d14a7560ad48a9', 'Content': 'Node 1', 'Linked CID(s)': ['Qm58bc5abb4e0ae4b7b18942a800381e293d'], 'Timestamp': 1737586700.235092}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 41}


### Linking to Sequential Exponential (Base K, K an integer)

#### Storing nodes

In [20]:
ipfs.reset_data()

# Testing parameters
NODE_NUM = 50
K = 2
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node " + str(i)
    print(content)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []

    # Link to previous node FIRST
    linked_cids.append(latest_node_linked_cids[0])
    cids = []

    j = 1
    # Temp node = latest node CID inserted (before this one). Call that node K-1.
    if i == 4:
        pass
    temp_node = latest_node
    print(temp_node.content)
    done = False
    while True:
        # Theorem: For any m < len(temp_node_linked_cids), the (1+m)th-to-last position of the linked CIDs list will link
        # K^m nodes away from the node.
        # Base Case: The previous node always gets assigned the last position (m=0), so the base case holds.
        # Inducive Case: Suppose that the property holds for m=k. Then, we need to prove that it holds for m=k+1.
        # The start node is K^k nodes away from the most recent node. But if we use the kth position of the linked
        # CIDs, and travelled the kth link (K - 1) times, then the CID to be added to the linked CIDs (at the (k+1)th
        # position) is K^k + (K-1) * K^k = K*K^k = K^(k+1), which proves the inductive case.
        # Therefore, by the Principle of Mathematical Induction, this theorem holds.
        for _ in range(K - 1):
            temp_cid = temp_node.cid
            temp_node_linked_cids = temp_node.linked_cids
            done = j > len(temp_node_linked_cids)
            if done:
                break
            temp_node = ipfs.retrieve(temp_node_linked_cids[-j])
        if done:
            break
        print(temp_node.content)
        # Ensure no duplicate links
        if temp_node.cid not in linked_cids:
            cids.append(temp_node.cid)
        j += 1
    cids = list(reversed(cids))
    linked_cids.extend(cids)
    linked_cids.append(latest_node_cid)

    print("Length of linked_cids: ", len(linked_cids))
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

Node 2
Node 1
Node 0
Length of linked_cids:  2
Node 3
Node 2
Node 1
Length of linked_cids:  3
Node 4
Node 3
Node 2
Node 0
Length of linked_cids:  3
Node 5
Node 4
Node 3
Node 1
Length of linked_cids:  4
Node 6
Node 5
Node 4
Node 2
Length of linked_cids:  4
Node 7
Node 6
Node 5
Node 3
Node 0
Length of linked_cids:  4
Node 8
Node 7
Node 6
Node 4
Node 0
Length of linked_cids:  4
Node 9
Node 8
Node 7
Node 5
Node 1
Length of linked_cids:  5
Node 10
Node 9
Node 8
Node 6
Node 2
Length of linked_cids:  5
Node 11
Node 10
Node 9
Node 7
Node 3
Length of linked_cids:  5
Node 12
Node 11
Node 10
Node 8
Node 4
Length of linked_cids:  5
Node 13
Node 12
Node 11
Node 9
Node 5
Node 0
Length of linked_cids:  5
Node 14
Node 13
Node 12
Node 10
Node 6
Node 0
Length of linked_cids:  5
Node 15
Node 14
Node 13
Node 11
Node 7
Node 0
Length of linked_cids:  5
Node 16
Node 15
Node 14
Node 12
Node 8
Node 0
Length of linked_cids:  5
Node 17
Node 16
Node 15
Node 13
Node 9
Node 1
Length of linked_cids:  6
Node 18
Node 

In [21]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    # run through the rest of the linked cids list to check
    for cid in linked_cids:
        node = ipfs.retrieve(cid)
        if node.content == target_content:
            break
    # Get the linked cids of the node at index -1 (where the latest index is)
    node = ipfs.retrieve(linked_cids[-1])
    linked_cids = node.linked_cids
    while True:
        # Flag to determine if we should exit the while loop
        found = False

        # Run through the rest of the linked CIDs list to check
        for cid in linked_cids[1:]:
            node = ipfs.retrieve(cid)
            if node.content == target_content:
                found = True
                break

        if found:
            print(f"Found node: {node}")
            break  # Exit the while loop if the node was found

        if len(linked_cids) >= 2:
            node = ipfs.retrieve(linked_cids[-1])
            linked_cids = node.linked_cids
            # Output the found node
        else:
            print('Can\'t find node')
            break

get_op_counts()

Looking for node with content: Node 17
Found node: {'CID': 'Qm65f4ea60a75f02e024a3171aaac2bb503a', 'Content': 'Node 17', 'Linked CID(s)': ['Qmba97b515dd042cd5d4c14b9192e4e7cd97', 'Qm4cf86a052fd045c23675bd194af17ab635', 'Qmadd8d2e1a8e5e37d3e3e494406c7be63a7', 'Qmcb754b9cd080484784b2a8c4705fea6dbc', 'Qm54a5099d9a922b1a56bde5e09be246dbef', 'Qmc635416d8199a2ed2ab47c227e1bfa3024'], 'Timestamp': 1737586700.248884}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 112}


### 7. Linking to sequentially uniform N-prior

In [29]:
ipfs.reset_data()

import time
import math

# Testing parameters
NODE_NUM = 100
URL = "example.com"
N = 10
i = 0
num_links = max(1, math.floor(NODE_NUM / N))
temp_cids = []

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)
temp_cids.append(cid)

latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.get_linked_cids()


print(f"Node {0} created with Linked CIDs: {linked_cids}")

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)
temp_cids.append(cid)

print(f"Node {1} created with Linked CIDs: {linked_cids}")

# Automate adding the rest of the nodes
for i in range(2, NODE_NUM):
    content = "Node"+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)

    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []

    # Link to previous node 
    linked_cids.append(latest_node_linked_cids[0])
    linked_cids.append(latest_node_cid)

    if i % num_links == 0:
        linked_cids = temp_cids[:]
        temp_cids = []

    # Create the new node
    new_node = IPARO(cid=cid, linked_cids=linked_cids,
                     content=content, timestamp=timestamp)
    ipfs.store(cid, new_node)
    ipns.update(URL, cid)
    temp_cids.append(cid)

    print(f"Node {i} created with Linked CIDs: {linked_cids}")

# Final output
get_op_counts() 

Node 0 created with Linked CIDs: []
Node 1 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da']
Node 2 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm86e1fd7268217acbb34096e13f3ce7ba20']
Node 3 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm4f99bcdeea6c1dbde659b4106aaafe45fd']
Node 4 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm2bc08578405d9704dac1ec6cf6745256db']
Node 5 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm6911d673b184ac20efc254d06b6caf35f7']
Node 6 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm84ffbf9ac9c5419dda55c48e8d06f28ca4']
Node 7 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm5f4eb738ceb9d803680d3bb91fbd7ba955']
Node 8 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm0292e483dc927899733e00cf3e82008581']
Node 9 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qma44417fee31c7a3

In [27]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node{node_num}"
print(f"Looking for node with content: {target_content}")

# Retrieve the starting node
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids

# Traverse the graph using a set to avoid revisiting nodes
visited_cids = set()
found = False

while linked_cids and not found:
    next_cids = []
    for cid in linked_cids:
        if cid in visited_cids:
            continue  # Skip already visited nodes
        visited_cids.add(cid)
        node = ipfs.retrieve(cid)
        print(f"Visiting CID: {cid}, Content: {node.content}")  # Debug log
        if node.content == target_content:
            found = True
            break
        # Add new links to the next search queue
        next_cids.extend(node.linked_cids)
    linked_cids = next_cids

if found:
    print(f"Found node: {node}")
else:
    print(f"Can't find node with content: {target_content}")
    all_contents = [ipfs.retrieve(cid).content for cid in visited_cids]
    print(f"Available contents: {all_contents}")

# Output the operation counts
get_op_counts()


Looking for node with content: Node49
Can't find node with content: Node49
Available contents: []
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 1}


### Other strategies to be tested