# Simulating IPFS and IPNS Systems

This notebook provides a Python-based simulation of IPFS (InterPlanetary File System) and IPNS (InterPlanetary Naming System) to test various linking strategies for storing and retrieving IPAROs.

The notebook uses three classes to simulate these systems:
- **IPARO**: Represents the storage object on IPFS.
- **IPNS**: Keeps track of the latest capture for different websites.
- **IPFS**: Simulates the hashing, storage, and retrieval of IPARO objects.

The goal of the simulation is to test various linking strategies.

In [86]:
# Importing the necessary libraries
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional
import hashlib
import random
import time

## IPARO Object

**Properties:**
- `URL`: The capture URL.
- `Timestamp`: The timestamp of the capture.
- `Content`: The contents of the IPARO.

**Functions:**
- `__str__`: Returns a string representation of the IPARO object.

In [3]:
@dataclass(frozen=True)
class IPARO:
    url: str
    content: bytes
    timestamp: datetime

    def __str__(self):
        """
        Returns a string representation of the IPARO object.

        Returns:
            str: A string containing the URL and content of the IPARO.
        """
        iparo = {
            "URL": self.url,
            "Content": self.content,
            "Timestamp": self.timestamp,
        }
        return str(iparo)


## The IPAROLink Object

**Description:**

- The IPAROLink class stores a link object, which will define links between two IPARO objects.
- The source CID is used to lookup the IPARO objects, so only the target CIDs are recorded in the IPAROLink object.

**Properties:**

- `seq_num`: The sequence number of the link.
- `timestamp`: The timestamp of the link.
- `cid`: The CID of the link.

In [4]:
@dataclass(frozen=True)
class IPAROLink:
    """
    Defines the IPARO Link class that has a sequence number, a datetime, and a CID.
    """
    seq_num: int
    timestamp: datetime
    cid: str


## The IPAROLinkCollection Object

**Description:**

- The IPAROLinkCollection class stores all the links from a source node.
- The source CID is used to lookup the IPARO objects, so only the target CIDs are recorded in the IPAROLink object.
- The previous link has the maximum sequence number of all the links.
    - This property must hold true since all strategies access the previous CID.
    - The current node and all nodes afterwards have not been hashed before the creation of the link collection so it isn't possible for any node to have a higher sequence number.

**Properties:**

- `links`: The list of all links coming from the source IPARO.
- `previous`: The link to the previous CID, which is the linked CID with the maximum sequence number.

**Methods:**
- `__str__`, `__repr__`: This should return the list of `IPAROLink` objects.
- `__len__`: This should return the total number of links in the collection.

In [64]:
class IPAROLinkCollection:
    """
    A helper class for linking IPAROs together. The two attributes are ``links`` (gets the links to other IPAROs)
    and  ``previous`` (the previous IPAROLink).
    """

    def __init__(self, links: list[IPAROLink]):
        """
        Initializes the collection of links on the IPARO object.

        Args:
            iparo (IPARO): The IPARO object to be linked.
            links (list[IPAROLink]): The list of links to target IPAROs.
        """
        self.links: list[IPAROLink] = links

        # We are guaranteed that the previous node is in the list of links, and so would have the largest sequence number
        self.previous: IPAROLink | None = max(links, key=lambda link: link.seq_num) if len(links) > 0 else None

    def __str__(self):
        return str(self.links)

    def __repr__(self):
        return str(self)
        
    def __len__(self):
        return len(self.links)

## IPNS Object

**Description:**
- The IPNS class stores and maps the latest CID of a website.
- Tracks the number of operations (get and update) performed.

**Functions:**
- `update`: Updates the latest CID of a website.
- `get_cid`: Retrieves the CID of the latest capture for a website.
- `get_counts`: Returns the number of operations performed.
- `reset_counts`: Resets the counters for operations.
- `reset`: Resets the data.
- `get_version_counts`: Get number of versions of a URL.

In [92]:
class IPNS:

    def __init__(self):
        """
        Initialize the IPNS object with an empty hashmap for storing CIDs
        and counters for tracking operations.
        """
        self.data: dict[str, str] = {}
        self.version_counts: dict[str, int] = {}
        self.update_count = 0
        self.get_count = 0

    def update(self, url: str, cid: str):
        """
        Updates the latest CID for a given URL.

        Args:
            url (str): The URL of the website.
            cid (str): The CID of the latest capture.
        """
        self.update_count += 1
        self.data[url] = cid
        self.version_counts[url] = self.version_counts.setdefault(url, 0) + 1

    def get_cid(self, url: str) -> Optional[str]:
        """
        Retrieves the latest CID for a given URL if it exists, else None.

        Args:
            url (str): The URL of the website.

        Returns:
            str: The CID of the latest capture for the given URL if it exists, else None.
        """
        self.get_count += 1
        return self.data.get(url)

    def get_number_of_nodes(self, url: str) -> int:
        """
        Retrieves the number of nodes in a given URL.

        Args:
            url (str): The URL of the website.

        Returns:
            int: The number of nodes for a given URL, or zero if the mapping is not present.
        """
        return self.version_counts.get(url, 0)

    def get_counts(self):
        """
        Returns the number of update and get operations performed.

        Returns:
            dict: Dictionary with the counts of update and get operations.
        """
        return {"get": self.get_count, "update": self.update_count}

    def reset_data(self):
        """
        Resets the data.
        """
        self.data: dict[str, str] = {}
        self.version_counts: dict[str, int] = {}

    def reset_counts(self):
        """
        Resets the operating counts. Used for the evaluation phase.
        """
        self.update_count = 0
        self.get_count = 0

## Link Strategy Parameters

**Description:** The LinkStrategyParams class is used to hold the parameters of a linking strategy that are usually needed for the LinkStrategy to properly work, without having to duplicate the code every time we create a linking strategy.

**Properties:**
- `url`: The URL, provided at the constructor level.
- `latest_cid`: The latest CID, which is equal to the CID of the URL (note that this is different from the newly created IPARO object), if it exists.
- `node_num`: The number of nodes created for the URL before the newly created IPARO object.
- `latest_node`: The node with the latest CID. Used for accessing the timestamp.
- `link`: The link to the latest node, if it exists.

In [126]:
class LinkStrategyParams:
    """
    This class is designed to add parameters from the IPNS and the IPFS that are determined at runtime.
    """

    def __init__(self, url: str):
        self.url: str = url
        self.latest_cid: str = ipns.get_cid(url)
        self.node_num: int = ipns.get_number_of_nodes(url)
        self.latest_node: Optional[IPARO] = ipfs.retrieve(self.latest_cid)
        self.latest_node_links: list[IPAROLink] = ipfs.retrieve_links(self.latest_cid).links
        self.link: Optional[IPAROLink] = IPAROLink(seq_num=self.node_num - 1,
                                                   timestamp=self.latest_node.timestamp,
                                                   cid=self.latest_cid) if self.latest_node is not None else None

## Link Strategies
**Description:**
The LinkStrategy class encapsulates a linking strategy. This allows for more structure with regards to the approaches to evaluating them.

**Functions:**

- `get_linked_nodes`: Evaluate which IPARO nodes to link to, given the parameters for the linking strategy.

In [140]:
class LinkStrategy(ABC):

    @abstractmethod
    def get_linked_nodes(self, params: LinkStrategyParams) -> IPAROLinkCollection:
        pass

## IPFS Object

**Description:**
- The IPFS class stores the nodes and simulates the hashing, storage, and retrieval operations.
- Tracks the number of operations (hash, store, retrieve).

**Functions:**
- `hash`: Hashes the content of a node to generate its CID.
- `link`: Links a node with other nodes using a linking strategy.
- `store`: Stores a node with its CID.
- `retrieve`: Retrieves a node using its CID.
- `retrieve_links`: Retrieves the list of all links using the node's CID.
- `retrieve_by_number`: Retrieves a node with a URL and its sequence number.
- `retrieve_by_date`: Retrieves a node with a URL and its date, according to a given mode.
- `get_counts`: Returns the number of operations performed.
- `reset_counts`: Resets the counters for operations.

In [130]:
class IPFS:
    """
    The InterPlanetary File System is responsible for hashing, storing,
    retrieving, and linking IPARO objects.
    """

    def __init__(self):
        self.data: dict[str, IPARO] = {}
        self.links: dict[str, IPAROLinkCollection] = {}
        self.hash_count = 0
        self.retrieve_count = 0
        self.store_count = 0

    def hash(self, iparo: IPARO) -> str:
        """
        Hashes the IPARO to generate a CID.

        Args:
            iparo: IPARO: The IPARO object.

        Returns:
            str: The generated CID.
        """
        self.hash_count += 1
        iparo_string = str(iparo)
        sha256_hash = hashlib.sha256(iparo_string.encode()).hexdigest()
        return 'Qm' + sha256_hash[:34]

    def store(self, iparo: IPARO) -> str:
        """
        Stores a node with its CID.

        Args:
            iparo (IPARO): The IPARO object to store.

        Returns:
            The CID of the newly stored IPARO.
        """
        cid = self.hash(iparo)
        self.store_count += 1
        self.data[cid] = iparo
        return cid

    def link(self, cid: str, strategy: LinkStrategy, params: LinkStrategyParams) -> IPAROLinkCollection:
        """
        Links an IPARO with the specified CID to other IPAROs, according to the provided linking strategy.

        Args:
            cid (str): The CID of the IPARO object.
            strategy (LinkStrategy): The link strategy to use when linking to other IPAROs.
            params (LinkStrategyParams): The parameters for the linking strategy.

        Returns:
            IPAROLinkCollection: The links added for the newly created node.
        """
        links = strategy.get_linked_nodes(params)
        self.links[cid] = links
        return links

    def reset_data(self):
        """
        Resets the data for the IPFS.
        """
        self.data: dict[str, IPARO] = {}
        self.links: dict[str, IPAROLinkCollection] = {}

    def retrieve(self, cid) -> Optional[IPARO]:
        """
        Retrieves the IPARO object corresponding to a given CID, if it exists, otherwise, ``None``.
        """
        self.retrieve_count += 1
        return self.data.get(cid)

    def retrieve_links(self, cid: str) -> IPAROLinkCollection:
        """
        Retrieves the IPARO links corresponding to a given CID. Does not count as an IPARO operation for
        the purposes of evaluation.
        """
        return self.links.get(cid, IPAROLinkCollection([]))

    def retrieve_by_timestamp(self, url: str, target_timestamp: datetime, mode: Mode = Mode.LATEST_BEFORE) -> \
            Optional[str]:
        """
        Retrieves the IPARO versions of a given URL closest to a given datetime if at least one
        IPARO version is stored in the IPFS, otherwise ``None``. Default is the latest timestamp
        that occurs before the target timestamp, but ``Mode.EARLIEST_AFTER`` gives the node with
        the earliest time after a given timestamp, and ``Mode.CLOSEST`` gives the node with the
        closest timestamp to a given timestamp. If the distance from the two closest times to
        the target timetamp are equal, the CID with the earlier timestamp will be chosen.
        """
        self.retrieve_count += 1

        timestamps = sorted((iparo.timestamp, cid) for cid, iparo in self.data.items() if iparo.url == url)

        # Early exit
        if len(timestamps) == 0:
            return None

        timestamp: Optional[datetime]
        if mode == Mode.LATEST_BEFORE:
            timestamps = [t for t in timestamps if t[0] <= target_timestamp]
            _, cid = min(timestamps, default=None, key=lambda ts: target_timestamp - ts[0])
        elif mode == Mode.CLOSEST:
            # Find the key with the minimum difference to the target timestamps
            _, cid = min(timestamps, default=None, key=lambda ts: abs(ts[0] - target_timestamp))
        else:
            timestamps = [t for t in timestamps if t[0] >= target_timestamp]
            _, cid = min(timestamps, default=None, key=lambda ts: ts[0] - target_timestamp)

        return cid

    def retrieve_by_number(self, url: str, number: int) -> str:
        """
        Retrieves the IPARO CID corresponding to a given sequence number and a URL.
        """
        cid = ipns.get_cid(url)
        num_nodes = ipns.get_number_of_nodes(url)
        self.retrieve_count += 1
        for i in range(number, num_nodes - 1):
            links = self.retrieve_links(cid)
            cid = links.previous.cid
        return cid

    def get_counts(self) -> dict:
        """
        Returns the number of hash, store, and retrieve operations performed.

        Returns:
            dict: Dictionary with counts of hash, store, and retrieve operations.
        """
        counts = {"hash": self.hash_count, "store": self.store_count,
                  "retrieve": self.retrieve_count}
        return counts

    def reset_counts(self):
        """
        Resets the operation counters.
        """
        self.hash_count = 0
        self.store_count = 0
        self.retrieve_count = 0

    def get_all_cids(self, url: str) -> tuple[list[str], list[IPARO]]:
        """
        Retrieves the list of all CIDs and IPAROs in the IPFS, corresponding to the given URL.
        The nodes are sorted from latest to earliest.
        """
        cids = []
        iparos = []
        cid = ipns.get_cid(url)
        while True:
            if cid is not None:
                cids.append(cid)
                iparo = self.retrieve(cid)
                if iparo is not None:
                    iparos.append(iparo)
            links = self.retrieve_links(cid)
            if links is None or links.previous is None:
                break
            cid = links.previous.cid

        return cids, iparos

## IPAROFactory

The purpose of the IPAROFactory class is to create IPARO nodes. In particular, we will have this method:

- `create_node`: A method that takes in two arguments and creates an IPARO object out of the URL and the content.

In [18]:
class IPAROFactory:
    @classmethod
    def create_node(cls, url: str, content: bytes) -> IPARO:
        """
        Creates an IPARO object.

        Args:
            url (str): The URL of the IPARO object.
            content (bytes): The contents of the IPARO object.
        """
        timestamp = datetime.now()
        iparo = IPARO(url=url, content=content, timestamp=timestamp)
        return iparo


## Initialization and Operation Tracking

Here, we initialize the IPFS and IPNS objects and define a helper function `get_op_counts()` to display the number of operations performed. Additionally, the `initialize()` operation sets up those objects and the `automate_node_creation()` function takes in a URL, a number of nodes, and a `LinkStrategy` and returns a list of IPARO objects to debug.


In [138]:
def initialize():
    '''
    Initializes the simulated IPFS and IPNS.

    (This function should be in a separate file, but for the sake of simplicity, it is included here.)
    '''
    global ipfs, ipns, iparo_factory
    ipfs = IPFS()
    ipns = IPNS()
    iparo_factory = IPAROFactory()

# # Initializing the simulated IPFS and IPNS
# ipfs = IPFS()
# ipns = IPNS()

def get_op_counts():
    '''
    Displays the number of operations performed by IPNS and IPFS.
    '''
    print("Number of operations IPNS performed:")
    print(ipns.get_counts())
    print("Number of operations IPFS performed:")
    print(ipfs.get_counts())

def setup():
    '''
    Sets up the testing environment for the different linking strategies.
    '''
    ipns.reset_data()
    ipns.reset_counts()
    ipfs.reset_data()
    ipfs.reset_counts()


def test_storage_strategy(strategy: LinkStrategy, url: str, node_num: int):
    setup()
    # Automate the creation of additional nodes
    nodes = []
    for i in range(100):
        content = f"Node {i}"
        iparo = IPAROFactory.create_node(URL, content)
        cid = ipfs.store(iparo)
        params = LinkStrategyParams(URL)
        link_col = ipfs.link(cid, strategy, params)
        ipns.update(URL, cid)
        time.sleep(0.01)
    get_op_counts()  # Output operation counts
    
initialize()

## Testing Different Linking Strategies

### 1. Linking to Only the Previous Node

In this test, each node will link only to the previous node in the chain. This strategy will be used to simulate a simple sequential storage system.


In [142]:
class SingleStrategy(LinkStrategy):

    def get_linked_nodes(self, params: LinkStrategyParams) -> IPAROLinkCollection:
        links = [params.link] if params.latest_node is not None else []
        return IPAROLinkCollection(links)

#### Storing Nodes

In [143]:
URL = "https://www.example.com/"
NODE_NUM = 100
test_storage_strategy(SingleStrategy(), URL, NODE_NUM)

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 100}


By adding a `LinkStrategy` class and a method to store all the nodes, we managed to cut down the number of lines of code to 3. How awesome is that!

#### Retrieving Nodes

The following section tests retrieval of nodes by simulating a random node search.


In [118]:
# Reset the operation counts
ipns.reset_counts()
ipfs.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Traverse back through the linked nodes to find the target
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
cid = latest_node_cid
links = ipfs.retrieve_links(cid).links
while node.content != node.target_content and len(links) > 0:
    links = ipfs.retrieve_links(cid).links
    node == 2


# Output the found node
print(f"Node: {node}")
get_op_counts()

Looking for node with content: Node 51
Node: {'URL': 'https://www.example.com/', 'Content': 'Node 99', 'Timestamp': datetime.datetime(2025, 2, 11, 20, 58, 19, 90924)}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 1}


### 2. Linking to all previous nodes

In this test, each node will link to all the previous nodes in the chain.

In [145]:
class ComprehensiveStrategy(LinkStrategy):

    def get_linked_nodes(self, params: LinkStrategyParams) -> IPAROLinkCollection:
        cids, iparos = ipfs.get_all_cids(params.url)
        n = len(cids)
        return IPAROLinkCollection([IPAROLink(timestamp=iparo.timestamp,
                                              seq_num=n - i - 1,
                                              cid=cid) for i, (cid, iparo) in enumerate(zip(cids, iparos))])

#### Storing Nodes

In [148]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
first_node = IPAROFactory.create_node(url=URL, content=content)
cid = ipfs.store(first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipfs.link(cid, SingleStrategy())
ipns.update(URL, cid)

TypeError: IPARO.__init__() got an unexpected keyword argument 'cid'

From here, there are 2 ways of creating a new node for this linking strategy, since this is a simulation, no data corruption can happen but that might not be true in practice. When retrieving the latest node which should contain the CIDs of all the previous node, two scenarios can happen:
1. The data is intact and the CIDs in the list is "correct" (which we really can't know for sure) and we can just add it to the new node we're creating
2. The data is corrupt and one or more of the CIDs is wrong or unfinished, in which case we have to recheck every CID to rebuild a new list of linked CIDs (not to mention fixing all the corrupted nodes)

So, for the purpose of this simulation, we will perform a check for every CID in the linked CID list of an IPARO to simulate the worst case scenario every time

In [97]:
    def get_linked_nodes(self, params: LinkStrategyParams) -> IPAROLinkCollection:
        cids, iparos = ipfs.get_all_cids(params.url)
        n = len(cids)
        return IPAROLinkCollection([IPAROLink(timestamp=iparo.timestamp,
                                              seq_num=n - i - 1,
                                              cid=cid) for i, (cid, iparo) in enumerate(zip(cids, iparos))])

Number of operations IPNS performed:
{'get': 0, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 0}


In this worst case scenario, where we have to retrieve and verify every CIDs in the linked CIDs of an IPARO, the retrieve count goes to almost 5000 (if we're storing 100 nodes)\
Of course this trade off makes it really easy to navigate to all the nodes just from the latest nodes

#### Retrieving nodes

The following section tests retrieval of nodes by simulating a random node search.

In [11]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Traverse back through the linked nodes to find the target
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
for linked_cid in linked_cids:
    node = ipfs.retrieve(linked_cid)
    if node.content == target_content:
        break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 60
Found node: {'CID': 'Qm1da1654734bcd4c1916abecb53187078a0', 'Content': 'Node 60', 'Linked CID(s)': ['Qm87e94bd507c616d1722ff02dd064c75f07', 'Qm252912dc39ee661294bbeb314735696322', 'Qm85fff761585c14ca022e3cb5ef98bd5b53', 'Qmd819950d6e7f6ff0ae2cd8fbb2eb4f43fc', 'Qm028bf8e8b2db4872aec23e70769408f7de', 'Qmdbe64536d592c6d25aeb2ac0b33471bc34', 'Qme691d3791804c55ac5eb274500224cd399', 'Qm00ad1ba47942cacf117a518af3e7bab22c', 'Qm3d896f5e7c6e56b786f74729bc00b1757b', 'Qm4b3af7388608379f5159b11923e17d8f63', 'Qm49fec5db675584fea5647b927603d283ff', 'Qm2c0b84269832a02e83d101648515340bf4', 'Qm42d798dd864b6b8f28270e76341206f164', 'Qm4683fb47cc9a69f84b82122a60fff9200e', 'Qm3289b0cedae6d96b6aafbe38bfe672932b', 'Qm7aabef6d02221b0930a0ff9cf72dc3cbcb', 'Qm926210aeb5e482cdb64971669029ef9c4f', 'Qm304c8f8dabf8ea5a85e9cc5547d32a2373', 'Qm35cf86b224bb43b71c4cbd36c1ab2f08b8', 'Qm45e64c5fbb3cb9416fcdb9ee2062308faa', 'Qm2af12ba3fe8961a200f6787282780d1cd3', 'Qme466eab1165dde6833

### 3. Linking to previous and first node

In this test, each node will link to the previous node and the first node in the chain.

#### Storing Nodes

In [12]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

In [13]:
# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    linked_cids = []
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids.append(latest_node_linked_cids[0])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

# print(ipfs.get_data())
get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 160}


#### Retrieving Nodes

In [14]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    while True:
        node = ipfs.retrieve(linked_cids[1])
        linked_cids = node.linked_cids
        if node.content == target_content:
            break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 43
Found node: {'CID': 'Qmc5a1d37f984ab18bb1ac81dab2965af48c', 'Content': 'Node 43', 'Linked CID(s)': ['Qm6d875a687a6fe84af3f690e499a362c675', 'Qmcd7ab5bb680959cb249e5f9ee094d940b3'], 'Timestamp': 1737586700.2084281}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 58}


### 4. Linking to K-previous and first node

In this test, each node will link to K previous node and the first node in the chain.

#### Storing Nodes

In [15]:
# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 100
URL = "example.com"
K = 5

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

In [16]:
# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []
    linked_cids.append(latest_node_linked_cids[0])
    if len(latest_node_linked_cids) == K+1:
        linked_cids.extend(latest_node_linked_cids[2:])
    else:
        linked_cids.extend(latest_node_linked_cids[1:])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

Number of operations IPNS performed:
{'get': 100, 'update': 100}
Number of operations IPFS performed:
{'hash': 100, 'store': 100, 'retrieve': 156}


#### Retrieving Nodes

In [17]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    # run through the rest of the linked cids list to check
    for cid in linked_cids:
        node = ipfs.retrieve(cid)
        if node.content == target_content:
            break
    # Get the linked cids of the node at index 1
    node = ipfs.retrieve(linked_cids[1])
    linked_cids = node.linked_cids
    while True:
        # Flag to determine if we should exit the while loop
        found = False

        # Run through the rest of the linked CIDs list to check
        for cid in linked_cids[1:]:
            node = ipfs.retrieve(cid)
            if node.content == target_content:
                found = True
                break

        if found:
            break  # Exit the while loop if the node was found

        if len(linked_cids) >= 2:
            node = ipfs.retrieve(linked_cids[1])
            linked_cids = node.linked_cids
        else:
            print('Can\'t find node')
            break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 61
Found node: {'CID': 'Qme8ee85530047974dd6fce81f94f7eb61e1', 'Content': 'Node 61', 'Linked CID(s)': ['Qm7002299cee28720f7f6610a5deef274c72', 'Qm035b891928734dcc4e958d10b72c64f155', 'Qme3404c45ce32b2cae3e52802048f287bd0', 'Qm7a116eda93546b04ed5b5da93206324156', 'Qmac3ec569cd2fe5b69f471e9dd4efe1ce0d', 'Qmaeb8a331a50728c73f59bff02269bde370'], 'Timestamp': 1737586700.223305}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 48}


### 5. Linking to K-random and first node

In this test, each node will link to a random K previous node and the first node in the chain.

#### Storing Nodes

In [120]:
import random

# Resetting IPFS from the last test
ipfs.reset_data()

# Testing parameters
NODE_NUM = 30
URL = "example.com"
Kmin = 5
Kmax = 10

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node "+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []
    linked_cids.append(latest_node_linked_cids[0])
    K = random.randint(Kmin, Kmax)
    # Check if the number of linked CIDs is greater than K and add K-1 random linked CIDs
    if len(latest_node_linked_cids) > K:
        linked_cids.extend(random.sample(latest_node_linked_cids[1:], K-1))
    # If the number of linked CIDs is less than K add all the linked CIDs
    else:
        linked_cids.extend(latest_node_linked_cids[1:])
    linked_cids.append(latest_node_cid)
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    print("K = ", K+1)
    print("Length of linked_cids: ", len(linked_cids))
    print(node)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

TypeError: IPARO.__init__() got an unexpected keyword argument 'cid'

#### Retrieving nodes

In [19]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    # run through the rest of the linked cids list to check
    for cid in linked_cids:
        node = ipfs.retrieve(cid)
        if node.content == target_content:
            break
    # Get the linked cids of the node at index 1
    node = ipfs.retrieve(linked_cids[1])
    linked_cids = node.linked_cids
    while True:
        # Flag to determine if we should exit the while loop
        found = False

        # Run through the rest of the linked CIDs list to check
        for cid in linked_cids[1:]:
            node = ipfs.retrieve(cid)
            if node.content == target_content:
                found = True
                break

        if found:
            break  # Exit the while loop if the node was found

        if len(linked_cids) >= 2:
            node = ipfs.retrieve(linked_cids[1])
            linked_cids = node.linked_cids
        else:
            print('Can\'t find node')
            break

# Output the found node
print(f"Found node: {node}")
get_op_counts()

Looking for node with content: Node 9
Can't find node
Found node: {'CID': 'Qm1e263465f816c6b18846d14a7560ad48a9', 'Content': 'Node 1', 'Linked CID(s)': ['Qm58bc5abb4e0ae4b7b18942a800381e293d'], 'Timestamp': 1737586700.235092}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 41}


### Linking to Sequential Exponential (Base K, K an integer)

#### Storing nodes

In [20]:
ipfs.reset_data()

# Testing parameters
NODE_NUM = 50
K = 2
URL = "example.com"

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)

# To automate creating and adding the remaining nodes
for i in range(2, NODE_NUM):
    content = "Node " + str(i)
    print(content)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)
    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []

    # Link to previous node FIRST
    linked_cids.append(latest_node_linked_cids[0])
    cids = []

    j = 1
    # Temp node = latest node CID inserted (before this one). Call that node K-1.
    if i == 4:
        pass
    temp_node = latest_node
    print(temp_node.content)
    done = False
    while True:
        # Theorem: For any m < len(temp_node_linked_cids), the (1+m)th-to-last position of the linked CIDs list will link
        # K^m nodes away from the node.
        # Base Case: The previous node always gets assigned the last position (m=0), so the base case holds.
        # Inducive Case: Suppose that the property holds for m=k. Then, we need to prove that it holds for m=k+1.
        # The start node is K^k nodes away from the most recent node. But if we use the kth position of the linked
        # CIDs, and travelled the kth link (K - 1) times, then the CID to be added to the linked CIDs (at the (k+1)th
        # position) is K^k + (K-1) * K^k = K*K^k = K^(k+1), which proves the inductive case.
        # Therefore, by the Principle of Mathematical Induction, this theorem holds.
        for _ in range(K - 1):
            temp_cid = temp_node.cid
            temp_node_linked_cids = temp_node.linked_cids
            done = j > len(temp_node_linked_cids)
            if done:
                break
            temp_node = ipfs.retrieve(temp_node_linked_cids[-j])
        if done:
            break
        print(temp_node.content)
        # Ensure no duplicate links
        if temp_node.cid not in linked_cids:
            cids.append(temp_node.cid)
        j += 1
    cids = list(reversed(cids))
    linked_cids.extend(cids)
    linked_cids.append(latest_node_cid)

    print("Length of linked_cids: ", len(linked_cids))
    node = IPARO(cid=cid, linked_cids=linked_cids,
                 content=content, timestamp=timestamp)
    ipfs.store(cid, node)
    ipns.update(URL, cid)

get_op_counts()  # Output operation counts

Node 2
Node 1
Node 0
Length of linked_cids:  2
Node 3
Node 2
Node 1
Length of linked_cids:  3
Node 4
Node 3
Node 2
Node 0
Length of linked_cids:  3
Node 5
Node 4
Node 3
Node 1
Length of linked_cids:  4
Node 6
Node 5
Node 4
Node 2
Length of linked_cids:  4
Node 7
Node 6
Node 5
Node 3
Node 0
Length of linked_cids:  4
Node 8
Node 7
Node 6
Node 4
Node 0
Length of linked_cids:  4
Node 9
Node 8
Node 7
Node 5
Node 1
Length of linked_cids:  5
Node 10
Node 9
Node 8
Node 6
Node 2
Length of linked_cids:  5
Node 11
Node 10
Node 9
Node 7
Node 3
Length of linked_cids:  5
Node 12
Node 11
Node 10
Node 8
Node 4
Length of linked_cids:  5
Node 13
Node 12
Node 11
Node 9
Node 5
Node 0
Length of linked_cids:  5
Node 14
Node 13
Node 12
Node 10
Node 6
Node 0
Length of linked_cids:  5
Node 15
Node 14
Node 13
Node 11
Node 7
Node 0
Length of linked_cids:  5
Node 16
Node 15
Node 14
Node 12
Node 8
Node 0
Length of linked_cids:  5
Node 17
Node 16
Node 15
Node 13
Node 9
Node 1
Length of linked_cids:  6
Node 18
Node 

In [21]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node {node_num}"
print(f"Looking for node with content: {target_content}")

# Check if the first node is the desired node then search the other nodes
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids
first_node_cid = linked_cids[0]
first_node = ipfs.retrieve(first_node_cid)
if first_node.content == target_content:
    print(f"Found node: {node}")
else:
    # run through the rest of the linked cids list to check
    for cid in linked_cids:
        node = ipfs.retrieve(cid)
        if node.content == target_content:
            break
    # Get the linked cids of the node at index -1 (where the latest index is)
    node = ipfs.retrieve(linked_cids[-1])
    linked_cids = node.linked_cids
    while True:
        # Flag to determine if we should exit the while loop
        found = False

        # Run through the rest of the linked CIDs list to check
        for cid in linked_cids[1:]:
            node = ipfs.retrieve(cid)
            if node.content == target_content:
                found = True
                break

        if found:
            print(f"Found node: {node}")
            break  # Exit the while loop if the node was found

        if len(linked_cids) >= 2:
            node = ipfs.retrieve(linked_cids[-1])
            linked_cids = node.linked_cids
            # Output the found node
        else:
            print('Can\'t find node')
            break

get_op_counts()

Looking for node with content: Node 17
Found node: {'CID': 'Qm65f4ea60a75f02e024a3171aaac2bb503a', 'Content': 'Node 17', 'Linked CID(s)': ['Qmba97b515dd042cd5d4c14b9192e4e7cd97', 'Qm4cf86a052fd045c23675bd194af17ab635', 'Qmadd8d2e1a8e5e37d3e3e494406c7be63a7', 'Qmcb754b9cd080484784b2a8c4705fea6dbc', 'Qm54a5099d9a922b1a56bde5e09be246dbef', 'Qmc635416d8199a2ed2ab47c227e1bfa3024'], 'Timestamp': 1737586700.248884}
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 112}


### 7. Linking to sequentially uniform N-prior

In [29]:
ipfs.reset_data()

import time
import math

# Testing parameters
NODE_NUM = 100
URL = "example.com"
N = 10
i = 0
num_links = max(1, math.floor(NODE_NUM / N))
temp_cids = []

# Create and store the first node
content = "Node 0"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
first_node = IPARO(cid=cid, linked_cids=[],
                   content=content, timestamp=timestamp)
ipfs.store(cid, first_node)
ipns.update(URL, cid)
temp_cids.append(cid)

latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.get_linked_cids()


print(f"Node {0} created with Linked CIDs: {linked_cids}")

# Create and store the second node
content = "Node 1"
timestamp = time.time()
to_be_hashed = str({
    "content": content,
    "timestamp": timestamp
})
cid = ipfs.hash(to_be_hashed)
linked_cids = [ipns.get_cid(URL)]
second_node = IPARO(cid=cid, linked_cids=linked_cids,
                    content=content, timestamp=timestamp)
ipfs.store(cid, second_node)
ipns.update(URL, cid)
temp_cids.append(cid)

print(f"Node {1} created with Linked CIDs: {linked_cids}")

# Automate adding the rest of the nodes
for i in range(2, NODE_NUM):
    content = "Node"+str(i)
    timestamp = time.time()
    to_be_hashed = str({
        "content": content,
        "timestamp": timestamp
    })
    cid = ipfs.hash(to_be_hashed)

    latest_node_cid = ipns.get_cid(URL)
    latest_node = ipfs.retrieve(latest_node_cid)
    latest_node_linked_cids = latest_node.linked_cids
    linked_cids = []

    # Link to previous node 
    linked_cids.append(latest_node_linked_cids[0])
    linked_cids.append(latest_node_cid)

    if i % num_links == 0:
        linked_cids = temp_cids[:]
        temp_cids = []

    # Create the new node
    new_node = IPARO(cid=cid, linked_cids=linked_cids,
                     content=content, timestamp=timestamp)
    ipfs.store(cid, new_node)
    ipns.update(URL, cid)
    temp_cids.append(cid)

    print(f"Node {i} created with Linked CIDs: {linked_cids}")

# Final output
get_op_counts() 

Node 0 created with Linked CIDs: []
Node 1 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da']
Node 2 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm86e1fd7268217acbb34096e13f3ce7ba20']
Node 3 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm4f99bcdeea6c1dbde659b4106aaafe45fd']
Node 4 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm2bc08578405d9704dac1ec6cf6745256db']
Node 5 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm6911d673b184ac20efc254d06b6caf35f7']
Node 6 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm84ffbf9ac9c5419dda55c48e8d06f28ca4']
Node 7 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm5f4eb738ceb9d803680d3bb91fbd7ba955']
Node 8 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qm0292e483dc927899733e00cf3e82008581']
Node 9 created with Linked CIDs: ['Qm9d01eb602dad2fffbd3470cc582db1a8da', 'Qma44417fee31c7a3

In [27]:
# Reset the operation counts
ipfs.reset_counts()
ipns.reset_counts()

# Pick a random node to search for
node_num = random.randint(0, NODE_NUM - 1)
target_content = f"Node{node_num}"
print(f"Looking for node with content: {target_content}")

# Retrieve the starting node
latest_node_cid = ipns.get_cid(URL)
node = ipfs.retrieve(latest_node_cid)
linked_cids = node.linked_cids

# Traverse the graph using a set to avoid revisiting nodes
visited_cids = set()
found = False

while linked_cids and not found:
    next_cids = []
    for cid in linked_cids:
        if cid in visited_cids:
            continue  # Skip already visited nodes
        visited_cids.add(cid)
        node = ipfs.retrieve(cid)
        print(f"Visiting CID: {cid}, Content: {node.content}")  # Debug log
        if node.content == target_content:
            found = True
            break
        # Add new links to the next search queue
        next_cids.extend(node.linked_cids)
    linked_cids = next_cids

if found:
    print(f"Found node: {node}")
else:
    print(f"Can't find node with content: {target_content}")
    all_contents = [ipfs.retrieve(cid).content for cid in visited_cids]
    print(f"Available contents: {all_contents}")

# Output the operation counts
get_op_counts()


Looking for node with content: Node49
Can't find node with content: Node49
Available contents: []
Number of operations IPNS performed:
{'get': 1, 'update': 0}
Number of operations IPFS performed:
{'hash': 0, 'store': 0, 'retrieve': 1}


### Other strategies to be tested